Document Processing and Metadata Generation

Name: Document Processing and Metadata Generation
Rating: 5 (5 reviews)
Author: Free N8N

Beginner

8 nodes connected

Free N8N Temples

148 views

47 downloads

Tags: Document Processing, AI, Chunkr.ai, Google Gemini, Langchain, Metadata, Multi-language, PDF, Table of Contents

Automates PDF processing, extracting structured metadata in multiple languages using AI.

About This Workflow

This workflow leverages Chunkr.ai for document parsing and an AI agent (Google Gemini) to generate structured metadata, primarily a Table of Contents, from PDF documents. It supports both English and Chinese output and can be triggered manually or via an external workflow. The process includes downloading the PDF, chunking it for analysis, using AI to generate a structured Table of Contents, and providing fallback mechanisms if the AI analysis is insufficient. The final output can be used to create structured data representations of document content.
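The stages described above can be sketched as a small orchestration function. This is only an illustration of the control flow, not the workflow itself, which is an n8n node graph: `build_metadata` and its four step arguments (`download`, `parse`, `generate_toc`, `fallback`) are hypothetical names standing in for the HTTP download, the Chunkr.ai call, the Gemini agent, and the fallback logic.

```python
from typing import Callable

Toc = list[dict]


def build_metadata(
    pdf_url: str,
    download: Callable[[str], bytes],
    parse: Callable[[bytes], list[dict]],
    generate_toc: Callable[[list[dict], str], Toc],
    fallback: Callable[[list[dict]], Toc],
    languages: tuple[str, ...] = ("en", "zh"),
) -> dict[str, Toc]:
    """Run download -> parse -> AI ToC for each language.

    Every step is injected as a callable (all hypothetical stand-ins),
    so the flow can be exercised without any network access.
    """
    pdf_bytes = download(pdf_url)
    segments = parse(pdf_bytes)

    result: dict[str, Toc] = {}
    for lang in languages:
        toc = generate_toc(segments, lang)
        # An empty AI result triggers the fallback for that language.
        result[lang] = toc if toc else fallback(segments)
    return result
```

Passing the steps in as arguments keeps the sketch independent of any particular client library; each one can be swapped for a real implementation or a stub.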

Key Features

Automated PDF Processing: Handles PDF documents for content extraction.
AI-Powered Metadata Generation: Uses Google Gemini to create structured data like a Table of Contents.
Multi-Language Support: Generates output in both English (en) and Chinese (zh).
Flexible Triggering: Can be executed manually or triggered by other n8n workflows.
Chunking and Analysis: Utilizes Chunkr.ai for efficient document segmentation and OCR.
Structured Output: Employs Langchain output parsers to ensure data conforms to a defined schema.
Fallback Mechanisms: Includes logic to extract section headers if AI fails to generate a clear ToC.
Error Handling: Includes nodes to stop and report errors.
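The fallback feature listed above could look roughly like the following sketch. It assumes the parsed document arrives as a list of segments, each a dict with `segment_type`, `content`, and `page_number` keys, and that headings are labelled `Title` or `SectionHeader`. Those field names and labels are assumptions made for illustration; the workflow's actual fallback logic and the exact Chunkr.ai response shape are not shown in this listing.

```python
from typing import Any

# Assumed labels for heading-like segments; adjust to the parser's real output.
HEADER_TYPES = {"Title", "SectionHeader"}


def fallback_toc(segments: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Build a flat ToC from heading-like segments, skipping blanks and repeats."""
    toc: list[dict[str, Any]] = []
    seen: set[str] = set()
    for seg in segments:
        if seg.get("segment_type") not in HEADER_TYPES:
            continue
        # Collapse runs of whitespace that OCR output often contains.
        title = " ".join(str(seg.get("content") or "").split())
        if not title or title in seen:
            continue
        seen.add(title)
        toc.append({"title": title, "page": seg.get("page_number")})
    return toc
```

Deduplicating by title is a deliberate simplification: it drops running headers repeated on every page, at the cost of also dropping legitimately repeated section names.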

How To Use

Prerequisites: Obtain a Chunkr.ai API key and configure Google Gemini credentials in n8n.
Triggering:
- Manual: Click 'Execute workflow' on the When clicking ‘Execute workflow’ node.
- Automated: Provide a URL of a PDF document to the When Executed by Another Workflow node.
Configuration: Ensure the Google Gemini Chat Model nodes are correctly configured with your API key and desired model. The Structured Output Parser node uses a JSON schema to define the output structure for the Table of Contents.
Execution: The workflow will download the PDF, send it to Chunkr.ai for processing, and then use the AI model to extract and structure the Table of Contents.
Output: The generated structured metadata (in JSON format) will be available in subsequent nodes for further processing or storage. The workflow aims to produce output in both English and Chinese.
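Because the Structured Output Parser relies on a JSON schema, downstream code may want to re-check the result before storing it. The schema used by the workflow is not included in this listing, so the sketch below assumes a hypothetical shape: an object with a `language` field (`en` or `zh`) and a non-empty `entries` list whose items carry a `title` string and an integer `level`. It uses only the Python standard library.

```python
import json

ALLOWED_LANGUAGES = {"en", "zh"}  # the two output languages named above


def validate_toc(raw: str) -> dict:
    """Parse model output and check it against the assumed ToC shape.

    Raises ValueError describing the first problem found.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc

    if not isinstance(data, dict):
        raise ValueError("top level must be an object")

    lang = data.get("language")
    if not isinstance(lang, str) or lang not in ALLOWED_LANGUAGES:
        raise ValueError("language must be 'en' or 'zh'")

    entries = data.get("entries")
    if not isinstance(entries, list) or not entries:
        raise ValueError("entries must be a non-empty list")

    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            raise ValueError(f"entries[{i}] must be an object")
        title = entry.get("title")
        if not isinstance(title, str) or not title.strip():
            raise ValueError(f"entries[{i}].title must be a non-empty string")
        level = entry.get("level")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(level, bool) or not isinstance(level, int) or level < 1:
            raise ValueError(f"entries[{i}].level must be an integer >= 1")
    return data
```

A call such as `validate_toc(model_output)` returns the parsed dict on success; a caller could catch `ValueError` and route the document to the fallback path instead.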

Apps Used

Chunkr.ai

Google Gemini

Langchain

Metadata

Multi-language

PDF

Table of Contents

Workflow JSON

{
  "id": "1664832b-696a-463b-9280-d67fb7ee0a0a",
  "name": "Document Processing and Metadata Generation",
  "nodes": 8,
  "category": "Document Processing",
  "status": "active",
  "version": "1.0.0"
}

Note: This is a sample preview. The full workflow JSON contains node configurations, credentials placeholders, and execution logic.

About the Author

Crypto_Watcher

Web3 Developer

Automated trading bots and blockchain monitoring workflows.

Statistics

Downloads: 47

Rating: 5/5
