Document Processing and Metadata Generation
Automates PDF processing, extracting structured metadata in multiple languages using AI.
About This Workflow
This workflow leverages Chunkr.ai for document parsing and an AI agent (Google Gemini) to generate structured metadata, primarily a Table of Contents, from PDF documents. It supports both English and Chinese output and can be triggered manually or via an external workflow. The process includes downloading the PDF, chunking it for analysis, using AI to generate a structured Table of Contents, and providing fallback mechanisms if the AI analysis is insufficient. The final output can be used to create structured data representations of document content.
Key Features
- Automated PDF Processing: Handles PDF documents for content extraction.
- AI-Powered Metadata Generation: Uses Google Gemini to create structured data like a Table of Contents.
- Multi-Language Support: Generates output in both English (en) and Chinese (zh).
- Flexible Triggering: Can be executed manually or triggered by other n8n workflows.
- Chunking and Analysis: Utilizes Chunkr.ai for efficient document segmentation and OCR.
- Structured Output: Employs LangChain output parsers to ensure data conforms to a defined schema.
- Fallback Mechanisms: Includes logic to extract section headers if AI fails to generate a clear ToC.
- Error Handling: Includes nodes to stop and report errors.
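The fallback feature listed above can be pictured roughly as follows. This is a minimal Python sketch, not the workflow's actual n8n node logic: the segment shape (dicts with `segment_type` and `content` keys), the header type labels, and the function names `extract_headers` and `build_toc` are all assumptions made for illustration.

```python
from typing import Dict, List

# Assumed labels for heading-like segments; adjust to whatever the parser actually emits.
HEADER_TYPES = {"Title", "Section header"}


def extract_headers(segments: List[Dict[str, str]]) -> List[str]:
    """Collect the text of heading-like segments in document order."""
    headers = []
    for seg in segments:
        text = seg.get("content", "").strip()
        if seg.get("segment_type") in HEADER_TYPES and text:
            headers.append(text)
    return headers


def build_toc(ai_entries: List[str], segments: List[Dict[str, str]]) -> List[str]:
    """Prefer the AI-generated entries; fall back to parsed headers if there are none."""
    cleaned = [e.strip() for e in ai_entries if e and e.strip()]
    return cleaned if cleaned else extract_headers(segments)


# Hypothetical parsed segments, for illustration only.
segments = [
    {"segment_type": "Title", "content": "Annual Report"},
    {"segment_type": "Text", "content": "Revenue grew this year."},
    {"segment_type": "Section header", "content": "1. Overview"},
]
print(build_toc([], segments))            # AI returned nothing, so headers are used
print(build_toc(["Overview"], segments))  # AI result is kept as-is
```

The point of the design is that an empty or whitespace-only AI result never reaches later steps; the parsed headers stand in for it.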
How To Use
- Prerequisites: Obtain a Chunkr.ai API key and configure Google Gemini credentials in n8n.
- Triggering:
  - Manual: Click 'Execute workflow' on the "When clicking ‘Execute workflow’" node.
  - Automated: Provide the URL of a PDF document to the "When Executed by Another Workflow" node.
- Configuration: Ensure the "Google Gemini Chat Model" nodes are correctly configured with your API key and desired model. The "Structured Output Parser" node uses a JSON schema to define the output structure for the Table of Contents.
- Execution: The workflow downloads the PDF, sends it to Chunkr.ai for processing, and then uses the AI model to extract and structure the Table of Contents.
- Output: The generated structured metadata (in JSON format) will be available in subsequent nodes for further processing or storage. The workflow aims to produce output in both English and Chinese.
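To make the structured output step more concrete, here is a small Python sketch that checks a Table of Contents payload before it is stored or passed on. The workflow's real JSON schema is not shown in this description, so the shape used here (a `language` field plus an `entries` list of `title`/`level` objects) and the `validate_toc` helper are hypothetical stand-ins.

```python
import json
from typing import Any, List

ALLOWED_LANGUAGES = {"en", "zh"}  # the two output languages named in this description


def validate_toc(raw: str) -> List[str]:
    """Return a list of problems found in a ToC JSON string; empty means it looks valid."""
    try:
        data: Any = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]

    problems = []
    if data.get("language") not in ALLOWED_LANGUAGES:
        problems.append("language must be 'en' or 'zh'")
    entries = data.get("entries")
    if not isinstance(entries, list) or not entries:
        problems.append("entries must be a non-empty list")
        return problems
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entries[{i}] must be an object")
            continue
        title = entry.get("title")
        if not isinstance(title, str) or not title.strip():
            problems.append(f"entries[{i}].title must be a non-empty string")
        level = entry.get("level")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(level, int) or isinstance(level, bool) or level < 1:
            problems.append(f"entries[{i}].level must be an integer >= 1")
    return problems


# Hypothetical payload in the assumed shape, for illustration only.
sample = json.dumps({
    "language": "zh",
    "entries": [{"title": "概述", "level": 1}, {"title": "方法", "level": 2}],
}, ensure_ascii=False)
print(validate_toc(sample))
```

Returning a list of problems instead of raising on the first one makes it easier to report everything wrong with a payload in a single error message.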
Workflow JSON
{
"id": "1664832b-696a-463b-9280-d67fb7ee0a0a",
"name": "Document Processing and Metadata Generation",
"nodes": 8,
"category": "Document Processing",
"status": "active",
"version": "1.0.0"
}
Note: This is a sample preview. The full workflow JSON contains node configurations, credential placeholders, and execution logic.
About the Author
Crypto_Watcher
Web3 Developer
Automated trading bots and blockchain monitoring workflows.