AI-Powered Local Document Ingestion for RAG Knowledge Bases
This workflow automates the ingestion, summarization, and vectorization of local documents. It monitors a specified folder for new files, extracts their content, generates AI summaries, and creates vector embeddings, storing them in Qdrant for Retrieval Augmented Generation (RAG).
About This Workflow
This n8n workflow continuously monitors a designated local folder for new files. When a file is added, it extracts the document content, handles different file types (e.g., converting HTML to Markdown), and uses Ollama to generate a concise summary. It then splits the content into chunks and creates vector embeddings with Ollama's embedding models. Finally, the embeddings are stored in a Qdrant vector database, ready for Retrieval Augmented Generation (RAG) applications and knowledge retrieval over your own data.
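The file-type handling mentioned above can be pictured as a small routing function. The sketch below is an assumption about how such a branch might look, not the workflow's actual logic: it routes by file extension, and the function name route_for and the route labels html_to_markdown and extract_text are hypothetical.

```python
from pathlib import PureWindowsPath

# Assumption: HTML files are recognized by extension only.
HTML_EXTENSIONS = {".html", ".htm"}


def route_for(path: str) -> str:
    """Return a hypothetical route label for a newly added file."""
    # PureWindowsPath accepts both "\\" and "/" separators, so the same
    # function works for C:\... paths and POSIX-style paths.
    suffix = PureWindowsPath(path).suffix.lower()
    if suffix in HTML_EXTENSIONS:
        return "html_to_markdown"
    # Everything else falls through to plain text extraction.
    return "extract_text"
```

Supporting another file type would mean adding another branch here, which mirrors extending the file-type branch in the workflow itself.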
Key Features
- Automated Local File Monitoring: Triggers instantly when new files appear in a specified local folder.
- Intelligent Content Extraction: Extracts text from various document types, including HTML to Markdown conversion.
- AI-Powered Summarization: Utilizes Ollama and a summarization chain to generate concise document summaries.
- Text Chunking & Embedding: Splits documents into overlapping chunks of configurable size and creates vector embeddings using Ollama for efficient retrieval.
- Qdrant Vector Database Integration: Seamlessly stores document embeddings in Qdrant, ready for RAG applications.
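To make the embedding and storage steps concrete, here is a minimal sketch of the request bodies involved if you were to call the services directly instead of through n8n nodes. It assumes Ollama's /api/embeddings endpoint (body fields model and prompt) and Qdrant's PUT /collections/{collection}/points endpoint (body field points). The model name nomic-embed-text, the payload field names, and all function names are illustrative assumptions, and the payload layout is not necessarily what the n8n Qdrant node writes. Nothing is sent over the network; the functions only build the JSON bodies.

```python
import uuid


def embedding_request(chunk: str, model: str = "nomic-embed-text") -> dict:
    """Body for POST /api/embeddings on a local Ollama instance (assumed)."""
    return {"model": model, "prompt": chunk}


def qdrant_point(chunk: str, vector: list[float], source_path: str, index: int) -> dict:
    """One Qdrant point for a chunk; payload keys are hypothetical."""
    # Qdrant point IDs must be unsigned integers or UUIDs. A UUIDv5 derived
    # from the source path and chunk index keeps re-ingestion idempotent.
    point_id = str(uuid.uuid5(uuid.NAMESPACE_URL, f"{source_path}#{index}"))
    return {
        "id": point_id,
        "vector": vector,
        "payload": {"text": chunk, "source": source_path, "chunk_index": index},
    }


def upsert_body(points: list[dict]) -> dict:
    """Body for PUT /collections/<collection>/points, e.g. collection test_rag."""
    return {"points": points}
```

Deriving the point ID from the file path and chunk index is a design choice of this sketch: ingesting the same file twice overwrites the existing points rather than duplicating them.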
How To Use
- Configure Local File Trigger: Specify the path to the local folder you want to monitor (e.g., C:\your_documents). Ensure events is set to add for new files.
- Set Up Ollama Credentials: Provide your Ollama API credentials for both the Embeddings Ollama and Ollama Summarizer nodes to connect to your local Ollama instance.
- Configure Qdrant Vector Store: Enter your Qdrant API credentials and confirm the qdrantCollection name (e.g., test_rag) where embeddings will be stored.
- Adjust Text Processing (Optional): Modify Recursive Character Text Splitter parameters like chunkSize (default 40) and chunkOverlap (default 10) to optimize chunking for your specific document types.
- Review File Type Handling: The workflow includes a Switch node to Get FileType. Extend or modify this node if you need to handle additional file types beyond the default text extraction and HTML conversion.
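The effect of chunkSize and chunkOverlap can be illustrated with a simplified splitter. The sketch below is a fixed-size sliding window measured in characters, not the recursive splitter the workflow uses (which additionally tries to break on separators such as paragraphs and newlines). The function name split_text is hypothetical; the defaults mirror the values listed above.

```python
def split_text(text: str, chunk_size: int = 40, chunk_overlap: int = 10) -> list[str]:
    """Split text into fixed-size character chunks that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    # Each new chunk starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With the defaults, a 100-character document yields three 40-character chunks, each sharing its first 10 characters with the end of the previous chunk. Larger chunk sizes give each embedding more context per vector, at the cost of less precise retrieval.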
Workflow JSON
{
"id": "1e1ed789-094c-4b15-95a2-c8a2ca50a818",
"name": "AI-Powered Local Document Ingestion for RAG Knowledge Bases",
"nodes": 25,
"category": "Operations",
"status": "active",
"version": "1.0.0"
}
Note: This is a sample preview. The full workflow JSON contains node configurations, credentials placeholders, and execution logic.
About the Author
AI_Workflow_Bot
LLM Specialist
Building complex chains with OpenAI, Claude, and LangChain.