Automated Document Processing and Knowledge Base Creation
This workflow automates the ingestion of documents from Supabase storage, extracts text, splits it into manageable chunks, and prepares it for AI embedding. It efficiently builds a searchable knowledge base from your stored files.
About This Workflow
This n8n workflow transforms raw documents stored in Supabase into a structured format suitable for AI-powered applications. It begins by fetching all files from a specified Supabase storage bucket. For each file, it checks whether the file has already been processed, to avoid duplication. If not, the workflow downloads the file, extracts its text content (initially supporting PDF), and loads the text using a default data loader. A recursive text splitter then breaks the text into smaller, contextually relevant chunks, and each chunk is associated with its original file metadata. The chunks are then ready for embedding, which enables efficient searching and retrieval within a knowledge base.
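As a rough illustration of the first step, the sketch below builds (but does not send) the kind of request a file listing relies on. The endpoint path and body fields are assumptions based on the Supabase Storage REST API and should be checked against the current reference. The function name, the project URL, and the bucket name are hypothetical.

```python
from typing import Any, Dict


def build_list_files_request(
    project_url: str, bucket: str, api_key: str, limit: int = 100, offset: int = 0
) -> Dict[str, Any]:
    """Describe an HTTP request that lists objects in a storage bucket.

    Assumption: the listing endpoint is POST /storage/v1/object/list/<bucket>
    with a JSON body containing prefix, limit, offset and sortBy.
    Nothing is sent here; the dict only mirrors what an HTTP node would hold.
    """
    return {
        "method": "POST",
        "url": f"{project_url.rstrip('/')}/storage/v1/object/list/{bucket}",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": {
            "prefix": "",  # empty prefix: list from the bucket root
            "limit": limit,
            "offset": offset,
            "sortBy": {"column": "name", "order": "asc"},
        },
    }


# Hypothetical project URL, bucket and key, for illustration only.
request = build_list_files_request(
    "https://example-project.supabase.co/", "documents", "YOUR_API_KEY"
)
```

The same values would go into the URL, header, and JSON body fields of the HTTP request node.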
Key Features
- Automated File Ingestion: Connects directly to Supabase storage to automatically discover and process files.
- Duplicate Prevention: Includes logic to check for and skip already processed files, ensuring data integrity.
- Content Extraction: Extracts text content from downloaded documents; PDF is the initially supported type.
- Intelligent Text Splitting: Utilizes a recursive character text splitter to break down large documents into optimal chunks for AI processing.
- Metadata Association: Preserves essential file metadata throughout the processing pipeline for accurate context.
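To make the splitting and metadata features more concrete, here is a minimal sketch of a recursive character splitter. It is a simplified stand-in, not the implementation used by the n8n node: it tries coarse separators first, recurses with finer ones for oversized pieces, then merges pieces into chunks with a character overlap. The function names, the default sizes, and the file_name and chunk_index metadata fields are hypothetical; only file_id comes from the workflow description.

```python
from typing import Dict, List, Sequence


def split_recursive(text: str, chunk_size: int, separators: Sequence[str]) -> List[str]:
    """Break text into pieces of at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    parts = text.split(sep)
    pieces: List[str] = []
    for index, part in enumerate(parts):
        # Re-attach the separator so no characters are lost.
        piece = part + (sep if index < len(parts) - 1 else "")
        pieces.extend(split_recursive(piece, chunk_size, finer))
    return pieces


def merge_with_overlap(pieces: List[str], chunk_size: int, chunk_overlap: int) -> List[str]:
    """Greedily join pieces into chunks, carrying a tail of the previous chunk."""
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        if current and len(current) + len(piece) > chunk_size:
            chunks.append(current)
            tail = current[-chunk_overlap:] if chunk_overlap > 0 else ""
            # Drop the overlap when it would push the next chunk over the limit.
            current = tail if len(tail) + len(piece) <= chunk_size else ""
        current += piece
    if current:
        chunks.append(current)
    return chunks


def chunk_file(text: str, file_id: str, file_name: str,
               chunk_size: int = 1000, chunk_overlap: int = 200) -> List[Dict]:
    """Split one file's text and attach the file's metadata to every chunk."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    pieces = split_recursive(text, chunk_size, ["\n\n", "\n", " "])
    chunks = merge_with_overlap(pieces, chunk_size, chunk_overlap)
    return [
        {"text": chunk,
         "metadata": {"file_id": file_id, "file_name": file_name, "chunk_index": i}}
        for i, chunk in enumerate(chunks)
    ]
```

Smaller chunk sizes give more precise retrieval hits but less context per chunk, and a larger overlap reduces the chance that a sentence is cut off at a chunk boundary, at the cost of more stored text.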
How To Use
- Configure Supabase Connection: Ensure your Supabase API credentials are set up correctly in n8n.
- Set 'Get All Files' Node: Configure the 'Get All files' node (httpRequest) to point to your Supabase storage URL and authenticate using your Supabase API key. The JSON body should be set to query your desired bucket.
- Set 'Get All Files' Node (Supabase): Configure the 'Get All Files' node (Supabase) to fetch existing records from your 'files' table to check for already processed items.
- Configure 'Loop Over Items' Node: The 'Loop Over Items' node is set to process one file at a time (batchSize: 1).
- Configure 'If' Node: The 'If' node is crucial for duplicate checking. It checks whether the file's storage_id already exists in the aggregated data from the 'files' table and also checks that the file name is not .emptyFolderPlaceholder.
- Configure 'Download' Node: The 'Download' node (httpRequest) should be configured with the URL to download your files from Supabase storage, using the file name from the previous step.
- Configure 'Extract Document PDF' Node: Set this node to extract text content from the downloaded PDF files.
- Configure 'Default Data Loader' Node: This node takes the extracted text and associates it with metadata, including a file_id derived from the Supabase file ID.
- Configure 'Embeddings OpenAI' Node: Connect your OpenAI API credentials and select an appropriate embedding model (e.g., text-embedding-3-small).
- Configure 'Recursive Character Text Splitter' Node: Adjust chunkSize and chunkOverlap to optimize the splitting of your document text based on your needs.
- Configure 'Create File record2' Node: Set up this node to write the processed file's name and storage_id to your 'files' table in Supabase, marking it as processed. Ensure this node is placed within the 'true' branch of the 'If' node.
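The duplicate check performed by the 'If' node can be sketched in a few lines. This is an illustration of the logic only, not the node's actual expression. It assumes each listed storage object exposes id and name fields and that each row in the 'files' table has a storage_id column; the function name and sample data are hypothetical.

```python
from typing import Dict, Iterable, List

# Placeholder file name that the workflow skips.
PLACEHOLDER_NAME = ".emptyFolderPlaceholder"


def files_to_process(
    storage_files: Iterable[Dict[str, str]],
    processed_records: Iterable[Dict[str, str]],
) -> List[Dict[str, str]]:
    """Return storage files that are neither processed nor placeholders."""
    # Collect the storage IDs already recorded in the 'files' table.
    processed_ids = {record["storage_id"] for record in processed_records}
    return [
        f for f in storage_files
        if f["id"] not in processed_ids and f["name"] != PLACEHOLDER_NAME
    ]


# Hypothetical sample data for illustration.
listed = [
    {"id": "a1", "name": "contract.pdf"},
    {"id": "b2", "name": "handbook.pdf"},
    {"id": "c3", "name": ".emptyFolderPlaceholder"},
]
already_done = [{"storage_id": "a1", "name": "contract.pdf"}]

pending = files_to_process(listed, already_done)
```

Only the files returned here would continue down the branch that downloads, splits, and records the file.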
Workflow JSON
{
  "id": "2941d0ac-66bf-42d6-86dc-e6e587250e4f",
  "name": "Automated Document Processing and Knowledge Base Creation",
  "nodes": 11,
  "category": "DevOps",
  "status": "active",
  "version": "1.0.0"
}
Note: This is a sample preview. The full workflow JSON contains node configurations, credentials placeholders, and execution logic.
About the Author
SaaS_Connector
Integration Guru
Connecting CRM, Notion, and Slack to automate your life.