Intelligent Document Processing and Embedding Workflow
Automate the extraction, processing, and embedding of documents from Supabase storage. This workflow ensures your documents are prepared for advanced AI applications by chunking, cleaning, and generating vector embeddings.
About This Workflow
This n8n workflow manages the documents stored in your Supabase project. It begins by fetching all files from your private Supabase storage, then filters out any placeholder files. For each valid document, it downloads the content, extracts the text, and prepares it for AI processing using a default data loader. The text is then split into manageable chunks, with overlap between neighboring chunks to preserve context. Finally, the chunks are embedded using OpenAI's embedding models, producing vector representations ready for search, analysis, or retrieval-augmented generation (RAG) systems. The workflow also creates a record in your Supabase 'files' table to track processed documents.
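The placeholder-filtering step described above can be sketched in a few lines of Python. This is only an illustration, not the workflow's actual node configuration: the function name `filter_placeholders`, the sample listing, and the placeholder file name `.emptyFolderPlaceholder` are assumptions, so check which placeholder names actually appear in your bucket.

```python
# Assumed placeholder name; adjust it to whatever your storage listing returns.
PLACEHOLDER_NAME = ".emptyFolderPlaceholder"


def filter_placeholders(files):
    """Return only the listing entries that are not placeholder files."""
    return [f for f in files if f.get("name") != PLACEHOLDER_NAME]


# Hypothetical storage listing, shaped as a list of dicts with a "name" key.
listing = [
    {"name": "report.pdf", "id": "file-1"},
    {"name": ".emptyFolderPlaceholder", "id": None},
    {"name": "notes.txt", "id": "file-2"},
]

valid = filter_placeholders(listing)
print([f["name"] for f in valid])
```

Only the entries that survive this filter would move on to the download, extraction, and embedding steps.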
Key Features
- Automated Document Ingestion: Seamlessly pull documents from Supabase storage.
- Intelligent Text Extraction: Extracts text content from various file formats.
- Context-Aware Text Splitting: Divides documents into optimal chunks for AI processing.
- Powerful AI Embeddings: Leverages OpenAI for high-quality vector representations.
- Data Synchronization: Updates your Supabase 'files' table with processing status.
How To Use
- Connect Supabase: Configure the 'Supabase account' credential with your Supabase API key and URL.
- Configure OpenAI: Set up your 'OpenAi account' credential for embedding generation.
- Set Supabase Storage URL: Ensure the 'Get All files' and 'Download' nodes use your correct Supabase storage endpoint.
- Define Storage Path: Adjust the prefix in the 'Get All files' node if your files are in a sub-directory.
- Configure Text Splitting: Adjust chunkSize and chunkOverlap in the 'Recursive Character Text Splitter' node based on your document content.
- Map Metadata: In the 'Default Data Loader' node, ensure the file_id metadata correctly maps to the file's ID from Supabase.
- Trigger Workflow: Use the 'When clicking ‘Test workflow’' node for manual testing or integrate with other triggers for automated execution.
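To make the effect of the chunk size and chunk overlap settings concrete, here is a minimal Python sketch of overlapping chunking with a file ID attached to each chunk. It is a simplified fixed-window splitter, not the separator-aware logic of the 'Recursive Character Text Splitter' node, and the names `split_with_overlap` and `build_records`, the record layout, and the default values of 500 and 50 are assumptions chosen for illustration.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


def build_records(text, file_id, chunk_size=500, chunk_overlap=50):
    """Attach the source file's ID to every chunk as metadata (hypothetical layout)."""
    return [
        {"content": chunk, "metadata": {"file_id": file_id, "chunk_index": i}}
        for i, chunk in enumerate(split_with_overlap(text, chunk_size, chunk_overlap))
    ]


records = build_records("abcdefghij", file_id="file-1", chunk_size=4, chunk_overlap=1)
for record in records:
    print(record["metadata"]["chunk_index"], record["content"])
```

A larger overlap keeps more shared context between neighboring chunks at the cost of more chunks to embed, which is the trade-off to weigh when tuning these two settings for your documents.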
Workflow JSON
{
"id": "a5dafbe9-d36d-4953-8e4f-5db56e2f8d17",
"name": "Intelligent Document Processing and Embedding Workflow",
"nodes": 15,
"category": "Operations",
"status": "active",
"version": "1.0.0"
}
Note: This is a sample preview. The full workflow JSON contains node configurations, credentials placeholders, and execution logic.
About the Author
SaaS_Connector
Integration Guru
Connecting CRM, Notion, and Slack to automate your life.