Generate AI-Ready LLM Files from Screaming Frog Data
Automate the creation of `llms.txt` files for Large Language Models from your Screaming Frog website crawl data. This workflow extracts and filters key page information, preparing it for AI consumption.
About This Workflow
This n8n workflow transforms Screaming Frog website crawl data into a format readily usable by Large Language Models (LLMs). It begins by accepting a Screaming Frog export (preferably `internal_html.csv` or `internal_all.csv`) via a form, along with basic website details. The workflow then extracts the essential data points for each page: URL, title, meta description, status code, indexability, content type, and word count. Filtering then retains only indexable HTML content, which improves the quality of your `llms.txt` file. The prepared data can then be used when training or fine-tuning AI models, helping them understand and generate content based on your website's structure and topical relevance.
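The extraction step described above can be sketched outside n8n as follows. This is a minimal illustration, not the workflow's actual node logic: the function name `extract_pages` and the `FIELDS` mapping are hypothetical, and the column headers assume an English-language Screaming Frog export.

```python
import csv
import io

# Column names as they appear in an English-language Screaming Frog export
# (an assumption here; localized exports use different headers).
FIELDS = {
    "url": "Address",
    "title": "Title 1",
    "description": "Meta Description 1",
    "status_code": "Status Code",
    "indexability": "Indexability",
    "content_type": "Content Type",
    "word_count": "Word Count",
}


def extract_pages(csv_text: str) -> list[dict[str, str]]:
    """Return one dict per crawled URL, keeping only the fields in FIELDS."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        # Missing columns or short rows fall back to an empty string.
        {key: (row.get(column) or "").strip() for key, column in FIELDS.items()}
        for row in reader
    ]


sample = (
    "Address,Content Type,Status Code,Indexability,Title 1,Meta Description 1,Word Count\n"
    "https://example.com/,text/html; charset=UTF-8,200,Indexable,Home,Welcome page,350\n"
)
pages = extract_pages(sample)
print(pages[0]["url"], pages[0]["word_count"])
```

All values stay as strings, as they arrive from the CSV; converting `word_count` to an integer would be a natural next step before filtering on it.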
Key Features
- Automated Data Extraction: Effortlessly pull key SEO data (URL, title, description, etc.) from Screaming Frog CSV exports.
- Intelligent Filtering: Automatically filters for indexable, text/html content, ensuring high-quality data for LLMs.
- Multi-language Support: Handles different language variations of Screaming Frog column headers.
- Customizable Data Selection: Easily modify filters to include or exclude specific content types, URL paths, or meta descriptions.
- AI-Ready Output: Generates a clean `llms.txt` file perfect for LLM training and analysis.
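One way to handle differing column headers across export languages is an alias lookup that tries each known header in turn. The sketch below is only an illustration of that idea: `HEADER_ALIASES` and `pick` are hypothetical names, the English headers match Screaming Frog's default export, and the non-English entries are illustrative guesses that should be checked against a real localized export.

```python
# Hypothetical alias table: the English names match Screaming Frog's default
# export, while the non-English entries are illustrative guesses only.
HEADER_ALIASES = {
    "url": ["Address", "Adresse", "Dirección", "Indirizzo"],
    "title": ["Title 1", "Titel 1", "Título 1", "Titolo 1"],
}


def pick(row: dict[str, str], field: str) -> str:
    """Return the value under the first alias present in the row, else ''."""
    for header in HEADER_ALIASES[field]:
        value = row.get(header)
        if value:
            return value.strip()
    return ""


italian_row = {"Indirizzo": "https://example.com/", "Titolo 1": "Home"}
print(pick(italian_row, "url"))
```

Keeping the aliases in one table means a new export language only needs new list entries, not new extraction logic.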
How To Use
- Upload Screaming Frog Data: Use the initial form node to upload your Screaming Frog `internal_html.csv` or `internal_all.csv` file. Provide the website name and a short description.
- Data Extraction: The workflow automatically parses the uploaded CSV, extracting crucial fields like URL, Title, and Meta Description.
- Set Useful Fields: Key data points are standardized and prepared for subsequent processing.
- Filter URLs: Configure the filter node to retain only indexable `text/html` pages with a `200` status code. You can add further filters based on word count, URL path, or the presence of meta descriptions.
- Optional AI Classification: The `Text Classifier` node (currently disabled) can be enabled to categorize pages as 'useful_content' or 'other_content' using an AI model, allowing for even finer tuning.
- AI Model Integration (Optional): Connect to an `OpenAI Chat Model` to further process or enrich the data if needed.
- Final Output: The processed data is ready to be saved or exported as an `llms.txt` file for your AI projects. The 'No Operation' node signifies the end of the primary data preparation pipeline.
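The filter and output steps above can be approximated in a few lines. This is a sketch under stated assumptions rather than the workflow's own implementation: `is_useful` and `render_llms_txt` are hypothetical helpers, the page dictionaries use made-up keys, and the output layout (an H1 title, a blockquote summary, then a list of Markdown links) follows the commonly described `llms.txt` proposal, with the `## Pages` section name chosen arbitrarily.

```python
def is_useful(page: dict[str, str]) -> bool:
    """Keep indexable text/html pages that returned HTTP 200."""
    return (
        page["status_code"] == "200"
        and page["indexability"] == "Indexable"
        and "text/html" in page["content_type"]
    )


def render_llms_txt(site_name: str, site_description: str,
                    pages: list[dict[str, str]]) -> str:
    """Build the llms.txt text from the pages that pass the filter."""
    lines = [f"# {site_name}", "", f"> {site_description}", "", "## Pages", ""]
    for page in filter(is_useful, pages):
        entry = f"- [{page['title']}]({page['url']})"
        if page["description"]:
            # The description is optional; append it only when present.
            entry += f": {page['description']}"
        lines.append(entry)
    return "\n".join(lines) + "\n"


crawl = [
    {"url": "https://example.com/", "title": "Home",
     "description": "Welcome page", "status_code": "200",
     "indexability": "Indexable", "content_type": "text/html; charset=UTF-8"},
    {"url": "https://example.com/old", "title": "Old",
     "description": "", "status_code": "301",
     "indexability": "Non-Indexable", "content_type": "text/html; charset=UTF-8"},
]
output = render_llms_txt("Example", "Demo site", crawl)
print(output)
```

The indexability check uses exact equality because "Non-Indexable" contains "Indexable" as a substring. Extra conditions such as a minimum word count or a URL path prefix could be added to `is_useful` in the same way.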
Workflow JSON
{
"id": "8e87f8f3-07bd-44c6-81be-d71a36fd7c78",
"name": "Generate AI-Ready LLM Files from Screaming Frog Data",
"nodes": 24,
"category": "DevOps",
"status": "active",
"version": "1.0.0"
}
Note: This is a sample preview. The full workflow JSON contains node configurations, credentials placeholders, and execution logic.
About the Author
Crypto_Watcher
Web3 Developer
Automated trading bots and blockchain monitoring workflows.