Automate Web Scraping with Firecrawl and n8n
Effortlessly extract website content, including titles, descriptions, and links, in Markdown format. This workflow leverages Firecrawl's powerful scraping API with n8n to automate data collection from your specified URLs and save it to your preferred data source.
About This Workflow
This n8n workflow empowers you to automate the process of scraping web pages and extracting valuable content in Markdown format. By integrating with Firecrawl.dev, you can programmatically retrieve not only the main content but also the page's title, description, and any embedded links. The workflow is designed for flexibility, allowing you to connect to your own data source for a list of URLs to scrape. It handles batching for efficient processing, respects API rate limits, and provides clear instructions for authentication and outputting the extracted data to your chosen destination, such as Airtable.
Key Features:
- Automated Web Scraping: Collect data from multiple URLs without manual intervention.
- Rich Content Extraction: Retrieve Markdown content, titles, descriptions, and links.
- Flexible Data Input: Connect to your own data source for a list of URLs.
- Batch Processing: Efficiently handle large numbers of URLs with customizable batch sizes.
- Firecrawl Integration: Utilize the powerful Firecrawl API for robust scraping.
- Configurable Output: Store extracted data in your preferred data destinations.
This workflow is ideal for businesses and individuals looking to streamline content aggregation, competitive analysis, or research projects by automating the tedious task of web scraping.
Key Features
- Comprehensive Data Retrieval: Extracts Markdown content, title, description, and links from web pages.
- Customizable URL Input: Easily define the target URLs by connecting to your own data source.
- Intelligent Batching: Processes URLs in manageable batches to optimize performance and avoid overwhelming the API.
- API Rate Limit Awareness: Includes a 'Wait' node to respect Firecrawl's API limits, ensuring smooth operation.
- Direct Data Output: Facilitates direct integration with your existing data storage solutions (e.g., Airtable) for seamless data management.
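The batching and rate-limit behavior described above can be sketched outside n8n as two small helpers. This is only an illustration, not the workflow's actual node logic: the function names are hypothetical, and the defaults (40 items, batches of 10, 10 requests per minute) are taken from the figures quoted in the How To Use steps.

```python
from typing import Iterator, List, Sequence


def split_into_batches(
    urls: Sequence[str], batch_size: int = 10, max_items: int = 40
) -> Iterator[List[str]]:
    """Yield batches of at most batch_size URLs, taken from the first max_items URLs."""
    if batch_size < 1 or max_items < 1:
        raise ValueError("batch_size and max_items must be positive")
    limited = list(urls[:max_items])  # cap the total, like a limit node
    for start in range(0, len(limited), batch_size):
        yield limited[start:start + batch_size]


def seconds_between_batches(batch_size: int, requests_per_minute: int = 10) -> float:
    """Pause needed after one batch so the average rate stays within the limit."""
    if batch_size < 1 or requests_per_minute < 1:
        raise ValueError("batch_size and requests_per_minute must be positive")
    return 60.0 * batch_size / requests_per_minute
```

Under these assumptions, a batch of 10 URLs against a limit of 10 requests per minute calls for a pause of about a minute before the next batch. Smaller batches shorten the pause proportionally.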
How To Use
- Connect to Your Data Source: Configure the "Connect to your own data source" node to fetch the list of URLs you wish to scrape. Ensure these URLs are provided in a column named Page, with each URL on a new row.
- Define URLs: The "Example fields from data source" node shows how to structure your URL input. If not using a separate data source node, define your URLs as an array within this node.
- Split URLs: The "Split out page URLs" node will separate each individual URL for processing.
- Set Batch Size: Use the "40 items at a time" and "10 at a time" nodes to control how many URLs are processed concurrently. Adjust the maxItems and batchSize according to your server's memory and Firecrawl's API limits (recommendation is 40 items due to server memory limits).
- Retrieve Page Data: The "Retrieve Page Markdown and Links" node makes the API call to Firecrawl. Ensure you update the Header Auth parameter with your actual Firecrawl API key. The jsonBody is dynamically set using the scraped Page URL.
- Format Extracted Data: The "Markdown data and Links" node structures the data returned by Firecrawl, mapping fields like title, description, content, and links.
- Wait for API Limits: The "Wait" node is crucial for respecting Firecrawl's API rate limits (10 requests per minute). It adds a delay to prevent exceeding these limits.
- Output to Data Source: Configure the "Connect to your own data source" node (or a similar output node like Airtable) to store the processed Markdown content, title, description, and links in your desired location.
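To make the "Retrieve Page Data" and "Format Extracted Data" steps more concrete, here is a rough Python sketch of what the request and the field mapping might look like. Several details are assumptions rather than facts from this page: the endpoint URL, the Bearer-token header, the `formats` payload field, and the response layout (`data.markdown`, `data.links`, `data.metadata`) reflect a reading of Firecrawl's public scrape API and should be checked against the current Firecrawl documentation. The function names are hypothetical, and no network call is made.

```python
import json
from typing import Any, Dict

# Assumed endpoint; confirm against the Firecrawl API reference.
FIRECRAWL_SCRAPE_URL = "https://api.firecrawl.dev/v1/scrape"


def build_scrape_request(page_url: str, api_key: str) -> Dict[str, Any]:
    """Describe the HTTP POST for one page (assumed header and body shape)."""
    return {
        "method": "POST",
        "url": FIRECRAWL_SCRAPE_URL,
        "headers": {
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        # Assumed payload: the page to scrape and the output formats wanted.
        "body": json.dumps({"url": page_url, "formats": ["markdown", "links"]}),
    }


def map_scrape_response(response: Dict[str, Any]) -> Dict[str, Any]:
    """Map an assumed Firecrawl response to title, description, content, links."""
    data = response.get("data") or {}
    metadata = data.get("metadata") or {}
    return {
        "title": metadata.get("title", ""),
        "description": metadata.get("description", ""),
        "content": data.get("markdown", ""),
        "links": data.get("links") or [],
    }
```

The dictionary returned by `build_scrape_request` can be passed to any HTTP client. Keeping the mapping in a separate function means missing fields fall back to empty values instead of raising errors partway through a batch.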
Workflow JSON
{
  "id": "98afbdec-f1a5-4771-a86f-33ace08a8188",
  "name": "Automate Web Scraping with Firecrawl and n8n",
  "nodes": 8,
  "category": "Operations",
  "status": "active",
  "version": "1.0.0"
}
Note: This is a sample preview. The full workflow JSON contains node configurations, credentials placeholders, and execution logic.
About the Author
Free n8n Workflows Official
System Admin
The official repository for verified enterprise-grade workflows.