Convert Webpage HTML to Markdown and Extract Links using Firecrawl
This workflow leverages the Firecrawl.dev API to scrape webpages, converting their HTML content into Markdown format and extracting all associated links. It includes batch processing and rate limiting for efficient API usage.
About This Workflow
Overview
This n8n workflow automates the process of transforming raw HTML from web pages into clean Markdown content while also extracting all hyperlinks present on those pages. It uses the Firecrawl.dev API, a service for web scraping and data extraction. The workflow is designed to handle multiple URLs, process them in batches to stay within server memory and API rate limits, and pass the extracted Markdown content and links on to a data store of your choice. This is particularly useful for preparing web content for AI analysis, content management systems, or further processing.
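To make the scraping step concrete, here is a minimal Python sketch of the kind of request an HTTP Request node such as 'Retrieve Page Markdown and Links' sends for each URL, plus how the Markdown and links could be read back out. It assumes Firecrawl's v1 scrape endpoint, a JSON body that lists "markdown" and "links" as output formats, Bearer-token authentication, and a response that nests results under "data". These details reflect our reading of Firecrawl's public API rather than anything stated in this listing, so check them against the current Firecrawl documentation. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: Firecrawl's v1 scrape endpoint; verify against the current Firecrawl docs.
FIRECRAWL_SCRAPE_URL = "https://api.firecrawl.dev/v1/scrape"


def build_scrape_request(page_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST asking Firecrawl for one page's Markdown and links."""
    body = json.dumps({"url": page_url, "formats": ["markdown", "links"]}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {api_key}",  # assumed Bearer scheme for the API key
        "Content-Type": "application/json",
    }
    return urllib.request.Request(FIRECRAWL_SCRAPE_URL, data=body, headers=headers, method="POST")


def parse_scrape_response(payload: dict) -> tuple[str, list[str]]:
    """Return (markdown, links) from a decoded response body, with empty defaults."""
    # Assumed shape: {"success": true, "data": {"markdown": "...", "links": [...]}}
    data = payload.get("data", {})
    return data.get("markdown", ""), data.get("links", [])


def scrape(page_url: str, api_key: str) -> tuple[str, list[str]]:
    """Send the request and parse the JSON reply. This performs a real network call."""
    with urllib.request.urlopen(build_scrape_request(page_url, api_key)) as resp:
        return parse_scrape_response(json.load(resp))
```

Splitting request building and response parsing from the network call keeps both halves testable offline; only scrape() needs a live API key.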
Key Features
- Scrapes web pages using the Firecrawl.dev API.
- Converts HTML content to Markdown format.
- Extracts all links from the scraped web pages.
- Processes URLs in batches to manage resource limits.
- Includes a wait node to respect API rate limits (e.g., 10 requests per minute).
- Allows customization of input data source (e.g., a database or array of URLs).
- Provides options to output the processed data to a custom data source.
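The batching and rate-limiting features can be illustrated outside n8n as a loop that takes URLs a fixed number at a time and pauses between batches. This is a hypothetical Python sketch of the idea behind the '10 items at a time' and 'Wait' nodes, not the n8n implementation. The names batches and process_in_batches are invented for illustration, and the sleep function is injectable so the pacing can be checked without actually waiting.

```python
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def batches(items: list[T], size: int) -> Iterator[list[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_in_batches(
    urls: list[str],
    handle: Callable[[str], T],
    size: int = 10,
    wait_seconds: float = 45.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[T]:
    """Apply `handle` to every URL, pausing `wait_seconds` between batches."""
    results: list[T] = []
    chunks = list(batches(urls, size))
    for index, chunk in enumerate(chunks):
        results.extend(handle(url) for url in chunk)
        # Pause between batches (playing the role of the Wait node), but not after the last one.
        if index < len(chunks) - 1:
            sleep(wait_seconds)
    return results
```

In practice, handle would be whatever function performs a single scrape. Skipping the pause after the final batch is a choice made for this sketch and may differ from how the Wait node is wired in the actual workflow.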
How To Use
- Obtain an API key from Firecrawl.dev.
- Configure the 'Retrieve Page Markdown and Links' HTTP Request node: update the Authorization header with your Firecrawl API key.
- Define your input URLs: either connect your own data source to the 'Get urls from own data source' node or update the Page array in the 'Example fields from data source' node.
- Adjust the batch size in the '10 items at a time' node if needed, considering server memory and API limits.
- Configure the 'Wait' node to align with Firecrawl's API rate limits (e.g., 45 seconds for 10 requests per minute).
- Connect the 'Markdown data and Links' node to your desired output destination, such as an Airtable or another database node.
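The example Wait value can be sanity-checked with simple arithmetic. Under a limit of R requests per minute, batches of B requests should start at least 60 * B / R seconds apart, so 10-item batches at 10 requests per minute need 60 seconds from one batch start to the next. One reading of the 45-second example is that the batch's own requests take roughly the remaining 15 seconds; that split is our assumption, not something the template states. The Python helpers below are hypothetical and use a simple fixed-window estimate, so treat their output as a starting point and leave headroom for your plan's actual limits.

```python
def min_batch_interval(batch_size: int, requests_per_minute: int) -> float:
    """Shortest gap in seconds between batch starts that keeps under the per-minute limit."""
    return 60.0 * batch_size / requests_per_minute


def wait_needed(batch_size: int, requests_per_minute: int, batch_duration: float) -> float:
    """Extra wait after a batch that took `batch_duration` seconds (never negative)."""
    # Fixed-window approximation: pad each batch out to the minimum interval.
    return max(0.0, min_batch_interval(batch_size, requests_per_minute) - batch_duration)
```

For example, wait_needed(10, 10, 15) gives 45.0 seconds, while a batch that already takes longer than a minute needs no extra wait.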
Workflow JSON
{
"id": "55152ecc-ba97-4170-ac30-e0bade6a6176",
"name": "Convert Webpage HTML to Markdown and Extract Links using Firecrawl",
"nodes": 0,
"category": "PDF and Document Processing",
"status": "active",
"version": "1.0.0"
}
Note: This is a sample preview. The full workflow JSON contains node configurations, credential placeholders, and execution logic.
About the Author
N8N_Community_Pick
Curator
Hand-picked, high-quality workflows from the global community.
Related Workflows
Discover more workflows you might like
Automated Audio Transcription and Summarization from Google Drive to Notion
Automatically transcribe audio files from Google Drive using OpenAI Whisper, then summarize and send structured data to Notion.
Chat with Documents Using LangChain and Pinecone
Ingest documents from Google Drive, vectorize them with OpenAI, store in Pinecone, and enable chat interactions with LangChain nodes. This workflow automates the process of creating a searchable knowledge base.
Automated PII Removal from CSV Files on Google Drive using OpenAI
This workflow automatically detects new CSV files in a Google Drive folder, uses OpenAI to identify and remove Personally Identifiable Information (PII) columns, and uploads the cleaned file back to Google Drive. It leverages Google Drive Trigger, Google Drive, OpenAI, and code nodes for robust data sanitization.