Convert Webpage HTML to Markdown and Extract Links using Firecrawl

Name: Convert Webpage HTML to Markdown and Extract Links using Firecrawl
Rating: 5 (6 reviews)
Author: Free N8N

Community Verified

Beginner


Free N8N Templates

50 views

0 downloads

PDF and Document Processing, automation, content processing, firecrawl, link extraction, markdown, seo, web scraping

This workflow leverages the Firecrawl.dev API to scrape webpages, converting their HTML content into Markdown format and extracting all associated links. It includes batch processing and rate limiting for efficient API usage.


About This Workflow

Overview

This n8n workflow automates the process of transforming raw HTML from web pages into clean Markdown content, while also extracting all hyperlinks present on those pages. It utilizes the Firecrawl.dev API, a powerful tool for web scraping and data extraction. The workflow is designed to handle multiple URLs, process them in batches to manage server memory and API rate limits, and store the extracted Markdown content and links. This is particularly useful for preparing web content for AI analysis, content management systems, or further processing.

Key Features

Scrapes web pages using the Firecrawl.dev API.
Converts HTML content to Markdown format.
Extracts all links from the scraped web pages.
Processes URLs in batches to manage resource limits.
Includes a wait node to respect API rate limits (e.g., 10 requests per minute).
Allows customization of input data source (e.g., a database or array of URLs).
Provides options to output the processed data to a custom data source.
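
The batching and waiting behaviour listed above can be sketched outside n8n as a short Python loop. This is a minimal illustration, not the workflow's actual implementation: the names `batches` and `process_in_batches` are hypothetical, the `scrape` callable stands in for the HTTP Request node, and the defaults (10 items, 45 seconds) are taken from the examples on this page.

```python
import time
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter, batch
        yield batch


def process_in_batches(
    urls: Iterable[str],
    scrape: Callable[[str], dict],
    batch_size: int = 10,        # mirrors the '10 items at a time' node
    wait_seconds: float = 45.0,  # mirrors the 'Wait' node example
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Run `scrape` on every URL, pausing between batches."""
    results: List[dict] = []
    for index, batch in enumerate(batches(urls, batch_size)):
        if index > 0:
            sleep(wait_seconds)  # pause before every batch except the first
        results.extend(scrape(url) for url in batch)
    return results
```

The `sleep` parameter is injectable so the loop can be exercised without real delays. One design difference to note: this sketch skips the pause after the final batch, whereas a Wait node placed inside an n8n loop may run after every batch.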

How To Use

Obtain an API key from Firecrawl.dev.
Configure the 'Retrieve Page Markdown and Links' HTTP Request node: update the Authorization header with your Firecrawl API key.
Define your input URLs: either by connecting your own data source to the 'Get urls from own data source' node or by updating the Page array in the 'Example fields from data source' node.
Adjust the batch size in the '10 items at a time' node if needed, considering server memory and API limits.
Set the delay in the 'Wait' node so that each batch stays within Firecrawl's API rate limit (e.g., a 45-second pause between batches of 10 for a limit of 10 requests per minute).
Connect the 'Markdown data and Links' node to your desired output destination, such as an Airtable or another database node.
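
For readers who want to see roughly what the 'Retrieve Page Markdown and Links' request amounts to, here is a hedged Python sketch. The endpoint URL, the `formats` field, the `Bearer` scheme, and the `data.markdown` / `data.links` response shape are assumptions based on Firecrawl's public scrape API and are not stated on this page; the function names are hypothetical. Check the node configuration in the workflow and Firecrawl's documentation before relying on any of them.

```python
import json
import urllib.request
from typing import Any, Dict

# Assumed endpoint; this page does not give the URL the workflow calls.
FIRECRAWL_SCRAPE_URL = "https://api.firecrawl.dev/v1/scrape"


def build_scrape_request(page_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for Markdown and links."""
    # Assumed request body: the page to scrape and the output formats wanted.
    body = json.dumps({"url": page_url, "formats": ["markdown", "links"]})
    return urllib.request.Request(
        FIRECRAWL_SCRAPE_URL,
        data=body.encode("utf-8"),
        headers={
            # The Authorization header carries the Firecrawl API key.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_markdown_and_links(response: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the Markdown text and link list from a parsed response.

    Assumes both fields sit under a top-level 'data' key; missing fields
    fall back to an empty string and an empty list.
    """
    data = response.get("data") or {}
    return {
        "markdown": data.get("markdown", ""),
        "links": list(data.get("links") or []),
    }
```

To send the request, pass the result of `build_scrape_request` to `urllib.request.urlopen`, parse the body with `json.load`, and hand the resulting dictionary to `extract_markdown_and_links`. The returned record corresponds to what the 'Markdown data and Links' node forwards to your output destination.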

Apps Used

automation

content processing

firecrawl

link extraction

markdown

seo

web scraping

Workflow JSON

{
  "id": "55152ecc-ba97-4170-ac30-e0bade6a6176",
  "name": "Convert Webpage HTML to Markdown and Extract Links using Firecrawl",
  "nodes": 0,
  "category": "PDF and Document Processing",
  "status": "active",
  "version": "1.0.0"
}

Note: This is a sample preview. The full workflow JSON contains node configurations, credentials placeholders, and execution logic.


About the Author

N8N_Community_Pick

Curator

Hand-picked high quality workflows from the global community.

Statistics

Downloads: 0

Rating

5/5 (6 reviews)

Verification Info

Community Verified

This workflow has been verified by the community


Source

awesome-n8n-templates

