Extract Website Content and URLs with n8n
Automate the extraction of text content and all URLs from any given website. This workflow utilizes the 'Text' and 'URLs' n8n tools to retrieve and process website data efficiently.
About This Workflow
Overview
This n8n workflow automates the extraction of information from websites. It relies on two primary tools: 'Text', which retrieves the full textual content of a webpage, and 'URLs', which gathers all the hyperlinks present on that page. The workflow accepts a website URL as input, normalizes it so that it has a valid protocol, fetches the content, and converts it to Markdown. It then extracts the page's URLs, resolves them to absolute URLs, and removes duplicates.
This is particularly useful for technical SEO analysis, content scraping for RAG (Retrieval-Augmented Generation) models, or any task requiring bulk data extraction from the web.
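The protocol check mentioned in the overview can be illustrated with a short Python sketch. This is not the workflow's own node code; the function name `ensure_protocol` and the choice of `https` as the default scheme are assumptions made for illustration.

```python
from urllib.parse import urlparse


def ensure_protocol(raw: str, default_scheme: str = "https") -> str:
    """Return the URL with an http(s) scheme, prepending default_scheme if none is given.

    Hypothetical helper: the default scheme is an assumption, not taken from the workflow.
    """
    url = raw.strip()
    if not url:
        raise ValueError("empty URL")
    # No scheme separator at all: treat the input as a bare host/path.
    if "://" not in url:
        url = f"{default_scheme}://{url}"
    parsed = urlparse(url)
    # Reject anything that is not a plain web URL with a host.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"unsupported or malformed URL: {raw!r}")
    return url


print(ensure_protocol("example.com"))
```

Raising an error for unsupported schemes, rather than silently rewriting them, keeps invalid input visible to whoever triggers the workflow.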
Key Features
- Extracts the complete text content from a specified website URL.
- Retrieves all internal and external URLs from a given webpage.
- Cleans and formats extracted URLs, resolving relative links to absolute URLs.
- Removes duplicate URLs to provide a unique list.
- Converts raw HTML content into Markdown format for easier readability and processing.
- Includes error handling for invalid or missing URL protocols.
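The URL-related features above (extraction, absolute resolution, de-duplication) can be sketched in Python with only the standard library. The names `LinkCollector` and `extract_links` are hypothetical, and two behaviours are assumptions rather than facts about the workflow: fragments (`#...`) are dropped before de-duplication, and non-HTTP links such as `mailto:` are skipped.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = " ".join("".join(self._text).split())
            self.links.append((self._href, title))
            self._href = None


def extract_links(html: str, base_url: str) -> list:
    """Return unique absolute http(s) URLs with their link text, in page order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    seen = set()
    result = []
    for href, title in collector.links:
        # Resolve relative links against the page URL, then drop the fragment (assumption).
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if not absolute.startswith(("http://", "https://")):
            continue  # skip mailto:, javascript:, and similar (assumption)
        if absolute in seen:
            continue
        seen.add(absolute)
        result.append({"url": absolute, "title": title})
    return result
```

When two links point to the same URL, this sketch keeps the title of the first one encountered.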
How To Use
- Input Website URL: In the 'Execute workflow' node (manual trigger), provide the full website URL you want to analyze in the 'query' field.
- Run Workflow: Trigger the workflow manually.
- Review Text Output: The 'Text' tool will output the extracted text content of the website in Markdown format.
- Review URL Output: The 'URLs' tool will output a list of all unique, absolute URLs found on the website, along with their associated titles.
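To give a sense of what the Markdown text output might look like, here is a deliberately minimal HTML-to-Markdown sketch in Python. It is not the conversion the workflow performs; the class `MarkdownSketch` and the function `html_to_markdown` are hypothetical, and only headings, paragraphs, list items, and links are handled.

```python
import re
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
SKIPPED = {"script", "style"}


class MarkdownSketch(HTMLParser):
    """Very small HTML-to-Markdown converter (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.out = []
        self._skip_depth = 0
        self._href = None

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self._skip_depth += 1
        elif tag in HEADINGS:
            self.out.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag == "p":
            self.out.append("\n\n")
        elif tag in ("ul", "ol"):
            self.out.append("\n")
        elif tag == "li":
            self.out.append("\n- ")
        elif tag == "a":
            self._href = dict(attrs).get("href")
            if self._href:
                self.out.append("[")

    def handle_endtag(self, tag):
        if tag in SKIPPED and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "a" and self._href:
            self.out.append(f"]({self._href})")
            self._href = None

    def handle_data(self, data):
        # Ignore script/style bodies; collapse runs of whitespace elsewhere.
        if self._skip_depth == 0:
            self.out.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html: str) -> str:
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    text = "".join(parser.out)
    # Never leave more than one blank line between blocks.
    return re.sub(r"\n{3,}", "\n\n", text).strip()
```

Real pages need far more handling (tables, images, nested lists, inline emphasis), so a maintained converter is the better choice outside of a demonstration.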
Workflow JSON
{
"id": "65025d82-efc8-4441-b03e-e4ee5296543a",
"name": "Extract Website Content and URLs with n8n",
"nodes": 0,
"category": "Web Scraping & Data Extraction",
"status": "active",
"version": "1.0.0"
}
Note: This is a sample preview. The full workflow JSON contains node configurations, credentials placeholders, and execution logic.
About the Author
DevOps_Master_X
Infrastructure Expert
Specializing in CI/CD pipelines, Docker, and Kubernetes automations.
Related Workflows
Discover more workflows you might like
AI-Powered Web Scraping and Content Extraction with Firecrawl
Automate web scraping and extract structured content like articles and images using the Firecrawl API via an HTTP Request node. This workflow handles URL input and processes the response for further use.
Automated Web Scraping with Jina AI and Saving to Google Sheets
Scrape data from a website using Jina AI's fetch capabilities and extract specific information using an Information Extractor node. The extracted data is then saved to a Google Sheets document.
Selenium Ultimate Scraper Workflow for Advanced Web Scraping
Automate advanced web scraping tasks with this n8n workflow. It leverages Selenium for browser automation, extracts specific data using HTML selectors, and processes cookies before sending them to an OpenAI Chat Model for analysis or further action.