Unlock Web Content & Links with This n8n Automation
Automate the extraction of text content and all internal/external links from any website. This n8n workflow simplifies web data retrieval for research, analysis, or content aggregation.
About This Workflow
This n8n workflow extracts two kinds of data from websites. The text_retrieval_tool captures the entire textual content of a webpage and converts it from HTML to Markdown for easier processing. The url_retrieval_tool scans a website to identify and collect all associated URLs, both internal and external. It handles relative and absolute links, removes duplicates, and filters out invalid entries, so you receive a clean list of all discoverable links. This dual-purpose automation suits market research, competitive analysis, content auditing, or building site maps.
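To make the HTML-to-Markdown step concrete, here is a minimal Python sketch of that kind of conversion. It is not the workflow's own implementation: the class `MarkdownSketch` and the function `html_to_markdown` are hypothetical names, and the sketch only covers headings, paragraphs, list items, and links while dropping script and style content.

```python
import re
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class MarkdownSketch(HTMLParser):
    """Hypothetical, deliberately small HTML-to-Markdown converter."""

    SKIP = {"script", "style"}  # elements whose content is never visible text

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []       # output fragments, joined at the end
        self.skip_depth = 0   # greater than 0 while inside <script> or <style>
        self.href = None      # target of the <a> element currently open

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in HEADINGS:
            # <h2> becomes "## ", <h3> becomes "### ", and so on
            self.parts.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag == "p":
            self.parts.append("\n\n")
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "a":
            self.parts.append(f"]({self.href or ''})")
            self.href = None
        elif tag in HEADINGS or tag in ("p", "ul", "ol"):
            self.parts.append("\n\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Collapse runs of whitespace but keep word boundaries intact
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html: str) -> str:
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    text = re.sub(r"[ \t]*\n[ \t]*", "\n", text)  # trim spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)        # at most one blank line in a row
    return text.strip()
```

Calling `html_to_markdown(page_html)` on a fetched page returns a single Markdown string. Tables, images, and nested lists are out of scope for this sketch.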
Key Features
- Full Text Extraction: Retrieves all textual content from a given website URL.
- HTML to Markdown Conversion: Converts raw HTML content into a more usable Markdown format.
- Comprehensive URL Discovery: Identifies and collects all internal and external links.
- Duplicate and Invalid URL Filtering: Ensures a clean dataset by removing redundancies and non-functional links.
- Relative Link Resolution: Correctly resolves relative URLs into absolute links.
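The URL-handling features above can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions, not the workflow's actual logic: the helper `extract_links` is a hypothetical name, "invalid" is assumed to mean anything that is not an http(s) URL with a host, and fragments are stripped before de-duplication, which the workflow may or may not do.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class _HrefCollector(HTMLParser):
    """Collects the raw href value of every <a> element."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_links(html: str, base_url: str) -> dict:
    """Return de-duplicated absolute links, split into internal and external."""
    collector = _HrefCollector()
    collector.feed(html)

    base_host = urlparse(base_url).netloc.lower()
    seen = set()
    result = {"internal": [], "external": []}

    for href in collector.hrefs:
        # Resolve relative links against the page URL, then drop any #fragment
        # (assumption: two links differing only by fragment count as one).
        absolute, _fragment = urldefrag(urljoin(base_url, href.strip()))
        parsed = urlparse(absolute)

        # Assumption: only http(s) URLs with a host are treated as valid,
        # which filters out mailto:, javascript:, tel:, and similar entries.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue
        if absolute in seen:
            continue
        seen.add(absolute)

        kind = "internal" if parsed.netloc.lower() == base_host else "external"
        result[kind].append(absolute)

    return result
```

Links keep their order of first appearance, so the output is stable across runs on the same page.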
How To Use
- Configure Text Retrieval: In the text_retrieval_tool node, ensure the description clearly defines its purpose. The workflowJson will handle the actual fetching and conversion.
- Configure URL Retrieval: Similarly, set up the url_retrieval_tool node with an accurate description. The included workflowJson manages the extraction, cleaning, and resolution of URLs.
- Trigger the Workflow: Both tools are designed to be triggered manually. When you execute the workflow, provide the full website URL as the query parameter.
- Review Outputs: The results will be available in the output of each respective tool node, providing you with the extracted text (in Markdown) and a list of clean, valid URLs.
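Because both tools expect a full website URL as the query, it can help to check the input before triggering a run. The snippet below is a small Python sketch of such a check. The function name `require_full_url` is hypothetical, and whether the workflow performs any validation of its own is not stated here.

```python
from urllib.parse import urlparse


def require_full_url(query: str) -> str:
    """Return the trimmed query if it is a full http(s) URL, else raise ValueError."""
    candidate = query.strip()
    parsed = urlparse(candidate)
    # A "full" URL is assumed to mean: http or https scheme plus a host.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected a full http(s) URL, got {query!r}")
    return candidate
```

For example, `require_full_url("https://example.com/page")` returns the URL unchanged, while a bare `example.com/page` is rejected because it has no scheme.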
Workflow JSON
{
"id": "c72ae5ee-3f8e-413c-b8c6-28b1d22c5cdb",
"name": "Unlock Web Content & Links with This n8n Automation",
"nodes": 12,
"category": "Operations",
"status": "active",
"version": "1.0.0"
}
Note: This is a sample preview. The full workflow JSON contains node configurations, credential placeholders, and execution logic.
About the Author
N8N_Community_Pick
Curator
Hand-picked high quality workflows from the global community.