Read Sitemap and Filter URLs
Reads an XML sitemap, converts it to JSON, and filters the URLs based on specified criteria.
About This Workflow
Overview
This workflow is designed to fetch an XML sitemap from a given URL, parse it into a JSON format, and then filter the resulting URLs based on user-defined conditions. It's particularly useful for extracting specific types of links, such as those pointing to PDF documents.
Key Features
- Fetches sitemap.xml from a specified URL.
- Converts XML sitemap data to a processable JSON format.
- Splits out individual URLs from the sitemap.
- Allows filtering of URLs based on custom conditions (e.g., ending with '.pdf').
- Includes sticky notes for setup guidance and workflow explanation.
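The fetch, convert, and split-out steps listed above can be sketched outside n8n as follows. This is a minimal illustration using only the Python standard library, not the workflow's actual node logic; the function names `fetch_sitemap` and `split_out_urls` are hypothetical, and the sketch assumes a standard `urlset` sitemap in the sitemaps.org 0.9 namespace (not a sitemap index).

```python
from __future__ import annotations

import urllib.request
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap.xml files (sitemaps.org protocol 0.9).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def fetch_sitemap(url: str, timeout: float = 10.0) -> bytes:
    """Download the raw sitemap XML (hypothetical helper, performs a network call)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def split_out_urls(xml_data: str | bytes) -> list[dict]:
    """Convert sitemap XML into a list of JSON-like items, one per <url> entry."""
    root = ET.fromstring(xml_data)
    items = []
    for url_el in root.iter(f"{SITEMAP_NS}url"):
        loc = url_el.findtext(f"{SITEMAP_NS}loc")
        if loc:
            # Each item mirrors one <url> element; only <loc> is kept here.
            items.append({"loc": loc.strip()})
    return items
```

A call such as `split_out_urls(fetch_sitemap("https://example.com/sitemap.xml"))` would then yield one item per URL, where the address shown is a placeholder.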
How To Use
- Set Sitemap URL: Configure the 'Set sitemap URL' node with the URL of the sitemap.xml you wish to process.
- Define Filter: Adjust the 'Filter URLs' node's conditions to match the desired URL pattern (e.g., endsWith: .pdf).
- Execute Workflow: Run the workflow manually or via a trigger.
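As a rough illustration of what a filter condition such as endsWith: .pdf does, the sketch below applies a named string condition to a list of URLs. It is an assumption-laden stand-in rather than the 'Filter URLs' node's implementation: only endsWith appears in the description above, while `startsWith`, `contains`, the `filter_urls` function, and the `ignore_case` flag are hypothetical additions for illustration.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical condition table; only "endsWith" is named in the workflow description.
CONDITIONS: dict[str, Callable[[str, str], bool]] = {
    "endsWith": lambda url, value: url.endswith(value),
    "startsWith": lambda url, value: url.startswith(value),
    "contains": lambda url, value: value in url,
}


def filter_urls(
    urls: list[str],
    operation: str = "endsWith",
    value: str = ".pdf",
    ignore_case: bool = True,
) -> list[str]:
    """Keep only the URLs that satisfy the chosen condition."""
    check = CONDITIONS[operation]  # raises KeyError for an unknown operation
    if ignore_case:
        value = value.lower()
        return [u for u in urls if check(u.lower(), value)]
    return [u for u in urls if check(u, value)]
```

Note that a plain suffix check does not match URLs carrying a query string (for example, a link ending in `.pdf?download=1`), so a `contains` condition or prior URL normalization may suit such sitemaps better.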
Workflow JSON
{
"id": "037a2b08-cfca-498c-86cc-8421699da943",
"name": "Read Sitemap and Filter URLs",
"nodes": 0,
"category": "Data Extraction",
"status": "active",
"version": "1.0.0"
}
Note: This is a sample preview. The full workflow JSON contains node configurations, credentials placeholders, and execution logic.
About the Author
Free n8n Workflows Official
System Admin
The official repository for verified enterprise-grade workflows.