AI-Powered News & Blog Content Archiver
This workflow automatically fetches news and blog articles from specified RSS feeds, leverages Firecrawl.dev for AI-powered content scraping to extract only the main content in Markdown, and then archives these clean articles directly into Google Drive. It’s an ideal solution for efficient content curation and intelligence.
About This Workflow
The "The Recap AI - News Scraping Pipeline" is a sophisticated n8n workflow designed for automated content acquisition and intelligent archiving. It periodically triggers to pull the latest articles from multiple RSS feeds, including general news sources and specialized blogs like OpenAI. The core intelligence lies in its integration with Firecrawl.dev, an AI-powered web scraper. This API precisely identifies and extracts only the main body content of articles, stripping out extraneous elements like ads, navigation, and headers, and converts it into clean Markdown format. Finally, these meticulously scraped articles are converted into .md files and seamlessly uploaded to a designated Google Drive folder, creating a well-organized, AI-curated content archive.
Key Features
- Automated Multi-Source Content Fetching: Continuously pulls new articles from various RSS feeds (e.g., Google News, specific blogs) on a configurable schedule.
- AI-Powered Smart Content Scraping: Utilizes Firecrawl.dev to intelligently identify and extract only the main content of web pages, ensuring clean, article-focused data.
- Markdown Conversion & Archiving: Converts scraped article content into standard Markdown files and uploads them directly to Google Drive for easy access and organization.
- Customizable Content Filtering: Excludes specified HTML tags (iframes, nav, header, footer) and focuses on main content, enhanced by AI prompts for precise extraction.
- Robust Error Handling: Includes retry mechanisms for web scraping requests, ensuring workflow resilience against temporary issues.
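On the error-handling point: n8n configures retries per node, so in the exported workflow JSON the scrape_url node would carry retry settings roughly like the trimmed sketch below. The specific numbers are illustrative assumptions, not values read from this workflow.

```json
{
  "name": "scrape_url",
  "type": "n8n-nodes-base.httpRequest",
  "retryOnFail": true,
  "maxTries": 3,
  "waitBetweenTries": 5000
}
```

retryOnFail, maxTries, and waitBetweenTries (milliseconds) are standard node-level options in n8n and correspond to the "Retry On Fail" settings in a node's settings tab.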
How To Use
- Configure RSS Feeds: Update the httpRequest nodes (fetch_google_news_feed, fetch_blog_open_ai_feed, etc.) with your desired rss.app JSON feed URLs. You can add more scheduleTrigger and httpRequest pairs for additional sources.
- Set Up Firecrawl.dev Credentials: For the scrape_url node, create a new "Generic Credential Type" (HTTP Header Auth) in n8n and enter your Firecrawl.dev API key. This is required to authenticate with the scraping API.
- Customize Scraping Parameters: In the scrape_url node's JSON body, adjust excludeTags, onlyMainContent, and especially the jsonOptions.prompt to fine-tune how the AI extracts content (a sample request body is sketched under About This Workflow above).
- Configure Google Drive Credentials: For the upload_markdown node, set up your Google Drive OAuth2 API credentials in n8n.
- Specify Google Drive Destination: In the upload_markdown node, select the Google Drive folderId where your Markdown articles should be archived (see the first sketch after this list).
- Adjust Scheduling: Modify the rule.interval in the scheduleTrigger nodes to define how frequently the workflow runs and fetches new articles (see the second sketch after this list).
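For the Google Drive destination (step 5), the upload_markdown node's parameters reduce to an upload operation, a target folder, and a file name. The sketch below is a heavily trimmed illustration: the folderId placeholder and the filename expression are assumptions, and newer versions of the Google Drive node expose the folder through a picker/resource locator rather than a plain folderId string.

```json
{
  "name": "upload_markdown",
  "type": "n8n-nodes-base.googleDrive",
  "parameters": {
    "operation": "upload",
    "folderId": "YOUR_GOOGLE_DRIVE_FOLDER_ID",
    "name": "={{ $json.title }}.md"
  }
}
```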
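For scheduling (step 6), each scheduleTrigger node carries a rule.interval array. A sketch for an hourly run follows; the node name is hypothetical and the field/interval values are only an example, so set them to whatever cadence suits your feeds.

```json
{
  "name": "fetch_google_news_schedule",
  "type": "n8n-nodes-base.scheduleTrigger",
  "parameters": {
    "rule": {
      "interval": [
        { "field": "hours", "hoursInterval": 1 }
      ]
    }
  }
}
```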
Apps Used
- n8n (workflow automation platform)
- rss.app (JSON feed sources)
- Firecrawl.dev (AI-powered content scraping)
- Google Drive (Markdown archive storage)
Workflow JSON
{
"id": "779eb124-8c60-4ef8-9292-5183341a82a8",
"name": "AI-Powered News & Blog Content Archiver",
"nodes": 26,
"category": "Operations",
"status": "active",
"version": "1.0.0"
}

Note: This is a sample preview. The full workflow JSON contains node configurations, credentials placeholders, and execution logic.
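As a rough guide to what that full export contains, an n8n workflow JSON pairs a nodes array with a connections map that wires one node's output to the next node's input. The skeleton below is heavily trimmed (ids, positions, typeVersion, and parameters omitted); the node names come from this workflow, everything else is illustrative.

```json
{
  "name": "AI-Powered News & Blog Content Archiver",
  "nodes": [
    { "name": "fetch_google_news_feed", "type": "n8n-nodes-base.httpRequest" },
    { "name": "scrape_url", "type": "n8n-nodes-base.httpRequest" },
    { "name": "upload_markdown", "type": "n8n-nodes-base.googleDrive" }
  ],
  "connections": {
    "fetch_google_news_feed": { "main": [[{ "node": "scrape_url", "type": "main", "index": 0 }]] },
    "scrape_url": { "main": [[{ "node": "upload_markdown", "type": "main", "index": 0 }]] }
  }
}
```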
Get This Workflow
ID: 779eb124-8c60...
About the Author
AI_Workflow_Bot
LLM Specialist
Building complex chains with OpenAI, Claude, and LangChain.
Related Workflows
Discover more workflows you might like
Google Sheets to Icypeas: Automated Bulk Domain Scanning
This workflow streamlines the process of performing bulk domain scans by integrating your Google Sheets data directly with the Icypeas platform. Automate the submission of company names from your spreadsheet to Icypeas for comprehensive domain information, saving valuable time and effort.
Instant WooCommerce Order Notifications via Telegram
When a new order is placed on your WooCommerce store, instantly receive detailed notifications directly to your Telegram chat. Stay on top of your e-commerce operations with real-time alerts, including order specifics and a direct link to view the order.
On-Demand Microsoft SQL Query Execution
This workflow allows you to manually trigger and execute any SQL query against your Microsoft SQL Server database. Perfect for ad-hoc data lookups, administrative tasks, or quick tests, giving you direct control over your database operations.