Vision-Based AI Agent Scraper with Google Sheets, ScrapingBee, and Gemini
Scrape structured data from web pages using a vision-based AI agent, leveraging screenshots for primary extraction and falling back to HTML when necessary. Integrates with Google Sheets for data management and Gemini-1.5-Pro for advanced visual analysis.
About This Workflow
This workflow extracts structured data from web pages with a vision-based AI agent. The agent works first from a full-page screenshot, so it can read the page as it is rendered. When the screenshot alone is insufficient or ambiguous, the agent fetches the page's HTML and parses that as a fallback. Using both sources makes extraction more robust than relying on either one alone.
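The screenshot-first, HTML-fallback order can be sketched as a small control-flow function. This is only an illustration: in the workflow the agent itself decides when to call the HTML tool, whereas here that decision is reduced to "the screenshot step returned nothing". The function name and the two callables are hypothetical stand-ins for the screenshot and HTML extraction steps.

```python
from typing import Callable, Optional


def scrape_with_fallback(
    url: str,
    extract_from_screenshot: Callable[[str], Optional[list]],
    extract_from_html: Callable[[str], Optional[list]],
) -> dict:
    """Try the screenshot first; use the HTML path only if nothing usable came back."""
    items = extract_from_screenshot(url)
    if items:
        return {"url": url, "source": "screenshot", "items": items}

    # Fallback: the screenshot gave no usable items (None or an empty list).
    items = extract_from_html(url) or []
    return {"url": url, "source": "html", "items": items}
```

Recording which source produced each result makes it easy to see later how often the fallback was needed.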
The workflow relies on four components:
- Google Sheets: For managing the list of URLs to be scraped and for storing the extracted structured data in a 'Results' sheet.
- ScrapingBee: To capture high-resolution, full-page screenshots of target URLs and to retrieve the raw HTML content when needed.
- Gemini-1.5-Pro: The AI model chosen for its superior capabilities in visual understanding and analysis, enabling effective data extraction from screenshots.
- Structured Output Parser: To format the AI-generated data into a predefined JSON structure, making it readily usable.
This template is particularly useful for e-commerce product scraping but can be adapted for various data extraction needs.
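To make the two ScrapingBee calls concrete, the sketch below only builds the request URLs and sends nothing over the network. It assumes ScrapingBee's HTML API endpoint and the `api_key`, `url`, and `screenshot_full_page` query parameters; check these against the current ScrapingBee documentation before relying on them. The helper name is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint of ScrapingBee's HTML API.
SCRAPINGBEE_ENDPOINT = "https://app.scrapingbee.com/api/v1/"


def build_request_url(api_key: str, target_url: str, screenshot: bool) -> str:
    """Build the GET URL for either a full-page screenshot or the raw HTML."""
    params = {"api_key": api_key, "url": target_url}
    if screenshot:
        # Assumed parameter: ask for an image of the whole page instead of HTML.
        params["screenshot_full_page"] = "true"
    # urlencode percent-encodes the target URL so its own query string survives.
    return SCRAPINGBEE_ENDPOINT + "?" + urlencode(params)


screenshot_url = build_request_url("<your_scrapingbee_apikey>", "https://example.com/product", True)
html_url = build_request_url("<your_scrapingbee_apikey>", "https://example.com/product", False)
```

The only difference between the two requests is the screenshot flag, which is why the workflow can use two near-identical HTTP nodes.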
Key Features
- Vision-Based Data Extraction: Primarily uses screenshots for data extraction, powered by Gemini-1.5-Pro.
- Fallback HTML Scraping: Automatically retrieves and parses HTML content when screenshot analysis is insufficient.
- Google Sheets Integration: Manages input URLs and outputs structured results.
- ScrapingBee Utilization: Captures screenshots and fetches HTML.
- Structured Output: Organizes extracted data into a predefined JSON format.
- Token Efficiency: Converts HTML to Markdown for AI processing to reduce token usage.
- Configurable JSON Schema: Allows customization of the output structure via the 'Structured Output Parser' node.
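The token-efficiency point can be illustrated with a minimal HTML-to-Markdown reduction. This is a sketch built on Python's standard `html.parser`, not the conversion the workflow's own node performs: it drops scripts and styles, keeps visible text, and marks headings and list items. The class and function names are hypothetical.

```python
import re
from html.parser import HTMLParser


class MarkdownSketch(HTMLParser):
    SKIP = {"script", "style", "noscript"}      # content never shown to the reader
    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
    BLOCKS = {"p", "div", "br", "tr"}           # tags that start a new line

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.HEADINGS:
            self.parts.append("\n" + self.HEADINGS[tag])
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag in self.BLOCKS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            # Collapse runs of whitespace but keep word boundaries between inline tags.
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html: str) -> str:
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```

Attributes, scripts, and styles usually account for a large share of a product page's markup, so even a crude reduction like this shrinks the text passed to the model considerably.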
How To Use
- Setup Credentials: Ensure your Google Sheets API and Google Gemini API credentials are correctly configured in n8n.
- Configure Google Sheets:
- Create a Google Sheet with two sheets: one for a list of URLs (e.g., 'List of URLs to scrape') and another for the results (e.g., 'Results').
- Update the 'Google Sheets - Get list of URLs' node with your Google Sheet's document ID and the correct sheet name for URLs.
- Update the 'Google Sheets - Create Rows' node with your Google Sheet's document ID and the correct sheet name for results.
- Configure ScrapingBee API Key: Replace <your_scrapingbee_apikey> in the 'ScrapingBee- Get page HTML' and 'ScrapingBee - Get page screenshot' nodes with your actual ScrapingBee API key.
- Define Output Schema: Adjust the jsonSchemaExample in the 'Structured Output Parser' node to match the specific data fields you wish to extract (e.g., product_title, product_price, product_brand, promo, promo_percentage). The current schema is tailored for e-commerce.
- Customize AI System Prompt: Modify the systemMessage in the 'Vision-based Scraping Agent' node to precisely instruct the AI on what data to extract and how to process it (e.g., desired product attributes, specific promotional details).
- Populate URLs: Add the URLs you want to scrape into the designated Google Sheet.
- Run the Workflow: Trigger the workflow manually using the 'Test workflow' button. The extracted data will be written to the 'Results' sheet in your Google Sheet.
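As an illustration of the output schema step, the sketch below shows one possible example record using the field names listed above, plus a helper that maps model output onto those columns before it is written to the 'Results' sheet. The example values, the field types, and the helper name are assumptions; the template's actual jsonSchemaExample may differ.

```python
import json

# Hypothetical example record; values and types are illustrative only.
EXAMPLE_OUTPUT = [{
    "product_title": "Example running shoe",
    "product_price": 49.99,
    "product_brand": "ExampleBrand",
    "promo": True,
    "promo_percentage": 20,
}]

FIELDS = ["product_title", "product_price", "product_brand", "promo", "promo_percentage"]


def to_rows(model_output: str) -> list:
    """Parse the model's JSON text and keep only the schema fields.

    Missing fields become None so every row has the same columns.
    """
    items = json.loads(model_output)
    if isinstance(items, dict):
        # Accept a single object as well as a list of objects.
        items = [items]
    return [{field: item.get(field) for field in FIELDS} for item in items]


rows = to_rows(json.dumps(EXAMPLE_OUTPUT))
```

Keeping the column list in one place means a change to the schema only needs a matching change to the sheet's header row.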
Workflow JSON
{
"id": "08fe9359-1cb8-49df-be05-99fdf68dac11",
"name": "Vision-Based AI Agent Scraper with Google Sheets, ScrapingBee, and Gemini",
"nodes": 20,
"category": "Marketing",
"status": "active",
"version": "1.0.0"
}
Note: This is a sample preview. The full workflow JSON contains node configurations, credentials placeholders, and execution logic.
About the Author
Free n8n Workflows Official
System Admin
The official repository for verified enterprise-grade workflows.
Related Workflows
Discover more workflows you might like
Automated Multi-Platform Social Media Publisher
Streamline your social media content creation and publishing with this n8n workflow. Simply fill out a web form with your caption, media (image or video), and target platforms, and let n8n automate the posting process across multiple social networks.
WhatsApp AI Assistant: LLaMA 4 & Google Search for Real-Time Insights
Instantly deploy a smart AI assistant on WhatsApp, powered by Groq's lightning-fast LLaMA 4 model. This workflow enables real-time conversations, remembers context, and provides up-to-date answers by integrating live Google Search results.
AI-Powered On-Page SEO Audit & Report Automation
Instantly generate comprehensive on-page SEO technical and content audits for any website URL. This AI-powered workflow automates the entire process, from scraping the page to delivering a detailed report directly to your inbox, empowering you to optimize for better search rankings and user engagement.