Vision-Based AI Agent for Smart E-commerce Data Extraction
This workflow leverages Google Gemini's vision AI and ScrapingBee to intelligently scrape structured product data from e-commerce websites. It streamlines the process of collecting product titles, prices, brands, and promotion details directly from webpage screenshots, storing results in Google Sheets for easy analysis.
About This Workflow
This n8n workflow combines a vision-based AI agent with a conventional scraping tool. ScrapingBee captures a full-page screenshot of each target URL, and the workflow passes that screenshot to Google's Gemini 1.5 Pro model. Gemini reads the screenshot much as a human would and extracts specific e-commerce product details such as titles, prices, brands, and promotional information. A dedicated output parser structures the extracted data, and Google Sheets serves as both the source of URLs and the destination for the results. Because extraction works primarily from the rendered page rather than its markup, the approach copes with dynamic or complex websites, which makes it suited to competitive analysis, market research, and automated catalog building.
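Inside n8n, the "Google Gemini Chat Model" node handles the call to Gemini, so no code is needed there. For readers who want to see what pairing a prompt with a screenshot could look like outside n8n, here is a minimal sketch that only builds a request body and sends nothing. The body shape is assumed to follow the Gemini REST `generateContent` format and should be checked against the current API documentation; the function name and the prompt wording are hypothetical.

```python
import base64


def build_gemini_request(screenshot_png: bytes, fields: list[str]) -> dict:
    """Build a request body that pairs an extraction prompt with a screenshot.

    Assumption: the body follows the Gemini REST generateContent shape
    (a list of "contents", each with "parts"). Nothing is sent here.
    """
    # Hypothetical prompt; the workflow's real prompt lives in its agent node.
    prompt = (
        "Extract the following fields for every product visible in the "
        "screenshot and reply with JSON only: " + ", ".join(fields)
    )
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {
                        "inline_data": {
                            "mime_type": "image/png",
                            # Binary image data is sent as base64 text.
                            "data": base64.b64encode(screenshot_png).decode("ascii"),
                        }
                    },
                ]
            }
        ]
    }
```

The screenshot travels as base64 text next to the prompt, which is why the model can answer from what is visually on the page rather than from the page's HTML.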
Key Features
- Vision-Based AI Extraction: Harnesses Google Gemini's advanced vision capabilities to "see" and interpret web content from screenshots, enabling robust data extraction from visually rich or dynamic pages.
- Structured Data Output: Automatically parses AI-extracted information into a clean, predefined JSON schema, ensuring consistent and ready-to-use data for analysis.
- Scalable URL Management: Reads a list of target URLs directly from Google Sheets, allowing for easy management and scaling of scraping tasks.
- Full-Page Screenshot Capture: Integrates with ScrapingBee to reliably capture full-page screenshots, providing comprehensive visual context for the AI agent.
- E-commerce Ready: Pre-configured for common e-commerce data points like product title, price, brand, and promotions, making it perfect for competitive intelligence or product catalog aggregation.
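To make the screenshot step concrete, the sketch below builds the kind of URL a ScrapingBee screenshot request might use. It assumes the public ScrapingBee HTTP endpoint and its `screenshot_full_page` parameter; the function name is hypothetical, and the exact parameters set in the workflow's "ScrapingBee - Get page screenshot" node may differ.

```python
from urllib.parse import urlencode

# Assumed ScrapingBee HTTP API endpoint; confirm against your account's docs.
SCRAPINGBEE_ENDPOINT = "https://app.scrapingbee.com/api/v1/"


def screenshot_request_url(api_key: str, target_url: str) -> str:
    """Return a GET URL asking ScrapingBee for a full-page screenshot."""
    params = {
        "api_key": api_key,              # the key configured in the node
        "url": target_url,               # page to capture, URL-encoded below
        "screenshot_full_page": "true",  # capture the whole page, not just the viewport
    }
    return f"{SCRAPINGBEE_ENDPOINT}?{urlencode(params)}"
```

A call such as `screenshot_request_url("YOUR_KEY", "https://example.com/product/1")` yields a URL whose response body would be the image bytes, which the workflow then hands to the AI agent.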
How To Use
- Set Up Trigger: The workflow defaults to a manual trigger. You can replace this with any n8n trigger (e.g., cron, webhook) to automate scheduling.
- Configure Google Sheets:
- Provide your Google Sheets credentials.
- Specify the Google Sheet containing your 'List of URLs' (first column should be url).
- Ensure a 'Results' sheet is present to receive the extracted data (adjust columns as needed). An example Google Sheet is provided in the workflow notes.
- ScrapingBee API Key:
- Obtain an API key from ScrapingBee.
- Update the api_key parameter in both the "ScrapingBee - Get page HTML" (if used) and "ScrapingBee - Get page screenshot" nodes.
- Google Gemini Credentials:
- Provide your Google Gemini (PaLM) API credentials to the "Google Gemini Chat Model" node.
- Adjust Structured Output:
- Review and modify the jsonSchemaExample in the "Structured Output Parser" node to match the exact data fields you want the AI to extract (e.g., product_title, product_price). Remember to align this with your Google Sheet 'Results' columns.
- Test and Activate: Run a test to ensure data flows correctly, then activate your workflow to begin automated scraping.
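The advice to keep the parser's fields aligned with the 'Results' columns can be illustrated with a small check. This is a sketch under assumptions, not the workflow's own logic: `product_title` and `product_price` come from the step above, while `product_brand`, `promo`, and the function name are hypothetical.

```python
import json

# Column order of the 'Results' sheet. product_title and product_price are
# named in the instructions; product_brand and promo are hypothetical.
RESULT_COLUMNS = ["product_title", "product_price", "product_brand", "promo"]


def to_sheet_rows(model_output: str, columns: list[str] = RESULT_COLUMNS) -> list[list[str]]:
    """Parse the model's JSON reply and flatten it into one row per product."""
    products = json.loads(model_output)
    if isinstance(products, dict):
        # Accept a single product object as well as a list of products.
        products = [products]
    rows = []
    for product in products:
        missing = [c for c in columns if c not in product]
        if missing:
            # Fail loudly when the schema and the sheet columns have drifted apart.
            raise ValueError(f"missing fields: {missing}")
        rows.append([str(product[c]) for c in columns])
    return rows
```

If a field is renamed in the schema but not in the sheet, a check like this surfaces the mismatch during the test run instead of silently writing blank cells.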
Workflow JSON
{
"id": "3c0d2263-2dac-4963-b33b-d8993ce75b32",
"name": "Vision-Based AI Agent for Smart E-commerce Data Extraction",
"nodes": 24,
"category": "Operations",
"status": "active",
"version": "1.0.0"
}
Note: This is a sample preview. The full workflow JSON contains node configurations, credentials placeholders, and execution logic.
About the Author
DevOps_Master_X
Infrastructure Expert
Specializing in CI/CD pipelines, Docker, and Kubernetes automations.