Web Scraper and Data Extractor for Products
Scrapes product data from web pages and saves it to Google Sheets.
About This Workflow
Overview
This workflow automates the process of extracting product information from specified URLs. It utilizes a web scraping service to fetch raw HTML content, cleans it to isolate relevant product details, and then employs an AI model to extract structured data such as name, description, rating, reviews, and price. Finally, the extracted data is appended to a Google Sheet for further analysis and use.
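For orientation, the structured record produced by the extraction step looks roughly like the TypeScript interface below. This is a sketch based on the field list in the Overview, not the workflow's actual Structured Output Parser schema; the exact field names, types, and optionality are assumptions.

```typescript
// Approximate shape of one extracted product record (assumed fields).
// The real schema lives in the Structured Output Parser node and may differ.
interface ProductRecord {
  name: string;
  description: string;
  rating: number | null;   // e.g. 4.5; null when the page shows no rating
  reviews: number | null;  // review count, if present on the page
  price: string | null;    // kept as a string to preserve currency symbols
  url: string;             // the source URL that was scraped (assumed column)
}
```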
Key Features
- Fetches web page content using a web scraping API.
- Cleans raw HTML by removing unnecessary tags, scripts, styles, and comments (a minimal sketch follows this list).
- Extracts structured product data (name, description, rating, reviews, price) using an AI model.
- Appends extracted data to a Google Sheet.
- Configurable to scrape multiple URLs from a Google Sheet.
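The HTML clean-up feature above can be approximated with a small snippet like the one below, for example inside an n8n Code node. It is a minimal, regex-based sketch rather than the workflow's exact implementation; it is good enough for trimming a page down before handing it to the AI model, but it is not a full HTML parser.

```typescript
// Minimal HTML clean-up sketch: strip scripts, styles, comments, and tags,
// then collapse whitespace so only readable text reaches the AI model.
// Illustrative approximation only, not the workflow's exact logic.
function cleanHtml(rawHtml: string): string {
  return rawHtml
    .replace(/<script[\s\S]*?<\/script>/gi, ' ')  // drop inline scripts
    .replace(/<style[\s\S]*?<\/style>/gi, ' ')    // drop inline styles
    .replace(/<!--[\s\S]*?-->/g, ' ')             // drop HTML comments
    .replace(/<[^>]+>/g, ' ')                     // drop remaining tags
    .replace(/\s+/g, ' ')                         // collapse whitespace
    .trim();
}
```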
How To Use
- Import this workflow into your n8n instance.
- Configure the following environment variables:
- API_BASE_URL: The base URL for your web scraping service.
- WEBHOOK_URL: A URL for a webhook, potentially for notifications.
- Configure the following Google Sheets credentials:
- Create a Google Cloud Project and enable the Google Sheets API.
- Create OAuth 2.0 Client IDs and download the client secret JSON file.
- In n8n, create a new Google Sheets OAuth2 API credential and upload the client secret JSON file.
- Update the Google Sheets nodes (get urls to scrape and add results) with your specific values:
- WEB_SHEET_ID: The ID of your Google Sheet.
- TRACK_SHEET_GID: The GID of the sheet containing the URLs to scrape.
- RESULTS_SHEET_GID: The GID of the sheet where results will be appended.
- Ensure the Google Sheets nodes have the correct documentId and sheetName for both input and output.
- Configure the OpenRouter Chat Model and Structured Output Parser nodes if you need to use a different AI model or schema.
- Map the relevant input URL to the url parameter in the scrap url node (e.g., ={{ $json.url }}).
- Ensure the BRIGHTDATA_TOKEN environment variable is set for the Authorization header in the scrap url node (see the sketch after this list).
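As a rough picture of what the scrap url node does: it sends the row's URL to the scraping service at API_BASE_URL and authenticates with BRIGHTDATA_TOKEN in the Authorization header. The sketch below expresses that call in TypeScript; the endpoint path, query-parameter name, and Bearer scheme are assumptions, so verify them against the HTTP Request node in the imported workflow.

```typescript
// Hedged sketch of the "scrap url" HTTP call. Endpoint path, query parameter
// name, and the Bearer auth scheme are assumptions; check the HTTP Request
// node in the imported workflow for the real values.
async function scrapeUrl(targetUrl: string): Promise<string> {
  const base = process.env.API_BASE_URL;        // scraping service base URL
  const token = process.env.BRIGHTDATA_TOKEN;   // used in the Authorization header
  const endpoint = `${base}?url=${encodeURIComponent(targetUrl)}`; // hypothetical shape

  const response = await fetch(endpoint, {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!response.ok) {
    throw new Error(`Scrape failed: ${response.status} ${response.statusText}`);
  }
  return response.text(); // raw HTML, handed to the cleaning step
}
```

In the workflow itself this corresponds to the scrap url node, where the row's URL is mapped with ={{ $json.url }} and the token is injected into the node's Authorization header.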
Apps Used
Google Sheets, OpenRouter, Bright Data
Workflow JSON
{
"id": "0b21949b-8aa5-4f2d-aa3e-e5655694e9d5",
"name": "Web Scraper and Data Extractor for Products",
"nodes": 0,
"category": "Web Scraping",
"status": "active",
"version": "1.0.0"
}
Note: This is a sample preview. The full workflow JSON contains node configurations, credentials placeholders, and execution logic.
About the Author
Crypto_Watcher
Web3 Developer
Automated trading bots and blockchain monitoring workflows.
Related Workflows
Discover more workflows you might like
Scrappey Web Scraper
Scrapes websites using Scrappey's API to bypass anti-bot measures.
LinkedIn Web Scraping with Bright Data and Google Gemini
Scrape LinkedIn person and company profiles using Bright Data MCP and generate stories with Google Gemini.
Selenium Ultimate Scraper Workflow
A comprehensive workflow to scrape websites using Selenium and process the extracted data.
Selenium Ultimate Scraper Workflow
A comprehensive workflow for scraping web content using Selenium, including advanced features like cookie handling and driver cleanup.
Web Scraping and Content Processing
This workflow scrapes a webpage, processes its content, and prepares it for further use.
Community Contributed Web Scraper (Unverified)
Scrapes web page content and returns it in Markdown format.