Web Scraper and Data Extractor for Products
Scrapes product data from web pages and saves it to Google Sheets.
About This Workflow
Overview
This workflow automates the process of extracting product information from specified URLs. It utilizes a web scraping service to fetch raw HTML content, cleans it to isolate relevant product details, and then employs an AI model to extract structured data such as name, description, rating, reviews, and price. Finally, the extracted data is appended to a Google Sheet for further analysis and use.
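For orientation, the structured record produced by the extraction step looks roughly like the TypeScript interface below. This is a sketch based on the field list in the Overview, not the workflow's actual Structured Output Parser schema; the exact field names, types, and optionality are assumptions.

```typescript
// Approximate shape of one extracted product record (assumed fields).
// The real schema lives in the Structured Output Parser node and may differ.
interface ProductRecord {
  name: string;
  description: string;
  rating: number | null;   // e.g. 4.5; null when the page shows no rating
  reviews: number | null;  // review count, if present on the page
  price: string | null;    // kept as a string to preserve currency symbols
  url: string;             // the source URL that was scraped (assumed column)
}
```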
Key Features
- Fetches web page content using a web scraping API.
- Cleans raw HTML by removing unnecessary tags, scripts, styles, and comments (a minimal sketch follows this list).
- Extracts structured product data (name, description, rating, reviews, price) using an AI model.
- Appends extracted data to a Google Sheet.
- Configurable to scrape multiple URLs from a Google Sheet.
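The HTML clean-up feature above can be approximated with a small snippet like the one below, for example inside an n8n Code node. It is a minimal, regex-based sketch rather than the workflow's exact implementation; it is good enough for trimming a page down before handing it to the AI model, but it is not a full HTML parser.

```typescript
// Minimal HTML clean-up sketch: strip scripts, styles, comments, and tags,
// then collapse whitespace so only readable text reaches the AI model.
// Illustrative approximation only, not the workflow's exact logic.
function cleanHtml(rawHtml: string): string {
  return rawHtml
    .replace(/<script[\s\S]*?<\/script>/gi, ' ')  // drop inline scripts
    .replace(/<style[\s\S]*?<\/style>/gi, ' ')    // drop inline styles
    .replace(/<!--[\s\S]*?-->/g, ' ')             // drop HTML comments
    .replace(/<[^>]+>/g, ' ')                     // drop remaining tags
    .replace(/\s+/g, ' ')                         // collapse whitespace
    .trim();
}
```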
How To Use
- Import this workflow into your n8n instance.
- Configure the following environment variables:
- API_BASE_URL: The base URL for your web scraping service.
- WEBHOOK_URL: A URL for a webhook, potentially for notifications.
- Configure the following Google Sheets credentials:
- Create a Google Cloud Project and enable the Google Sheets API.
- Create OAuth 2.0 Client IDs and download the client secret JSON file.
- In n8n, create a new Google Sheets OAuth2 API credential and upload the client secret JSON file.
- Update the Google Sheets nodes (get urls to scrape and add results) with your specific values:
- WEB_SHEET_ID: The ID of your Google Sheet.
- TRACK_SHEET_GID: The GID of the sheet containing the URLs to scrape.
- RESULTS_SHEET_GID: The GID of the sheet where results will be appended.
- Ensure the Google Sheets nodes have the correct documentId and sheetName for both input and output.
- Configure the OpenRouter Chat Model and Structured Output Parser nodes if you need to use a different AI model or schema.
- Map the relevant input URL to the url parameter in the scrap url node (e.g., ={{ $json.url }}).
- Ensure the BRIGHTDATA_TOKEN environment variable is set for the Authorization header in the scrap url node (see the sketch after this list).
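As a rough picture of what the scrap url node does: it sends the row's URL to the scraping service at API_BASE_URL and authenticates with BRIGHTDATA_TOKEN in the Authorization header. The sketch below expresses that call in TypeScript; the endpoint path, query-parameter name, and Bearer scheme are assumptions, so verify them against the HTTP Request node in the imported workflow.

```typescript
// Hedged sketch of the "scrap url" HTTP call. Endpoint path, query parameter
// name, and the Bearer auth scheme are assumptions; check the HTTP Request
// node in the imported workflow for the real values.
async function scrapeUrl(targetUrl: string): Promise<string> {
  const base = process.env.API_BASE_URL;        // scraping service base URL
  const token = process.env.BRIGHTDATA_TOKEN;   // used in the Authorization header
  const endpoint = `${base}?url=${encodeURIComponent(targetUrl)}`; // hypothetical shape

  const response = await fetch(endpoint, {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!response.ok) {
    throw new Error(`Scrape failed: ${response.status} ${response.statusText}`);
  }
  return response.text(); // raw HTML, handed to the cleaning step
}
```

In the workflow itself this corresponds to the scrap url node, where the row's URL is mapped with ={{ $json.url }} and the token is injected into the node's Authorization header.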
Apps Used
Google Sheets, OpenRouter, Bright Data
Workflow JSON
{
"id": "0b21949b-8aa5-4f2d-aa3e-e5655694e9d5",
"name": "Web Scraper and Data Extractor for Products",
"nodes": 0,
"category": "Web Scraping",
"status": "active",
"version": "1.0.0"
}
Note: This is a sample preview. The full workflow JSON contains node configurations, credentials placeholders, and execution logic.
About the Author
Crypto_Watcher
Web3 Developer
Automated trading bots and blockchain monitoring workflows.
Related Workflows
Discover more workflows you might like
Scrappey Web Scraper
Scrapes websites using Scrappey's API to bypass anti-bot measures.
LinkedIn Web Scraping with Bright Data and Google Gemini
Scrape LinkedIn person and company profiles using Bright Data MCP and generate stories with Google Gemini.
Selenium Ultimate Scraper Workflow
A comprehensive workflow to scrape websites using Selenium and process the extracted data.
Selenium Ultimate Scraper Workflow
A comprehensive workflow for scraping web content using Selenium, including advanced features like cookie handling and driver cleanup.
Web Scraping and Content Processing
This workflow scrapes a webpage, processes its content, and prepares it for further use.
Community Contributed Web Scraper (Unverified)
Scrapes web page content and returns it in Markdown format.