Paul Graham Essay Scraper and Text Extractor
Scrapes the list of Paul Graham's essays, fetches the first few, and extracts their plain text content.
About This Workflow
Overview
This workflow scrapes the titles and links of Paul Graham's essays from a listing page, fetches the content of the first few essays, and extracts the plain text from their HTML.
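As a rough sketch of this flow (not the workflow's actual node configuration), the TypeScript below fetches a listing page, collects essay titles and links, and keeps only the first three. The use of cheerio, the global fetch API, and the bare `a` selector are assumptions made for the illustration; the real workflow performs these steps with n8n nodes.

```typescript
import * as cheerio from "cheerio";

interface EssayLink {
  title: string;
  url: string;
}

// Fetch the listing page and collect essay links.
// The plain "a" selector is an assumption; a real page may need a narrower one.
async function listEssays(listingUrl: string, limit = 3): Promise<EssayLink[]> {
  const html = await (await fetch(listingUrl)).text();
  const $ = cheerio.load(html);

  const links: EssayLink[] = [];
  $("a").each((_, el) => {
    const href = $(el).attr("href");
    const title = $(el).text().trim();
    if (href && title) {
      // Resolve relative links against the listing page URL.
      links.push({ title, url: new URL(href, listingUrl).toString() });
    }
  });

  // Keep only the first few essays, mirroring the workflow's limit step.
  return links.slice(0, limit);
}
```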
Key Features
- Fetches a list of essay titles and their corresponding URLs.
- Limits the number of essays to process (e.g., the first 3).
- Extracts plain text content from the HTML of the essays, excluding elements such as images and navigation (a sketch of this step follows the list).
- Provides a mechanism to prepare essay text for further processing (e.g., embedding, vector storage).
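A minimal sketch of the extraction step referenced in the list above, assuming cheerio is used to drop images, navigation, scripts, and styles before reading the remaining text; the exact selectors and the whitespace normalization are illustrative choices, not the workflow's configuration.

```typescript
import * as cheerio from "cheerio";

// Strip non-content elements and return normalized plain text.
// The selector list is an assumption; adjust it to match the pages being scraped.
function extractPlainText(essayHtml: string): string {
  const $ = cheerio.load(essayHtml);
  $("img, nav, script, style").remove();
  return $("body")
    .text()
    .replace(/\s+/g, " ") // collapse whitespace left behind by removed markup
    .trim();
}
```

The resulting string is what a downstream step would chunk and pass to embedding or vector storage, as noted in the last feature above.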
How To Use
- Import the workflow: Load the provided n8n workflow JSON into your n8n instance.
- Configure Webhook URL: Set the WEBHOOK_URL environment variable to the URL of the page listing Paul Graham's essays.
- Configure Base URL: Set the BASE_URL environment variable to the base URL of the website where the essays are hosted, if different from the listing page (a sketch of how these values might be consumed follows these steps).
- Execute the workflow: Trigger the workflow manually or via an external event.
- Review the output: The extracted essay titles and plain text content will be available in the output of the respective nodes.
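The two variables above are the only configuration the listing mentions. As an illustration only, the snippet below reads them the way a standalone Node.js script would (via process.env) and resolves a relative essay link against BASE_URL; inside n8n they would normally be referenced through expressions or node parameters, and the some-essay.html path is hypothetical.

```typescript
// WEBHOOK_URL: page that lists the essays; BASE_URL: host used to resolve relative essay links.
const listingUrl = process.env.WEBHOOK_URL ?? "";
const baseUrl = process.env.BASE_URL ?? listingUrl;

if (!listingUrl) {
  throw new Error("WEBHOOK_URL is not set");
}

// Resolve a relative link from the listing against the configured base.
// "some-essay.html" is a hypothetical path used only for illustration.
const essayUrl = new URL("some-essay.html", baseUrl).toString();
console.log(essayUrl);
```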
Workflow JSON
{
"id": "e7e243ca-e5e4-4d93-b1c9-74ea5618aaf1",
"name": "Paul Graham Essay Scraper and Text Extractor",
"nodes": 0,
"category": "Web Scraping",
"status": "active",
"version": "1.0.0"
}
Note: This is a sample preview. The full workflow JSON contains node configurations, credentials placeholders, and execution logic.
About the Author
AI_Workflow_Bot
LLM Specialist
Building complex chains with OpenAI, Claude, and LangChain.
Related Workflows
Discover more workflows you might like
- Jina.ai Multipage Website Scraper: Scrape entire websites without an API key using Jina.ai.
- Selenium Ultimate Scraper Workflow: A comprehensive workflow for scraping web content using Selenium, including advanced features like cookie handling and driver cleanup.
- Scrappey Web Scraper: Scrapes websites using Scrappey's API to bypass anti-bot measures.
- Scrape Trustpilot Reviews to Google Sheets: Scrapes reviews from Trustpilot for a specified company and saves them to a Google Sheet.
- Community Webpage Crawler: Crawls a given URL and returns its content in Markdown format.
- Community Contributed News Extraction (Unverified): Extracts news articles from a website without an RSS feed, filters by date, and generates summaries and keywords.