Paul Graham Essay Scraper and Text Extractor
Scrapes the list of Paul Graham's essays, fetches the first few, and extracts their plain text content.
About This Workflow
Overview
This workflow scrapes the titles and links of Paul Graham's essays from a listing page, fetches the first few essays, and extracts the plain text from their HTML.
Key Features
- Fetches a list of essay titles and their corresponding URLs.
- Limits the number of essays to process (e.g., the first 3).
- Extracts plain text content from the HTML of the essays, excluding specific elements like images and navigation.
- Provides a mechanism to prepare essay text for further processing (e.g., embedding, vector storage).
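The features above can be sketched outside n8n in plain Python. This is a minimal illustration using only the standard library, not the workflow's actual node configuration: the class and function names (`LinkCollector`, `TextCollector`, `extract_links`, `extract_text`), the default limit of 3, and the set of skipped elements are all assumptions chosen for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (title, href) pairs from every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only record text while inside an anchor that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


class TextCollector(HTMLParser):
    """Collects text, skipping the contents of excluded elements."""

    # Assumed exclusion list; <img> has no text content, so it adds nothing.
    SKIP = {"nav", "script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_links(html, limit=3):
    """Return the first `limit` (title, href) pairs found in the page."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links[:limit]


def extract_text(html):
    """Return the page's text with excluded elements removed."""
    parser = TextCollector()
    parser.feed(html)
    return " ".join(parser.parts)
```

The text returned by `extract_text` is what would then be passed on to a later step such as embedding or vector storage.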
How To Use
- Import the workflow: Load the provided n8n workflow JSON into your n8n instance.
- Configure Webhook URL: Set the WEBHOOK_URL environment variable to the URL of the page listing Paul Graham's essays.
- Configure Base URL: Set the BASE_URL environment variable to the base URL of the website where the essays are hosted (if different from the listing page).
- Execute the workflow: Trigger the workflow manually or via an external event.
- Review the output: The extracted essay titles and plain text content will be available in the output of the respective nodes.
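To illustrate how the two environment variables relate, here is a hedged Python sketch. It assumes that BASE_URL falls back to the listing URL when unset and that essay links on the listing page may be relative. The helper names `load_config` and `resolve` and the `example.com` addresses are hypothetical, and the way n8n itself reads these variables is not shown in the preview.

```python
import os
from urllib.parse import urljoin


def load_config(env=os.environ):
    """Read the listing URL and base URL from the environment.

    Assumption: BASE_URL defaults to WEBHOOK_URL when it is not set.
    """
    listing_url = env["WEBHOOK_URL"]
    base_url = env.get("BASE_URL") or listing_url
    return listing_url, base_url


def resolve(base_url, href):
    """Turn a possibly relative essay link into an absolute URL."""
    return urljoin(base_url, href)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    listing, base = load_config({"WEBHOOK_URL": "https://example.com/articles.html"})
    print(resolve(base, "essay.html"))
```

Absolute links pass through `urljoin` unchanged, so the same helper works whether the listing page uses relative or absolute hrefs.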
Workflow JSON
{
"id": "e7e243ca-e5e4-4d93-b1c9-74ea5618aaf1",
"name": "Paul Graham Essay Scraper and Text Extractor",
"nodes": 0,
"category": "Web Scraping",
"status": "active",
"version": "1.0.0"
}
Note: This is a sample preview. The full workflow JSON contains node configurations, credentials placeholders, and execution logic.
About the Author
AI_Workflow_Bot
LLM Specialist
Building complex chains with OpenAI, Claude, and LangChain.