AI Image Captioning and Overlay Workflow
This n8n workflow uses Google Gemini to generate captions for images and then overlays them onto the original image using the Edit Image node. It demonstrates multimodal AI capabilities within n8n.
About This Workflow
Overview
This n8n workflow generates a descriptive caption for an image and then draws that caption onto the image itself. It uses a multimodal Large Language Model (LLM), Google Gemini, to interpret the image content and produce relevant text. n8n's image manipulation nodes then resize the image, the positioning step calculates where the text should go, and the Edit Image node draws the generated caption directly onto the image.
This workflow solves the problem of manually captioning images and then separately editing them. It streamlines content creation, enhances image metadata, and provides a foundation for automated image annotation, watermarking, or descriptive overlays.
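The resize step mentioned above can be pictured as an aspect-ratio-preserving fit. The following is a minimal sketch, not the workflow's actual node configuration: the function name `fit_within` and the 512-pixel limit are assumptions chosen for illustration.

```python
def fit_within(width: int, height: int, max_side: int = 512) -> tuple[int, int]:
    """Scale (width, height) so the longer side is at most max_side.

    The aspect ratio is preserved and images that are already small
    enough are returned unchanged (no upscaling).
    max_side=512 is a hypothetical default, not a value from the workflow.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # Guard against a dimension rounding down to zero on extreme ratios.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, a 2048x1024 image would be reduced to 512x256 under this assumed limit, while a 300x200 image would pass through untouched.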
Key Features
- Utilizes Google Gemini's multimodal vision capabilities for image understanding.
- Generates structured JSON output for caption titles and text.
- Dynamically calculates optimal text and background rectangle positioning for overlays.
- Applies AI-generated captions as text overlays on the original image using the Edit Image node.
- Includes image resizing for AI processing.
- Demonstrates a practical application of LLMs and image editing within n8n.
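To illustrate what a structured caption might look like downstream, here is a minimal sketch that validates a JSON string containing a caption title and text. The field names `title` and `text`, the `Caption` class, and the `parse_caption` helper are assumptions for illustration; the actual schema is whatever the workflow's Structured Output Parser node defines.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Caption:
    title: str
    text: str


def parse_caption(raw: str) -> Caption:
    """Parse a JSON string and require non-empty 'title' and 'text' strings.

    The key names are assumed; adjust them to match the real parser schema.
    """
    data = json.loads(raw)  # raises ValueError (JSONDecodeError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    fields = {}
    for key in ("title", "text"):
        value = data.get(key)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"missing or empty field: {key}")
        fields[key] = value.strip()
    return Caption(**fields)
```

Validating the model output before the overlay step means a malformed response fails early instead of producing an image with a blank or broken caption.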
How To Use
- Import Image: Use the HTTP Request node (Get Image) to fetch an image from a URL. Replace the default URL with your desired image source.
- Process Image for AI: The Get Info and Resize For AI nodes prepare the image for the multimodal LLM. Get Info retrieves dimensions, and Resize For AI resizes it to a suitable resolution.
- Generate Caption: The Google Gemini Chat Model and Structured Output Parser nodes work together to send the image to Gemini and receive a structured caption (title and text).
- Calculate Positioning: The Calculate Positioning (Code node) takes the image dimensions and the generated caption to determine the best placement, size, and number of lines for the caption text and its background rectangle.
- Apply Caption: The Apply Caption to Image node uses the calculated positioning and the generated caption text to draw a semi-transparent background rectangle and then the caption text onto the image.
- Merge Results: The Merge Image & Caption and Merge Caption & Positions nodes help combine the processed image data with the captioning results.
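The positioning step can be sketched as follows. This is not the code from the workflow's Calculate Positioning node; it is a rough Python illustration of the same idea, and every constant (font size as 4% of image height, average glyph width of 0.6 times the font size, line height of 1.2 times the font size) is an assumption chosen for the example. The function name `caption_layout` and the returned keys are likewise hypothetical.

```python
import textwrap


def caption_layout(img_w: int, img_h: int, caption: str) -> dict:
    """Estimate font size, wrapped lines, and a bottom-anchored background box.

    All ratios below are illustrative assumptions, not workflow values.
    """
    font_size = max(12, round(img_h * 0.04))   # assumed: 4% of image height
    padding = font_size // 2                   # assumed: half the font size
    usable_width = max(1, img_w - 2 * padding)
    # Approximate how many characters fit on one line (assumed glyph width).
    chars_per_line = max(1, int(usable_width / (font_size * 0.6)))
    lines = textwrap.wrap(caption, width=chars_per_line) or [""]
    line_height = round(font_size * 1.2)       # assumed line spacing
    rect_h = len(lines) * line_height + 2 * padding
    rect_y = max(0, img_h - rect_h)            # anchor the box to the bottom edge
    return {
        "font_size": font_size,
        "line_height": line_height,
        "lines": lines,
        "rect": {"x": 0, "y": rect_y, "width": img_w, "height": rect_h},
        "text": {"x": padding, "y": rect_y + padding},
    }
```

With these assumed ratios, a 1000x500 image and a short one-line caption would yield a 20-pixel font and a 44-pixel-tall box starting at y = 456. A real implementation would ideally measure text with the actual font rather than estimating glyph width.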
Workflow JSON
{
"id": "4333b61d-5e83-4bdd-b0b7-881a10a16dd5",
"name": "AI Image Captioning and Overlay Workflow",
"nodes": 0,
"category": "AI & LLMs",
"status": "active",
"version": "1.0.0"
}
Note: This is a sample preview. The full workflow JSON contains node configurations, credentials placeholders, and execution logic.
About the Author
Free n8n Workflows Official
System Admin
The official repository for verified enterprise-grade workflows.