AI Image Captioning and Overlay Workflow

Name: AI Image Captioning and Overlay Workflow
Rating: 5 (9 reviews)
Author: Free N8N

Community Verified

Beginner



191 views

0 downloads

Category: AI & LLMs
Tags: automation, captioning, gemini, image overlay, image processing, langchain, llm, multimodal ai

This n8n workflow uses Google Gemini to generate captions for images and then overlays them onto the original image using the Edit Image node. It demonstrates multimodal AI capabilities within n8n.


About This Workflow

Overview

This n8n workflow automates two steps: generating a descriptive caption for an image, and drawing that caption back onto the image itself. It uses a multimodal large language model (LLM), Google Gemini, to interpret the image content and produce relevant text. It then uses n8n's image manipulation nodes to resize the image, calculate where the text should go, and draw the generated caption directly onto the image, producing a captioned copy of the original.
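As a rough illustration of the resizing step, the sketch below computes target dimensions that fit an image inside a square bounding box while preserving its aspect ratio. The function name `fit_within` and the 768-pixel default limit are assumptions for illustration only; the actual settings of the workflow's Resize For AI node are not shown on this page.

```python
def fit_within(width: int, height: int, max_side: int = 768) -> tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_side.

    Hypothetical helper: images already within the limit are returned
    unchanged, and the aspect ratio is preserved up to integer rounding.
    """
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # round() keeps the result close to the true ratio; max(1, ...) avoids a zero-size side
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(4000, 3000))  # (768, 576)
print(fit_within(640, 480))    # (640, 480)
```

Downscaling only, never upscaling, keeps small images untouched and limits the payload sent to the model.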

This workflow removes the need to caption images by hand and then edit them separately. It streamlines content creation, produces caption text that can also be stored as image metadata, and can serve as a starting point for automated image annotation, watermarking, or descriptive overlays.

Key Features

Utilizes Google Gemini's multimodal vision capabilities for image understanding.
Generates structured JSON output for caption titles and text.
Calculates the text position, font size, line breaks, and background rectangle for the overlay from the image dimensions and the caption length.
Applies AI-generated captions as text overlays on the original image using the Edit Image node.
Includes image resizing for AI processing.
Demonstrates a practical application of LLMs and image editing within n8n.
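The structured-output feature can be pictured as validating the model's JSON reply before it reaches the drawing nodes. In the workflow this job belongs to the Structured Output Parser node; the Python below is only a minimal sketch of the idea, assuming a reply object with hypothetical keys `caption_title` and `caption_text` (the workflow's real schema may use different names).

```python
import json
from dataclasses import dataclass


@dataclass
class Caption:
    # Hypothetical field names; the workflow's actual schema may differ.
    caption_title: str
    caption_text: str


def parse_caption(raw: str) -> Caption:
    """Parse a model reply into a Caption, rejecting missing or empty fields.

    json.loads raises json.JSONDecodeError (a ValueError subclass) on
    malformed JSON, so callers only need to handle ValueError.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    fields = {}
    for key in ("caption_title", "caption_text"):
        value = data.get(key)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"missing or empty field: {key}")
        fields[key] = value.strip()
    return Caption(**fields)


reply = '{"caption_title": "Harbor at Dusk", "caption_text": "Fishing boats rest in calm water."}'
print(parse_caption(reply))
```

Failing early on a malformed reply is cheaper than discovering an empty caption after the image has already been edited.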

How To Use

Import Image: Use the HTTP Request node (Get Image) to fetch an image from a URL. Replace the default URL with your desired image source.
Process Image for AI: The Get Info and Resize For AI nodes prepare the image for the multimodal LLM. Get Info retrieves dimensions, and Resize For AI resizes it to a suitable resolution.
Generate Caption: The Google Gemini Chat Model and Structured Output Parser nodes work together to send the image to Gemini and receive a structured caption (title and text).
Calculate Positioning: The Calculate Positioning (Code node) takes the image dimensions and the generated caption and computes the placement, size, and number of lines for the caption text and its background rectangle.
Apply Caption: The Apply Caption to Image node uses the calculated positioning and the generated caption text to draw a semi-transparent background rectangle and then the caption text onto the image.
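Merge Results: The Merge Image & Caption node combines the image data with the generated caption, and the Merge Caption & Positions node combines the caption with the calculated positions, so the Apply Caption to Image node receives everything it needs in one item.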
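The Calculate Positioning step can be approximated with simple arithmetic. The sketch below is an assumption, not the workflow's actual Code node: it estimates a font size from the image width, wraps the caption to a per-line character budget derived from an assumed average glyph width, and anchors a full-width background rectangle to the bottom edge. The names `layout_caption`, `font_ratio`, `char_width_ratio`, `line_spacing`, and `padding` are hypothetical.

```python
import textwrap
from dataclasses import dataclass


@dataclass
class CaptionLayout:
    font_size: int
    lines: list[str]
    rect_x: int
    rect_y: int
    rect_width: int
    rect_height: int


def layout_caption(
    text: str,
    img_width: int,
    img_height: int,
    font_ratio: float = 0.04,        # hypothetical: font size as a fraction of image width
    char_width_ratio: float = 0.55,  # hypothetical: average glyph width / font size
    line_spacing: float = 1.3,       # hypothetical: line height / font size
    padding: int = 20,               # hypothetical: inner margin in pixels
) -> CaptionLayout:
    """Estimate font size, wrapped lines, and a bottom-anchored background box."""
    font_size = max(12, round(img_width * font_ratio))
    usable_width = img_width - 2 * padding
    # Approximate how many characters fit on one line at this font size.
    chars_per_line = max(1, int(usable_width / (font_size * char_width_ratio)))
    lines = textwrap.wrap(text, width=chars_per_line) or [""]
    line_height = round(font_size * line_spacing)
    rect_height = len(lines) * line_height + 2 * padding
    # Never let the rectangle grow taller than the image itself.
    rect_height = min(rect_height, img_height)
    return CaptionLayout(
        font_size=font_size,
        lines=lines,
        rect_x=0,
        rect_y=img_height - rect_height,
        rect_width=img_width,
        rect_height=rect_height,
    )


layout = layout_caption(
    "Fishing boats rest in calm water as the sun sets behind the harbor.", 1024, 768
)
for line in layout.lines:
    print(line)
print(layout.rect_y, layout.rect_height)
```

A fixed glyph-width ratio is a deliberate simplification; measuring the text with the real font's metrics would give tighter line breaks, at the cost of needing the font file at layout time.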


Workflow JSON

{
  "id": "4333b61d-5e83-4bdd-b0b7-881a10a16dd5",
  "name": "AI Image Captioning and Overlay Workflow",
  "nodes": 0,
  "category": "AI & LLMs",
  "status": "active",
  "version": "1.0.0"
}

Note: This is a sample preview. The full workflow JSON contains node configurations, credential placeholders, and execution logic.


About the Author

Free n8n Workflows Official

System Admin

The official repository for verified enterprise-grade workflows.



Source: awesome-n8n-templates


Related Workflows

Automated YouTube Trending Video Analysis with AI
Beginner · Verified · Category: AI & LLMs · Tags: youtube, trending videos, ai
Leverage the AI Agent node and YouTube Search tool to identify and analyze trending YouTube videos based on a niche. Get insights into content popularity and patterns.

Build a Voice RAG Chatbot with ElevenLabs and OpenAI
Beginner · Verified · Category: AI & LLMs · Tags: RAG, Voice AI, ElevenLabs
Create an interactive voice-enabled RAG chatbot using ElevenLabs for speech synthesis and OpenAI for AI agent capabilities. This workflow integrates with Qdrant Vector Store and Google Drive for knowledge retrieval, enabling intelligent responses to user queries.

AI-Powered Conversational Agent with Tools
Beginner · Verified · Category: AI & LLMs · Tags: AI, LLM, Langchain
This n8n workflow creates an AI conversational agent that uses multiple tools, including Wikipedia and a weather API, to answer complex user queries. It uses a buffer memory to maintain conversation context.