Generate Voiceover from Video Using OpenAI Vision and TTS
detail.loadingPreview
This workflow downloads a video, extracts frames using OpenCV, and uses OpenAI's Vision model to generate a script. The script is then converted into a voiceover using OpenAI's TTS and uploaded to Google Drive.
🚀Ready to Deploy This Workflow?
About This Workflow
Overview
This n8n workflow automates the process of creating a voiceover from a video file. It leverages OpenAI's multimodal capabilities to understand the visual content of the video and then generates an audio narration. The workflow begins by downloading a video, followed by extracting key frames using Python and OpenCV. These frames are then processed in batches by the OpenAI Chat Model (specifically designed for vision tasks) to generate a descriptive script. Finally, the generated script is used by another OpenAI Chat Model call to produce an MP3 voiceover, which is then uploaded to Google Drive. This template solves the problem of quickly generating audio narratives for video content without manual transcription or voice acting.
Key Features
- Downloads video from a URL using HTTP Request node.
- Extracts evenly distributed frames from video using Python and OpenCV (Code node).
- Processes video frames with OpenAI's multimodal Vision capabilities to generate a descriptive script.
- Uses OpenAI's text-to-speech (TTS) to convert the generated script into an audio file.
- Uploads the final voiceover audio file to Google Drive.
How To Use
- Trigger Workflow: Start the workflow by clicking 'Test workflow' on the manual trigger node.
- Download Video: The HTTP Request node downloads a sample video. Replace the URL with your desired video.
- Extract Frames: The Code node (Python) processes the video and extracts a series of evenly distributed frames. Adjust
max_framesif needed, but be mindful of memory usage. - Generate Script: The OpenAI Chat Model node analyzes the extracted frames and generates a script describing the video's content.
- Convert Script to Voiceover: The OpenAI Chat Model node (configured for TTS) takes the generated script and converts it into an audio file (MP3).
- Upload to Google Drive: The Google Drive node uploads the generated voiceover MP3 file to your specified Google Drive folder.
Apps Used
Workflow JSON
{
"id": "cf35aed5-c751-42de-b777-cfa594867376",
"name": "Generate Voiceover from Video Using OpenAI Vision and TTS",
"nodes": 0,
"category": "AI & Machine Learning",
"status": "active",
"version": "1.0.0"
}Note: This is a sample preview. The full workflow JSON contains node configurations, credentials placeholders, and execution logic.
Get This Workflow
ID: cf35aed5-c751...
About the Author
N8N_Community_Pick
Curator
Hand-picked high quality workflows from the global community.
Statistics
Verification Info
Related Workflows
Discover more workflows you might like
Chat with Local LLMs via Ollama
Integrate and chat with your local Large Language Models using Ollama and n8n.
Telegram AI Langchain Bot with DALL-E 3 Image Generation
An n8n workflow that acts as a Telegram bot, powered by Langchain, for AI chat interactions and image generation using DALL-E 3.
Visa Requirement Checker
A workflow to check visa requirements based on user input, leveraging Langchain, Cohere embeddings, Weaviate vector store, and Anthropic LLM.