Generate Voiceover from Video Using OpenAI Vision and TTS

Name: Generate Voiceover from Video Using OpenAI Vision and TTS
Rating: 5 (9 reviews)
Author: Free N8N

Community Verified

Beginner

0 nodes connected

detail.loadingPreview

Free N8N Temples

144 views

0 downloads

AI & Machine Learningaudio generationautomationllmopenaiopencvpythonttsvideovision

This workflow downloads a video, extracts frames using OpenCV, and uses OpenAI's Vision model to generate a script. The script is then converted into a voiceover using OpenAI's TTS and uploaded to Google Drive.

🚀Ready to Deploy This Workflow?

⚡Deploy on Zeabur 🎁Get $200 Credit on DigitalOcean

About This Workflow

Overview

This n8n workflow automates the process of creating a voiceover from a video file. It leverages OpenAI's multimodal capabilities to understand the visual content of the video and then generates an audio narration. The workflow begins by downloading a video, followed by extracting key frames using Python and OpenCV. These frames are then processed in batches by the OpenAI Chat Model (specifically designed for vision tasks) to generate a descriptive script. Finally, the generated script is used by another OpenAI Chat Model call to produce an MP3 voiceover, which is then uploaded to Google Drive. This template solves the problem of quickly generating audio narratives for video content without manual transcription or voice acting.

Key Features

Downloads video from a URL using HTTP Request node.
Extracts evenly distributed frames from video using Python and OpenCV (Code node).
Processes video frames with OpenAI's multimodal Vision capabilities to generate a descriptive script.
Uses OpenAI's text-to-speech (TTS) to convert the generated script into an audio file.
Uploads the final voiceover audio file to Google Drive.

How To Use

Trigger Workflow: Start the workflow by clicking 'Test workflow' on the manual trigger node.
Download Video: The HTTP Request node downloads a sample video. Replace the URL with your desired video.
Extract Frames: The Code node (Python) processes the video and extracts a series of evenly distributed frames. Adjust max_frames if needed, but be mindful of memory usage.
Generate Script: The OpenAI Chat Model node analyzes the extracted frames and generates a script describing the video's content.
Convert Script to Voiceover: The OpenAI Chat Model node (configured for TTS) takes the generated script and converts it into an audio file (MP3).
Upload to Google Drive: The Google Drive node uploads the generated voiceover MP3 file to your specified Google Drive folder.

Apps Used

audio generation

automation

llm

openai

opencv

python

tts

video

vision

Workflow JSON

{
  "id": "cf35aed5-c751-42de-b777-cfa594867376",
  "name": "Generate Voiceover from Video Using OpenAI Vision and TTS",
  "nodes": 0,
  "category": "AI & Machine Learning",
  "status": "active",
  "version": "1.0.0"
}

Note: This is a sample preview. The full workflow JSON contains node configurations, credentials placeholders, and execution logic.