Text-to-Video in Python
Tutorial: Text-to-Video using Python
Video content is king. However, producing engaging video content can be time-consuming or require expensive software. Luckily, with Python and a few libraries, we can automate the creation of dynamic text-to-video content. This post will guide you through creating a Python script that transforms a text file into a video with a changing background color.
Dependencies
Before we start, make sure to install the following Python libraries:
- Pillow (PIL): For image manipulation, especially creating images with text.
- gTTS (Google Text-to-Speech): For converting text into speech.
- moviepy: For creating the final video.
- pydub: For handling audio segments, specifically for getting the duration of the audio.
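- colorsys: For converting between different color systems. It is part of Python's standard library, so there is nothing to install.
- numpy: For turning the Pillow images into arrays that moviepy accepts.
You can install the third-party libraries with pip. The script imports from moviepy.editor, which was removed in moviepy 2.0, so pin moviepy to a 1.x release:
pip install pillow gTTS "moviepy<2" pydub numpy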
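If you want to confirm the installs worked before running anything, a small check like the one below can help. This is only a sketch: the helper name missing_modules is made up for this example, and the import names (PIL for Pillow, gtts for gTTS) are the ones the script itself imports.

```python
from importlib.util import find_spec

# Import names used by the script (Pillow is imported as PIL, gTTS as gtts)
REQUIRED_MODULES = ["PIL", "gtts", "moviepy", "pydub", "numpy"]


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules can be imported")
```

An empty result only means the packages can be found; it does not check their versions.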
The Python Script
Create a file called text_to_video.py (or any name you like) and copy the following code into it:
from PIL import ImageFont, ImageDraw, Image
from gtts import gTTS
from moviepy.editor import ImageSequenceClip, AudioFileClip
import argparse
from pydub import AudioSegment
import colorsys
import numpy as np
import os
# Variables for customization
TEXT_SPEED = 24 # frames per second
TEXT_COLOR = (255, 255, 255)
FONT_PATH = "DMSerifDisplay-Regular.ttf" # Path to .ttf font file (change this to your font file)
FONT_SIZE = 180
BACKGROUND_SPEED = 0.8 # Background color change speed (lower value means slower); defined here but not yet used by the code below
TIMING_ADJUSTMENT = -0.3 # Adjusts the duration of each word in the video
START_BG_COLOR = "#000000" # Start color in HEX
END_BG_COLOR = "#6638f0" # End color in HEX
# Function to convert HEX color to RGB
def hex_to_rgb(hex_color):
    hex_color = hex_color.lstrip("#")
    return tuple(int(hex_color[i : i + 2], 16) for i in (0, 2, 4))


# Interpolate between two HEX colors in HSV space; progress runs from 0.0 to 1.0
def interpolate_color(start_color, end_color, progress):
    start_color = hex_to_rgb(start_color)
    end_color = hex_to_rgb(end_color)
    start_h, start_s, start_v = colorsys.rgb_to_hsv(
        start_color[0] / 255, start_color[1] / 255, start_color[2] / 255
    )
    end_h, end_s, end_v = colorsys.rgb_to_hsv(
        end_color[0] / 255, end_color[1] / 255, end_color[2] / 255
    )
    interpolated_h = start_h + (end_h - start_h) * progress
    interpolated_s = start_s + (end_s - start_s) * progress
    interpolated_v = start_v + (end_v - start_v) * progress
    r, g, b = colorsys.hsv_to_rgb(interpolated_h, interpolated_s, interpolated_v)
    # round() instead of int() so float error cannot push a channel down by one
    return round(r * 255), round(g * 255), round(b * 255)


def text_to_video(textfile, outputfile):
    with open(textfile, "r") as f:
        lines = f.read()
    words = lines.split()
    images = []
    fnt = ImageFont.truetype(FONT_PATH, FONT_SIZE)
    # Generate speech for the whole text and save as a temporary file
    tts = gTTS(text=lines, lang="en")
    tts.save("temp.mp3")
    # Measure the speech duration using pydub
    full_audio = AudioSegment.from_file("temp.mp3")
    full_audio_duration = len(full_audio) / 1000  # duration in seconds
    avg_word_duration = full_audio_duration / len(words)  # average duration per word
    for i, word in enumerate(words):
        # Measure the word with getbbox (getsize was removed in Pillow 10) and center it
        left, top, right, bottom = fnt.getbbox(word)
        position = (
            (VIDEO_SIZE[0] - (right - left)) / 2 - left,
            (VIDEO_SIZE[1] - (bottom - top)) / 2 - top,
        )
        # Calculate background color based on word index and total number of words
        background_progress = i / len(words)
        background_color = interpolate_color(
            START_BG_COLOR, END_BG_COLOR, background_progress
        )
        img = Image.new("RGB", VIDEO_SIZE, color=background_color)  # Set background color
        d = ImageDraw.Draw(img)
        d.text(position, word, font=fnt, fill=TEXT_COLOR)
        images.append(np.array(img))
    # One duration per image, based on the average word duration
    durations = [avg_word_duration] * len(words)
    # TIMING_ADJUSTMENT changes the first word's duration, which shifts every
    # later word relative to the audio
    durations[0] += TIMING_ADJUSTMENT
    # Hold the last word for one extra interval so the clip is not shorter than the audio
    durations[-1] += avg_word_duration
    audioclip = AudioFileClip("temp.mp3")
    clip = ImageSequenceClip(images, durations=durations)
    clip = clip.set_audio(audioclip)
    clip.fps = TEXT_SPEED
    clip.write_videofile(outputfile, codec="libx264")
    # Release the audio file, then remove the temporary file
    audioclip.close()
    os.remove("temp.mp3")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Convert text file to video")
    parser.add_argument("textfile", help="The name of the text file to convert")
    parser.add_argument("outputfile", help="The name of the output mp4 file")
    parser.add_argument(
        "--format-short",
        action="store_true",
        help="Set this flag if you want a short video",
    )
    args = parser.parse_args()
    VIDEO_SHORT = args.format_short
    VIDEO_SIZE = (1080, 1920) if VIDEO_SHORT else (1920, 1080)  # width, height
    text_to_video(args.textfile, args.outputfile)
How to Run
To execute the script, you need a text file with the content you want to turn into a video. The script reads the file, generates a spoken version of the text with Google's Text-to-Speech service, shows the words one at a time, and saves the result as an MP4 file. Each word stays on screen for the average time per spoken word, so the sync with the audio is approximate rather than exact.
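The script gives every word the same screen time: the audio length divided by the number of words. Long words are spoken more slowly than short ones, so the text can drift from the voice. One possible refinement, which is not part of the original script, is to share the audio time in proportion to word length. The sketch below shows both variants; the function name word_durations and its weighted parameter are made up for this example, and character count is only a rough stand-in for speaking time.

```python
def word_durations(words, total_seconds, weighted=False):
    """Split total_seconds across words.

    weighted=False: every word gets the same share (what the script does).
    weighted=True: each word's share is proportional to its character count
    (an assumption for illustration, not the original behavior).
    """
    if not words:
        return []
    if not weighted:
        return [total_seconds / len(words)] * len(words)
    total_chars = sum(len(word) for word in words)
    return [total_seconds * len(word) / total_chars for word in words]


if __name__ == "__main__":
    sample = "Technology is a useful servant".split()
    print(word_durations(sample, 3.0))
    print(word_durations(sample, 3.0, weighted=True))
```

Both variants add up to the full audio length, so either list could be passed to ImageSequenceClip as durations.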
To run the script, use the following command in your terminal:
python text_to_video.py input_text.txt output_video_name.mp4
To create a portrait video for YouTube Shorts, add the --format-short flag:
python text_to_video.py input_text.txt output_video_name_short.mp4 --format-short
Note: You need to have ffmpeg installed on your system to run this script. If you don't have it, you can install it with brew install ffmpeg on macOS or sudo apt install ffmpeg on Debian-based Linux distributions such as Ubuntu.
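To check whether ffmpeg is already available, you can look it up on your PATH from Python. This is a minimal sketch using only the standard library; the helper name find_ffmpeg is made up for this example, and it only tells you whether a binary with that name can be found, not whether it works.

```python
import shutil


def find_ffmpeg(executable="ffmpeg"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg was not found on PATH; install it before running the script")
    else:
        print(f"ffmpeg found at {path}")
```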
Also, running the script without the --format-short flag creates a landscape 1920x1080 video; with the flag, the video is portrait 1080x1920. You can use other dimensions by changing the VIDEO_SIZE assignment near the bottom of the script.
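If you would rather not edit the script every time you want another resolution, one option is an extra command-line argument. The sketch below is an assumption, not part of the original script: the --size option and the helpers parse_size, build_parser, and resolve_video_size are hypothetical names. An explicit --size wins; otherwise the size falls back to the same portrait or landscape choice the script already makes.

```python
import argparse


def parse_size(value):
    """Parse a string such as '1280x720' into a (width, height) tuple."""
    try:
        width_text, height_text = value.lower().split("x")
        size = (int(width_text), int(height_text))
    except ValueError:
        raise argparse.ArgumentTypeError(f"expected WIDTHxHEIGHT, got {value!r}")
    if size[0] <= 0 or size[1] <= 0:
        raise argparse.ArgumentTypeError("width and height must be positive")
    return size


def build_parser():
    parser = argparse.ArgumentParser(description="Convert text file to video")
    parser.add_argument("textfile", help="The name of the text file to convert")
    parser.add_argument("outputfile", help="The name of the output mp4 file")
    parser.add_argument("--format-short", action="store_true")
    # Hypothetical extra option, not in the original script
    parser.add_argument("--size", type=parse_size, default=None,
                        help="Custom resolution, for example 1280x720")
    return parser


def resolve_video_size(args):
    """An explicit --size wins; otherwise use the script's two presets."""
    if args.size is not None:
        return args.size
    return (1080, 1920) if args.format_short else (1920, 1080)


if __name__ == "__main__":
    demo_args = build_parser().parse_args(["in.txt", "out.mp4", "--size", "1280x720"])
    print(resolve_video_size(demo_args))
```

In the real script you would assign the result of resolve_video_size(args) to VIDEO_SIZE before calling text_to_video.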
Customization
There are several variables in the script that you can adjust to customize the output. Most are at the top; VIDEO_SHORT and VIDEO_SIZE are set at the bottom:
- VIDEO_SHORT: I deprecated this as a setting to edit by hand; use the --format-short flag when running the script instead.
- VIDEO_SIZE: The resolution of the video. Depending on VIDEO_SHORT, it's either portrait (1080x1920) or landscape (1920x1080).
- TEXT_SPEED: The frames per second of the video.
- TEXT_COLOR: The color of the text in RGB.
- FONT_PATH: The path to the .ttf font file.
- FONT_SIZE: The size of the font.
- BACKGROUND_SPEED: The speed of the background color change. Lower values mean slower changes. In the script as listed, this variable is defined but never read, so changing it has no effect yet.
- TIMING_ADJUSTMENT: A timing offset in seconds. In the script as listed, it changes how long the first word stays on screen, which shifts every later word relative to the audio. Negative values make the words appear earlier.
- START_BG_COLOR and END_BG_COLOR: The start and end colors for the background color interpolation, specified in HEX format.
Feel free to experiment with these values to create a video that suits your needs.
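Before rendering a full video, it can help to preview which background colors a given START_BG_COLOR and END_BG_COLOR pair produces. The sketch below repeats the script's HSV interpolation using only the standard library and prints one HEX color per word. The function name gradient_preview is made up for this example. Like the script, it uses i / steps as the progress value, so the last word gets a color just short of the end color.

```python
import colorsys


def hex_to_rgb(hex_color):
    hex_color = hex_color.lstrip("#")
    return tuple(int(hex_color[i : i + 2], 16) for i in (0, 2, 4))


def interpolate_color(start_hex, end_hex, progress):
    """Blend two HEX colors in HSV space; progress runs from 0.0 to 1.0."""
    start = colorsys.rgb_to_hsv(*(c / 255 for c in hex_to_rgb(start_hex)))
    end = colorsys.rgb_to_hsv(*(c / 255 for c in hex_to_rgb(end_hex)))
    h, s, v = (a + (b - a) * progress for a, b in zip(start, end))
    r, g, b = colorsys.hsv_to_rgb(h, s, v)
    return round(r * 255), round(g * 255), round(b * 255)


def gradient_preview(start_hex, end_hex, steps):
    """Return one HEX color per step, using progress = i / steps as the script does."""
    colors = []
    for i in range(steps):
        r, g, b = interpolate_color(start_hex, end_hex, i / steps)
        colors.append("#{:02x}{:02x}{:02x}".format(r, g, b))
    return colors


if __name__ == "__main__":
    for color in gradient_preview("#000000", "#6638f0", 8):
        print(color)
```

Because the hue is interpolated in a straight line, two colors far apart on the color wheel will pass through the hues between them, which may or may not be the look you want.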
Example Videos
Here are a few example videos created with this script:
Text File Used
Create a text file with the content you want to transform into video. Name it input_text.txt or anything you like.
Here is the text file I used to create the example videos:
Technology is a useful servant but a dangerous master. This is a quote by Christian Lous Lange. Thank you for watching. Have a great day!
As you can see, the possibilities are endless. You can create engaging and dynamic videos from simple text files using this Python script.
Happy coding!