How to Convert Voice to Video with No Equipment

Karina Kasparova
October 5, 2022

Yes, you read that right.

This isn't a tutorial teaching you how to record voice overs and combine them with video clips.

Instead, we are looking into the world of text-to-speech, AI-generated voices, and AI avatars.

We know what you're thinking: "But I don't want to have a robot-like voice over in my video!" 🤖

Well, we don't want that either. That's why we're going to show you how to create stunning videos with human-sounding voiceovers with a little help from AI.

Here's what you'll learn in this guide:

  • 3 different voice over types
  • pros and cons of each type
  • how to make a voice over video using text-to-speech
  • 3 tips for creating a voice over video

Let's dive in!

3 types of voice overs for video

1. Voice actor voice overs

If you're looking for an exceptional voice over for your video and have a budget to spend, professional voice actors are the way to go.

These voice actors are trained and experienced in evoking the perfect emotion, vocal tone, and inflection for any video type.

You can find professional voice actors online quite easily, either through a professional agency or a freelancer site.

2. Self-recorded voice overs

Another method is recording voice overs by yourself.

This allows you to add a personal touch to your video and be in charge of the process from start to finish.

If you have the necessary equipment - a microphone to record voice overs and software to process audio - a self-made voice over is a cost-effective and simple way to create a voice over video.

But if you, just like most people, cringe when you hear your own voice, this may not be the best solution. 😉

3. AI voice overs (text-to-speech)

This sounds like something out of a futuristic sci-fi movie, but AI-generated voices are more common than you think.

Think Siri, Alexa, Cortana, and the voice on your GPS. All of these voices are based on real humans but generated using deep learning technology.

And using text-to-speech (TTS) for voice over videos is becoming more and more commonplace.

There are many videos online we could show as examples, but because we're a little biased, we'll show our own. 😅

Learn more about our text-to-speech feature | Synthesia

Voice over recording vs. text-to-speech: pros and cons

How do the 3 types of video voice over options compare? Which one fits your video needs better? Let's discuss.

Voiceovers using voice actors

✅ Pros:

  • likely high-quality results - if you find a voice actor with a proven track record and good experience, you are almost guaranteed to get good results
  • native-level pronunciation - if you find a native speaker for the specific language and locale, the pronunciation will be on point
  • voice overs in multiple languages - you can find voice actors for almost any language, and most agencies offer translation/localization services as well

❌ Cons:

  • expensive - the standard rate for 0-2 minutes of voice over is around $100 - $499
  • time-consuming - professional voice overs can take a few days, depending on the voice actor's workload
  • lack of control over details - besides the initial brief, there isn't much room for feedback and corrections

Self-recorded voiceovers

✅ Pros:

  • full control over pronunciation, tone, emotion
  • create voice overs and videos at your pace
  • re-record your voice as many times as needed to get it right

❌ Cons:

  • need for good equipment for better results, e.g. an external microphone
  • need for a quiet spot - ambient noise can decrease the quality dramatically
  • need for recording and editing software
  • not great if you aren't comfortable with speaking
  • voice over limited to the languages you speak

Text-to-speech voice overs

✅ Pros:

  • no microphones or software needed
  • affordable - the price depends on the service provider, but you can even find free TTS generators
  • instant results - upload/paste the text, click the export button, and download the audio file
  • voice overs in multiple languages - you don't need to speak a language to create a voice over
  • partial control over pronunciation and emphasis using SSML - if the pronunciation isn't quite right, you can adjust it using Speech Synthesis Markup Language
  • consistent audio quality - you don't need to worry about background noise

❌ Cons:

  • limited voice options for lesser-spoken languages - the fewer people in the world speak the language, the fewer voice options there are for that language
  • limited range of emotions - while the technology has gotten pretty good at sounding human-like, the range of emotional expression is still limited

Which option you choose comes down to your resources (time, money, equipment) and how comfortable you are recording yourself.

We are big fans of text-to-speech voiceovers, so that's the process we will be discussing in this guide.

How to create stunning videos with text-to-speech voiceovers

For demonstration purposes, we will be using a text-to-speech video tool called Synthesia, but most of the steps will apply to any video maker with similar functionalities.

We also have a video tutorial on the topic, if you're more of a visual learner:

How to Make a Voiceover Video | Synthesia STUDIO

Step #1: Start with a template

If you're not an experienced video maker or are starting from scratch, it can be hard to know what the content structure should be and how to best visualize your voice over.

Before even thinking about making a voice recording, find a template that fits the topic of your video.

The benefit of starting with a ready-made video template is that it already has the informational and visual structure in place that you can use as a starting point.

For example, Synthesia has more than 50 video templates covering how-to videos, sales videos, training, internal communications, and more.

To get started with a template in Synthesia STUDIO, choose one of the 3 options:

1. In your STUDIO dashboard, click on 'Templates' on the left-hand side, choose one and select 'Create video'

2. In your video canvas, click 'Templates' on the right-hand side, choose one and either add all slides or individual ones

3. In your dashboard, preview and choose templates directly from the top bar on the screen

Step #2: Create your video script

Now that you have an idea of the video structure, it's time to create a video script to go along with that structure.

For every video slide, write 4-5 sentences of text at most.

It's crucial to summarize your ideas into bite-sized pieces that include only the most important information. Otherwise, you risk overwhelming the audience.
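If your script already exists as one long block of text, a small helper can show how it might divide across slides. The sketch below is only an illustration and not part of Synthesia: the function names are hypothetical, the sentence splitting is deliberately naive (it breaks after '.', '!' or '?'), and the limit of 5 sentences simply mirrors the guideline above.

```python
import re

MAX_SENTENCES_PER_SLIDE = 5  # upper bound suggested in this guide


def split_sentences(script: str) -> list[str]:
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", script.strip())
    return [p for p in parts if p]


def chunk_script(script: str, max_sentences: int = MAX_SENTENCES_PER_SLIDE) -> list[str]:
    """Group sentences into slide-sized chunks of at most max_sentences each."""
    sentences = split_sentences(script)
    return [
        " ".join(sentences[i:i + max_sentences])
        for i in range(0, len(sentences), max_sentences)
    ]


# Hypothetical example script with seven short sentences.
script = (
    "Welcome to the team. Today we cover three topics. First, your accounts. "
    "Second, your tools. Third, your first week. Any questions? Ask your manager. "
)
slides = chunk_script(script)
for number, text in enumerate(slides, start=1):
    print(f"Slide {number}: {text}")
```

Abbreviations such as "e.g." will confuse the naive splitter, so treat the result as a first draft and adjust the slide breaks by hand.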

Step #3: Paste text and choose language

Now that you have your video script, it's time to paste it into the video maker slide by slide so that it matches the video layout.

Synthesia will recognize the language automatically, but if it doesn't, or you want to choose another accent/voice, you can do that right in the script box.

To add a voice over in STUDIO, head over to the script box in the video canvas and click on the voice selector.

You can either scroll through the list of available voices or look for one using the 'Search' function.

Tip 💡

Did you know we have more than 60 different TTS languages to choose from?
See the full overview of our AI voices.

Step #4: Add visuals

Even if you started with a ready-made template, it's a good idea to add a personal touch and align your voice over with custom visuals.

This includes:

  • animations
  • transitions
  • screen recordings
  • images, videos, shapes
  • on-screen text
  • human or AI presenters

We have a whole course on using visuals in Synthesia STUDIO to help you get started.

Here are the 3 main options we like to use when we edit videos:

1. Text on screen

One way to help your viewers absorb and remember information is to present it through both visuals and audio.

You can do that by breaking down your video script into bullet points on screen and timing the animation with the voice over recording.

To add text to your Synthesia video, click on 'Text' on the right side of the video canvas, and choose between adding a Title, Subtitle, or Body Text.

To edit the text, simply click on it. The 'Format' tab will appear, allowing you to edit everything from font to color to spacing, and more.

To time animation to the voice over, click on 'Animate,' choose the animation type and style, and edit the 'Delay' and 'Duration' fields.
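To get a starting value for the 'Delay' field, you can estimate when each bullet point will be spoken. The sketch below is a rough approximation and not a Synthesia feature: it assumes a constant narration pace of 150 words per minute (an assumed figure; real voices vary), and the function name is hypothetical.

```python
WORDS_PER_MINUTE = 150  # assumed narration pace; measure your chosen voice


def bullet_delays(segments: list[str], wpm: int = WORDS_PER_MINUTE) -> list[float]:
    """Return the estimated start time in seconds of each script segment."""
    seconds_per_word = 60.0 / wpm
    delays = []
    elapsed = 0.0
    for segment in segments:
        delays.append(round(elapsed, 1))
        # Advance the clock by the estimated time needed to speak this segment.
        elapsed += len(segment.split()) * seconds_per_word
    return delays


# Hypothetical script segments, one per on-screen bullet point.
segments = [
    "Text to speech needs no microphone and no recording software",
    "Results are ready in minutes",
    "It works in many languages",
]
delays = bullet_delays(segments)
for segment, delay in zip(segments, delays):
    print(f"{delay:>5.1f}s  {segment}")
```

Use the estimates as a first pass, then preview the video and nudge each delay until the text appears as the words are spoken.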

2. (Almost) Human presenter

Seeing a talking human is a natural extension of hearing a voice, so it's only logical to use human presenters to visualize a voice over.

Hiring an actor is quite expensive, so we recommend using AI avatars for the job.

Yes, we know what comes to mind - deepfake videos on YouTube with poor lip-to-voice synchronization. 🤐

But that's not what we're talking about. Modern video tools like Synthesia offer 50+ realistic AI avatars based on real humans, which are used by 8000+ companies to create professional videos.

It's important to note that, according to an eye-tracking research study from 2017, talking head videos can lose engagement quite quickly if there is no variation.

To prevent that, you can add graphical elements, and vary the position of the presenter and the camera angles. We have a whole lesson on creating camera angles for AI avatars to help with that.

To add an AI avatar to your video, click on 'Avatar' on the right-hand side of the video canvas and choose one that fits your topic and style.

3. Stock (or own) footage

The most popular way of visualizing a voice over is by including images and video footage.

If you have your own photo and video content, you can easily upload it to Synthesia.

Otherwise, take advantage of stock footage.

Whatever you use, make sure the different video segments align well with the voice over.

To add and upload images and videos in Synthesia, you can go to 'Images' or 'Background' and browse the selection or upload your own photo or video file.

3 tips for creating a voice over video

Whether you want to record your voice over or use text-to-speech, here are a few tips to help you get the best quality audio.

Tip #1: Add background music

A simple voiceover video might sound a bit too plain without any music, so include it if you can.

Just make sure the music is at a comfortable volume that doesn't overshadow the narration.

To add music in Synthesia, simply click on the 'Music' tab on the right-hand side and choose a track that fits the video, or upload your own high-quality audio track.

Tip #2: Add an (almost) human face

A video with a presenter (otherwise known as a talking head video) tends to be more engaging to watch than a simple voice over video.

If you don't have the equipment or don't feel comfortable filming yourself, you can use an AI presenter to narrate the video for you.

In Synthesia STUDIO, you can choose from 65+ diverse human-like AI avatars.

Tip #3: Avoid mumbling

Whether you're recording your own voice or using an AI voice over, make sure that the pronunciation is clear.

Re-record the voice over if you struggle to pronounce the words clearly, or, if you're using TTS, adjust the pronunciation with SSML until it's just right.

You can find more information on adjusting pronunciation with SSML on Synthesia's knowledge base.
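To give a feel for what SSML looks like, here is a minimal sketch that builds a short snippet with Python's standard library. The elements used (sub, break, emphasis) come from the W3C SSML specification, but which tags a given TTS tool accepts varies, so treat the tag choice as an assumption and check your provider's documentation before pasting anything into a script box. The function name and sample text are made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_ssml() -> str:
    """Build a small SSML document with a substitution, a pause, and emphasis."""
    speak = ET.Element("speak")
    speak.text = "Welcome to "

    # <sub> asks the voice to read the alias instead of the written text.
    sub = ET.SubElement(speak, "sub", alias="World Wide Web Consortium")
    sub.text = "W3C"
    sub.tail = "."

    # <break> inserts a pause of the given length.
    pause = ET.SubElement(speak, "break", time="500ms")
    pause.tail = " This part is "

    # <emphasis> asks the voice to stress the enclosed words.
    emphasis = ET.SubElement(speak, "emphasis", level="strong")
    emphasis.text = "important"
    emphasis.tail = "."

    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml()
print(ssml)
```

Building the markup with an XML library rather than by hand keeps the tags properly closed, which matters because malformed SSML is typically rejected or read aloud literally.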

Are you ready to create a voice over video?

Whatever voice over option you choose to go with for your video, we hope this post gave you some insights, ideas, and inspiration.

And if you do decide to go with an AI text-to-speech voice over, check out Synthesia STUDIO.

Frequently Asked Questions