How to Make Text-to-Speech Videos in 8 Minutes

Karina Kasparova
September 8, 2022

Easily scale your video production in 60+ languages.

Text-to-speech is a brilliant solution when you need a voiceover for your video but don't have the time, equipment, or confidence to record it yourself.

Making text-to-speech videos can be a bit of a hassle - you have to create an audio file, then import it into a video editor and piece everything together to make a cohesive video.

It's not rocket science, but it's definitely not something a complete beginner can make in an hour.

What if you could convert text not only into speech but also into video with an (almost) human presenter, using only one tool? No cameras, microphones, or editing skills required.

Well, you can.

In this blog post, you will learn how to easily create a professional-looking video with a text-to-speech voiceover, all in one browser window.

For all you visual learners, we have a video tutorial:

[Embedded video tutorial]

What are the benefits of using text-to-speech for videos?

Naturally, nothing beats a voiceover made by a real human.

But what if you need to translate your video into different languages? What if you don't like the sound of your own voice? What if you're working with a limited budget?

Let's discuss how text-to-speech can solve all of the above problems.

Benefit #1: No need to record separate audio files

Have you ever recorded your own voice and couldn't handle the cringe when listening to it? We definitely have. 😬

Also, recording audio for a voiceover requires decent equipment (a microphone and editing software), which can cost quite a bit.

And let's be realistic, a voiceover recorded on your iPhone simply doesn't sound that great. 🙉

That's where text-to-speech software comes in handy: you don't need any equipment whatsoever, and you can avoid the oh-so-dreaded cringe.

Sounds like a win-win to us.

Benefit #2: Large variety of text-to-speech voices

A common fear is that text-to-speech voices sound robotic. 🤖

And that might have been the case 5 years ago, but in 2022 text-to-speech technology has gotten pretty damn good, and AI voices don't sound as robotic as you think.

The added benefit to text-to-speech sounding (almost) human is that you can choose from a large variety of accents, dialects, and other voice variations. You can make your voiceover narration sound professional, easy-going, calm, or lively, all at the click of a button.

Besides, if you aren't happy with the way it sounds, you can always adjust pronunciation using Speech Synthesis Markup Language (SSML for short).
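Curious what SSML actually looks like? Here is a minimal Python sketch that assembles a small snippet using two tags from the W3C SSML standard: `<sub>`, which swaps in a spoken alias for a written word, and `<break>`, which adds a pause. The helper names (`substitute`, `pause`, `build_ssml`) are made up for this illustration, and which tags a given text-to-speech tool accepts is an assumption you should check against that tool's documentation.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def substitute(written: str, spoken: str) -> str:
    """Wrap `written` in an SSML <sub> tag so it is read aloud as `spoken`."""
    return f"<sub alias={quoteattr(spoken)}>{escape(written)}</sub>"


def pause(milliseconds: int) -> str:
    """Return an SSML <break> tag that inserts a pause of the given length."""
    return f'<break time="{milliseconds}ms"/>'


def build_ssml(*parts: str) -> str:
    """Join already-escaped fragments inside a single <speak> root element."""
    return "<speak>" + "".join(parts) + "</speak>"


ssml = build_ssml(
    escape("Welcome to the "),
    substitute("SSML", "speech synthesis markup language"),
    escape(" demo."),
    pause(500),
    escape("Let's get started."),
)

# Parsing the result confirms the markup is well-formed XML
# before you paste it into any text-to-speech tool.
root = ET.fromstring(ssml)
print(ssml)
```

Building the markup from small helpers keeps special characters escaped, so an ampersand in your script won't break the snippet.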

Benefit #3: Quick and cheap localization and translation

If you have any experience with traditional video production, you know that translating/localizing a video into multiple languages is a hassle.

Unless you speak all the languages you want to translate your video into, hiring a translator and voiceover actor will be costly. 💸

Oh, and if you need to re-edit or re-film the video to localize it... Get the cash ready. And be prepared to wait a few weeks for the end result.

With text-to-speech, all you need is your translated text to generate audio in another language.

And if you're using a text-to-video maker, you can create voiceovers and videos using only text.

But how??

Well, let us show you.

How to make text-to-speech videos in Synthesia STUDIO

Here's how you can transform text into speech and make engaging videos using only one tool - Synthesia STUDIO.

Step #1: Create a video script

First, make sure you have your video text ready.

Whether you're transforming an existing article into a video, or you're creating video content from scratch, you need to have all the information condensed into a video script.

Pro tip💡

Use no more than 3-4 sentences per video slide to keep the video short and engaging.
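If your script is long, you can chunk it before pasting it in. The sketch below is one rough way to do that in Python: it splits text into sentences with a deliberately naive rule (a period, exclamation mark, or question mark followed by whitespace) and groups them into slides of at most four sentences. The function name `split_into_slides` and the sample script are hypothetical, and the rule will mis-split abbreviations such as "e.g.", so treat the output as a first draft.

```python
import re
from typing import List


def split_into_slides(script: str, max_sentences: int = 4) -> List[str]:
    """Group a script into slide-sized chunks of at most `max_sentences` sentences."""
    # Naive sentence split: break after ., ! or ? when followed by whitespace.
    pieces = re.split(r"(?<=[.!?])\s+", script.strip())
    sentences = [piece.strip() for piece in pieces if piece.strip()]
    # Walk through the sentences in steps of `max_sentences`.
    return [
        " ".join(sentences[start:start + max_sentences])
        for start in range(0, len(sentences), max_sentences)
    ]


example_script = (
    "Welcome to our product tour. Today we will cover three features. "
    "First, we look at templates. Then we add a voiceover. "
    "Finally, we export the video. Thanks for watching!"
)

for number, slide_text in enumerate(split_into_slides(example_script), start=1):
    print(f"Slide {number}: {slide_text}")
```

Lower `max_sentences` to 3 if your sentences run long.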

Step #2: Choose a template

The easiest way to get started with video creation is by using video templates.

You can of course start from scratch, but if you have no video editing or design experience, templates give your video a solid structure and visual language.

For example, Synthesia STUDIO has over 50 templates for various needs: explainer videos, how-to videos, training videos, marketing videos, and more.

To get started with a template in Synthesia STUDIO, click on 'Templates' on the left-hand side, choose a template and click on 'Create video'.

Step #3: Paste your text and choose a voice

This part is pretty straightforward.

Copy your text and paste it into the script box slide by slide.

You will notice that the video editor automatically detects the language of the text for the voiceover.

Feel free to click on the language selector, and choose the accent, dialect, and mood of the voice.

Just make sure that the language selected in the video editor matches the language of your text. Otherwise, we can't guarantee you will like the results. 😅

Step #4: Visualize your text

The voiceover audio part is now done, but narrated videos would be pretty boring without any visuals to accompany the voiceover.

Don't know how to edit videos? No biggie.

You can create stunning visuals for your videos in Synthesia without any special skills or knowledge.

There are 4 types of visuals you can add to make your text-to-speech videos engaging.

Option 1: AI presenter

Remember that audio file our text-to-speech engine generated in step #3?

Well, you can add a human-like AI presenter to your video that will narrate your speech.

Basically, you can make a talking head video with no real humans or cameras.

Here's how to add an AI presenter in just a few clicks:

Click on 'Avatar' on the right-hand side of the video maker, and choose the one you like best.

You can change its size and position, and choose between a Full-body or Circle view, or just go with the Voice only option.

Option 2: Text on screen

If you really want to emphasize a point, reinforce the voiceover with text on screen.

Add text to your video by clicking on 'Text' on the right side of the video canvas, and choose between adding a Title, Subtitle, or Body Text.

Option 3: Stock footage

Some ideas just need something extra to help bring them to life.

You can use stock videos and images in Synthesia to illustrate the information.

Or upload your own footage, if you have it.

To add images and videos in STUDIO, go to 'Images' or 'Background' and browse the selection, or upload your own images or video clips.

Option 4: Screen recordings

If you need to demonstrate a process on screen for a how-to video or show off your software's specs for an explainer video, screen recordings are essential.

To create a screen recording in STUDIO, go to 'Background' -> 'Uploads' -> 'Record screen'.

When you're done recording, you can crop, trim or loop it by clicking on 'Edit background'.

Watch our video tutorial for more details:

How to Create a Screen Recording Video

Step #5: Download the video

Woohoo! 🎉 Your text-to-speech video is almost ready!

All you have to do now is click on 'Generate video', add captions if needed and let the tool do its magic. 🪄

Once the video is generated, you can share it, download it or embed it.

Ready to create your first text-to-speech video?

If you want to create professional videos without breaking the bank and without spending hours editing video content, why not give Synthesia a go?

Try our text-to-speech video maker for free by creating your own demo video.

Frequently Asked Questions