
Long training videos that ramble lose learners fast.
If you're short on time, drowning in SME detail, or building for global teams, scripting can feel slow and messy.
In this guide I'll share my tips on script writing, pacing, and timing so you can write tight, engaging, multilingual-ready training videos that are backed by real instructional design principles.
Alternatively, if this all feels like too much effort, you could check out our AI script generator or try one of our free training video script templates.
Do these 2 things before you write a script
Don't pick up your pen and paper or laptop just yet. Every script writing process has to begin with pre-planning. These two tips are essential to making the actual writing process as smooth as possible.
1. Define the goals and learning objectives of your training video
Training videos are designed to help people learn new skills or improve existing ones. When creating a training video, it's important to define the goals and learning objectives you hope to achieve.
This will help you determine the content and style of the video, as well as the target audience you're hoping to reach.
For example, if your goal is to teach people how to use a new software program, you'll want to create a video that is clear and concise, with step-by-step instructions.
On the other hand, if your goal is to improve customer service skills, you might want to create a video that is more lighthearted and fun, while still providing useful information.
So before you start writing anything, answer these questions:
- Why do you want to create this video?
- Who is the target audience watching the video?
- What do you want your audience to take away from the video?
2. Choose the right video type
Now that you know what you want your training video to achieve, it's time to choose the right video type. Each format works best for different learning scenarios.
Talking head videos
Talking head videos are videos in which one or more people talk directly to the camera. In the context of training videos, the talking head would likely be a subject matter expert (SME), who acts as a presenter and narrator for the video.
Based on Clark & Mayer's personalization principle, humans love to learn from coaches and characters, so the talking head in the video acts like a coach of sorts, guiding the learner through the process in a friendly manner.
Best for: Building trust and connection, especially for soft skills training.
Visual tip: Keep backgrounds simple to avoid distraction. With Synthesia's expressive avatars and AI voices, you can create consistent presenter-led content across all your modules.
Screen recordings
Screen recordings are perfect for showing a step-by-step process on screen. Think software training, IT training, cybersecurity training, and the like.
Best for: Software tutorials and technical processes.
Visual tip: Crop or zoom to the action area to reduce visual noise; keep cursor movement steady. Capture exactly what matters with the AI screen recorder, then layer a concise voiceover for clarity.
If you're interested in a detailed breakdown of creating instructional videos using screen recordings, check out 'Eight Guidelines for the Design of Instructional Videos for Software Training' by Hans & Jan van der Meij.
Microlearning videos
Microlearning videos are becoming increasingly popular, as they offer a more concise and effective way of delivering information. These videos are typically less than 5 minutes in length, and they focus on one specific topic or skill.
Best for: Quick skill refreshers and just-in-time learning.
Visual tip: Start from a template to keep structure consistent across short modules. We have a whole blog post dedicated to microlearning videos if you're interested.
Scenario-based videos
These videos use role-plays for sales, customer service, and HR conversations. They're great for soft skills training where learners need to see best practices in action.
Best for: Demonstrating interpersonal skills and decision-making.
Visual tip: Use short scenes with one goal per scene and clear decision points. Build dialogue scenes with different avatars to model best-practice conversations.
Animated explainers
Use these for conceptual topics like compliance principles or company policies before diving into hands-on demonstrations.
Best for: Abstract concepts that need visual metaphors.
Visual tip: Keep animations simple and relevant to avoid cognitive overload.
Build your training video script outline
Before diving into the full script, I always create a quick outline. This prevents rambling and keeps your video focused. Here's my 5-point checklist:
- Hook: Why this matters in one sentence
- Promise: What they'll learn and time to complete
- Steps: 3-7 maximum (if you say "and" inside a step, split it)
- Recap: Quick summary of key points
- CTA: Practice task, next video, or quiz
This simple structure tackles the most common problem I see: scripts that try to cover too much at once.
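If you keep your outlines in a text file or spreadsheet, the step rules above are easy to check automatically. Here's a minimal sketch in Python. The function name, the sample steps, and the reading of "3-7" as both a lower and an upper bound are my own assumptions for illustration, not part of any tool.

```python
import re


def check_outline(steps, min_steps=3, max_steps=7):
    """Return a list of warnings for the steps in a draft outline."""
    warnings = []
    if not min_steps <= len(steps) <= max_steps:
        warnings.append(
            f"Expected {min_steps}-{max_steps} steps, found {len(steps)}."
        )
    for number, step in enumerate(steps, start=1):
        # A standalone "and" suggests two actions packed into one step.
        if re.search(r"\band\b", step, flags=re.IGNORECASE):
            warnings.append(f"Step {number} contains 'and'; consider splitting it.")
    return warnings


# Hypothetical draft outline for a report-export tutorial.
draft_steps = [
    "Open the Reports tab",
    "Select Monthly and click Export",
    "Save the file to the shared drive",
]

for warning in check_outline(draft_steps):
    print(warning)
```

The word-boundary match means words like "standard" or "Android" won't trigger a false warning, but a legitimate "and" (for example in a menu name) still will, so treat the output as a prompt to re-read the step rather than a hard rule.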
How to write and structure a training video script: 6-step framework
Here's my 6-step framework for writing a script for a training video.
1. Introduction (Hook + Promise)
Writing an introduction for a training video is fairly straightforward - make the viewer aware of what they're watching and what they'll learn in the video. But here's the key: start with a hook that grabs attention.
Example opening: "Phishing emails are getting smarter. In the next 3 minutes, you'll learn the 3 checks that keep you safe."
This Hook + Promise model works because it creates urgency and sets clear expectations.
2. Why it's important
To capture and keep the attention of the viewer, you have to let them know why learning how to do X is important. If the viewers don't see value in your video, why would they watch it?
Call out a real consequence - save time, avoid an error, meet compliance. Keep it concrete. For instance: "Knowing how to export reports correctly will save you 20 minutes every Monday morning."
3. Demonstration
This is where you either demonstrate or explain how to do the task. If applicable, separate the process into distinct steps - this will make it easier for the learners to comprehend.
In terms of pacing, I like to aim for 110-150 words per minute and 60-180 seconds per micro-topic. Adding a "Pause and try it" beat after key steps to give learners time to practice is often a good idea.
I also like to write out the title of each step on the screen to create a distinct visual separation between the steps. This will also make it easier for viewers to rewind the video to a concrete step if needed.
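To turn the pacing targets above into a word budget, multiply the length of the segment in minutes by the speaking rate. Here's a rough sketch in Python. The function names and the 130 words-per-minute midpoint are assumptions I picked for illustration.

```python
def word_budget(seconds, min_wpm=110, max_wpm=150):
    """Return the (low, high) word count for narration of the given length."""
    minutes = seconds / 60
    return round(minutes * min_wpm), round(minutes * max_wpm)


def estimated_seconds(script, wpm=130):
    """Estimate how long a script takes to narrate at a given pace."""
    # 130 wpm is an assumed midpoint of the 110-150 range.
    return len(script.split()) / wpm * 60


# Word budgets for a 60-, 90- and 180-second micro-topic.
for seconds in (60, 90, 180):
    low, high = word_budget(seconds)
    print(f"{seconds} s: {low}-{high} words")
```

A 60-second micro-topic comes out at 110-150 words and a 180-second one at 330-450 words. If a draft runs well over the upper number, that's usually a sign the topic should be split into two videos.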
4. Finished result
Show what the end result is supposed to look like. This will give the viewers a clear overview of the goal and what they're working towards. I like to show the end state briefly at the start (during the introduction), then again here to reinforce the goal. This preview technique helps learners understand the bigger picture.
5. Benefit + step recap
Take what you said in steps 2 and 3 and briefly mention them again:
- Remind them of the benefits of knowing how to do the task
- Do a quick recap of the steps (2-3 bullets on screen work well)
Consider using dynamic captions here for accessibility and to reinforce key points visually.
6. Call to action
A call to action (CTA) is an essential element of any training video. By clearly stating what you want your viewers to do after watching the video, you can help ensure they retain the information and put it into practice.
Match your CTA to your business goals:
- For skill practice: "Try this task yourself using the practice environment"
- For knowledge check: "Take the 2-minute quiz to test your understanding"
- For support: "Questions? Check our knowledge base or contact IT support"
Use custom CTAs and chapters to guide learners to their next action seamlessly.
Research-based tips for writing a training video script
There are a lot of unsupported claims and tips out there about how you should and shouldn't write video scripts. I went through the effort of finding tips based on research, so you don't have to.
Tip #1: Use conversational tone over formal tone
Remember the personalization principle we mentioned earlier? Well, this principle also suggests that it is better to use a conversational tone in e-learning.
Here's why:
- First, the familiarity of a conversational tone requires less cognitive effort to understand
- Second, a narration presented in the first or second person is more appealing to the user and helps them process instructions more actively
- Lastly, according to a research paper by Mayer, Fennell, Farmer, & Campbell, this conversational narration type greatly enhances learning and raises interest as compared to a formal style
Tip #2: Active voice over passive voice
This simple tip is a natural extension of the first one. If you're using a conversational tone, use active voice, like:
"In this video, I will show you how to..."
instead of using passive voice, like:
"In this video, you will be shown how to..."
Passive voice is most common in formal and academic writing. Since we've established that a formal tone doesn't suit a training video, opt for active voice instead.
Tip #3: Use signaling and segmenting
Add on-screen labels, arrows, and short chapters to guide attention. Split complex flows into discrete segments to prevent cognitive overload.
This aligns with Mayer's Signaling and Segmenting principles from multimedia learning research. You can use features like Emphasis Animations and Zoom/Pan to highlight key UI actions at the right moment.
See Mayer's multimedia learning principles summarized here.
Tip #4: Keep the video short
An analysis of large-scale MOOC data by Guo, Kim, & Rubin (HarvardX/edX) found that engagement drops sharply after about 6 minutes.
Whichever length guideline you choose to follow, keeping the sentences short in your video scripts and avoiding lengthy, fluffy descriptions will positively contribute to information comprehension among learners.
Tip #5: Preview the end result in the beginning
Before getting into the nitty-gritty, a preview of the goal can be beneficial for a number of reasons. A preview can:
- Orient the user and show the bigger picture
- Illustrate the meaning of a task
- Raise user awareness
- Serve as a framework for the learning that lies ahead
So try to include preview sections when writing video scripts.
Tip #6: Explain new concepts by showing their use in context
According to the just-in-time principle developed by van Merriënboer, Kirschner, & Kester, providing relevant information right when the user needs it to perform a task reduces the load on their memory.
So make sure to align the introduction of a new concept in the narration part of your script with what's being shown on the screen.
Just-in-time information allows users to complete tasks more efficiently because they aren't trying to process too much information at once.
Checklists
Accessibility and inclusion checklist
Making your training videos accessible isn't just the right thing to do - it improves learning for everyone. Here's my quick checklist:
- Use Dynamic Captions: Ensure high contrast and minimal on-screen text overlap
- Don't rely on color only: Name UI elements clearly ("Select 'Reports' then 'Monthly'")
- Avoid idioms: They confuse non-native speakers and don't translate well
- Describe key visuals briefly in voiceover: Help those who can't see the screen clearly
These simple adjustments make your content work for learners with different abilities and learning preferences.
Localization without the rework
If you're creating training for global teams, plan for translation from the start:
- Keep text concise: Translations often run 30% longer than English
- Avoid culture-specific references: Sports metaphors and local examples don't travel well
- Leave visual space: Text expansion can break your layouts
- Centralize terminology: Create a glossary and reuse consistent terms
- Use AI Voices: Quickly localize narration while keeping tone consistent across languages
With Synthesia's AI voices and expressive avatars, you can maintain presenter consistency across all language versions without re-recording.
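To see what the 30% expansion rule of thumb means for your layouts, you can check each piece of on-screen text against the space you've reserved for it. This is a minimal sketch in Python. The function names, the sample label, and the character limits are hypothetical, and real expansion varies by language.

```python
import math


def projected_length(text, expansion=0.30):
    """Estimate the character count of a translation that runs longer than English."""
    return math.ceil(len(text) * (1 + expansion))


def fits_after_expansion(text, max_chars, expansion=0.30):
    """Check whether on-screen text should still fit once translated."""
    return projected_length(text, expansion) <= max_chars


label = "Export monthly report"  # 21 characters in English
print(projected_length(label))
print(fits_after_expansion(label, max_chars=30))
print(fits_after_expansion(label, max_chars=25))
```

Here the 21-character label is projected to need 28 characters, so it fits a 30-character text box but not a 25-character one. Running a check like this on titles and callouts before translation is cheaper than fixing broken layouts afterwards.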
SME collaboration and review
Subject matter experts often provide way too much detail. Here's how I manage SME input effectively:
- Ask SMEs for the 3-5 must-know points. Park everything else in a resources link
- Pilot test with 3-5 learners. Mark any confusion points for revision
- Final check: Remove filler words, ensure action verbs, verify the UI hasn't changed
This process prevents scope creep and speeds up approvals significantly.
Measure and iterate
Don't just publish and forget. Track these metrics to improve your scripts over time:
- Primary outcome: Task success rate (can learners complete the task?)
- Engagement metric: Completion rate or quiz accuracy
- Quick feedback: Add a simple question: "Were you able to complete X?"
Use a custom CTA to drive learners to a quick knowledge check or feedback form. This data helps you refine future scripts based on what actually works.
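If you export viewing and feedback data to a spreadsheet, the metrics above are simple ratios. Here's a small sketch in Python. The record fields and the sample data are made up for illustration, so adapt them to whatever your LMS or feedback form actually exports.

```python
from dataclasses import dataclass


@dataclass
class LearnerResult:
    finished_video: bool  # watched the video to the end
    completed_task: bool  # answered "yes" to "Were you able to complete X?"


def rate(flags):
    """Return the share of True values, or 0.0 when there is no data."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


# Hypothetical results for four learners.
results = [
    LearnerResult(finished_video=True, completed_task=True),
    LearnerResult(finished_video=True, completed_task=False),
    LearnerResult(finished_video=False, completed_task=False),
    LearnerResult(finished_video=True, completed_task=True),
]

completion_rate = rate(r.finished_video for r in results)
task_success_rate = rate(r.completed_task for r in results)

print(f"Completion rate: {completion_rate:.0%}")
print(f"Task success rate: {task_success_rate:.0%}")
```

In this sample, completion is 75% but task success is only 50%. A gap like that suggests learners are watching the video but the demonstration itself needs work.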
Take your training script to the next level
Well, there you have it! Everything you need to know about writing a training video script that actually works. By following our 6-step framework and incorporating the research-based tips I've shared, your training videos will be clearer, more engaging, and easier to produce.
Now that you know everything about writing video scripts, you're ready to create training content that your learners will actually want to watch. No more endless, boring videos - just clear, engaging content that gets results.
Ready to turn your script into a video? Pick a template, choose an AI avatar, and generate your video with our script to video functionality.
About the author
Strategic Advisor
Kevin Alster
Kevin Alster heads up the learning team at Synthesia. He is focused on building Synthesia Academy and helping people figure out how to use generative AI videos in enterprise. His journey in the tech industry is driven by a decade-long experience in the education sector and various roles where he uses emerging technology to augment communication and creativity through video. He has been developing enterprise and branded learning solutions in organizations such as General Assembly, The School of The New York Times, and Sotheby's Institute of Art.

Frequently asked questions
What is a training script?
A training video script is a vital tool in the creation of an effective and engaging training video.
Just as a good screenplay is the foundation of a successful film, a well-written script is the key to making a great training video.
The script determines the overall structure of the video, including the sequence of scenes, the content of each scene, and the deliverables for each scene.
How should I start a training video script—what's an effective hook and promise?
An effective training video opening combines a compelling hook with a clear promise of what viewers will learn. Start with a hook that creates urgency or highlights a real consequence, such as "Did you know that one click on a phishing email could cost your company millions of dollars?" Then immediately follow with a promise that sets clear expectations: "In the next 3 minutes, you'll learn how to identify phishing emails and the correct steps to report them to IT." This Hook + Promise model works because it addresses a real pain point and tells viewers exactly what they'll gain.
Avoid generic introductions that simply state the topic. Instead, connect to your audience's immediate needs by calling out specific benefits like time saved, errors avoided, or compliance met. For example, "Knowing how to export reports correctly will save you 20 minutes every Monday morning" speaks directly to the viewer's daily experience. This approach ensures your training video captures attention from the first seconds and motivates learners to continue watching.
How do I structure a training video script to keep viewers engaged from start to finish?
A well-structured training video script follows a clear 6-step framework that maintains engagement throughout. Start with a hook and promise that grabs attention (like "Phishing emails are getting smarter. In the next 3 minutes, you'll learn the 3 checks that keep you safe"), explain why the topic matters with concrete benefits, demonstrate the process in clear steps, show the finished result, recap the key points, and end with a specific call to action. This structure works because it creates urgency, sets clear expectations, and guides learners through a logical progression.
Keep each section focused and concise by limiting scenes to 3-4 sentences and changing visuals every 10-20 seconds. Use the two-column script format with one column for visual instructions and another for voiceover to ensure your pacing stays tight. This structured approach helps you avoid the common pitfall of rambling content that loses viewers, instead creating training videos that learners actually want to complete.
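One way to keep a two-column script honest is to store each scene as a visual/voiceover pair and flag scenes whose voiceover runs past four sentences. The sketch below is in Python. The class, the function names, and the sample scenes are assumptions for illustration, and the sentence splitter is deliberately naive.

```python
import re
from dataclasses import dataclass


@dataclass
class Scene:
    visual: str     # left column: what appears on screen
    voiceover: str  # right column: what the narrator says


def sentence_count(text):
    """Count sentences by splitting on ., ! or ? followed by a space or the end."""
    parts = re.split(r"[.!?]+(?:\s+|$)", text)
    return len([part for part in parts if part.strip()])


def long_scenes(scenes, max_sentences=4):
    """Return the 1-based numbers of scenes with too many voiceover sentences."""
    return [
        number
        for number, scene in enumerate(scenes, start=1)
        if sentence_count(scene.voiceover) > max_sentences
    ]


# Hypothetical two-scene script.
script = [
    Scene(
        visual="Presenter on a plain background",
        voiceover="Phishing emails are getting smarter. In the next 3 minutes, "
        "you'll learn the 3 checks that keep you safe.",
    ),
    Scene(
        visual="Screen recording of the Reports page",
        voiceover="Open the app. Click Reports. Choose Monthly. Click Export. "
        "Pick a folder.",
    ),
]

print(long_scenes(script))
```

The second scene has five sentences, so it gets flagged as a candidate for splitting into two scenes. Abbreviations such as "e.g." will confuse the splitter, so use the result as a nudge, not a verdict.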
How long should a training video script be, and what pacing works best for learning?
Research shows that engagement drops sharply after 6 minutes, making shorter videos more effective for learning retention. Aim for 60 seconds to 3 minutes for micro-skills, and up to 6 minutes for conceptual explanations. When writing your script, target 110-150 words per minute for optimal comprehension, which translates to roughly 165-225 words for a 90-second video or 660-900 words for a 6-minute video. Break longer content into multiple videos rather than cramming everything into one lengthy session.
Structure your pacing by changing scenes every 10-20 seconds and limiting each scene to 3-4 sentences of voiceover. Include "pause and try it" moments after key steps to give learners time to practice. This pacing strategy prevents cognitive overload while maintaining engagement, ensuring your training content delivers maximum impact without losing viewer attention.
How do I incorporate visuals, captions, and on-screen instructions into my script?
Create a two-column script format with one column for visual elements and another for audio narration to effectively coordinate all multimedia elements. In your visual column, specify exactly what appears on screen: whether it's a talking head, screen recording with highlighted UI elements, text overlays, or B-roll footage. For on-screen instructions, use the 4-part micro-pattern: orient viewers to where they are on screen, state the action clearly (Click, Type, Select), show the result, and add any helpful tips. Write step titles directly on screen to create visual separation between sections.
Make your content accessible by including dynamic captions with high contrast, avoiding color-only indicators, and describing key visuals briefly in the voiceover. When adding text on screen, duplicate only the most important points from your narration, such as key definitions or step numbers. This multi-modal approach reinforces learning while accommodating different learning styles and accessibility needs, creating training videos that work effectively for all viewers.
Can Synthesia help me write, translate, and produce training video scripts at scale?
Synthesia streamlines the entire training video creation process from script to final video, making it easy to produce content at scale. The platform includes an AI script generator that helps you write effective training scripts, plus over 60 training video templates designed for different learning scenarios like microlearning, scenario-based training, and step-by-step tutorials. You can convert existing training materials like PDFs or PowerPoint slides directly into engaging videos, then use AI avatars and voices to bring your script to life without any filming.
For global teams, Synthesia excels at multilingual training production. Write your script once and quickly localize it into 140+ languages while maintaining consistent presenter quality through AI avatars and voices. The platform handles the complexities of translation, including text expansion and cultural adaptation, allowing you to create training videos that resonate with learners worldwide. This combination of script assistance, templates, and multilingual capabilities enables L&D teams to produce professional training content faster and more cost-effectively than traditional video production methods.












