
Creating videos is no easy task, but a good video script gives you a solid foundation. Before you start writing one, though, take time for some deep, creative thinking.
Yes, you are probably tempted to cut right to the chase, write the script, and film the video. But trust me, these 5 pre-writing steps are essential if you want your video to actually make sense.
Here's my step-by-step guide to writing a video script.
Step 1: Define your audience and goal
After helping thousands of creators develop effective video scripts, I've learned that the actual writing is just one piece of the puzzle. The real magic happens in the preparation. And it all starts with knowing exactly who you're talking to and what you want them to do.
Here's how I recommend approaching audience and goal definition. For your audience, ask yourself:
- What's their current knowledge level on this topic?
- What specific problem are they trying to solve?
- Where and how will they be watching—mobile during their commute or desktop at work?
- What objections or concerns might they have?
For your goals, choose ONE primary outcome: educate, persuade, or instruct. Define your success metrics (completion rate, engagement, conversions) and determine your call-to-action before you start writing.
I've seen too many scripts fail because creators tried to accomplish multiple goals in one video. Focus on one clear outcome, and your script will be infinitely more effective.
Step 2: Choose the right video type
Different video types call for different approaches to script writing. The format you choose should align with both your goal and your audience's expectations. Let me walk you through the most common types and when to use each.
Training videos
Training videos follow a different script format than marketing or explainer content. The goal isn't to sell, but to educate and equip learners with knowledge. Your script should break down complex processes into digestible steps, use clear transitions between concepts, and include checkpoints for understanding.
In my experience working with training teams, the most effective training scripts focus on one skill or concept at a time. They also incorporate real-world scenarios that learners can relate to, making the content stick better.
Explainer videos
An explainer video demonstrates the value of your product or service to your target audience. Its script needs to be short and catchy enough to hold attention without sounding salesy. Start with the problem your audience faces, introduce your solution as the logical answer, and show exactly how it works in practice.
Marketing videos
A marketing video script needs to strike the right balance between showcasing value and not being too pushy. These can include promo videos, product demos, case studies, or testimonials. The key is leading with benefits rather than features, and always answering the viewer's question: "What's in it for me?"
How-to videos
How-to videos are essential for customer support and onboarding. A well-scripted how-to video demonstrates a solution concisely, without overwhelming detail. These work particularly well as knowledge base videos or customer onboarding content.
Step 3: Pick your visual delivery method
The visuals you choose determine how you write your script. A talking head video needs first-person narration ("I'll show you..."), while screen recordings work better with second-person instructions ("You'll click on..."). Let's break down each approach.
Talking head or AI avatar
When using a talking head or AI avatar, write in first person to create a personal connection. Keep sentences between 8-16 words for natural delivery, and include brief pause markers between complex ideas. Add phonetic spelling for brand names or technical terms to ensure proper pronunciation.
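The sentence-length guideline above is easy to check automatically. Here is a minimal sketch, assuming the script is plain text and that sentences end with a period, exclamation mark, or question mark. The function names are hypothetical, and the splitting rule is a rough heuristic that will misjudge abbreviations such as "e.g.".

```python
import re

MIN_WORDS, MAX_WORDS = 8, 16  # range suggested in the text above


def split_sentences(script: str) -> list[str]:
    """Split on ., ! or ? followed by whitespace; a rough heuristic only."""
    parts = re.split(r"(?<=[.!?])\s+", script.strip())
    return [p for p in parts if p]


def flag_sentences(script: str) -> list[tuple[int, str]]:
    """Return (word_count, sentence) for sentences outside the target range."""
    flagged = []
    for sentence in split_sentences(script):
        count = len(sentence.split())
        if not MIN_WORDS <= count <= MAX_WORDS:
            flagged.append((count, sentence))
    return flagged
```

Treat the output as a list of sentences to review, not errors to fix. A two-word greeting will be flagged as too short, and that may be exactly what you want in an opening line.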
Screen recordings

For screen recordings, use second-person language and write step-by-step instructions that match on-screen actions exactly. Include timing cues for when to start and stop recordings, and plan for cursor movements and highlights. Your script should guide viewers through each click and action clearly.
B-roll or stock footage

When using b-roll or stock footage, your script needs to complement the visuals without being too literal. Use the footage to reinforce concepts or provide context while your narration carries the main message. This approach works particularly well for abstract concepts or emotional storytelling.
There are three main ways to source b-roll: generate it with AI, use your own footage, or use stock footage.
Text on screen
To make sure a key message sticks, you can reinforce the narration with matching text on screen.
This is particularly useful for quotes, testimonials, definitions, and headlines at the very beginning of the video.

Animation
Animation scripts focus on storytelling elements. Write descriptive visual cues, plan smooth transitions between scenes, and consider metaphors that can be visualized. The script should paint a picture that your animator (or animation tool) can bring to life.
Step 4: Write a brief (your north star)
Before diving into the full script, I always create a brief—a five-sentence summary of the video's core message. This becomes your north star, keeping you focused when you're deep in the writing process. Think of it as your elevator pitch for the video.
Your brief should answer:
- What problem are we solving?
- Why should viewers care?
- What's our main solution or message?
- What proof or examples will we share?
- What action do we want viewers to take?
Having these answers upfront prevents scope creep and keeps your script focused.
Step 5: Create a beat sheet before scripting
Here's where most people go wrong—they jump straight into writing dialogue without mapping out the video's structure. Instead, create what I call a beat sheet, outlining key moments in your video.
This approach, validated by research from Columbia University's Center for Teaching and Learning, helps structure content for maximum retention.
Step 6: Write your script using the two-column format
Now it's time to write your script using a two-column format that separates visuals from audio. This forces you to think about what viewers see AND hear simultaneously. Research from UMass Amherst shows that scripts written with visual-audio synchronization in mind achieve 20% better viewer retention.

Left Column: Visuals & On-Screen Text
Include scene descriptions, on-screen text (keep each line to 6-9 words), and visual cues like screen recordings, graphics, or animations. Be specific about what appears when.
Right Column: Voiceover/Dialogue
Write your spoken narration, include pronunciation notes for technical terms, and add pause markers for emphasis. Remember to keep each scene to one main idea—if you're introducing a new concept, start a new scene.
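If you keep your script in a spreadsheet or a text file, the two-column layout can be modeled as one record per scene. The sketch below is one possible way to do that, not a format any tool requires. The `Scene` class, its field names, and both helper functions are hypothetical, and on-screen text is assumed to be a single line per scene.

```python
from dataclasses import dataclass

MAX_ONSCREEN_WORDS = 9  # upper end of the 6-9 word guideline above


@dataclass
class Scene:
    visual: str               # left column: what viewers see
    voiceover: str            # right column: what viewers hear
    on_screen_text: str = ""  # optional single line of text shown on screen


def render_two_column(scenes: list[Scene], width: int = 36) -> str:
    """Render scenes as a plain-text table, one row per scene."""
    header = f"{'VISUALS':<{width}} | VOICEOVER"
    rows = [header, "-" * len(header)]
    for scene in scenes:
        left = scene.visual
        if scene.on_screen_text:
            left += f" [TEXT: {scene.on_screen_text}]"
        rows.append(f"{left:<{width}} | {scene.voiceover}")
    return "\n".join(rows)


def overlong_text(scenes: list[Scene]) -> list[int]:
    """Return 1-based scene numbers whose on-screen text exceeds the limit."""
    return [
        i for i, s in enumerate(scenes, start=1)
        if len(s.on_screen_text.split()) > MAX_ONSCREEN_WORDS
    ]
```

Calling `print(render_two_column(scenes))` gives you a table you can paste into a document, and `overlong_text(scenes)` tells you which scenes need their on-screen text trimmed.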
If you're using Synthesia's AI Video Assistant, you can paste your script directly into the platform and get suggestions for improving clarity and engagement. The tool helps identify areas where your script might be too complex or where pacing needs adjustment.
Step 7: Write, test, and refine
Your first draft will be rough, and that's perfectly fine. The magic happens in the revision process. After writing your initial script, step away for a few hours or overnight. Come back with fresh eyes and read it aloud—this is non-negotiable.
Reading aloud reveals awkward phrasing, tongue-twisters, and unnatural transitions that look fine on paper. Mark any spots where you stumble or run out of breath. These are signals to simplify or break up sentences. Aim for 130-150 spoken words per minute, which Columbia University research shows is optimal for comprehension.
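Before you time yourself with a stopwatch, you can get a rough runtime estimate from the word count alone. This is a minimal sketch that assumes the 130-150 words-per-minute range mentioned above and ignores pauses, so a real read-through will usually run a little longer. The function names are hypothetical.

```python
def estimated_seconds(script: str, words_per_minute: int) -> float:
    """Estimate spoken duration in seconds from a plain word count."""
    word_count = len(script.split())
    return word_count / words_per_minute * 60


def duration_range(script: str, slow_wpm: int = 130, fast_wpm: int = 150) -> tuple[float, float]:
    """Return (shortest, longest) estimated duration in seconds."""
    return (estimated_seconds(script, fast_wpm), estimated_seconds(script, slow_wpm))
```

For a 300-word script, `duration_range` returns roughly 120 to 138 seconds, so if your target is two minutes you are already at the upper limit.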
Next, test your script with a colleague who represents your target audience. Watch their face as they read or listen—where do they look confused? Where do they engage? Their feedback is gold. I've found that scripts improve dramatically after just one round of peer review.
Make your beginning memorable
The first 5 seconds determine whether viewers stay or leave. McKinsey's AI research found that "rise-fall" narrative arcs, where a story builds, dips, and resolves, boost engagement by 22% over linear stories.
Here are three proven hooks:
- Ask a rhetorical or thought-provoking question that addresses your viewer's pain point directly.
- Show the end result at the start. People want to know where you're taking them.
- Introduce a relatable scenario that mirrors your audience's experience.
Whatever approach you choose, make it specific and relevant to your viewer's world.
Script quality checklist: 12 points to review before production
Before you move to production, run through this checklist I've developed over years of script reviews. Each point addresses a common failure point in video scripts.
Free script template you can use today
Here's a simple template that works for most business videos. I've used this structure for everything from training videos to product demos, and it consistently delivers results.
Scene 1: Hook (5-10 seconds)
Visual: Problem visualization
Script: "Creating [specific task] shouldn't take [current time]. Here's how to do it in [shorter time]."
Scene 2: Solution Overview (15-20 seconds)
Visual: Presenter or product demo
Script: "The key is [main solution]. Here's exactly how it works."
Scene 3: Proof (30-45 seconds)
Visual: Step-by-step demonstration
Script: "Watch this: [demonstrate key steps with specific outcomes]."
Scene 4: Call-to-Action (5-10 seconds)
Visual: End card with clear next step
Script: "Ready to try this? [Specific action] at [specific location]."
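To see how much narration each scene in the template can hold, you can convert the scene durations into word budgets. The sketch below copies the durations from the template above. The 140 words-per-minute pace is my assumption (the midpoint of the 130-150 range discussed earlier), and the function names are hypothetical.

```python
# Scene names and (min, max) durations in seconds, copied from the template above.
TEMPLATE = [
    ("Hook", 5, 10),
    ("Solution Overview", 15, 20),
    ("Proof", 30, 45),
    ("Call-to-Action", 5, 10),
]


def total_runtime(template) -> tuple[int, int]:
    """Return (min_seconds, max_seconds) summed over all scenes."""
    return (sum(lo for _, lo, _ in template), sum(hi for _, _, hi in template))


def scene_word_budgets(template, words_per_minute: int = 140) -> dict[str, tuple[int, int]]:
    """Map each scene name to its (min_words, max_words) narration budget."""
    return {
        name: (round(lo * words_per_minute / 60), round(hi * words_per_minute / 60))
        for name, lo, hi in template
    }
```

With these numbers the whole template runs between 55 and 85 seconds, and the hook gets roughly 12 to 23 words, which is about one or two sentences.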
Take your video content to the next level
A successful script makes all the difference in your final video. With the framework I've shared, you're equipped to write scripts that engage viewers, deliver your message clearly, and drive action. Remember, great scripts aren't written—they're rewritten, tested, and refined.
Ready to turn your script into a professional video? Synthesia's script-to-video tool can transform your written content into engaging videos with AI avatars in minutes, not days. The platform's Live Collaboration features let team members review and edit scripts together, eliminating those endless email chains that slow down production.
Start with one video, apply these principles, and watch your content transform from good to exceptional. Your audience is waiting for clear, engaging video content—now you know exactly how to deliver it.
About the author
Strategic Advisor
Kevin Alster
Kevin Alster heads up the learning team at Synthesia. He is focused on building Synthesia Academy and helping people figure out how to use generative AI videos in enterprise. His journey in the tech industry is driven by a decade-long experience in the education sector and various roles where he uses emerging technology to augment communication and creativity through video. He has been developing enterprise and branded learning solutions in organizations such as General Assembly, The School of The New York Times, and Sotheby's Institute of Art.

Frequently asked questions
How do I structure a video script to keep viewers engaged from the hook to the call-to-action?
A compelling video script follows a clear structure that guides viewers through your message naturally. Start with a hook in the first 5 seconds that addresses your audience's specific pain point or shows the end result they want to achieve. Then move into context (5-15 seconds) explaining why this matters right now, followed by your solution overview (15-30 seconds) where you present your main message clearly. Include 2-3 proof points or examples (30-90 seconds) that demonstrate your solution in action, and finish with one specific, actionable call-to-action in the final 10 seconds.
This structure works because it mirrors how people naturally process information, moving from problem recognition to solution discovery. Each section should flow seamlessly into the next using transitional phrases like "Here's how" or "Watch this" to maintain momentum. By keeping each scene focused on one main idea and limiting narration to 3-4 sentences per scene, you create natural variety that holds attention throughout your video.
What's the best way to format a video script so visuals and narration stay in sync?
The two-column format is the industry standard for keeping visuals and narration perfectly synchronized. Create a simple table with your left column dedicated to visual elements (scene descriptions, on-screen text, graphics, and visual cues) and your right column for audio elements (voiceover narration, pronunciation guides, and pause markers). This format forces you to think about what viewers see and hear simultaneously, ensuring your message lands effectively.
When filling in your two-column script, be specific about timing and transitions. For example, if you're describing a product feature in the narration, note exactly when that feature should appear on screen in the visual column. Include details like "Screen recording starts" or "Cut to talking head" so anyone reading your script understands the exact visual-audio relationship. This approach eliminates confusion during production and helps you spot mismatches between what's being said and shown before you start creating your video.
How long is a 2-minute video script in words?
A 2-minute video script typically contains 260-300 words, based on the optimal speaking pace of 130-150 words per minute for video content. This pace allows viewers to comfortably process information without feeling rushed or bored. Speaking too fast can overwhelm your audience, while speaking too slowly risks losing their attention, so staying within this range ensures maximum comprehension and engagement.
To check if your script fits the 2-minute target, read it aloud using a timer and mark any spots where you naturally pause or stumble. These pauses add to your total time, so factor them into your word count. If you're including on-screen text or visual demonstrations that require extra processing time, aim for the lower end of the range (around 260 words) to give viewers time to absorb both visual and audio information effectively.
How should I adapt my script for different video types like training, explainers, and how-to videos?
Each video type requires a distinct scripting approach that aligns with its specific purpose and audience expectations. Training videos should break down complex processes into digestible steps, use clear transitions between concepts, and include checkpoints for understanding. Focus on one skill or concept at a time and incorporate real-world scenarios that learners can relate to. Explainer videos need to start with the problem your audience faces, introduce your solution as the logical answer, and demonstrate exactly how it works in practice while maintaining a balance between informative and engaging content.
How-to videos demand concise, step-by-step instructions that match on-screen actions exactly. Use second-person language ("You'll click on...") and include timing cues for when to start and stop screen recordings. For all video types, match your tone to your audience's expectations: professional for corporate training, conversational for explainers, and instructional for how-to content. The key is understanding that each format serves a different stage in your viewer's journey, from awareness (explainers) to consideration (how-to) to decision (training).
How can Synthesia help me draft, refine, and turn my script into a finished video?
Synthesia streamlines the entire video creation process from initial script to polished video. Start by using the AI script generator, which only requires selecting a video template and describing your topic and audience. You can add optional details like language, context, objective, persona, and tone for more targeted results. The platform generates a complete script already divided into scenes, eliminating the guesswork of structuring your content effectively.
Once your script is generated, Synthesia's platform lets you refine it directly within the video editor, where you can see how your words will appear with AI avatars and visuals. The Live Collaboration features enable team members to review and edit scripts together in real-time, eliminating endless email chains. When you're satisfied with your script, simply select your AI avatar, customize visuals, and generate your video. This integrated approach means you can go from script idea to finished video in minutes rather than days, with the flexibility to make changes at any stage without expensive reshoots.