How to Write a Video Script (+ Free Template)

Written by
Kevin Alster
October 13, 2025


Creating videos is no easy task, but a good video script gives you a solid foundation. Before you start writing one, though, take time for some deep, creative thinking.

Yes, you are probably tempted to cut right to the chase, write the script, and film the video. But trust me, the five pre-writing steps below are essential if you want your video to actually make sense.

Here's my step-by-step guide to writing a video script.

📝 Quick start: video script essentials
  • Clearly define your target audience and primary goal—this shapes your tone, content, and call-to-action.
  • Choose the right video type and visual delivery method to align with your message and audience.
  • Create a beat sheet to outline key moments and ensure proper pacing before writing dialogue.
  • Use a two-column format to separate visuals from audio for a synchronized viewer experience.
  • Write in a conversational style, keep scenes concise, and aim for 130-150 words per minute.
  • Read your script aloud, test with colleagues, and refine iteratively.
  • Match your writing style to your delivery method for greater impact.

Step 1: Define your audience and goal

After helping thousands of creators develop effective video scripts, I've learned that the actual writing is just one piece of the puzzle. The real magic happens in the preparation. And it all starts with knowing exactly who you're talking to and what you want them to do.

Here's how I recommend approaching audience and goal definition. For your audience, ask yourself:

  • What's their current knowledge level on this topic?
  • What specific problem are they trying to solve?
  • Where and how will they be watching—mobile during their commute or desktop at work?
  • What objections or concerns might they have?

For your goals, choose ONE primary outcome: educate, persuade, or instruct. Define your success metrics (completion rate, engagement, conversions) and determine your call-to-action before you start writing.

I've seen too many scripts fail because creators tried to accomplish multiple goals in one video. Focus on one clear outcome, and your script will be infinitely more effective.

Step 2: Choose the right video type

Different video types call for different approaches to script writing. The format you choose should align with both your goal and your audience's expectations. Let me walk you through the most common types and when to use each.

Training videos

Training videos follow a different script format than marketing or explainer content. The goal isn't to sell, but to educate and equip learners with knowledge. Your script should break down complex processes into digestible steps, use clear transitions between concepts, and include checkpoints for understanding.

In my experience working with training teams, the most effective training scripts focus on one skill or concept at a time. They also incorporate real-world scenarios that learners can relate to, making the content stick better.

Explainer videos

An explainer video demonstrates the value of your product or service to your target audience. Its script needs to be short and catchy enough to hold attention without sounding salesy. Start with the problem your audience faces, introduce your solution as the logical answer, and show exactly how it works in practice.

Marketing videos

A marketing video script needs to strike the right balance between showcasing value and not being too pushy. These can include promo videos, product demos, case studies, or testimonials. The key is leading with benefits rather than features, and always answering the viewer's question: "What's in it for me?"

How-to videos

How-to videos are essential for customer support and onboarding. A well-scripted how-to video demonstrates a solution concisely, without overwhelming detail. These work particularly well as knowledge base videos or customer onboarding content.

⚠️ Avoid these common script mistakes
  • Writing for the page instead of the ear: Scripts should sound natural when spoken. Write how you speak, use contractions, and don't fear starting sentences with "But" or "So."
  • Cramming too much into one scene: Stick to one main idea per scene to avoid overwhelming your viewers.
  • Ignoring visual-audio sync: Make sure your narration matches what's on the screen for clarity.
  • Using jargon without context: Explain technical terms simply and use the "explain it to a friend" test to keep your script accessible.

Step 3: Pick your visual delivery method

The visuals you choose determine how you write your script. A talking head video needs first-person narration ("I'll show you..."), while screen recordings work better with second-person instructions ("You'll click on..."). Let's break down each approach.

Talking head or AI avatar


When using a talking head or AI avatar, write in first person to create a personal connection. Keep sentences between 8 and 16 words for natural delivery, and include brief pause markers between complex ideas. Add phonetic spelling for brand names or technical terms to ensure proper pronunciation.
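
If you want a quick way to check the sentence-length guideline, a rough sketch like the one below can flag sentences that fall outside the 8 to 16 word range. The function names and the naive sentence splitter are assumptions made for illustration; abbreviations such as "e.g." will confuse the splitter, so treat the output as a prompt for a read-aloud check rather than a verdict.

```python
import re

MIN_WORDS = 8   # lower bound from the guideline above
MAX_WORDS = 16  # upper bound from the guideline above


def split_sentences(script: str) -> list[str]:
    """Naively split on '.', '!' or '?' followed by whitespace (an assumption)."""
    parts = re.split(r"(?<=[.!?])\s+", script.strip())
    return [part for part in parts if part]


def flag_sentences(script: str) -> list[tuple[int, str]]:
    """Return (word_count, sentence) for every sentence outside the range."""
    flagged = []
    for sentence in split_sentences(script):
        count = len(sentence.split())
        if count < MIN_WORDS or count > MAX_WORDS:
            flagged.append((count, sentence))
    return flagged


# Hypothetical sample narration, written in first person.
sample = (
    "I'll show you how to write a script that sounds natural on camera. "
    "It works."
)
for count, sentence in flag_sentences(sample):
    print(f"{count} words: {sentence}")
```

Short sentences are not always a problem (they can add emphasis), so a flagged line is something to review, not necessarily something to rewrite.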


Screen recordings


For screen recordings, use second-person language and write step-by-step instructions that match on-screen actions exactly. Include timing cues for when to start and stop recordings, and plan for cursor movements and highlights. Your script should guide viewers through each click and action clearly.

B-roll or stock footage


When using b-roll or stock footage, your script needs to complement the visuals without being too literal. Use the footage to reinforce concepts or provide context while your narration carries the main message. This approach works particularly well for abstract concepts or emotional storytelling.

There are three main ways to source b-roll: generate it with AI, use your own footage, or use stock footage.

Text on screen

Sometimes, to make sure the message really sticks, it helps to repeat the narration as text on the screen.

This is particularly useful for quotes, testimonials, definitions, and video headlines at the very beginning.


Animation

Animation scripts focus on storytelling elements. Write descriptive visual cues, plan smooth transitions between scenes, and consider metaphors that can be visualized. The script should paint a picture that your animator (or animation tool) can bring to life.

Step 4: Write a brief (your north star)

Before diving into the full script, I always create a brief—a five-sentence summary of the video's core message. This becomes your north star, keeping you focused when you're deep in the writing process. Think of it as your elevator pitch for the video.

Your brief should answer:

  • What problem are we solving?
  • Why should viewers care?
  • What's our main solution or message?
  • What proof or examples will we share?
  • What action do we want viewers to take?

Having these answers upfront prevents scope creep and keeps your script focused.

Step 5: Create a beat sheet before scripting

Here's where most people go wrong—they jump straight into writing dialogue without mapping out the video's structure. Instead, create what I call a beat sheet, outlining key moments in your video.

This approach, validated by research from Columbia University's Center for Teaching and Learning, helps structure content for maximum retention.

🔑 Beat sheet: your script blueprint
  • Hook (0-5 seconds): Grab attention with a compelling problem or outcome.
  • Context (5-15 seconds): Explain why viewers should care right now.
  • Solution overview (15-30 seconds): Present your main message clearly and concisely.
  • Proof points (30-90 seconds): Share 2-3 specific examples or demonstrations.
  • Call-to-action (final 10 seconds): Give one clear, actionable next step.

This structure mirrors how audiences process information, guiding them logically from problem to solution.
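
To turn the beat sheet into something you can plan against, here is a minimal sketch that stores each beat with its timing and works out a rough word budget at 130 to 150 words per minute. The `Beat` class and helper names are hypothetical, and the call-to-action is placed at 90 to 100 seconds on the assumption that the video ends right after the proof points.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beat:
    name: str
    start_s: int  # seconds from the start of the video
    end_s: int    # seconds from the start of the video

    @property
    def duration_s(self) -> int:
        return self.end_s - self.start_s


# Timings follow the beat sheet above. The final beat assumes a
# 100-second video, which is an illustration and not a rule.
BEATS = [
    Beat("Hook", 0, 5),
    Beat("Context", 5, 15),
    Beat("Solution overview", 15, 30),
    Beat("Proof points", 30, 90),
    Beat("Call-to-action", 90, 100),
]


def word_budget(beat: Beat, wpm_low: int = 130, wpm_high: int = 150) -> tuple[int, int]:
    """Whole words that fit in the beat at the slow and fast ends of the pace range."""
    return beat.duration_s * wpm_low // 60, beat.duration_s * wpm_high // 60


def has_gaps(beats: list[Beat]) -> bool:
    """True if any beat does not start exactly where the previous one ends."""
    return any(a.end_s != b.start_s for a, b in zip(beats, beats[1:]))


for beat in BEATS:
    low, high = word_budget(beat)
    print(f"{beat.name:<18} {beat.start_s:>3}-{beat.end_s:<3}s  {low}-{high} words")
```

The budgets make the constraint concrete: a 5-second hook leaves room for roughly a dozen words, so it has to be a single short sentence.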

Step 6: Write your script using the two-column format

Now it's time to write your script using a two-column format that separates visuals from audio. This forces you to think about what viewers see AND hear simultaneously. Research from UMass Amherst shows that scripts written with visual-audio synchronization in mind achieve 20% better viewer retention.

Two-column script template showing visual and audio elements

Left Column: Visuals & On-Screen Text

Include scene descriptions, on-screen text (keep to 6-9 words max per line), and visual cues like screen recordings, graphics, or animations. Be specific about what appears when.

Right Column: Voiceover/Dialogue

Write your spoken narration, include pronunciation notes for technical terms, and add pause markers for emphasis. Remember to keep each scene to one main idea—if you're introducing a new concept, start a new scene.
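
If you draft in plain text, a small helper can lay scenes out side by side. The sketch below is one possible approach, not a standard tool: the `Scene` class, the `render_two_column` function, and the sample scenes (including the "Share" button) are made up for illustration.

```python
import textwrap
from dataclasses import dataclass
from itertools import zip_longest


@dataclass
class Scene:
    visual: str  # left column: what viewers see
    audio: str   # right column: what viewers hear


def render_two_column(scenes: list[Scene], width: int = 36) -> str:
    """Render scenes as a plain-text table, wrapping each column to `width`."""
    header = f"{'VISUALS & ON-SCREEN TEXT':<{width}} | VOICEOVER"
    rows = [header, "-" * (2 * width + 3)]
    for scene in scenes:
        left = textwrap.wrap(scene.visual, width)
        right = textwrap.wrap(scene.audio, width)
        # Pair wrapped lines; the shorter column is padded with blanks.
        for left_line, right_line in zip_longest(left, right, fillvalue=""):
            rows.append(f"{left_line:<{width}} | {right_line}".rstrip())
        rows.append("")  # blank line between scenes
    return "\n".join(rows).rstrip()


script = [
    Scene("Cut to talking head", "I'll show you how to share a project in under a minute."),
    Scene("Screen recording starts", "You'll click on the Share button."),
]
print(render_two_column(script))
```

Keeping one `Scene` per main idea mirrors the rule above: if the right column needs a new concept, it gets a new row.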

If you're using Synthesia's AI Video Assistant, you can paste your script directly into the platform and get suggestions for improving clarity and engagement. The tool helps identify areas where your script might be too complex or where pacing needs adjustment.

Step 7: Write, test, and refine

Your first draft will be rough, and that's perfectly fine. The magic happens in the revision process. After writing your initial script, step away for a few hours or overnight. Come back with fresh eyes and read it aloud—this is non-negotiable.

Reading aloud reveals awkward phrasing, tongue-twisters, and unnatural transitions that look fine on paper. Mark any spots where you stumble or run out of breath. These are signals to simplify or break up sentences. Aim for 130-150 spoken words per minute, which Columbia University research shows is optimal for comprehension.
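
As a rough cross-check before you read aloud, you can estimate running time from the word count. This is a minimal sketch assuming the 130 to 150 words-per-minute range above; the function name is hypothetical, and the estimate ignores pauses, so a timed read-through will usually come out longer.

```python
def estimate_duration_seconds(
    script: str, wpm_slow: int = 130, wpm_fast: int = 150
) -> tuple[float, float]:
    """Return (shortest, longest) running time in seconds for the script."""
    words = len(script.split())
    shortest = words / wpm_fast * 60  # spoken at the fast end of the range
    longest = words / wpm_slow * 60   # spoken at the slow end of the range
    return shortest, longest


# Placeholder text standing in for a 300-word draft.
draft = "word " * 300
shortest, longest = estimate_duration_seconds(draft)
print(f"Roughly {shortest:.0f} to {longest:.0f} seconds")
```

A 300-word draft therefore lands at about two minutes at the fast end and a little over two minutes and fifteen seconds at the slow end, before pauses.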

Next, test your script with a colleague who represents your target audience. Watch their face as they read or listen—where do they look confused? Where do they engage? Their feedback is gold. I've found that scripts improve dramatically after just one round of peer review.

🛠️ Top 5 script-writing challenges (and solutions)
  1. "My script sounds robotic when read aloud": Use contractions, casual phrasing, and conversational language.
  2. "I can't fit everything important into a short video": Focus on one key takeaway or create a video series.
  3. "My technical content is too complex for general audiences": Use analogies, real-world examples, and define terms clearly.
  4. "I struggle with timing and pacing": Read aloud, use shorter sentences for emphasis, and mark natural pause points.
  5. "My videos don't get watched to the end": Hook viewers early and introduce pattern interrupts every 30-45 seconds.

Make your beginning memorable

The first 5 seconds determine whether viewers stay or leave. McKinsey's AI research found that "rise-fall" narrative arcs, where a story builds, dips, and resolves, boost engagement by 22% over linear stories.

Here are three proven hooks that work:

  1. Ask a rhetorical or thought-provoking question that addresses your viewer's pain point directly.
  2. Show the end result at the start. People want to know where you're taking them.
  3. Introduce a relatable scenario that mirrors your audience's experience.

Whatever approach you choose, make it specific and relevant to your viewer's world.

Script quality checklist: 12 points to review before production


Before you move to production, run through this checklist I've developed over years of script reviews. Each point addresses a common failure point in video scripts.

✅ Script quality checklist
  1. One clear goal and call-to-action defined
  2. Hook captures attention within first 5 seconds
  3. Conversational tone throughout (read-aloud test passed)
  4. Visual and audio elements mapped in two columns
  5. Technical terms include pronunciation guides
  6. Each scene focuses on one main idea
  7. On-screen text stays under 2 lines (6-9 words per line)
  8. Timing allows for natural pacing (130-150 words per minute)
  9. Proof points directly support main message
  10. Call-to-action is specific and actionable
  11. Script length matches intended video duration
  12. Peer review completed with target audience representative
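
Some of these points can be checked mechanically. As one example, the sketch below tests point 7 for a single on-screen caption. It assumes that "under 2 lines" means at most two lines and treats 9 words as the per-line ceiling; the function name and the sample caption are hypothetical.

```python
MAX_LINES = 2           # assumption: "under 2 lines" read as at most two lines
MAX_WORDS_PER_LINE = 9  # upper end of the 6-9 word guideline


def check_on_screen_text(text: str) -> list[str]:
    """Return a list of problems; an empty list means the caption passes."""
    problems = []
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) > MAX_LINES:
        problems.append(f"{len(lines)} lines (limit {MAX_LINES})")
    for number, line in enumerate(lines, start=1):
        count = len(line.split())
        if count > MAX_WORDS_PER_LINE:
            problems.append(
                f"line {number} has {count} words (limit {MAX_WORDS_PER_LINE})"
            )
    return problems


caption = "Write scripts faster\nStart with a beat sheet"
print(check_on_screen_text(caption) or "Caption passes")
```

Points such as tone, hook quality, and peer review still need a human pass; a script like this only catches the countable items.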

Free script template you can use today

Here's a simple template that works for most business videos. I've used this structure for everything from training videos to product demos, and it consistently delivers results.

Scene 1: Hook (5-10 seconds)
Visual: Problem visualization
Script: "Creating [specific task] shouldn't take [current time]. Here's how to do it in [shorter time]."

Scene 2: Solution Overview (15-20 seconds)
Visual: Presenter or product demo
Script: "The key is [main solution]. Here's exactly how it works."

Scene 3: Proof (30-45 seconds)
Visual: Step-by-step demonstration
Script: "Watch this: [demonstrate key steps with specific outcomes]."

Scene 4: Call-to-Action (5-10 seconds)
Visual: End card with clear next step
Script: "Ready to try this? [Specific action] at [specific location]."

Take your video content to the next level

A successful script makes all the difference in your final video. With the framework I've shared, you're equipped to write scripts that engage viewers, deliver your message clearly, and drive action. Remember, great scripts aren't written—they're rewritten, tested, and refined.

Ready to turn your script into a professional video? Synthesia's script-to-video tool can transform your written content into engaging videos with AI avatars in minutes, not days. The platform's Live Collaboration features let team members review and edit scripts together, eliminating those endless email chains that slow down production.

Start with one video, apply these principles, and watch your content transform from good to exceptional. Your audience is waiting for clear, engaging video content—now you know exactly how to deliver it.

About the author

Strategic Advisor

Kevin Alster

Kevin Alster heads up the learning team at Synthesia. He is focused on building Synthesia Academy and helping people figure out how to use generative AI video in the enterprise. His work in the tech industry builds on a decade of experience in the education sector and on various roles in which he uses emerging technology to augment communication and creativity through video. He has developed enterprise and branded learning solutions at organizations such as General Assembly, The School of The New York Times, and Sotheby's Institute of Art.



Frequently asked questions

How do I structure a video script to keep viewers engaged from the hook to the call-to-action?

A compelling video script follows a clear structure that guides viewers through your message naturally. Start with a hook in the first 5 seconds that addresses your audience's specific pain point or shows the end result they want to achieve. Then move into context (5-15 seconds) explaining why this matters right now, followed by your solution overview (15-30 seconds) where you present your main message clearly. Include 2-3 proof points or examples (30-90 seconds) that demonstrate your solution in action, and finish with one specific, actionable call-to-action in the final 10 seconds.

This structure works because it mirrors how people naturally process information, moving from problem recognition to solution discovery. Each section should flow seamlessly into the next using transitional phrases like "Here's how" or "Watch this" to maintain momentum. By keeping each scene focused on one main idea and limiting narration to 3-4 sentences per scene, you create natural variety that holds attention throughout your video.

What's the best way to format a video script so visuals and narration stay in sync?

The two-column format is the industry standard for keeping visuals and narration perfectly synchronized. Create a simple table with your left column dedicated to visual elements (scene descriptions, on-screen text, graphics, and visual cues) and your right column for audio elements (voiceover narration, pronunciation guides, and pause markers). This format forces you to think about what viewers see and hear simultaneously, ensuring your message lands effectively.

When filling in your two-column script, be specific about timing and transitions. For example, if you're describing a product feature in the narration, note exactly when that feature should appear on screen in the visual column. Include details like "Screen recording starts" or "Cut to talking head" so anyone reading your script understands the exact visual-audio relationship. This approach eliminates confusion during production and helps you spot mismatches between what's being said and shown before you start creating your video.

How long is a 2-minute video script in words?

A 2-minute video script typically contains 260-300 words, based on the optimal speaking pace of 130-150 words per minute for video content. This pace allows viewers to comfortably process information without feeling rushed or bored. Speaking too fast can overwhelm your audience, while speaking too slowly risks losing their attention, so staying within this range ensures maximum comprehension and engagement.

To check if your script fits the 2-minute target, read it aloud using a timer and mark any spots where you naturally pause or stumble. These pauses add to your total time, so factor them into your word count. If you're including on-screen text or visual demonstrations that require extra processing time, aim for the lower end of the range (around 260 words) to give viewers time to absorb both visual and audio information effectively.

How should I adapt my script for different video types like training, explainers, and how-to videos?

Each video type requires a distinct scripting approach that aligns with its specific purpose and audience expectations. Training videos should break down complex processes into digestible steps, use clear transitions between concepts, and include checkpoints for understanding. Focus on one skill or concept at a time and incorporate real-world scenarios that learners can relate to. Explainer videos need to start with the problem your audience faces, introduce your solution as the logical answer, and demonstrate exactly how it works in practice while maintaining a balance between informative and engaging content.

How-to videos demand concise, step-by-step instructions that match on-screen actions exactly. Use second-person language ("You'll click on...") and include timing cues for when to start and stop screen recordings. For all video types, match your tone to your audience's expectations: professional for corporate training, conversational for explainers, and instructional for how-to content. The key is understanding that each format serves a different stage in your viewer's journey, from awareness (explainers) to consideration (how-to) to decision (training).

How can Synthesia help me draft, refine, and turn my script into a finished video?

Synthesia streamlines the entire video creation process from initial script to polished video. Start by using the AI script generator, which only requires selecting a video template and describing your topic and audience. You can add optional details like language, context, objective, persona, and tone for more targeted results. The platform generates a complete script already divided into scenes, eliminating the guesswork of structuring your content effectively.

Once your script is generated, Synthesia's platform lets you refine it directly within the video editor, where you can see how your words will appear with AI avatars and visuals. The Live Collaboration features enable team members to review and edit scripts together in real-time, eliminating endless email chains. When you're satisfied with your script, simply select your AI avatar, customize visuals, and generate your video. This integrated approach means you can go from script idea to finished video in minutes rather than days, with the flexibility to make changes at any stage without expensive reshoots.