
I've helped thousands of users create their first Synthesia videos, and I've noticed the same pattern: most people jump straight into the platform without proper planning, then struggle with script timing, avatar selection, or achieving their desired outcome.
In this guide, I'll walk you through the complete process I recommend—from initial concept to final video—based on what actually works for different use cases.
Whether you're creating training content, marketing videos, or internal communications, the key to successful Synthesia videos isn't just knowing which buttons to click. It's understanding how to plan, structure, and optimize your content for maximum impact.
Recent research from University College London found that Synthesia-generated videos are as effective as traditional presenter-led videos for adult learning outcomes. Over 50,000 businesses, including more than half of the Fortune 100, have created millions of videos using this process. I've seen organizations reduce video production time by up to 75% using this workflow.
Step 0: Plan your video strategy
Before touching Synthesia, spend 10-15 minutes on strategic planning. This upfront investment saves hours of rework later.
Define your video's purpose and audience
Are you creating compliance training for new hires? A product demo for prospects? An executive update for global teams? Each requires a different approach.
Training videos need clear learning objectives and knowledge checks. Product demos combine avatar narration with screen recordings. Internal communications prioritize consistency and quick updates over complex visuals.
Set success metrics upfront
Decide what you'll measure before you start: completion rates (aim for 80%+ on training videos), drop-off points, and knowledge retention scores. For marketing videos, measure click-through rates and conversions. For training, compare assessment scores before and after viewing.
Determine optimal length
Based on customer data: 45-90 seconds for explainers, 2-4 minutes for tutorials, 5-7 minutes for detailed training. Shorter videos get higher completion rates, but ensure you cover essential information.
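You can sanity-check a draft against these ranges before opening the editor by estimating narration length from its word count. The Python helper below is a rough sketch, not a Synthesia feature. It assumes a speaking rate of about 150 words per minute, a common rule of thumb for narration. Actual AI voice speed varies by voice and language, so treat the result as an estimate.

```python
# Rough duration check for a draft script (illustrative helper, not a Synthesia feature).
# ASSUMPTION: ~150 spoken words per minute; real AI voice speed varies by
# voice, language, and punctuation, so the result is only an estimate.
WORDS_PER_MINUTE = 150

# Target lengths in seconds, taken from the guidance above.
TARGET_SECONDS = {
    "explainer": (45, 90),
    "tutorial": (120, 240),
    "training": (300, 420),
}


def estimate_seconds(script: str, wpm: int = WORDS_PER_MINUTE) -> float:
    """Estimate narration length in seconds from the word count."""
    words = len(script.split())
    return words / wpm * 60


def check_length(script: str, video_type: str) -> str:
    """Compare the estimated length with the target range for a video type."""
    low, high = TARGET_SECONDS[video_type]
    seconds = estimate_seconds(script)
    if seconds < low:
        return f"~{seconds:.0f}s: shorter than the {low}-{high}s target"
    if seconds > high:
        return f"~{seconds:.0f}s: longer than the {low}-{high}s target; consider trimming"
    return f"~{seconds:.0f}s: within the {low}-{high}s target"


if __name__ == "__main__":
    draft = "word " * 200  # placeholder for a 200-word draft
    print(check_length(draft, "explainer"))
```

If your chosen voice reads noticeably faster or slower, time one scene in the preview and adjust `WORDS_PER_MINUTE` to match.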
Plan for localization needs
If you'll need multiple language versions, structure your content for easy translation from the start. Use simple, clear language that translates well across cultures. Avoid idioms and culturally specific references.
Step 1: Write a strategic video script
Your script determines 80% of your video's success. I recommend following the FOCA framework: Focus (hook), Outcome (what they'll learn), Content (main message), Action (clear CTA).
Structure for success
Aim for 2-4 short sentences per scene, with 12-23 scenes total for optimal pacing. Start with a strong hook in your first 5 seconds—pose a question, share a surprising statistic, or address a pain point directly.
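If you draft in a plain text file, a small script can flag scenes that break these pacing guidelines before you paste anything into Synthesia. This is a hypothetical helper. It assumes scenes are separated by blank lines in your draft, and its sentence counter is deliberately naive: abbreviations such as "e.g." will inflate the count.

```python
import re

# Pacing guidance from the section above.
SENTENCES_PER_SCENE = (2, 4)
SCENE_COUNT = (12, 23)


def split_scenes(script: str) -> list[str]:
    """ASSUMPTION: scenes in the draft are separated by one or more blank lines."""
    return [block.strip() for block in re.split(r"\n\s*\n", script) if block.strip()]


def count_sentences(scene: str) -> int:
    """Naive count of text runs ending in ., ! or ? (abbreviations will miscount)."""
    return len(re.findall(r"[^.!?]+[.!?]+", scene))


def pacing_report(script: str) -> list[str]:
    """Return warnings; an empty list means the draft fits the guidance."""
    warnings = []
    scenes = split_scenes(script)
    min_scenes, max_scenes = SCENE_COUNT
    if not min_scenes <= len(scenes) <= max_scenes:
        warnings.append(f"{len(scenes)} scenes (target {min_scenes}-{max_scenes})")
    low, high = SENTENCES_PER_SCENE
    for number, scene in enumerate(scenes, start=1):
        count = count_sentences(scene)
        if not low <= count <= high:
            warnings.append(f"scene {number}: {count} sentences (target {low}-{high})")
    return warnings


if __name__ == "__main__":
    draft = "\n\n".join(["First point. Second point."] * 11 + ["One. Two. Three. Four. Five."])
    for warning in pacing_report(draft):
        print(warning)
```

The same scene split also mirrors how you'll paste the script into Synthesia, one scene per script box.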
Common script mistakes I see
Long lists in narration (use on-screen text for scannable information instead), technical jargon without context, and missing clear CTAs.
Many users find Synthesia's AI-generated scripts helpful as starting points, but always refine them for accuracy and brand tone, especially for technical or financial content.
I recommend writing your script in a conversational tone, as if explaining to a colleague. Read it aloud before importing—if it sounds stiff spoken, it will sound worse with an AI voice.
Step 2: Select your starting point (template vs. file import vs. custom)
Once your script is ready, log in to Synthesia. You have three main paths to start your video, each suited to different needs.
Option 1: Start from a template
Click 'New video' in the top right corner, then browse the 55+ pre-designed templates. Templates work best when you need professional design quickly or want inspiration for layout and transitions.
You can design your own video template and save it for future use. Set up your Brand Kit first with colors, fonts, and logos, then create a master template. This feature is particularly useful for teams creating regular video series.
If using a template, add scenes by clicking the + button on the right side of the video canvas. Select scenes that match your content flow—don't feel obligated to use every available scene type.

Option 2: Start from a file (PDF, Word DOC/DOCX, PowerPoint PPT/PPTX, or plain text TXT)
With Synthesia’s AI video assistant, you can import files like slides, docs, scripts, or outlines and convert them into engaging videos easily.
The AI video assistant will parse the content, turn text into natural-sounding narration, and auto-generate on-brand scenes with pacing, transitions, and relevant visuals. You can then fine-tune scripts, timing, voices, avatars, and layout section-by-section, and instantly regenerate to produce a polished video without manual editing.

Option 3: Start from scratch
Choose a blank canvas when you need complete creative control or have specific brand requirements. While this takes longer, it gives you full flexibility over every element.

Step 3: Choose the right AI avatar for your content
Avatar selection impacts viewer engagement more than you might think. Click 'Avatar' above the video canvas to browse options.
Match avatar to content type
Expressive AI avatars work best for engaging presentations and marketing content—they use natural gestures and varied facial expressions. Professional avatars suit formal training content and compliance videos where authority matters more than entertainment.

Use framing strategically
Most avatars offer waist-up and chest-up options. Use waist-up for introductions and conclusions where full presence matters. Switch to chest-up for detailed explanations where facial expressions convey important information. Mix framings between scenes to add visual variety.
Click the avatar in your canvas to access layout options. You can display the full avatar, switch to a circle view for a modern look, or remove the avatar entirely for voiceover-only scenes.

Custom avatars
For executive communications or brand consistency, create a custom avatar of yourself or your spokesperson. Go to 'Avatars' → 'Create your own avatar'. Choose between a 5-minute web avatar or studio-quality custom avatar based on your needs and budget.

Step 4: Paste script and optimize voiceover
Copy your script and paste it into the script box scene by scene. Don't dump everything into one scene—this creates pacing issues and limits your editing options.

Synthesia automatically detects your script language and suggests a matching AI voice. But don't accept the default—test 2-3 voice options with your first scene. Click the voice selector in the top right corner of the script box to browse alternatives.
Voice selection strategy
Consider your audience's preferences and content tone. Professional voices work for formal training. Conversational voices suit marketing content. For global audiences, choose voices with neutral accents.

Step 5: Build engaging visuals and interactions
This step transforms your video from a talking head into an engaging experience. Focus on visual hierarchy and purposeful animations rather than adding elements for decoration.
Establish visual hierarchy
Each scene needs a clear focal point. Start with your title, add supporting text below, then include one visual element that reinforces your message. Avoid cluttering scenes—if viewers don't know where to look, they'll stop watching.
Text: Click 'Text' above the canvas. Use Title for main points, Subtitle for supporting information, Paragraph for detailed explanations. Maintain consistent positioning across scenes.

Adjust text properties in the right panel. Keep fonts consistent—use no more than two font families throughout your video. Ensure sufficient contrast for accessibility.
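One widely used yardstick for "sufficient contrast" is the WCAG 2.x contrast ratio: at least 4.5:1 for normal text and 3:1 for large text (Level AA). The Python sketch below computes that ratio from two hex colors, so you can check Brand Kit text and background pairs before building scenes. Synthesia doesn't require this check. Treating WCAG AA as your target is an assumption, not something the platform enforces.

```python
def _channel(value: int) -> float:
    """Linearize one 0-255 sRGB channel as defined for WCAG 2.x relative luminance."""
    c = value / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color given as '#RRGGBB'."""
    digits = hex_color.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected #RRGGBB, got {hex_color!r}")
    r, g, b = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _channel(r) + 0.7152 * _channel(g) + 0.0722 * _channel(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio (lighter + 0.05) / (darker + 0.05), from 1 to 21."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """True if the pair meets WCAG 2.x Level AA for normal (4.5:1) or large (3:1) text."""
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(foreground, background) >= threshold


if __name__ == "__main__":
    for fg, bg in [("#1A1A1A", "#FFFFFF"), ("#FFFF00", "#FFFFFF")]:
        print(fg, "on", bg, f"{contrast_ratio(fg, bg):.2f}:1", "AA" if passes_aa(fg, bg) else "fails AA")
```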

Add supporting visuals strategically
Shapes: Click 'Shape' to add geometric elements. Use shapes to create visual containers for text or highlight important information. Adjust color, opacity, and shadows in the right panel.

Media: Click 'Media' to add images, videos, or icons. Search stock content, generate new assets, or upload your own. For software tutorials, combine avatar narration with screen recordings rather than static screenshots.

Screen recordings: Click 'Record' to capture your screen directly. Choose specific tabs, windows, or full screen based on your needs. Trim and loop recordings to match script timing perfectly.

Time animations with precision
Animations guide viewer attention when used purposefully. Click any element and scroll to 'Animation' in the right panel. Add enter animations to introduce key points. Use exit animations to clear space for new information.

You can synchronize animations with your script using trigger markers. Add a marker at the point in your script where you mention a key idea, then set that marker as the animation's trigger. The element then appears at the moment the avatar says the words it illustrates.

Add meaningful interactivity
Interactive elements boost engagement and knowledge retention. Use them sparingly but strategically—one or two purposeful interactions outperform constant clicking.
Add 'wait for click' pauses at decision points to let viewers control pacing. Create clickable hotspots for knowledge checks or branching scenarios. Note: only shapes and text can be made interactive directly. To make an image clickable, place a shape over it and set the shape's opacity to 0%.
Scene transitions and music
Click a scene thumbnail on the left, then enable 'Scene transition' in the right panel. Choose transitions that match your content tone—professional content uses subtle fades, while marketing videos can handle dynamic transitions.

For background music, enable the 'Music' toggle in the scene settings. Choose from stock options or upload your own. Keep volume low—music should enhance, not compete with narration.

Use the 'Change all' feature in the color selector to update a color across all scenes at once. This keeps your brand colors consistent without editing every element by hand.

Step 6: Quality check before generation
Before generating, run through this checklist. Five minutes of review saves regeneration time and credits.
- Script review: One clear idea per scene? Strong hook in first 5 seconds? Explicit CTA near the end? Natural conversation flow?
- Visual consistency: Consistent typography throughout? Sufficient contrast for readability? Cohesive color scheme? Aligned elements across scenes?
- Timing check: Preview your video using the Play button. Do transitions feel natural? Does pacing match content complexity? Are animations synchronized with narration?
- Accessibility: Clear fonts at readable sizes? High contrast between text and background? Captions enabled for hearing-impaired viewers?
Step 7: Generate and distribute your video
Click 'Generate' in the top right corner. Add a descriptive title and include automatic captions for accessibility and SEO benefits.

Generation typically takes 3-10 minutes depending on video length. You'll receive an email notification when complete.
Once generated, you have multiple distribution options. Download as MP4 for LMS upload or email attachment. Enable video sharing for a direct link—perfect for quick stakeholder reviews. Embed directly into your website or learning platform.

To share, turn on 'Enable video sharing' and copy the link. You can also duplicate the video to create variations for different audiences or A/B testing.

Scaling your video production
Once you've mastered single video creation, scale your production efficiently:
- Create reusable templates: After perfecting a video format, save it as a template. Your team can then create consistent videos 10x faster. Set up Brand Kits first to ensure all videos maintain visual consistency.
- Leverage bulk features: Use template variables for personalized video series. Need 50 onboarding videos with different names? Create one template with variables, then bulk generate. For enterprise needs, explore the API for programmatic video creation.
- Establish collaboration workflows: Use Synthesia Spaces for team projects. Set up approval workflows so stakeholders review before final generation. Create different workspaces for different departments or video types.
- Plan for localization: Structure content for easy translation from the start. Synthesia supports 140+ languages—take advantage of this for global reach. Create one master video, then generate versions in multiple languages efficiently.
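To make the "one template, many variations" idea concrete, here is a local Python sketch that fills a script template from CSV rows. It is purely illustrative and hypothetical: it does not call Synthesia, and Synthesia's template variables and API use their own syntax. Treat the `$first_name`/`$team` placeholders and the `personalized_scripts` helper as stand-ins for whatever the platform actually expects.

```python
import csv
import io
from string import Template

# HYPOTHETICAL: illustrates "one template, many variations" locally only.
# Synthesia's template variables and API have their own syntax; check the
# official documentation before connecting anything like this to the platform.
SCRIPT_TEMPLATE = Template(
    "Hi $first_name, welcome to the $team team! "
    "In this video, we'll walk through your first week."
)

# Example data; in practice this might be an HR export.
ROWS = """first_name,team
Ana,Finance
Ben,Support
"""


def personalized_scripts(template: Template, csv_text: str) -> list[dict[str, str]]:
    """Fill the template once per CSV row; substitute() raises KeyError if a column is missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {"title": f"Onboarding - {row['first_name']}", "script": template.substitute(row)}
        for row in reader
    ]


if __name__ == "__main__":
    for video in personalized_scripts(SCRIPT_TEMPLATE, ROWS):
        print(video["title"], "|", video["script"])
```

Using `substitute()` rather than `safe_substitute()` is deliberate: a missing column fails loudly instead of producing a video that says "$first_name" out loud.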
Measuring success and iterating
Track these metrics to optimize future videos:
- Completion rates: Aim for 80%+ for training videos, 60%+ for marketing content. If rates drop at specific points, that scene needs revision.
- Engagement metrics: Monitor where viewers pause, replay, or drop off. Use this data to adjust pacing and content density.
- Learning outcomes: For training videos, compare pre and post-assessment scores. Strong videos show measurable knowledge improvement.
- Time to value: Track how quickly you can update videos versus traditional methods. Most users report 75% time savings—use this to justify expanded video programs.
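The completion-rate and drop-off checks above come down to simple arithmetic once you know how many viewers reached each scene. The sketch below assumes you can get those per-scene counts from your player or LMS analytics (no specific export format is assumed). It computes the overall completion rate and flags scenes that lose a large share of viewers. The 10% flag threshold is an arbitrary starting point, not a Synthesia recommendation.

```python
def drop_off_report(
    reached: list[int], completed: int, flag_at: float = 0.10
) -> tuple[float, list[tuple[int, float]]]:
    """Return (completion_rate, flagged_scenes).

    reached[i] is the number of viewers who started scene i + 1;
    completed is the number who watched to the end.
    A scene is flagged when the viewers lost during it, as a share of
    everyone who started the video, is at least flag_at.
    """
    if not reached or reached[0] <= 0:
        raise ValueError("need a positive viewer count for the first scene")
    started = reached[0]
    flagged = []
    for i, here in enumerate(reached):
        # Viewers still watching at the start of the next scene (or at the end).
        next_count = reached[i + 1] if i + 1 < len(reached) else completed
        lost_share = (here - next_count) / started
        if lost_share >= flag_at:
            flagged.append((i + 1, round(lost_share, 3)))
    return completed / started, flagged


if __name__ == "__main__":
    # Example numbers only: 200 viewers started a five-scene training video.
    rate, flagged = drop_off_report([200, 190, 185, 140, 135], completed=130)
    target = 0.80  # training target from the list above; use 0.60 for marketing
    print(f"completion {rate:.0%} (target {target:.0%})")
    for scene, share in flagged:
        print(f"scene {scene} lost {share:.1%} of all viewers; review it")
```

Measuring losses against everyone who started, rather than against each scene's own audience, keeps late scenes from being over-flagged when only a few viewers remain.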
One key advantage of Synthesia: easy content updates. When information changes, update the script and regenerate in 30 minutes rather than reshooting. This agility enables you to keep content current and relevant.
Ready to make Synthesia videos that deliver results?
You now have the complete framework for creating effective Synthesia videos. The key isn't perfection on your first attempt—it's starting with a clear plan and improving based on viewer feedback.
Remember: successful videos solve specific problems for specific audiences. Focus on your viewer's needs, keep your message clear, and let Synthesia handle the technical complexity.
Want to dive deeper into video strategy? Check out our FOCA video framework for advanced techniques. And if you're ready to start creating but don't have an account yet, explore our plans to find the right fit for your needs.
About the author
Strategic Advisor
Kevin Alster
Kevin Alster heads up the learning team at Synthesia, where he focuses on building Synthesia Academy and helping people figure out how to use generative AI video in the enterprise. His work in the tech industry draws on a decade of experience in the education sector and in roles where he has used emerging technology to augment communication and creativity through video. He has developed enterprise and branded learning solutions at organizations such as General Assembly, The School of The New York Times, and Sotheby's Institute of Art.

Frequently asked questions
How do I use Synthesia to create a video from start to finish?
Creating a video in Synthesia follows a simple workflow that takes about 35 minutes from concept to shareable video. Start by defining your goal, audience, and call-to-action, then write a 45-60 second script using the FOCA framework (Focus, Outcome, Content, Action). Next, log into Synthesia and choose whether to start from a template, import a file, or create from scratch. Select an AI avatar that matches your content tone, paste your script scene by scene, and add visual elements like text, shapes, or screen recordings to support your message.
Once your content is ready, run a quick quality check for script flow, visual consistency, and timing before clicking 'Generate' in the top right corner. The platform will process your video in 3-10 minutes, after which you can download it as an MP4, share via direct link, or embed it on your website. This streamlined process eliminates the need for cameras, microphones, or video editing skills while delivering professional results that engage your audience.
Can I import a PowerPoint, PDF, or Word file to turn it into a Synthesia video?
Yes, Synthesia's AI video assistant can transform your existing PowerPoint presentations, PDFs, Word documents, and plain text files directly into engaging videos. Simply click 'New video' and select the file import option, then upload your document. The AI assistant automatically parses your content, converts text into natural-sounding narration, and generates on-brand scenes with appropriate pacing, transitions, and relevant visuals based on your content.
After the initial generation, you have full control to fine-tune every aspect of your video. Adjust scripts, timing, voices, avatars, and layout for each scene individually, then regenerate instantly to see your changes. This feature is particularly valuable for teams who already have training materials, presentations, or documentation they want to transform into more engaging video content without starting from scratch.
How do I choose the right AI avatar for my video, and can I create a custom avatar of myself?
Selecting the right avatar depends on your content type and audience expectations. For engaging presentations and marketing content, choose expressive AI avatars that use natural gestures and varied facial expressions to maintain viewer interest. For formal training or compliance videos where authority matters more than entertainment, professional avatars work best. You can also vary avatar framing between waist-up for introductions and chest-up for detailed explanations to add visual variety and emphasize different types of content.
If you need consistent brand representation or want executives to deliver messages personally, you can create custom avatars. Navigate to 'Avatars' then 'Create your own avatar' to choose between a 5-minute web avatar for quick needs or a studio-quality custom avatar for premium results. Custom avatars are particularly valuable for executive communications, brand consistency across video series, or when you need the same spokesperson to deliver regular updates without scheduling repeated recording sessions.
Does Synthesia support multiple languages, and how should I plan for localization?
Synthesia supports over 140 languages with AI voices that automatically match your script language, making it ideal for global teams and international audiences. When planning for localization, structure your content from the start with translation in mind by using simple, clear language that translates well across cultures and avoiding idioms or culturally specific references. The platform automatically detects your script language and suggests appropriate voices, though you should test 2-3 voice options to find the best match for your audience's preferences.
To create multilingual versions efficiently, develop one master video with your primary language, then duplicate it and replace the script with translations. This approach maintains consistent visuals and timing while allowing you to generate versions in multiple languages within minutes rather than hours. Many organizations use this capability to ensure training materials, product updates, and company communications reach their entire global workforce in their preferred language, significantly improving engagement and comprehension.
Can I try Synthesia for free before choosing a plan?
Yes, Synthesia offers a free AI video generator that lets you create and experience the platform before committing to a paid plan. You can access this by clicking 'Create free AI video' on the website, which allows you to test core features like avatar selection, script input, and basic video generation without providing credit card information. This gives you hands-on experience with the interface and helps you understand how Synthesia can meet your specific video creation needs.
The free trial is particularly useful for evaluating video quality, testing different avatars and voices, and understanding the workflow before making a purchase decision. You can create a complete video to share with stakeholders and get buy-in for larger video initiatives. Once you're ready to scale your video production with additional features like custom avatars, advanced templates, and team collaboration tools, you can explore the various pricing plans that match your organization's needs and video volume requirements.