Vittorio Ferrari joins Synthesia’s R&D team

Published on
May 17, 2024

One thing we are extremely proud of here at Synthesia is our R&D team, and we are delighted to announce that it has recently expanded with the addition of another exceptional member.

Prior to joining Synthesia, Vittorio served as Principal Scientist at Google, where he led multiple research groups focused on computer vision, machine learning, and elements of natural language processing. Previously, he built and led research groups at ETH Zurich and the University of Edinburgh. His work is regularly published in top-tier scientific publications, and he serves as an Associate Editor of the International Journal of Computer Vision.

We are thrilled that Vittorio has chosen to continue his outstanding work as Director of Science at Synthesia.

In the interview below, you can learn more about Vittorio's remarkable contributions, his role at Synthesia, and his vision for AI video creation.

But first, here’s a short lightning round with Vittorio 👇

✨ Can you tell us a fun fact about yourself?
Many people think that Ferrari is some kind of noble name, due to the association with the car maker. However, it just means "Smith", literally!

✨ What are your favourite hobbies outside of work?
I play competitive table tennis, and I love board games and video games.

✨ What are 3 associations that come to your mind when you hear the term "AI video"?
Impressive, impactful, and fun!

On to more serious stuff – can you tell us a bit about your professional background? 

I have been leading research groups for the past 15 years, and I have worked on many topics in computer vision and machine learning throughout my career. In the last few years the focus has been on 3D deep learning, vision+language models, transfer learning, and human-machine collaboration for annotation. I am most excited by the thrill of discovery: that moment when, after formulating an idea and carrying out experiments, you look at all the data, see a pattern emerge and eventually understand a phenomenon. I also enjoy the leadership aspects, especially helping younger researchers grow towards achieving their full potential, and leading complex projects involving several moving parts.

Why Synthesia and why now?

Synthesia is an exciting place to be! The tech is very timely, extremely well executed, and fulfills a real need. The team is dynamic and motivated, and right now the company is in a great position for further growth. Finally, for me personally, I feel it is time to start a new adventure.

The AI Golden Age is Already Here (w/ Vittorio Ferrari)

Can you tell us more about your role at Synthesia?

I have been hired as Director of Science and I will lead a large portion of Synthesia’s R&D. The role includes charting a roadmap for Synthesia's future technical development, further strengthening synergy between different teams, and ultimately enabling new functionalities to keep the product exciting and ahead of the competition.

What is your vision for Synthesia and AI video in general?

I hope we will achieve the ability to generate general videos with many features that unlock creative ideas, and eventually reach the point of enabling fully fledged storytelling. For example: full-body avatars with hands and gestures, avatars interacting with objects and other avatars, avatars embedded in complex scenes, camera movement, sophisticated lighting, and more emotional voices. Importantly, all of this content should be specified purely in natural language. Finally, pushing even further, the AI could make suggestions for the content, beyond just performing rendering.

Eventually the video creation process should become an interactive experience where an artist and a machine work together to realize the artist’s vision. As to your last question, I think AI video companies will race to expand the functionality their products offer, while keeping the image quality bar high.

(When) will it be possible to create Hollywood-quality content from a browser? 👀

There are many challenges. One is to strike a good trade-off between quality and generality. For example, if you allow completely free camera movement, then it is likely that for some scenes or some viewpoints the rendering quality will be rather low. Another challenge is editability: not only generating a fresh new video, but also altering an existing video (for example, replacing that car with a Ferrari, or adding a Coca-Cola bottle on that table). Editability requires strong scene understanding, in addition to generative abilities. There are many more challenges too: dealing with complex lighting, keeping latency within a reasonable range, and staying within the bounds of regulations.

Given the thrilling challenges in AI research, now is a perfect time to join an AI startup. Do you agree?

We are in a golden age for AI, with a rare confluence of technical abilities that can provide real value to users. An AI startup (or, better, a ‘scale-up’ in the case of Synthesia) can move fast and has the agility needed to create a direct pathway from ideas to product.

At Synthesia, our mission is crystal clear: to make video easy for everyone. We're excited about the future possibilities and remain at the forefront of AI video with our exceptional team.

You can learn more about AI research at Synthesia here, or explore our current job openings.

