Synthesia’s AI clones are more expressive than ever. Soon they’ll be able to talk back.

Earlier this summer, I walked through the glassy lobby of a fancy office in London, into an elevator, and then along a corridor into a clean, carpeted room. Natural light flooded in through its windows, and a large pair of umbrella-like lighting rigs made the room even brighter. I tried not to squint as I took my place in front of a tripod equipped with a large camera and a laptop displaying an autocue. I took a deep breath and started to read out the script.

I'm not a newsreader or an actor auditioning for a movie—I was visiting the AI company Synthesia to give it what it needed to create a hyperrealistic AI-generated avatar of me. The company's avatars are a decent barometer of just how dizzying progress in AI has been over the past few years, so I was curious just how accurately its latest AI model, introduced last month, could replicate me.

When Synthesia launched in 2017, its primary purpose was to match AI versions of real human faces—for example, the former footballer David Beckham's—with dubbed voices speaking in different languages. A few years later, in 2020, it started giving the companies that signed up for its services the opportunity to make professional-level presentation videos starring either AI versions of staff members or consenting actors.

But the technology wasn't perfect. The avatars' body movements could be jerky and unnatural, their accents sometimes slipped, and the emotions indicated by their voices didn't always match their facial expressions.

Now Synthesia's avatars have been updated with more natural mannerisms and movements, as well as expressive voices that better preserve the speaker's accent—making them appear more humanlike than ever before. For Synthesia's corporate clients, these avatars will make for slicker presenters of financial results, internal communications, or staff training videos.

I found the video demonstrating my avatar to be as unnerving as it is technically impressive. It's slick enough to pass as a high-definition recording of a chirpy corporate speech, and if you didn't know me, you'd probably think that's exactly what it was. This demonstration shows how much harder it's becoming to distinguish the artificial from the real. And before long, these avatars will even be able to talk back to us. But how much better can they get? And what might interacting with AI clones do to us?

The creation process

When my former colleague Melissa visited Synthesia's London studio to create an avatar of herself last year, she had to go through a long process of calibrating the system, reading out a script in different emotional states, and mouthing the sounds needed to help her avatar form vowels and consonants.

As I stand in the brightly lit room 15 months later, I'm relieved to hear that the creation process has been significantly streamlined. Josh Baker-Mendoza, Synthesia's technical supervisor, encourages me to gesture and move my hands as I would during natural conversation.

Original reporting: MIT Technology Review
