EMO, the robot, learned how its flexible mouth moves in response to its 26 facial actuators by watching itself in a mirror.
Video: A Robot Learns to Lip Sync (YouTube)
Are you sure the person talking to you isn’t secretly a machine? In the near future, you may not be able to tell.
For the first time, researchers have built a robot that can move its mouth in a way indistinguishable from a human’s. That lets it escape the “uncanny valley,” the unsettling effect that arises when a robot’s behaviour comes close to natural human movement but falls just short.
Scientists at Columbia University pulled this off by letting their creation, EMO, study itself in a mirror. The robot learned how its flexible face and synthetic lips respond to precise commands sent to its 26 facial actuators, each of which can move with up to 10 degrees of freedom.
The team described its method in a paper published Jan. 14 in the journal Science Robotics.
How EMO learned to move its face like a human
EMO runs on an artificial intelligence (AI) system called a “vision-to-action” language model (VLA), meaning it learns to translate what it sees into coordinated physical movements without needing preprogrammed instructions. During training, the humanoid robot made thousands of seemingly random facial expressions and lip movements while watching itself in a mirror.
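The paper’s exact training setup isn’t described here, but the mirror stage can be illustrated with a minimal sketch in Python: the robot issues random actuator commands, observes the resulting lip shape in the mirror, and fits a neural “self-model” that predicts lip landmarks from commands. Everything below (the landmark count, the network shape and the stand-in fake_mirror_observation function) is an illustrative assumption, not the authors’ code.

```python
# A minimal sketch of mirror-based self-modeling, NOT the authors' code:
# the robot sends random commands to its 26 facial actuators, watches the
# resulting lip shape in a mirror, and fits a neural self-model that maps
# commands to lip landmarks. The "mirror" here is a fake stand-in function.
import torch
import torch.nn as nn

torch.manual_seed(0)
NUM_ACTUATORS = 26       # EMO's facial actuator count (from the article)
NUM_LANDMARKS = 2 * 20   # hypothetical: 20 lip landmarks as (x, y) pairs

# Hidden "face mechanics" the robot must discover; in reality this is the
# physical face plus a camera, here it is a fixed random smooth mapping.
TRUE_W = torch.randn(NUM_ACTUATORS, NUM_LANDMARKS) * 0.3

def fake_mirror_observation(commands: torch.Tensor) -> torch.Tensor:
    """Stand-in for 'look in the mirror': actuator commands -> lip landmarks."""
    return torch.tanh(commands @ TRUE_W)

# Self-model: learns to predict what the face will do for any command.
self_model = nn.Sequential(
    nn.Linear(NUM_ACTUATORS, 128), nn.ReLU(),
    nn.Linear(128, NUM_LANDMARKS),
)
optimizer = torch.optim.Adam(self_model.parameters(), lr=1e-3)

for step in range(2000):
    commands = torch.rand(64, NUM_ACTUATORS)       # random facial "babbling"
    observed = fake_mirror_observation(commands)   # what the mirror shows
    loss = nn.functional.mse_loss(self_model(commands), observed)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```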
The researchers then showed EMO many hours of YouTube footage of people speaking different languages and singing. This let the robot match its knowledge of actuator-driven facial movements to the sounds people make, even without understanding the words. Eventually, EMO could lip-sync to spoken audio in 10 different languages with striking accuracy.
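One simple way to picture how such a self-model could drive lip sync (the paper may do this differently): treat each audio frame as implying a target lip shape, then search for the actuator commands whose predicted landmarks, run through the learned self-model, match that target. The function below continues the sketch above; its name and the clamped command range are hypothetical.

```python
# Continuing the sketch above: given a target lip shape implied by the
# current audio frame, optimize actuator commands so that the self-model's
# predicted landmarks match it. An illustrative guess, not the paper's method.
def commands_for_audio_frame(target_landmarks: torch.Tensor,
                             steps: int = 200, lr: float = 0.05) -> torch.Tensor:
    cmd = torch.full((1, NUM_ACTUATORS), 0.5, requires_grad=True)
    opt = torch.optim.Adam([cmd], lr=lr)
    for _ in range(steps):
        loss = nn.functional.mse_loss(self_model(cmd), target_landmarks)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            cmd.clamp_(0.0, 1.0)   # keep commands in the valid actuator range
    return cmd.detach()

# Usage: recover the commands behind a stand-in "observed" lip shape.
target = fake_mirror_observation(torch.rand(1, NUM_ACTUATORS))
print(commands_for_audio_frame(target).shape)  # torch.Size([1, 26])
```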
“We encountered notable challenges with specific phonetic sounds, such as ‘B’, and those requiring lip pursing, like ‘W’,” said Hod Lipson, an engineering professor and director of Columbia’s Creative Machines Lab, in a statement. “However, these capabilities are anticipated to progress with continued development and practice.”
Many robotics engineers have tried, and failed, to build a convincing humanoid. So before EMO’s public debut, the researchers put it to the test with human judges: they showed 1,300 participants footage of the robot speaking, driven by the VLA model and by two alternative methods of controlling its mouth, alongside a reference video showing ideal lip synchronization.
The two alternatives were an amplitude baseline, in which EMO’s lip movements simply tracked the loudness of the audio, and a nearest-neighbor landmarks baseline, which copied the facial movements of people producing similar sounds. Participants were asked to pick the clip that came closest to the ideal lip motion. They chose the VLA model 62.46% of the time, compared with 23.15% for the amplitude baseline and 14.38% for the nearest-neighbor landmarks baseline.
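Of the two baselines, the amplitude method is simple enough to sketch directly: the mouth opens in proportion to how loud the audio currently is. The frame rate and normalization below are illustrative choices, not values taken from the study.

```python
# Sketch of an amplitude baseline: one mouth-opening value per video frame,
# proportional to the RMS loudness of the matching audio chunk. Details
# (frame rate, normalization) are illustrative, not from the paper.
import numpy as np

def amplitude_baseline(audio: np.ndarray, sr: int, fps: int = 30) -> np.ndarray:
    hop = sr // fps                       # audio samples per video frame
    n_frames = len(audio) // hop
    rms = np.array([
        np.sqrt(np.mean(audio[i * hop:(i + 1) * hop] ** 2))
        for i in range(n_frames)
    ])
    return rms / (rms.max() + 1e-8)       # normalize mouth opening to [0, 1]

# Example on two seconds of stand-in audio at 16 kHz.
sr = 16_000
audio = np.random.randn(2 * sr)
print(amplitude_baseline(audio, sr)[:5])  # first five frames' openings
```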
Robot carers will require friendly faces
Although people of different genders and cultures vary in where they direct their gaze, humans rely heavily on facial cues during conversation. A 2021 eye-tracking study found that we look at our conversation partner’s face 87% of the time, and at the mouth specifically for roughly 10% to 15% of that time. Other research suggests mouth movements are so important that they can even change what we hear.
The researchers argue that neglecting facial expression is one reason previous attempts at convincing robots have fallen short.
“A substantial portion of contemporary humanoid robotics research concentrates on limb and hand functionality, for tasks such as locomotion and object manipulation,” Lipson commented. “However, facial expressiveness is equally vital for any robotic application involving human interaction.”
As AI technology advances rapidly, robots are expected to take on a growing range of roles that involve direct human contact, in fields such as education, healthcare and elder care. How well they perform will depend in large part on how convincingly they can reproduce human facial expressions.
“Robots possessing this capability will undoubtedly exhibit a superior capacity for establishing rapport with humans, given that a substantial segment of our communication relies on facial body language, a channel that remains largely unexplored,” remarked Yuhang Hu, the study’s lead author, in the press release.
Hu’s team isn’t the only one trying to make humanoid robots more lifelike. In October 2025, a Chinese company released a video of an unnervingly realistic robotic head, part of an effort to make human-robot interactions feel more natural. The year before, a Japanese group unveiled artificial self-healing skin designed to give robot faces a more human appearance.
Source: www.livescience.com
