VFX Voice



Fall 2024 Issue | October 1, 2024

REALISTIC FACIAL ANIMATION: THE LATEST TOOLS, TECHNIQUES AND CHALLENGES

By OLIVER WEBB

Timothée Chalamet and mini Hugh Grant in Wonka. When it came to generating a CG version of an actor as recognizable as Hugh Grant for Wonka, Framestore turned to sculptor and facial modeler Gabor Foner to better understand the quirks of muscular activation in the actor’s face, eventually developing a formula for re-creating facial performances. (Image courtesy of Warner Bros. Pictures)

Vicon’s CaraPost single-camera tracking engages automatically when only one camera can see a point; if the point becomes visible to two cameras again, tracking reverts to multi-camera mode. (Image courtesy of Vicon Motion Systems Ltd. UK)

Realistic facial animation remains a cornerstone of visual effects, enabling filmmakers to create compelling characters and immersive storytelling experiences. Facial animation has come a long way since the days of animatronics; filmmakers now have access to a range of advanced facial motion capture systems. Technologies such as performance capture and motion tracking, generative AI and new facial animation software have played a central role in its advancement. World-leading animation studios are using these tools to create more realistic content and to break new ground in how characters are depicted.

A wide range of facial capture technology is in use today. Among the pioneering systems is Vicon’s motion capture technology, which proved vital on films such as Avatar and The Lord of the Rings trilogy. Faceware is another leading system that has been used in various films, including Dungeons & Dragons: Honor Among Thieves, Doctor Strange in the Multiverse of Madness and Godzilla vs. Kong, as well as games such as Hogwarts Legacy and EA Sports FC 24. ILM relies on several systems, including the Academy Award-winning Medusa, which has been a cornerstone of the studio’s digital character realization. ILM also pioneered the Flux system, created for Martin Scorsese’s The Irishman, as well as the Anyma Performance Capture system, developed with Disney Research Studios. DNEG, on the other hand, uses a variety of motion capture options depending on the need. Primarily, DNEG uses a FACS-based system to plan, record and control the data on the animation rigs. “However, we try to be as flexible as possible in what method we use to capture the data from the actors, as sometimes we could be using client vendors or client methods,” says Robyn Luckham, DNEG Animation Director and Global Head of Animation.

ILM’s Flux system, which was created for Martin Scorsese’s The Irishman, allows filmmakers to capture facial data on set without the need for traditional head-mounted cameras on the actor. (Image courtesy of ILM)

Developing a collaborative working understanding of both the technology and the human nuances and complexities of the face was particularly challenging for Framestore on 2023’s Wonka. “Part of that was building an animation team that had a good knowledge of how FACS [Facial Action Coding System] works,” remarks Dale Newton, Animation Supervisor at Framestore. “While as humans we all have the same facial anatomy, everyone’s face moves in different ways, and we all have unique mannerisms. When it came to generating a CG version of an actor as recognizable as Hugh Grant, that raised the bar very high for us. At the core of the team was sculptor and facial modeler Gabor Foner, who helped us to really understand the quirks of muscular activation in Hugh’s face, [such as] what different muscle combinations worked together with what intensities to use for any particular expression. We ended up with a set of ingredients, recipes if you like, to re-create any particular facial performance.”
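Framestore’s tooling isn’t public, but the “recipe” idea Newton describes translates naturally into code. The minimal Python sketch below encodes an expression as a weighted combination of standard FACS action units; the recipe contents, intensity values and function names are illustrative assumptions, not Framestore’s system.

```python
# Hypothetical recipes: each maps FACS action units (the AU names are
# standard FACS) to activation intensities in [0, 1]. The values are
# illustrative; a studio would tune them per actor.
EXPRESSION_RECIPES = {
    # A Duchenne smile combines the cheek raiser with the lip corner puller.
    "smile":    {"AU6_cheek_raiser": 0.7, "AU12_lip_corner_puller": 0.9},
    "surprise": {"AU1_inner_brow_raiser": 0.8, "AU2_outer_brow_raiser": 0.6,
                 "AU26_jaw_drop": 0.5},
}

def apply_expression(rig_controls, name, strength=1.0):
    """Blend a recipe's AU intensities onto a rig's AU control channels."""
    out = dict(rig_controls)
    for au, intensity in EXPRESSION_RECIPES[name].items():
        # Accumulate and clamp so stacked recipes stay in the valid range.
        out[au] = min(1.0, out.get(au, 0.0) + intensity * strength)
    return out

controls = apply_expression({}, "smile", strength=0.8)
print(controls)  # AU6 ≈ 0.56, AU12 ≈ 0.72
```

Per-actor calibration (discovering which AU combinations and intensities reproduce a specific face) is exactly the work Newton credits to Foner.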

Masquerade3 represents the next level of facial capture technology at Digital Domain. “This latest version brings a revolution to facial capture by allowing markerless facial capture without compromising on quality,” Digital Domain VFX Supervisor Jan Philip Cramer explains. “In fact, it often exceeds previous standards. Originally, Masquerade3 was developed to capture every detail of an actor’s face without the constraints of facial markers. Utilizing state-of-the-art machine learning, it captures intricate details like skin texture and wrinkle dynamics. We showcased its outstanding quality through the creation of iconic characters such as Thanos and She-Hulk for Marvel. Eliminating the need for markers is a natural and transformative progression, further enhancing our ability to deliver unmatched realism. To give an example of the impact of this update: normally, the CG actor has to arrive two hours early on set to get the markers applied. After each meal, they have to be reapplied or fixed. The use of COVID masks has made this issue infinitely worse. On She-Hulk, each day seemed to have a new marker set due to pandemic restrictions, and that caused more hold-ups on our end. So, we knew removing the markers would make a sizeable impact on production.”

Vicon’s Cara facial motion capture system. (Image courtesy of Vicon Motion Systems Ltd. UK)

Faceware’s Mark IV Wireless Headcam System for facial capture. (Image courtesy of Faceware Technologies Inc.)

ILM relies on the Academy Award-winning Medusa system, a cornerstone of ILM’s digital character realization, the Anyma Performance Capture system, which was also developed with Disney Research Studios, and the Flux on-set system. (Image courtesy of ILM)

Wētā FX’s FACET system was developed primarily for Avatar, where it provided input to the virtual production technology. Other major facial capture projects include The Hobbit trilogy and the Planet of the Apes trilogy. (Image courtesy of Wētā FX)

Ensuring that the emotions and personalities of each character are accurately conveyed is critical when it comes to mastering realistic facial animation. “The process would loosely consist of capture, compare, review and adjust,” Luckham explains. “It would be a combination of the accuracy of the data capture, the amount of adjustments we would need to make in review of the motion capture data against the actor’s performance and then the animation of the character face rig against the actor’s performance, once that data is put onto the necessary creature/character for which it is intended. Once we have done as much as we can in motion capture, motion editing and animation, we would then go into Creature CFX – for the flesh of the face, skin folds, how the wrinkles would express emotion, how the blood flow on a face would express color in certain emotions – to again push it as close as we can to the performance that the actor gave. After that would be lighting, which is a huge part of getting the result of facial animation and should never be overlooked. If any one of these stages is not respected, it is very easy to not hit the acting notes and realism that is needed for a character.”
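Read schematically, the loop Luckham outlines has each stage iterating against the actor’s reference until review signs it off. The toy Python sketch below models only that flow; the stage list follows his quote, while the similarity numbers, threshold and function names are invented for illustration.

```python
ACCEPT = 0.95  # invented bar for how closely a stage must match the actor

def run_stage(name, similarity, max_passes=5):
    """Toy capture/compare/review/adjust loop for one pipeline stage."""
    for adjustments in range(max_passes):
        if similarity >= ACCEPT:
            print(f"{name}: accepted after {adjustments} adjustment(s)")
            return similarity
        # Each adjustment closes half of the remaining gap to the reference.
        similarity += (1.0 - similarity) * 0.5
    print(f"{name}: escalated for supervisor review")
    return similarity

score = 0.7  # invented starting similarity for the raw capture
for stage in ["motion capture", "motion edit", "animation",
              "creature CFX", "lighting"]:
    score = run_stage(stage, score)
```

The point of the structure is Luckham’s warning: every stage compares back to the actor’s performance, and skipping any one comparison lets error through to the final character.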

For Digital Domain, the most important aspect of the process is combining the actor with the CG asset. “You want to ensure all signature wrinkles match between them. Any oddity or unique feature of the actor should be translated into the CG version,” Cramer notes. “All Thanos’ wrinkles are grounded in the actor Josh Brolin. These come to life especially in his expressions, as the wrinkle lines created during a smile or frown exactly match the actor. In addition, you don’t want the actor to feel too restricted during a capture session. You want them to come across as natural as possible. Once we showed Josh that every nuance of his performance comes through, he completely changed his approach to the character. Rather than over-enunciating and overacting, he underplayed Thanos and created this fantastic, stoic character. This is only possible if the actor understands and trusts that we are capturing the essence of his performance to the pixel.”

On Disney’s Peter Pan & Wendy, Framestore’s facial animation for Tinker Bell was generated through a mixture of facial capture elements using a head-mounted camera worn by the actress Yara Shahidi, which then underwent a facial solve involving deep learning algorithms trained to translate her facial motion onto Framestore’s CG facial rig. (Image courtesy of Walt Disney+)

Performance capture is a critical aspect of realistic facial animation based on human performance. “Since Avatar, it has been firmly established as the go-to setup for realistic characters,” Cramer adds. “However, there are unique cases where one would go a different route. On Morbius, for instance, most faces were fully keyframed with the help of HMCs [head-mounted cameras], as they had to perfectly match the actor’s face to allow for CG transitions. In addition, some characters might need a more animated approach to achieve a stylistic look. But all that said, animation is still needed. We get much closer to the final result with Masquerade3, but it’s important to add artistic input to the process. The animators make sure the performance reads best to a given camera and can alter the performance to avoid costly reshoots.”

Masquerade3 was developed by Digital Domain to capture every detail of an actor’s face without the constraints of facial markers, as showcased through the creation of iconic characters such as Thanos from the Avengers films. All Thanos’ wrinkles are grounded in the actor Josh Brolin. (Image courtesy of Marvel)

For Framestore’s work on Disney’s Peter Pan & Wendy, the facial animation for Tinker Bell was generated through a mixture of facial capture elements using a head-mounted camera worn by the actress Yara Shahidi. “This, then, underwent a facial ‘solve,’ which involves training deep learning algorithms to translate her facial motion onto our facial rigs,” Newton says. “The level of motion achieved by these solves required experienced animators to tighten up and finesse the animation in order to achieve the quality level for VFX film production. In contrast, performance capture on Wonka meant that we had good visual reference for the animators working on the Oompa Loompa as voiced by Hugh Grant. Working with Hugh and [co-writer/director] Paul King, we captured not only Hugh’s performance in the ADR sessions but also preparatory captures, which allowed us to isolate face shapes and begin the asset build. We had a main ARRI Alexa camera set up that he performed towards. Additionally, we had a head-mounted 1k infrared camera that captured his face and a couple of Canon 4k cameras on either side of him that captured his body movements.”

Masquerade3 represents the next level of facial capture technology at Digital Domain by allowing markerless facial capture while maintaining high quality. (Image courtesy of Digital Domain)

YouTube creators of “The Good Times are Killing Me” experiment with face capture for their custom characters. Rokoko Face Capture for iOS captures quality facial expressions on the fly. It can be used on its own or alongside Smartsuit Pro and Smartgloves for full-body motion capture. (Image courtesy of Rokoko)

Rokoko Creative Director Sam Lazarus playing with Unreal’s MetaHuman Animator while using the Headrig for iPhone face capture. (Image courtesy of Rokoko)

According to Oliver James, Chief Scientist at DNEG, generative AI systems, which can generate data resembling that which they were trained on, have huge potential applications in animation. “They also have the potential to create huge legal and ethical problems, so they need to be used responsibly,” James argues. “Applied to traditional animation methods, it’s possible to generate animation curves which replicate the style of an individual, but can be directed at a high level. So instead of having to animate the motion of every joint in a character over time, we could just specify an overall motion path and allow an AI system, trained on real data, to fill in the details and generate a realistic full-body animation. These same ideas can be applied to facial motion too, and a system could synthesize animation that replicated the mannerisms of an individual from just a high-level guide. Face-swapping technology allows us to bypass several steps in traditional content creation, and we can produce photorealistic renderings of new characters driven directly from video. These techniques are typically limited by the availability of good example data to train the networks on, but this is being actively tackled by current research, and we’re already starting to see convincing renders based on just a single reference image.”
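DNEG’s models aren’t public, so the Python sketch below only illustrates the shape of the idea James describes: a sparse, high-level motion guide goes in and dense per-joint curves come out. Plain interpolation stands in for the trained generative model, and the function names, joints and offsets are all assumptions.

```python
import numpy as np

def fill_in_details(guide_times, guide_root_positions, joint_names, fps=24):
    """Expand a sparse root-motion guide into dense per-joint curves.

    A production system would sample a generative model trained on real
    captured motion here; this stand-in interpolates the guide and adds
    a fixed per-joint offset instead of learned, style-specific detail.
    """
    t_dense = np.arange(guide_times[0], guide_times[-1], 1.0 / fps)
    root = np.interp(t_dense, guide_times, guide_root_positions)
    return t_dense, {joint: root + 0.1 * i
                     for i, joint in enumerate(joint_names)}

# Three guide keys expand to dense 24 fps curves for every joint.
times, curves = fill_in_details([0.0, 1.0, 2.0], [0.0, 1.5, 1.0],
                                ["hips", "spine", "head"])
print(len(times), sorted(curves))  # 48 ['head', 'hips', 'spine']
```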

Newton suggests that given how finely tuned performances in film VFX are today, it will take some time for AI systems to become useful in dealing with more than the simplest animation blocking. “A personal view on how generative AI is developing these days – some companies create software that seems to want to replace the artist. A healthier attitude, one that protects the artists and, thereby, also the business we work in at large, is to focus AI development on the boring and repetitive tasks, leaving the artist time to concentrate on facets of the work that require aesthetic and creative input. It seems to me a safer gamble that future creative industries should have artists and writers at their core, rather than machines,” Newton says.

For Digital Domain, the focus has always been to marry artistry with technology and make the impossible possible. “There is no doubt that generative AI will be here to stay and utilized everywhere; we just need to make sure to keep a balance,” Cramer adds. “I hope we keep giving artists the best possible tools to make amazing content. Gen AI should be part of those tools. However, I sure hope gen AI will not be utilized to replace creative steps but rather to improve them. If someone with an artistic background can make fancy pictures, imagine how much better an amazing artist can utilize those.”

There has been a surge in new technologies over the past few years that have drastically helped to improve realistic facial animation. “The reduced cost and complexity of capturing and processing high resolution, high frame rate and multi-view video of a performance have made it easier to capture a facial performance with incredible detail and fidelity,” James says. “Advances in machine learning have made it possible to use this type of capture to build facial rigs that are more expressive and lifelike than previous methods. Similar technology allows these rigs to perform in real-time, which improves the experience for animators; they can work more interactively with the rig and iterate more quickly over ideas. Real-time rendering from game engines allows animators to see their work in context: they can see how a shadow might affect the perception of an expression and factor that into their work more effectively. The overall trend is away from hand-tuned, hand-sculpted rigs, and towards real-time, data-driven approaches.”
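The data-driven rigs James describes still typically evaluate to something like the classic linear blendshape model, which is one reason they can run at interactive rates. Below is a minimal numpy sketch of that core; the shapes and weights are randomized stand-ins, not captured data.

```python
import numpy as np

rng = np.random.default_rng(0)
num_vertices, num_shapes = 5000, 60

neutral = rng.standard_normal((num_vertices, 3))  # rest-pose mesh
# Each blendshape stores per-vertex offsets from neutral; in a
# capture-driven rig these deltas come from scanned or solved expressions.
deltas = rng.standard_normal((num_shapes, num_vertices, 3)) * 0.01

def evaluate_rig(weights):
    """Posed mesh = neutral + weighted sum of blendshape deltas.
    One tensor contraction, hence fast enough for real-time interaction."""
    return neutral + np.tensordot(weights, deltas, axes=1)

weights = np.zeros(num_shapes)
weights[12] = 0.8          # dial in a single expression channel
print(evaluate_rig(weights).shape)  # (5000, 3)
```

Production rigs layer correctives and skinning on top, but this weighted-sum core is what both hand-sculpted and capture-derived shapes plug into.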

Cramer supports the view that both machine learning and AI have had a serious impact on facial animation. “We use AI for high-end 1:1 facial animation, let’s say, for stunts. This allows for face swapping at the highest level and improves our lookdev for our 3D renders. In addition, we can control and animate the performance. On the 3D side with Masquerade3, we use machine learning to generate a 4D-like face mask per shot. Many aspects of our pipeline now utilize little training models to help streamline our workflow and make better creative decisions.”

On Peter Pan & Wendy, Framestore relied on advanced facial technology to capture Tinker Bell. “We worked with Tinker Bell actress Yara Shahidi, who performed the full range of FACS units using OTOY’s ICT scanning booth. The captured data was solved onto our CG facial rig via a workflow developed using a computer vision and machine learning-based tracking and performance retargeting tool,” Newton details. “This created a version of the facial animation for Tinker Bell that the animators could build into their scenes. This animation derived from the solve required tightening up and refinement from the animators, which was layered on top in a non-destructive way. This workflow suited this show, as the director wanted to keep Yara’s facial performance as it was recorded on the CG character in the film. Here, the technology was very useful in reducing the amount of time it might otherwise have taken to animate the facials for particular shots.”
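Framestore’s retargeting tool is proprietary, but the general shape of such a solve can be sketched: fit a mapping from tracked facial landmarks to rig control values using the FACS range-of-motion capture as training pairs, then apply it per frame. In the Python sketch below, ridge regression stands in for the deep learning model and all the data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
num_frames, num_landmarks, num_controls = 400, 68, 50

# Training pairs: tracked 2D landmarks -> matching rig control values,
# here faked with a random linear map plus noise.
X = rng.standard_normal((num_frames, num_landmarks * 2))
true_map = rng.standard_normal((num_landmarks * 2, num_controls)) * 0.05
Y = X @ true_map + 0.01 * rng.standard_normal((num_frames, num_controls))

# Fit the retargeting map with closed-form ridge regression.
lam = 1e-3
W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ Y)

def solve_frame(landmarks_2d):
    """Retarget one frame of tracked landmarks to rig control values."""
    return landmarks_2d.reshape(-1) @ W

controls = solve_frame(rng.standard_normal((num_landmarks, 2)))
print(controls.shape)  # (50,)
```

The non-destructive animator pass Newton mentions would then sit on top as additive offsets to these solved control values.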

Luckham concludes: “For facial capture specifically, I would say new technologies have generally improved realism, but mostly indirectly. It’s not the capturing of the data literally, but more how easy we can make it for the actor to give a better performance. I think markerless and camera-less data capturing is the biggest improvement for actors and performances over technology improvements. Being able to capture them live on set, rather than on a separate stage, adds to the filmmaking process and the involvement of the production. Still, at the moment, I think the more intrusive facial cameras and stage-based capturing do get the better results. Personally, I would like to see facial capture become a part of the on-set production, as the acting you would get from it would be better. Better for the actor, better for the director and better for the film as a whole.”

DNEG primarily uses a FACS (Facial Action Coding System) based system to plan, record and control the data on the animation rigs, but tries to be as flexible as possible in what method they use to capture the data from the actors, as sometimes they could be using client vendors or client methods. (Images courtesy of DNEG)

Ariel’s digital double for The Little Mermaid was one of the most complicated assets Framestore has ever built. It involved replacing all of Halle Bailey’s body, and at times her face, with her mermaid form. (Image courtesy of Walt Disney Studios)

On Morbius, where Masquerade3 was used for facial capture, most faces were fully keyframed with the help of HMCs, as they had to perfectly match the actor’s face to allow for CG transitions. (Image courtesy of Digital Domain and Columbia Pictures/Sony)


