By BARBARA ROBERTSON
Actors performing while balancing helmets fitted with cameras pointed at their faces may be an odd sight, but it’s a common one on film productions when those actors play characters that become CG.
The motion-capture gear allows the “CG characters” to interact with live-action actors on set, directors to direct them, the director of photography to light them, and camera operators to frame them. It has helped make the integration of CG characters into live-action films seamless, and it’s pushed forward the path toward creating believable digital humans. But it’s still awkward.
Moving one step closer to the point when actors don’t have to wear silly pajamas and head cameras, ILM and Disney Research have worked together to make a markerless, high-fidelity, performance-capture system production-worthy. First prototyped in 2015 by Disney Research, the system they call Anyma has evolved, but had not been used for a film until ILM implemented it to create “Smart Hulk” in Avengers: Endgame.
“I think it’s a revolution,” says Thabo Beeler, principal research scientist at Disney Research. “You can capture facial performances and preserve all the skin sliding with witness cameras. It gives the actor freedom to move around.”
“It’s the next evolution in getting closer to mixing digital and actor performances,” says Russell Earl, Visual Effects Supervisor at ILM on Avengers: Endgame. “The fidelity is pretty amazing.”
You can see the result on CG Hulk’s face as the character looks and performs with the sensitivity and humor of actor Mark Ruffalo. ILM created 80% of the Smart Hulk shots according to the studio, using the Anyma technology for the first time (Framestore did the hangar sequence for the time-travel testing). Doing so meant developing and modifying in-house tools during postproduction to accommodate Anyma’s high-fidelity data.
“It was a great leap of faith,” Earl says, “but when we first saw the data, it was ‘Oh my gosh, this is great.’”
To set the stage for Anyma, the team first did a facial scan of Ruffalo using Disney Research’s Medusa system to measure and describe his face. Unlike other performance-capture systems, Anyma needs only about 20 shapes. These are not FACS-based [Facial Action Coding System] phonemes and expressions that try to activate muscles but, instead, a few extreme positions – for example, everything lifted, or everything compressed, or both eyebrows at once.
The system used those shapes from Medusa to automatically build a digital puppet that could be driven by Anyma. That is, the scanned data was integrated and fit to an underlying skull. The system fit the skull, jaw and eyes to Ruffalo’s Medusa scan, based on forensic measurements of a typical male his age with his BMI [Body Mass Index], to create a digital puppet.
“It doesn’t have to be anatomically correct,” Beeler says. “It just provides constraints for later on. The specialty of this puppet is that it has a notion of the underlying skull and the skin thickness. The skin thickness measurements indicate where the skull could be. Same for the jaw and other bony structures. Looking at just the skin is limiting – it doesn’t do well in performance capture.”
That insight – thinking of the face as an anatomical structure rather than a shell – is one part of Anyma’s success.
“The secret sauce is the anatomically-inspired model,” Beeler says. “Not anatomical muscle simulation. Anyma has a notion of the underlying bone structure and the tissue between the skin and bone. It’s data driven.”
Part two of the secret sauce is that the researchers separated transformation from deformation – that is, they separated how a region of the face moves from how it changes shape. They did this by dividing the face into small patches of approximately one centimeter by one centimeter.
“If you look at a small patch on a face, it doesn’t do all that much,” Beeler says. “It stretches a bit (and) forms wrinkles and patterns. As a whole, the face can do much more; a small patch is not that complex. So Anyma chops up the face into a thousand small patches. Then, for each of those patches it builds a deformation space. And it learns how that patch can be formed from the input shapes.”
The problem with that approach is the resulting digital face could do anything – rip apart, blow up. To avoid that, the second thing Anyma learns is the relationship of the tiny skin patches to the underlying bone structure. In other words, how the eyebrow skin, for example, relates to the bone.
“If you take those two things – the patches and the relationship of the bone to the skin surface – you can put them together into a face that is realistic,” Beeler says. “You can create a digital model that preserves the anatomical plausibility of a face. You can also meaningfully extrapolate what a particular face can do from the shapes you feed into the system.”
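The first of those two ingredients, a small deformation space learned per patch, can be illustrated with a minimal Python sketch. It is not Anyma's code: the function names, the use of Kabsch alignment to strip out a patch's overall motion, and the plain least-squares fit against the input shapes are all assumptions made for illustration, and the second ingredient (the learned relationship between skin and bone) is left out entirely.

```python
import numpy as np


def remove_rigid_motion(patch, reference):
    """Rigidly align `patch` (N x 3) to `reference` (N x 3) via Kabsch.

    Hypothetical helper: whatever difference remains after alignment is
    treated as deformation rather than rotation or translation.
    """
    patch_centroid = patch.mean(axis=0)
    reference_centroid = reference.mean(axis=0)
    p = patch - patch_centroid
    r = reference - reference_centroid
    # The SVD of the cross-covariance matrix yields the best-fit rotation.
    u, _, vt = np.linalg.svd(p.T @ r)
    # Flip one axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(u @ vt))
    rotation = u @ np.diag([1.0, 1.0, d]) @ vt
    return p @ rotation + reference_centroid


def fit_patch_weights(aligned, neutral, deltas):
    """Express an aligned patch as a blend of the input shapes.

    `deltas` has shape (K, N, 3): each entry is one input shape's offset
    from the neutral patch. Returns K least-squares blend weights.
    """
    basis = deltas.reshape(len(deltas), -1).T      # (3N, K)
    offset = (aligned - neutral).ravel()           # (3N,)
    weights, *_ = np.linalg.lstsq(basis, offset, rcond=None)
    return weights
```

In this sketch the alignment and the weight fit are two separate steps. A production solver would more likely estimate motion and deformation together, because aligning an already deformed patch to the neutral one can absorb a little of the deformation into the estimated motion.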
Although the team had captured Ruffalo’s performance using head cameras, he could have been photographed with a single witness camera. The system creates the digital facial shape, synthesizes an image from that estimate, and minimizes the difference between it and the image of the actor’s face using optical flow until there is a good convergence. The result is an accurate per-frame mesh of high-resolution geometry, a frame-by-frame reproduction of Ruffalo’s facial performance.
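That estimate, synthesize, compare, and refine loop can be sketched in a few lines of Python. This is a toy under stated assumptions, not the Anyma solver: the "renderer" here simply returns blended 3D points instead of an image, the comparison is a direct point-to-point difference rather than optical flow, and the names `synthesize` and `fit_frame` are hypothetical.

```python
import numpy as np


def synthesize(weights, neutral, deltas):
    """Toy stand-in for rendering: blend the shape offsets onto the neutral."""
    return neutral + np.tensordot(weights, deltas, axes=1)


def fit_frame(observed, neutral, deltas, step=0.5, tol=1e-12, max_iters=100):
    """Refine blend weights until the synthesized shape stops improving.

    Returns the fitted weights and the number of iterations used.
    """
    weights = np.zeros(len(deltas))
    basis = deltas.reshape(len(deltas), -1).T      # (3N, K)
    previous_error = np.inf
    for iteration in range(1, max_iters + 1):
        # Compare the current synthesis with the observation.
        residual = (observed - synthesize(weights, neutral, deltas)).ravel()
        error = float(residual @ residual)
        if abs(previous_error - error) < tol:
            break                                  # converged
        previous_error = error
        # Damped Gauss-Newton update: move part of the way toward the
        # weights that would best explain the remaining difference.
        update, *_ = np.linalg.lstsq(basis, residual, rcond=None)
        weights = weights + step * update
    return weights, iteration
```

Because the toy model is linear, a single undamped step would already land on the answer; the damping is there only to make the iterate-until-convergence structure visible.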
“But it is not hooked up in a way that an animator can play with it,” Beeler says. “We know where each point moves over time, but it’s difficult to put that onto a rig.”
That’s where ILM comes into the picture.
“The Anyma solve provided us with an accurate mesh of Mark Ruffalo’s face moving,” Earl says. “We had to take that mesh and apply it to Hulk’s face. We had to put the output of Anyma into a useful state. We knew we had to raise the bar to make sure the results were as accurate and highly detailed as we could get. So we rebuilt our entire system of in-house retargeting.”
In the past, the artists running the retargeting solvers might use corrective shapes to adjust the result of captured data transferred from a digital model of an actor’s face onto the targeted CG character’s face.
“But when you correct subjectively, you get to the point where a character like Smart Hulk doesn’t look like him anymore,” Earl says. “You can get off model quickly. What we were trying to do with the new retargeting tool was get as close as we could to a clean solve.”
Modelers had built Smart Hulk after looking at the scan data for Ruffalo, the shapes generated for Medusa for the facial library, and artwork for Smart Hulk. The retargeting system worked to apply data to that model in somewhat the same way as Anyma.
“It looks at the mesh, applies it to Smart Hulk’s facial library broken into shapes similar to Ruffalo, and tries to hit the same deformations on Ruffalo,” Earl says. “It’s a learning system. It looks at the Hulk face library, at what curves it needs to drive those shapes.”
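One simple way to picture that kind of retargeting is as a transfer of shape weights: find the per-frame weights that reproduce the actor's mesh from the actor's shape library, then play the same weight curves through the character's corresponding library. The Python sketch below assumes exactly that, with ordinary least squares standing in for ILM's learning system, whose internals the article does not describe. All function and variable names are hypothetical.

```python
import numpy as np


def solve_weight_curves(actor_frames, actor_neutral, actor_deltas):
    """Return an (F, K) array holding one weight curve per library shape.

    actor_frames: (F, N, 3) captured mesh, one entry per frame.
    actor_deltas: (K, N, 3) library shapes as offsets from the neutral.
    """
    basis = actor_deltas.reshape(len(actor_deltas), -1).T            # (3N, K)
    offsets = (actor_frames - actor_neutral).reshape(len(actor_frames), -1).T
    # Solve every frame at once; each column of `offsets` is one frame.
    curves, *_ = np.linalg.lstsq(basis, offsets, rcond=None)         # (K, F)
    return curves.T


def apply_weight_curves(curves, character_neutral, character_deltas):
    """Drive the character's matching shapes with the solved curves.

    character_deltas: (K, M, 3); the character may have a different
    vertex count M, as long as shape k means the same thing in both
    libraries. Returns an (F, M, 3) animated mesh.
    """
    return character_neutral + np.tensordot(curves, character_deltas, axes=1)
```

The sketch only works if the two libraries correspond shape for shape, which matches Earl's description of a Hulk library "broken into shapes similar to Ruffalo."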
The goal for the retargeting system and the artists using the system is to produce Maya scene files for animators to use. “We get controls at a gross level and then more finite detail,” Earl says. “That finite detail is what gives us a better result. In the past, because we didn’t have that finite detail, we had to go outside and drive a change with shapes. We’d lose nuance and detail. Now, we keep as much solve data as we can every step of the way. We don’t get rid of it; we creatively adjust it. We can dial it in and out. It is the perfect combination.”
In addition, animators had deformers that could add subtleties to the facial performance – for sticky lips, a little more skin slide, pucker, and so forth.
“It was late in the show when we had all the pieces together,” Earl says. “We tried to use the system as we made the changes, and ended up solving shots multiple times. But by the time we got the last shots, everything was in place. We had a system built and running smoothly. We could do a thousand shots. And the show came to an end.”
Even though Anyma did its calculations in parallel – 1,000 cores could solve 1,000 frames – at first the solves needed to run overnight. Initially, retargeting took days – the crew was rebuilding the system during production – but Anyma got faster, retargeting took only a day, and the system worked. It was worth it.
“When we first saw the result we thought, ‘OK, we’ve made the right choice,’” Earl says. “We knew Smart Hulk could be onscreen with close ups, in daylight, and in scenes with other actors, and he would have to hold up to that scrutiny. The key thing for us was making sure we could capture Mark Ruffalo’s facial performance with his body – that it didn’t require a separate ADR later. We had him on set with the other actors and captured that natural performance where he’s there in the moment.”
Earl adds, “I think we’ll get to the point where actors don’t have to wear silly pajamas and head cameras. It’s coming.”