Real-Time CG Humans Hit the Big-Time

By IAN FAILES

Epic Games’ Unreal Engine was used in the Siren project to craft a photoreal digital actor based on a live actor’s performance. (Image courtesy of Epic Games)

Just as photoreal CG humans in film and television garner much attention, so too in recent times has the proliferation of real-time CG humans. These were once almost the sole domain of games, where they certainly still exist either in pre-rendered scenes or in fully interactive real-time rendered gameplay.

But with game engines and performance capture tools being used more frequently in cinematic productions and in non-game experiences, real-time CG humans are finding their place elsewhere as well.

Synthetic humans now tend to appear in shorts, trailers, AR and VR experiences, on-stage events, demonstrations and applications. VFX Voice checked in with a range of game-engine companies, studios and academic researchers for a wide overview of where things are with real-time CG humans.

Actor Andy Serkis was replicated using an Unreal Engine workflow. (Image courtesy of Epic Games)

A WIDE RANGE OF PROJECTS IN UNREAL

Epic Games’ Unreal Engine lies behind many of the real-time digital humans in existence, in terms of projects emanating from Epic itself, through partners or by outside studios using Unreal. Last year, Epic also acquired 3Lateral, a leader in digital human tools.

The result is an ecosystem casting a wide net over real-time humans, as Epic Games Lead Tech Animator Chris Evans describes: “From the Senua demo, a cut scene featuring both an actor and director in front of thousands, orchestrated in under five minutes to ‘Meet Mike,’ a VR project where a digital host interviewed many industry pioneers in a shared virtual space, to Siren, a high-fidelity character that interacted live with an audience and a virtual version of actor/director Andy Serkis.”

In working within the realm of real-time digital humans, Evans says that 3Lateral, in particular, has been concentrating heavily on the issue of bringing creations out of the so-called uncanny valley. “Our team at 3Lateral have dedicated themselves to solving this problem for a number of years, and are relentlessly driven to not only capture accurate human likenesses, but to do so with all the subtleties and nuances required to craft believable, real-time digital humans.”

Epic’s aim is to enable real-time digital humans for interaction and reaction with users, something of course not possible with pre-rendered characters. “This level of versatility opens many more uses for digital characters, whether it’s in healthcare, training, simulation, entertainment and beyond,” suggests Evans. “Looking forward, one area that we often discuss internally is the future of user-driven digital personas existing and interacting within shared digital environments – commonly dubbed as ‘the metaverse.’”

**The central character in Unity’s The Heretic seen in the Unity user interface. (Image courtesy of Unity Technologies)**

UNITY’S CINEMATIC HUMANS

Unity Technologies, the makers of the Unity game engine, have also jumped deep into digital humans, especially via their own short film projects, including The Heretic. To make the digital character for this short, Unity’s Demo Team took advantage of various off-the-shelf 3D and 4D scanning services, as well as rigging and motion-capture solutions.

“Creating a digital human for a real-time project has already been done in several AAA games, and specifically a lot of innovative work has happened in the creation of game cinematics,” notes Unity producer Silvia Rasheva. “For The Heretic, we wanted to raise the quality above what we’ve seen so far in the area.”

The team relied upon Unity’s High Definition Render Pipeline (HDRP) for rendering. “While HDRP was still in preview when we started planning and actually building the character in Unity, it already provided a lot of what we needed in terms of core rendering features,” notes Unity Senior Software Engineer Lasse Jon Fuglsang Pedersen. “For instance, we knew that we needed a proven solution for subsurface scattering, and HDRP by then already had a real-time implementation of the Disney Burley model that we were able to use directly for subsurface scattering on skin, eyes and teeth, with different diffusion profiles.

“For our next step on this journey,” continues Pedersen, “we would like to increase the fidelity. For example, based on the quality of the 4D data that we are now getting from our vendors, we have observed that the resolution of our neutral is a bit too low to reproduce some of the smallest details in the actor’s performance. And in terms of data, Unity can handle more than we threw at it this time. So for our next project we will focus a bit on optimization, and we will be less conservative with our data budgets, and we will see where that takes us.”
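For readers curious about the Disney Burley model that Pedersen mentions, the published normalized diffusion profile can be written in a few lines. The sketch below is the simplest scalar form of that formula, not HDRP's implementation; the function names and the shape-parameter values are assumptions chosen for illustration.

```python
import math


def burley_profile(r: float, d: float) -> float:
    """Burley normalized diffusion profile R(r) for radius r > 0.

    d is the shape parameter: a larger d spreads light farther under the surface.
    """
    return (math.exp(-r / d) + math.exp(-r / (3.0 * d))) / (8.0 * math.pi * d * r)


def integrated_energy(d: float, r_max: float = 50.0, steps: int = 20_000) -> float:
    """Midpoint-rule estimate of the integral of R(r) * 2*pi*r over [0, r_max]."""
    dr = r_max / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        total += burley_profile(r, d) * 2.0 * math.pi * r * dr
    return total


# Hypothetical shape parameters, in arbitrary units, for two materials.
# A real diffusion profile would use one value per color channel.
HYPOTHETICAL_PROFILES = {"skin": 1.0, "teeth": 0.5}

for name, d in HYPOTHETICAL_PROFILES.items():
    print(name, round(integrated_energy(d), 4))
```

The profile is normalized so that it integrates to one over the surface, which means changing the shape parameter redistributes scattered light without adding or removing energy. That property is what makes it practical to author separate profiles for skin, eyes and teeth.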

4D scan data for the Unity digital human. (Image courtesy of Unity Technologies)

**A final render of The Heretic character. (Image courtesy of Unity Technologies)**

Digital Domain’s Doug Roble as his real-time DigiDoug alter ego. (Image courtesy of Digital Domain)

Doug Roble is captured in the Light Stage to acquire his likeness for DigiDoug. (Image courtesy of Digital Domain)

Elbor is a digital character created by Roble, who morphed into Elbor (Roble’s name backwards) during a real-time TED talk event in 2019. (Image courtesy of Digital Domain)

DIGITAL DOMAIN KNOWS HUMANS

VFX studio Digital Domain works in the area of both pre-rendered digital humans and real-time ones. Senior Director of Software R&D Doug Roble famously donned an Xsens suit and helmet-mounted camera for a TED Talk featuring himself and his real-time rendered alter ego DigiDoug. Since then, the studio has continued to extend its research in the area.

“Since TED,” says Roble, “we have been able to greatly improve the quality of the facial capture and performance by running massive hyperparameter searches where we automate the process of generating and testing thousands of different deep-learning capture models. Through this technique we can create new mathematical models that more accurately capture the subtle motions in a face and represent them as moving geometry.” Other efforts, such as having someone other than Roble drive DigiDoug and making the character autonomous, are underway. The studio has also been able to take advantage of real-time ray-traced rendering in Unreal Engine using NVIDIA’s GPUs to render DigiDoug. They are looking beyond standard rendering techniques, too. “We’re currently developing a whole new way of rendering the character – a neural rendering technique – that takes the realism of DigiDoug to a whole new level,” adds Roble. “In this new deep-learning technique, we are teaching a renderer how to produce more realistic versions of Doug’s face without having to do all the hard work of painstakingly re-creating all the details.”

Digital Domain’s real-time digital human technology has been adapted for several projects. It was used to build a real-time Pikachu that was driven by Ryan Reynolds as part of the Detective Pikachu publicity run. “We are also using the same technologies to generate super high-fidelity digital human characters for the next generation of gaming platforms,” states Digital Domain’s Darren Hendler, Director of the Digital Human Group at the studio. “Not only are these using our newer capture systems, but they are also utilizing new real-time facial-rigging platforms that we have developed. Recently, we also created a real-time digital version of Martin Luther King Jr. for a high-profile historical project for TIME.”

K-pop band member Akali makes an appearance at the League of Legends Pro League finals, care of Cubic Motion’s Persona system. (Image courtesy of Cubic Motion)

DIGITAL HUMANS ON STAGE

Another avenue for CG humans has been live broadcasts. For example, at China’s League of Legends Pro League (LPL) finals – an esport event showcasing Riot Games’ League of Legends – a band member of the fictional K-pop band K/DA was shown during a live broadcast both dancing and being interviewed in real-time. This band member, Akali, came about via Cubic Motion’s Persona system.

“Using the latest in computer vision, Persona reads, records and translates an actor’s performance onto their digital counterpart in real-time,” describes Cubic Motion Product Manager Tony Lacey. “Designed from the ground up for live performance, Persona enables immediate character animation in game engines such as Unreal Engine 4.”

A game rig of Akali was retrofitted for the performance and enhanced to add a large number of Facial Action Coding System (FACS)-based expressions. For the interview portion of the broadcast, a motion-capture volume was built in a room just behind the stage.

“Here,” says Lacey, “our partners at Animatrik installed and operated an OptiTrack body performance-capture system. The actress who played Akali, Jeon So-Yeon, was in the volume wearing a mocap suit and the Persona system. The actress was also wearing a microphone and had an audio feed from the stage, so she could participate in a live interview. Her body and facial performance data was all sent to the Pixotope, Future Group’s cross-modality rendering system for AR.”

DRIVING EMOTIONS WITH SPEECH

Real-time rendered humans allow, of course, the chance for real humans to interact with their digital counterparts. One research group has developed a framework for driving a digital human with emotions via speech. This is known as the Matt AI project from Tencent Technology Company Limited’s NEXT Studios and AI Lab, which had also worked on Siren. Here, the digital Matt can ‘talk’ and answer questions, with the facial-animation nuances produced via speech.

“The core of speech-driven facial animation with emotion is to learn a general and accurate function that maps the input speech to the controllers of our facial rig,” details Tencent’s Jingxiang Li. “We achieved this goal via deep learning and mainly focused on two key ingredients. Firstly, we constructed a large-scale high-quality multi-emotion training dataset, which contains actor’s speech and rig controls corresponding to the actor’s performance frame-by-frame. Next, we trained a deep neural network based on the training dataset.”

Li believes the technology could be used as a tool for lip sync and to automatically generate facial animation for game development, especially for games that have a large amount of dialogue. “It also,” says Li, “made it possible to bring life to a voice assistant by creating an avatar and driving that avatar with speech.”
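The idea Li describes, learning a function from per-frame speech input to facial-rig controller values, can be illustrated with a toy example. In the sketch below a simple linear model trained by stochastic gradient descent stands in for the deep neural network, and the dataset is synthetic. The feature count, control count, weights and function names are all assumptions made for illustration and do not come from the Matt AI project.

```python
import random


def predict(weights, x):
    """Map one frame of speech features to one value per rig control."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def make_dataset(n_frames, n_features, true_weights, rng):
    """Synthetic stand-in for frame-by-frame (speech features, rig controls) pairs."""
    frames = []
    for _ in range(n_frames):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
        frames.append((x, predict(true_weights, x)))
    return frames


def mean_squared_error(weights, frames):
    """Average squared difference between predicted and target rig controls."""
    total, count = 0.0, 0
    for x, y in frames:
        for p, t in zip(predict(weights, x), y):
            total += (p - t) ** 2
            count += 1
    return total / count


def train(frames, n_features, n_controls, epochs=50, lr=0.1):
    """Fit a linear speech-to-rig map with per-frame gradient steps."""
    weights = [[0.0] * n_features for _ in range(n_controls)]
    for _ in range(epochs):
        for x, y in frames:
            pred = predict(weights, x)
            for c in range(n_controls):
                err = pred[c] - y[c]
                for f in range(n_features):
                    weights[c][f] -= lr * err * x[f]
    return weights


rng = random.Random(0)
# Hypothetical ground truth: 3 speech features drive 2 rig controls.
TRUE_W = [[0.8, -0.3, 0.1], [0.0, 0.5, -0.6]]
frames = make_dataset(200, 3, TRUE_W, rng)
untrained = [[0.0] * 3 for _ in range(2)]
weights = train(frames, n_features=3, n_controls=2)
print("error before:", mean_squared_error(untrained, frames))
print("error after:", mean_squared_error(weights, frames))
```

A production system would replace the linear map with a deep network, take real audio features as input and learn from recorded actor performances, but the training loop has the same shape: predict rig controls for a frame, compare them with the captured controls, and adjust the model.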

A DIGITAL HUMAN FROM JUST ONE PHOTOGRAPH

While 3D scans, performance capture and other methods can make hugely detailed digital humans, there are new ways of producing high-quality versions with far fewer inputs. For instance, Reallusion has released an AI-based head generator called Headshot as a plug-in for its Character Creator toolset.

“All you need is to load one photo of a face and the Headshot AI will analyze the image, provide a sculpt morph with additional customization and generate a realistic digital double in minutes,” says Reallusion Vice President of Product Marketing John C. Martin. “The major shift is that most digital humans are demos and single-character research examples, but Reallusion has developed and released a product that is ready to start generating characters for films, games, immersive experiences and even virtual production.”

“Headshot Pro Mode allows game developers and virtual production teams to immediately funnel a cast of digital doubles into iClone, Unreal, Unity, Maya, ZBrush and more,” notes Martin. “The idea is to allow the digital humans to go anywhere they like and give creators a solution to rapidly develop, iterate and collaborate with real-time.”

Other applications exist in this space, too. Pinscreen, for example, provides an avatar-creation app based on deep-learning research relating to the generation of a realistic digital-face re-creation with facial features and hair from a single smartphone photo.

Matt AI from Tencent’s NEXT Studios and AI Lab uses speech to animate emotions. (Image courtesy of Tencent Technology Company Limited)

Using just one input image, Reallusion’s Headshot can generate a 3D avatar suitable for use in real-time projects. (Image courtesy of Reallusion)

Headshot works within Reallusion’s Character Creator tool. (Image courtesy of Reallusion)

THE FUTURE OF REAL-TIME CG HUMANS

Ultimately, there is a wealth of real-time CG human projects out there. Some, of course, aim to render characters completely photorealistically, while others are going for a recognizable likeness. The advent of ‘deep fakes’ that can work in real-time is also now a part of these developments. The uses of real-time CG humans remain widespread, including all those mentioned above, as well as in other areas such as communicating over distances, clothes shopping, personalized gaming and generating synthetic social media identities.