By TREVOR HOGG
In his 1950 paper Computing Machinery and Intelligence, English mathematician Alan Turing proposed a question-and-answer test of a machine's ability to imitate human intelligence without being detected. The Turing Test was the inspiration for the opening sequence of the science fiction classic Blade Runner, in which a suspected replicant kills his human interrogator. The academic theory and plot conceit are becoming a reality as Google, Apple, Facebook and Amazon collect massive amounts of data to further their development of artificial intelligence, which is seen as the next technological revolution and one that will impact everything from daily life, medicine and manufacturing to the creation of imagery. Each innovation brings excitement about making the once-impossible achievable, along with concerns about its societal impact. The visual effects industry views artificial intelligence, machine learning and deep learning as the means to streamline the creative and technical process and allow for quicker iterations for clients.
Hired during the COVID-19 pandemic by Weta Digital to be their CTO, Joe Marks was previously the Vice President and Research Fellow at Disney Research and Executive Director of the Center for Machine Learning at Carnegie Mellon University. “AI is a toolkit that has been evolving since the late 1950s. A lot of the early work was on heuristic search, searching amongst different permutations and combinations. Initially, it was thought that playing chess would be hard and human-perception-related issues, like speech and language, were going to be easy. It turns out that was exactly wrong!
“For classical AI,” continues Marks, “it is sitting down with an expert and saying, ‘Explain to me your expertise and I’ll try to code that into the machine by hand.’ With machine learning it is, ‘Give me data samples and I’m going to learn patterns from it.’” Marks has a humanistic attitude towards the technology. “I find that AI and computers in general are the most amazing tools that we’ve built, and they amplify humans. That’s what gets me excited about generating new AI-enabled tools that enhance artists, storytellers and musicians.”
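The distinction Marks draws can be illustrated with a toy greenscreen test. This is only a sketch under stated assumptions: the `greenness` score, the hand-picked threshold of 40 and the four labeled pixels are all hypothetical and do not come from any studio's tools. The first function encodes an "expert" rule by hand; the second learns its threshold from labeled samples.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255


def greenness(p: Pixel) -> int:
    """How far the green channel exceeds the stronger of red and blue."""
    r, g, b = p
    return g - max(r, b)


def handcoded_is_screen(p: Pixel) -> bool:
    """Classical approach: an expert's rule of thumb, typed in by hand.

    The threshold 40 is a hypothetical value chosen for illustration.
    """
    return greenness(p) > 40


def learn_threshold(samples: List[Tuple[Pixel, bool]]) -> float:
    """Learning approach: pick the threshold that best fits labeled samples.

    Candidates are midpoints between consecutive distinct greenness scores,
    so the samples must contain at least two distinct scores.
    """
    scores = sorted({greenness(p) for p, _ in samples})
    candidates = [(a + b) / 2 for a, b in zip(scores, scores[1:])]

    def errors(t: float) -> int:
        # Count samples whose predicted label disagrees with the true label.
        return sum((greenness(p) > t) != label for p, label in samples)

    return min(candidates, key=errors)


# Hypothetical labeled pixels: True means "part of the greenscreen".
samples = [
    ((20, 200, 30), True),
    ((40, 180, 60), True),
    ((200, 180, 160), False),
    ((90, 110, 80), False),
]
learned = learn_threshold(samples)
```

In the learned version nobody types in the number; it falls out of the examples, which is why the quality of the labeled data matters so much.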
Framestore partnered with software developers Weightshift on a research project meant to significantly shorten the animation process by combining ragdoll simulation, keyframe animation and machine learning, allowing animators to focus on the artistry rather than on repetitive and time-consuming tasks. The photoreal creatures of Lady and the Tramp and His Dark Materials benefitted from these techniques.
“The current implementation of machine learning is more around, ‘How do we do things faster, better and more efficiently?’” notes Theo Jones, Visual Effects Supervisor at Framestore. “The visual effects industry, like other industries, is getting quite squeezed for budgets and schedules. In order to operate a high-end, large-scale Hollywood production, you need a large, robust pipeline and infrastructure to push that through, so there is a machine aspect to it. One of the things that brought me into the industry and has kept me here for so long and satisfied is the talent of the people I work with. We’re a long way off of machines replacing that talent. There are certainly more low-level tasks, the grunt work of getting through the shots and work that machine learning is having quite an impact on.”
Frequent tasks such as greenscreen keys and rotoscoping can be mastered by machine learning, believes Johannes Saam, Senior Creative Technologist at Framestore. “Visual effects in general is in a unique position as we generate thousands of images a day that are already perfectly labeled because you usually know what you’re rendering. But for the longest time we didn’t have the computing power and the knowledge of machine learning, so that data was wasted. We can help the art direction with machine learning as we have algorithms and ways of supporting the style of certain people and creating versions.
“A tool that we have right now allows us to do AI casting,” continues Saam. “What that means is that we can take pictures of different famous or nonfamous people, feed them into a system, and mix and match them in a way that is smarter than putting them in Photoshop and blending them together. You can say, ‘Get me Trevor’s eyes and Johannes’ nose,’ and see what comes out of it. It’s almost like doing a police sketch. We can use that to hone in the casting. Even though those pictures aren’t yet animatable and fully rigged, and can’t replace extras yet, we can at least find our extras in a smarter way that is visual.”
LAIKA Studios entered into a partnership with Intel to harvest data from pre-production tasks and use it to train character-specific neural networks that scan frames and detect key points on a puppet’s face. “A lot of what we do in the Roto Paint Department is cleaning up the rigs and seams of the puppets,” states James Pina, Roto Paint Lead, Technical at LAIKA Studios. “What this allows us to do is to take away a lot of the repetitive part of the process and be more creative, and it gives us more time to do better work.”
Originally, it was thought that machine learning could provide general solutions for general problems. “We started feeding a bunch of images of puppets, but it wasn’t that useful as it wasn’t task-specific to what we were trying to do, which was to remove the seams on the puppets’ faces,” remarks Jeff Stringer, Director of Production Technology at LAIKA Studios. “It isn’t so much about quantity, but the quality of the data.” Machine learning has both scope and limits, says Stringer. “One of the things that we want to talk to Intel about next is ways to use sound files to drive a rig to get to the initial facial animation. It is important to give the machines the right tasks, like removing things from the frame. When you start to add things, you can get into some territory where you’re getting results that an artist wouldn’t do.”
Machine learning can still assist the creative process. “I would use AI to generate a set of random events and let the artists pick out what they want,” notes Narayan Sundararajan, Senior Principal Engineer & Director, Applied Machine Learning at Intel. “It is like augmenting the creative process versus replacing it, which is what you don’t want to do as that’s when you start creating the same thing over and over again.”
AI is seen not only as a technical tool but as an avenue for personalized storytelling, which is illustrated by Agence, an interactive experience created by Canadian Pietro Gagliano, Founder and Creative Director at Transitional Forms, and the National Film Board of Canada, where the user has the ability to observe and intervene with artificially intelligent creatures. “At the time we started this project,” Gagliano explains, “Unity had two factors that we were looking for. One was the ML-Agents toolkit so we could attach our reinforcement ideas to the game engine, and the other was Cinemachine, where we could build dynamic camera systems to judge what types of cuts could happen.”
The narrative was not meant to be restricted, Gagliano says. “The nature of Agence is to create as many dynamic systems as possible to get surprises. I believe there is a way to teach AI about the rhythm of storytelling and there is a way to treat storytelling as a technology that works for us humans. One of the things that we encountered quickly is that we lacked the data to support that initiative. We started into this idea of reinforcement learning because it’s synthetic data, so we were able to create simulations that created data again and again. That’s how we’re training the neural network to run these little creatures rather than the storytelling.”
Autodesk is designing software that utilizes artificial intelligence to break down images into objects. “The AI innovations we’ve shipped in Flame since 2019 include a depth extraction tool to generate Z-depth for a scene automatically, and a face-normal map generator to recognize human faces and generate normal maps for them,” explains Will Harris, Flame Product Manager at Autodesk. “From there we built tools that produce alpha mattes to output various things, the first of which was a sky tool and later a face matte keyer for human heads. These alpha matte tools are specialized object-recognition keyers. With the face tool, we can produce high-contrast mattes to track anything from bags under eyes to lips or cheeks across a sequence to generate a matte. Most recently we added a salient keyer which is like a Swiss army knife for extracting mattes. This feature can be customized for any object, cars, traffic lights, and can even recognize detailed features of a hand. These tools can help artists get 80% of the way to fixing a shot, saving hours of tedious extraction and shot prep work along the way.”
AI and machine learning require intensive computation. “While local compute resources are critical, for some processes, having cloud compute resources brought to bear on specific machine-learning training challenges is necessary due to processing requirements of generating these algorithms,” states Ben Fischler, Industry Strategy Manager at Autodesk. “AI and machine learning are already impacting rendering in areas like de-noising and image cleanup. We can expect more developments along these lines, particularly 2D/3D compositing processes for things like de-aging or frame interpolation for adding frames. There are techniques where you might need to add noise to frames – doing that during a render is demanding on turnaround times. However, if you could use AI to add noise to frames after the fact, like post noise, grain and texture, which are subtle effects that make images look photoreal, you could save a lot of time. There is also the possibility of using AI to render 12 frames per second and for figuring out and filling in the in-between frames; that frame interpolation concept has been demonstrated by NVIDIA.”
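To make the frame interpolation idea concrete, here is a deliberately naive sketch, assuming grayscale frames stored as nested lists of floats. It fills each gap with a plain cross-dissolve of the two neighboring frames, which is not the learned, motion-aware approach Fischler refers to and would ghost on anything that moves; the function names are hypothetical. It only shows where the in-between frames slot into a sequence.

```python
from typing import List

Frame = List[List[float]]  # rows of grayscale pixel values in [0.0, 1.0]


def blend(a: Frame, b: Frame, t: float = 0.5) -> Frame:
    """Linearly mix two same-sized frames; t=0 returns a, t=1 returns b."""
    return [
        [(1 - t) * pa + t * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(a, b)
    ]


def double_frame_rate(frames: List[Frame]) -> List[Frame]:
    """Insert one blended in-between frame after every frame but the last.

    A sequence of n rendered frames becomes 2n - 1 frames, so footage
    rendered at 12 frames per second plays back at roughly 24.
    """
    out: List[Frame] = []
    for current, following in zip(frames, frames[1:]):
        out.append(current)
        out.append(blend(current, following))
    if frames:
        out.append(frames[-1])  # the final rendered frame has no successor
    return out


# Two tiny 1x2 "rendered" frames, purely illustrative.
rendered = [[[0.0, 0.0]], [[1.0, 0.5]]]
interpolated = double_frame_rate(rendered)
```

A learned interpolator would replace `blend` with a model that estimates how pixels move between the two frames, which is where the machine learning comes in.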
“One of the biggest breakthroughs for us, in terms of speed, has been building gigantic, pre-trained models that can be guided by deep learning networks,” states Darren Hendler, Director, Digital Human Group for Digital Domain. “We train these on hundreds of thousands of images, so it can automatically refine data you want to improve. In practice, this might look like a job where you had to roto someone’s face. If your gigantic model has seen thousands of different people and objects and you give it a face to roto, it’s going to be able to learn to do exactly what you want much faster because of all the training that’s come before. This compounds over time, shrinking the time it takes to do a task until one day maybe it takes a couple of seconds.”
The next wave of machine learning will alter things for the better, Hendler believes. “What this will likely center on is augmenting or modifying a performance or something that already exists. Machine learning is exceptionally good at that. This could be turning a plate performance into a character, identifying objects in a scene, improving qualities of renders. However, it won’t be used for new creative performances that have no ties to the real world. Actors still need to drive those. Something else worth noting is that machine learning won’t be taking over the whole process, only parts of it.”
AI has not reached the point of being widely used. “Right now, AI and machine learning are mostly utilized for simpler projects and stunts, like face-swapping for a YouTube video,” remarks Nic Hatch, CEO at Ncam. “They are rarely the main drivers of traditional visual effects, although we see some exceptions from larger studios who have been designing custom tools. Overall, it’s easier to think of this as the beginning of a journey. Everything is still mainly research at the moment, but I anticipate we will start to see more AI-driven off-the-shelf products in the next three to five years, opening it up to studios of all sizes.”
As with the emergence and adoption of any new technology, there are fears of the workforce being reduced. “I don’t see it eliminating jobs,” Hatch says. “The world has a huge appetite for media right now, so there are more artists and more work than ever before. I have seen no evidence to suggest this will lessen. As the use of AI expands, it may simply change the way certain jobs are done. Currently, artists can spend a great deal of time on simpler, more mundane tasks. That type of work can potentially be done by computers so humans can spend more time on what they’re good at – ideation and creativity. AI has the potential to change the industry in many ways, but perhaps one of the most important will be allowing creatives to focus on the parts of the job they love and thereby reigniting a passion for visual effects.”