Tired of foreign dubbing trashing his movies, British film director Scott Mann, in collaboration with researchers from Germany's Max Planck Institute, has created an artificial intelligence capable of reproducing the movements of an actor's mouth and gestures with astonishing precision when a film is dubbed into another language.
In the video demonstration of the technology, which you can find a few lines below, we see Jack Nicholson and Tom Cruise performing the key scene of 'A Few Good Men' in French, Robert De Niro speaking perfect German, and Tom Hanks playing the role of Forrest Gump in German, Spanish and Japanese. In every shot there is perfect synchronization between the actors' mouths and the audio that is heard.
Mann tells 'Wired' that it all started with the horror he felt when he saw a foreign-language dubbed version of his feature film 'Heist', starring Robert De Niro. The dub replaced the film's original dialogue with lines that better matched the movement of the actors' mouths. For him, this destroyed the essence of scenes he had worked so hard to compose.
"I remember feeling devastated," says Mann. "If you make a small change in a word or in a performance, you can have a big change in a character, in the rhythm of the story, and in turn in the movie."
As a result, Mann became interested in 'deepfake' technology: an artificial intelligence algorithm that allows one person's face and gestures to be replaced with another's in an extraordinarily realistic way. This tool has been highly controversial in recent years because of its unethical uses. It has been used, for example, to insert the faces of famous women into pornographic videos or to create fake images with which to bully high school students.
In his search, Mann came across research led by Christian Theobalt, director of the Graphics, Vision & Video research group at the Max Planck Institute in Germany. Theobalt has created a technology, related to 'deepfakes' although much more complex, that makes it possible to modify the movements and gestures of actors' lips as if they were speaking in another language.
The German researchers' algorithm takes, on the one hand, an actor's facial expressions and movements and, on the other, the lip movements of a person reciting the text in another language. The result is an animated 3D model that perfectly matches the actor's face with the lip movement of the dubber. This model is then inserted into the film to replace the actor's original face.
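The retargeting idea described above, keeping the actor's head pose and expression but borrowing the dubber's lip motion, can be sketched very roughly in code. Everything here is a hypothetical illustration: the real system uses a learned 3D face model and neural rendering, not dictionaries of labels.

```python
# Illustrative sketch of the lip-retargeting step described in the
# article. All function names and data shapes are assumptions for
# illustration only, not the researchers' actual code.

def extract_face_params(frame):
    """Stand-in for a 3D face tracker: returns head pose, overall
    expression and lip parameters estimated from one video frame."""
    return {"pose": frame["pose"],
            "expression": frame["expression"],
            "lips": frame["lips"]}

def retarget(actor_frame, dubber_frame):
    """Keep the actor's pose and expression, swap in the dubber's
    lip motion, as the article describes."""
    actor = extract_face_params(actor_frame)
    dubber = extract_face_params(dubber_frame)
    return {"pose": actor["pose"],
            "expression": actor["expression"],
            "lips": dubber["lips"]}

# Toy frames standing in for tracked video data.
actor_frame = {"pose": "3/4 left", "expression": "angry",
               "lips": "EN: 'truth'"}
dubber_frame = {"pose": "frontal", "expression": "neutral",
                "lips": "FR: 'verite'"}

model = retarget(actor_frame, dubber_frame)
print(model)  # pose and expression from the actor, lips from the dubber
```

The combined model is what would then be rendered back over the actor's face in each frame of the film.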
"[This technology] will be invisible in no time," says Mann. "People will see something and they won't know whether it was originally shot in French or whatever."
The practical application of this tool in film production could be enormous. Not only because it can make dubbing more natural and credible, but also because it would allow directors to keep modifying dialogue after filming is finished. It is very common in any production to have to bring the crew and actors back together to re-shoot scenes that, for technical or artistic reasons, do not quite work in the edit. Directors like Woody Allen routinely ask their teams to reserve a couple of weeks after the main shoot for exactly these contingencies. But reshoots are expensive for producers and the money to carry them out is not always available, so a tool like this would save them a lot of headaches.
Mann is aware that actors may be shocked by this technology at first. "There is fear and amazement: those are the two reactions I get," he admits. Virginia Gardner, one of the actresses in his film, also spoke to 'Wired': "I think it's the best way, as an actor, to keep your performance in another language," she says. "If you trust your director and trust that this process is only going to improve a movie, then I really don't see any downside to it."
Mann has taken Theobalt's technology and is beginning to commercialize it through his company, Flawless. According to the director, he has already been in contact with several studios to create other-language versions of various films.
Although we have already seen technologies similar to this one, none had yet managed to be fully credible. It is only a matter of time, and not much, before it becomes impossible to distinguish manipulated videos from real ones. Within a very few years it will not only be possible to transform videos easily by processing them with tools like this; we will even be able to do it in real time.
This is how Nvidia's new video call technology works.
Last October, Nvidia introduced a new technology based on artificial intelligence that reduces the bandwidth needed for a video call to one tenth of what is used today. In addition, this neural network is capable of modifying the video during the call without loss of image quality or connection problems. The tool can correct the position of the head so that it is always facing the camera, or even show us as an avatar that moves its mouth and gestures exactly as we do.
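The bandwidth saving comes from a simple idea: instead of streaming full video frames, the sender transmits a reference image once and then only a small set of facial keypoints per frame, from which the receiver's neural network re-renders the face. A back-of-the-envelope calculation shows why this is so cheap; the specific numbers below (keypoint count, coordinate precision, the conventional bitrate) are illustrative assumptions, not Nvidia's published figures.

```python
# Rough bandwidth comparison: conventional video stream vs. sending
# facial keypoints only. All figures are illustrative assumptions.

FPS = 30  # frames per second of the call

# A typical 720p compressed video-call stream, in bits per second.
conventional_bps = 1_500_000

# Keypoint scheme: per frame, send only tracked facial points as
# 16-bit (x, y) coordinates; a neural network on the receiving end
# reconstructs the face from them.
keypoints = 130            # assumed number of tracked facial points
bytes_per_point = 2 * 2    # (x, y) as 16-bit integers
keypoint_bps = keypoints * bytes_per_point * 8 * FPS

ratio = conventional_bps / keypoint_bps
print(f"keypoint stream: {keypoint_bps} bps, ~{ratio:.0f}x less bandwidth")
```

Even with these crude assumptions the keypoint stream is an order of magnitude smaller, which is consistent with the "one tenth" figure cited above.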
Nvidia talks about a rebirth of video calling, and it is surely right. It will not be long before our video calls are never interrupted, always look well lit, and can even be held in whatever language we want, thanks to advances in real-time translation tools. In a few years, the Zooms and Skypes of today will seem like experiences from the last century. Cinema, from what we are seeing, is headed down the same path.