The final demonstration during a day-long visit to Meta’s research offices in Washington State was by far the most memorable – true telepresence.
Achieving true telepresence in consumer hardware may be one of the most challenging ideas for consumers to prepare for. It may also be the key to unlocking the killer use case of virtual reality. Meta calls its approach to the challenge Codec Avatars, and the technology could change the world in ways we can hardly imagine right now.
“I think most people don’t realize what’s coming,” Yaser Sheikh, research director at Meta’s Reality Labs office in Pittsburgh, told UploadVR.
Earlier this year he suggested the technology was still “five miracles” away. Now?
“I would count three to four research miracles that we need to solve before they reach embedded computing on Quest Pro,” Sheikh said.
Codec Avatars aim to make you feel as though you are sharing a room, 1:1, with another person who isn’t physically present. If these miracles happen, the concept could transform communications in ways as fundamental as the telephone did. It’s perhaps the closest we’ve ever come to teleportation, and the term “telepresence” takes on new meaning in the context of the most impressive of the three Codec Avatar demos Meta has shown.
In the Codec Avatars 2.0 demo, I came face to face with Jason Saragih, research director for Meta in Pittsburgh. I moved my head from side to side and his eyes maintained eye contact with me. As we conversed, I moved a light around his face to highlight his skin and his expressions from different angles. All the while, I picked up the same kinds of subtle facial movements that I had picked up all day from people who shared the same physical room with me, like Mark Zuckerberg, Andrew Bosworth, Michael Abrash, and others.
The difference was that Saragih shared my space only virtually. He was physically located in Pittsburgh – over 2,500 miles by road from where I was in Washington State. He wore a prototype VR research headset that tracked his facial movements and I wore a standard PC-powered Rift S. We had a cross-country VR call that, for lack of a better description, crossed the “uncanny valley” in virtual reality. The first call of this fidelity made in the offices of Reality Labs took place in 2021.
“It’s actually probably the first time somebody’s done this in real time for a real person,” Sheikh said. “It was like what I imagine the moment was of Alexander Graham Bell talking with his assistant.”
So what’s the problem? Why can’t we immediately replace phones with Codec Avatars offering true telepresence? It’s too hard to do the right scan. Currently, creating an avatar that captures both a person’s face and body with this fidelity requires several hours in two different ultra-expensive scanning systems, followed by at least four weeks of processing time.
The “uncanny valley” refers to a well-known concept describing the discomfort people feel when seeing faces that look almost human but miss the mark in some significant way.
In film, you feel comfortably immersed in the story looking at something like the cute robot Baymax, but constantly distracted looking at more human faces in heavily motion-captured works like the 2004 film The Polar Express. You can chart the ‘likability’ or ‘familiarity’ of these faces, and as the ‘realism’ increases there is a point where people just get freaked out looking at something almost but not quite human. Likability drops and you end up with what looks like a valley on the chart. If you can get enough of the digital representation right, you can climb out the other side of the valley, where people start to accept the likeness again. Arguably James Cameron’s 2009 film Avatar is the most high-profile example of this valley being crossed in movies for long periods of time, as it featured digitized human-like characters that no longer distracted most viewers.
Doing the same thing in real time with a living human in stereoscopic virtual reality is a much more difficult task. Failure to meet this standard is why so many people criticized the avatar Zuckerberg showed off in Horizon Worlds earlier this year. That’s also why most real-time avatars from big companies are closer to cartoon babies than to photorealistic depictions – cartoon babies sit more comfortably on the left side of the valley. An article by The Information earlier this year, for example, referenced a 2019 demo of avatars at Apple, from teams developing its VR headset technology, that still fell into the uncanny valley.
And indeed, two more demos that Meta presented on either side of the live cross-country call, the most impressive avatar research demo the company has ever shown to the press, made clear how far there is still to go and how incredible the challenge ahead is.
On the right, Meta showed an avatar scanned with a cellphone. It took a 10-minute phone scan and a few hours of processing time. Meta recently showcased the same concept at SIGGRAPH 2022, and while it is impressive in its own way, the avatar’s fidelity fell deep into the uncanny valley.
On the left, Meta showed off a full-body avatar captured in what the company calls the “Sociopticon.” It contains 220 high-resolution cameras and requires hours of scanning and days of processing to produce a hyper-realistic digitized Codec Avatar. I could change this avatar’s clothes at any time because the pre-recorded person was performing the same moves over and over. As he jumped, I could walk around the avatar and see each new set of clothes bunch up and drape around his body, presumably just as I would expect with physical clothes.
Sheikh said the three demos represent the “miracles” needed to transform communications in the 21st century in the same way the telephone changed the 20th century.
“It’s actually the confluence of these three elements that we have to resolve,” he said. “You have to make it as easy as scanning yourself with a phone. It has to be Codec Avatars 2.0 quality and it has to have the full body as well.”
If there’s a fourth “miracle”, then it’s driving the hyper-realistic scanned Codec Avatar from the sensors and processors built into a VR headset that can be mass-produced and priced within reach of buyers.
“There are a lot of things to sort out,” Meta chief scientist Michael Abrash told UploadVR. “But I say the real miracle is how do you scale it?”
The Meta researchers said they wouldn’t allow anyone to drive someone else’s Codec Avatar. This was hinted at in 2019 with some of the first public revelations of this technology and reiterated during the recent SIGGRAPH demo of the phone-scanned version of the avatars.
“Internally, we all see the fact that authenticity — the fact that you can trust that when you’re here in VR or AR, you are who you are — the other source is trusted,” Sheikh said. “It’s kind of like an existential requirement of this technology.”
In the same way that Apple’s Face ID scans your face and headsets like HoloLens 2 or Magic Leap 2 authenticate users by their irises, Meta’s own headsets could one day do the same to ensure that the person wearing the headset is the one projecting their specific scanned Codec Avatar into the “metaverse”.
“What this thing will allow us to do is share spaces with people,” Sheikh said. “Videoconferencing allowed us to survive the pandemic, but we all realized in the end what was missing…it was the feeling of sharing space with each other.”