Videoconferencing Needs to Climb Out of the Uncanny Valley

Many people will keep working from home more often, but remote communication tools are still lacking. Tech companies are racing to add more presence to our telepresence.
Illustration: Sonia Pulido

In 2020 we all became Alice Nelson, the Brady Bunch housekeeper who occupied the center square in the show's iconic tic-tac-toe board of faces. Except today, the Marcias and Gregs around us are yammering about KPIs, cost-cutting, and sourdough starters—and proving that, yes, those annoying futurists were right. We can do business from home.

The benefits are obvious: no time lost getting to work, more flexibility for workers to care for children and elders, big savings on office leases, and, with fewer cars commuting on roads and fewer business travelers in the skies, way less carbon burned. Proof of concept in the bag, Silicon Valley is now gearing up to make remote conferencing our permanent reality.

Sure, in the long view, the solution to far-flung connecting might be virtual reality, in which we can share space in our corporate matrices from the comfort of home. But the pandemic has given us a taste of the Zoom life, and we are hungry for something better. And faster. It'll take a while before cumbersome VR headsets slim down for everyday use. That's why, inside fledgling startups and tech giants like Google and Facebook, designers are concocting products to make telepresence more compelling and are adding features unique to the digital realm: collaborative documents, whiteboards, AR, AI.

Getting to compelling won't be an easy task. “As soon as people can realistically see each other in person, the amount of video chat usage is going to decline dramatically, because it's a shitty version of being together,” says Jason Citron, CEO of Discord, a chat service originally designed for gamers. The drawbacks are plain: lack of eye contact, frozen screens, inability to read cues or jump into a conversation. So what are tech companies doing about it?

See Me, Feel Me

Jeremy Bailenson, a VR expert at Stanford University, says that we're now in a remote-conferencing uncanny valley. Eye contact alone exploits a nonverbal vocabulary as extensive as the Oxford English Dictionary. But with our cameras situated above the screen, current systems can't transmit those signals. This is particularly true in group meetings.

Remote eye contact is possible. Companies like Cisco have for years sold deluxe telepresence systems with a high-resolution camera on top of multiple large screens. The high-density image, low latency, and eye-to-eye lock create the sensation of an actual physical presence with your remote conversational partner. Alas, these room-size supersystems can cost six figures.

So Big Tech is trying other means to jack up verisimilitude at a more attainable price. Google has ramped up its Meet app with noise cancellation and low-light enhancement to make your lockdown punim less bleak. Facebook's Messenger Rooms allow ad hoc meetings, spicing up their images with photo filters, 360-degree backgrounds, and special effects like bunny ears. As for eye contact, “there are low-tech ways of approaching this, which involve things like redrawing the pixels that are painting your eyes to artificially adjust them slightly upward, to make it seem like you're looking at me,” says Javier Soltero, the general manager and vice president of Google's G Suite. But Google isn't committing to this, and Soltero also worries about fictionalizing people's images.

Replace the Face

Some companies want to ditch the videostream and replace our faces with evocative avatars, ones with expressive cues. Apple has already popularized smart “animoji,” which present you as an animal, or even poop, lip-syncing your words with appropriate facial expressions. Facebook just upped its game by giving people tools to create avatars that look like (idealized) versions of themselves. “What do we really need for that to work?” asks Philip Rosedale, who founded Second Life and now has a project called High Fidelity. “There has to be some tension, it has to have danger.” In other words, via expressions, the avatar has to feel human enough that there is something at stake.

The Conference Conundrum

Large gatherings—the ones we used to travel to—pose perhaps the biggest challenge. I've been “attending” a number of remote conferences of late, with mixed results. Some, like TED's first fully virtual conference, stream the talks and then use a standard conferencing system to let small groups talk with speakers après presentation. (The shift from lean-back to lean-in was jarring, like exiting a Finnish sauna in wintertime.)

A startup called Shindig tries to replicate the vitality of a panel discussion or the intimacy of a fireside chat. Audience members can privately video chat among themselves. High Fidelity uses a different approach: 3D audio, a kind of stereoscopic distribution of sound that simulates how voices are carried in real life, both in small groups and massive crowds. “You could have a stadium concert and you could hear 10,000 people roar in the distance,” says High Fidelity's Rosedale.
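For readers curious how a voice can be placed in space, here is a minimal sketch of one common textbook technique: equal-power stereo panning combined with a simple distance rolloff. It is not High Fidelity's actual method, which the article does not describe. The function name `stereo_gains`, the `rolloff` parameter, and the assumption that the listener faces the positive y direction are all hypothetical choices made for illustration.

```python
import math

def stereo_gains(listener_xy, source_xy, rolloff=1.0):
    """Return (left_gain, right_gain) for one voice.

    Hypothetical helper: assumes a flat 2D room and a listener
    facing the +y direction, so +x is to the listener's right.
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    distance = math.hypot(dx, dy)

    # pan runs from -1 (fully left) to +1 (fully right).
    pan = 0.0 if distance == 0 else dx / distance

    # Equal-power panning: map pan onto an angle between 0 and pi/2
    # so that left^2 + right^2 stays constant as the voice moves.
    angle = (pan + 1.0) * math.pi / 4.0

    # Assumed rolloff curve: voices get quieter with distance.
    attenuation = 1.0 / (1.0 + rolloff * distance)

    return attenuation * math.cos(angle), attenuation * math.sin(angle)
```

A voice directly ahead lands equally in both ears, a voice off to the right lands mostly in the right ear, and a distant crowd is scaled down toward a murmur.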

Become Part of the Machine

The conference lobby scrum, where you reconnect or try to make new friends or business contacts, is the hardest to replicate. Everything in that melee is dictated by a profusion of body signals. Are people leaning in intensely, speaking sotto voce? Or taking desultory sips of wine and shifting their feet? A new social grammar is required for remote gatherings.

A company called Wave is taking its cue on virtual interaction from multiplayer games like Fortnite or World of Warcraft, whose millions of players have already jury-rigged a way to socialize, when they aren't killing each other. Wave immerses avatars in a gamelike conference world, where there's no need to prowl the corridors looking for the person you want to talk to. Just click on the person's name from the list of attendees and you are teleported to their virtual location. Online Town, a San Bruno, California, startup, lets crude avatars mingle in conference halls, at parks, and on beaches. It uses a clever trick to get around the awkwardness of butting in on a group chat. As you approach a cluster of avatars in conversation, you hear their actual voices more clearly, allowing you to judge when it's OK to nudge your way in.
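The approach-and-listen trick can be sketched in a few lines. This is an illustration only, not Online Town's code: the function name `proximity_gain`, the two radius parameters, and the linear fade between them are assumptions made here to show the idea of scaling a voice's volume by the distance between avatars.

```python
import math

def proximity_gain(listener, speaker, full_volume_radius=1.0, cutoff_radius=8.0):
    """Return a volume multiplier between 0 and 1 for one speaker.

    Hypothetical helper: positions are (x, y) tuples in arbitrary
    map units, and both radii are made-up defaults.
    """
    distance = math.dist(listener, speaker)
    if distance <= full_volume_radius:
        return 1.0  # close enough to be part of the conversation
    if distance >= cutoff_radius:
        return 0.0  # too far away to hear at all
    # Linear fade between the two radii (an assumed curve).
    return (cutoff_radius - distance) / (cutoff_radius - full_volume_radius)
```

With these defaults, an avatar standing 4.5 units from a speaker would hear that voice at half volume, which is enough to eavesdrop on the tone of a chat before stepping in.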

“We're just trying to provide the minimum amount of interface needed for us to want to be around each other this way,” says Cyrus Tabrizi, a cofounder of Online Town. “As long as something makes you feel good about being around the other people here, that's a win.”

What's missing, of course, is the frisson of being in a hotel bar or dining room with fellow flesh-and-blood attendees and the full benefit of our evolutionary perceptions. Replicating the kismet of chance encounters and meaningful glances is tech's next challenge. Will future attendees of the Sun Valley Conference close mega-mergers with a teddy bear shaking hands with a pile of feces?

A dealmaker usually likes to look a potential partner in the eye. Painted pixels, better cameras, or VR may soon get close enough to do the trick. But until then, a penetrating look from 6 feet away will beat webcams.


This article appears in the July/August issue.
