ViGather: Inclusive Virtual Conferencing with a Joint Experience Across Traditional Screen Devices and Mixed Reality Headsets

Published: 13 September 2023

Abstract

Teleconferencing is poised to become one of the most frequent use cases of immersive platforms, since it supports high levels of presence and embodiment in collaborative settings. On desktop and mobile platforms, teleconferencing solutions are already among the most popular apps and accumulate significant usage time, not least because of the pandemic and because they offer a desirable substitute for air travel or commuting.
In this paper, we present ViGather, an immersive teleconferencing system that integrates users of all platform types into a joint experience via equal representation and a first-person experience. ViGather renders all participants as embodied avatars in one shared scene to establish co-presence and elicit natural behavior during collocated conversations. This includes nonverbal communication cues such as eye contact between participants, as well as body language such as turning one's body toward another person or using hand gestures to emphasize parts of a conversation during the virtual hangout. Since each user embodies an avatar and experiences situated meetings from an egocentric perspective no matter the device they join from, ViGather alleviates potential concerns about self-perception and appearance while mitigating potential 'Zoom fatigue', as users' self-views are not shown. For participants in Mixed Reality, our system leverages the rich sensing and reconstruction capabilities of today's headsets. For users of tablets, laptops, or PCs, ViGather reconstructs the user's pose from the device's front-facing camera, estimates eye contact with other participants, and maps these nonverbal cues to immediate avatar animations in the shared scene.
Our evaluation compared participants' behavior and impressions while videoconferencing in groups of four inside ViGather with those in Meta Horizon as a baseline for a social VR setting. Participants who joined on traditional screen devices (e.g., laptops and desktops) using ViGather reported a significantly higher sense of physical, spatial, and self-presence than when using Horizon, while all participants perceived similar levels of active social presence when using Virtual Reality headsets. Our follow-up study confirmed the importance of representing users on traditional screen devices as reconstructed avatars for perceiving self-presence.

Published In

Proceedings of the ACM on Human-Computer Interaction, Volume 7, Issue MHCI
September 2023
1017 pages
EISSN: 2573-0142
DOI: 10.1145/3624512

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. avatars
  2. co-presence
  3. collaboration
  4. cross-platform
  5. embodied presence
  6. immersive social interaction
  7. mixed reality
  8. social VR
  9. teleconferencing
  10. video conferencing
  11. virtual reality

Qualifiers

  • Research-article
