Research Article

Fast and Robust Mid-Air Gesture Typing for AR Headsets using 3D Trajectory Decoding

Published: 02 October 2023

Abstract

We present a fast mid-air gesture keyboard for head-mounted optical see-through augmented reality (OST AR) that lets users articulate word patterns by moving their physical index finger relative to a virtual keyboard plane, without indirectly controlling a visual 2D cursor on that plane. To realize this, we introduce a novel decoding method that directly translates users' three-dimensional fingertip gesture trajectories into their intended text. We evaluate the efficacy of the system in three studies that investigate immediate efficacy, accelerated learning, and whether performance can be maintained without visual feedback. We find that the new 3D trajectory decoding design yields significant improvements in entry rate while maintaining low error rates. In addition, we demonstrate that users can maintain their performance even without fingertip and gesture-trace visualization.
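The paper's learned decoder is not reproduced here. As a rough illustration of the general idea of decoding a raw 3D fingertip trace against per-word gesture templates (in the spirit of SHARK2-style shape matching, not the paper's actual method), one might sketch the following; the key positions, resampling resolution, and distance metric are all illustrative assumptions:

```python
import math

# Hypothetical key centers on the keyboard plane (arbitrary normalized units);
# a handful of keys is enough for this demonstration.
KEY_POS = {
    'c': (2.5, 2.0), 'a': (0.5, 1.0), 't': (4.5, 0.0),
    'd': (2.5, 1.0), 'o': (8.5, 0.0), 'g': (4.5, 1.0),
}

def resample(points, n=32):
    """Resample a 3D polyline to n points spaced evenly by arc length."""
    cum = [0.0]
    for p, q in zip(points, points[1:]):
        cum.append(cum[-1] + math.dist(p, q))
    total = cum[-1] or 1.0
    out = []
    for i in range(n):
        target = total * i / (n - 1)
        j = 0
        while j < len(cum) - 2 and cum[j + 1] < target:
            j += 1
        seg = (cum[j + 1] - cum[j]) or 1.0
        t = (target - cum[j]) / seg
        out.append(tuple(a + t * (b - a) for a, b in zip(points[j], points[j + 1])))
    return out

def template(word):
    """Ideal gesture for a word: the polyline through its key centers, at depth 0."""
    pts = [(x, y, 0.0) for x, y in (KEY_POS[ch] for ch in word)]
    if len(pts) == 1:
        pts = pts * 2
    return resample(pts)

def decode(trajectory, vocab):
    """Match the raw 3D fingertip trajectory against each candidate word's
    ideal path and return the closest word by mean point-to-point distance."""
    traj = resample(trajectory)
    def cost(word):
        return sum(math.dist(p, q) for p, q in zip(traj, template(word))) / len(traj)
    return min(vocab, key=cost)

# A noisy fingertip path sweeping roughly c -> a -> t, with depth jitter:
trajectory = [(2.5, 2.0, 0.10), (1.5, 1.5, 0.20), (0.5, 1.0, 0.10),
              (2.5, 0.5, 0.15), (4.5, 0.0, 0.10)]
print(decode(trajectory, ["cat", "cad", "tag", "dot"]))
```

Note that the depth component is kept in the distance computation rather than projected away, loosely reflecting the paper's point that the 3D trajectory can be decoded directly without reducing the finger to a 2D cursor on the keyboard plane.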



Published In

IEEE Transactions on Visualization and Computer Graphics, Volume 29, Issue 11
November 2023, 465 pages

Publisher: IEEE Educational Activities Department, United States
