Research Article

Fast and Robust Mid-Air Gesture Typing for AR Headsets using 3D Trajectory Decoding

Published: 02 October 2023

Abstract

We present a fast mid-air gesture keyboard for head-mounted optical see-through augmented reality (OST AR) that lets users articulate word patterns by moving their physical index finger relative to a virtual keyboard plane, without indirectly controlling a visual 2D cursor on that plane. To realize this, we introduce a novel decoding method that directly translates users' three-dimensional fingertip gesture trajectories into their intended text. We evaluate the efficacy of the system in three studies that investigate immediate efficacy, accelerated learning, and whether performance can be maintained without visual feedback. We find that the new 3D trajectory decoding design yields significant improvements in entry rate while maintaining low error rates. In addition, we demonstrate that users can maintain their performance even without fingertip and gesture-trace visualization.
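The paper's learned decoder is not reproduced here. As a rough illustration of the general idea of decoding a raw 3D fingertip trace against per-word gesture templates (in the spirit of SHARK2-style shape matching, not the paper's actual method), one might sketch the following; the key positions, resampling resolution, and distance metric are all illustrative assumptions:

```python
import math

# Hypothetical key centers on the keyboard plane (arbitrary normalized units);
# a handful of keys is enough for this demonstration.
KEY_POS = {
    'c': (2.5, 2.0), 'a': (0.5, 1.0), 't': (4.5, 0.0),
    'd': (2.5, 1.0), 'o': (8.5, 0.0), 'g': (4.5, 1.0),
}

def resample(points, n=32):
    """Resample a 3D polyline to n points spaced evenly by arc length."""
    cum = [0.0]
    for p, q in zip(points, points[1:]):
        cum.append(cum[-1] + math.dist(p, q))
    total = cum[-1] or 1.0
    out = []
    for i in range(n):
        target = total * i / (n - 1)
        j = 0
        while j < len(cum) - 2 and cum[j + 1] < target:
            j += 1
        seg = (cum[j + 1] - cum[j]) or 1.0
        t = (target - cum[j]) / seg
        out.append(tuple(a + t * (b - a) for a, b in zip(points[j], points[j + 1])))
    return out

def template(word):
    """Ideal gesture for a word: the polyline through its key centers, at depth 0."""
    pts = [(x, y, 0.0) for x, y in (KEY_POS[ch] for ch in word)]
    if len(pts) == 1:
        pts = pts * 2
    return resample(pts)

def decode(trajectory, vocab):
    """Match the raw 3D fingertip trajectory against each candidate word's
    ideal path and return the closest word by mean point-to-point distance."""
    traj = resample(trajectory)
    def cost(word):
        return sum(math.dist(p, q) for p, q in zip(traj, template(word))) / len(traj)
    return min(vocab, key=cost)

# A noisy fingertip path sweeping roughly c -> a -> t, with depth jitter:
trajectory = [(2.5, 2.0, 0.10), (1.5, 1.5, 0.20), (0.5, 1.0, 0.10),
              (2.5, 0.5, 0.15), (4.5, 0.0, 0.10)]
print(decode(trajectory, ["cat", "cad", "tag", "dot"]))
```

Note that the depth component is kept in the distance computation rather than projected away, loosely reflecting the paper's point that the 3D trajectory can be decoded directly without reducing the finger to a 2D cursor on the keyboard plane.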



Published In

IEEE Transactions on Visualization and Computer Graphics, Volume 29, Issue 11
November 2023, 465 pages

Publisher: IEEE Educational Activities Department, United States
