
Keyboard Snooping from Mobile Phone Arrays with Mixed Convolutional and Recurrent Neural Networks

Published: 21 June 2019

Abstract

Modern smartphones are ubiquitous and equipped with a wide range of sensors, which poses a potential security risk: malicious actors could use these sensors to detect private information, such as the keystrokes a user enters on a nearby keyboard. Existing studies have examined the ability of phones to predict typing on a nearby keyboard, but they are limited by the realism of the collected typing data and the expressiveness of the prediction models employed, and they are typically conducted in relatively noise-free environments. We investigate the capability of mobile phone sensor arrays (using audio and motion sensor data) to classify keystrokes that occur on a keyboard in proximity to phones placed around a table, as would be common in a meeting. We develop a system of mixed convolutional and recurrent neural networks and deploy it in a human-subjects experiment with 20 users typing naturally while talking. Using leave-one-user-out cross validation, we find that mobile phone arrays can detect 41.8% of keystrokes and 27% of typed words correctly in such a noisy environment, even without user-specific training. To investigate the potential threat of this attack, we further developed the machine learning models into a real-time system capable of discerning keystrokes from an array of mobile phones and evaluated its ability with a single user typing under varying conditions. We conclude that, to launch a successful attack, an attacker would need advance knowledge of the table at which a user types and of the style of keyboard on which the user types. These constraints greatly limit the feasibility of such an attack to highly capable attackers, and we therefore judge the threat level of this attack to be low, but non-zero.
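For readers who want a concrete picture of what a "mixed convolutional and recurrent" keystroke classifier can look like, the sketch below is a minimal, hypothetical PyTorch model: a small convolutional front end over per-keystroke spectrogram windows feeding a bidirectional LSTM and a linear classifier. All layer sizes, feature shapes, and names (e.g., 40 mel bins, 32 frames, 27 output classes) are illustrative assumptions and do not reproduce the paper's published architecture.

# Illustrative sketch only: one plausible realization of a mixed
# convolutional/recurrent keystroke classifier. Shapes and sizes are assumptions.
import torch
import torch.nn as nn

class KeystrokeCNNRNN(nn.Module):
    def __init__(self, n_mels=40, n_frames=32, n_classes=27, hidden=128):
        super().__init__()
        # Convolutional feature extractor over (frequency x time) keystroke windows.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        conv_freq = n_mels // 4  # two 2x2 poolings halve the frequency axis twice
        # Recurrent layer reads the convolutional features as a time sequence.
        self.rnn = nn.LSTM(
            input_size=32 * conv_freq,  # channels x reduced frequency per time step
            hidden_size=hidden,
            batch_first=True,
            bidirectional=True,
        )
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):
        # x: (batch, 1, n_mels, n_frames)
        feats = self.conv(x)                                   # (batch, 32, n_mels/4, n_frames/4)
        b, c, f, t = feats.shape
        seq = feats.permute(0, 3, 1, 2).reshape(b, t, c * f)   # time-major sequence
        out, _ = self.rnn(seq)
        return self.classifier(out[:, -1])                     # per-keystroke class logits

# Example: a batch of 8 keystroke windows (40 mel bins x 32 frames).
logits = KeystrokeCNNRNN()(torch.randn(8, 1, 40, 32))
print(logits.shape)  # torch.Size([8, 27])

In a leave-one-user-out evaluation like the one described in the abstract, each fold would hold out every keystroke from one participant (for example, via scikit-learn's LeaveOneGroupOut with the user ID as the group label), so the reported accuracy reflects performance on users the model has never seen.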



Published In

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies  Volume 3, Issue 2
June 2019
802 pages
EISSN:2474-9567
DOI:10.1145/3341982
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 June 2019
Accepted: 01 April 2019
Revised: 01 February 2019
Received: 01 November 2018
Published in IMWUT Volume 3, Issue 2


Author Tags

  1. Keyboard Snooping
  2. Machine Learning
  3. Security

Qualifiers

  • Research-article
  • Research
  • Refereed
