DOI: 10.1145/3594738.3611365
Short paper · Public Access

HPSpeech: Silent Speech Interface for Commodity Headphones

Published: 08 October 2023

Abstract

We present HPSpeech, a silent speech interface for commodity headphones. HPSpeech uses the headphones' existing speakers to emit inaudible acoustic signals. Movements of the temporomandibular joint (TMJ) during speech modify the reflection pattern of these signals, which is captured by a microphone positioned inside the ear cup. To evaluate HPSpeech, we tested it on two headphones with a total of 18 participants. HPSpeech recognized 8 popular silent speech commands for controlling a music player with over 90% accuracy. While our tests used modified commodity hardware (both with and without active noise cancellation), our results suggest that sensing TMJ movement could require as little as a firmware update for ANC headsets, which already include a microphone inside the ear cup. This leads us to believe that the technique has strong potential for rapid deployment in the near future. We further discuss the challenges that must be addressed before deploying HPSpeech at scale.
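The abstract describes an active acoustic sensing pipeline: the headphone speaker emits an inaudible probe signal, and the in-cup microphone captures reflections whose pattern shifts as the TMJ moves. The sketch below illustrates that general idea only; it is not the paper's implementation, and the sample rate, 18–21 kHz chirp band, frame length, and all function names are illustrative assumptions.

```python
# Minimal sketch of active acoustic sensing in the style the abstract
# describes: play an inaudible chirp from the headphone speaker and
# correlate the in-cup microphone recording against it to obtain an
# echo profile whose changes track jaw (TMJ) movement during speech.
# All parameter values below are assumptions, not the paper's setup.
import numpy as np
from scipy.signal import chirp, correlate

FS = 48_000               # assumed sample rate (Hz)
F0, F1 = 18_000, 21_000   # assumed near-ultrasonic band, inaudible to most adults
FRAME = 0.01              # assumed 10 ms probe frames

t = np.arange(int(FS * FRAME)) / FS
tx = chirp(t, f0=F0, t1=FRAME, f1=F1)   # transmitted probe signal

def echo_profile(rx_frame: np.ndarray) -> np.ndarray:
    """Cross-correlate one received frame with the probe; correlation
    peaks correspond to reflection paths inside the ear cup, which
    shift as the TMJ moves."""
    return np.abs(correlate(rx_frame, tx, mode="same"))

def differential_features(frames: np.ndarray) -> np.ndarray:
    """Frame-to-frame differences of echo profiles suppress the static
    acoustics of the ear cup and emphasize motion-induced changes."""
    profiles = np.stack([echo_profile(f) for f in frames])
    return np.diff(profiles, axis=0)

# A sequence of differential echo profiles would then be fed to a small
# classifier (the paper reports >90% accuracy over 8 commands); any
# lightweight CNN over the profile sequence is a plausible stand-in.
```

Differencing consecutive echo profiles is a common trick in acoustic sensing for cancelling the static transfer function of the enclosure so that only motion-induced changes remain; a compact classifier over these features would then map them to the command set.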




Information

Published In

ISWC '23: Proceedings of the 2023 ACM International Symposium on Wearable Computers
October 2023, 145 pages
ISBN: 9798400701993
DOI: 10.1145/3594738


Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. Acoustic Sensing
  2. Commodity-off-the-shelf (COTS)
  3. Headphones
  4. Silent Speech

Qualifiers

  • Short-paper
  • Research
  • Refereed limited


Conference

UbiComp/ISWC '23

Acceptance Rates

Overall acceptance rate: 38 of 196 submissions (19%)



Article Metrics

  • Downloads (last 12 months): 352
  • Downloads (last 6 weeks): 72

Reflects downloads up to 24 Dec 2024


Cited By

  • (2024) Ring-a-Pose: A Ring for Continuous Hand Pose Tracking. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8, 4, 1–30. https://doi.org/10.1145/3699741. Online publication date: 21-Nov-2024.
  • (2024) Whispering Wearables: Multimodal Approach to Silent Speech Recognition with Head-Worn Devices. Proceedings of the 26th International Conference on Multimodal Interaction, 214–223. https://doi.org/10.1145/3678957.3685720. Online publication date: 4-Nov-2024.
  • (2024) MunchSonic: Tracking Fine-grained Dietary Actions through Active Acoustic Sensing on Eyeglasses. Proceedings of the 2024 ACM International Symposium on Wearable Computers, 96–103. https://doi.org/10.1145/3675095.3676619. Online publication date: 5-Oct-2024.
  • (2024) Functional Now, Wearable Later: Examining the Design Practices of Wearable Technologists. Proceedings of the 2024 ACM International Symposium on Wearable Computers, 71–81. https://doi.org/10.1145/3675095.3676615. Online publication date: 5-Oct-2024.
  • (2024) EchoGuide: Active Acoustic Guidance for LLM-Based Eating Event Analysis from Egocentric Videos. Proceedings of the 2024 ACM International Symposium on Wearable Computers, 40–47. https://doi.org/10.1145/3675095.3676611. Online publication date: 5-Oct-2024.
  • (2024) Unvoiced: Designing an LLM-assisted Unvoiced User Interface using Earables. Proceedings of the 22nd ACM Conference on Embedded Networked Sensor Systems, 784–798. https://doi.org/10.1145/3666025.3699374. Online publication date: 4-Nov-2024.
  • (2024) Enabling Hands-Free Voice Assistant Activation on Earphones. Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services, 155–168. https://doi.org/10.1145/3643832.3661890. Online publication date: 3-Jun-2024.
  • (2024) GazeTrak: Exploring Acoustic-based Eye Tracking on a Glass Frame. Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, 497–512. https://doi.org/10.1145/3636534.3649376. Online publication date: 29-May-2024.
  • (2024) EchoWrist: Continuous Hand Pose Tracking and Hand-Object Interaction Recognition Using Low-Power Active Acoustic Sensing On a Wristband. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1–21. https://doi.org/10.1145/3613904.3642910. Online publication date: 11-May-2024.
  • (2024) EyeEcho: Continuous and Low-power Facial Expression Tracking on Glasses. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1–24. https://doi.org/10.1145/3613904.3642613. Online publication date: 11-May-2024.
