
UltraCLR: Contrastive Representation Learning Framework for Ultrasound-based Sensing

Published: 11 May 2024

Abstract

We propose UltraCLR, a new contrastive learning framework that fuses dual-modulation ultrasonic sensing signals to enhance gesture representation. Most existing ultrasound-based gesture recognition systems rely on large numbers of manually labeled samples to learn task-specific representations via end-to-end training, and they cannot exploit the unlabeled continuous gesture signals that are easy to collect. Inspired by recent self-supervised learning techniques, UltraCLR aims to autonomously learn, from low-cost unlabeled signals, a general-purpose gesture representation that can benefit all downstream tasks. It takes the STFT heatmap as a secondary input and leverages a contrastive learning framework to enhance the representation of the high-quality Channel Impulse Response (CIR) heatmap input. The learned representations better capture the spatial position information and intermediate states of gesture movements. With the representations learned by UltraCLR, downstream gesture recognition becomes much simpler: it can be completed by a lightweight classifier trained on a small labeled set at low computational cost. Our experimental results show that UltraCLR outperforms state-of-the-art gesture recognition systems with only a few labeled samples, while reducing computational complexity by more than 85% and improving inference speed by over 9×.
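The abstract does not specify the network architecture or loss. The sketch below illustrates the general idea of cross-modal contrastive pre-training between a CIR-heatmap encoder and an STFT-heatmap encoder using an InfoNCE-style objective; all module names, shapes, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HeatmapEncoder(nn.Module):
    """Small CNN encoder standing in for the (unspecified) CIR/STFT encoders."""
    def __init__(self, in_channels=1, embed_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.proj = nn.Linear(64, embed_dim)

    def forward(self, x):
        # L2-normalized embeddings so dot products are cosine similarities.
        return F.normalize(self.proj(self.backbone(x)), dim=-1)

def info_nce(z_cir, z_stft, temperature=0.07):
    """Cross-modal InfoNCE: matching (CIR, STFT) pairs of the same gesture
    are positives; all other pairs in the batch act as negatives."""
    logits = z_cir @ z_stft.t() / temperature            # (B, B) similarity matrix
    targets = torch.arange(z_cir.size(0), device=z_cir.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# One illustrative pre-training step on unlabeled signal pairs.
cir_enc, stft_enc = HeatmapEncoder(), HeatmapEncoder()
optimizer = torch.optim.Adam(
    list(cir_enc.parameters()) + list(stft_enc.parameters()), lr=1e-3)

cir_batch = torch.randn(16, 1, 64, 64)    # placeholder CIR heatmaps
stft_batch = torch.randn(16, 1, 64, 64)   # placeholder STFT heatmaps of the same gestures
loss = info_nce(cir_enc(cir_batch), stft_enc(stft_batch))
loss.backward()
optimizer.step()
```

After pre-training of this kind, the frozen CIR encoder's embeddings could be fed to a small classifier trained on the few labeled samples, which is consistent with the low downstream cost described in the abstract.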


Published In

ACM Transactions on Sensor Networks, Volume 20, Issue 4
July 2024, 603 pages
EISSN: 1550-4867
DOI: 10.1145/3618082
Editor: Wen Hu

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 11 May 2024
    Online AM: 29 May 2023
    Accepted: 08 May 2023
    Revised: 09 March 2023
    Received: 22 December 2022
    Published in TOSN Volume 20, Issue 4


    Author Tags

    1. Ultrasound-based sensing
    2. contrastive learning
    3. gesture recognition

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China
    • program B for Outstanding Ph.D. candidate of Nanjing University
