DOI: 10.1145/3581783.3611935
Research article

Follow-me: Deceiving Trackers with Fabricated Paths

Published: 27 October 2023

Abstract

Convolutional Neural Networks (CNNs) are vulnerable to adversarial attacks, in which visually imperceptible perturbations can deceive CNN-based models. While research on adversarial attacks against single object tracking exists, it overlooks a critical capability: manipulating predicted trajectories to follow user-defined paths regardless of the target object's actual location. To address this, we propose the first white-box attack algorithm capable of deceiving victim trackers by compelling them to generate trajectories that adhere to predetermined counterfeit paths. Specifically, we focus on Siamese-based trackers as our victim models. Given an arbitrary counterfeit path, we first decompose it into discrete target locations, one per frame, under a constant-velocity assumption. These locations are then converted into heatmap anchors, each representing the offset of the fabricated location from the target object's location in the previous frame. Next, we design a novel loss function that minimizes the gap between these anchors and the tracker's predicted ones. Finally, the gradients of this loss are used to update the original video, yielding our adversarial video. To validate our ideas, we design three sets of counterfeit paths as well as novel evaluation metrics that measure path-following behavior. Experiments with two victim models on three publicly available datasets, OTB100, VOT2018, and VOT2016, demonstrate that our algorithm not only outperforms SOTA methods significantly under conventional evaluation metrics, e.g., a 90% precision drop and a 68.4% success rate drop on OTB100, but also follows the counterfeit paths closely, a capability beyond any existing attack method. The source code is available at https://github.com/loushengtao/Follow-me.
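The attack described above has two mechanical ingredients: decomposing a counterfeit path into per-frame target locations under constant velocity (from which the anchor offsets between consecutive frames are derived), and perturbing each frame along the sign of a loss gradient. A minimal NumPy sketch of these two steps follows; the straight-line path, the function names, and the single signed-gradient step with an L-infinity budget are illustrative assumptions, as the paper's actual attack back-propagates a heatmap-anchor loss through a Siamese tracker:

```python
import numpy as np

def decompose_path(start, end, num_frames):
    """Split a straight counterfeit path into per-frame target centers,
    assuming the fabricated target moves at constant velocity."""
    start = np.asarray(start, dtype=float)
    end = np.asarray(end, dtype=float)
    steps = np.linspace(0.0, 1.0, num_frames)[:, None]  # 0 -> 1 inclusive
    return start + steps * (end - start)                # shape (num_frames, 2)

def anchor_offsets(centers):
    """Offset of each frame's fabricated location from the previous
    frame's location -- the quantity the heatmap anchors encode."""
    return np.diff(centers, axis=0)                     # shape (num_frames - 1, 2)

def perturb_frame(frame, grad, step=1.0, eps=8.0):
    """One hypothetical FGSM-style update: move the frame along the
    signed gradient of the attack loss, clipped to an L_inf budget eps
    and to the valid pixel range."""
    adv = frame + step * np.sign(grad)
    adv = np.clip(adv, frame - eps, frame + eps)        # stay within budget
    return np.clip(adv, 0.0, 255.0)                     # stay a valid image
```

For a 5-frame path from (0, 0) to (10, 20), `decompose_path` yields evenly spaced centers and `anchor_offsets` returns the constant per-frame displacement (2.5, 5.0); in the real attack this perturbation step would be iterated per frame with gradients taken through the victim tracker.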



Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia
October 2023
9913 pages
ISBN:9798400701085
DOI:10.1145/3581783
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. adversarial attack
  2. visual object tracking

Qualifiers

  • Research-article


Conference

MM '23
Sponsor:
MM '23: The 31st ACM International Conference on Multimedia
October 29 - November 3, 2023
Ottawa ON, Canada

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
