Unsupervised Person Re-identification: Clustering and Fine-tuning

Published: 10 October 2018

Abstract

Recent person re-identification (re-ID) literature has reported the superiority of deeply learned pedestrian representations. In this article, we consider the more pragmatic issue of learning a deep feature with no or only a few labels. We propose a progressive unsupervised learning (PUL) method to transfer pretrained deep representations to unseen domains. Our method is easy to implement and can be viewed as an effective baseline for unsupervised re-ID feature learning. Specifically, PUL iterates between (1) pedestrian clustering and (2) fine-tuning of the convolutional neural network (CNN) to improve the initialization model, which is trained on an irrelevant labeled dataset. Since the clustering results can be very noisy, we add a selection operation between clustering and fine-tuning. At the beginning, when the model is weak, the CNN is fine-tuned on a small number of reliable examples that lie near cluster centroids in the feature space. As the model becomes stronger, more images are adaptively selected as CNN training samples in subsequent iterations. Progressively, pedestrian clustering and the CNN model are improved simultaneously until the algorithm converges. This process is naturally formulated as self-paced learning. We then point out promising directions that may lead to further improvement. Extensive experiments on three large-scale re-ID datasets demonstrate that PUL outputs discriminative features that improve re-ID accuracy. Our code has been released at https://github.com/hehefan/Unsupervised-Person-Re-identification-Clustering-and-Fine-tuning.
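
The cluster, select, fine-tune loop described in the abstract can be illustrated with a small sketch. This is not the authors' released code. It assumes k-means as the clustering step and cosine similarity to the assigned centroid as the reliability measure, neither of which the abstract specifies, and the names `kmeans`, `select_reliable`, `pul_selection_step`, and `threshold` are hypothetical. Feature extraction and CNN fine-tuning are left out; the sketch covers only one round of clustering and reliable-sample selection on precomputed feature vectors.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit L2 norm so dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def assign(features, centroids):
    """Return the index of the nearest centroid (squared Euclidean) per sample."""
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(features, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means (an assumed clustering choice); returns (centroids, labels)."""
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct samples.
    centroids = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign(features, centroids)
        for c in range(k):
            members = features[labels == c]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    return centroids, assign(features, centroids)


def select_reliable(features, centroids, labels, threshold):
    """Indices of samples whose cosine similarity to their own centroid >= threshold."""
    f = l2_normalize(np.asarray(features, dtype=float))
    c = l2_normalize(centroids)
    sims = (f * c[labels]).sum(axis=1)
    return np.flatnonzero(sims >= threshold)


def pul_selection_step(features, k, threshold):
    """One clustering + selection round.

    Returns the selected sample indices and their cluster ids, which would
    serve as pseudo-labels for the fine-tuning step (not implemented here).
    """
    centroids, labels = kmeans(features, k)
    keep = select_reliable(features, centroids, labels, threshold)
    return keep, labels[keep]
```

In a full loop, features would be re-extracted with the fine-tuned network before each round. Under these assumptions, a stronger model yields tighter clusters, so more samples clear the same `threshold` over time, which matches the progressive selection behavior the abstract describes.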

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 14, Issue 4
Special Section on Deep Learning for Intelligent Multimedia Analytics
November 2018
221 pages
ISSN:1551-6857
EISSN:1551-6865
DOI:10.1145/3282485
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 October 2018
Accepted: 01 July 2018
Revised: 01 June 2018
Received: 01 February 2018
Published in TOMM Volume 14, Issue 4

Author Tags

  1. Large-scale person re-identification
  2. clustering
  3. convolutional neural network
  4. unsupervised learning

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Zhejiang Provincial Natural Science Foundation of China
  • 111 Project
  • Cooperative Research Centres Programme
  • Data to Decisions CRC (D2D CRC)
  • National Natural Science Foundation of China
  • National Key Research and Development Program of China
