DOI: 10.1145/3200947.3201007
research-article

Fully Unsupervised Convolutional Learning for Fast Image Retrieval

Published: 09 July 2018

Abstract

In this paper, we propose a fully unsupervised method for optimizing Convolutional Neural Network (CNN) representations for image retrieval. To this end, we obtain feature representations from the activations of the convolutional layers of a pretrained CNN model, which we adapt into a lightweight fully convolutional network that produces compact image representations and thus reduces storage requirements. Subsequently, we retrain the weights of the convolutional layers, in a fully unsupervised fashion, to produce more efficient compact image descriptors that improve retrieval performance in terms of both time and precision. The experimental evaluation on three publicly available image retrieval datasets indicates the effectiveness of the proposed method in learning more efficient representations for the retrieval task: it outperforms other unsupervised CNN-based retrieval techniques, as well as conventional hand-crafted feature-based approaches, on all datasets used.
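
As a rough illustration of the retrieval setting the abstract describes, the sketch below turns a stack of convolutional feature maps into a compact descriptor and ranks database images by cosine similarity. It is a minimal sketch under stated assumptions, not the paper's method: the abstract does not say how activations are aggregated, so global max pooling is assumed here, and the function names (`global_max_pool`, `describe`, `rank`) and toy activations are hypothetical. The unsupervised retraining step is not shown.

```python
import math
from typing import List, Sequence

# Type alias: C feature maps, each an H x W grid of activations.
FeatureMaps = Sequence[Sequence[Sequence[float]]]


def global_max_pool(feature_maps: FeatureMaps) -> List[float]:
    """Collapse C feature maps into a C-dimensional vector.

    Assumption: global max pooling; the abstract does not specify
    the aggregation used.
    """
    return [max(max(row) for row in fmap) for fmap in feature_maps]


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (unchanged if all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def describe(feature_maps: FeatureMaps) -> List[float]:
    """Compact image descriptor: pooled activations, L2-normalized."""
    return l2_normalize(global_max_pool(feature_maps))


def rank(query: Sequence[float], database: Sequence[Sequence[float]]) -> List[int]:
    """Return database indices sorted by decreasing similarity to the query.

    Descriptors are assumed L2-normalized, so the dot product equals
    the cosine similarity.
    """
    scores = [sum(q * d for q, d in zip(query, desc)) for desc in database]
    return sorted(range(len(database)), key=lambda i: scores[i], reverse=True)


# Toy activations: 2 channels, 2 x 2 spatial grid (made-up values).
img_a = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
img_b = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 2.0], [0.0, 0.0]]]
img_c = [[[3.0, 0.0], [0.0, 0.0]], [[0.0, 1.0], [0.0, 0.0]]]

database = [describe(img_b), describe(img_c), describe(img_a)]
ranking = rank(describe(img_a), database)
```

Each descriptor has one value per channel regardless of image size, which is the kind of compactness that keeps storage and comparison costs low.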

Published In

SETN '18: Proceedings of the 10th Hellenic Conference on Artificial Intelligence
July 2018
339 pages
ISBN:9781450364331
DOI:10.1145/3200947

In-Cooperation

  • EETN: Hellenic Artificial Intelligence Society
  • UOP: University of Patras
  • University of Thessaly: University of Thessaly, Volos, Greece

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Content Based Image Retrieval
  2. Convolutional Neural Networks
  3. Fully Unsupervised Retraining

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SETN '18
