DOI: 10.1145/3204949.3208122
research-article

Subdiv17: a dataset for investigating subjectivity in the visual diversification of image search results

Published: 12 June 2018

Abstract

In this paper, we present a new dataset that facilitates the comparison of approaches for diversifying image search results. The dataset was explicitly designed for general-purpose, multi-topic queries and provides multiple ground truth annotations, which makes it possible to explore the subjectivity inherent in the diversification task. It contains images and their metadata retrieved from Flickr for around 200 complex queries. Additionally, to encourage experimentation (and cooperation) across communities such as information retrieval and multimedia retrieval, a broad range of pre-computed descriptors is provided. The proposed dataset was successfully validated during the MediaEval 2017 Retrieving Diverse Social Images task using 29 submitted runs.

      Published In

      MMSys '18: Proceedings of the 9th ACM Multimedia Systems Conference
      June 2018
      604 pages
      ISBN:9781450351928
      DOI:10.1145/3204949
      General Chair: Pablo Cesar
      Program Chairs: Michael Zink, Niall Murray

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. annotation subjectivity
      2. benchmark dataset
      3. flickr
      4. image retrieval
      5. mediaeval
      6. search result diversification

      Qualifiers

      • Research-article

      Conference

      MMSys '18: 9th ACM Multimedia Systems Conference
      June 12 - 15, 2018
      Amsterdam, Netherlands

      Acceptance Rates

      Overall Acceptance Rate 176 of 530 submissions, 33%
