DOI: 10.1145/2072298.2072335 · Research article

Reading between the tags to predict real-world size-class for visually depicted objects in images

Published: 28 November 2011

Abstract

Multimedia information retrieval stands to benefit from additional information about tags and how they relate to the content visually depicted in images. We propose a generic approach that improves the informativeness of image tags by combining generalizations about the distributional tendencies of physical objects in the real world with statistics of natural language use patterns mined from the Web. The approach, which we refer to as 'Reading between the Tags,' provides, for each tag associated with an image, first a prediction of corporeality, i.e., whether or not the tag denotes a physical entity, and then a prediction of the real-world size of that entity: large, medium, or small. Mining takes place using a set of Language Use Frames (LUFs) composed of natural language neighborhoods characteristic of tag classes. We validate our approach with a series of experiments on a set of images from the MIRFLICKR data set, using ground truth created with standard crowdsourcing techniques. The main experiments demonstrate the effectiveness of our approach for size-class prediction. A further experiment shows that size-class prediction can be improved and made image-specific using general and relatively small sets of visual concepts. A final experiment confirms that the set of LUFs can also be chosen automatically via statistical feature selection.
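To make the mining step concrete, the sketch below scores a tag against a handful of hand-written LUF-style lexico-syntactic patterns and assigns the size class whose frames match a text sample most often. This is a minimal illustration under stated assumptions: the frame set, the toy corpus, and the names (LUFS, size_class) are invented for this sketch, and the paper mines match statistics from the Web rather than from a local sample.

```python
import re
from collections import Counter

# Hypothetical LUF-style patterns per size class; "{tag}" marks the tag slot.
# These frames are invented for illustration, not taken from the paper.
LUFS = {
    "small":  ["a {tag} in my pocket", "holding a {tag}", "picked up the {tag}"],
    "medium": ["carrying the {tag}", "put the {tag} in the car"],
    "large":  ["standing next to the {tag}", "inside the {tag}", "climbed the {tag}"],
}

def size_class(tag: str, corpus: str) -> str:
    """Assign `tag` the size class whose frames match `corpus` most often."""
    scores = Counter()
    for cls, frames in LUFS.items():
        for frame in frames:
            filled = frame.format(tag=tag)  # instantiate the frame with the tag
            scores[cls] += len(re.findall(re.escape(filled), corpus, re.IGNORECASE))
    if sum(scores.values()) == 0:
        # No frame matched: a crude stand-in for the paper's separate
        # corporeality prediction (the tag may not denote a physical entity).
        return "non-corporeal"
    return scores.most_common(1)[0][0]

corpus = (
    "She was holding a coin; I keep a coin in my pocket. "
    "We climbed the tower and then stood inside the tower."
)
print(size_class("coin", corpus))     # -> small
print(size_class("tower", corpus))    # -> large
print(size_class("freedom", corpus))  # -> non-corporeal
```

The simple argmax over match counts is a stand-in for classification: the abstract's mention of statistical feature selection over LUFs suggests that, in the paper, per-frame statistics instead serve as features for a trained classifier.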



    Published In

    MM '11: Proceedings of the 19th ACM International Conference on Multimedia
    November 2011, 944 pages
    ISBN: 9781450306164
    DOI: 10.1145/2072298
    Publisher: Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. crowdsourcing
    2. image annotation
    3. lexico-syntactic patterns
    4. real-world scale
    5. selectional restrictions
    6. size
    7. user-contributed tags


    Conference

    MM '11: ACM Multimedia Conference
    November 28 - December 1, 2011
    Scottsdale, Arizona, USA

    Acceptance Rates

    Overall acceptance rate: 2,145 of 8,556 submissions, 25%

