
Descriptive visual words and visual phrases for image applications

Authors: Shiliang Zhang, Qi Tian, Gang Hua, Qingming Huang, Shipeng Li

MM '09: Proceedings of the 17th ACM international conference on Multimedia

Pages 75 - 84

https://doi.org/10.1145/1631272.1631285

Published: 19 October 2009

Abstract

The Bag-of-visual Words (BoW) image representation has been applied to various problems in multimedia and computer vision. The basic idea is to represent images as visual documents composed of repeatable and distinctive visual elements, which are comparable to words in text. However, extensive experiments show that the commonly used visual words are not as expressive as text words, which hinders their effectiveness in various applications. In this paper, Descriptive Visual Words (DVWs) and Descriptive Visual Phrases (DVPs) are proposed as the visual counterparts of text words and phrases, where visual phrases refer to frequently co-occurring visual word pairs. Since images are the carriers of visual objects and scenes, a novel descriptive visual element set can be composed of the visual words and their combinations that are effective in representing certain visual objects or scenes. Based on this idea, a general framework is proposed for generating DVWs and DVPs from classic visual words for various applications. In a large-scale image database containing 1506 object and scene categories, the visual words and visual word pairs descriptive of certain scenes or objects are identified as the DVWs and DVPs. Experiments show that the DVWs and DVPs are compact and descriptive, and thus more comparable to text words than the classic visual words. We apply the identified DVWs and DVPs in several applications, including image retrieval, image re-ranking, and object recognition. The DVW and DVP combination outperforms the classic visual words by 19.5% and 80% in image retrieval and object recognition tasks, respectively. The DVW- and DVP-based image re-ranking algorithm, DWPRank, outperforms the state-of-the-art VisualRank by 12.4% in accuracy and is about 11 times faster.
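To make the two building blocks in the abstract concrete, the following is a minimal sketch of a BoW histogram and of counting frequently co-occurring visual word pairs. It is not the paper's DVW/DVP selection framework: the function names, the spatial threshold `max_dist`, the image-count threshold `min_images`, and the toy keypoints are all assumptions made for illustration, and "co-occurring" is interpreted here simply as two quantized keypoints lying close together in the same image.

```python
from collections import Counter
from itertools import combinations
from math import hypot


def bow_histogram(word_ids, vocab_size):
    """Count how often each visual word ID occurs in one image."""
    hist = [0] * vocab_size
    for w in word_ids:
        hist[w] += 1
    return hist


def cooccurring_pairs(keypoints, max_dist):
    """Unordered visual word pairs whose keypoints lie within max_dist pixels.

    keypoints: list of (word_id, x, y) tuples for a single image.
    The distance rule is an illustrative assumption, not the paper's definition.
    """
    pairs = set()
    for (w1, x1, y1), (w2, x2, y2) in combinations(keypoints, 2):
        if hypot(x1 - x2, y1 - y2) <= max_dist:
            pairs.add((min(w1, w2), max(w1, w2)))
    return pairs


def frequent_pairs(images, max_dist, min_images):
    """Pairs that co-occur in at least min_images images (hypothetical threshold)."""
    counts = Counter()
    for keypoints in images:
        counts.update(cooccurring_pairs(keypoints, max_dist))
    return {pair for pair, n in counts.items() if n >= min_images}


# Toy data: three images, each a list of (word_id, x, y) quantized keypoints.
images = [
    [(0, 10, 10), (1, 12, 11), (2, 200, 200)],
    [(0, 50, 50), (1, 52, 49), (3, 300, 10)],
    [(2, 5, 5), (3, 7, 6)],
]

print(bow_histogram([w for w, _, _ in images[0]], vocab_size=4))  # [1, 1, 1, 0]
print(frequent_pairs(images, max_dist=5.0, min_images=2))         # {(0, 1)}
```

In this toy data, words 0 and 1 appear close together in two images, so only the pair (0, 1) passes the frequency threshold. A descriptive selection step, as the abstract describes, would additionally weigh how well each word or pair represents a particular object or scene category.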

