The Pascal Visual Object Classes Challenge: A Retrospective

Authors:

Mark Everingham,

Christopher K. Williams,

Andrew Zisserman

International Journal of Computer Vision, Volume 111, Issue 1

Pages 98 - 136

https://doi.org/10.1007/s11263-014-0733-5

Published: 01 January 2015

Abstract

The Pascal Visual Object Classes (VOC) challenge consists of two components: (i) a publicly available dataset of images together with ground truth annotation and standardised evaluation software; and (ii) an annual competition and workshop. There are five challenges: classification, detection, segmentation, action classification, and person layout. In this paper we provide a review of the challenge from 2008 to 2012. The paper is intended for two audiences. The first is algorithm designers, i.e. researchers who want to see what the state of the art is, as measured by performance on the VOC datasets, along with the limitations and weak points of the current generation of algorithms. The second is challenge designers, who want to see what we as organisers have learnt from the process and what we recommend for the organisation of future challenges. To analyse the performance of submitted algorithms on the VOC datasets, we introduce a number of novel evaluation methods: a bootstrapping method for determining whether differences in the performance of two algorithms are significant; a normalised average precision, so that performance can be compared across classes with different proportions of positive instances; a clustering method for visualising the performance across multiple algorithms, so that the hard and easy images can be identified; and a joint classifier over the submitted algorithms, used to measure their complementarity and combined performance. We also analyse the community's progress through time, using the methods of Hoiem et al. (Proceedings of the European Conference on Computer Vision, 2012) to identify the types of errors that occur. We conclude the paper with an appraisal of the aspects of the challenge that worked well and those that could be improved in future challenges.
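The first of these evaluation methods, deciding whether two algorithms differ significantly, can be illustrated with a paired bootstrap over test images. The sketch below is a minimal illustration under stated assumptions. It is not the paper's exact procedure.

* It assumes each method outputs one confidence score per test image for a single class.
* It uses a simple non-interpolated average precision. The official VOC development kit computes AP from the precision/recall curve differently.
* It resamples images with replacement to form a percentile confidence interval for the AP difference between two methods.

The function names `average_precision` and `bootstrap_ap_difference` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP: mean precision at the rank of each positive image.

    Assumption: a simplified AP, not the exact VOC devkit computation.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        return 0.0  # convention for resamples that contain no positives
    # Rank images by descending confidence (Python's sort is stable, so ties keep input order).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_pos = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            true_pos += 1
            precision_sum += true_pos / rank
    return precision_sum / n_pos


def bootstrap_ap_difference(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    labels: Sequence[int],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, bool]:
    """Percentile bootstrap interval for AP(A) - AP(B), resampling test images.

    Returns (lower, upper, significant); significant means the interval excludes 0.
    """
    rng = random.Random(seed)
    n = len(labels)
    diffs: List[float] = []
    for _ in range(n_boot):
        # Paired resample: the same image indices are used for both methods.
        idx = [rng.randrange(n) for _ in range(n)]
        lab = [labels[i] for i in idx]
        ap_a = average_precision([scores_a[i] for i in idx], lab)
        ap_b = average_precision([scores_b[i] for i in idx], lab)
        diffs.append(ap_a - ap_b)
    diffs.sort()
    lower = diffs[int(n_boot * alpha / 2)]
    upper = diffs[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper, lower > 0 or upper < 0


if __name__ == "__main__":
    # Toy data (hypothetical): 100 images, half positive.
    labels = [1, 0] * 50
    method_a = [0.8 if l else 0.3 for l in labels]
    method_b = [random.Random(i).random() for i in range(100)]
    print(bootstrap_ap_difference(method_a, method_b, labels))
```

If the interval excludes zero, the difference is treated as significant at level `alpha`. Both methods are resampled with the same image indices, which is a paired design. As a result, the interval reflects how the two methods co-vary on the same images, rather than treating their AP estimates as independent.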

References

[1]

Alexe, B., Deselaers, T., & Ferrari, V. (2010). What is an object? In Proceedings of Conference on Computer Vision and Pattern Recognition (pp. 73-80).

[2]

Alexiou, I., & Bharath, A. (2012). Efficient kernels couple visual words through categorical opponency. In Proceedings of British Machine Vision Conference.

[3]

Bertail, P., Clémençon, S. J., & Vayatis, N. (2009). On bootstrapping the ROC curve. In D. Koller, D. Schuurmans, Y. Bengio, & L. Bottou (Eds.), Advances in Neural Information Processing Systems (Vol. 21, pp. 137-144). Red Hook, NY: Curran Associates, Inc.

[4]

Carreira, J., Caseiro, R., Batista, J., & Sminchisescu, C. (2012). Semantic segmentation with second-order pooling. In Proceedings of European Conference on Computer Vision.

[5]

Chang, C. C., & Lin, C. J. (2011). LIBSVM: A library for support vector machines. Transactions on Intelligent Systems and Technology, 2, 27:1-27:27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

[6]

Chen, Q., Song, Z., Hua, Y., Huang, Z., & Yan, S. (2012). Generalized hierarchical matching for image classification. In Proceedings of Conference on Computer Vision and Pattern Recognition.

[7]

Csurka, G., Dance, C., Fan, L., Willamowski, J., & Bray, C. (2004). Visual categorization with bags of keypoints. In Proceedings of ECCV 2004 Workshop on Statistical Learning in Computer Vision (pp. 59-74).

[8]

Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In Proceedings of Conference on Computer Vision and Pattern Recognition.

[9]

Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., & Darrell, T. (2013). DeCAF: A deep convolutional activation feature for generic visual recognition. CoRR abs/1310.1531.

[10]

Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2010). The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88, 303-338.

[11]

Farhadi, A., Endres, I., Hoiem, D., & Forsyth, D. (2009). Describing objects by their attributes. In Proceedings of Conference on Computer Vision and Pattern Recognition, IEEE (pp. 1778-1785).

[12]

Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part-based models. Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627-1645.

[13]

Flickr website. (2013). http://www.flickr.com/.

[14]

Girshick, R. B., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of Conference on Computer Vision and Pattern Recognition.

[15]

Hall, P., Hyndman, R., & Fan, Y. (2004). Nonparametric confidence intervals for receiver operating characteristic curves. Biometrika, 91, 743-750.

[16]

Hoai, M., Ladicky, L., & Zisserman, A. (2012). Action recognition from still images by aligning body parts. http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2012/workshop/segmentation_action_layout.pdf. Slides contained in the presentation by Luc Van Gool, "Overview and results of the segmentation challenge and action taster".

[17]

Hoiem, D., Chodpathumwan, Y., & Dai, Q. (2012). Diagnosing error in object detectors. In Proceedings of European Conference on Computer Vision.

[18]

Ion, A., Carreira, J., & Sminchisescu, C. (2011a). Image segmentation by figure-ground composition into maximal cliques. In Proceedings of International Conference on Computer Vision.

[19]

Ion, A., Carreira, J., & Sminchisescu, C. (2011b). Probabilistic joint image segmentation and labeling. In J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems (Vol. 24, pp. 1827-1835). Red Hook, NY: Curran Associates, Inc.

[20]

Karaoglu, S., Van Gemert, J., & Gevers, T. (2012). Object reading: Text recognition for object recognition. In Proceedings of ECCV 2012 Workshops and Demonstrations.

[21]

Khan, F., Anwer, R., Van de Weijer, J., Bagdanov, A., Vanrell, M., & Lopez, A. M. (2012a). Color attributes for object detection. In Proceedings of Conference on Computer Vision and Pattern Recognition.

[22]

Khan, F., Van de Weijer, J., & Vanrell, M. (2012b). Modulating shape features by color attention for object recognition. International Journal of Computer Vision, 98(1), 49-64.

[23]

Khosla, A., Yao, B., & Fei-Fei, L. (2011). Combining randomization and discrimination for fine-grained image categorization. In Proceedings of Conference on Computer Vision and Pattern Recognition.

[24]

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems (Vol. 25, pp. 1106-1114). Red Hook, NY: Curran Associates, Inc.

[25]

Lazebnik, S., Schmid, C., & Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In Proceedings of Conference on Computer Vision and Pattern Recognition (pp. 2169-2178).

[26]

Leibe, B., Leonardis, A., & Schiele, B. (2004). Combined object categorization and segmentation with an implicit shape model. In Proceedings of ECCV Workshop on Statistical Learning in Computer Vision.

[27]

Lempitsky, V., & Zisserman, A. (2010). Learning to count objects in images. In J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, & A. Culotta (Eds.), Advances in Neural Information Processing Systems (Vol. 23, pp. 1324-1332). Red Hook, NY: Curran Associates, Inc. http://papers.nips.cc/paper/4043-learning-to-count-objects-in-images.pdf

[28]

Li, F., Carreira, J., Lebanon, G., & Sminchisescu, C. (2013). Composite statistical inference for semantic segmentation. In Proceedings of Conference on Computer Vision and Pattern Recognition.

[29]

Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91-110.

[30]

Nanni, L., & Lumini, A. (2013). Heterogeneous bag-of-features for object/scene recognition. Applied Soft Computing, 13(4), 2171-2178.

[31]

O'Connor, B. (2010). A response to "comparing Precision-Recall curves the Bayesian way?". A comment on the blog post by Bob Carpenter on Comparing Precision-Recall Curves the Bayesian Way? http://lingpipe-blog.com/2010/01/29/comparing-precision-recall-curves-bayesian-way/.

[32]

Oquab, M., Bottou, L., Laptev, I., & Sivic, J. (2014). Learning and transferring mid-level image representations using convolutional neural networks. In Proceedings of Conference on Computer Vision and Pattern Recognition.

[33]

Russakovsky, O., Lin, Y., Yu, K., & Fei-Fei, L. (2012). Object-centric spatial pooling for image classification. In Proceedings of European Conference on Computer Vision.

[34]

Russell, B., Torralba, A., Murphy, K., & Freeman, W. T. (2008). LabelMe: A database and web-based tool for image annotation. International Journal of Computer Vision, 77(1-3), 157-173. http://labelme.csail.mit.edu/

[35]

Salton, G., & McGill, M. J. (1986). Introduction to modern information retrieval. New York, NY: McGraw-Hill Inc.

[36]

Sener, F., Bas, C., & Ikizler-Cinbis, N. (2012). On recognizing actions in still images via multiple features. In Proceedings of ECCV Workshop on Action Recognition and Pose Estimation in Still Images.

[37]

Song, Z., Chen, Q., Huang, Z., Hua, Y., & Yan, S. (2011). Contextualizing object detection and classification. In Proceedings of Conference on Computer Vision and Pattern Recognition.

[38]

Pascal VOC 2012 challenge results. (2012). http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2012/results/index.html.

[39]

Pascal VOC annotation guidelines. (2012). http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2012/guidelines.html.

[40]

Pascal VOC best practice guidelines. (2012). http://pascallin.ecs.soton.ac.uk/challenges/VOC/#bestpractice.

[41]

Pascal VOC evaluation server. (2012). http://host.robots.ox.ac.uk:8080/.

[42]

Torralba, A., & Efros, A. A. (2011). Unbiased look at dataset bias. In Proceedings of Conference on Computer Vision and Pattern Recognition, IEEE (pp. 1521-1528).

[43]

Uijlings, J., Van de Sande, K., Gevers, T., & Smeulders, A. (2013). Selective search for object recognition. International Journal of Computer Vision, 104(2), 154-171.

[44]

Van de Sande, K., Uijlings, J., Gevers, T., & Smeulders, A. (2011). Segmentation as selective search for object recognition. In Proceedings of International Conference on Computer Vision.

[45]

Van Gemert, J. (2011). Exploiting photographic style for category-level image classification by generalizing the spatial pyramid. In Proceedings of International Conference on Multimedia Retrieval.

[46]

Vedaldi, A., Gulshan, V., Varma, M., & Zisserman, A. (2009). Multiple kernels for object detection. In Proceedings of International Conference on Computer Vision.

[47]

Viola, P., & Jones, M. (2004). Robust real-time object detection. International Journal of Computer Vision, 57(2), 137-154.

[48]

Wang, X., Lin, L., Huang, L., & Yan, S. (2013). Incorporating structural alternatives and sharing into hierarchy for multiclass object recognition and detection. In Proceedings of Conference on Computer Vision and Pattern Recognition.

[49]

Wasserman, L. (2004). All of statistics. Berlin: Springer.

[50]

Xia, W., Song, Z., Feng, J., Cheong, L. F., & Yan, S. (2012). Segmentation over detection by coupled global and local sparse representations. In Proceedings of European Conference on Computer Vision.

[51]

Yang, J., Yu, K., Gong, Y., & Huang, T. (2009). Linear spatial pyramid matching using sparse coding for image classification. In Proceedings of Conference on Computer Vision and Pattern Recognition.

[52]

Zeiler, M. D., & Fergus, R. (2013). Visualizing and understanding convolutional networks. CoRR abs/1311.2901.

[53]

Zhu, L., Chen, Y., Yuille, A., & Freeman, W. (2010). Latent hierarchical structural learning for object detection. In Proceedings of Conference on Computer Vision and Pattern Recognition.

[54]

Zisserman, A., Winn, J., Fitzgibbon, A., Van Gool, L., Sivic, J., Williams, C., et al. (2012). In memoriam: Mark Everingham. Transactions on Pattern Analysis and Machine Intelligence, 34(11), 2081-2082.

Information

Published In

International Journal of Computer Vision, Volume 111, Issue 1 (January 2015)

ISSN: 0920-5691

Copyright © 2015 Springer Science+Business Media New York.

Publisher: Kluwer Academic Publishers, United States