
Residual objectness for imbalance reduction

Published: 01 October 2022

Highlights

We discover that the foreground-background imbalance in object detection can be addressed in a learning-based manner, without any hand-crafted resampling or reweighting schemes.
We propose a novel Residual Objectness (ResObj) mechanism to address the foreground-background imbalance in training object detectors. With a cascade architecture that gradually refines the objectness estimation, our ResObj module addresses the imbalance in an end-to-end way, avoiding the laborious hyper-parameter tuning required by resampling and reweighting schemes.
We validate the proposed method on the COCO dataset with thorough ablation studies. For various detectors, our Residual Objectness steadily improves detection accuracy by a relative 3%∼4%.

Abstract

As most object detectors rely on dense candidate samples to cover objects, they have always suffered from the extreme imbalance between very few foreground samples and numerous background samples during training, i.e., the foreground-background imbalance. Although several resampling and reweighting schemes (e.g., OHEM, Focal Loss, GHM) have been proposed to alleviate the imbalance, they are usually heuristic with multiple hyper-parameters, which are difficult to generalize across different object detectors and datasets. In this paper, we propose a novel Residual Objectness (ResObj) mechanism that adaptively learns how to address the foreground-background imbalance problem in object detection. Specifically, we first reformulate the imbalance problems on all object classes as a single imbalance problem on an “objectness” class. Then, we design multiple cascaded objectness estimators with residual connections for that objectness class to progressively distinguish foreground samples from background samples. With our residual objectness mechanism, object detectors can learn how to address the foreground-background imbalance in an end-to-end way, rather than relying on hand-crafted resampling or reweighting schemes. Extensive experiments on the COCO benchmark demonstrate the effectiveness and compatibility of our method for various object detectors: RetinaNet-ResObj, YOLOv3-ResObj and FasterRCNN-ResObj achieve relative 3%∼4% Average Precision (AP) improvements over their respective vanilla models.
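The core idea in the abstract — a base objectness estimate refined by cascaded residual corrections, which then gates the per-class score — can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice to sum logits before a sigmoid, and the multiplicative gating of class scores are all illustrative assumptions about how such a mechanism could be wired up.

```python
import math

def sigmoid(x):
    """Standard logistic function, mapping a logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def residual_objectness(base_logit, residual_logits):
    """Combine a base objectness logit with cascaded residual corrections.

    Assumption for this sketch: each cascade stage outputs a residual
    logit that is added to the running estimate (a residual connection),
    so later stages refine rather than replace earlier ones.
    """
    logit = base_logit
    for r in residual_logits:
        logit += r  # residual refinement from one cascade stage
    return sigmoid(logit)

def detection_score(class_logit, base_obj_logit, residual_logits):
    """Final per-class score: class probability gated by refined objectness.

    Background samples with low refined objectness are suppressed
    multiplicatively, without any resampling or reweighting.
    """
    obj = residual_objectness(base_obj_logit, residual_logits)
    return obj * sigmoid(class_logit)

# A hard background sample: the base estimator is fooled (positive logit),
# but two cascade stages push the refined objectness below 0.5.
score = detection_score(1.0, 2.0, [-1.0, -3.0])
```

In this toy case the refined objectness is sigmoid(2 - 1 - 3) ≈ 0.12, so the final score is driven down even though both the base objectness and class logits were positive — the behavior the cascade is meant to learn.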



      Published In

      Pattern Recognition, Volume 130, Issue C, October 2022, 529 pages

      Publisher

      Elsevier Science Inc.

      United States


      Author Tags

      1. Object detection
      2. Class imbalance
      3. Residual objectness

      Qualifiers

      • Research-article
