DOI: 10.1145/3474085.3479204
short-paper

A Solution to Multi-modal Ads Video Tagging Challenge

Published: 17 October 2021

Abstract

In this paper, we present our solution to the Multi-modal Ads Video Tagging Challenge of the Tencent Advertising Algorithm Competition, held as part of the ACM Multimedia 2021 Grand Challenges. We extend the baseline model by redesigning the visual feature extraction procedure, and we modify the loss function to cope with sparse positive targets. Moreover, we propose Semi-supervised Learning with Negative Masking to leverage both labeled data and unlabeled data from the preliminary contest, which effectively enhances the training process. We further use Cross-Class Relevance Learning to boost performance. With a model ensemble, we achieve a GAP score of 0.8237 and rank second among all submissions in the challenge.
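
For readers unfamiliar with the metric, the following is a minimal sketch of how a GAP (Global Average Precision) score can be computed. It assumes the YouTube-8M-style definition: keep the top-k predictions per video, pool them across all videos, sort by confidence, and average the precision at each correct prediction over the total number of positive labels. The function name `global_average_precision` and the default `top_k=20` are illustrative assumptions; the abstract does not state the exact evaluation settings used by the challenge.

```python
from typing import List, Sequence, Tuple


def global_average_precision(
    scores: Sequence[Sequence[float]],
    labels: Sequence[Sequence[int]],
    top_k: int = 20,  # assumption: YouTube-8M convention, not confirmed for this challenge
) -> float:
    """Compute GAP over the pooled top-k predictions of every video.

    scores[i][c] is the predicted confidence of tag c for video i;
    labels[i][c] is 1 if tag c is a ground-truth tag of video i, else 0.
    """
    pooled: List[Tuple[float, int]] = []
    total_positives = 0
    for video_scores, video_labels in zip(scores, labels):
        total_positives += sum(video_labels)
        # Rank this video's tags by confidence and keep only the top-k.
        ranked = sorted(
            zip(video_scores, video_labels), key=lambda p: p[0], reverse=True
        )
        pooled.extend(ranked[:top_k])

    if total_positives == 0:
        return 0.0

    # Sort the pooled predictions of all videos by confidence.
    pooled.sort(key=lambda p: p[0], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, (_, is_positive) in enumerate(pooled, start=1):
        if is_positive:
            hits += 1
            precision_sum += hits / rank  # precision at this correct prediction
    return precision_sum / total_positives


if __name__ == "__main__":
    example_scores = [[0.9, 0.1, 0.8], [0.2, 0.7, 0.3]]
    example_labels = [[1, 0, 0], [0, 1, 1]]
    print(global_average_precision(example_scores, example_labels, top_k=2))
```

Because predictions from all videos are pooled before ranking, a confident wrong tag on one video lowers the precision credited to every correct tag ranked below it, including tags on other videos. This is one reason sparse positive targets matter for this metric.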

Supplementary Material

MP4 File (MM21-gch3317.mp4)
In this video, we present our solution to the Multi-modal Ads Video Tagging Challenge in the ACM Multimedia 2021 Grand Challenges. We extend the baseline model by redesigning the visual feature extraction procedure, and we modify the loss function to cope with sparse positive targets. Moreover, we propose Semi-supervised Learning with Negative Masking to leverage both labeled data and unlabeled data from the preliminary contest, which effectively enhances the training process. We further use Cross-Class Relevance Learning to boost performance. With a model ensemble, we achieve a GAP score of 0.8237 and rank second among all submissions in the challenge.

Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN:9781450386517
DOI:10.1145/3474085

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. multi-label classification
  2. multi-modal learning
  3. semi-supervised learning with negative masking

Qualifiers

  • Short-paper

Conference

MM '21
MM '21: ACM Multimedia Conference
October 20 - 24, 2021
Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
