DOI: 10.1145/3437963.3441810
research-article

DeepXML: A Deep Extreme Multi-Label Learning Framework Applied to Short Text Documents

Published: 08 March 2021

Abstract

Scalability and accuracy are well-recognized challenges in deep extreme multi-label learning, where the objective is to train architectures for automatically annotating a data point with the most relevant subset of labels from an extremely large label set. This paper develops the DeepXML framework, which addresses these challenges by decomposing the deep extreme multi-label task into four simpler sub-tasks, each of which can be trained accurately and efficiently. Choosing different components for the four sub-tasks allows DeepXML to generate a family of algorithms with varying trade-offs between accuracy and scalability. In particular, DeepXML yields the Astec algorithm, which could be 2-12% more accurate and 5-30x faster to train than leading deep extreme classifiers on publicly available short text datasets. Astec could also efficiently train on Bing short text datasets containing up to 62 million labels while making predictions for billions of users and data points per day on commodity hardware. This allowed Astec to be deployed on the Bing search engine for a number of short text applications, ranging from matching user queries to advertiser bid phrases to showing personalized ads, where it yielded significant gains in click-through rates, coverage, revenue and other online metrics over state-of-the-art techniques currently in production. DeepXML's code is available at https://github.com/Extreme-classification/deepxml.
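
The task stated in the abstract, annotating a data point with its most relevant labels out of a very large label set, is usually treated as a ranking problem: score candidate labels, keep the top k, and measure how many of them are relevant. The following is a minimal sketch of that setup only, not DeepXML or Astec code. The function names, the toy bid phrases and the scores are all hypothetical, and precision@k is used here as a common metric in this literature rather than as something the abstract specifies.

```python
import heapq


def top_k_labels(scores, k):
    """Return the k label ids with the highest scores, best first.

    `scores` is a sparse mapping {label_id: score}; labels that were never
    scored are simply absent, which is what makes huge label sets workable.
    """
    best = heapq.nlargest(k, scores.items(), key=lambda item: item[1])
    return [label for label, _ in best]


def precision_at_k(predicted, relevant, k):
    """Fraction of the first k predicted labels that are truly relevant."""
    hits = sum(1 for label in predicted[:k] if label in relevant)
    return hits / k


# Hypothetical scores for one short-text query (illustrative values only).
scores = {
    "cheap flights": 0.91,
    "flight deals": 0.84,
    "hotel booking": 0.40,
    "car rental": 0.12,
}
# Hypothetical ground-truth labels for the same query.
relevant = {"cheap flights", "flight deals", "airline tickets"}

predicted = top_k_labels(scores, k=3)
p_at_3 = precision_at_k(predicted, relevant, k=3)
print(predicted)
print(round(p_at_3, 3))
```

In this toy case two of the three returned labels are relevant, so precision@3 is 2/3. A real system would obtain the sparse score mapping from a trained model instead of a hand-written dictionary.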

    Published In

    WSDM '21: Proceedings of the 14th ACM International Conference on Web Search and Data Mining
    March 2021
    1192 pages
    ISBN: 9781450382977
    DOI: 10.1145/3437963
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. bid-phrase recommendation
    2. extreme multi-label learning
    3. large-scale learning
    4. personalized ads
    5. short-text

    Qualifiers

    • Research-article

    Conference

    WSDM '21

    Acceptance Rates

    Overall Acceptance Rate 498 of 2,863 submissions, 17%
