research-article

Simple and Scalable Response Prediction for Display Advertising

Published: 29 December 2014

Abstract

Click-through rate and conversion rate estimation are two core prediction tasks in display advertising. In this article, we present a machine learning framework based on logistic regression that is specifically designed to address the particular challenges of display advertising. The resulting system has the following characteristics: it is easy to implement and deploy; it is highly scalable (we have trained it on terabytes of data); and it provides models with state-of-the-art accuracy.
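The abstract states only that the framework is built on logistic regression, and the author tags list hashing among its topics. As an illustration under our own assumptions, and not the article's implementation, the sketch below shows a hashed-feature logistic regression for click prediction: each categorical `field=value` pair is hashed into a fixed-size weight vector (the hashing trick, here using `zlib.crc32` only because it is deterministic across runs), and the weights are fit by single-machine stochastic gradient descent with an L2 penalty. The names `hash_features`, `sigmoid`, and `HashedLogReg`, the hyperparameters, and the toy impressions are all hypothetical.

```python
import math
import zlib
from typing import Dict, List, Sequence, Tuple

# Hypothetical sketch only: a single-machine toy, not the article's system,
# which is designed to scale to terabytes of data.


def hash_features(features: Dict[str, str], num_bits: int = 18) -> List[int]:
    """Map categorical features to indices in [0, 2**num_bits) via the hashing trick.

    Each feature is hashed as the string "field=value", plus a constant bias
    feature, so no feature dictionary has to be stored. crc32 is used because,
    unlike Python's built-in hash() on str, it is not salted per process.
    """
    mask = (1 << num_bits) - 1
    keys = ["__bias__"] + [f"{field}={value}" for field, value in sorted(features.items())]
    return [zlib.crc32(key.encode("utf-8")) & mask for key in keys]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (never calls exp on a large positive value).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class HashedLogReg:
    """L2-regularized logistic regression over hashed binary features, trained by SGD."""

    def __init__(self, num_bits: int = 18, lr: float = 0.1, l2: float = 1e-4) -> None:
        self.num_bits = num_bits
        self.lr = lr
        self.l2 = l2
        self.w = [0.0] * (1 << num_bits)

    def predict(self, features: Dict[str, str]) -> float:
        # Predicted click probability: sigmoid of the sum of active weights.
        idx = hash_features(features, self.num_bits)
        return sigmoid(sum(self.w[i] for i in idx))

    def update(self, features: Dict[str, str], label: int) -> None:
        # The log-loss gradient w.r.t. the margin is (p - y); each active binary
        # feature contributes (p - y) plus the L2 term. Shrinkage is applied only
        # to active weights, a common simplification for sparse SGD. Features that
        # collide in the hash space simply share a weight.
        idx = hash_features(features, self.num_bits)
        p = sigmoid(sum(self.w[i] for i in idx))
        g = p - label
        for i in idx:
            self.w[i] -= self.lr * (g + self.l2 * self.w[i])

    def fit(self, data: Sequence[Tuple[Dict[str, str], int]], epochs: int = 5) -> "HashedLogReg":
        for _ in range(epochs):
            for features, label in data:
                self.update(features, label)
        return self


if __name__ == "__main__":
    # Hypothetical toy impressions: ad "a1" is clicked 75% of the time, "a2" 10%.
    data = []
    for k in range(200):
        data.append(({"ad": "a1", "site": "news"}, 1 if k % 4 != 0 else 0))
        data.append(({"ad": "a2", "site": "news"}, 1 if k % 10 == 0 else 0))
    model = HashedLogReg().fit(data, epochs=3)
    print(round(model.predict({"ad": "a1", "site": "news"}), 3))
    print(round(model.predict({"ad": "a2", "site": "news"}), 3))
```

Hashing removes the need to store a feature dictionary, at the cost of occasional collisions in which unrelated features share a weight; increasing `num_bits` makes such collisions rarer at the price of a larger weight vector.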

Information

Published In

ACM Transactions on Intelligent Systems and Technology, Volume 5, Issue 4
Special Sections on Diversity and Discovery in Recommender Systems, Online Advertising and Regular Papers
January 2015
390 pages
ISSN: 2157-6904
EISSN: 2157-6912
DOI: 10.1145/2699158
  • Editor: Huan Liu
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 December 2014
Accepted: 01 August 2013
Revised: 01 April 2013
Received: 01 December 2012
Published in TIST Volume 5, Issue 4

Author Tags

  1. Display advertising
  2. click prediction
  3. distributed learning
  4. feature selection
  5. hashing
  6. machine learning

Qualifiers

  • Research-article
  • Research
  • Refereed

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 54
  • Downloads (last 6 weeks): 8
Reflects downloads up to 02 Feb 2025

Cited By

  • (2024) Neurobiological Properties of the Structure of the Parallel-Hierarchical Network and Its Usage for Pattern Recognition. Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska 14:3 (35-38). DOI: 10.35784/iapgos.6212. Online publication date: 30-Sep-2024.
  • (2024) Intelligent systems and consumer neuroscience in the age of computational advertising. Management & Marketing 19:3 (441-470). DOI: 10.2478/mmcks-2024-0020. Online publication date: 24-Oct-2024.
  • (2024) The Effect of Attitude Towards Social Media Advertisements on Brand Value and Consumer Behavior. Fırat Üniversitesi Sosyal Bilimler Dergisi 34:3 (1327-1343). DOI: 10.18069/firatsbed.1489062. Online publication date: 18-Sep-2024.
  • (2024) Online conversion rate prediction via multi-interval screening and synthesizing under delayed feedback. Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence (8796-8804). DOI: 10.1609/aaai.v38i8.28726. Online publication date: 20-Feb-2024.
  • (2024) Research on the Influence of Short Video Marketing Strategies on Its Communication Effect: An Empirical Analysis Based on Hangzhou Asian Games Brand Weibo Short Video. Proceedings of the 2024 International Conference on Cloud Computing and Big Data (278-283). DOI: 10.1145/3695080.3695129. Online publication date: 26-Jul-2024.
  • (2024) Warming Up Cold-Start CTR Prediction by Learning Item-Specific Feature Interactions. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (3233-3244). DOI: 10.1145/3637528.3671784. Online publication date: 25-Aug-2024.
  • (2024) Understanding the Ranking Loss for Recommendation with Sparse User Feedback. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (5409-5418). DOI: 10.1145/3637528.3671565. Online publication date: 25-Aug-2024.
  • (2024) Confidence-Aware Multi-Field Model Calibration. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (5111-5118). DOI: 10.1145/3627673.3680043. Online publication date: 21-Oct-2024.
  • (2024) A Truthful Pricing-Based Defending Strategy Against Adversarial Attacks in Budgeted Combinatorial Multi-Armed Bandits. IEEE Transactions on Knowledge and Data Engineering 36:11 (5529-5543). DOI: 10.1109/TKDE.2023.3335248. Online publication date: Nov-2024.
  • (2024) MIFI: Combining Multi-Interest Activation and Implicit Feature Interaction for CTR Predictions. IEEE Transactions on Computational Social Systems 11:2 (2889-2900). DOI: 10.1109/TCSS.2023.3313622. Online publication date: Apr-2024.