DOI:10.1145/3637528.3671813
research-article

Truthful Bandit Mechanisms for Repeated Two-stage Ad Auctions

Published: 24 August 2024

Abstract

Online advertising platforms use a two-stage auction architecture to deliver personalized ads to users with low latency. The first stage efficiently selects a small subset of promising candidates from the complete pool of ads. In the second stage, an auction is conducted within that subset to determine the winning ad for display, using click-through-rate predictions from the second-stage machine learning model. In this work, we investigate the online learning process of the first-stage subset selection policy while ensuring game-theoretic properties in repeated two-stage ad auctions. Specifically, we model the problem as designing a combinatorial bandit mechanism with a general reward function, subject to the additional requirements of truthfulness and individual rationality (IR). We establish an $\Omega(T)$ regret lower bound for truthful bandit mechanisms, which demonstrates the challenge of simultaneously achieving allocation efficiency and truthfulness. To circumvent this impossibility result, we introduce truthful α-approximation oracles and evaluate the bandit mechanism through α-approximation regret. We propose two mechanisms, both of which are ex-post truthful and ex-post IR. The first is an explore-then-commit mechanism with regret $O(T^{2/3})$. The second achieves an improved $O(\log T / \Delta_\Phi^2)$ regret, where $\Delta_\Phi$ is a distribution-dependent gap, but requires additional assumptions on the oracles and information about the strategic bidders.
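The two-stage flow and the explore-then-commit idea described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's mechanism and carries none of its truthfulness or regret guarantees: the function names (`second_stage_auction`, `select_subset`), the round-robin exploration schedule, the empirical click-rate estimate, and the second-price pay-per-click payment rule are all assumptions chosen for illustration.

```python
def second_stage_auction(candidates, bids, ctr):
    """Hypothetical second stage: rank shortlisted ads by bid * predicted CTR.

    The winner's per-click price is the smallest bid that would still win
    (runner-up score divided by the winner's CTR). Assumes every CTR is > 0.
    """
    ranked = sorted(candidates, key=lambda a: bids[a] * ctr[a], reverse=True)
    winner = ranked[0]
    if len(ranked) > 1:
        runner_up_score = bids[ranked[1]] * ctr[ranked[1]]
    else:
        runner_up_score = 0.0
    return winner, runner_up_score / ctr[winner]


def select_subset(t, explore_rounds, ads, k, bids, clicks, shows):
    """Hypothetical first stage with an explore-then-commit schedule.

    Rounds t < explore_rounds: cycle through the ads in a fixed order that
    ignores the bids. Later rounds: commit to the k ads with the highest
    bid * empirical click rate. Assumes 1 <= k <= len(ads).
    """
    if t < explore_rounds:
        start = (t * k) % len(ads)
        return [ads[(start + i) % len(ads)] for i in range(k)]
    estimate = {a: clicks[a] / max(shows[a], 1) for a in ads}
    ranked = sorted(ads, key=lambda a: bids[a] * estimate[a], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    ads = ["a", "b", "c", "d"]
    bids = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
    clicks = {"a": 5, "b": 1, "c": 3, "d": 0}
    shows = {"a": 10, "b": 10, "c": 10, "d": 10}

    # Exploration round: the shortlist does not depend on the bids.
    print(select_subset(0, 2, ads, 2, bids, clicks, shows))
    # Commit round: the shortlist uses the bids and the click estimates.
    shortlist = select_subset(5, 2, ads, 2, bids, clicks, shows)
    print(shortlist)
    # Second stage over the shortlist, with made-up second-stage CTR predictions.
    print(second_stage_auction(shortlist, bids, {"a": 0.5, "c": 0.25}))
```

Keeping the exploration schedule independent of the bids is a common device in truthful bandit mechanisms, since it prevents bidders from influencing what is learned about them. The sketch only gestures at that idea and does not establish truthfulness.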

Supplemental Material

MP4 File - rtfp0782-video
Promotional Video for paper "Truthful Bandit Mechanisms for Repeated Two-stage Ad Auctions".

Published In

KDD '24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2024
6901 pages
ISBN:9798400704901
DOI:10.1145/3637528
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. mechanism design
  2. multi-armed bandit
  3. online advertising
  4. online learning

Conference

KDD '24

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
