research-article
Open access

Query Refinement for Diverse Top-k Selection

Published: 30 May 2024

Abstract

Database queries are often used to select and rank items as decision support for many applications. As automated decision-making tools become more prevalent, there is a growing recognition of the need to diversify their outcomes. In this paper, we define and study the problem of modifying the selection conditions of an ORDER BY query so that the result of the modified query closely fits some user-defined notion of diversity while simultaneously maintaining the intent of the original query. We show the hardness of this problem and propose a mixed-integer linear programming (MILP) based solution. We further present optimizations designed to enhance the scalability and applicability of the solution in real-life scenarios. We investigate the performance characteristics of our algorithm and show its efficiency and the usefulness of our optimizations.
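
To make the problem statement concrete, the following is a minimal brute-force sketch of refining a single selection condition of a top-k query so that the result meets a lower bound on one group's representation while staying as close as possible to the original condition. It is not the MILP formulation the paper proposes. The table, the attribute names (`gpa`, `group`, `score`), the distance measure (absolute change in the threshold), and all function names are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Applicant:
    name: str
    gpa: float
    group: str
    score: float


# Hypothetical data; not taken from the paper.
APPLICANTS = [
    Applicant("a", 3.9, "A", 95.0),
    Applicant("b", 3.8, "A", 92.0),
    Applicant("c", 3.7, "A", 90.0),
    Applicant("d", 3.6, "B", 93.0),
    Applicant("e", 3.5, "B", 91.0),
    Applicant("f", 3.4, "B", 80.0),
]


def top_k(rows, min_gpa, k):
    """Emulate: SELECT * WHERE gpa >= min_gpa ORDER BY score DESC LIMIT k."""
    selected = [r for r in rows if r.gpa >= min_gpa]
    return sorted(selected, key=lambda r: r.score, reverse=True)[:k]


def satisfies(result, group, lower_bound):
    """Check that at least `lower_bound` rows of `group` appear in the result."""
    return sum(r.group == group for r in result) >= lower_bound


def refine(rows, original_min_gpa, k, group, lower_bound):
    """Try every threshold occurring in the data and return the feasible one
    closest to the original threshold, or None if no threshold is feasible."""
    candidates = sorted({r.gpa for r in rows} | {original_min_gpa})
    feasible = [
        t for t in candidates
        if satisfies(top_k(rows, t, k), group, lower_bound)
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda t: abs(t - original_min_gpa))


if __name__ == "__main__":
    # The original query (gpa >= 3.7, k = 3) returns no member of group "B".
    print([r.name for r in top_k(APPLICANTS, 3.7, 3)])
    # Closest refinement placing at least one member of "B" in the top 3.
    print(refine(APPLICANTS, 3.7, 3, "B", 1))
```

Exhaustive enumeration like this is only workable for a single predicate over a small domain. With several predicates the space of candidate refinements grows combinatorially, which is the kind of setting where an optimization-based formulation such as the one described in the abstract becomes relevant.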

Published In

Proceedings of the ACM on Management of Data, Volume 2, Issue 3
SIGMOD
June 2024
1953 pages
EISSN:2836-6573
DOI:10.1145/3670010
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 May 2024
Published in PACMMOD Volume 2, Issue 3

Author Tags

  1. diversity
  2. provenance
  3. query refinement
  4. ranking
  5. top-k

Qualifiers

  • Research-article
