
Advancing Human-AI Complementarity: The Impact of User Expertise and Algorithmic Tuning on Joint Decision Making

Published: 23 September 2023

Abstract

Human-AI collaboration for decision-making strives to achieve team performance that exceeds what either humans or AI can achieve alone. However, many factors can impact the success of Human-AI teams, including a user's domain expertise, mental models of the AI system, trust in its recommendations, and more. This article reports on a study that examines users' interactions with three simulated algorithmic models, all with equivalent accuracy rates but each tuned differently in terms of true positive and true negative rates. Our study examined user performance in a non-trivial blood vessel labeling task in which participants indicated whether a given blood vessel was flowing or stalled. Users completed 140 trials across multiple stages, first without an AI and then with recommendations from an AI-Assistant. Although all users had prior experience with the task, their levels of proficiency varied widely.
Our results demonstrated that while recommendations from an AI-Assistant can aid users' decision making, several underlying factors, including users' base expertise and complementary human-AI tuning, significantly impact overall team performance. First, users' base performance matters, particularly in comparison to the performance level of the AI. Novice users improved, but not to the accuracy level of the AI. Highly proficient users were generally able to discern when they should follow an AI recommendation and typically maintained or improved their performance. Mid-performers, whose accuracy was similar to the AI's, were the most variable in terms of whether the AI recommendations helped or hurt their performance. Second, tuning an AI algorithm to complement users' strengths and weaknesses also significantly impacted users' performance. For example, users in our study were better at detecting flowing blood vessels, so when the AI was tuned to reduce false negatives (at the expense of increasing false positives), users were able to reject the resulting erroneous recommendations more easily and improve in accuracy. Finally, users' perception of the AI's performance relative to their own impacted whether their accuracy improved when given recommendations from the AI. Overall, this work reveals important insights into the complex interplay of factors influencing Human-AI collaboration and provides recommendations on how to design and tune AI algorithms to complement users in decision-making tasks.
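
For concreteness, here is a minimal sketch of the tuning idea the abstract describes, using made-up confusion-matrix counts rather than figures from the study: two classifiers can share the same overall accuracy while trading true positive rate (correctly flagging stalled vessels) against true negative rate (correctly passing flowing vessels). The `rates` helper and all numbers below are illustrative assumptions.

```python
# Illustrative only: hypothetical confusion-matrix counts, not data from the study.

def rates(tp, fn, tn, fp):
    """Return (accuracy, true positive rate, true negative rate)."""
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    tpr = tp / (tp + fn)  # sensitivity: stalled vessels correctly flagged
    tnr = tn / (tn + fp)  # specificity: flowing vessels correctly passed
    return accuracy, tpr, tnr

# 100 trials each: 50 stalled (positive) and 50 flowing (negative) vessels.
# Model A is tuned to reduce false negatives (accepting more false positives);
# Model B is tuned the opposite way. Both land at 80% accuracy.
model_a = rates(tp=45, fn=5, tn=35, fp=15)   # -> (0.80, 0.90, 0.70)
model_b = rates(tp=35, fn=15, tn=45, fp=5)   # -> (0.80, 0.70, 0.90)

print("Model A: acc=%.2f TPR=%.2f TNR=%.2f" % model_a)
print("Model B: acc=%.2f TPR=%.2f TNR=%.2f" % model_b)
```

A plain accuracy comparison would treat these two models as interchangeable; the abstract's point is that which tuning helps depends on where the human teammate is already strong.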


        Published In

ACM Transactions on Computer-Human Interaction, Volume 30, Issue 5
October 2023
593 pages
ISSN: 1073-0516
EISSN: 1557-7325
DOI: 10.1145/3623487

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 23 September 2023
        Online AM: 10 March 2023
        Accepted: 28 April 2022
        Revised: 03 March 2022
        Received: 02 July 2021
        Published in TOCHI Volume 30, Issue 5


        Author Tags

        1. Human-AI collaboration
        2. Human-AI performance
        3. human-centered AI
        4. citizen science

        Qualifiers

        • Research-article


