DOI: 10.1145/3491101.3519842
poster

Towards Conversationally Intelligent Dialog Systems

Published: 28 April 2022

Abstract

Spoken dialog systems, lacking the means to address the complex phenomena of spontaneous speech and conversational dynamics, force users into a constrained mode of dialog that resembles text-based interaction more closely than spoken conversation. Turn-taking is simplified and discourse-related information is lost, as discourse markers are largely ignored and prosodic information is not captured or utilized. We hypothesize that incorporating a few of these key conversational phenomena at specific points in a dialog will reduce cognitive load in spoken human-computer interaction and expand the potential application areas of dialog systems to tasks requiring more complex interactions. In this paper, we describe our approach to adding conversational intelligence to dialog systems and our work to date validating the hypothesis that adding conversational intelligence to existing dialog systems will significantly reduce users’ cognitive load.

Supplementary Material

MP4 File (3491101.3519842-talk-video.mp4)
Talk Video

Published In

CHI EA '22: Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems
April 2022
3066 pages
ISBN:9781450391566
DOI:10.1145/3491101
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Conversational intelligence
  2. Dialog systems
  3. Spontaneous speech
  4. conversational AI
  5. dialogue complexity
  6. human computer interaction

Qualifiers

  • Poster
  • Research
  • Refereed limited

Conference

CHI '22: CHI Conference on Human Factors in Computing Systems
April 29 - May 5, 2022
New Orleans, LA, USA

Acceptance Rates

Overall Acceptance Rate 6,164 of 23,696 submissions, 26%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 218
  • Downloads (Last 12 months): 35
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 06 Jan 2025
