1 Introduction

Spurred by a ripening of many core technologies and limited only by designers’ creativity and ingenuity, intelligent assistants are dotting the human-technology interaction landscape today. Chances are the phone we own already comes with an assistant, a car we drove lately also had one, and perhaps we have also recently interacted with an assistant on a website. At the moment, millions of users interact with intelligent assistants in a personal context to carry out simple tasks and queries (Whitenton and Budiu 2018), but as assistants’ capabilities advance, it is foreseeable that they will play a major role in the future of work (Maedche et al. 2019). While these developments seem to point towards a brighter future where such assistants may become an integral part of our lives, researchers in social, ethical, psychological and legal domains have cautioned against their indiscriminate deployment (Danaher 2018; Hernández-Orallo and Vold 2019), and, at the same time, recent movements in the human-computer interaction (HCI) domain such as positive computing (Calvo and Peters 2017) have set lofty goals for researchers and designers. The assistants of the future will not only have to be trustworthy, respect our privacy, be accountable and fair, but also help us flourish as human beings. Consequently, designers may have to perform a balancing act that emphasizes different design goals based on the characteristics of various entities involved in the environment, the goal, and the activity for which an intelligent assistant is to be developed.

Yet, it is not clear how to design assistants that meet these objectives (Maedche et al. 2019). For instance, how should we design the way in which assistants cooperate with their users? Which characteristics of assistants should we consider and influence, and in what ways? How can we objectively compare different assistants? At present, there is no common terminology that directly informs the field of intelligent assistants. In this article, we resolve this state of conceptual fogginess, so that research insights from different backgrounds assimilate into knowledge and understanding. This article contributes to scientific research by establishing an overarching conceptual frame of reference in the information systems research community on what an ‘intelligent assistant’ is by considering its functionality (what an intelligent assistant does), outcomes (its benefit for the user), context (entities, contextual conditions and modes of interaction), design trends (how assistants have been designed in the past), and evaluation (metrics for evaluating an intelligent assistant).

Specifically, our investigation is guided by three research objectives, each building on the outcome of the previous objective(s):

  1. Research Objective 1 (RO1): An investigation of the work of human assistants as metaphor for digital assistants: Human assistants have been playing a major role in several professions long before the development of digital (intelligent) assistants. What does their role tell us about the activity of assisting and the context in which it is delivered (Research Question 1 (RQ1))?

  2. Research Objective 2 (RO2): An analysis of intelligent assistant characteristics: For decades, the research community has demonstrated prototypical intelligent assistants. (a) What do we mean when we say an assistant is ‘intelligent’ (RQ2.1)? (b) What are the attributes of these assistants (RQ2.2), and (c) what constitutes the context of interacting with these assistants (RQ2.3)?

  3. Research Objective 3 (RO3): An analysis of trends in the design and evaluation of intelligent assistants. (a) Which design trends and research goals have steered the direction of research in the domain (RQ3.1), and (b) which measures have been used to evaluate assistants (RQ3.2)?

Before beginning our own attempt at creating a typology for intelligent assistants, we carried out a literature search to identify any prior attempts. At the time of writing this article, we found only one paper, by Knote et al. (2019), who categorized the design characteristics of intelligent personal assistants. Our attempt complements this prior work in several ways:

  • Our research scope extends beyond the technological characteristics of intelligent assistants. We focus not only on the technological capabilities, but also include the concept of assisting (RO1) and design/evaluation guidelines (RO3) in our analysis.

  • While Knote et al. inductively derived the design characteristics of interaction and intelligence based on their literature search, we first begin with an exploration of human assistants in several human domains to deductively arrive at our concepts, following which we build our typology in an empirically iterative manner as described by Nickerson et al. (2013). This approach, although atypical, brings to light several contextual aspects regarding the work of assistants which are otherwise taken for granted and also provides a background against which ideas in the technological domain can be compared.

  • We not only derive the typology but demonstrate its effectiveness by cross-referencing several attributes which yield further insights into the topic, for example design trends.

  • We also go a step further and pave the way for guidelines and future research themes in the field of intelligent assistant design.

Such an undertaking can be beneficial for all stakeholders involved in the design and use of technology – researchers, designers and decision-makers:

  • For researchers: While conceptual work itself may not provide empirical evidence, it forms a basis for pursuing empirical work. This article contributes to the development of theory by creating descriptive knowledge which allows researchers to distinguish between concepts and hypothesize the relationships among them (McKnight and Chervany 2001). In addition, it also anchors these concepts to their manifestations in the real world which provides an avenue for scientific results to make an impact on practice (Iivari 2007).

  • For designers: Assistants are now available on consumer devices and are continually gaining cyber-physical functionalities, raising several new social, psychological, ethical and legal questions. In order to tackle some of these issues, the presence of descriptive knowledge in the form of conceptual typology and design trends can provide a common frame of reference for future discussions between the design and research communities. Over time, this descriptive knowledge could be used as a ‘design-space’ (Shaw 2012) or be used to develop frameworks for facilitating the design of assistants, generate prescriptive design knowledge from real-world use cases, and derive design patterns reflecting best practices.

  • For decision-makers: As firms ponder deploying assistants that interact with their customers and assist their employees, a clear view of the components, aims and capabilities of intelligent assistants could help them structure their own research to make informed decisions about the features, benefits and drawbacks of using these systems in their organizations. Further, decision makers could include unresolved areas of concern in their risk assessment process before committing to a particular assistant framework or technology.

The remainder of this work is structured as follows: Section 2 outlines the methods used to conduct the research and analysis for each of the three objectives. Sections 3, 4 and 5 explicate our findings and analysis. Section 6 highlights our research contributions, discusses conceptual and practical challenges in intelligent assistant design and suggests opportunities for future work for each of the three groups of stakeholders mentioned above. Section 7 concludes the paper.

2 Methodology

We took a multidisciplinary, two-pronged approach towards developing the conceptual aspects and typology of intelligent assistants. First, to deepen our conceptual understanding, we reviewed research contributions investigating the work of human assistants from a sociological and organizational perspective. Then, we used these concepts as a starting point to explore the design dimensions of intelligent assistants in the information systems and computing domain. We did so for two reasons. Firstly, we wished to maintain conceptual thoroughness and re-use existing concepts developed in other disciplines. Secondly, doing so revealed additional viewpoints which may prove useful as critique of existing approaches and inspire the design, development, and evaluation of future intelligent assistants.

2.1 Research Objective 1

Inspired by the approach taken by Erickson et al. (2008), we conducted a literature review of systematic qualitative or ethnographic studies of human assistants at work. Appendix A offers the complete list of articles used for RO1, and the methodology for searching and selecting articles is detailed in Appendix B (appendices are available online via http://link.springer.com). We used seven articles to construct the dimensions of our qualitative analysis, guided by the following questions:

  • Which entities form the context of cooperation when assisting or being assisted?

  • Which tasks do human assistants perform and to achieve which outcomes?

  • Do assistants always take the initiative or is the need for assistance communicated?

We derived characteristics of human assistants’ work under each dimension via a qualitative content analysis (Mayring 2000). Our starting point was to identify passages of text that explained the activities carried out by assistants, resulting in a total of 40 such activities, many of which had commonalities depending on the domain. Relying on activity theory (Kaptelinin and Nardi 2006), we identified the tools and objects used within assistance work and the human subjects involved and classified the activities, their content and allocation as well as the outcomes. In two rounds of reduction, we generalized these classes to create a contextual model.

2.2 Research Objective 2

We started with the concepts of environment, outcome, activity and initiative from the previous section, and harmonized them with current discussions concerning the definition and characteristics of intelligence in general and machine intelligence in particular. First, we consulted review articles on intelligence to extract the characteristics of intelligence itself. Broadly, in the context of our work, intelligence can be seen as a property of an agent who interacts with an external environment or a situation by taking actions through which it can achieve a particular goal (Legg and Hutter 2008). The discussion of machine intelligence revolves around the ability of hardware and/or software systems or ‘agents’. We reviewed frequently cited literature on intelligent agents to capture other qualities of an intelligent agent – namely autonomy, flexibility, communication modalities, character, mobility and ubiquity. These served as a starting point for creating our taxonomy.

For developing a taxonomy of intelligent assistants, we followed the method described by Nickerson et al. (2013). According to this method, researchers must first determine one or more ‘meta-characteristic(s)’, described as “the most comprehensive characteristic that will serve as the basis for the choice of characteristics in the taxonomy”. We used the results of RO1 to settle on three overarching perspectives: the outcome, the environment and the assistant itself. While intelligence has previously been seen as a distinct meta-characteristic (Janssen et al. 2020), our investigation revealed that it is best captured as an ‘emergent’ characteristic evidenced by the behavioral and interactive capabilities of the assistant with respect to a particular goal in an environment.

Following a literature search in IEEE Xplore, ACM Digital Library, ScienceDirect, SpringerLink and ISI Web of Knowledge, we began with 2713 articles, which we then filtered down to 111 after applying our inclusion and exclusion criteria in two rounds. The typology was developed with an empirical-to-conceptual approach in four iterations, discarding non-relevant categories and inserting newer categories, ending when objective and subjective ending conditions were met. Appendix A offers the complete list of articles used for RO2 and Appendix B describes the methodology for searching and selecting the articles. Appendix C details the successive iterations of typology creation.
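The iterative procedure described above can be sketched as a simple loop. This is a minimal illustration and not the procedure we actually followed: the function name, the `extract` and `is_relevant` callbacks, and the toy data are hypothetical, and the only objective ending condition modeled here is that no category was added or discarded in the last iteration (one of the conditions listed by Nickerson et al. 2013). Subjective ending conditions require human judgment and are represented only by a caller-supplied predicate.

```python
from __future__ import annotations

from typing import Callable, Iterable


def develop_typology(
    batches: Iterable[list[str]],
    extract: Callable[[str], set[str]],
    is_relevant: Callable[[str], bool],
    subjectively_done: Callable[[set[str]], bool] = lambda categories: True,
) -> tuple[set[str], int]:
    """Empirical-to-conceptual loop: grow and prune categories per batch.

    Returns the final category set and the number of iterations run.
    All parameters are hypothetical stand-ins for manual coding steps.
    """
    categories: set[str] = set()
    iterations = 0
    for batch in batches:
        iterations += 1
        before = set(categories)
        # Insert newer categories observed in this batch of articles.
        for article in batch:
            categories |= extract(article)
        # Discard categories judged non-relevant.
        categories = {c for c in categories if is_relevant(c)}
        # Assumed objective ending condition: nothing changed in this round.
        unchanged = categories == before
        if unchanged and subjectively_done(categories):
            break
    return categories, iterations


# Toy data: each "article" is a comma-separated list of observed features.
batches = [
    ["voice,autonomous", "gui,delegated"],
    ["voice,mixed", "gui,misc"],
    ["gui,autonomous"],
    ["voice,delegated"],
]
cats, n = develop_typology(
    batches,
    extract=lambda article: set(article.split(",")),
    is_relevant=lambda category: category != "misc",
)
print(sorted(cats), n)
```

In the toy run, the third batch adds nothing new, so the loop stops before the fourth batch is read. In practice each step is a manual coding decision rather than a function call.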

2.3 Research Objective 3

One of the hallmarks of a mature applied research domain is the existence of empirically derived guidelines (dos and don’ts) and metrics for design and evaluation of systems in the domain. Intelligent assistant design is a burgeoning field with many scattered goals. The following two questions guided our investigation in this section:

  • Which design criteria have been used by researchers in designing intelligent assistants?

  • Which evaluation criteria have been used by researchers in evaluating intelligent assistants?

We used the same data set previously utilized for building the typology, and the selection criteria are explained in Appendix B. Out of the data set, 24 articles mentioned design criteria, and 60 articles mentioned evaluation criteria. An in-depth reading of the articles and a qualitative analysis followed, resulting in the categories based on criteria shown in Appendix D.

In the following, we present the results of our investigation for research objectives 1, 2 and 3 respectively.

3 The Work of Human Assistants as Metaphor (RO1/RQ1)

The word ‘assistant’ has been used in HCI research for almost as long as the domain has existed (Floyd 1986) as a kind of ‘interaction metaphor’ (Neale and Carroll 1997). Most metaphors such as files and folders are based on physical objects whose properties are relatively stable and easily accessible, so one can safely assume that almost all users would have a similar understanding of these objects. In contrast, it is highly likely that every researcher has modeled their intelligent assistant on some implicit understanding of the activity of assisting and the nature of human assistants. To make the metaphor of assisting explicit, we need to first reveal its structure and functions in its ‘source domain’. What is assistance, which skills and competencies do assistants need to work, how do they carry out this work, and to what end?

3.1 Assistance as Cooperation among Human Agents

The word assistance has its roots in Latin (assistere: stand by, take a stand near, attend), and has been part of the English lexicon for a few centuries. According to the Merriam-Webster dictionary, the word assistance refers to both the act of assisting someone and the help supplied. Hence, assistance refers to an action performed by someone in the service of another, the contents of this action, and the activity that forms the context of this act. Take, for example, the simple sentence, ‘I assisted him/her/them in writing a paper’. In using the word assist, we also implicitly assume that:

  • There is a need to assist or be assisted (the present situation: someone needed assistance)

  • Some form of aid is provided (action or content, e.g., effort, money, information etc. – I did something or provided something)

  • Someone (an entity) provides this aid to someone else (another entity)

  • A goal exists for which it is provided (to what end–perhaps to meet a deadline?)

  • A positive change has been brought about (the connotation that one is in a better state with assistance than without – he/she/they were in a better state after receiving my assistance)

According to activity theory, an activity is “understood as a relationship between the subject (that is, an actor) and the object (that is, an entity objectively existing in the world)” to satisfy a need (achieve an outcome), mediated by tools or instruments (Kaptelinin and Nardi 2012). The notion of the subject is not limited to an individual human, but also includes teams and organizations (Kaptelinin and Nardi 2006). The specifics and dynamics of a collective human activity are situated in an activity system (Engeström 2000), which adds additional mediational means such as rules/norms and roles. The outcome of the activity in this case is the result of the collective work of the team towards a common outcome. For example, in the medical domain this may translate to patient well-being.

In organizations, human assistants usually report to their principals (a person in authority, or specifically, the person from whom an agent’s authority derives). In other words, the designation ‘assistant’ necessitates the existence of a principal or an expert who is accountable for the assistant. Since many assistant professions tend to be service oriented, there is also significant client contact. All three entities are human beings with varying personalities, cultural dispositions and competence levels (Takala 2007; Erickson et al. 2008). Figure 1 illustrates how assistants maintain an awareness of the needs and present state of the principal and client via communication, and use tools to fulfill client needs and to maintain supporting objects.

Fig. 1 Contextual model in the case of human assistants. Dotted lines indicate interactive relationships

3.2 Outcomes and Activities

Human assistants’ work tends to be variable and fluid. At the highest level, assistants produce two types of complementary outcomes depending on the principal’s situation. In the majority of the literature on human assistants that we analyzed (83%), assistants augment their principals by working together to jointly enhance their performance and skill levels (Fig. 2). In this case, both the principal and the assistant collaborate at their optimal skill and performance level.

Fig. 2 The two types of outcomes of assistance: to compensate for sub-optimal and augment normal attributes

In the remainder of cases, assistants strive to maintain a minimum level of performance or ability by compensating for their principals in cases where they are indisposed or unable to perform at their optimal performance level, or in unforeseen situations that hinder the flow of work or demand immediate attention. Here, principals entrust assistants to troubleshoot, escalate or de-escalate a situation by performing the principal’s role with partial or full authority for a short duration or for specific tasks. Several situations illustrate this phenomenon: shifts in doctor priority (Henshall et al. 2019), comforting restless students, intervening in conflicts or taking over when the teacher needs to leave the classroom (Takala 2007), forced change of plans in surgical interventions (Hall et al. 2014), starting a procedure when the principal is running late (Quick 2013).

It is evident that the activity determines the ‘range’ of performance or quality the joint principal-assistant team wishes to attain. Several activities (for instance critical surgical interventions) may only be performed in an augmented state where teamwork is an absolute necessity, whereas other activities such as administrative work may benefit from augmentation but do not necessarily require it. Compensation modes, on the other hand, may be imperative to avoid negative consequences, such as a threat to patient safety during surgical interventions.

Achieving outcomes entails carrying out specific tasks, the most important of which are foreground tasks (43%) with standard procedures that are performed most frequently with varying skill levels alongside the principal, such that the principals’ effort stays directed at situations that maximize the utilization of their time and skills. Examples: ensuring the doctor does not have to wait for their patients (Taché and Hill-Sakurai 2010), supplying varying skills and knowledge during surgical tasks (Hall et al. 2014) or helping teachers and students with tasks (Kerry 2005; Takala 2007). Further, assistants communicate (13%) to fill the gap between principals’ assessment of the situation and the actual situation with additional information that allows the principal to take better decisions or solve problems at hand. Examples: talking to patients before a surgery (Quick 2013), anticipating patient needs before a visit (Taché and Hill-Sakurai 2010), skimming emails and blocking events, keeping track of what’s going on, consolidating information for the principal to use (Erickson et al. 2008).

Background tasks (23%), or housekeeping tasks (usually performed without the principal’s involvement) ensure that the objects (that will be acted on during the activity), resources (information, materials etc. consumed), the work environment, and tools/devices required for everyday activities are present and operational both before and after the activity. Fig. 1 illustrates this as the responsibility of the assistant, consisting of relationships between the assistant, the tools and the objects.

Setbacks and deviations from the norm are a normal part of everyday work of human assistants. Problem solving (13%) is needed to bring a derailed situation back on track, either by intervening (diagnosing the problem, informing the principal and/or taking corrective measures) or adapting to the new situation (re-planning the course of action). The former is more prevalent in the medical and pedagogical domains, where conformity to procedures is pivotal to achieving goals.

Finally, assistants devote time and effort towards maintaining awareness of the situation and principal needs. Good assistants are keen observers and gather information regarding the present situation as well as the preferences of their principal (Erickson et al. 2008) and clients (Taché and Hill-Sakurai 2010). Fig. 1 showcases the lines of communication between the principal, the assistant, the client(s) and the resources that may be used to maintain a common awareness of the situation.

3.3 Initiative

We found three modes in which assistants are motivated to act and cooperate with their principals. In the majority of cases, assistants act autonomously, that is, they self-initiate (53% of articles in our dataset) tasks based on prior task assignment, or take initiative when they sense a need for their involvement. Principals delegate tasks or communicate their needs directly in 35% of the articles in our dataset. In the third and remaining category, some activities mix both assistant initiative and principal instruction, in that the assistant takes initiative but is guided or supervised by the principal. As seen in Fig. 1, the principal can either directly communicate the need for assistance to the assistant, or the assistant acts based on their level of awareness regarding the needs of the principal, the client, and the status of the activity as indicated by tools and resources.

3.4 Summary

Working towards a common outcome, the activity of principals and assistants is directed at objects such as reports, forms, documents, calendars, teaching materials etc. that are organized, referred to, populated and updated at appropriate times either manually or by using common tools to reflect the state of an activity. In working with clients, tools are also used to directly act on them, for example in medical activities (Hall et al. 2014). A stable work-practice forms the basis for defining and constraining the course of work by means of formalized procedures, rules and informal conventions or preferences that convey how, and how not to, carry out tasks and interact with the principal and clients. These are used to infer the running state of the task (satisfactory or unsatisfactory), and, combined with the current state of the principal and assistant (their presence/absence and current ability), the desired outcomes are conveyed (by the principal) or inferred (by the assistant).

In the next section, we build upon the conceptual dimensions derived here by using them to analyze intelligent assistants. We begin with a discussion on the topic of intelligence, extend the typology and build a model of cooperation involving intelligent assistants.

4 An Analysis of Intelligent Assistant Characteristics (RO2)

4.1 Intelligence in Assistants (RQ2.1)

The use of the phrase ‘intelligent assistant’ can be traced back to the early 1990s. Maes described an ‘intelligent personal assistant’ as an autonomous interface agent which “collaborates with the user in the same work environment”, and utilizes machine learning to become “gradually more effective as it learns the user’s interests, habits and preferences” (Maes 1994). In this view, an intelligent assistant is an agent that is capable of autonomy (the ability to act independently without direct user manipulation) and learning (the ability to observe the user’s interaction with the interface and learn the user’s preferences). Hence, such an assistant would be ‘intelligent’ because it could learn, and, as a consequence, adapt to the user over time. But is learning alone a sign of intelligence? More recent views also suggest including features such as affect recognition (Morana et al. 2020), natural language processing, speech recognition and knowledge representation (Russell and Norvig 2009). Which of these capabilities form a minimally sufficient set in order to label an assistant as intelligent?

There is no single definition of intelligence. The mainstream perspective on human intelligence defines it to be “a very general mental capability that, among other things, involves the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly and learn from experience” (Gottfredson 1997). Intelligence can only be demonstrated as a group of several related cognitive capabilities (Deary 2012), by a test (Urbina 2011), evaluated for a specific purpose, in this case predicting the future academic or cognitive performance of an individual.

Consequently, many definitions of machine intelligence exist (Monett and Lewis 2017), which tend to be influenced by different theoretical backgrounds and practical considerations. Most definitions of machine intelligence consider it a demonstration, or an effect, of specific capabilities of an agent, which “encompass at least the essence of human intelligence along with the prospect of other capabilities specific to machines” (Legg and Hutter 2008). Agents are entities that possess at least four characteristics: they are reactive (they can sense and act), they act autonomously (exercise control over their own actions), they are temporally continuous (they are a continuously running process), and they are goal-oriented (they pursue goals and do not simply act in response to a stimulus) (Wooldridge and Jennings 1995; Franklin and Graesser 1997).

Legg and Hutter suggest that if the environment signals some kind of reward as an indication of goal achievement, an intelligent agent can learn about the structure of its environment to maximize the expected reward, thereby achieving goals in a wide variety of environments (Legg and Hutter 2008). A second way of interpreting the “best (expected) outcome” is in terms of the real-world constraints placed on all agents – they are limited by their insufficiency of knowledge and resources. Supporting this perspective, Wang proposes a different definition of intelligence as “the capacity of an information-processing system to adapt to its environment while operating with insufficient knowledge and resources” (Wang 2019). Russell and Norvig’s view on machine intelligence is that of a ‘rational agent’, which “achieves the best outcome, or when there is uncertainty, the best expected outcome” (Russell and Norvig 2009), based on its current performance measure, its prior knowledge of the environment, its range of actions and its history of observation. In order to do so, it is necessary that an agent learns not only to modify and augment its knowledge, but also to do it without designer intervention, by relying on several algorithmic approaches such as reasoning (both stochastic and logical), planning, learning, communicating, perceiving and acting (Russell and Norvig 2009). More recent operational definitions of machine intelligence include functional aspects such as reasoning, planning, learning, communication and perception along with complex, pre-defined goal achievement subject to adaptation (Samoili et al. 2020).
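The reward-based view attributed to Legg and Hutter above has a compact formal statement in their work on universal intelligence. The formula is given here only as an illustration of that view; it is drawn from their formalization rather than from our analysis, and its symbols play no role in our typology.

```latex
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V_{\mu}^{\pi}
```

Here $\pi$ denotes the agent, $E$ a set of computable environments, $K(\mu)$ the Kolmogorov complexity of environment $\mu$, and $V_{\mu}^{\pi}$ the expected total reward that $\pi$ obtains in $\mu$. The weight $2^{-K(\mu)}$ gives simpler environments more influence, and summing over all of $E$ is what expresses goal achievement “in a wide variety of environments”.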

Viewed this way, for an agent, adaptation (adjusting the goal and/or the means to achieve it as the environment changes) and human-like sensory/cognitive capabilities could both count as signs of intelligence, whereas autonomy (doing so without user or designer intervention) kick-starts adaptation and is in turn broadened by intelligence (Gunderson and Gunderson 2004; Hrabia et al. 2015; Abbass et al. 2018). It is the provided goal, the sensing, acting and processing capabilities of the agent, as well as the environment that determine the range of behaviors an intelligent agent exhibits (for instance speech-based communication, visual perception etc.). Hence, qualifying two functionally and/or algorithmically dissimilar agents as ‘intelligent’ may allude to different capabilities and/or underlying behavior.

4.2 Typology of Intelligent Assistants (RQ2.2)

Intelligent assistants can be classified and categorized on the basis of three meta-characteristics: the assistant, the outcome(s), and the environment. The first summarizes the nature of the intelligent agent acting as the assistant. The second consists of the nature and type of goals achieved by the assistant, whereas the third captures the environment or the context (consisting of the domain and the entities with which the assistant interacts). Table 1 summarizes the dimensions and characteristics, along with their distribution in our dataset. We describe each of these dimensions in turn.

Table 1 Typology of intelligent assistants

4.2.1 Activities

From the user’s perspective, two major categories of activities can be identified as being carried out by assistants. Providing information and feedback (73%) during various stages of a user activity is the most commonly cited form of assistance, where the assistant gathers, analyses and presents information to fulfill a need, provides additional hints and suggestions, anticipates errors or evaluates the user’s current state of work. Executing routine tasks and services (27%) comprises taking action and carrying out well-defined parts of an activity, either under explicit instruction or out of own initiative by operating or interacting with other applications and objects.

4.2.2 Initiative

The three kinds of stimulus identified in Sect. 3.3 are analogous to the three invocation modes of intelligent assistants. The largest share of assistants have been autonomous (43%), taking initiative based on their assessment of the user’s needs with or without direct interaction. An alternative approach is delegated (37%) invocation, whereby assistants stand by until the user delegates tasks or queries to them, most commonly through a graphical interface or by voice. Lastly, mixed (20%) invocation techniques, also known as mixed-initiative assistants, utilize a stochastic model to actively anticipate user needs and collaborate with the user, so that the user can take over when desired, or vice versa.

4.2.3 Flexibility

Flexibility refers to the ability of the assistant to adapt its own behavior based on user preferences or context. For human assistants, learning the user’s preferences, improving their own competence and adapting to the situation are indicators of their own ability to assist. The majority of digital assistants (55%), on the other hand, have been static; their abilities do not change over time or with context. They are followed by adaptive (35%) assistants, which modify their own behavior to conform to the user’s context or preferences over time, and lastly, adaptable (10%) assistants, which provide an interface through which the user can fine-tune their features.

4.2.4 Input Combinations/Modalities

The input modality refers to the channel through which the assistant receives input, either from the user or from the surrounding environment. Not surprisingly, peripheral input (67%) (keyboard, mouse, touchscreen etc.) constitutes the main interface used by assistants for receiving information from the user, followed by language via text and/or speech (20%). A variety of input combinations have been used with external sensors consisting of cameras, depth sensors, wearable sensors and environmental sensors to detect the user’s gaze, posture, body signals etc. or the surrounding environment outside the application itself (peripheral input and environmental sensors (5%), language and environmental sensors (5%) and solely environment sensors (4%)).

4.2.5 Feedback Combinations/Modalities

Similar to the input modality, the output modality refers to the channel through which the assistant presents its feedback/response to the user. Not surprisingly, visual feedback (79%) has been the most widely used form of output modality in the past decades, followed by language + visual feedback (15%), which combines natural language output such as speech with a screen. A few assistants (3%) can direct their output to devices such as wearables to augment visual output with haptic feedback (visual + haptic feedback). Speech-only and haptic-only feedback appear in a minuscule percentage of the dataset (2% and 1% respectively).

4.2.6 Multimodality

Multimodality is defined as the ability of a system to combine more than one input or output, either in a sequential or concurrent manner (Nigay and Coutaz 1993). A total of 25% of the assistants in our dataset supported multimodal input or feedback. The rest relied on a combination of a single input and output modality.
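The definition above can be read as a simple classification rule. The following Python sketch is illustrative only: the function name `is_multimodal` and the modality labels are hypothetical choices made for this example and are not taken from the reviewed articles.

```python
def is_multimodal(input_modalities, output_modalities):
    """Return True if more than one input or more than one output modality is combined."""
    # Sets remove duplicates, so a repeated label counts only once.
    return len(set(input_modalities)) > 1 or len(set(output_modalities)) > 1


# Hypothetical examples (labels are illustrative, not drawn from the dataset).
print(is_multimodal(["peripheral"], ["visual"]))            # False
print(is_multimodal(["language", "sensors"], ["visual"]))   # True
print(is_multimodal(["peripheral"], ["visual", "haptic"]))  # True
```

Under this reading, an assistant with exactly one input and one output channel is treated as unimodal, which matches the remainder of the dataset described above.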

4.2.7 Embodiment

It is well known that humans tend to attribute human-like qualities to computers and media (Reeves and Nass 1998) based on a variety of factors (Epley et al. 2007). For instance, recent studies show that users attribute anthropomorphic qualities to speech-enabled interfaces (Cowan et al. 2017). Nonetheless, most articles in our dataset do not mention eliciting embodiment or anthropomorphism as an explicit design objective. Only a small fraction (7%) of our data set consists of assistants that have been designed with an ‘avatar’ or explicit identity to visually exhibit emotions and social cues.

4.2.8 Platform Diversity / Ubiquity

Platform-specific (64%) assistants running on a single system or application have been the norm in the past decades. Their strong point is specialization in regard to features and hardware/software compatibility, but not necessarily networking or ubiquity. In comparison, web-based (23%) assistants that provide their services through a web interface have been used to provide cross-platform compatibility. Propelled by miniaturization and advances in mobile technology, mobile devices (10%) enable assistants as applications on mobile platforms. A tiny percentage of our data set consists of assistants that can run on and connect to multiple devices (2%) with varying form factors by relying on specific application-programming interfaces (APIs) provided by commercially available assistant platforms. These examples are also the most recent in our dataset, showing that intelligent assistant technology has reached a level of maturation such that it can be used as a basis for evaluating other, more nuanced aspects of human-intelligent assistant interaction.

4.2.9 Learning Capability

Although human learning is assumed to be a lifelong process, machine learning is often subject to technical and infrastructural constraints. As a result, not all intelligent assistants learn during use. More than half (57%) of the examples in our data set consist of assistants that rely on pre-trained models (learned), which can be updated but do not modify themselves. Examples include expert systems and assistants utilizing case-based reasoning, natural language processing, speech synthesis and data classification. The second category consists of assistants that actively learn (43%) during use, either to adapt to the users’ preferences over time or to improve the quality of their algorithmic output.

4.2.10 Outcome(s) and Outcome Types

Viewed broadly, and juxtaposed against the outcomes achieved by human assistants, the type of outcome attained by intelligent assistants falls primarily under the augment category (81%), where the added value is an enhancement to a normal user attribute. We found several sub-categories in this respect. The first sub-category consists of improving productivity (53%), in which the assistant reduces the time spent on carrying out certain repetitive activities by leveraging computational power to do them faster but not necessarily qualitatively better. Doing so may also maximize users’ utility since the time saved could theoretically be used to solve more cognitively challenging problems, or maximize resource utilization under circumstances that place strict limits on user time and resources. The second sub-category consists of assistants that act as ‘task quality enhancers’ to improve work quality (24%). These assistants evaluate users’ work against established procedures and guidelines to detect errors and suggest improvements. Assistants have also been developed for pedagogic purposes to facilitate training and skill acquisition (13%) by acting as intermediaries between students and teachers, with the simultaneous aim of improving student performance and reducing teachers’ workload. In some cases, they automate certain pedagogical tasks such as assigning and evaluating homework, and in others, they qualitatively enhance teaching materials by providing multimedia tools and resources to teachers. Finally, a few assistants have been used to augment user experience (10%) in leisure activities by providing context-sensitive information, guiding the user, delivering hedonic experiences and increasing engagement to promote activities and skill acquisition. In the second category of outcome types, 19% of assistants could step in when user performance was below-average, i.e., to compensate for a drop in user performance.

4.2.11 Cooperation (Entities and Targets) (RQ2.3)

As we mentioned previously, the awareness of an assistant in a human workplace extends across various entities – the user, the tools that are used, the resources used, the clients that are serviced and the practices themselves. These entities constitute the configuration in which the assistant and the user cooperate. In the case of intelligent assistants, we looked for a similar configuration of digital entities that have been modeled by designers and researchers.

In the most general sense, the assistant is packaged within a digital workspace, consisting of the application interface or tools used to create and modify digital objects that are the end product of user effort, as shown in Fig. 3. The assistant, just like the user, has access to both these entities, along with digital resources such as databases, knowledge bases, files and computing components inherent to the digital workspace. In case user activity is carried out using physical tools and on physical objects, the assistant can be designed to communicate with and act on them. Work practices are modeled algorithmically as logical rules or process flows.

Conventionally, intelligent assistants act upon one or more common entities. In the majority of cases (64%), this entity is the application interface, through which the assistant interacts with the user by taking queries, providing answers or suggesting improvements. In other scenarios (22%), the assistant is aware of and directs its effort at the objects of the user activity (such as files, diagrams, plans etc. or physical objects such as the workpiece), modifying them in some way. Finally, assistants acting on both the application interface and the objects make up the remaining portion of the dataset (14%), exemplifying varying levels of awareness and sophistication.

Fig. 3 Contextual model in the case of intelligent assistants

4.2.12 Application Domains

Whereas contemporary intelligent assistants are most popular in the personal domain, our literature search shows that this is only a recent development. Viewed historically, numerous assistants have been introduced in the professional/work domain (43%), that is, for users at the workplace. This includes diverse sectors such as design, development and verification (software, mechanical and architectural), medical services (diagnosis, analysis and documentation), legal services (argumentation etc.), and aerospace (flight control etc.). The assistants here are mostly viewed as ‘helpers’ for experts. Private (43%) use characterizes the most popular and well known form of assistants, and it encompasses a variety of tasks that all users may perform on their personal devices regardless of their level of expertise, ranging from email filtering, managing appointments, internet use and information query, to giving product recommendations and managing home automation. Training and education (14%) comprises the third-largest category of intelligent assistants situated between novices and experts, targeted at the education and training of both students and adults in various professions.

5 Analysis of Trends in the Design and Evaluation of Intelligent Assistants (RQ3)

One of the main tests of a useful typology is its explanatory nature – it should provide a structure for understanding past trends, drawing inferences about the future, and postulating relationships and hypotheses between characteristics (Nickerson et al. 2013). In line with this reasoning, we present some of our observations.

5.1 Trends in Intelligent Assistant Design (RQ3.1)

Based on the relative number of objects in each category, some trends are directly visible, which we discuss below.

Augmentation is favored over compensation. We noticed that more than 80% of intelligent assistants are designed to augment rather than compensate for the user. In our view, this disparity most likely mirrors a dilemma of human-technology interaction itself. Augmentation could be viewed as a use case where the assistant and the user actively work through a predictable, well-defined process. The assistant in this case may be able to suggest recurring steps, identify mistakes, and execute certain branches either automatically or upon delegation. Compensating for a user’s loss of performance requires that the assistant should ideally possess the ability to take over from the user at any time, implying that the assistant should be fully capable of carrying out particular tasks on its own. Further, a mistake made by the assistant in compensation mode would be more damaging to the overall situation than in augmentation mode because the user may no longer be able to intervene to override the assistant and salvage the situation.

Supplying information forms the core activity of an assistant. Assistants that provide information to the user in the form of suggestions, evaluations or upon request have continuously received attention from researchers and developers (Fig. 4). This is also reflected in studies about the usage patterns of speech assistants (Whitenton and Budiu 2018).

Fig. 4 Chronological distribution of activity types

Speech-based assistants are becoming popular. As Fig. 5 and Fig. 6 show, assistants that support natural language input and output have gained popularity over the past few years, while peripheral input has declined over the same period. In contrast, assistants that rely on a combination of external sensors (both non-wearable and wearable) in the user’s environment have not been popular to the same degree.

Fig. 5 Input modalities in assistant design over the years

Fig. 6 Output modalities in assistant design over the years

5.2 Relationships between Assistant Characteristics

In this section we provide some insights into relationships among some variables of our typology. We do not claim that there is a causal relationship between these variables, but these tendencies could point towards specific design patterns and trends which may warrant a more objective research effort.
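Relationships of this kind can be examined by cross-tabulating two coded characteristics and comparing the shares within each category. The following Python sketch shows the general idea under stated assumptions: the records are invented for illustration and are not the data of this review, and the helper names `cross_tabulate` and `row_shares` are hypothetical. As noted above, such co-occurrence counts describe tendencies and do not establish causality.

```python
from collections import Counter

# Hypothetical coded records: one (activity, outcome) pair per assistant.
# These values are invented for illustration and are NOT the review's data.
records = [
    ("information", "work quality"),
    ("information", "work quality"),
    ("information", "productivity"),
    ("execution", "productivity"),
    ("execution", "productivity"),
]


def cross_tabulate(pairs):
    """Count how often each (row, column) combination occurs."""
    return Counter(pairs)


def row_shares(table):
    """Convert counts into shares within each row category."""
    row_totals = Counter()
    for (row, _col), count in table.items():
        row_totals[row] += count
    return {(row, col): count / row_totals[row] for (row, col), count in table.items()}


table = cross_tabulate(records)
shares = row_shares(table)
for (activity, outcome), share in sorted(shares.items()):
    print(f"{activity:12s} {outcome:14s} {share:.0%}")
```

Normalizing within rows makes categories of different sizes comparable, which is one plausible way to read the relative distributions discussed in this section.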

Assistant activities vs. outcomes. Fig. 7 shows how the activity performed by the assistant contributes to the outcomes achieved by the user. Whereas some outcomes are more dependent on information provision, others also require executing tasks. Therefore, the underlying nature of the activity influences how the assistant can help the user in any given case. For instance, the quality of work is improved by giving the user inputs on how their work could be improved, whereas improving productivity requires that a part of task execution may be taken over by the assistant.

Fig. 7 Assistance outcomes in relation to the activities performed

Domain vs. outcomes. Fig. 8 presents the relative distribution of assistants’ goals in the domain for which they have been developed. Assistants are primarily developed to save time and costs both at work and in private use, with the goal of quality improvement emphasized at the workplace. On the contrary, saving time and costs does not feature as a goal in promoting learning, which accounts for a difference between pedagogic and work domains. Finally, the majority of assistants in the private domain are developed to simplify information search, which relates to the fact that most assistance in the private domain is designed for non-expert users in contrast to experts at the workplace.

Fig. 8 Distribution of assistance outcomes in respective domains

Outcomes vs. invocation methods. The nature of the goals that assistants strive to attain influences the approach taken in their interaction with the user. Fig. 9 cross-references these two characteristics. It is apparent that information search is delegated to the assistant, whereas learning and teaching assistants tend to be highly autonomous. For other goals there is no particular trend – these are more likely based on the specifics of the task the user needs assistance with. This may imply that designers need to take the outcomes into account when deciding on the level of autonomy of an assistant.

Fig. 9 Distribution of outcomes with respect to invocation modes

Domain vs. outcome types. The domain and the activity itself also play a decisive role in determining the compensatory mechanism. As an example, missing a turn while driving triggers most navigational assistants to re-plan the route, since there is no back-tracking needed to understand why the user missed the turn. A driving assistant, on the other hand, may have to explain to the user why it missed the turn. Interestingly, both offset and augmentation modes are supported by assistants in the learning/training domain (Fig. 10), comprising 13% of our dataset. These domains are characterized by assistants that compare student performance with sandboxed, expert models of learning, making it possible to identify and correct problems, offer solutions, and gradually remove learning crutches as the learner demonstrates competence.

Fig. 10 Distribution of domains with respect to outcome types

Activities vs. targets. The activity of the assistant somewhat influences the target which determines the assistant’s actions. As Fig. 11 illustrates, information is usually provided at the level of the application (in some cases on the object). Task execution is primarily targeted at creating and changing objects and, only in rare cases, at actual manipulation of the principal’s tools (i.e., the application interface). Learning assistance is offered at both application and object level, in that the user learns about the content and tools/techniques.

Fig. 11 Distribution of assistant targets across activities

Further, mobile assistants are mostly used for providing information, whereas platform-specific assistants and web-based assistants are developed for multiple activities (Fig. 12). However, as the line between device form factors is becoming increasingly blurred, this observation may not hold in the future.

Fig. 12 Distribution of platforms across assistant activities

5.2.1 Opportunities for Exploration

Housekeeping deserves more attention. In Sect. 3.2, housekeeping tasks were identified as background activities carried out by human assistants to ensure that the resources (information, materials etc.), the workspace, and devices required for everyday activities are present and operational both before and after the activity. Contrary to human assistants’ work, we did not find any examples of intelligent assistants in our dataset carrying out housekeeping tasks. In our view this area could benefit from more attention, especially in an age where users regularly engage with an increasing number of digital devices, tasks and digital resources spread among several devices, applications and locations.

Cross-device and multi-device functionality remains relatively unexplored. It is understandable that the majority of assistants have historically been confined to a single device, given that mobile computing took off with the introduction of smartphones in the late 2000s. The trend is reflected by a chronological analysis of the ubiquity dimension which shows a gradual increase in the number of intelligent assistants offered as mobile applications (Fig. 13). At the same time, we see only a few cross-device assistants that are capable of seamlessly running on devices with different form factors and capabilities (for example, combining the desktop with wearable, augmented, mixed or virtual reality (AR/VR/MR) devices). In our view, as the number of offerings in the wearable and AR/VR/MR sector increases, new research opportunities could open up in this area.

Fig. 13 Platform diversity in assistant design

Enhancement of user experience remains relatively unexplored. An overwhelming majority of assistants are what we would label as ‘serious assistants’ meant for work instead of fun (Fig. 14). We hypothesize that this corresponds to the tendency of utility-driven design that views technological products as providers of functional features and benefits, while ignoring the experiential aspects (Hassenzahl et al. 2010). Consequently, pragmatic attributes dominate and hedonic or eudaimonic possibilities have been largely neglected so far. This could represent an opportunity for research in the future.

Fig. 14 Outcomes for which assistants have been designed

5.3 Design Themes in Intelligent Assistant Research

Every designed artifact is instantiated to meet specific goals, and in the case of assistants this is no different. Table 2 lists the normative design goals that informed the development of intelligent assistants from both design and use perspectives, ranked by their frequency of occurrence. Some of these goals focus on solving specific problems resulting from the design of contemporary assistants, while others focus on improving the usefulness and sociability of the assistant. Accordingly, we discuss each in turn.

Adaptability. The idea that the assistant should, over time, adjust itself to the user and provide context-aware services is highly influential both at a functional and interface level. Our typology already captures these under the flexibility and learning dimensions. Adaptability allows intelligent assistants to be autonomous and minimize their interaction with the user, and as an added benefit reduces user workload (Wittig and Griwodz 1995; Brancaleoni et al. 1997; Menczer et al. 2002; Myers et al. 2007). While human assistants get to know the preferences of their users over time, intelligent assistants achieve their adaptability through several specific mechanisms—surrounding environment, frequency of feature use, structural modularity etc.

Control and intelligibility. Researchers have laid emphasis on explicitness and transparency of the assistant’s behavior, for instance that the user knows what the assistant does, is able to predict its behavior and is aware of the means to stay in control (May and Vargas 1996; Myers et al. 2007).

Value-add. Related to the usefulness dimension of usability and technology acceptance, many researchers have highlighted that the assistant needs to provide a clear benefit to the user (De Roeck et al. 1998; Matthews et al. 2000; Franklin and Hammond 2001). This usefulness is conventionally measured as an improvement of the user’s productivity and efficiency.

Sociability. This dimension concerns the anthropomorphic nature of an intelligent assistant, with the aim of making the interaction with an assistant as natural and effortless as possible. An assistant should not only be visible but also engage with the user and be likeable, capable of demonstrating an awareness of social conventions in order to maximize the perceived anthropomorphic attributes. Some authors have also emphasized that assistants should comprehend user emotion and be capable of displaying empathy (dos Santos et al. 2002; Myers et al. 2007; Morana et al. 2020).

Architecture and Privacy. The design and maintenance of the software architecture of the assistant itself has been an important topic of discussion in recent years. The fact that most intelligent assistants for personal use are developed and marketed by software giants has made many researchers wary of a ‘virtual assistant monopoly’ where proprietary technologies may threaten the users’ choice and privacy. Several recent research efforts have been directed at developing open source assistants such as Almond (Lam et al. 2019) or Mycroft (Gesling 2019). Since many state-of-the-art assistants providing personalized services gather and analyze user data, in the past few years both the data gathering process and its interpretation have invited intense scrutiny. Many researchers have highlighted the risks of breaches that may inadvertently reveal personal information or expose intelligent assistants to adversarial remote control and network attacks. Designing for privacy has consequently received research attention (Jain et al. 2017; Lam et al. 2019).

Design techniques and frameworks. The idea that intelligent assistants should be autonomous agents has driven the design of several modern assistants. Autonomous agents are expected to work independently of user interference or need for oversight and control, which in turn, saves the user time and effort. Many assistants, for example, are based on the beliefs-desires-intentions (BDI) or multi-agent frameworks (Menczer et al. 2002; Todorov et al. 2016). A more recent development has been gamification, where the assistant is developed using game-like features to motivate or nudge the users to change their behavior and achieve goals (Magaña and Muñoz-Organero 2016). Research has also drawn attention to the presence of embedded socio-cultural stereotypes in the intelligent assistant design process. For example, the personality of a majority of anthropomorphic assistants is modeled after Caucasian females, and one way of ‘designing out’ these ‘stereotypes, judgments and biases of the creators or their culture’, may be to involve users in the design process (Spencer et al. 2018).

Table 2 Design objectives, ranked in decreasing order of frequency
Table 3 Attribute groups for evaluating assistance

5.4 Evaluated Characteristics in Intelligent Assistant Research (RQ3.2)

Research goals can be measured via evaluation against specific, well-defined variables. Table 3 shows the frequency distribution of the evaluated attributes of assistants. The system performance of the assistant (e.g., predictive or learning accuracy, precision, technical performance) is evaluated most frequently, which is hardly surprising when taking into consideration that most assistants are presented as use-cases for the underlying technological achievement. The user performance is the second most frequent metric, which indicates the value-add of the assistant for the user, measured as the change in user performance/effort or progress with vs without assistance. Further, user feedback about the assistant (in the form of subjective evaluation and user satisfaction) is the third most commonly used metric, in most cases applied to reveal aspects of user opinion not visible in performance-based studies. Usability and ease-of-use are mentioned only in a small minority of papers, followed by technology acceptance.
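As an illustration of the user performance metric, the change with vs without assistance can be expressed as a relative difference to an unassisted baseline. The Python sketch below is one possible formulation, not a measure prescribed by the reviewed studies: the function name `relative_change` and the numbers are hypothetical.

```python
def relative_change(without_assistance, with_assistance):
    """Relative change of a performance measure, with vs without assistance."""
    if without_assistance == 0:
        raise ValueError("baseline must be non-zero")
    return (with_assistance - without_assistance) / without_assistance


# Hypothetical task completion times in seconds (invented numbers).
baseline_time = 120.0
assisted_time = 90.0
change = relative_change(baseline_time, assisted_time)
print(f"{change:+.0%}")  # prints -25%: a quarter less time with assistance
```

Whether a negative value counts as an improvement depends on the measure: it is desirable for completion time or error counts, but not for measures such as task accuracy.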

5.5 Summary

Table 2 illustrates that the assistant is envisioned as an adaptive, human-like companion to the user. Simultaneously, these capabilities are matched with a desire to control it, understand it and to prevent its incursion into personal life. One cited reason is that providing assistance requires remodeling tasks for algorithmic purposes, which, when derived from a set of principles or scientific theory, ends up ignoring situated work practices and alienating users (May and Vargas 1996). Interacting with an intelligent assistant should thus feel similar to working with a human assistant (Myers et al. 2007). Another cited reason is that the benefits of automation ought to be weighed against its effects on (perceived) predictability and intelligibility (Horvitz 1999). This is evidenced by the fact that the opaqueness of an assistant’s intelligence lowers the user’s trust or confidence in it (Faulring et al. 2010).

We have also observed that the case for assistance is made primarily on the basis of performance factors at work. Intelligent assistants are seen as serious ‘performance enhancers’ or ‘time savers’ in the context of human work, affirming the automation mindset which views technology as a tool for accomplishing tasks. We found varied explanations as to why and how the assistants were capable of improving user performance. In some cases, the improvement comes from the users changing their mindset when working with an assistant (Gustafson et al. 1998), whereas in others the interaction with an assistant moderates performance by either reducing user workload due to a significantly faster completion of sub-tasks (Yang et al. 1994; Babaian et al. 2002), or by shifting the modality to a more efficient form of communication such as speech (Fast et al. 2018). Other than that, most approaches evaluating assistants are used to confirm design aspirations either as direct proofs of the usefulness and efficacy of an assistant, or as first-hand accounts of users’ own evaluation of the assistant. In both cases, the user experience is rarely in focus, contrary to the long-standing view that introduction of technology changes both the tasks and the users in a mutually influencing loop (Norman 1992). Once we take this loop into account, we may have to modify our approach towards measuring and understanding the relationship between users and intelligent assistants (Steinfeld et al. 2007; Berry et al. 2011).

6 Article Summary and Potential Avenues for Interdisciplinary Research

In order to design intelligent assistants, recent discourse suggests the adoption of a socio-technical perspective by focusing on the different conceptual dimensions of cooperation between users and intelligent assistants (Maedche et al. 2019). In this section, we recap our analysis and explicate how the characteristics of a ‘good assistant’ as understood in the context of human work differ from those defined by researchers in the domain of intelligent assistants. The insights gained here raise specific interdisciplinary research questions which may be of interest to all stakeholders.

Human assistants provide supplementary aid to principals in the form of work or resources with respect to a specific goal to bring about a positive change. Goals establish a common understanding of what is to be achieved (for example, learning is an inherently different problem than performing a medical intervention or executing routine tasks). While the specifics of goals (how, when, where) are rooted in established practices and work conventions, good assistants continuously adapt to their principals’ situation by not only augmenting them but also compensating for them so that the outcome is always relative to a ‘set-point’, or the ‘usual case’ determined by the principal-assistant relationship, as shown in Fig. 2. They also regularly undertake housekeeping tasks to maintain a smooth working environment, and strive to improve their own competence. Principals provide feedback and guidance and supervise assistants, so that both develop a common mental schema to communicate with each other. This is acquired through training and experience when using the common tools to work on common objects (Fig. 1).

In the digital domain, as Fig. 3 shows, an intelligent assistant is an agent, but also a meta-application that is distinct from the use of other applications and yet relies on them. Depending on the activity and problem, the assistant can either provide information/feedback or execute tasks by interacting with multiple entities in the digital workspace which comprises the application interface, digital resources and objects. Owing to rapid digitalization, assistants are also beginning to transcend their digital workspace by interfacing with and augmenting physical entities such as tools and objects. Nonetheless, the most common mode of assisting is to provide information autonomously on a single device, thereby relying on visual modality and using the application interface to improve productivity in information related tasks, in our view a classic case of information systems use. In the works we analyzed (we deliberately did not include works in the medical domain for people with impairments, since we were interested in more general, non-medical assistant systems), augmenting the user by relying on existing user capability is the conventional goal, whereas compensation is rarely considered. Each domain places different demands on the outcomes to be achieved by assistants, which in turn influence the cooperation modes as well as the modalities that make up the assistant.

Comparing the two kinds of assistants, it is apparent that, first, in working with a human assistant, the principal’s role demands experience and overarching domain knowledge. Intelligent assistants, on the other hand, are seen as helpers that either cover up an unintelligible interface (Morana et al. 2020) or carry out tasks that the user delegates. Unless human users already have sufficient domain knowledge, they cannot delegate tasks to assistants since they cannot communicate what they do not know (Yankelovich 1996) and will have to be guided by assistants. Unless users build this domain knowledge, at the other extreme, over-relying on assistants may result in a ‘dumbing-down’ such that a failure or removal of the assistance significantly impairs the user’s abilities (Danaher 2018; Hernández-Orallo and Vold 2019). Hence, intelligent assistants may have to be designed to not only assist, but also help users maintain and even further their own procedural and factual knowledge. Users may still need access to the application interface and digital objects to maintain their own competence in using a digital technology or system, to co-evolve with technology, and to assess the ability of the assistant.

For researchers:

How do we investigate the interplay between the users’ own knowledge and mental schema about a task and its representation in the digital domain? What are the consequences of a mismatch, both for the user’s own self-evaluation and their opinion of the assistant? How can a shared understanding be reached?

For designers:

How can this fact be incorporated when choosing suitable design characteristics regarding the invocation mode, modality, assistance target as well as the outcomes of assistant interaction? Should assistants also incorporate learning materials for the user, and are there existing mechanisms or design patterns which could be of use?

For decision-makers:

Are assistants suitable for problem domains where employees lack fundamental knowledge? If not, would it be necessary to (re)train employees before they can exploit the capabilities of assistants?

Second, when working with a human assistant, the principal carries the responsibility for decisions and actions of the assistant, which means the relationship is built on trust and an accurate judgment of the assistant’s competencies. It is not so clear if the same would be the case when interacting with intelligent assistants. In many cases, reliance on an assistant may be perceived negatively (if something is easy to do yourself, why would you risk delegating it to an assistant only to see it fail at the task?). Even if users do manage to successfully delegate tasks, would they be willing to attribute their success to intelligent assistants? Furthermore, the additional knowledge needed to design intelligent assistants, which usually comes from domain experts, creates a morally ambiguous situation for the user both in terms of the ownership of successes (to whom is success attributed?) and the accountability for problems (who is responsible in these situations?). Could these dilemmas diminish the user’s own sense of competence and autonomy in the long term? These questions may be crucial to the acceptance of assistants.

For researchers:

What effect do failed interactions and mistakes made by the assistant have on users’ self-competence and their assessment of trustworthiness of the assistant? And conversely, how does the attribution of success to an assistant (or its creators) affect users’ self-competence and propensity to trust the assistant?

For designers:

How can the design handle failed and/or successful interactions sensitively, keeping in mind that the user’s self-competence and self-esteem may depend on these encounters? Which interactive strategies could be used to improve users’ competence over time?

For decision-makers:

Employee motivation is an important driver of performance. How are success and failure scenarios with assistants to be interpreted in this respect? Given how closely the role of an assistant may be related to the business process itself, should organizations invest in building in-house competence for developing assistance applications?

Third, and in relation to the point above, human assistants also work proactively in the background as housekeepers, even before the principal’s involvement, and after the task has been completed. Doing so requires initiative and access to the common environment. Intelligent assistants, on the other hand, have been mostly designed to actively assist the user; hence, housekeeping may be an interesting avenue to explore in the design of assistants for activities which users enjoy doing on their own.

For researchers:

Which user activities are in need of housekeeping, and what is the role of housekeeping from a psychological perspective? Can the metaphor of housekeeping be transferred to the digitalized life, e.g., as ‘digital housekeeping’? A task analysis could be fruitful in this regard.

For designers:

How can housekeeping be designed in assistants, at a generic and context-specific level, given that the user activity consists of the user, their tools and resources?

For decision-makers:

Employees spend a significant amount of time looking for the right information and tools to begin and execute a task. Could assistants that support housekeeping be a viable option from a business perspective?

Fourth, although human intelligence is fundamental to most tasks, its intentional character is not limited to utility maximization. Human beings have a natural tendency to understand their world, to develop and to actualize their potentials, and to fulfill basic needs of competency, autonomy and relatedness (Ryan and Deci 2002). Feedback, guidance and supervision are important for human assistants to enable them to continuously learn and adapt to their work environment. Machine intelligence, on the other hand, has conventionally been designed with specific goals in mind as an attempt to replicate human capability or behavior, such as adaptability or utility maximization, with designers and developers wielding considerable control over the skills, capabilities and ‘black-box’ of intelligent systems. It may be possible, however, that machine intelligence in the near future becomes more human-like and contends for human tasks. Despite this, a more nuanced view could focus on maximizing the well-being of the user by more deliberately and systematically incorporating both hedonic and eudaimonic aspects of human experience (Mekler and Hornbæk 2016) into the design of intelligent assistants. As we observed, only a small minority of assistants have been designed to enhance user experience and well-being.

User experience considers the product as a tool for “manipulating the environment” to achieve pragmatic goals or as the provider of stimulation or identification (Hassenzahl 2005; Hassenzahl et al. 2010). An assistant, while in some situations doing the same, in many cases simultaneously influences the user and the task. Whereas hedonic goals concerning pleasure and enjoyment are useful in the short term, a balance between comfort and personal development may be a more suitable approach for the long term, requiring the user to accept challenges and hence leave the personal ‘comfort zone’.

For researchers:

Does prior motivation influence users’ expectations towards assistants? Do hedonic and eudaimonic aspects of human-assistant interaction vary in relation to the task complexity, assistant autonomy and user competence? Can users be nudged to leave their comfort zone? Should the development of assistants address barriers, refusal and fears, or give importance to users’ well-being and motivation (Pawlowski et al. 2015; Gutsche and Griffith 2017)?

For designers:

There are several combinations of when, where and how assistants can act, or which intelligent capabilities they may possess. Is it possible to select a set of suitable candidate properties that are more desirable than others? Which features could be stable, and which would need intelligence or continuous adaptation? Could strategies from other domains such as positive psychology serve as blueprints?

For decision-makers:

Eudaimonic well-being has been demonstrated to positively influence business performance and employee engagement (der Kinderen and Khapova 2021). How could business processes be (re)designed to support an active integration of assistants in the entire process, especially for supporting eudaimonic well-being?

Finally, culture, norms and conventions permeate the work environment of human assistants. It is well known that the use of technology is determined by cultural norms and values (Srite and Karahanna 2006), and well-being research also points to variations in correlates of well-being between cultures (Diener et al. 2003), which leads us to the hypothesis that the desirability of, and interaction with, intelligent assistants will be influenced by the user’s cultural background. For instance, the holographic Gatebox assistant (Arauner 2017), which embodies the inherently Japanese conception of an intelligent companion in the form of an anime character/avatar, is considerably different from the voice assistants developed in the Western world. Cross-cultural research is needed to fill this gap between user expectations and design strategies.

For researchers:

Do expectations towards assistants and preferred modes of interaction vary across cultures?

For designers:

How could design processes incorporate cultural differences in designing assistants, such that it is easier to localize them? Could these localization features also extend to assistant characteristics? If yes, how?

For decision-makers:

Multinational firms may have to take a cautious approach when it comes to deploying assistants – one size may not fit all, and assistants may have to be adjusted in order to conform with cultural norms and values. On the other hand, in an era of increasing globalization, assistants developed by small-to-medium sized enterprises may also have to accommodate the expectations of employees from different cultural backgrounds.

In the preceding paragraphs, we have outlined some possible avenues for exploration in research and design of intelligent assistants. Descriptive knowledge in the form of a typology or taxonomy such as the one presented in this paper does not prescribe the configuration of dimensions and characteristics that is best suited to achieve a particular goal. The space between descriptive knowledge and design goals can possibly be traversed by a kind of prescriptive knowledge in the form of design principles, patterns, guidelines and/or tools, so that designers can work their way backwards by setting goals, winding back to the components that play a role in achieving these goals, instantiating artifacts and evaluating them. Doing so may involve the use of classic design science frameworks such as the ones proposed by Gregor and Hevner (2013) or Peffers et al. (2007). To make this task easier, we have identified and presented the conceptual and behavioral components of assistants in this article (Table 1). Fig. 3 provides a way to visualize the design space constituting the entities and modes of cooperation between the user and the assistant in order to apply best practices. The typology and contextual model could serve as a starting point for constructing a more comprehensive design space that matches user expectations with design features (Lowry et al. 2015). It is our hope that engineering-oriented design research as well as behavioral and theory-oriented research into assistants can be enriched by establishing empirical links between these dimensions.

7 Conclusion

Assistants are gaining capabilities and making inroads into diverse walks of life, and could, in the future, play a major role in users’ private and work lives. However, investigating this role and designing assistants that best meet specific user goals is a task of considerable complexity, one that requires an understanding of the relationships between the technological dimensions and/or characteristics of assistants as well as their effect on users and their tasks. In this article, we take an interdisciplinary approach to derive descriptive knowledge including the conceptual dimensions of an assistant and the context of assistant use. We begin with an investigation of the work of human assistants, which reveals that the nature of assisting is cooperation between the principal, the client, and the assistant, mediated via the use of common tools, objects and resources in a joint work environment consisting of established norms. The role of a human assistant is to either augment or compensate for the principal’s attributes.

The design of intelligent assistants borrows some of these properties and adds specific technology-driven capabilities, whereby intelligence is assumed to designate an emergent combination of adaptability, human-like behavior, and autonomy. However, comparing the work of ‘good’ human assistants with the functionalities of their digital/intelligent counterparts reveals several gaps which may provide opportunities for designing novel assistants in the future. In the past, utilitarian outcomes have driven the design of intelligent assistants, but as their technology and capabilities mature, incorporating the hedonic and eudaimonic aspects of human-assistant interaction may be crucial. A more in-depth, multidisciplinary investigation of user expectations, theories, patterns and guidelines could advance the field. We sincerely hope that our effort lends the conceptual clarity necessary for developing more advanced assistants, and moreover provides a frame of reference for future design and evaluation of intelligent assistants.