Article

An Automated Hierarchy Method to Improve History Record Accessibility in Text-to-Image Generative AI

1 Department of IT Convergence, Dong-Eui University, Busan 47340, Republic of Korea
2 Department of ICT Industrial Engineering, Dong-Eui University, Busan 47340, Republic of Korea
3 Department of Computer Engineering, Dong-Eui University, Busan 47340, Republic of Korea
* Author to whom correspondence should be addressed.
Submission received: 30 October 2024 / Revised: 31 December 2024 / Accepted: 15 January 2025 / Published: 23 January 2025

Abstract

This study aims to enhance access to historical records by improving the efficiency of record retrieval in generative AI, which is increasingly utilized across various fields for generating visual content and gaining inspiration due to its ease of use. Currently, most generative AIs, such as Dall-E and Midjourney, employ conversational user interfaces (CUIs) for content creation and record retrieval. While CUIs facilitate natural interactions between complex AI models and users by making the creation process straightforward, they have limitations when it comes to navigating past records. Specifically, CUIs require numerous interactions, and users must sift through unnecessary information to find desired records, a challenge that intensifies as the volume of information grows. To address these limitations, we propose an automatic hierarchy method. This method, considering the modality characteristics of text-to-image applications, is implemented with two approaches: vision-based (output images) and prompt-based (input text) approaches. To validate the effectiveness of the automatic hierarchy method and assess the impact of these two approaches on users, we conducted a user study with 12 participants. The results indicated that the automatic hierarchy method enables more efficient record retrieval than traditional CUIs, and user preferences between the two approaches varied depending on their work patterns. This study contributes to overcoming the limitations of linear record retrieval in existing CUI systems through the development of an automatic hierarchy method. It also enhances record retrieval accessibility, which is essential for generative AI to function as an effective tool, and suggests future directions for research in this area.

1. Introduction

Recently, high-performing generative AIs (e.g., Dall-E, Midjourney) have been widely used for visualizing abstract ideas, obtaining inspiration, and visually expressing one’s thoughts due to their ease of use [1,2,3,4]. These tools are used not only by general users but also by domain experts in fields such as healthcare [5,6] and architecture [7,8], where generative AI reduces task completion time and minimizes labor-intensive repetition. Additionally, generative AI has been used in creative fields such as art [9] and design [10,11,12], where it serves as a tool to overcome design fixation and generate inspiration by providing concept images. The widespread adoption of generative AI across various fields can be attributed not only to its strong performance but also to its user-friendly conversational user interface (CUI), which has made these tools accessible and efficient [13,14,15]. A CUI enables users to interact with generative AI much as they would with another person, for example through turn-taking in sequential dialogue [16,17,18,19]. This makes generative AI highly accessible and intuitive to use.
However, despite these advantages, CUIs have the drawback of inefficient linear navigation when searching previous dialogue history. In contrast to graphical user interfaces, where users can quickly locate information via menus or search functions, CUIs require users to scroll through previous dialogues sequentially, which becomes increasingly cumbersome as the amount of dialogue (information) grows [20]. Although some studies have examined the limitations of linear navigation in ChatGPT [20,21], research addressing this issue in multi-modal AI CUIs, such as text-to-image models like Dall-E and Midjourney, is lacking. We believe that this low accessibility to past records will hinder the adoption of generative AI as a functional tool, despite its strengths in ease of use and versatility.
This study proposes a clustering-based automatic hierarchy method to enhance access to records in multi-modal AI CUIs. We developed automatic hierarchy methods based on generation intent, designing two clustering-based approaches: one using result images and the other using input prompts. To validate our proposed hierarchy method, we conducted a user study with 12 participants to examine whether these methods improved navigation efficiency and to determine which of the two approaches provided greater user benefits. Based on these findings, we discuss ways to enhance accessibility to records, which is essential for generative AI to function as an effective authoring tool.
This study makes two key contributions. First, we developed an automatic hierarchy method that overcomes the limitations of linear navigation through previously generated results in multi-modal CUIs. Second, we contribute to solving the challenge of enhancing accessibility to past records, which is necessary for generative AI to establish itself as an efficient tool, and we offer directions for future research.

2. Related Works

2.1. The Potential of Generative AI as an Efficient Tool

As previously mentioned, generative AIs like Dall-E and Midjourney are becoming efficient tools in various fields due to their ease of use, with research exploring further applications. For example, in the medical field, studies have generated anatomical images or images related to cardiovascular disease to serve as visual aids for medical education and scientific publications [5]. In architecture, cases have been reported where generative AI is used to create initial designs and simulation materials before construction [7,8]. Additionally, generative AI serves as a creative support tool for individuals in creative professions (e.g., artists, designers) by providing visual inspiration or initial concept images [10,11,12,22]. Beyond specific fields, it is also utilized as an efficient tool for everyday tasks such as creating simple posters or generating images for interpersonal communication.
However, most existing studies focus primarily on aspects like generation speed and accuracy, without addressing issues such as improving access to past records to enhance AI’s usability as an efficient tool. Therefore, this study developed an automatic hierarchy method to improve accessibility to historical records in generative AI and validated its effectiveness.

2.2. Conversational User Interface

A conversational user interface (CUI) allows users to interact with systems through natural language, whether via text or voice, offering a more intuitive alternative to traditional graphical user interfaces (GUIs). Unlike GUIs, which rely on visual elements such as buttons, icons, and menus, CUIs enable users to perform tasks by engaging in conversations, making them accessible to a broader audience [23]. This interaction paradigm has become increasingly popular as a next-generation UI due to its versatility and ability to integrate seamlessly into complex systems.
CUIs are particularly effective in enhancing accessibility, streamlining workflows, and enabling natural interactions in AI-driven systems like ChatGPT and DALL-E. These interfaces rely heavily on advancements in natural language processing (NLP), allowing users to bypass the rigid structures of traditional interfaces. As a result, many AI applications are adopting CUI formats to improve usability and user satisfaction [24].
Chin et al. [20] proposed a feature and user interface (UI) to facilitate conversation exploration in a conversational user interface (CUI) for text-to-text generative AI by creating a hierarchical structure for conversation history. Their proposal was based on the hypothesis that embedding the text data within a conversation and measuring similarity could enable such a feature. However, their study did not include an actual implementation, focusing instead on identifying the limitations of re-exploring conversation history from a UX perspective, which is a key limitation of their research.
In this study, we address these gaps by implementing the automatic hierarchy method and examining its impact on users while validating the feasibility of the technology. Furthermore, we move beyond the single-modality approach of text-to-text generative AI and focus on text-to-image generative AI, such as DALL-E and Midjourney, which has demonstrated significant potential as an efficient tool through its rapid generation of visual content. Specifically, we propose an automatic hierarchy method to overcome the limitations of linear exploration that arise when users re-explore the results of text-to-image generative AI through a CUI.
Additionally, we suggest two approaches for automatic hierarchy in the context of text-to-image generative AI: one based on embedding input prompts and the other based on embedding the resulting images within the record units. We further investigate how these two approaches impact users and explore the implications of the automatic hierarchy method on user experience.

3. Methods

To begin with, the problem addressed in this study must be clearly defined. We aim to improve the linear exploration process that occurs when using a conversational user interface (CUI) to explore previously generated image records, rather than the process of inputting text and receiving generated images iteratively in an image generation scenario. As mentioned earlier in the Introduction, linear exploration requires users to scroll through the records one by one to locate the desired past generation record. This process demands a significant number of interactions and a considerable amount of time. Therefore, we propose a method to automatically organize generation records hierarchically, enabling users to intuitively explore past records and reducing the effort and time required to locate desired records. Additionally, we developed a user interface (UI) by applying this method and validated its effectiveness through usability evaluations.
The operational workflow of the proposed automatic hierarchy method consists of five steps, and the overall process is illustrated in Figure 1. The following is an explanation of the workflow: First, the user’s image generation process is recorded in units of input prompts and the resulting images. This corresponds to Figure 1A. Next, features are extracted from these units, and a preprocessing step is conducted for hierarchy construction. At this stage, we devised two methods for extracting features from these units: a vision-based approach using the resulting images and a prompt-based approach using input prompts. This is shown in Figure 1B. In the third step, the preprocessed units of entire records are treated as individual points, and clustering is performed to organize them hierarchically. This step, independent of the preprocessing method, follows the same clustering process and corresponds to Figure 1C. Subsequently, to enhance the interpretability of the hierarchy and support intuitive exploration for users, representative keywords for each hierarchy level are extracted. This step is depicted in Figure 1D. Finally, the results of the hierarchical organization of generation records are provided to users through a UI that applies the automatic hierarchy method. This corresponds to Figure 1E. The design and implementation details of each step are discussed below.
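To make this five-step flow concrete, the following minimal Python skeleton strings the steps together. It is a sketch under stated assumptions: the helper names (extract_image_features, cluster_units, extract_keywords, and so on) are illustrative placeholders for the components detailed in Sections 3.1, 3.2, 3.3, and 3.4.3, not the authors’ actual implementation.

```python
# Hedged skeleton of the workflow in Figure 1; all helper names are
# illustrative placeholders expanded in the subsections that follow.

def build_hierarchy(records, approach="prompt"):
    # (A) Each record unit pairs an input prompt with its resulting image.
    prompts = [prompt for prompt, _ in records]
    images = [image for _, image in records]

    # (B) Feature extraction: vision-based (images) or prompt-based (text).
    if approach == "vision":
        features = extract_image_features(images)    # Section 3.1.1
    else:
        features = extract_prompt_features(prompts)  # Section 3.1.2

    # (C) Cluster the embedded units into a hierarchy (Section 3.2).
    labels, tree = cluster_units(features)

    # (D) Extract a representative keyword set per cluster (Section 3.3).
    keywords = {
        cluster: extract_keywords(
            [prompts[i] for i, label in enumerate(labels) if label == cluster])
        for cluster in set(labels) if cluster != -1  # -1 = noise in HDBSCAN
    }

    # (E) The exploration UI (Section 3.4.3) renders labels, tree, keywords.
    return labels, tree, keywords
```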

3.1. Two Approaches for Hierarchy

We devised two approaches to enhance users’ accessibility when re-exploring their generation records, inspired by the interaction between the model and the user during the re-exploration process. The first approach builds the hierarchy from the features of the prompt, which the user inputs to express their generation intent to the text-to-image model. The second builds the hierarchy from the features of the output image the model generates from that prompt. We propose both because current text-to-image models do not fully capture user intent: when the generated images diverge from the prompts, a hierarchy based on the visual characteristics of the results may offer users better accessibility than one based on the stated intent alone.

3.1.1. Vision-Based Approach

To extract the visual features of generated images, we utilized a Vision Transformer (ViT) model trained with DINO [25]. This model is pre-trained on ImageNet-1K and used without additional fine-tuning. The model splits each image into a sequence of fixed-size patches (16 × 16 resolution), linearly embeds them, and outputs an embedding for the image. While such embeddings are typically used for image classification on labeled datasets, in this study we used them as inputs to a clustering algorithm.
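As a concrete illustration, the sketch below extracts such embeddings with the publicly released DINO ViT-S/16 checkpoint via torch.hub; the preprocessing values are the standard ImageNet statistics, and the file names are placeholders.

```python
# Hedged sketch: image embeddings from a DINO-pretrained ViT-S/16.
# Assumes torch, torchvision, and PIL are installed; paths are illustrative.
import torch
from PIL import Image
from torchvision import transforms

# ViT-S/16 pretrained with DINO self-supervision, used without fine-tuning.
model = torch.hub.load("facebookresearch/dino:main", "dino_vits16")
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed_image(path: str) -> torch.Tensor:
    """Return the [CLS]-token embedding (384-d for ViT-S/16) of one image."""
    batch = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    return model(batch).squeeze(0)

features = torch.stack([embed_image(p) for p in ["gen_001.png", "gen_002.png"]])
```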

3.1.2. Prompt-Based Approach

For layering based on input prompt features, we used Microsoft’s mpnet-base model [26] to map sentences and paragraphs into a 768-dimensional dense vector space suitable for clustering tasks [27]. The encoded prompt vectors were then used as inputs for the clustering algorithm.
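A minimal sketch of this step, assuming the sentence-transformers distribution of the model (all-mpnet-base-v2, which is built on MPNet); the prompt strings are illustrative:

```python
# Hedged sketch: encoding input prompts into 768-d vectors for clustering.
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

prompts = [
    "a club promotion poster, watercolor style, 8K, masterpiece",
    "the same poster with a night-sky background",
]
prompt_vectors = encoder.encode(prompts)  # shape: (len(prompts), 768)
```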

3.2. Unsupervised Learning for Automatic Hierarchy

A challenge we faced in this section was the inability to know in advance what types of images users would generate with generative AI. This means that we could not construct a pre-labeled dataset to classify user results or predefine the number of clusters for automatic hierarchy organization. Therefore, we addressed this challenge using HDBSCAN [28], a density-based clustering algorithm within unsupervised learning. The main operational process of HDBSCAN consists of four steps. First, the mutual reachability distance is calculated. HDBSCAN introduces the concept of mutual reachability distance, a transformed distance metric that considers both core distances and pairwise distances between points. This enables clusters to adapt to local density variations dynamically. In the second step, a hierarchical tree (dendrogram) of clusters is constructed based on the mutual reachability distance. This tree captures small, densely packed clusters and larger, more dispersed ones. The third step involves evaluating the stability of clusters across various density thresholds. Stability is measured by the degree to which clusters persist within the hierarchy. The most stable clusters are selected as the final results. In the fourth and final step, points that do not belong to any cluster from the third step are classified as noise. This allows for the natural handling of outliers and noisy data.
Through these processes, HDBSCAN does not require the number of clusters (k) to be predefined, unlike centroid-based methods such as K-means [29]. Furthermore, HDBSCAN exhibits robust performance against noisy data by leveraging its transformed distance metric, outperforming traditional density-based clustering algorithms such as single linkage [30]. Ultimately, this approach enables more efficient clustering. Additionally, compared to DBSCAN [31], HDBSCAN has the advantage of reflecting local density information and achieving hierarchical clustering based on the structure of the input data. For these reasons, we adopted HDBSCAN in our study and set the hyperparameters to a minimum cluster size of 2 and an alpha value of 0.001.
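The clustering step itself reduces to a few lines with the hdbscan library; the sketch below uses the hyperparameters reported above, and the `features` array stands in for either the image or the prompt embeddings.

```python
# Hedged sketch: density-based hierarchical clustering of the record units.
import numpy as np
import hdbscan

features = np.asarray(prompt_vectors)  # or the vision-based embeddings

clusterer = hdbscan.HDBSCAN(min_cluster_size=2, alpha=0.001)
labels = clusterer.fit_predict(features)  # label -1 marks noise records

# The condensed tree exposes the cluster hierarchy used for the UI levels.
tree = clusterer.condensed_tree_
```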

3.3. Hierarchy Keyword Extraction

In this section, we extracted keywords to represent each cluster and applied them to the automatic hierarchy method. This was necessary because, although the clustering in Section 3.2 performed well, HDBSCAN labels clusters only as “Cluster 1”, “Cluster 2”, and so on, which is not suitable for enhancing the efficiency of users’ record re-exploration.
Therefore, we designed the system to extract keywords from the input prompts of the images within each cluster, making these keywords representative of the corresponding hierarchy. To achieve this, we grouped all prompts within a given cluster into a single paragraph and then used the BART model fine-tuned on the CNN Daily Mail dataset [32] to summarize all prompts included in the cluster. This approach prevents frequently used terms, such as “8K” or “masterpiece”, which users often repeat to ensure high image quality, from being selected as representative keywords. We then applied the maximal marginal relevance (MMR) algorithm [33] to extract new keywords that are not similar to the already selected ones, setting the diversity parameter to 0.8. The MMR algorithm is defined in Equation (1).
MMR = λ · Sim(d, Q) − (1 − λ) · max_{d′ ∈ D′} Sim(d, d′)    (1)
The explanation of terms is as follows:
  • λ: a parameter that balances relevance and diversity.
  • Sim(d, Q): the similarity between candidate document d and the query Q.
  • max_{d′ ∈ D′} Sim(d, d′): the maximum similarity between candidate document d and each document d′ in the already selected set D′.
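The sketch below combines both steps, summarizing a cluster’s prompts with a BART model fine-tuned on CNN/DailyMail and then selecting keywords greedily with Equation (1). It is an illustrative implementation: the candidate-extraction heuristic, the model IDs, and the helper name are our assumptions, with λ expressed through a diversity weight of 0.8.

```python
# Hedged sketch: representative-keyword extraction per cluster
# (BART summarization + greedy MMR selection, Equation (1)).
import numpy as np
from transformers import pipeline
from sentence_transformers import SentenceTransformer

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
encoder = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

def cluster_keywords(prompts, top_n=3, diversity=0.8):
    # Summarize all prompts in the cluster as one paragraph; this suppresses
    # boilerplate quality terms such as "8K" or "masterpiece".
    text = " ".join(prompts)
    summary = summarizer(text, max_length=60, min_length=5)[0]["summary_text"]
    candidates = sorted({w.strip(".,!?") for w in summary.split() if len(w) > 3})
    if not candidates:
        return []

    # Cosine similarities on L2-normalized embeddings.
    cand = encoder.encode(candidates)
    cand = cand / np.linalg.norm(cand, axis=1, keepdims=True)
    doc = encoder.encode(summary)
    doc = doc / np.linalg.norm(doc)
    sim_doc = cand @ doc       # relevance term Sim(d, Q)
    sim_cand = cand @ cand.T   # redundancy term Sim(d, d')

    # Greedy MMR: pick keywords relevant to the summary yet dissimilar
    # to those already selected; `diversity` weights the penalty term.
    selected = [int(np.argmax(sim_doc))]
    while len(selected) < min(top_n, len(candidates)):
        rest = [i for i in range(len(candidates)) if i not in selected]
        scores = [(1 - diversity) * sim_doc[i]
                  - diversity * max(sim_cand[i][j] for j in selected)
                  for i in rest]
        selected.append(rest[int(np.argmax(scores))])
    return [candidates[i] for i in selected]
```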

3.4. Generative AI, Baseline UI, and Automatic Hierarchy UI for User Studies

3.4.1. Generative AI

In this study, participants were provided with Stable Diffusion 3 Medium [34] for the user study. Based on human-preference evaluations, this model is reported to achieve performance comparable or superior not only to open-source generative models such as SDXL and SDXL Turbo but also to closed-source systems such as Dall-E and Midjourney. However, this generative model requires a specialized environment, such as an integrated development environment (IDE) or ComfyUI (https://github.com/comfyanonymous/ComfyUI, accessed on 10 October 2024), for generation (inference) and does not support continuous text-to-image and image-to-image generation within a single UI. Therefore, to create a more practical experimental environment, we developed a CUI that allows both text-to-image and image-to-image generation within a single interface. Inspired by additional features in Dall-E, we also enabled participants to reference not only the immediately preceding image but also earlier images for image-to-image generation. We built the system as a web application using Gradio (https://www.gradio.app/, accessed on 2 October 2024) and ran it on a machine with an Intel(R) Xeon(R) Silver 4214 processor (Intel, Santa Clara, CA, USA) and an NVIDIA GeForce TITAN RTX GPU (NVIDIA, Santa Clara, CA, USA).
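For orientation, a stripped-down sketch of this kind of Gradio front end is shown below. It covers only the text-to-image path and omits the image-to-image and reference-image features; the model ID follows the diffusers distribution of Stable Diffusion 3 Medium, and the layout is our assumption rather than the authors’ exact code.

```python
# Hedged sketch: a minimal web CUI for text-to-image generation.
import torch
import gradio as gr
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers", torch_dtype=torch.float16
).to("cuda")

history = []  # (prompt, image) record units fed to the hierarchy step

def generate(prompt):
    image = pipe(prompt, num_inference_steps=28).images[0]
    history.append((prompt, image))
    return image

demo = gr.Interface(fn=generate,
                    inputs=gr.Textbox(label="Prompt"),
                    outputs=gr.Image(label="Result"))
demo.launch()
```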

3.4.2. Baseline UI

The baseline UI followed a standard conversational interface, designed with inspiration from Dall-E’s archive. This setup allowed users to navigate past generation records only through linear exploration. Additionally, we provided a toolbar with a thumbs-up icon to mark records that matched their intended purpose and a thumbs-down icon to flag results generated due to AI errors or mistakes, helping participants recall their initial creative intent. The baseline UI can be seen in Figure 2.

3.4.3. Automatic Hierarchy UI

The automatic hierarchy UI used the same interface for both approaches described in Section 3.1. To support hierarchy exploration, keywords for each level were displayed in a navigation bar at the top of the UI, and the top-left corner of each gallery in the hierarchy displayed the keyword for that layer, facilitating record navigation. Additionally, the input prompt for each result image appeared responsively when the mouse hovered over the image. Both the baseline UI and the automatic hierarchy UI are web-based systems built using Gradio. We ran them on a machine with an 11th Gen Intel(R) Core™ i7-11700KF @ 3.60 GHz processor, 128 GB of memory, and an NVIDIA GeForce RTX 3090 GPU, operating on Windows 10 Education. An example of the automatic hierarchy UI can be seen in Figure 3.

4. User Study

In this study, we conducted a user study to understand the impact of our proposed automatic hierarchy method on users’ ability to navigate past outcomes. The user study had two primary objectives: The first objective was to verify whether the automatic hierarchy method supports the efficient navigation of past conversation records. The second objective was to identify which of the two proposed approaches, vision-based and prompt-based, provides greater benefits to users.
The user study was designed as a within-subject experiment with 12 participants. We asked each participant to generate images and then review the past generated results with three different interfaces: the standard CUI (baseline), vision-based hierarchy organization, and prompt-based hierarchy organization. Participants were asked to create a promotional poster for their affiliated club, revising and refining the initial image over a minimum of 30 iterations. This task was inspired by a representative study on generative AI supporting users [35]. Additionally, to achieve the experimental objectives, all parameter values used in the automatic hierarchy method were kept consistent across the two approaches. Specifically, these parameters comprise the minimum cluster size and alpha value for HDBSCAN clustering, as well as the diversity parameter for the MMR keyword extraction algorithm; they were fixed at 2, 0.001, and 0.8, respectively.

4.1. Participants

As mentioned earlier, we recruited 12 participants (7 men and 5 women). The participants consisted of 11 undergraduate students and 1 graduate student, all of whom were involved in club activities, with an average age of 22.5 years. A 7-point Likert scale survey on generative AI usage and literacy [36,37,38] indicated that most participants (11 of 12 scoring above 4, with the remaining participant scoring 3) reported being able to produce desired outcomes using generative AI and frequently used it for routine tasks. Additionally, 9 of the 12 participants stated that they understood the principles of generative AI. Overall, the participant group actively uses generative AI in daily work and demonstrates a high level of generative AI literacy.

4.2. Procedure

The study was conducted in person and took approximately 50 min, beginning with an introduction and obtaining participants’ consent. Participants received a 5 min explanation of the generative AI used in the experiment and the research protocol. They were asked to create a promotional poster for their affiliated club and, before generating images, specify five elements that they wanted to include in the final poster. Then, using the provided generative AI, participants generated and modified images for about 20 min, with a minimum of 30 images generated during this process.
After completing the task, participants spent 5 min exploring their generated records using the baseline interface. During this phase, participants were asked to categorize the conversation records according to their creation intent and to label outputs that resulted from errors during the generation process. They were then given 5 min to explore the generated results using the automatic hierarchy methods proposed in this study, with the sequence of the two proposed approaches counterbalanced. Immediately after exploring each approach, participants completed a questionnaire designed to assess the efficiency and accuracy of the organization.
After evaluating each approach, participants completed a final questionnaire for feedback on the overall automatic hierarchy method and demographic analysis. Finally, a 10 min semi-structured interview was conducted to gather insights on their experiences with each of the three navigation methods. Participants received a compensation of KRW 5000 (approximately USD 3.63) for their participation. The whole process of the user study is shown in Figure 4.

4.3. Measures

4.3.1. Questionnaire

For measurement, the survey conducted after each exploration included questions asking participants to rate their experience with the proposed navigation methods on a 7-point Likert scale [39,40]. The survey was divided into two sections based on the objectives of the user study. The first survey aimed to assess whether the automatic hierarchy method effectively supported users, aligning with the first objective. This survey compared the overall user experience between the baseline and automatic hierarchy method, evaluating usability, efficiency, and ease of learning. Subsequently, a survey was conducted to determine the more useful hierarchy approach for our second objective, comparing user experiences between the vision-based and prompt-based approaches by measuring usability, efficiency, and perceived accuracy. The items in each questionnaire, corresponding to each metric in the surveys, are detailed in Table 1.

4.3.2. Interview

In the interview, questions focused on the differences participants perceived between each navigation record method. Participants were also asked whether they observed any differences in the perceived benefits of each approach within the automatic hierarchy method, and if so, to explain the reasons. Additionally, we inquired about the potential use of these methods for adaptive services in generative AI. Further questions explored additional features that might enhance access to records and identified other domains where our hierarchy method could improve access to generative AI records.

5. Results

In this section, we report the statistical results derived from the user tests. These results are divided into two parts: Section 5.1 presents the comparison between the UI with the proposed automatic hierarchy method applied and the baseline UI, while Section 5.2 reports the perceived differences between the vision-based and prompt-based approaches within the proposed automatic hierarchy method. A Wilcoxon signed-rank test was conducted to determine statistical significance, with the significance level set at p < 0.05; this non-parametric test was chosen because the data were not normally distributed. Survey ratings were analyzed to assess the impact of the automatic hierarchy method on perceived experience, specifically usability, ease of learning, and efficiency, and to evaluate the effects of the two proposed approaches on perceived accuracy, usability, and efficiency. Although the test is non-parametric, we also report the mean (M) and standard deviation (SD) to help illustrate trends in the ratings. Regarding the reported values, Z is the test statistic of the Wilcoxon signed-rank test: in Section 5.1 it indicates the direction and magnitude of the difference between applying and not applying the automatic hierarchy method, and in Section 5.2 it reflects the difference between the two preprocessing approaches (vision-based and prompt-based). The p value determines whether the difference between the two dependent samples in each section is statistically significant, while M and SD indicate how participants perceived each condition overall, highlighting the trends and consistency of the evaluation scores. Finally, the in-depth interviews helped explain the causes behind these perceived experiences.
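For reference, the analysis reported below can be reproduced with SciPy’s paired Wilcoxon test; the ratings in this sketch are illustrative, since the raw per-participant data are not published with the article.

```python
# Hedged sketch: Wilcoxon signed-rank test on paired 7-point Likert ratings.
import numpy as np
from scipy.stats import wilcoxon

baseline  = np.array([2, 1, 3, 2, 1, 2, 4, 1, 2, 1, 3, 2])  # illustrative
hierarchy = np.array([6, 7, 6, 7, 6, 5, 7, 6, 7, 6, 6, 7])  # illustrative

stat, p = wilcoxon(baseline, hierarchy)  # paired, non-parametric
print(f"W = {stat}, p = {p:.4f}")
print(f"baseline:  M = {baseline.mean():.2f}, SD = {baseline.std(ddof=1):.2f}")
print(f"hierarchy: M = {hierarchy.mean():.2f}, SD = {hierarchy.std(ddof=1):.2f}")
# The Z statistic reported in the paper follows from the normal
# approximation of the signed-rank distribution.
```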

5.1. Baseline Versus Automatic Hierarchy Method: Results

5.1.1. Usability

Questions 1 and 2 asked participants to rate the usability of exploring records with each method; we refer to these as Q1 and Q2 below. Participants rated the automatic hierarchy method higher than the baseline for Q2 (Figure 5a). The results for each question are described separately below. For Q1, participants showed no statistically significant difference between the two methods (Z = −1.395, p = 0.16), with the baseline (M = 4.50, SD = 1.45) and automatic hierarchy (M = 5.42, SD = 1.50) being relatively similar. However, for Q2, participants showed a statistically significant difference (Z = −2.88, p < 0.01), with the automatic hierarchy (M = 5.92, SD = 0.79) rated higher than the baseline (M = 3.00, SD = 1.60).

5.1.2. Ease of Learning

As shown in Figure 5b, Questions 3 and 4 measured the ease of learning of each method. Q3 asked whether participants could understand how each method operates. Participants showed a statistically significant difference (Z = −2.23, p = 0.03), with the automatic hierarchy (M = 5.75, SD = 0.97) rated higher than the baseline (M = 3.75, SD = 1.77). For Q4, participants showed no statistically significant difference (Z = −1.04, p = 0.29), with the baseline (M = 5.42, SD = 1.98) and automatic hierarchy (M = 4.92, SD = 1.24) being relatively similar.

5.1.3. Efficiency

As Figure 5c shows, Questions 5 and 6 assessed efficiency; a higher score suggests that the method contributes to increased efficiency. Q5 asked whether the method helps reduce the number of interactions required to explore the records. Participants showed a statistically significant difference between the two methods (Z = −3.01, p < 0.01), with the automatic hierarchy (M = 6.42, SD = 0.51) rated higher than the baseline (M = 2.00, SD = 1.71). Q6 asked whether the method helps shorten exploration times. Participants again showed a statistically significant difference (Z = −3.11, p < 0.01), with the automatic hierarchy (M = 6.75, SD = 0.45) rated higher than the baseline (M = 1.67, SD = 0.89).

5.2. Vision-Based Versus Prompt-Based Approach: Results

5.2.1. Perceived Accuracy

Questions 1 and 2 asked participants to rate the perceived accuracy of each approach of the automatic hierarchy method. As shown in Figure 6a, for Q1, participants showed no statistically significant difference between the two approaches (Z = −1.27, p = 0.20), with the vision-based (M = 4.33, SD = 1.83) and prompt-based (M = 5.08, SD = 1.24) approaches being relatively similar. For Q2, participants likewise showed no statistically significant difference (Z = −0.63, p = 0.53), with the vision-based (M = 4.33, SD = 2.06) and prompt-based (M = 4.83, SD = 1.40) approaches being relatively similar.

5.2.2. Usability

As Figure 6b shows, Questions 3 and 4 asked participants to rate the overall performance of the approaches. For Q3, participants showed no statistically significant difference between the two approaches (Z = −0.51, p = 0.61), with the vision-based (M = 5.50, SD = 1.31) and prompt-based (M = 5.67, SD = 1.30) approaches being relatively similar. Likewise, for Q4, participants showed no statistically significant difference (Z = −1.14, p = 0.25), with the vision-based (M = 4.83, SD = 1.85) and prompt-based (M = 5.58, SD = 0.90) approaches being relatively similar.

5.2.3. Efficiency

As shown in Figure 6c, Questions 5 and 6 assessed efficiency. For Q5, participants showed no statistically significant difference between the two approaches (Z = −1.08, p = 0.28), with the vision-based (M = 4.75, SD = 1.54) and prompt-based (M = 5.25, SD = 0.87) approaches being relatively similar. For Q6, participants showed no statistically significant difference (Z = −0.72, p = 0.47), with the vision-based (M = 5.17, SD = 1.64) and prompt-based (M = 5.41, SD = 1.24) approaches being relatively similar.

6. Discussion

6.1. Baseline vs. Automatic Hierarchy Method

The automatic hierarchy method proposed in this study showed a positive trend in the efficiency of searching creation records, as well as in usability and ease of learning, compared to the baseline. Our findings confirm that the automatic hierarchy can reduce the number of interactions and shorten the time required for searches, thus increasing efficiency for users. Participants also expressed a stronger preference for future use than for the baseline and found the method easy to use. Interestingly, although participants mentioned that becoming accustomed to the automatic hierarchy method requires more learning and repetition than the baseline, they could intuitively understand its information-searching approach. We believe that familiar components, such as the navigation bar provided at the top of the automatic hierarchy UI, a common feature in traditional GUIs, contributed to its intuitive usability.

6.2. Vision-Based vs. Prompt-Based Approach

In the comparison between the two approaches of the automatic hierarchy method, the prompt-based approach was generally perceived to have higher accuracy, usability, and efficiency, although no statistically significant difference was found. Interviews revealed the reasons behind these results. Participants often generated new initial images in the process of creating and modifying images, which could lead to variations in the resulting images despite using similar prompts for the same required tasks. This tends to classify such images into the same category in the prompt-based approach, whereas in the vision-based approach, they might be classified into different categories. Therefore, depending on the participants’ work patterns, the perceived accuracy, usability, and efficiency favored different approaches. Additionally, most participants (9 out of 12) expressed a preference for having both approaches available with the option for users to choose between them. Therefore, we believe that both approaches have substantial potential to enhance accessibility in history record exploration. For users in specific fields that involve repeated detail modifications to the same image (e.g., design or art), the prompt-based approach may be more suitable, whereas for users engaged in everyday tasks, the vision-based approach could be more appropriate.

6.3. Limitations

This study has several limitations that future research should address. Firstly, the participant pool for testing the proposed automatic hierarchy method was limited. Future studies could include a larger and more diverse group of participants, reflecting demographic factors such as age, occupation, and the primary purposes for which they use generative AI. Another limitation is that the HDBSCAN algorithm used for the automatic hierarchy may not be the optimal method. This study serves as an initial investigation into the effects of the automatic hierarchy method on users exploring records, and it did not benchmark alternative pipelines, such as applying dimensionality-reduction techniques like t-SNE [41] or PCA [42] before clustering, or using other clustering algorithms. Addressing these limitations in future studies is recommended.

7. Conclusions

This study proposed an automatic hierarchy method to enhance the efficiency of exploring result records in generative AI. The method, implemented through vision-based and prompt-based approaches, organizes the records of text-to-image (i.e., multi-modal) AI into a hierarchy. We validated the impact of the method and its two approaches on users navigating result records through a user study. The findings confirmed that the proposed method not only enhances the efficiency of searching generation records but also maintains or improves usability and ease of learning compared to traditional conversational user interfaces (CUIs). Additionally, which of the two hierarchy approaches was perceived as more accurate, usable, and efficient depended on users’ workflow patterns. This research is therefore significant in that it develops an automatic hierarchy method that improves on the linear search typical of existing generative AI CUIs and serves as an initial study defining and addressing the need for enhanced access to historical records, paving the way for further research into making generative AI an efficient tool.

Author Contributions

Conceptualization, H.-J.K. and S.-H.K.; methodology, H.-J.K. and J.-S.P.; software, H.-J.K. and J.-S.P.; validation, H.-J.K., J.-S.P. and Y.-M.C.; formal analysis, H.-J.K.; investigation, H.-J.K. and Y.-M.C.; resources, H.-J.K.; data curation, H.-J.K.; writing—original draft preparation, H.-J.K. and Y.-M.C.; writing—review and editing, H.-J.K. and S.-H.K.; visualization, H.-J.K.; supervision, S.-H.K.; project administration, S.-H.K.; funding acquisition, S.-H.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partly supported by Innovative Human Resource Development for Local Intellectualization program through the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (IITP-2025-RS-2020-II201791, 50%) and by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (RS-2022-NR072999, 50%).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Ramdurai, B.; Adhithya, P. The impact, advancements and applications of generative AI. Int. J. Comput. Sci. Eng. 2023, 10, 1–8. [Google Scholar] [CrossRef]
  2. Caldeira, W.G.; Simões, J.M. Disrupting the Conventional: The Impact of Generative AI Models on Creativity in Visual Communications. E-Rev. Estud. Intercult. 2024. [Google Scholar] [CrossRef]
  3. Ghosh, A. AI Assisted Visual Communication Through Generative Models. Ph.D. Thesis, University of Oxford, Oxford, UK, 2021. [Google Scholar]
  4. Cai, A.; Rick, S.R.; Heyman, J.L.; Zhang, Y.; Filipowicz, A.; Hong, M.; Klenk, M.; Malone, T. DesignAID: Using generative AI and semantic diversity for design inspiration. In Proceedings of the ACM Collective Intelligence Conference, Delft, The Netherlands, 6–9 November 2023; pp. 1–11. [Google Scholar]
  5. Buzzaccarini, G.; Degliuomini, R.S.; Borin, M.; Fidanza, A.; Salmeri, N.; Schiraldi, L.; Di Summa, P.G.; Vercesi, F.; Vanni, V.S.; Candiani, M.; et al. The promise and pitfalls of AI-generated anatomical images: Evaluating midjourney for aesthetic surgery applications. Aesthetic Plast. Surg. 2024, 48, 1874–1883. [Google Scholar] [CrossRef] [PubMed]
  6. Temsah, M.H.; Alhuzaimi, A.N.; Almansour, M.; Aljamaan, F.; Alhasan, K.; Batarfi, M.A.; Altamimi, I.; Alharbi, A.; Alsuhaibani, A.A.; Alwakeel, L.; et al. Art or artifact: Evaluating the accuracy, appeal, and educational value of AI-generated imagery in DALL· E 3 for illustrating congenital heart diseases. J. Med Syst. 2024, 48, 54. [Google Scholar] [CrossRef]
  7. Liu, R.; bin Ismail, A.I. Application and Challenges of Generative AI in Architectural Design: A Case Study of GPT-4. Evol. Stud. Imaginative Cult. 2024, 8.2, 991–999. [Google Scholar]
  8. Hutson, J.; Lively, J.; Robertson, B.; Cotroneo, P.; Lang, M. Expanding Horizons: AI Tools and Workflows in Art Practice. In Creative Convergence: The AI Renaissance in Art and Design; Springer: Cham, Switzerland, 2023; pp. 101–132. [Google Scholar]
  9. Wojtkiewicz, K. How Do You Solve a Problem like DALL-E 2? J. Aesthet. Art Crit. 2023, 81, 454–467. [Google Scholar] [CrossRef]
  10. Jin, Y.; Yoon, J.; Self, J.; Lee, K. Understanding Fashion Designers’ Behavior Using Generative AI for Early-Stage Concept Ideation and Revision. Arch. Des. Res. 2024, 37, 25–45. [Google Scholar] [CrossRef]
  11. Mim, N.J.; Nandi, D.; Khan, S.S.; Dey, A.; Ahmed, S.I. In-Between Visuals and Visible: The Impacts of Text-to-Image Generative AI Tools on Digital Image-making Practices in the Global South. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI ’24), Honolulu, HI, USA, 11–16 May 2024; Association for Computing Machinery: New York, NY, USA, 2024. [Google Scholar] [CrossRef]
  12. Cheng, S.H. Impact of Generative Artificial Intelligence on Footwear Design Concept and Ideation. Eng. Proc. 2023, 55, 32. [Google Scholar] [CrossRef]
  13. Patel, R. A Deep Dive Into the Advantages and Disadvantages of OpenAI’s Dall-E Model. 2024. Available online: https://www.spaceo.ai/blog/advantages-and-disadvantages-of-using-openai-dalle-model/ (accessed on 22 October 2024).
  14. Wray, R.; Yeh, R. DALL-E, Midjourney, Stable Diffusion: A Nuclear Medicine How-To for Commercial AI Text-To-Image Generation Tools. J. Nucl. Med. 2023, 64, P1463. [Google Scholar]
  15. Braguez, J. AI as a Creative Partner: Enhancing Artistic Creation and Acceptance. In Proceedings of the Barcelona Conference on Arts, Media & Culture 2023, Barcelona, Spain, 19–23 September 2023. [Google Scholar]
  16. Wiemann, J.M.; Knapp, M.L. Turn-taking in conversations. In Communication Theory; Routledge: London, UK, 2017; pp. 226–245. [Google Scholar]
  17. Bergner, A.S.; Hildebrand, C.; Häubl, G. Machine Talk: How Verbal Embodiment in Conversational AI Shapes Consumer–Brand Relationships. J. Consum. Res. 2023, 50, 742–764. [Google Scholar] [CrossRef]
  18. Bibauw, S.; François, T.; Desmet, P. Dialogue systems for language learning: Chatbots and beyond. In The Routledge Handbook of Second Language Acquisition and Technology; Routledge: London, UK, 2022; pp. 121–135. [Google Scholar]
  19. Shawar, B.A.; Atwell, E. Chatbots: Are they really useful? J. Lang. Technol. Comput. Linguist. 2007, 22, 29–49. [Google Scholar] [CrossRef]
  20. Chin, J.; Lee, S.; Park, C.; Yeoun, M. Proposal of User Interface Based on Heavy User Usage. Arch. Des. Res. 2024, 37, 287–313. [Google Scholar]
  21. Karren, K.; Schmitz, M.; Schaffer, S. Improving Conversational User Interfaces for Citizen Complaint Management through enhanced Contextual Feedback. In Proceedings of the 6th ACM Conference on Conversational User Interfaces (CUI ’24), Luxembourg, 8–10 July 2024; Association for Computing Machinery: New York, NY, USA, 2024. [Google Scholar] [CrossRef]
  22. Dhami, S.; Brisco, R. A comparison of artificial intelligence image generation tools in product design. In Proceedings of the DS 131: Proceedings of the International Conference on Engineering and Product Design Education (E&PDE 2024), Birmingham, UK, 5–6 September 2024; pp. 13–18. [Google Scholar]
  23. Klopfenstein, L.C.; Delpriori, S.; Malatini, S.; Bogliolo, A. The Rise of Bots: A Survey of Conversational Interfaces, Patterns, and Paradigms. In Proceedings of the 2017 Conference on Designing Interactive Systems (DIS ’17), Edinburgh, UK, 10–14 June 2017; Association for Computing Machinery: New York, NY, USA, 2017; pp. 555–565. [Google Scholar] [CrossRef]
  24. Planas, E.; Daniel, G.; Brambilla, M.; Cabot, J. Towards a model-driven approach for multiexperience AI-based user interfaces. Softw. Syst. Model. 2021, 20, 997–1009. [Google Scholar] [CrossRef]
  25. Caron, M.; Touvron, H.; Misra, I.; Jégou, H.; Mairal, J.; Bojanowski, P.; Joulin, A. Emerging Properties in Self-Supervised Vision Transformers. arXiv 2021, arXiv:2104.14294. [Google Scholar]
  26. Song, K.; Tan, X.; Qin, T.; Lu, J.; Liu, T.Y. MPNet: Masked and Permuted Pre-training for Language Understanding. arXiv 2020, arXiv:2004.09297. [Google Scholar]
  27. Fase, H. sentence-transformers/all-mpnet-base-v2. 2021. Available online: https://huggingface.co/sentence-transformers/all-mpnet-base-v2 (accessed on 21 October 2024).
  28. Malzer, C.; Baum, M. A Hybrid Approach To Hierarchical Density-based Cluster Selection. In Proceedings of the 2020 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), Karlsruhe, Germany, 14–16 September 2020; IEEE: Piscataway, NJ, USA, 2020. [Google Scholar] [CrossRef]
  29. Arthur, D.; Vassilvitskii, S. How slow is the k-means method? In Proceedings of the Twenty-Second Annual Symposium on Computational Geometry, Sedona, AZ, USA, 5–7 June 2006; pp. 144–153. [Google Scholar]
  30. Eldridge, J.; Belkin, M.; Wang, Y. Beyond hartigan consistency: Merge distortion metric for hierarchical clustering. In Proceedings of the Conference on Learning Theory, PMLR, Paris, France, 3–6 July 2015; pp. 588–606. [Google Scholar]
  31. Ester, M.; Kriegel, H.P.; Sander, J.; Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD’96), Portland, OR, USA, 2–4 August 1996; Volume 96, pp. 226–231. [Google Scholar]
  32. Lewis, M.; Liu, Y.; Goyal, N.; Ghazvininejad, M.; Mohamed, A.; Levy, O.; Stoyanov, V.; Zettlemoyer, L. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. arXiv 2019, arXiv:1910.13461. [Google Scholar]
  33. Bennani-Smires, K.; Musat, C.; Hossmann, A.; Baeriswyl, M.; Jaggi, M. Simple unsupervised keyphrase extraction using sentence embeddings. arXiv 2018, arXiv:1801.04470. [Google Scholar]
  34. Esser, P.; Kulal, S.; Blattmann, A.; Entezari, R.; Müller, J.; Saini, H.; Levi, Y.; Lorenz, D.; Sauer, A.; Boesel, F.; et al. Scaling Rectified Flow Transformers for High-Resolution Image Synthesis. arXiv 2024, arXiv:2403.03206. [Google Scholar]
  35. Jiang, J. When generative artificial intelligence meets multimodal composition: Rethinking the composition process through an AI-assisted design project. Comput. Compos. 2024, 74, 102883. [Google Scholar] [CrossRef]
  36. Annapureddy, R.; Fornaroli, A.; Gatica-Perez, D. Generative AI literacy: Twelve defining competencies. Digit. Gov. Res. Pract. 2024. [Google Scholar] [CrossRef]
  37. Kazanidis, I.; Pellas, N. Harnessing Generative Artificial Intelligence for Digital Literacy Innovation: A Comparative Study between Early Childhood Education and Computer Science Undergraduates. AI 2024, 5, 1427–1445. [Google Scholar] [CrossRef]
  38. Amoozadeh, M.; Daniels, D.; Nam, D.; Kumar, A.; Chen, S.; Hilton, M.; Srinivasa Ragavan, S.; Alipour, M.A. Trust in Generative AI among students: An exploratory study. In Proceedings of the 55th ACM Technical Symposium on Computer Science Education V. 1, Portland, OR, USA, 20–23 March 2024; pp. 67–73. [Google Scholar]
  39. Mussa, O.; Rana, O.; Goossens, B.; Orozco Ter wengel, P.; Perera, C. ForestQB: Enhancing Linked Data Exploration through Graphical and Conversational UIs Integration. ACM J. Comput. Sustain. Soc. 2024, 2, 32. [Google Scholar] [CrossRef]
  40. Schrepp, M. User experience questionnaire handbook. In All You Need to Know to Apply the UEQ Successfully in Your Project; UEQ: Weyhe, Germany, 2015; pp. 50–52. [Google Scholar]
  41. Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
  42. Maćkiewicz, A.; Ratajczak, W. Principal components analysis (PCA). Comput. Geosci. 1993, 19, 303–342. [Google Scholar] [CrossRef]
Figure 1. The operational process of the proposed automatic hierarchy method for the hierarchical exploration of CUIs in this study: (A) The linearity of existing CUIs over time and an example of a unit-level text-to-image operation; (B) explanation of two proposed feature extraction methods for each unit during the generation process for automatic hierarchy; (C) an example of the hierarchical organization of entire generation records based on the extracted features; (D) extraction of hierarchical keywords to explain the results of the hierarchy; (E) an example of the UI for exploring generation records proposed in this study to overcome the limitations of linear exploration in existing CUIs; dashed boxes represent the units in the generation process.
Figure 2. Example of baseline (standard conversation records UI) used in user study. Red lines and text correspond to UI element descriptions.
Figure 3. A hierarchical exploration support UI for image generation records using the proposed automatic hierarchy method: This UI represents the hierarchy resulting from the application of the automatic hierarchy method to generation records and provides a navigation bar at the top to move between the hierarchies. Additionally, the input prompts for the resulting images are displayed when hovering over the images of interest. Red lines and text correspond to UI element descriptions.
Figure 4. Overall process of the user study.
Figure 5. Questionnaire results from the comparison of perceived user experience between baseline and automatic hierarchy methods. The significance levels are as follows: * p < 0.05; ** p < 0.01.
Figure 6. Questionnaire results from comparing perceived user experience between vision-based and prompt-based approaches. The significance levels are as follows: * p < 0.05; ** p < 0.01.
Table 1. Questionnaire items by comparison targets in the user study.
Comparison target: Baseline and automatic hierarchy method
  Usability
    • I found this tool easy to use.
    • I am likely to use this tool frequently.
  Ease of Learning
    • I found this tool intuitive to understand.
    • I could use this tool without any special instructions.
  Efficiency
    • I experienced fewer interactions when completing my search with this tool.
    • I believe this tool will help save time in finding desired information from records.
Comparison target: Vision-based and prompt-based approach
  Perceived Accuracy
    • I believe this tool correctly categorized the data into appropriate groups.
    • I believe the hierarchy results generated by this tool match my expectations.
  Usability
    • I am satisfied with the overall performance of this tool.
    • I would be willing to recommend this tool to others.
  Efficiency
    • I believe this tool will help save time in finding desired information from records.
    • I found the hierarchy organization natural and convenient while using this method.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
