DOI: 10.1145/3493244.3493252
research-article

LabelUX! Guidelines to support software engineers to design data labeling systems

Published: 14 December 2021

Abstract

The demand for systems that use artificial intelligence has grown substantially in recent years, especially for those built with Machine Learning (ML) techniques. Systems that use supervised ML techniques need representative and correctly categorized data to ensure their quality. In this context, the data labeling step plays a fundamental role in the development of such systems. Labeling is performed by users who are specialists in the data domain, and it aims to produce a dataset on which a supervised ML model can be trained. However, labeling is exhausting for users, which can compromise the quality of the ML system, especially when the labeling is done in systems that were not designed to assist the user in this activity. At the same time, designing these kinds of systems can be difficult for a software engineer: depending on the type of data to be labeled, the interface needs different graphical elements and strategies to present the data and request user feedback. To help software engineers develop such systems, this work proposes the LabelUX guidelines. The guidelines support software engineers in designing data labeling systems, leading to a quality design that provides a better user experience during the labeling task. We developed the guidelines from studies of the literature and of industry practice. To evaluate their use, we selected software engineers working on ML projects to participate in a feasibility study. The qualitative results obtained through the interview indicated that the LabelUX guidelines supported a better design of labeling systems for textual data.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data labeling
  2. human in the loop
  3. interactive machine learning
  4. software design

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SBQS '21
SBQS '21: XX Brazilian Symposium on Software Quality
November 8 - 11, 2021
Virtual Event, Brazil

Acceptance Rates

Overall Acceptance Rate 35 of 99 submissions, 35%
