research-article

Towards Feature-Based Analysis of the Machine Learning Development Lifecycle

Authors:

Boyue Caroline Hu,

Marsha Chechik

ESEC/FSE 2023: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Pages 2087 - 2091

https://doi.org/10.1145/3611643.3613082

Published: 30 November 2023

Abstract

The safety and trustworthiness of systems with components based on Machine Learning (ML) require an in-depth understanding and analysis of all stages in the ML Development Lifecycle (MLDL). High-level abstractions of desired functionalities, model behaviour, and data are called features, and they have been studied by different communities across all MLDL stages. In this paper, we propose to support Software Engineering analysis of the MLDL through features, an approach we call feature-based analysis of the MLDL. First, to achieve a shared understanding of features among different experts, we establish a taxonomy of the feature definitions currently used in various MLDL stages. Then, through this taxonomy, we map features from different stages to each other, discover gaps and future research directions, and identify areas of collaboration between Software Engineering and other MLDL experts.

Cited By

Torbarina L, Ferkovic T, Roguski L, Mihelcic V, Sarlija B, and Kraljevic Z. 2024. Challenges and Opportunities of Using Transformer-Based Multi-Task Learning in NLP Through ML Lifecycle: A Position Paper. Natural Language Processing Journal, 7 (2024), 100076. Online publication date: Jun-2024.
https://doi.org/10.1016/j.nlp.2024.100076

Index Terms

Towards Feature-Based Analysis of the Machine Learning Development Lifecycle
1. Computing methodologies
  1. Machine learning
2. Software and its engineering
  1. Software creation and management
    1. Software verification and validation

Published In

ESEC/FSE 2023: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

November 2023

2215 pages

ISBN:9798400703270

DOI:10.1145/3611643

General Chair: Satish Chandra (Google, USA)

Program Chairs: Kelly Blincoe (University of Auckland, New Zealand), Paolo Tonella (USI Lugano, Switzerland)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ESEC/FSE '23

Sponsor:

SIGSOFT

ESEC/FSE '23: 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

December 3 - 9, 2023

San Francisco, CA, USA

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%

Bibliometrics

Total Citations: 1
Total Downloads: 90
Downloads (Last 12 months): 63
Downloads (Last 6 weeks): 8

Reflects downloads up to 20 Jan 2025
