DOI: 10.1145/3540250.3549172

Research article
Open access

A retrospective study of one decade of artifact evaluations

Published: 09 November 2022

Abstract

Most software engineering (SE) research involves the development of a prototype, a proof of concept, or a measurement apparatus. Together with the data collected in the research process, these are collectively referred to as research artifacts and are subject to artifact evaluation (AE) at scientific conferences. Since its introduction to the SE community at ESEC/FSE 2011, both the goals and the process of AE have evolved, and today the expectations of AE are strongly linked to reproducible research results and to reusable tools that other researchers can build on. However, little evidence has been provided to date that artifacts which have passed AE actually live up to these high expectations, i.e., to what degree AE processes contribute to AE's goals and whether the overhead they impose is justified.
We aim to fill this gap with an in-depth analysis of research artifacts from a decade of SE and programming languages (PL) conferences, based on which we reflect on the goals and mechanisms of AE in our community. In summary, our analyses (1) suggest that articles with artifacts do not generally have better visibility in the community, (2) provide evidence of how evaluated and non-evaluated artifacts differ with respect to different quality criteria, and (3) highlight opportunities for further improving AE processes.



Published In

ESEC/FSE 2022: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
November 2022
1822 pages
ISBN: 9781450394130
DOI: 10.1145/3540250
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Artifact evaluation
  2. Open science
  3. Reproduction
  4. Research artifacts
  5. Reuse

Qualifiers

  • Research-article

Conference

ESEC/FSE '22

Acceptance Rates

Overall acceptance rate: 112 of 543 submissions (21%)

Article Metrics

  • Downloads (last 12 months): 460
  • Downloads (last 6 weeks): 53
Reflects downloads up to 30 Jan 2025

Cited By

  • (2025) Extract, model, refine: improved modelling of program verification tools through data enrichment. Software and Systems Modeling. https://doi.org/10.1007/s10270-024-01232-7. Online publication date: 8-Jan-2025.
  • (2024) A Transferability Study of Interpolation-Based Hardware Model Checking for Software Verification. Proceedings of the ACM on Software Engineering, 1(FSE), 2028–2050. https://doi.org/10.1145/3660797. Online publication date: 12-Jul-2024.
  • (2024) Replication in Requirements Engineering: The NLP for RE Case. ACM Transactions on Software Engineering and Methodology, 33(6), 1–33. https://doi.org/10.1145/3658669. Online publication date: 27-Jun-2024.
  • (2024) A Second Look at the Impact of Passive Voice Requirements on Domain Modeling: Bayesian Reanalysis of an Experiment. Proceedings of the 1st IEEE/ACM International Workshop on Methodological Issues with Empirical Studies in Software Engineering, 27–33. https://doi.org/10.1145/3643664.3648211. Online publication date: 16-Apr-2024.
  • (2024) Longevity of Artifacts in Leading Parallel and Distributed Systems Conferences: a Review of the State of the Practice in 2023. Proceedings of the 2nd ACM Conference on Reproducibility and Replicability, 121–133. https://doi.org/10.1145/3641525.3663631. Online publication date: 18-Jun-2024.
  • (2024) Decade-long Utilization Patterns of ICSE Technical Papers and Associated Artifacts. 2024 IEEE/ACIS 22nd International Conference on Software Engineering Research, Management and Applications (SERA), 166–173. https://doi.org/10.1109/SERA61261.2024.10685638. Online publication date: 30-May-2024.
  • (2024) Research artifacts for human-oriented experiments in software engineering: An ACM badges-driven structure proposal. Journal of Systems and Software, 218, 112187. https://doi.org/10.1016/j.jss.2024.112187. Online publication date: Dec-2024.
  • (2024) Requirements quality research artifacts: Recovery, analysis, and management guideline. Journal of Systems and Software, 216, 112120. https://doi.org/10.1016/j.jss.2024.112120. Online publication date: Oct-2024.
  • (2024) The ARCH-COMP Friendly Verification Competition for Continuous and Hybrid Systems. TOOLympics Challenge 2023, 1–37. https://doi.org/10.1007/978-3-031-67695-6_1. Online publication date: 26-Apr-2024.
  • (2023) "Get in Researchers; We're Measuring Reproducibility": A Reproducibility Study of Machine Learning Papers in Tier 1 Security Conferences. Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, 3433–3459. https://doi.org/10.1145/3576915.3623130. Online publication date: 15-Nov-2023.
