DOI: 10.1145/3379177.3388909
research-article

From Ad-Hoc Data Analytics to DataOps

Published: 16 September 2020

Abstract

The collection of high-quality data provides a key competitive advantage to companies in their decision-making process. It helps them understand customer behavior and enables the use and deployment of new technologies based on machine learning. However, the process of collecting the data, then cleaning and processing it for use by data scientists and applications, is often manual, non-optimized, and error-prone. This increases the time the data takes to deliver value for the business. To reduce this time, companies are looking into automation and validation of their data processes, which form the operational side of the data analytics workflow.
DataOps, a term recently coined by data scientists, data analysts, and data engineers, refers to a general process that aims to shorten the end-to-end data analytics life-cycle time by introducing automation into the data collection, validation, and verification process. Despite its increasing popularity among practitioners, research on this topic has been limited and provides neither a clear definition of the term nor an account of how a data analytics process evolves from ad-hoc data collection to the fully automated data analytics envisioned by DataOps.
This research provides three main contributions. First, drawing on multivocal literature, we provide a definition and a scope for the general process referred to as DataOps. Second, based on a case study with a large mobile telecommunication organization, we analyze how multiple data analytics teams evolve their infrastructure and processes towards DataOps. Third, we provide a stairway showing the different stages of this evolution. With this evolution model, companies can identify the stage they are in and try to move to the next stage by overcoming the challenges they encounter in their current stage.

Published In

ICSSP '20: Proceedings of the International Conference on Software and System Processes
June 2020
208 pages
ISBN:9781450375122
DOI:10.1145/3379177
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Agile Methodology
  2. Continuous Monitoring
  3. Data Pipelines
  4. Data technologies
  5. DataOps
  6. DevOps

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICSSP '20