DOI: 10.1145/3408877.3432457
research-article

A Data-centric Computing Curriculum for a Data Science Major

Published: 05 March 2021

Abstract

Many universities are introducing a new major in Data Science, reflecting the explosive growth of the field and the career opportunities it provides. Data Science draws on both Computer Science and Statistics, and curriculum plans differ widely, both in the balance between the CS and Statistics components and in the emphasis placed within the computing topics. This paper reports on the curriculum that has been taught for three years at the University of Sydney. In particular, we describe a sequence of computing subjects developed specifically for the major, designed to bring students, over several years, to a sophisticated understanding of the data-handling aspects of Data Science. Students also take traditional subjects from both CS (such as Data Structures or AI) and Statistics (such as Learning from Data and Statistical Inference). The specially designed, data-centric subjects discussed in this paper are (i) Informatics: Data and Computation (in the first year), (ii) Big Data and Data Diversity (in the second year), and the upper-division subjects (iii) Data Science Platforms and (iv) Human-in-the-Loop Data Analytics.



    Published In

    SIGCSE '21: Proceedings of the 52nd ACM Technical Symposium on Computer Science Education
    March 2021
    1454 pages
    ISBN:9781450380621
    DOI:10.1145/3408877
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. curriculum
    2. data science

    Conference

    SIGCSE '21

    Acceptance Rates

    Overall Acceptance Rate 1,595 of 4,542 submissions, 35%
