research-article
Open access

Designing types for R, empirically

Published: 13 November 2020

Abstract

The R programming language is widely used in a variety of domains. It was designed to favor an interactive style of programming with minimal syntactic and conceptual overhead. This design is well suited to data analysis, but a bad fit for tools such as compilers or program analyzers. In particular, R has no type annotations, and all operations are dynamically checked at run-time. Our work starts from two questions: what expressive power is needed to accurately type R code, and which type system is the R community willing to adopt? Both questions are difficult to answer without actually experimenting with a type system. The goal of this paper is to provide data that can feed into that design process. To this end, we perform a large corpus analysis to gain insight into the degree of polymorphism exhibited by idiomatic R code and to explore the potential benefits that the R community could accrue from a simple type system. As a starting point, we infer type signatures for 25,215 functions from 412 packages among the most widely used open source R libraries. We then conduct an evaluation on 8,694 clients of these packages, as well as on end-user code from the Kaggle data science competition website.

Supplementary Material

Auxiliary Presentation Video (oopsla20main-p225-p-video.mp4)
This is a video presentation of the paper "Designing Types for R, Empirically", which appears at OOPSLA'20. In our paper, we propose a type annotation framework for R functions. We also undertake a large empirical study, collecting a vast amount of data on how R programmers use the language’s rich dynamic types, querying this data to help validate our type language design. This video presentation will show a cross section of our work: we will detail the design and evaluation process for R’s vectorized primitive types, shedding light on our approach to retrofitting a type checking framework onto a dynamic language.

Published In

Proceedings of the ACM on Programming Languages, Volume 4, Issue OOPSLA
November 2020
3108 pages
EISSN: 2475-1421
DOI: 10.1145/3436718
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 November 2020
Published in PACMPL Volume 4, Issue OOPSLA

Author Tags

  1. R
  2. dynamic languages
  3. type declarations

Qualifiers

  • Research-article
