DOI: 10.1145/2509136.2509523

Detecting API documentation errors

Published: 29 October 2013

Abstract

When programmers encounter an unfamiliar API library, they often need to refer to its documentation, tutorials, or discussions on development forums to learn its proper usage. These API documents contain valuable information, but they may also mislead programmers because they can contain errors (e.g., broken code names and obsolete code samples). Although most API documents are actively maintained and updated, studies show that many new and latent errors exist. Finding such errors manually is tedious and error-prone, as API documents can run to thousands of pages. Existing tools are ineffective at locating documentation errors because traditional natural language (NL) tools do not understand code names and code samples, and traditional code analysis tools do not understand NL sentences. In this paper, we propose DOCREF, the first approach specifically designed and developed to detect API documentation errors. We formulate a class of inconsistencies that indicate potential documentation errors, and combine NL and code analysis techniques to detect and report such inconsistencies. We have implemented DOCREF and evaluated its effectiveness on the latest documentation of five widely used API libraries. DOCREF detected more than 1,000 new documentation errors, which we reported to the authors. Many of these errors have since been confirmed and fixed.
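As a rough illustration of the kind of inconsistency the abstract describes (a code name mentioned in a document that the API does not declare), the following sketch flags such names. It is not DOCREF's implementation: the regular expression, the function name `find_broken_code_names`, and the sample API set are all assumptions made for this example, and a real tool would combine NL parsing with code analysis rather than rely on a naming heuristic.

```python
import re

# Hypothetical heuristic: treat CamelCase words and tokens followed by "()"
# as code names. A real tool would need a far more careful extraction step.
CODE_NAME = re.compile(
    r"\b(?:[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]+)+"   # UpperCamelCase, e.g. FileReader
    r"|[a-z]+(?:[A-Z][a-z0-9]+)+"               # lowerCamelCase, e.g. openStream
    r"|\w+(?=\(\)))"                            # any word directly before "()"
)

def find_broken_code_names(doc_text, declared_names):
    """Return code-like names in doc_text that are absent from declared_names."""
    broken = []
    for match in CODE_NAME.finditer(doc_text):
        name = match.group(0)
        # Report each undeclared name once, in order of first appearance.
        if name not in declared_names and name not in broken:
            broken.append(name)
    return broken

# Illustrative inputs only; neither the sentence nor the API set comes from the paper.
doc = "Call openStream() on a FileReader, then pass it to the LineParser."
api = {"FileReader", "openStream", "LineTokenizer"}
print(find_broken_code_names(doc, api))
```

Here `LineParser` is reported because the assumed API declares `LineTokenizer` instead, which is the shape of a "broken code name" after a rename.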

Published In

OOPSLA '13: Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications
October 2013
904 pages
ISBN:9781450323741
DOI:10.1145/2509136
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. api documentation error
  2. outdated documentation

Qualifiers

  • Research-article

Conference

SPLASH '13

Acceptance Rates

OOPSLA '13 Paper Acceptance Rate 50 of 189 submissions, 26%;
Overall Acceptance Rate 268 of 1,244 submissions, 22%
