Detecting API documentation errors

Authors:

Zhendong Su

OOPSLA '13: Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications

Pages 803 - 816

https://doi.org/10.1145/2509136.2509523

Published: 29 October 2013

Abstract

When programmers encounter an unfamiliar API library, they often need to refer to its documentation, tutorials, or discussions on development forums to learn its proper usage. These API documents contain valuable information, but they may also mislead programmers because they can contain errors (e.g., broken code names and obsolete code samples). Although most API documents are actively maintained and updated, studies show that many new and latent errors exist. Finding such errors manually is tedious and error-prone, as API documents can run to thousands of pages. Existing tools are ineffective at locating documentation errors: traditional natural language (NL) tools do not understand code names and code samples, and traditional code analysis tools do not understand NL sentences. In this paper, we propose DOCREF, the first approach specifically designed and developed to detect API documentation errors. We formulate a class of inconsistencies that indicate potential documentation errors, and we combine NL and code analysis techniques to detect and report such inconsistencies. We have implemented DOCREF and evaluated its effectiveness on the latest documentation of five widely used API libraries. DOCREF has detected more than 1,000 new documentation errors, which we have reported to the authors. Many of these errors have since been confirmed and fixed.
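
To make the notion of a "broken code name" concrete, the following is a minimal sketch of one kind of inconsistency check: a name mentioned in a document that the library does not declare. It is not DOCREF's implementation. The backtick convention for marking code names, the `KNOWN_API_NAMES` set, the sample text, and both function names are assumptions introduced only for illustration.

```python
import re

# Hypothetical set of names declared by the library (an assumption for this sketch).
KNOWN_API_NAMES = {
    "ArrayList",
    "ArrayList.add",
    "ArrayList.size",
    "HashMap",
    "HashMap.put",
}

# Assumption: the document marks code names with backticks, e.g. `ArrayList.add()`.
CODE_SPAN = re.compile(r"`([^`]+)`")

# Hypothetical documentation fragment; `ArrayList.length` does not exist in the set above.
SAMPLE_DOC = (
    "Call `ArrayList.add()` to append an element, then `ArrayList.length()` "
    "to count the elements. See also `HashMap`."
)


def extract_code_names(doc_text):
    """Return the code names marked in doc_text, with a trailing "()" stripped."""
    names = []
    for match in CODE_SPAN.finditer(doc_text):
        name = match.group(1).strip()
        if name.endswith("()"):
            name = name[:-2]
        names.append(name)
    return names


def find_broken_code_names(doc_text, known_names):
    """Return marked names missing from known_names, in order of first appearance."""
    broken = []
    for name in extract_code_names(doc_text):
        if name not in known_names and name not in broken:
            broken.append(name)
    return broken


if __name__ == "__main__":
    for name in find_broken_code_names(SAMPLE_DOC, KNOWN_API_NAMES):
        print("possible broken code name:", name)
```

A real checker would also have to find code names in unmarked prose and in code samples, which is where the combination of NL and code analysis described in the abstract comes in; this sketch covers only the final lookup step.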

Index Terms

Detecting API documentation errors
1. Applied computing
  1. Document management and text processing
    1. Document preparation
      1. Hypertext / hypermedia creation
2. Software and its engineering
  1. Software creation and management
    1. Software post-development issues
      1. Documentation
  2. Software notations and tools
    1. Software libraries and repositories

Published In

OOPSLA '13: Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications

October 2013

904 pages

ISBN:9781450323741

DOI:10.1145/2509136

Co-chair: Antony Hosking (Purdue University, USA)
General Chair: Patrick Eugster (Purdue University, USA)
Program Chair: Cristina V. Lopes (University of California, Irvine, USA)

ACM SIGPLAN Notices Volume 48, Issue 10
OOPSLA '13
October 2013
867 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2544173
Editor: Mark W. Bailey (Hamilton College, Clinton, NY)

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SPLASH '13

Sponsor:

SIGPLAN

SPLASH '13: Conference on Systems, Programming, and Applications: Software for Humanity

October 29 - 31, 2013

Indianapolis, Indiana, USA

Acceptance Rates

OOPSLA '13 Paper Acceptance Rate 50 of 189 submissions, 26%

Overall Acceptance Rate 268 of 1,244 submissions, 22%
