DOI: 10.1145/1998076.1998124

Comparative evaluation of text- and citation-based plagiarism detection approaches using GuttenPlag

Published: 13 June 2011

Abstract

Various approaches for plagiarism detection exist. All are based on more or less sophisticated text analysis methods such as string matching, fingerprinting, or style comparison. In this paper, a new approach called Citation-based Plagiarism Detection is evaluated using a doctoral thesis in which a volunteer crowd-sourcing project called GuttenPlag identified substantial amounts of plagiarism through careful manual inspection. This new approach identifies similar and plagiarized documents based on the citations used in the text. It is shown that citation-based plagiarism detection performs significantly better than text-based procedures in identifying strong paraphrasing, translation, and some idea plagiarism. Detection rates can be improved by combining citation-based with text-based plagiarism detection.
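
The abstract does not say which measures are used to compare citations, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes each document has already been reduced to the ordered list of sources it cites (the identifiers below are made up) and computes two simple similarity signals: the overlap of cited sources and the length of the longest common subsequence of citations. Signals like these do not depend on wording, which may be one reason such comparisons can still apply to paraphrased or translated text.

```python
from typing import Sequence


def shared_citation_ratio(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaccard overlap of the sets of cited sources (citation order ignored)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def longest_common_citation_sequence(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two citation sequences.

    Standard dynamic programming over the two sequences; citations must
    appear in the same relative order but need not be adjacent.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


# Hypothetical citation sequences (illustrative identifiers, not real data).
suspicious = ["Smith2001", "Lee1999", "Kim2005", "Garcia2003", "Chen2008"]
source = ["Smith2001", "Kim2005", "Novak1997", "Garcia2003"]

overlap = shared_citation_ratio(suspicious, source)
common_order = longest_common_citation_sequence(suspicious, source)
print(f"shared-citation ratio: {overlap:.2f}")
print(f"longest common citation sequence: {common_order}")
```

In a combined setup of the kind the abstract suggests, scores like these would be considered alongside a text-based score; how the two are weighted is left open here.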

Published In

JCDL '11: Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries
June 2011
500 pages
ISBN:9781450307444
DOI:10.1145/1998076
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. citation-based plagiarism detection
  2. plagiarism detection systems

Qualifiers

  • Research-article

Conference

JCDL '11
JCDL '11: Joint Conference on Digital Libraries
June 13 - 17, 2011
Ottawa, Ontario, Canada

Acceptance Rates

Overall Acceptance Rate 415 of 1,482 submissions, 28%
