Adversarial stylometry: Circumventing authorship recognition to preserve privacy and anonymity

Published: 30 November 2012

Abstract

Stylometry, the recognition of authorship through purely linguistic means, has contributed to breakthroughs in literary, historical, and criminal investigations. Existing stylometry research assumes that authors have not attempted to disguise their writing style. We challenge this basic assumption and present a new area of research: adversarial stylometry. Adversaries have a devastating effect on the robustness of existing classification methods. Our work presents a framework for creating adversarial passages through three attacks: obfuscation, where a subject attempts to hide her identity; imitation, where a subject attempts to frame another subject by imitating his writing style; and translation, where original passages are obfuscated with machine translation services. This research demonstrates that manual circumvention methods work very well, while automated translation methods are not effective. Obfuscation reduces the techniques' effectiveness to the level of random guessing, and imitation attempts succeed up to 67% of the time, depending on the stylometry technique used. These results are all the more significant because the experimental subjects were unfamiliar with stylometry, were not professional writers, and spent little time on the attacks. This article also contributes to the field by using human subjects to empirically validate the claim of high accuracy for four current techniques (without adversaries). We have also compiled and released two corpora of adversarial stylometry texts, with a total of 57 unique authors, to promote research in this field. We argue that this field is important to a multidisciplinary approach to privacy, security, and anonymity.

Published In

ACM Transactions on Information and System Security, Volume 15, Issue 3
November 2012
105 pages
ISSN: 1094-9224
EISSN: 1557-7406
DOI: 10.1145/2382448

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 November 2012
Accepted: 01 July 2012
Revised: 01 May 2012
Received: 01 February 2012
Published in TISSEC Volume 15, Issue 3

Author Tags

  1. stylometry
  2. adversarial stylometry
  3. anonymity
  4. authorship recognition
  5. machine learning
  6. privacy
  7. text mining

Qualifiers

  • Research-article
  • Research
  • Refereed
