
E-mail authorship attribution using customized associative classification

Published: 01 August 2015

Abstract

E-mail communication is often abused to conduct social engineering attacks, including spamming, phishing, and identity theft, and to distribute malware. This abuse is largely attributed to the anonymity inherent in the standard electronic mail protocol. In the literature, authorship attribution is studied as a text categorization problem in which the writing styles of individuals are modeled from their previously written sample documents. The resulting model is then used to identify the most plausible writer of a given text. Unfortunately, most existing studies focus solely on improving predictive accuracy and not on the inherent value of the evidence collected. In this study, we propose a customized form of associative classification, a popular data mining method, to address the authorship attribution problem. Our approach models the unique writing style features of a person, measures the associativity of these features, and produces an intuitive classifier. Experiments on a real dataset show that the presented method is very effective.

Published In

Digital Investigation: The International Journal of Digital Forensics & Incident Response, Volume 14, Issue S1
August 2015
164 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands

Author Tags

  1. Anonymity
  2. Associative classification
  3. Authorship
  4. Crime investigation
  5. Data mining
  6. Rule mining
  7. Writeprint

Qualifiers

  • Research-article
