research-article

Entropy-Based Automation Detection on Twitter Using DNA Profiling

Published: 04 November 2023

Abstract

Twitter is a popular microblogging online social network (OSN) on which users express themselves and build public relationships. Its monthly active users reached approximately 237.8 million by 2023. As the platform has grown in popularity, the number of automated accounts (bots) has grown with it. Some bots perform productive tasks such as posting news and delivering disaster alerts. Others are used as vectors to mislead legitimate users by spreading misinformation or distributing malware. Detecting malicious bots is therefore crucial for maintaining a safe and secure Twitter environment. This paper proposes a novel technique that identifies bots by analyzing the degree of regularity in user behavior. Real-time tweets of users are mined, and each user's online behavior is characterized as a DNA sequence. Approximate entropy is then used to assess the degree of regularity in the numerically encoded DNA sequences. Accounts with entropy values below a fixed threshold are classified as bots. Experiments on real-time Twitter data show that the proposed detection technique achieves a precision of 0.9573, recall of 0.9961, F1 score of 0.9609, and accuracy of 0.9563.
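The pipeline summarized in the abstract (behavior as a DNA string, numerical encoding, approximate entropy, fixed threshold) can be illustrated with a minimal sketch. It follows the standard definition of approximate entropy due to Pincus, but everything else is an assumption made for illustration and is not taken from the paper: the three-letter alphabet (A = tweet, C = reply, T = retweet), the numeric mapping, the parameters `m` and `r`, and the threshold value are all hypothetical.

```python
import math

# Hypothetical numeric encoding of a behavioral DNA alphabet.
# Assumed for illustration: A = tweet, C = reply, T = retweet.
BASE_TO_NUMBER = {"A": 1.0, "C": 2.0, "T": 3.0}


def encode(dna):
    """Map a behavioral DNA string to a numeric series."""
    return [BASE_TO_NUMBER[base] for base in dna]


def _phi(series, m, r):
    """Average log-frequency of length-m windows that match within tolerance r."""
    n = len(series) - m + 1
    windows = [series[i:i + m] for i in range(n)]
    total = 0.0
    for w in windows:
        # Chebyshev distance; every window matches itself, so the count is >= 1.
        matches = sum(
            1 for v in windows
            if max(abs(a - b) for a, b in zip(w, v)) <= r
        )
        total += math.log(matches / n)
    return total / n


def approximate_entropy(series, m=2, r=0.5):
    """ApEn(m, r) = Phi^m(r) - Phi^(m+1)(r); lower values mean more regularity."""
    if len(series) < m + 2:
        raise ValueError("series is too short for the chosen m")
    return _phi(series, m, r) - _phi(series, m + 1, r)


def is_bot(dna, threshold=0.1, m=2, r=0.5):
    """Flag an account whose entropy falls below a fixed (hypothetical) threshold."""
    return approximate_entropy(encode(dna), m, r) < threshold
```

With this encoding, `r=0.5` means two windows match only when they are identical, since distinct symbols differ by at least 1. A strictly periodic sequence such as `"ACT" * 20` scores close to zero and is flagged, while an irregular mix of actions scores higher. In practice the threshold would have to be tuned on labeled accounts.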

Published In

SN Computer Science  Volume 4, Issue 6
Oct 2023
2230 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 04 November 2023
Accepted: 09 September 2023
Received: 03 December 2021

Author Tags

  1. Bot detection
  2. Entropy
  3. Online social networks (OSNs)
  4. Twitter
  5. User behavior

Qualifiers

  • Research-article
