research-article

HawkesEye: Detecting Fake Retweeters Using Hawkes Process and Topic Modeling

Authors: Hridoy Sankar Dutta, Vishal Raj Dutta, Aditya Adhikary, Tanmoy Chakraborty

IEEE Transactions on Information Forensics and Security, Volume 15

Pages 2667 - 2678

https://doi.org/10.1109/TIFS.2020.2970601

Published: 01 January 2020

Abstract

Retweets are essential to boosting the popularity of a tweet, and a large number of fake retweeters can contribute heavily to it. We define a fake retweeter as a Twitter account that retweets spammy tweets, retweets an abnormally large number of tweets in a short period, or misuses a trending hashtag to promote events irrelevant to the topic of discussion. We introduce an up-to-date, temporally diverse, trend-oriented labeled dataset to address the problem of fake retweeter detection. We develop a novel classifier, called HawkesEye, which makes predictions based on a temporal window, in contrast to existing approaches, which require a graph-like relationship between tweet entities or the presence of the entire retweeting timeline of a retweeter. HawkesEye utilizes both temporal and textual information using a class-specific topic model and Hawkes processes. Experiments on our curated dataset show significant improvement over four state-of-the-art methods, with precision and recall scores of 0.964 and 0.960, respectively, on a balanced dataset. HawkesEye beats the best baseline by a relative improvement of 6.16% in precision and 25.98% in recall. We also diagnose our model to understand the advantages and pitfalls of the underlying mechanism. We believe that the scope of this study is not restricted to Twitter but generalizes to other social media systems with similar reposting capabilities, such as Facebook and Instagram.
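For readers unfamiliar with Hawkes processes, the following is a minimal sketch of a univariate self-exciting process with an exponential kernel. It is not the HawkesEye model: the abstract does not specify the kernel, the parameters, or how the topic model is coupled to the process, so the function names and the values of `mu`, `alpha`, and `beta` below are purely illustrative assumptions. The sketch only shows why a burst of retweets inside a temporal window raises the intensity more than the same number of retweets spread out over time.

```python
import math

# Illustrative parameters (assumptions, not taken from the paper):
# mu    = baseline event rate
# alpha = jump in intensity caused by each past event
# beta  = exponential decay rate of that excitation

def hawkes_intensity(t, event_times, mu=0.1, alpha=0.5, beta=1.0):
    """Conditional intensity: mu + sum over past events of alpha * exp(-beta * (t - t_i))."""
    return mu + sum(
        alpha * math.exp(-beta * (t - ti)) for ti in event_times if ti < t
    )

def hawkes_log_likelihood(event_times, window_end, mu=0.1, alpha=0.5, beta=1.0):
    """Log-likelihood of the events observed on the window [0, window_end]."""
    # Sum of log-intensities evaluated at each observed event time.
    log_term = sum(
        math.log(hawkes_intensity(ti, event_times, mu, alpha, beta))
        for ti in event_times
    )
    # Integral of the intensity over the window (the compensator), in closed form.
    compensator = mu * window_end + sum(
        (alpha / beta) * (1.0 - math.exp(-beta * (window_end - ti)))
        for ti in event_times
    )
    return log_term - compensator

# Hypothetical retweet timestamps (arbitrary time units) inside one window.
bursty = [1.0, 1.1, 1.2, 1.3]
spread = [1.0, 3.0, 5.0, 7.0]

# Intensity shortly after the last event of each sequence.
bursty_rate = hawkes_intensity(1.4, bursty)
spread_rate = hawkes_intensity(7.1, spread)
```

Here `bursty_rate` exceeds `spread_rate` because the excitation from closely spaced events has not yet decayed. A window-based classifier could, in principle, fit one set of parameters per class and compare `hawkes_log_likelihood` across classes, but that comparison is a generic use of the technique rather than a description of the authors' pipeline.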

Published In

IEEE Transactions on Information Forensics and Security, Volume 15

2020

2247 pages

ISSN: 1556-6013

1556-6013 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

Publisher

IEEE Press

Cited By

Pushpa Rani K, Vidyullatha P, Srinivas Rao K (2024). An Intelligent Tuned Topic Modelling Questing Answering System as Job Assistant. Wireless Personal Communications: An International Journal, 135:3 (1761-1782). DOI: 10.1007/s11277-024-11160-w. Online publication date: 1-Apr-2024
https://dl.acm.org/doi/10.1007/s11277-024-11160-w
Dalmasso N, Zhao R, Ghassemi M, Potluru V, Balch T, Veloso M (2023). Efficient Event Series Data Modeling via First-Order Constrained Optimization. Proceedings of the Fourth ACM International Conference on AI in Finance (463-471). DOI: 10.1145/3604237.3626893. Online publication date: 27-Nov-2023
https://dl.acm.org/doi/10.1145/3604237.3626893
Ellaky Z, Benabbou F, Ouahabi S (2023). Systematic Literature Review of Social Media Bots Detection Systems. Journal of King Saud University - Computer and Information Sciences, 35:5. DOI: 10.1016/j.jksuci.2023.04.004. Online publication date: 13-Jul-2023
https://dl.acm.org/doi/10.1016/j.jksuci.2023.04.004
Esposito C, Moscato V, Sperlì G (2023). Detecting malicious reviews and users affecting social reviewing systems. Computers and Security, 133:C. DOI: 10.1016/j.cose.2023.103407. Online publication date: 1-Oct-2023
https://dl.acm.org/doi/10.1016/j.cose.2023.103407
Wang D, Zhang X, Wan Y, Yu D, Xu G, Deng S (2022). Modeling Sequential Listening Behaviors With Attentive Temporal Point Process for Next and Next New Music Recommendation. IEEE Transactions on Multimedia, 24 (4170-4182). DOI: 10.1109/TMM.2021.3114545. Online publication date: 1-Jan-2022
https://dl.acm.org/doi/10.1109/TMM.2021.3114545
Sharma S, Gupta V (2022). Role of twitter user profile features in retweet prediction for big data streams. Multimedia Tools and Applications, 81:19 (27309-27338). DOI: 10.1007/s11042-022-12815-1. Online publication date: 1-Aug-2022
https://dl.acm.org/doi/10.1007/s11042-022-12815-1
Gera S, Sinha A (2022). C-ANN: a deep leaning model for detecting black-marketed colluders in Twitter social network. Neural Computing and Applications, 34:18 (15113-15127). DOI: 10.1007/s00521-021-06756-3. Online publication date: 1-Sep-2022
https://dl.acm.org/doi/10.1007/s00521-021-06756-3
Dutta H, Aggarwal K, Chakraborty T, Conlan O, Herder E (2021). DECIFE: Detecting Collusive Users Involved in Blackmarket Following Services on Twitter. Proceedings of the 32nd ACM Conference on Hypertext and Social Media (91-100). DOI: 10.1145/3465336.3475108. Online publication date: 30-Aug-2021
https://dl.acm.org/doi/10.1145/3465336.3475108
Arora U, Dutta H, Joshi B, Chetan A, Chakraborty T (2020). Analyzing and Detecting Collusive Users Involved in Blackmarket Retweeting Activities. ACM Transactions on Intelligent Systems and Technology, 11:3 (1-24). DOI: 10.1145/3380537. Online publication date: 18-Apr-2020
https://dl.acm.org/doi/10.1145/3380537
