research-article

HawkesEye: Detecting Fake Retweeters Using Hawkes Process and Topic Modeling

Published: 01 January 2020

Abstract

Retweets are essential to boosting the popularity of a tweet, and a large number of fake retweeters can contribute heavily to this effect. We define a fake retweeter as a Twitter account that retweets spammy tweets, retweets an abnormally large number of tweets in a short period, or misuses a trending hashtag to promote events irrelevant to the topic of discussion. We introduce an up-to-date, temporally diverse, trend-oriented labeled dataset to address the problem of fake retweeter detection. We develop a novel classifier, called HawkesEye, which makes predictions based on a temporal window, in contrast to existing approaches, which require either a graph-like relationship between tweet entities or the entire retweeting timeline of a retweeter. HawkesEye uses both temporal and textual information through a class-specific topic model and Hawkes processes. Experiments on our curated dataset show significant improvement over four state-of-the-art methods, with precision and recall scores of 0.964 and 0.960, respectively, on a balanced dataset. HawkesEye beats the best baseline with relative improvements of 6.16% in precision and 25.98% in recall. We also diagnose our model to understand the advantages and pitfalls of the underlying mechanism. We believe that this study is not restricted to Twitter and generalizes to other social media systems with similar reposting capabilities, such as Facebook and Instagram.
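To make the temporal half of this idea concrete, the following is a minimal sketch of window-based classification with a univariate Hawkes process. It is not the authors' implementation: the exponential kernel, the function names, and the per-class parameter values are all assumptions chosen for illustration, and the topic-model component is omitted entirely. Each class is given its own parameters (base rate `mu`, excitation `alpha`, decay `beta`), and a window of retweet timestamps is assigned to the class under which its log-likelihood is highest.

```python
import math


def hawkes_intensity(t, history, mu, alpha, beta):
    """Conditional intensity mu + alpha * sum over past events of exp(-beta * (t - t_i))."""
    return mu + alpha * sum(math.exp(-beta * (t - ti)) for ti in history if ti < t)


def hawkes_log_likelihood(times, T, mu, alpha, beta):
    """Log-likelihood of sorted event times observed on the window [0, T].

    Assumes an exponential kernel (an illustrative choice, not taken from the paper).
    """
    log_sum = 0.0
    decay = 0.0  # running value of sum_{j < i} exp(-beta * (t_i - t_j))
    prev = None
    for t in times:
        if prev is not None:
            decay = math.exp(-beta * (t - prev)) * (1.0 + decay)
        log_sum += math.log(mu + alpha * decay)
        prev = t
    # Integral of the intensity over [0, T] (the compensator).
    compensator = mu * T + (alpha / beta) * sum(
        1.0 - math.exp(-beta * (T - t)) for t in times
    )
    return log_sum - compensator


def classify_window(times, T, params_by_class):
    """Return the class label whose Hawkes parameters best explain the window."""
    return max(
        params_by_class,
        key=lambda label: hawkes_log_likelihood(times, T, *params_by_class[label]),
    )


# Hypothetical per-class parameters (mu, alpha, beta); real values would be fitted.
PARAMS = {"genuine": (0.8, 0.1, 1.0), "fake": (0.1, 8.0, 10.0)}

bursty = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]   # rapid-fire retweets
spread = [1.0, 3.5, 6.0, 8.5]                        # widely spaced retweets

print(classify_window(bursty, 10.0, PARAMS))
print(classify_window(spread, 10.0, PARAMS))
```

In this toy setup the strongly self-exciting parameter set explains the bursty window better, while the flatter parameter set explains the widely spaced one. In practice the parameters would be estimated from labeled data, and the temporal score would be combined with a textual signal such as the class-specific topic model the abstract describes.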


Published In

IEEE Transactions on Information Forensics and Security, Volume 15, 2020, 2247 pages

Publisher

IEEE Press
