HawkesEye: Detecting fake retweeters using Hawkes process and topic modeling

HS Dutta, VR Dutta, A Adhikary, T Chakraborty
IEEE Transactions on Information Forensics and Security, 2020 - ieeexplore.ieee.org
Retweets are essential to boost the popularity of a tweet, and a large number of fake retweeters can contribute heavily to this aspect. We define a fake retweeter as a Twitter account that retweets spammy tweets, retweets an abnormally large number of tweets in a short period, or misuses a trending hashtag to promote events irrelevant to the topic of discussion. We introduce an up-to-date, temporally diverse, trend-oriented labeled dataset to address the problem of fake retweeter detection. We develop a novel classifier, called HawkesEye, which makes predictions based on a temporal window, in contrast to existing approaches, which require either a graph-like relationship between tweet entities or the entire retweeting timeline of a retweeter. HawkesEye utilizes both temporal and textual information using a class-specific topic model and Hawkes processes. Experiments on our curated dataset show significant improvement over four state-of-the-art methods, with precision and recall scores of 0.964 and 0.960, respectively, on a balanced dataset. HawkesEye outperforms the best baseline with relative improvements of 6.16% in precision and 25.98% in recall. We also diagnose our model to understand the advantages and pitfalls of the underlying mechanism. We believe that the scope of this study is not restricted to Twitter, but generalizes to other social media systems with similar reposting capabilities, such as Facebook and Instagram.
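For readers unfamiliar with the temporal component named in the abstract, the following is a minimal sketch of a generic univariate Hawkes process with an exponential kernel, evaluated over retweet timestamps inside a window. It is not the HawkesEye model: the abstract does not specify the kernel, the parameters, or how the class-specific topic model is combined with the process, so the function names and the values of `mu`, `alpha`, and `beta` below are illustrative assumptions.

```python
import math


def hawkes_intensity(t, event_times, mu=0.1, alpha=0.5, beta=1.0):
    """Conditional intensity of a univariate Hawkes process at time t.

    lambda(t) = mu + alpha * sum_{t_i < t} exp(-beta * (t - t_i))

    mu, alpha, beta are illustrative defaults (assumptions), not values
    taken from the paper.
    """
    excitation = sum(math.exp(-beta * (t - ti)) for ti in event_times if ti < t)
    return mu + alpha * excitation


def hawkes_log_likelihood(event_times, t_end, mu=0.1, alpha=0.5, beta=1.0):
    """Log-likelihood of timestamps observed on the window [0, t_end].

    log L = sum_i log lambda(t_i) - integral_0^t_end lambda(s) ds
    """
    log_term = sum(
        math.log(hawkes_intensity(ti, event_times, mu, alpha, beta))
        for ti in event_times
    )
    # Closed-form integral of the intensity (the compensator) for the
    # exponential kernel.
    compensator = mu * t_end + (alpha / beta) * sum(
        1.0 - math.exp(-beta * (t_end - ti)) for ti in event_times
    )
    return log_term - compensator


# Hypothetical retweet timestamps (in hours) inside one 10-hour window.
bursty = [1.0, 1.1, 1.2, 1.3, 1.4]   # many retweets in a short period
spread = [1.0, 3.0, 5.0, 7.0, 9.0]   # the same count, evenly spaced

ll_bursty = hawkes_log_likelihood(bursty, t_end=10.0)
ll_spread = hawkes_log_likelihood(spread, t_end=10.0)
```

One way such a quantity could feed a classifier is to fit one parameter set per class and compare the likelihoods of a window's timestamps under each; whether HawkesEye does this is not stated in the abstract.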