CoReRank: Ranking to detect users involved in blackmarket-based collusive retweeting activities

A Chetan, B Joshi, HS Dutta… - Proceedings of the Twelfth …, 2019 - dl.acm.org
Proceedings of the Twelfth ACM International Conference on Web Search and …, 2019 - dl.acm.org
Twitter's popularity has fostered the emergence of various illegal user activities. One such activity is artificially bolstering the visibility of tweets by gaining a large number of retweets within a short time span. Gaining visibility naturally is time-consuming, so users who want quick visibility for their tweets look for shortcuts. One such shortcut is to approach blackmarket services and gain retweets for their own tweets by retweeting other customers' tweets. These users thus become part of a collusive ecosystem controlled by these services. In this paper, we propose CoReRank, an unsupervised framework that simultaneously detects collusive users (who are involved in producing artificial retweets) and suspicious tweets (which are submitted to the blackmarket services). CoReRank leverages the retweeting (or quoting) patterns of users and measures two scores: the 'credibility' of a user and the 'merit' of a tweet. We propose a set of axioms to derive the interdependency between these two scores, and update them recursively. The formulation is further extended to handle the cold start problem. CoReRank is guaranteed to converge in a finite number of iterations and has linear time complexity. We also propose a semi-supervised version of CoReRank (called CoReRank+) which leverages a partial ground-truth labeling of users and tweets. Extensive experiments on a novel dataset that we collected and annotated show the superiority of CoReRank over six baselines. In detecting collusive (genuine) users, CoReRank beats the best unsupervised baseline method by 269% (20%) relative average precision and 300% (22.22%) relative average recall. CoReRank+ beats the best supervised baseline method by 33.18% AUC. CoReRank also detects suspicious tweets with 0.85 (0.60) average precision (recall).
To our knowledge, CoReRank is the first unsupervised method to detect collusive users and suspicious tweets simultaneously with theoretical guarantees.
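The idea of two interdependent scores updated recursively can be illustrated with a generic mutual-reinforcement loop. The sketch below is not CoReRank's actual formulation: the abstract does not state the axioms or update rules, so the averaging rules, the `tweet_prior` input, the `alpha` blending weight, and the function name `mutual_scores` are all assumptions made purely for illustration. Here a higher score is taken to mean "more genuine".

```python
from collections import defaultdict


def mutual_scores(retweets, tweet_prior, alpha=0.5, tol=1e-9, max_iter=1000):
    """Generic mutual-reinforcement sketch (NOT the paper's update rules).

    retweets:    iterable of (user, tweet) pairs.
    tweet_prior: hypothetical prior merit per tweet in [0, 1]; 0.5 if missing.
    alpha:       hypothetical weight of the prior in the merit update.
    Returns (credibility, merit, iterations_used).
    """
    by_user = defaultdict(list)
    by_tweet = defaultdict(list)
    for user, tweet in retweets:
        by_user[user].append(tweet)
        by_tweet[tweet].append(user)
    if not by_user:
        return {}, {}, 0

    merit = {t: tweet_prior.get(t, 0.5) for t in by_tweet}
    cred = {u: 0.5 for u in by_user}
    iterations = 0
    for iterations in range(1, max_iter + 1):
        # Assumed rule: a user's credibility is the mean merit of the
        # tweets that user retweeted.
        new_cred = {
            u: sum(merit[t] for t in ts) / len(ts) for u, ts in by_user.items()
        }
        # Assumed rule: a tweet's merit blends its prior with the mean
        # credibility of the users who retweeted it.
        new_merit = {
            t: alpha * tweet_prior.get(t, 0.5)
            + (1 - alpha) * sum(new_cred[u] for u in us) / len(us)
            for t, us in by_tweet.items()
        }
        delta = max(
            max(abs(new_cred[u] - cred[u]) for u in cred),
            max(abs(new_merit[t] - merit[t]) for t in merit),
        )
        cred, merit = new_cred, new_merit
        if delta < tol:  # stop once both score sets have stabilised
            break
    return cred, merit, iterations


if __name__ == "__main__":
    pairs = [("a", "t1"), ("a", "t2"), ("b", "t2"), ("c", "t3")]
    priors = {"t1": 0.9, "t2": 0.1, "t3": 0.9}
    cred, merit, n = mutual_scores(pairs, priors)
    for user in sorted(cred, key=cred.get):
        print(user, round(cred[user], 3))
```

In this toy setup, each pass touches every (user, tweet) pair a constant number of times, so the cost per iteration grows linearly with the number of retweet pairs. With `alpha` strictly between 0 and 1, the prior term damps each update, so the scores settle to a fixed point.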