Analyzing and Detecting Collusive Users Involved in Blackmarket Retweeting Activities

Published: 18 April 2020

Abstract

With the rise in popularity of social media platforms like Twitter, influence on these platforms has become increasingly valuable, since it can sway decisions through brand promotion and opinion shaping. However, blackmarket services that let users gain influence inorganically threaten the credibility of these social networking platforms. Twitter users can gain inorganic appraisals in the form of likes, retweets, and follows through these blackmarket services, either by paying for them or by joining syndicates in which they earn such appraisals by providing similar appraisals to other users. These customers tend to exhibit a mix of organic and inorganic retweeting behavior, which makes them harder to detect.
In this article, we investigate these blackmarket customers engaged in collusive retweeting activities. We collect and annotate a novel dataset containing various types of information about blackmarket customers and use these sources of information to construct multiple user representations. We adopt Weighted Generalized Canonical Correlation Analysis (WGCCA) to combine these individual representations into user embeddings that allow us to effectively classify users into four classes: genuine users, bots, promotional customers, and normal customers. Our method significantly outperforms state-of-the-art approaches (32.95% better macro F1-score than the best baseline).
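
To make the WGCCA step concrete, the following is a minimal sketch of how several per-user views could be fused into one embedding. It follows the standard closed-form WGCCA solution (as formulated by Benton, Arora, and Dredze, 2016), not necessarily the exact pipeline of this article. The function name `wgcca`, the synthetic views, the view weights, and the ridge term `reg` are all assumptions made for illustration.

```python
import numpy as np

def wgcca(views, weights, k, reg=1e-6):
    """Sketch of Weighted GCCA (hypothetical helper, not the authors' code).

    views   : list of (n_users, d_i) arrays, one per user representation
    weights : one non-negative weight per view
    k       : dimensionality of the shared embedding
    Returns G (n_users, k) and one projection matrix U_i (d_i, k) per view.
    """
    n = views[0].shape[0]
    M = np.zeros((n, n))
    pinv_terms = []
    for X, w in zip(views, weights):
        X = X - X.mean(axis=0)  # center each view
        # (X^T X + reg*I)^{-1} X^T, a ridge-regularized pseudo-inverse
        B = np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]), X.T)
        pinv_terms.append(B)
        M += w * (X @ B)  # weighted projection onto the view's column space
    M = (M + M.T) / 2  # symmetrize against round-off
    _, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    G = eigvecs[:, -k:][:, ::-1]  # top-k eigenvectors = shared embedding
    U = [B @ G for B in pinv_terms]  # maps each centered view onto G
    return G, U

# Synthetic data (assumption): three noisy views of one 2-d latent signal.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
views = [latent @ rng.normal(size=(2, d)) + 0.1 * rng.normal(size=(50, d))
         for d in (5, 8, 4)]
G, U = wgcca(views, weights=[1.0, 0.5, 1.0], k=2)
```

In a setup like the one the abstract describes, the rows of `G` would serve as user embeddings fed to a downstream four-class classifier. The weights control how strongly each view shapes the shared space and would need to be tuned on held-out data.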


      Published In

      ACM Transactions on Intelligent Systems and Technology  Volume 11, Issue 3
      Survey Paper and Regular Papers
      June 2020
      286 pages
      ISSN:2157-6904
      EISSN:2157-6912
      DOI:10.1145/3392081
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 18 April 2020
      Accepted: 01 January 2020
      Revised: 01 January 2020
      Received: 01 July 2019
      Published in TIST Volume 11, Issue 3

      Author Tags

      1. OSNs
      2. Retweeters
      3. Twitter
      4. blackmarket
      5. collusion
      6. multiview learning

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Funding Sources

      • SPARC
      • Infosys Centre of AI, IIIT-Delhi, India
      • Ramanujan Fellowship, DST
      • Google India Faculty Award
