research-article

Detecting and Analyzing Collusive Entities on YouTube

Published: 24 November 2021

Abstract

YouTube sells advertisements on posted videos, which in turn enables content creators to monetize their videos. As an unintended consequence, this has fostered various illegal activities such as the artificial boosting of views, likes, comments, and subscriptions. We refer to such videos (gaining likes and comments artificially) and channels (gaining subscriptions artificially) as “collusive entities.” Detecting such collusive entities is an important yet challenging task. Existing solutions mostly address the problem of spotting fake views, spam comments, fake content, and so on, and often ignore how such fake activities emerge via collusion. Here, we collect a large dataset consisting of two types of collusive entities on YouTube: videos submitted to blackmarket services to gain collusive likes and comments, and channels submitted to gain collusive subscriptions.
We begin with an in-depth analysis of collusive entities on YouTube fostered by various blackmarket services. We then propose models to detect three types of collusive YouTube entities: videos seeking collusive likes, channels seeking collusive subscriptions, and videos seeking collusive comments. Unlike the first two, the third type of entity is associated with temporal information, namely the timing of its posted comments. To detect videos seeking collusive likes and channels seeking collusive subscriptions, we use one-class classifiers trained on our curated collusive entities with a set of novel features. The SVM-based model achieves true positive rates of 0.911 and 0.910 for detecting collusive videos and collusive channels, respectively. To detect videos seeking collusive comments, we propose CollATe, a novel end-to-end neural architecture that leverages the time series of posted comments along with the static metadata of videos. CollATe is composed of three components: a metadata feature extractor, which derives metadata-based features from videos; an anomaly feature extractor, which uses the time-series data to detect sudden changes in commenting activity; and a comment feature extractor, which uses the text of the comments posted during collusion to compute a similarity score between the comments. Extensive experiments show that CollATe outperforms the baselines, achieving a true positive rate of 0.905.
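The like and subscription detectors are one-class classifiers: they are trained only on known collusive entities, and their quality is reported as a true positive rate, the share of held-out collusive entities the model accepts. The sketch below illustrates that train-on-positives, evaluate-by-TPR protocol under stated assumptions. It deliberately swaps the one-class SVM for a much simpler distance-to-centroid model so it runs with the Python standard library alone; the feature vectors, the `nu`-style threshold, and the names `fit_one_class`, `predict`, and `true_positive_rate` are hypothetical, not the authors' features or code.

```python
import math
from statistics import mean


def fit_one_class(train, nu=0.1):
    """Fit a toy one-class model on positive (collusive) examples only.

    Stand-in for a one-class SVM: accept points within a radius of the
    training centroid. `nu` loosely mirrors the SVM's nu parameter, i.e.
    roughly the fraction of training points left outside the radius.
    """
    if not 0 <= nu < 1:
        raise ValueError("nu must be in [0, 1)")
    dim = len(train[0])
    centroid = tuple(mean(x[i] for x in train) for i in range(dim))
    dists = sorted(math.dist(x, centroid) for x in train)
    # Keep the closest (1 - nu) share of training points inside the radius.
    k = min(len(dists), max(1, round((1 - nu) * len(dists))))
    return centroid, dists[k - 1]


def predict(model, x):
    """True if x falls inside the learned region, i.e. looks collusive."""
    centroid, radius = model
    return math.dist(x, centroid) <= radius


def true_positive_rate(model, held_out_positives):
    """Share of held-out collusive entities that the model accepts."""
    hits = sum(predict(model, x) for x in held_out_positives)
    return hits / len(held_out_positives)


# Hypothetical 2-D features per video, e.g. (likes/view, comments/view); toy numbers.
train = [(0.30, 0.05), (0.32, 0.06), (0.29, 0.04), (0.31, 0.05), (0.90, 0.40)]
held_out = [(0.30, 0.05), (0.33, 0.06), (0.05, 0.001)]
model = fit_one_class(train, nu=0.2)
print(true_positive_rate(model, held_out))
```

With `nu=0.2`, the farthest of the five training points falls outside the radius, and two of the three held-out points are accepted. For an actual one-class SVM, scikit-learn's `sklearn.svm.OneClassSVM` implements the Schölkopf et al. formulation with a `nu` parameter of the same meaning; this sketch does not reproduce the paper's features or hyperparameters.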

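CollATe's anomaly and comment feature extractors are learned components of a neural architecture, and the abstract does not specify their internals. As a rough, hypothetical illustration of the two signals they target, the standard-library sketch below (a) bins comment timestamps into fixed-width windows and flags windows whose count jumps well above the trailing mean, a simple z-score burst rule swapped in for whatever the paper actually uses, and (b) scores how alike the comments are via mean pairwise bag-of-words cosine similarity, a crude stand-in for sentence-embedding similarity. The names `comment_counts`, `burst_bins`, and `mean_pairwise_similarity`, and the parameters `bin_seconds`, `window`, and `k`, are all assumptions.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev


def comment_counts(timestamps, bin_seconds=3600):
    """Count comments per fixed-width time bin (timestamps in seconds)."""
    if not timestamps:
        return []
    start = min(timestamps)
    counts = [0] * (int((max(timestamps) - start) // bin_seconds) + 1)
    for t in timestamps:
        counts[int((t - start) // bin_seconds)] += 1
    return counts


def burst_bins(counts, window=3, k=2.0):
    """Indices of bins whose count exceeds the trailing mean by k std devs.

    Hypothetical rule: sigma is floored at 1.0 so that a perfectly flat
    history does not turn a +1 change into a "burst".
    """
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        if counts[i] > mean(history) + k * max(pstdev(history), 1.0):
            flagged.append(i)
    return flagged


def cosine(a, b):
    """Bag-of-words cosine similarity between two comment strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(v * v for v in va.values()) * sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def mean_pairwise_similarity(comments):
    """Average cosine similarity over all comment pairs (0.0 if fewer than 2)."""
    pairs = list(combinations(comments, 2))
    return mean(cosine(a, b) for a, b in pairs) if pairs else 0.0


# Toy data: a steady trickle of comments, then 20 comments within one hour.
ts = [0, 4000, 8000, 11000, 15000] + [18000 + 60 * i for i in range(20)]
counts = comment_counts(ts)
print(counts, burst_bins(counts))
print(mean_pairwise_similarity(["Great video!", "great video!", "nice edit"]))
```

On the toy data, the hourly counts are `[1, 1, 1, 1, 1, 20]` and only the last bin is flagged. In CollATe itself, such signals are learned end to end together with metadata features rather than computed by fixed rules like these.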

      Information

      Published In

      ACM Transactions on Intelligent Systems and Technology, Volume 12, Issue 5
      October 2021
      383 pages
      ISSN:2157-6904
      EISSN:2157-6912
      DOI:10.1145/3484925
      Editor: Huan Liu
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 24 November 2021
      Accepted: 01 July 2021
      Revised: 01 December 2020
      Received: 01 April 2020
      Published in TIST Volume 12, Issue 5

      Author Tags

      1. YouTube
      2. collusion
      3. blackmarket
      4. artificial boosting
      5. OSNs

      Qualifiers

      • Research-article
      • Refereed

      Funding Sources

      • SERB
      • Ramanujan Fellowship
      • ihub-Anubhuti-iiitd Foundation

      Article Metrics

      • Downloads (last 12 months): 90
      • Downloads (last 6 weeks): 9
      Reflects downloads up to 12 Jan 2025
