DOI: 10.1145/3583780.3614662
research-article

CallMine: Fraud Detection and Visualization of Million-Scale Call Graphs

Published: 21 October 2023

Abstract

Given a million-scale dataset of who-calls-whom data containing imperfect labels, how can we detect existing and new fraud patterns? We propose CallMine, with carefully designed features and visualizations. Our CallMine method has the following properties: (a) Scalable, being linear in the input size, handling about 35 million records in around one hour on a stock laptop; (b) Effective, allowing natural interaction with human analysts; (c) Flexible, being applicable in both supervised and unsupervised settings; (d) Automatic, requiring no user-defined parameters.
On a real-world, multi-million-scale dataset, CallMine detected fraudsters 7,000x faster than human experts: in a matter of hours, whereas the experts took over 10 months.
CIKM-ARP Categories: Application; Analytics and machine learning; Data presentation.
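
To make the "linear in the input size" property concrete, here is a minimal sketch of single-pass feature extraction over who-calls-whom records. It is not CallMine's actual feature set, which the abstract does not specify. The record layout (caller, callee, duration in seconds), the function name caller_features, and the three per-caller features are all assumptions chosen for illustration.

```python
from collections import defaultdict


def caller_features(records):
    """Aggregate hypothetical per-caller features in one pass.

    `records` is an iterable of (caller, callee, duration_seconds) tuples.
    Each record is touched once, so the running time grows linearly with
    the number of records.
    Returns {caller: (out_calls, distinct_callees, total_duration)}.
    """
    out_calls = defaultdict(int)          # number of calls placed
    callees = defaultdict(set)            # distinct numbers called
    total_duration = defaultdict(float)   # summed call duration

    for caller, callee, duration in records:
        out_calls[caller] += 1
        callees[caller].add(callee)
        total_duration[caller] += duration

    return {
        caller: (out_calls[caller], len(callees[caller]), total_duration[caller])
        for caller in out_calls
    }


# Toy input (illustrative values only, not data from the paper).
records = [
    ("A", "B", 30.0),
    ("A", "C", 12.5),
    ("A", "B", 5.0),
    ("B", "A", 60.0),
]
features = caller_features(records)
```

Features of this kind could feed either a supervised classifier or an unsupervised outlier detector, which is one way to read the "Flexible" property. The abstract does not say how CallMine itself does this.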

      Published In

      CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
      October 2023
      5508 pages
      ISBN:9798400701245
      DOI:10.1145/3583780
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. fraud detection
      2. graph mining
      3. phone call network
      4. visualization

      Qualifiers

      • Research-article

      Funding Sources

      • Air Force Research Laboratory (AFRL), the Office of Naval Research (ONR) and the Army Research Office (ARO)
      • Portuguese Foundation for Science and Technology - FCT under CMU Portugal
      • Sao Paulo Research Foundation - FAPESP
      • Coordenacao de Aperfeicoamento de Pessoal de Nivel Superior - Brasil (CAPES)
      • AIDA project - Adaptive, Intelligent and Distributed Assurance Platform
      • National Science Foundation Graduate Research
      • Pennsylvania Infrastructure Technology Alliance - PITA award
      • European Regional Development Fund - ERDF through the Operational Program for Competitiveness and Internationalisation - COMPETE 2020
      • National Council for Scientific and Technological Development (CNPq)

      Conference

      CIKM '23
      Acceptance Rates

      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
