Dual-MGAN: An Efficient Approach for Semi-supervised Outlier Detection with Few Identified Anomalies

Published: 30 July 2022

Abstract

Outlier detection is an important task in data mining, and many techniques for it have been explored in various applications. However, because unsupervised outlier detection assumes by default that outliers are not concentrated, it may fail to identify group anomalies with higher density. Supervised outlier detection can usually achieve high detection rates and optimal parameters, but obtaining a sufficient number of correct labels is time-consuming. To address these problems, we focus on semi-supervised outlier detection with few identified anomalies and a large amount of unlabeled data. We first decompose the task into the detection of discrete anomalies and the detection of partially identified group anomalies, and then propose a distribution construction sub-module and a data augmentation sub-module to identify them, respectively. The dual multiple generative adversarial networks (Dual-MGAN) that combine the two sub-modules can thus identify both discrete and partially identified group anomalies. In addition, because it is difficult to determine when to stop training, two evaluation indicators are introduced to assess the training status of the sub-GANs. Extensive experiments on synthetic and real-world data show that the proposed Dual-MGAN can significantly improve the accuracy of outlier detection, and that the proposed evaluation indicators reflect the training status of the sub-GANs.
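
The limitation of unsupervised detection noted in the abstract can be illustrated with a toy sketch. This is not the Dual-MGAN method. It swaps in a simple k-th-nearest-neighbor distance score as a stand-in for a density-based unsupervised detector. The synthetic data, the function name `knn_distance_scores`, and all parameters are assumptions made for illustration only.

```python
import numpy as np

def knn_distance_scores(X, k=5):
    """Outlier score = distance to the k-th nearest neighbor (larger = more anomalous)."""
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the point's distance to itself (0),
    # so column k is the distance to its k-th nearest neighbor.
    return np.sort(dists, axis=1)[:, k]

rng = np.random.default_rng(0)
# Hypothetical data: a broad cloud of inliers, a tight cluster of group
# anomalies, and two isolated (discrete) anomalies.
normal = rng.normal(loc=0.0, scale=1.0, size=(300, 2))
group = rng.normal(loc=6.0, scale=0.02, size=(20, 2))
discrete = np.array([[-7.0, 7.0], [8.0, -8.0]])
X = np.vstack([normal, group, discrete])

scores = knn_distance_scores(X, k=5)
normal_median = np.median(scores[:300])   # typical inlier score
group_max = scores[300:320].max()         # highest score among group anomalies
discrete_min = scores[320:].min()         # lowest score among discrete anomalies

print(f"median inlier score:          {normal_median:.3f}")
print(f"max group-anomaly score:      {group_max:.3f}")
print(f"min discrete-anomaly score:   {discrete_min:.3f}")
```

In this setup the isolated points receive large scores, while every point in the dense group scores below a typical inlier, so a purely density-based ranking would miss the group. A few labeled anomalies from that group are the kind of side information the semi-supervised setting assumes.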

      Published In

      ACM Transactions on Knowledge Discovery from Data, Volume 16, Issue 6
      December 2022
      631 pages
      ISSN:1556-4681
      EISSN:1556-472X
      DOI:10.1145/3543989

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 30 July 2022
      Online AM: 15 March 2022
      Accepted: 01 February 2022
      Revised: 01 December 2021
      Received: 01 August 2021
      Published in TKDD Volume 16, Issue 6

      Author Tags

      1. Discrete anomalies
      2. Partially identified group anomalies
      3. Distribution construction
      4. Data augmentation

      Qualifiers

      • Research-article
      • Refereed

      Funding Sources

      • National Natural Science Foundation of China
      • BUCEA Young Scholar Research Capability Improvement Plan
      • National Engineering Laboratory for Big Data Distribution and Exchange Technologies
