TRAP: Two-level Regularized Autoencoder-based Embedding for Power-law Distributed Data

Authors:

Jae-Gil Lee

WWW '20: Proceedings of The Web Conference 2020

Pages 1615 - 1624

https://doi.org/10.1145/3366423.3380233

Published: 20 April 2020

Abstract

Recently, autoencoder (AE)-based embedding approaches have achieved state-of-the-art performance in many tasks, especially top-k recommendation with user embedding and node classification with node embedding. However, we find that many real-world datasets follow a power-law distribution with respect to data object sparsity. When AE-based embeddings are learned from such data, dense inputs move away from sparse inputs in the embedding space even when they are highly correlated. This phenomenon, which we call polarization, distorts the embedding. In this paper, we propose TRAP, which leverages two-level regularizers to effectively alleviate the polarization problem. The macroscopic regularizer generally prevents dense input objects from being distant from sparse input objects, and the microscopic regularizer individually attracts each object to its correlated neighbor objects rather than to uncorrelated ones. Importantly, TRAP is a meta-algorithm that can be easily coupled with existing AE-based embedding methods with a simple modification. In extensive experiments on two representative embedding tasks using six real-world datasets, TRAP boosted the performance of the state-of-the-art algorithms by up to 31.53% and 94.99%, respectively.
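To make the notion of data object sparsity concrete, the following is a minimal sketch, not the paper's method. It assumes each data object is a row of a binary interaction matrix and measures how unevenly the non-zero entries are spread across rows. The helper names `row_densities` and `top_share`, the toy matrix, and the 20% cutoff are all hypothetical choices made for illustration.

```python
def row_densities(rows):
    """Count the non-zero entries in each row (one row per data object)."""
    return [sum(1 for value in row if value != 0) for row in rows]


def top_share(densities, fraction=0.2):
    """Share of all non-zero entries held by the densest `fraction` of rows.

    A value close to 1.0 means a few dense objects dominate the data,
    which is the skew a power-law-like sparsity distribution produces.
    The 20% default is an illustrative assumption, not a value from the paper.
    """
    ranked = sorted(densities, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Hypothetical toy data: one dense object and four sparse ones.
toy = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
]

densities = row_densities(toy)
share = top_share(densities)
```

In this toy matrix the single dense row holds most of the non-zero entries. A check of this kind only describes the input skew; it says nothing about how the regularizers themselves are defined.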

Index Terms

TRAP: Two-level Regularized Autoencoder-based Embedding for Power-law Distributed Data
1. Computing methodologies
  1. Machine learning
2. Information systems
  1. Information systems applications
    1. Data mining

Index terms have been assigned to the content through auto-classification.

Published In

WWW '20: Proceedings of The Web Conference 2020

April 2020

3143 pages

ISBN:9781450370233

DOI:10.1145/3366423

Editors:
Yennun Huang, Academia Sinica, Taiwan
Irwin King, The Chinese University of Hong Kong, Hong Kong
Tie-Yan Liu, Microsoft Research Asia, China
Maarten van Steen, University of Twente, Netherlands

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

WWW '20

Sponsor:

SIGWEB

WWW '20: The Web Conference 2020

April 20 - 24, 2020

Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
