An Information-theoretic Approach to Unsupervised Feature Selection for High-Dimensional Data

Huang, Shao-Lun; Xu, Xiangxiang; Zheng, Lizhong

doi:10.1109/JSAIT.2020.2981538

arXiv:1910.03196 (cs)

[Submitted on 8 Oct 2019]

Abstract: In this paper, we propose an information-theoretic approach to designing functional representations that extract the hidden common structure shared by a set of random variables. The main idea is to measure the common information between the random variables by Watanabe's total correlation, and then to find the hidden attributes of these random variables such that, given these attributes, the common information is reduced the most. We show that these attributes can be characterized by an exponential family specified by the eigen-decomposition of a pairwise joint distribution matrix. We then adopt the log-likelihood functions for estimating these attributes as the desired functional representations of the random variables, and show that such representations are informative for describing the common structure. Moreover, we design both the multivariate alternating conditional expectation (MACE) algorithm, which computes the proposed functional representations for discrete data, and a novel neural network training approach for continuous or high-dimensional data. Furthermore, we show that our approach has deep connections to existing techniques, such as Hirschfeld-Gebelein-Rényi (HGR) maximal correlation, linear principal component analysis (PCA), and consistent functional map, thereby establishing connections between information theory and machine learning. Finally, the performance of our algorithms is validated by numerical simulations.
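
As a small illustration of the quantity the abstract starts from, the sketch below computes Watanabe's total correlation, C(X_1, ..., X_n) = sum_i H(X_i) - H(X_1, ..., X_n), for a discrete joint distribution. It is not the paper's MACE algorithm; the function names (`entropy`, `marginal`, `total_correlation`) and the two toy distributions are assumptions made purely for illustration.

```python
import math
from collections import defaultdict


def entropy(dist):
    """Shannon entropy (in bits) of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def marginal(joint, i):
    """Marginal distribution of the i-th coordinate of a joint distribution."""
    m = defaultdict(float)
    for outcome, p in joint.items():
        m[outcome[i]] += p
    return dict(m)


def total_correlation(joint):
    """Sum of the marginal entropies minus the joint entropy (in bits)."""
    n = len(next(iter(joint)))  # number of random variables
    return sum(entropy(marginal(joint, i)) for i in range(n)) - entropy(joint)


# Hypothetical toy example 1: three copies of one fair bit (fully shared structure).
copies = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}

# Hypothetical toy example 2: three independent fair bits (no shared structure).
independent = {(a, b, c): 0.125 for a in (0, 1) for b in (0, 1) for c in (0, 1)}

print(total_correlation(copies))       # 3 marginal bits minus 1 joint bit
print(total_correlation(independent))  # marginal and joint entropies coincide
```

In the first toy example, conditioning on the single shared bit would make the three variables independent, so the total correlation would drop from 2 bits to 0; this is the kind of reduction that, per the abstract, the hidden attributes are chosen to maximize.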

Comments:	35 pages; Submitted to IEEE Journal on Selected Areas in Information Theory (JSAIT)
Subjects:	Information Theory (cs.IT); Machine Learning (cs.LG)
Cite as:	arXiv:1910.03196 [cs.IT]
	(or arXiv:1910.03196v1 [cs.IT] for this version)
	https://doi.org/10.48550/arXiv.1910.03196
Journal reference:	IEEE Journal on Selected Areas in Information Theory (Volume: 1, Issue: 1, May 2020)
Related DOI:	https://doi.org/10.1109/JSAIT.2020.2981538

Submission history

From: Xiangxiang Xu
[v1] Tue, 8 Oct 2019 03:36:27 UTC (161 KB)
