Search Results (45)

Search Parameters:
Keywords = bregman divergence

24 pages, 534 KiB  
Article
Anomaly Detection in High-Dimensional Time Series Data with Scaled Bregman Divergence
by Yunge Wang, Lingling Zhang, Tong Si, Graham Bishop and Haijun Gong
Algorithms 2025, 18(2), 62; https://doi.org/10.3390/a18020062 - 24 Jan 2025
Viewed by 391
Abstract
The purpose of anomaly detection is to identify special data points or patterns that significantly deviate from the expected or typical behavior of the majority of the data, and it has a wide range of applications across various domains. Most existing statistical and machine learning-based anomaly detection algorithms face challenges when applied to high-dimensional data. For instance, the unconstrained least-squares importance fitting (uLSIF) method, a state-of-the-art anomaly detection approach, encounters the unboundedness problem under certain conditions. In this study, we propose a scaled Bregman divergence-based anomaly detection algorithm using both least absolute deviation and least-squares loss for parameter learning. This new algorithm effectively addresses the unboundedness problem, making it particularly suitable for high-dimensional data. The proposed technique was evaluated on both synthetic and real-world high-dimensional time series datasets, demonstrating its effectiveness in detecting anomalies. Its performance was also compared to other density ratio estimation-based anomaly detection methods. Full article
(This article belongs to the Special Issue Machine Learning Models and Algorithms for Image Processing)
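
The uLSIF baseline mentioned in this abstract can be summarized in a few lines of NumPy. The sketch below is a generic kernel density-ratio estimator, not the authors' scaled-Bregman algorithm; the function names and the choices of kernel width, regularization strength, and number of centers are illustrative assumptions.

```python
# Minimal uLSIF-style density-ratio sketch (generic baseline, not the paper's
# scaled-Bregman method). sigma, lam and n_centers are illustrative choices.
import numpy as np

def rbf_design(X, C, sigma):
    # Gaussian kernel design matrix between samples X and centers C
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def ulsif(X_nu, X_de, sigma=1.0, lam=0.1, n_centers=50, seed=0):
    """Least-squares fit of r(x) ~ p_nu(x) / p_de(x) as a kernel model."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X_nu), size=min(n_centers, len(X_nu)), replace=False)
    C = X_nu[idx]
    Phi_de = rbf_design(X_de, C, sigma)   # features under the denominator sample
    Phi_nu = rbf_design(X_nu, C, sigma)   # features under the numerator sample
    H = Phi_de.T @ Phi_de / len(X_de)
    h = Phi_nu.mean(axis=0)
    alpha = np.linalg.solve(H + lam * np.eye(len(C)), h)
    return lambda X: rbf_design(X, C, sigma) @ alpha  # ratio estimate (can be unbounded)
```

Windows whose estimated ratio drifts far from 1 would then be flagged as candidate anomalies; the potential unboundedness of this plain least-squares ratio in high dimensions is exactly the issue the scaled Bregman formulation targets.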

11 pages, 441 KiB  
Article
Symplectic Bregman Divergences
by Frank Nielsen
Entropy 2024, 26(12), 1101; https://doi.org/10.3390/e26121101 - 16 Dec 2024
Viewed by 594
Abstract
We present a generalization of Bregman divergences in finite-dimensional symplectic vector spaces that we term symplectic Bregman divergences. Symplectic Bregman divergences are derived from a symplectic generalization of the Fenchel–Young inequality which relies on the notion of symplectic subdifferentials. The symplectic Fenchel–Young inequality is obtained using the symplectic Fenchel transform which is defined with respect to the symplectic form. Since symplectic forms can be built generically from pairings of dual systems, we obtain a generalization of Bregman divergences in dual systems obtained by equivalent symplectic Bregman divergences. In particular, when the symplectic form is derived from an inner product, we show that the corresponding symplectic Bregman divergences amount to ordinary Bregman divergences with respect to composite inner products. Some potential applications of symplectic divergences in geometric mechanics, information geometry, and learning dynamics in machine learning are touched upon. Full article
(This article belongs to the Special Issue Information Geometry for Data Analysis)
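
For orientation, the classical objects being generalized here are the Bregman divergence of a strictly convex, differentiable generator F and the associated Fenchel–Young divergence; these are standard definitions, not specific to this paper:

```latex
B_F(\theta_1 : \theta_2) = F(\theta_1) - F(\theta_2)
  - \langle \theta_1 - \theta_2, \nabla F(\theta_2) \rangle \;\ge\; 0,
\qquad
Y_{F,F^*}(\theta_1 ; \eta_2) = F(\theta_1) + F^*(\eta_2)
  - \langle \theta_1, \eta_2 \rangle \;\ge\; 0,
```

where F* denotes the convex conjugate and Y vanishes exactly when η₂ = ∇F(θ₁) (the Fenchel–Young inequality). The symplectic construction described above replaces the duality pairing ⟨·,·⟩ by a symplectic form.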

29 pages, 1927 KiB  
Article
Fast Proxy Centers for the Jeffreys Centroid: The Jeffreys–Fisher–Rao Center and the Gauss–Bregman Inductive Center
by Frank Nielsen
Entropy 2024, 26(12), 1008; https://doi.org/10.3390/e26121008 - 22 Nov 2024
Cited by 1 | Viewed by 589
Abstract
The symmetric Kullback–Leibler centroid, also called the Jeffreys centroid, of a set of mutually absolutely continuous probability distributions on a measure space provides a notion of centrality which has proven useful in many tasks, including information retrieval, information fusion, and clustering. However, the Jeffreys centroid is not available in closed form for sets of categorical or multivariate normal distributions, two widely used statistical models, and thus needs to be approximated numerically in practice. In this paper, we first propose the new Jeffreys–Fisher–Rao center defined as the Fisher–Rao midpoint of the sided Kullback–Leibler centroids as a plug-in replacement of the Jeffreys centroid. This Jeffreys–Fisher–Rao center admits a generic formula for uni-parameter exponential family distributions and a closed-form formula for categorical and multivariate normal distributions; it matches exactly the Jeffreys centroid for same-mean normal distributions and is observed in practice to be close to the Jeffreys centroid. Second, we define a new type of inductive center generalizing the principle of the Gauss arithmetic–geometric double sequence mean for pairs of densities of any given exponential family. This new Gauss–Bregman center is shown experimentally to approximate the Jeffreys centroid very well and is suggested to be used as a replacement for the Jeffreys centroid when the Jeffreys–Fisher–Rao center is not available in closed form. Furthermore, this inductive center always converges and matches the Jeffreys centroid for sets of same-mean normal distributions. We report on our experiments, which first demonstrate how well the closed-form formula of the Jeffreys–Fisher–Rao center for categorical distributions approximates the costly numerical Jeffreys centroid, which relies on the Lambert W function, and second show the fast convergence of the Gauss–Bregman double sequences, which can closely approximate the Jeffreys centroid when truncated to the first few iterations. Finally, we conclude this work by reinterpreting these fast proxy Jeffreys–Fisher–Rao and Gauss–Bregman centers of Jeffreys centroids through the lens of dually flat spaces in information geometry. Full article
(This article belongs to the Special Issue Information Theory in Emerging Machine Learning Techniques)
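
The "inductive" principle referenced here is that of the classical Gauss arithmetic–geometric mean, where two coupled sequences are iterated until they meet. The scalar sketch below shows that double-sequence template; the paper's Gauss–Bregman center replaces the arithmetic and geometric means by Bregman-type averages of densities.

```python
# Classical Gauss arithmetic-geometric mean via a double sequence: the scalar
# template of the inductive-center construction described in the abstract.
def gauss_agm(a: float, b: float, tol: float = 1e-12) -> float:
    while abs(a - b) > tol * max(abs(a), abs(b)):
        a, b = 0.5 * (a + b), (a * b) ** 0.5   # arithmetic and geometric updates
    return 0.5 * (a + b)

print(gauss_agm(1.0, 2.0))   # ~1.456791, reached in a handful of iterations
```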

23 pages, 7837 KiB  
Article
Understanding Higher-Order Interactions in Information Space
by Herbert Edelsbrunner, Katharina Ölsböck and Hubert Wagner
Entropy 2024, 26(8), 637; https://doi.org/10.3390/e26080637 - 27 Jul 2024
Viewed by 1664
Abstract
Methods used in topological data analysis naturally capture higher-order interactions in point cloud data embedded in a metric space. This methodology was recently extended to data living in an information space, by which we mean a space measured with an information theoretical distance. One such setting is a finite collection of discrete probability distributions embedded in the probability simplex measured with the relative entropy (Kullback–Leibler divergence). More generally, one can work with a Bregman divergence parameterized by a different notion of entropy. While theoretical algorithms exist for this setup, there is a paucity of implementations for exploring and comparing geometric-topological properties of various information spaces. The interest of this work is therefore twofold. First, we propose the first robust algorithms and software for geometric and topological data analysis in information space. Perhaps surprisingly, despite working with Bregman divergences, our design reuses robust libraries for the Euclidean case. Second, using the new software, we take the first steps towards understanding the geometric-topological structure of these spaces. In particular, we compare them with the more familiar spaces equipped with the Euclidean and Fisher metrics. Full article
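
As a concrete instance of the information-space setting, the relative entropy on the probability simplex is itself a Bregman divergence, generated by the negative Shannon entropy. The short NumPy check below (illustrative only, not the paper's software) verifies this identity numerically.

```python
# Relative entropy (KL) on the simplex equals the Bregman divergence of the
# negative Shannon entropy F(p) = sum_i p_i log p_i. Illustrative check only.
import numpy as np

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

def bregman_neg_entropy(p, q):
    F = lambda v: float(np.sum(v * np.log(v)))   # negative Shannon entropy
    grad_F = np.log(q) + 1.0                      # gradient of F at q
    return F(p) - F(q) - float(np.dot(grad_F, p - q))

rng = np.random.default_rng(0)
p = rng.random(5); p /= p.sum()
q = rng.random(5); q /= q.sum()
print(kl(p, q), bregman_neg_entropy(p, q))        # the two values coincide
```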

19 pages, 4945 KiB  
Article
Multivariate Time Series Change-Point Detection with a Novel Pearson-like Scaled Bregman Divergence
by Tong Si, Yunge Wang, Lingling Zhang, Evan Richmond, Tae-Hyuk Ahn and Haijun Gong
Stats 2024, 7(2), 462-480; https://doi.org/10.3390/stats7020028 - 13 May 2024
Viewed by 2585
Abstract
Change-point detection (CPD) is a challenging problem with a wide range of applications across real-world domains. The primary objective of CPD is to identify specific time points where the underlying system undergoes transitions between different states, each characterized by its own data distribution. Precise identification of change points in time series omics data can provide insights into the dynamic and temporal characteristics inherent to complex biological systems. Many change-point detection methods have traditionally focused on the direct estimation of data distributions; however, these approaches become impractical in high-dimensional data analysis. Density ratio methods have emerged as promising alternatives, since estimating a density ratio is easier than directly estimating the individual densities. Nevertheless, the divergence measures used in these methods may suffer from numerical instability during computation. Additionally, the widely used α-relative Pearson divergence does not measure the dissimilarity between the two data distributions themselves, but rather between one distribution and a mixture of the two. To overcome the limitations of existing density ratio-based methods, we propose a novel approach called the Pearson-like scaled-Bregman divergence-based (PLsBD) density ratio estimation method for change-point detection. Our theoretical studies derive an analytical expression for the Pearson-like scaled Bregman divergence using a mixture measure. We integrate the PLsBD with a kernel regression model and apply a random sampling strategy to identify change points in both synthetic data and real-world high-dimensional genomics data of Drosophila. Our PLsBD method demonstrates superior performance compared to many other change-point detection methods. Full article
(This article belongs to the Section Statistical Methods)
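
A generic density-ratio change-point scheme of the kind this abstract builds on slides two adjacent windows over the series and scores each candidate split with an estimated divergence. The sketch below is deliberately agnostic about the divergence estimator (PLsBD itself is defined in the paper); the window length, function names, and the toy plug-in divergence are illustrative assumptions.

```python
# Generic sliding-window change-point scoring; any density-ratio divergence
# estimator can be plugged in (PLsBD itself is defined in the paper).
import numpy as np

def change_scores(X, divergence, window=50):
    """X: (T, d) multivariate series; returns one score per candidate split."""
    scores = []
    for t in range(window, len(X) - window):
        past, future = X[t - window:t], X[t:t + window]
        scores.append(divergence(past, future))
    return np.array(scores)           # peaks indicate candidate change points

# Crude illustrative plug-in divergence: distance between window means.
toy_div = lambda A, B: float(np.linalg.norm(A.mean(axis=0) - B.mean(axis=0)))
X = np.random.default_rng(0).normal(size=(300, 10))
X[150:] += 2.0                         # injected mean shift at t = 150
scores = change_scores(X, toy_div)     # the score peaks near the true change point
```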

16 pages, 656 KiB  
Article
Divergences Induced by the Cumulant and Partition Functions of Exponential Families and Their Deformations Induced by Comparative Convexity
by Frank Nielsen
Entropy 2024, 26(3), 193; https://doi.org/10.3390/e26030193 - 23 Feb 2024
Cited by 1 | Viewed by 1584
Abstract
Exponential families are statistical models which are the workhorses in statistics, information theory, and machine learning, among others. An exponential family can either be normalized subtractively by its cumulant or free energy function, or equivalently normalized divisively by its partition function. Both the cumulant and partition functions are strictly convex and smooth functions inducing corresponding pairs of Bregman and Jensen divergences. It is well known that skewed Bhattacharyya distances between the probability densities of an exponential family amount to skewed Jensen divergences induced by the cumulant function between their corresponding natural parameters, and that in limit cases the sided Kullback–Leibler divergences amount to reverse-sided Bregman divergences. In this work, we first show that the α-divergences between non-normalized densities of an exponential family amount to scaled α-skewed Jensen divergences induced by the partition function. We then show how comparative convexity with respect to a pair of quasi-arithmetical means allows both convex functions and their arguments to be deformed, thereby defining dually flat spaces with corresponding divergences when ordinary convexity is preserved. Full article
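
The standard identities this paper extends can be stated compactly: for an exponential family with cumulant function F, the sided Kullback–Leibler divergences and the skewed Bhattacharyya distances reduce to Bregman and skewed Jensen divergences on the natural parameters. These are well-known results, restated here only for orientation:

```latex
p_\theta(x) = \exp\!\big(\langle \theta, t(x)\rangle - F(\theta)\big), \qquad
\mathrm{KL}(p_{\theta_1} : p_{\theta_2}) = B_F(\theta_2 : \theta_1),
\qquad
-\log \int p_{\theta_1}^{\alpha}\, p_{\theta_2}^{1-\alpha}\, \mathrm{d}\mu
  = \alpha F(\theta_1) + (1-\alpha) F(\theta_2)
    - F\big(\alpha\theta_1 + (1-\alpha)\theta_2\big)
  = J_{F,\alpha}(\theta_1 : \theta_2).
```

The paper's first contribution replaces the cumulant F by the partition function for non-normalized densities, yielding scaled skewed Jensen divergences.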

17 pages, 775 KiB  
Article
Block-Active ADMM to Minimize NMF with Bregman Divergences
by Xinyao Li and Akhilesh Tyagi
Sensors 2023, 23(16), 7229; https://doi.org/10.3390/s23167229 - 17 Aug 2023
Cited by 2 | Viewed by 1301
Abstract
Over the last ten years, there has been significant interest in employing nonnegative matrix factorization (NMF) to reduce dimensionality and enable more efficient clustering analysis in machine learning. This technique has been applied in various image processing applications within the fields of computer vision and sensor-based systems. Many algorithms exist to solve the NMF problem. Among them, the alternating direction method of multipliers (ADMM) and its variants are among the most popular methods used in practice. In this paper, we propose a block-active ADMM method to minimize the NMF problem with general Bregman divergences. The subproblems in the ADMM are solved iteratively by a block-coordinate-descent-type (BCD-type) method, with each block chosen directly based on the stationarity condition. As a result, we are able to use far fewer auxiliary variables, and the proposed algorithm converges faster than previously proposed algorithms. From a theoretical point of view, the proposed algorithm is proven to converge sublinearly to a stationary point. We also conduct a series of numerical experiments to demonstrate the superiority of the proposed algorithm. Full article
(This article belongs to the Special Issue Feature Papers in Physical Sensors 2023)
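
For context, the sketch below shows the classical multiplicative-update baseline for NMF under the (generalized) Kullback–Leibler divergence, one member of the Bregman family. It is not the block-active ADMM proposed in the paper, just the standard reference scheme such methods are compared against; the function name and hyperparameters are illustrative.

```python
# Classical multiplicative updates for KL-NMF (Lee-Seung style baseline);
# not the block-active ADMM method of the paper.
import numpy as np

def kl_nmf(V, rank, n_iter=200, eps=1e-10, seed=0):
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)   # update H
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)   # update W
    return W, H

V = np.random.default_rng(1).random((20, 30))
W, H = kl_nmf(V, rank=5)   # V is approximated by the nonnegative product W @ H
```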

25 pages, 642 KiB  
Article
Generalizing the Alpha-Divergences and the Oriented Kullback–Leibler Divergences with Quasi-Arithmetic Means
by Frank Nielsen
Algorithms 2022, 15(11), 435; https://doi.org/10.3390/a15110435 - 17 Nov 2022
Viewed by 2901
Abstract
The family of α-divergences including the oriented forward and reverse Kullback–Leibler divergences is often used in signal processing, pattern recognition, and machine learning, among others. Choosing a suitable α-divergence can either be done beforehand according to some prior knowledge of the application domains or directly learned from data sets. In this work, we generalize the α-divergences using a pair of strictly comparable weighted means. Our generalization allows us to obtain in the limit case α → 1 the 1-divergence, which provides a generalization of the forward Kullback–Leibler divergence, and in the limit case α → 0, the 0-divergence, which corresponds to a generalization of the reverse Kullback–Leibler divergence. We then analyze the condition for a pair of weighted quasi-arithmetic means to be strictly comparable and describe the family of quasi-arithmetic α-divergences including its subfamily of power homogeneous α-divergences. In particular, we study the generalized quasi-arithmetic 1-divergences and 0-divergences and show that these counterpart generalizations of the oriented Kullback–Leibler divergences can be rewritten as equivalent conformal Bregman divergences using strictly monotone embeddings. Finally, we discuss the applications of these novel divergences to k-means clustering by studying the robustness property of the centroids. Full article
(This article belongs to the Special Issue Machine Learning for Pattern Recognition)
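
In one common convention (stated here for orientation; the paper's notation may differ), the α-divergence for α ∈ (0, 1) is, up to scale, the integrated gap between a weighted arithmetic mean and a weighted geometric mean of the two densities, which is precisely the pair of comparable means that the quasi-arithmetic generalization replaces:

```latex
D_\alpha(p : q) = \frac{1}{\alpha(1-\alpha)} \int
  \Big( \alpha\, p(x) + (1-\alpha)\, q(x) - p(x)^{\alpha}\, q(x)^{1-\alpha} \Big)\,
  \mathrm{d}\mu(x),
```

with D_α tending to the forward Kullback–Leibler divergence KL(p : q) as α → 1 and to the reverse divergence KL(q : p) as α → 0, matching the 1- and 0-divergences mentioned above.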

35 pages, 988 KiB  
Article
Revisiting Chernoff Information with Likelihood Ratio Exponential Families
by Frank Nielsen
Entropy 2022, 24(10), 1400; https://doi.org/10.3390/e24101400 - 1 Oct 2022
Cited by 10 | Viewed by 5347
Abstract
The Chernoff information between two probability measures is a statistical divergence measuring their deviation, defined as their maximally skewed Bhattacharyya distance. Although the Chernoff information was originally introduced for bounding the Bayes error in statistical hypothesis testing, the divergence has found many other applications due to its empirical robustness, ranging from information fusion to quantum information. From the viewpoint of information theory, the Chernoff information can also be interpreted as a minmax symmetrization of the Kullback–Leibler divergence. In this paper, we first revisit the Chernoff information between two densities of a measurable Lebesgue space by considering the exponential families induced by their geometric mixtures: the so-called likelihood ratio exponential families. Second, we show how to (i) solve exactly the Chernoff information between any two univariate Gaussian distributions or get a closed-form formula using symbolic computing, (ii) report a closed-form formula of the Chernoff information of centered Gaussians with scaled covariance matrices, and (iii) use a fast numerical scheme to approximate the Chernoff information between any two multivariate Gaussian distributions. Full article
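
A brute-force numerical approximation of the Chernoff information between two univariate Gaussians can be written directly from its definition as a maximally skewed Bhattacharyya distance. This is only a naive quadrature-plus-grid-search sketch with illustrative grid sizes, far cruder than the exact and fast schemes reported in the paper.

```python
# Chernoff information C(p, q) = max_{a in (0,1)} -log ∫ p^a q^(1-a) dx,
# approximated by trapezoidal quadrature and a grid search over the skew a.
import numpy as np

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def chernoff_information(mu1, s1, mu2, s2, n_alpha=199, n_x=4001):
    lo = min(mu1 - 10 * s1, mu2 - 10 * s2)
    hi = max(mu1 + 10 * s1, mu2 + 10 * s2)
    x = np.linspace(lo, hi, n_x)
    p, q = gauss_pdf(x, mu1, s1), gauss_pdf(x, mu2, s2)
    alphas = np.linspace(0.001, 0.999, n_alpha)
    # skewed Bhattacharyya distances D_alpha = -log ∫ p^a q^(1-a) dx
    D = np.array([-np.log(np.trapz(p**a * q**(1 - a), x)) for a in alphas])
    i = int(np.argmax(D))
    return D[i], alphas[i]   # Chernoff information and optimal skew alpha*

C, a_star = chernoff_information(0.0, 1.0, 2.0, 1.5)
```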

13 pages, 1437 KiB  
Article
Improved Information-Theoretic Generalization Bounds for Distributed, Federated, and Iterative Learning
by Leighton Pate Barnes, Alex Dytso and Harold Vincent Poor
Entropy 2022, 24(9), 1178; https://doi.org/10.3390/e24091178 - 24 Aug 2022
Cited by 6 | Viewed by 2040
Abstract
We consider information-theoretic bounds on the expected generalization error for statistical learning problems in a network setting. In this setting, there are K nodes, each with its own independent dataset, and the models from the K nodes have to be aggregated into a final centralized model. We consider both simple averaging of the models as well as more complicated multi-round algorithms. We give upper bounds on the expected generalization error for a variety of problems, such as those with Bregman divergence or Lipschitz continuous losses, that demonstrate an improved dependence of 1/K on the number of nodes. These “per node” bounds are in terms of the mutual information between the training dataset and the trained weights at each node and are therefore useful in describing the generalization properties inherent to having communication or privacy constraints at each node. Full article
(This article belongs to the Special Issue Information Theory and Machine Learning)
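
For orientation, a well-known single-dataset bound of the type refined here states that, for a loss that is σ-sub-Gaussian under the data distribution, the expected generalization gap is controlled by the mutual information between the training set S of n samples and the learned weights W. The paper's contribution, not restated here, is a networked, per-node version of such bounds with an improved 1/K dependence on the number of nodes.

```latex
\big| \mathbb{E}\big[ L_\mu(W) - L_S(W) \big] \big|
  \;\le\; \sqrt{ \frac{2\sigma^2}{n}\, I(S; W) },
```

where L_S denotes the empirical risk and L_μ the population risk.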

22 pages, 445 KiB  
Article
Lie Symmetries of the Nonlinear Fokker-Planck Equation Based on Weighted Kaniadakis Entropy
by Iulia-Elena Hirica, Cristina-Liliana Pripoae, Gabriel-Teodor Pripoae and Vasile Preda
Mathematics 2022, 10(15), 2776; https://doi.org/10.3390/math10152776 - 4 Aug 2022
Cited by 7 | Viewed by 1890
Abstract
The paper studies the Lie symmetries of the nonlinear Fokker-Planck equation in one dimension, which are associated to the weighted Kaniadakis entropy. In particular, the Lie symmetries of the nonlinear diffusive equation, associated to the weighted Kaniadakis entropy, are found. The MaxEnt problem associated to the weighted Kaniadakis entropy is given a complete solution, together with the thermodynamic relations which extend the known ones from the non-weighted case. Several different, but related, arguments point out a subtle dichotomous behavior of the Kaniadakis constant k, distinguishing between the cases k ∈ (−1, 1) and k = ±1. By comparison, the Lie symmetries of the NFPEs based on Tsallis q-entropies point out six “exceptional” cases: q = 1/2, q = 3/2, q = 4/3, q = 7/3, q = 2, and q = 3. Full article
(This article belongs to the Special Issue Probability, Statistics and Their Applications 2021)
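
For reference, the unweighted Kaniadakis entropy underlying this construction is built from the k-deformed logarithm; the weighted variant studied in the paper adds a weight function and is not reproduced here:

```latex
\ln_k(u) = \frac{u^{k} - u^{-k}}{2k}, \qquad
S_k(p) = -\sum_i p_i \ln_k(p_i)
       = -\sum_i \frac{p_i^{1+k} - p_i^{1-k}}{2k},
```

which recovers the Shannon entropy in the limit k → 0; the boundary values k = ±1 are the exceptional cases singled out in the abstract.
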
21 pages, 1068 KiB  
Article
Statistical Divergences between Densities of Truncated Exponential Families with Nested Supports: Duo Bregman and Duo Jensen Divergences
by Frank Nielsen
Entropy 2022, 24(3), 421; https://doi.org/10.3390/e24030421 - 17 Mar 2022
Cited by 11 | Viewed by 5794
Abstract
By calculating the Kullback–Leibler divergence between two probability measures belonging to different exponential families dominated by the same measure, we obtain a formula that generalizes the ordinary Fenchel–Young divergence. Inspired by this formula, we define the duo Fenchel–Young divergence and report a majorization condition on its pair of strictly convex generators, which guarantees that this divergence is always non-negative. The duo Fenchel–Young divergence is also equivalent to a duo Bregman divergence. We show how to use these duo divergences by calculating the Kullback–Leibler divergence between densities of truncated exponential families with nested supports, and report a formula for the Kullback–Leibler divergence between truncated normal distributions. Finally, we prove that the skewed Bhattacharyya distances between truncated exponential families amount to equivalent skewed duo Jensen divergences. Full article
(This article belongs to the Special Issue Information and Divergence Measures)

12 pages, 288 KiB  
Article
Update of Prior Probabilities by Minimal Divergence
by Jan Naudts
Entropy 2021, 23(12), 1668; https://doi.org/10.3390/e23121668 - 11 Dec 2021
Cited by 1 | Viewed by 1856
Abstract
The present paper investigates the update of an empirical probability distribution with the results of a new set of observations. The update reproduces the new observations and interpolates using prior information. The optimal update is obtained by minimizing either the Hellinger distance or the quadratic Bregman divergence. The results obtained by the two methods differ. Updates with information about conditional probabilities are considered as well. Full article
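
On discrete distributions, the two objectives compared in this paper can be written in standard form as the squared Hellinger distance and the Bregman divergence of the squared Euclidean norm (the "quadratic" Bregman divergence); these are the usual conventions and the paper's normalization may differ:

```latex
d_H^2(p, q) = \frac{1}{2} \sum_i \big( \sqrt{p_i} - \sqrt{q_i} \big)^2
            = 1 - \sum_i \sqrt{p_i q_i},
\qquad
B_{\mathrm{quad}}(p, q) = \frac{1}{2} \sum_i (p_i - q_i)^2 .
```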

18 pages, 431 KiB  
Article
An Objective Prior from a Scoring Rule
by Stephen G. Walker and Cristiano Villa
Entropy 2021, 23(7), 833; https://doi.org/10.3390/e23070833 - 29 Jun 2021
Cited by 2 | Viewed by 1676
Abstract
In this paper, we introduce a novel objective prior distribution leveraging the connections between information, divergence, and scoring rules. In particular, we do so from the starting point of convex functions representing information in density functions. This provides a natural route to proper local scoring rules using Bregman divergence. Specifically, we determine the prior obtained by setting the score function equal to a constant. Although this in itself provides motivation for an objective prior, the prior also minimizes a corresponding information criterion. Full article
(This article belongs to the Section Information Theory, Probability and Statistics)
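
One standard way to build a proper scoring rule from a convex function φ via a Bregman construction is the separable form below; it is stated only for orientation, and the paper's exact construction and notation may differ:

```latex
S(x, q) = -\varphi'\big(q(x)\big)
  + \int \Big( q(y)\,\varphi'\big(q(y)\big) - \varphi\big(q(y)\big) \Big)\, \mathrm{d}y,
```

which recovers the logarithmic score S(x, q) = −log q(x) for φ(t) = t log t, and whose expected-score gap is the corresponding separable Bregman divergence. Requiring such a score to be constant in x, as described above, yields an equation for the density that determines the objective prior.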

28 pages, 1106 KiB  
Article
On a Variational Definition for the Jensen-Shannon Symmetrization of Distances Based on the Information Radius
by Frank Nielsen
Entropy 2021, 23(4), 464; https://doi.org/10.3390/e23040464 - 14 Apr 2021
Cited by 21 | Viewed by 9542
Abstract
We generalize the Jensen-Shannon divergence and the Jensen-Shannon diversity index by considering a variational definition with respect to a generic mean, thereby extending the notion of Sibson’s information radius. The variational definition applies to an arbitrary distance and yields a new way to define a Jensen-Shannon symmetrization of distances. When the variational optimization is further constrained to belong to prescribed families of probability measures, we get relative Jensen-Shannon divergences and their equivalent Jensen-Shannon symmetrizations of distances that generalize the concept of information projections. Finally, we touch upon applications of these variational Jensen-Shannon divergences and diversity indices to clustering and quantization tasks of probability measures, including statistical mixtures. Full article
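
The classical case that this variational definition generalizes is the Jensen-Shannon divergence itself, which can be written as an information radius, i.e. a minimization over a "center" distribution c attained at the arithmetic mixture; this is a standard identity, and the paper replaces the arithmetic mean by a generic mean:

```latex
\mathrm{JS}(p, q)
  = \tfrac{1}{2}\,\mathrm{KL}\!\Big(p : \tfrac{p+q}{2}\Big)
  + \tfrac{1}{2}\,\mathrm{KL}\!\Big(q : \tfrac{p+q}{2}\Big)
  = \min_{c} \; \tfrac{1}{2}\,\mathrm{KL}(p : c) + \tfrac{1}{2}\,\mathrm{KL}(q : c),
```

with the minimum attained at c = (p + q)/2 (Sibson's information radius with two equally weighted components).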
