-
A Distance for Geometric Graphs via the Labeled Merge Tree Interleaving Distance
Authors:
Erin Wolf Chambers,
Elizabeth Munch,
Sarah Percival,
Xinyi Wang
Abstract:
Geometric graphs appear in many real-world data sets, such as road networks, sensor networks, and molecules. We investigate the notion of distance between embedded graphs and present a metric to measure the distance between two geometric graphs via merge trees. In order to preserve as much useful information as possible from the original data, we introduce a way of rotating the sublevel set to obtain the merge trees via the idea of the directional transform. We represent the merge trees using a surjective multi-labeling scheme and then compute the distance between two representative matrices. We show some theoretically desirable qualities and present two methods of computation: approximation via sampling and exact distance using a kinetic data structure, both in polynomial time. We illustrate its utility by implementing it on two data sets.
Submitted 12 July, 2024;
originally announced July 2024.
-
An Invitation to the Euler Characteristic Transform
Authors:
Elizabeth Munch
Abstract:
The Euler characteristic transform (ECT) is a simple-to-define yet powerful representation of shape. The idea is to encode an embedded shape using sublevel sets of a function defined based on a given direction, and then return the Euler characteristics of these sublevel sets. Because the ECT has been shown to be injective on the space of embedded simplicial complexes, it has been used for applications spanning a range of disciplines, including plant morphology and protein structural analysis. In this survey article, we present a comprehensive overview of the Euler characteristic transform, highlighting the main idea on a simple leaf example, and surveying its key concepts, theoretical foundations, and available applications.
Submitted 16 October, 2023;
originally announced October 2023.
-
Bounding the Interleaving Distance for Mapper Graphs with a Loss Function
Authors:
Erin W. Chambers,
Elizabeth Munch,
Sarah Percival,
Bei Wang
Abstract:
Data consisting of a graph with a function mapping into $\R^d$ arise in many data applications, encompassing structures such as Reeb graphs, geometric graphs, and knot embeddings. As such, the ability to compare and cluster such objects is required in a data analysis pipeline, leading to a need for distances between them. In this work, we study the interleaving distance on discretizations of these objects, $\R^d$-mapper graphs, where functor representations of the data can be compared by finding pairs of natural transformations between them. However, in many cases, computation of the interleaving distance is NP-hard. For this reason, we take inspiration from recent work by Robinson to find quality measures for families of maps that do not rise to the level of a natural transformation, called assignments. We then endow the functor images with the extra structure of a metric space and define a loss function which measures how far an assignment is from making the required diagrams of an interleaving commute. Finally, we show that the computation of the loss function is polynomial with a given assignment. We believe this idea is both powerful and translatable, with the potential to provide approximations and bounds on interleavings in a broad array of contexts.
Submitted 19 March, 2024; v1 submitted 27 July, 2023;
originally announced July 2023.
-
Comparing representations of high-dimensional data with persistent homology: a case study in neuroimaging
Authors:
Ty Easley,
Kevin Freese,
Elizabeth Munch,
Janine Bijsterbosch
Abstract:
Despite much attention, the comparison of reduced-dimension representations of high-dimensional data remains a challenging problem in multiple fields, especially when representations remain high-dimensional compared to sample size. We offer a framework for evaluating the topological similarity of high-dimensional representations of very high-dimensional data, a regime where topological structure is more likely captured in the distribution of topological "noise" than a few prominent generators. Treating each representational map as a metric embedding, we compute the Vietoris-Rips persistence of its image. We then use the topological bootstrap to analyze the re-sampling stability of each representation, assigning a "prevalence score" for each nontrivial basis element of its persistence module. Finally, we compare the persistent homology of representations using a prevalence-weighted variant of the Wasserstein distance. Notably, our method is able to compare representations derived from different samples of the same distribution and, in particular, is not restricted to comparisons of graphs on the same vertex set. In addition, representations need not lie in the same metric space. We apply this analysis to a cross-sectional sample of representations of functional neuroimaging data in a large cohort and hierarchically cluster under the prevalence-weighted Wasserstein. We find that the ambient dimension of a representation is a stronger predictor of the number and stability of topological features than its decomposition rank. Our findings suggest that important topological information lies in repeatable, low-persistence homology generators, whose distributions capture important and interpretable differences between high-dimensional data representations.
Submitted 23 November, 2023; v1 submitted 23 June, 2023;
originally announced June 2023.
-
NervePool: A Simplicial Pooling Layer
Authors:
Sarah McGuire,
Elizabeth Munch,
Matthew Hirn
Abstract:
For deep learning problems on graph-structured data, pooling layers are important for downsampling, reducing computational cost, and minimizing overfitting. We define a pooling layer, NervePool, for data structured as simplicial complexes, which are generalizations of graphs that include higher-dimensional simplices beyond vertices and edges; this structure allows for greater flexibility in modeling higher-order relationships. The proposed simplicial coarsening scheme is built upon partitions of vertices, which allow us to generate hierarchical representations of simplicial complexes, collapsing information in a learned fashion. NervePool builds on the learned vertex cluster assignments and extends to coarsening of higher-dimensional simplices in a deterministic fashion. While in practice the pooling operations are computed via a series of matrix operations, the topological motivation is a set-theoretic construction based on unions of stars of simplices and the nerve complex.
Submitted 10 May, 2023;
originally announced May 2023.
-
Robust Zero-crossings Detection in Noisy Signals using Topological Signal Processing
Authors:
Sunia Tanweer,
Firas A. Khasawneh,
Elizabeth Munch
Abstract:
We explore a novel application of zero-dimensional persistent homology from Topological Data Analysis (TDA) for bracketing zero-crossings of both one-dimensional continuous functions and uniformly sampled time series. We present an algorithm and show its robustness in the presence of noise for a range of sampling frequencies. In comparison to state-of-the-art software-based methods for finding zeros of a time series, our method generally converges faster, provides higher accuracy, and is capable of finding all the roots in a given interval instead of converging only to one of them. We also present and compare options for automatically setting the persistence threshold parameter that influences the accurate bracketing of the roots.
Submitted 18 January, 2023;
originally announced January 2023.
-
Comparing Embedded Graphs Using Average Branching Distance
Authors:
Levent Batakci,
Abigail Branson,
Bryan Castillo,
Candace Todd,
Erin Wolf Chambers,
Elizabeth Munch
Abstract:
Graphs drawn in the plane are ubiquitous, arising from data sets through a variety of methods ranging from GIS analysis to image classification to shape analysis. A fundamental problem in this type of data is comparison: given a set of such graphs, can we rank how similar they are, in such a way that we capture their geometric "shape" in the plane? In this paper we explore a method to compare two such embedded graphs, via a simplified combinatorial representation called a tail-less merge tree which encodes the structure based on a fixed direction. First, we examine the properties of a distance designed to compare merge trees called the branching distance, and show that the distance as defined in previous work fails to satisfy some of the requirements of a metric. We incorporate this into a new distance function called average branching distance to compare graphs by looking at the branching distance for merge trees defined over many directions. Despite the theoretical issues, we show that the definition is still quite useful in practice by using our open-source code to cluster data sets of embedded graphs.
Submitted 18 October, 2022;
originally announced October 2022.
-
Persistent Homology of Coarse Grained State Space Networks
Authors:
Audun D. Myers,
Max M. Chumley,
Firas A. Khasawneh,
Elizabeth Munch
Abstract:
This work is dedicated to the topological analysis of complex transitional networks for dynamic state detection. Transitional networks are formed from time series data and they leverage graph theory tools to reveal information about the underlying dynamic system. However, traditional tools can fail to summarize the complex topology present in such graphs. In this work, we leverage persistent homology from topological data analysis to study the structure of these networks. We contrast dynamic state detection from time series using a coarse-grained state-space network (CGSSN) and topological data analysis (TDA) with two state-of-the-art approaches: ordinal partition networks (OPNs) combined with TDA and the standard application of persistent homology to the time-delay embedding of the signal. We show that the CGSSN captures rich information about the dynamic state of the underlying dynamical system as evidenced by a significant improvement in dynamic state detection and noise robustness in comparison to OPNs. We also show that because the computational time of CGSSN is not linearly dependent on the signal's length, it is more computationally efficient than applying TDA to the time-delay embedding of the time series.
Submitted 4 August, 2023; v1 submitted 20 May, 2022;
originally announced June 2022.
-
Temporal Network Analysis Using Zigzag Persistence
Authors:
Audun Myers,
David Muñoz,
Firas Khasawneh,
Elizabeth Munch
Abstract:
This work presents a framework for studying temporal networks using zigzag persistence, a tool from the field of Topological Data Analysis (TDA). The resulting approach is general and applicable to a wide variety of time-varying graphs. For example, these graphs may correspond to a system modeled as a network with edges whose weights are functions of time, or they may represent a time series of a complex dynamical system. We use simplicial complexes to represent snapshots of the temporal networks that can then be analyzed using zigzag persistence. We show two applications of our method to dynamic networks: an analysis of commuting trends on multiple temporal scales, e.g., daily and weekly, in the Great Britain transportation network, and the detection of periodic/chaotic transitions due to intermittency in dynamical systems represented by temporal ordinal partition networks. Our findings show that the resulting zero- and one-dimensional zigzag persistence diagrams can detect changes in the networks' shapes that are missed by traditional connectivity and centrality graph statistics.
Submitted 14 August, 2023; v1 submitted 23 May, 2022;
originally announced May 2022.
-
Topological Signal Processing using the Weighted Ordinal Partition Network
Authors:
Audun Myers,
Firas A. Khasawneh,
Elizabeth Munch
Abstract:
One of the most important problems arising in time series analysis is that of bifurcation, or change point detection. That is, given a collection of time series over a varying parameter, when has the structure of the underlying dynamical system changed? For this task, we turn to the field of topological data analysis (TDA), which encodes information about the shape and structure of data. The idea of utilizing tools from TDA for signal processing tasks, known as topological signal processing (TSP), has gained much attention in recent years, largely through a standard pipeline that computes the persistent homology of the point cloud generated by the Takens' embedding. However, this procedure is limited by computation time since the simplicial complex generated in this case is large, but also has a great deal of redundant data. For this reason, we turn to a more recent method for encoding the structure of the attractor, which constructs an ordinal partition network (OPN) representing information about when the dynamical system has passed between certain regions of state space. The result is a weighted graph whose structure encodes information about the underlying attractor. Our previous work began to find ways to package the information of the OPN in a manner that is amenable to TDA; however, that work only used the network structure and did nothing to encode the additional weighting information. In this paper, we take the next step: building a pipeline to analyze the weighted OPN with TDA and showing that this framework provides more resilience to noise or perturbations in the system and improves the accuracy of the dynamic state detection.
Submitted 3 August, 2022; v1 submitted 27 April, 2022;
originally announced May 2022.
-
Reeb Graph Metrics from the Ground Up
Authors:
Brian Bollen,
Erin Chambers,
Joshua A. Levine,
Elizabeth Munch
Abstract:
The Reeb graph has been utilized in various applications including the analysis of scalar fields. Recently, research has been focused on using topological signatures such as the Reeb graph to compare multiple scalar fields by defining distance metrics on the topological signatures themselves. Here we survey five existing metrics that have been defined on Reeb graphs: the bottleneck distance, the interleaving distance, functional distortion distance, the Reeb graph edit distance, and the universal edit distance. Our goal is to (1) provide definitions and concrete examples of these distances in order to develop the intuition of the reader, (2) visit previously proven results of stability, universality, and discriminativity, (3) identify and complete any remaining properties which have only been proven (or disproven) for a subset of these metrics, (4) expand the taxonomy of the bottleneck distance to better distinguish between variations which have been commonly miscited, and (5) reconcile the various definitions and requirements on the underlying spaces for these metrics to be defined and properties to be proven.
Submitted 19 October, 2022; v1 submitted 11 October, 2021;
originally announced October 2021.
-
Realizable piecewise linear paths of persistence diagrams with Reeb graphs
Authors:
Rehab Alharbi,
Erin Wolf Chambers,
Elizabeth Munch
Abstract:
Reeb graphs are widely used in a range of fields for the purposes of analyzing and comparing complex spaces via a simpler combinatorial object. Further, they are closely related to extended persistence diagrams, which largely but not completely encode the information of the Reeb graph. In this paper, we investigate the effect on the persistence diagram of a particular continuous operation on Reeb graphs; namely the (truncated) smoothing operation. This construction arises in the context of the Reeb graph interleaving distance, but separately from that viewpoint provides a simplification of the Reeb graph which continuously shrinks small loops. We then use this characterization to initiate the study of inverse problems for Reeb graphs using smoothing by showing which paths in persistence diagram space (commonly known as vineyards) can be realized by a path in the space of Reeb graphs via these simple operations. This allows us to solve the inverse problem on a certain family of piecewise linear vineyards when fixing an initial Reeb graph.
Submitted 9 July, 2021;
originally announced July 2021.
-
Automatic Tree Ring Detection using Jacobi Sets
Authors:
Kayla Makela,
Tim Ophelders,
Michelle Quigley,
Elizabeth Munch,
Daniel Chitwood,
Asia Dowtin
Abstract:
Tree ring widths are an important source of climatic and historical data, but measuring these widths typically requires extensive manual work. Computer vision techniques provide promising directions towards the automation of tree ring detection, but most automated methods still require a substantial amount of user interaction to obtain high accuracy. We perform analysis on 3D X-ray CT images of a cross-section of a tree trunk, known as a tree disk. We present novel automated methods for locating the pith (center) of a tree disk, and ring boundaries. Our methods use a combination of standard image processing techniques and tools from topological data analysis. We evaluate the efficacy of our method for two different CT scans by comparing its results to manually located rings and centers and show that it is better than current automatic methods in terms of correctly counting each ring and its location. Our methods have several parameters, which we optimize experimentally by minimizing edit distances to the manually obtained locations.
Submitted 16 October, 2020;
originally announced October 2020.
-
Using Zigzag Persistent Homology to Detect Hopf Bifurcations in Dynamical Systems
Authors:
Sarah Tymochko,
Elizabeth Munch,
Firas A. Khasawneh
Abstract:
Bifurcations in dynamical systems characterize qualitative changes in the system behavior. Therefore, their detection is important because they can signal the transition from normal system operation to imminent failure. While standard persistent homology has been used in this setting, it usually requires analyzing a collection of persistence diagrams, which in turn drives up the computational cost considerably. Using zigzag persistence, we can capture topological changes in the state space of the dynamical system in only one persistence diagram. Here we present Bifurcations using ZigZag (BuZZ), a one-step method to study and detect bifurcations using zigzag persistence. The BuZZ method is successfully able to detect this type of behavior in two synthetic examples as well as an example dynamical system.
Submitted 18 September, 2020;
originally announced September 2020.
-
A family of metrics from the truncated smoothing of Reeb graphs
Authors:
Erin Wolf Chambers,
Elizabeth Munch,
Tim Ophelders
Abstract:
In this paper, we introduce an extension of smoothing on Reeb graphs, which we call truncated smoothing; this in turn allows us to define a new family of metrics which generalize the interleaving distance for Reeb graphs. Intuitively, we "chop off" parts near local minima and maxima during the course of smoothing, where the amount cut is controlled by a parameter $τ$. After formalizing truncation as a functor, we show that when applied after the smoothing functor, this prevents extensive expansion of the range of the function, and yields particularly nice properties (such as maintaining connectivity) when combined with smoothing for $0 \leq τ\leq 2\varepsilon$, where $\varepsilon$ is the smoothing parameter. Then, for the restriction of $τ\in [0,\varepsilon]$, we have additional structure which we can take advantage of to construct a categorical flow for any choice of slope $m \in [0,1]$. Using the infrastructure built for a category with a flow, this then gives an interleaving distance for every $m \in [0,1]$, which is a generalization of the original interleaving distance, which is the case $m=0$. While the resulting metrics are not stable, we show that any pair of these for $m,m' \in [0,1)$ are strongly equivalent metrics, which in turn gives stability of each metric up to a multiplicative constant. We conclude by discussing implications of this metric within the broader family of metrics for Reeb graphs.
Submitted 13 May, 2021; v1 submitted 15 July, 2020;
originally announced July 2020.
-
A Relative Theory of Interleavings
Authors:
Magnus Bakke Botnan,
Justin Curry,
Elizabeth Munch
Abstract:
The interleaving distance, although originally developed for persistent homology, has been generalized to measure the distance between functors modeled on many posets or even small categories. Existing theories require that such a poset have a superlinear family of translations or a similar structure. However, many posets of interest to topological data analysis, such as zig-zag posets and the face relation poset of a cell-complex, do not admit interesting translations, and consequently don't admit a nice theory of interleavings. In this paper we show how one can side-step this limitation by providing a general theory where one maps to a poset that does admit interesting translations, such as the lattice of down sets, and then defines interleavings relative to this map. Part of our theory includes a rigorous notion of discretization or "pixelization" of poset modules, which in turn we use for interleaving inference. We provide an approximation condition that in the setting of lattices gives rise to two possible pixelizations, both of which are guaranteed to be close in the interleaving distance. Finally, we conclude by considering interleaving inference for cosheaves over a metric space and give an explicit description of interleavings over a grid structure on Euclidean space.
Submitted 29 April, 2020;
originally announced April 2020.
-
Fast and Scalable Complex Network Descriptor Using PageRank and Persistent Homology
Authors:
Mustafa Hajij,
Elizabeth Munch,
Paul Rosen
Abstract:
The PageRank of a graph is a scalar function defined on the node set of the graph which encodes node centrality information. In this article, we use the PageRank function along with persistent homology to obtain a scalable graph descriptor and utilize it to compare the similarities between graphs. For a given graph $G(V,E)$, our descriptor can be computed in $O(|E|α(|V|))$ time, where $α$ is the inverse Ackermann function, which makes it scalable and computable on massive graphs. We show the effectiveness of our method by utilizing it on multiple shape mesh datasets.
Submitted 11 September, 2020; v1 submitted 12 February, 2020;
originally announced February 2020.
-
Chatter Diagnosis in Milling Using Supervised Learning and Topological Features Vector
Authors:
Melih C. Yesilli,
Sarah Tymochko,
Firas A. Khasawneh,
Elizabeth Munch
Abstract:
Chatter detection has become a prominent subject of interest due to its effect on cutting tool life, surface finish, and the spindle of the machine tool. Most of the existing methods in the chatter detection literature are based on signal processing and signal decomposition. In this study, we use topological features of data simulating cutting tool vibrations, combined with four supervised machine learning algorithms, to diagnose chatter in the milling process. Persistence diagrams, a method of representing topological features, are not easily used in the context of machine learning, so they must be transformed into a form that is more amenable. Specifically, we focus on two different methods for featurizing persistence diagrams: Carlsson coordinates and template functions. In this paper, we provide classification results for simulated data from various cutting configurations, including upmilling and downmilling, in addition to the same data with some added noise. Our results show that Carlsson coordinates and template functions yield accuracies as high as 96% and 95%, respectively. We also provide evidence that these topological methods are noise-robust descriptors for chatter detection.
Submitted 27 October, 2019;
originally announced October 2019.
-
Adaptive Partitioning for Template Functions on Persistence Diagrams
Authors:
Sarah Tymochko,
Elizabeth Munch,
Firas A. Khasawneh
Abstract:
As the field of Topological Data Analysis continues to show success in theory and in applications, there has been increasing interest in using tools from this field with methods for machine learning. Using persistent homology, specifically persistence diagrams, as inputs to machine learning techniques requires some mathematical creativity. The space of persistence diagrams does not have the desirable properties for machine learning, thus methods such as kernel methods and vectorization methods have been developed. One such featurization of persistence diagrams by Perea, Munch and Khasawneh uses continuous, compactly supported functions, referred to as "template functions," which results in a stable vector representation of the persistence diagram. In this paper, we provide a method of adaptively partitioning persistence diagrams to improve these featurizations based on localized information in the diagrams. Additionally, we provide a framework to adaptively select parameters required for the template functions in order to best utilize the partitioning method. We present results for application to example data sets comparing classification results between template function featurizations with and without partitioning, in addition to other methods from the literature.
Submitted 18 October, 2019;
originally announced October 2019.
-
Probabilistic Convergence and Stability of Random Mapper Graphs
Authors:
Adam Brown,
Omer Bobrowski,
Elizabeth Munch,
Bei Wang
Abstract:
We study the probabilistic convergence between the mapper graph and the Reeb graph of a topological space $\mathbb{X}$ equipped with a continuous function $f: \mathbb{X} \rightarrow \mathbb{R}$. We first give a categorification of the mapper graph and the Reeb graph by interpreting them in terms of cosheaves and stratified covers of the real line $\mathbb{R}$. We then introduce a variant of the classic mapper graph of Singh et al. (2007), referred to as the enhanced mapper graph, and demonstrate that such a construction approximates the Reeb graph of $(\mathbb{X}, f)$ when it is applied to points randomly sampled from a probability density function concentrated on $(\mathbb{X}, f)$.
Our techniques are based on the interleaving distance of constructible cosheaves and topological estimation via kernel density estimates. Following Munch and Wang (2018), we first show that the mapper graph of $(\mathbb{X}, f)$, a constructible $\mathbb{R}$-space (with a fixed open cover), approximates the Reeb graph of the same space. We then construct an isomorphism between the mapper of $(\mathbb{X},f)$ and the mapper of a super-level set of a probability density function concentrated on $(\mathbb{X}, f)$. Finally, building on the approach of Bobrowski et al. (2017), we show that, with high probability, we can recover the mapper of the super-level set given a sufficiently large sample. Our work is the first to consider the mapper construction using the theory of cosheaves in a probabilistic setting. It is part of an ongoing effort to combine sheaf theory, probability, and statistics to support topological data analysis with random data.
Submitted 14 August, 2020; v1 submitted 8 September, 2019;
originally announced September 2019.
-
A Structural Average of Labeled Merge Trees for Uncertainty Visualization
Authors:
Lin Yan,
Yusu Wang,
Elizabeth Munch,
Ellen Gasparovic,
Bei Wang
Abstract:
Physical phenomena in science and engineering are frequently modeled using scalar fields. In scalar field topology, graph-based topological descriptors such as merge trees, contour trees, and Reeb graphs are commonly used to characterize topological changes in the (sub)level sets of scalar fields. One of the biggest challenges and opportunities to advance topology-based visualization is to understand and incorporate uncertainty into such topological descriptors to effectively reason about their underlying data. In this paper, we study a structural average of a set of labeled merge trees and use it to encode uncertainty in data. Specifically, we compute a 1-center tree that minimizes its maximum distance to any other tree in the set under a well-defined metric called the interleaving distance. We provide heuristic strategies that compute structural averages of merge trees whose labels do not fully agree. We further provide an interactive visualization system that resembles a numerical calculator that takes as input a set of merge trees and outputs a tree as their structural average. We also highlight structural similarities between the input and the average and incorporate uncertainty information for visual exploration. We develop a novel measure of uncertainty, referred to as consistency, via a metric-space view of the input trees. Finally, we demonstrate an application of our framework through merge trees that arise from ensembles of scalar fields. Our work is the first to employ interleaving distances and consistency to study a global, mathematically rigorous, structural average of merge trees in the context of uncertainty visualization.
Submitted 8 October, 2019; v1 submitted 31 July, 2019;
originally announced August 2019.
-
Intrinsic Interleaving Distance for Merge Trees
Authors:
Ellen Gasparovic,
Elizabeth Munch,
Steve Oudot,
Katharine Turner,
Bei Wang,
Yusu Wang
Abstract:
Merge trees are a type of graph-based topological summary that tracks the evolution of connected components in the sublevel sets of scalar functions. They enjoy widespread applications in data analysis and scientific visualization. In this paper, we consider the problem of comparing two merge trees via the notion of interleaving distance in the metric space setting. We investigate various theoretical properties of such a metric. In particular, we show that the interleaving distance is intrinsic on the space of labeled merge trees and provide an algorithm to construct metric 1-centers for collections of labeled merge trees. We further prove that the intrinsic property of the interleaving distance also holds for the space of unlabeled merge trees. Our results are a first step toward performing statistics on graph-based topological summaries.
Submitted 2 February, 2022; v1 submitted 31 July, 2019;
originally announced August 2019.
-
Persistent Homology of Complex Networks for Dynamic State Detection
Authors:
Audun Myers,
Elizabeth Munch,
Firas A. Khasawneh
Abstract:
In this paper we develop a novel Topological Data Analysis (TDA) approach for studying graph representations of time series of dynamical systems. Specifically, we show how persistent homology, a tool from TDA, can be used to yield a compressed, multi-scale representation of the graph that can distinguish between dynamic states such as periodic and chaotic behavior. We show the approach for two graph constructions obtained from the time series. In the first approach the time series is embedded into a point cloud which is then used to construct an undirected $k$-nearest neighbor graph. The second construction relies on the recently developed ordinal partition framework. In either case, a pairwise distance matrix is then calculated using the shortest path between the graph's nodes, and this matrix is utilized to define a filtration of a simplicial complex that enables tracking the changes in homology classes over the course of the filtration. These changes are summarized in a persistence diagram, a two-dimensional summary of changes in the topological features. We then extract existing as well as new geometric and entropy point summaries from the persistence diagram and compare them to other commonly used network characteristics. Our results show that persistence-based point summaries yield a clearer distinction of the dynamic behavior and are more robust to noise than existing graph-based scores, especially when combined with ordinal graphs.
Submitted 27 January, 2020; v1 submitted 15 April, 2019;
originally announced April 2019.
-
Approximating Continuous Functions on Persistence Diagrams Using Template Functions
Authors:
Jose A. Perea,
Elizabeth Munch,
Firas A. Khasawneh
Abstract:
The persistence diagram is an increasingly useful tool from Topological Data Analysis, but its use alongside typical machine learning techniques requires mathematical finesse. The most success to date has come from methods that map persistence diagrams into vector spaces, in a way which maximizes the structure preserved. This process is commonly referred to as featurization. In this paper, we describe a mathematical framework for featurization called template functions, and we show that it addresses the problem of approximating continuous functions on compact subsets of the space of persistence diagrams. Specifically, we begin by characterizing relative compactness with respect to the bottleneck distance, and then provide explicit theoretical methods for constructing compact-open dense subsets of continuous functions on persistence diagrams. These dense subsets, obtained via template functions, are leveraged for supervised learning tasks with persistence diagrams. Specifically, we test the method for classification and regression algorithms on several examples including shape data and dynamical systems.
Submitted 12 April, 2022; v1 submitted 19 February, 2019;
originally announced February 2019.
-
Using Persistent Homology to Quantify a Diurnal Cycle in Hurricane Felix
Authors:
Sarah Tymochko,
Elizabeth Munch,
Jason Dunion,
Kristen Corbosiero,
Ryan Torn
Abstract:
The diurnal cycle of tropical cyclones (TCs) is a daily cycle in clouds that appears in satellite images and may have implications for TC structure and intensity. The diurnal pattern can be seen in infrared (IR) satellite imagery as cyclical pulses in the cloud field that propagate radially outward from the center of nearly all Atlantic-basin TCs. These diurnal pulses, a distinguishing characteristic of the TC diurnal cycle, begin forming in the storm's inner core near sunset each day and appear as a region of cooling cloud-top temperatures. The area of cooling takes on a ring-like appearance as cloud-top warming occurs on its inside edge and the cooling moves away from the storm overnight, reaching several hundred kilometers from the circulation center by the following afternoon. The state-of-the-art TC diurnal cycle measurement has a limited ability to analyze the behavior beyond qualitative observations. We present a method for quantifying the TC diurnal cycle using one-dimensional persistent homology, a tool from Topological Data Analysis, by tracking maximum persistence and quantifying the cycle using the discrete Fourier transform. Using Geostationary Operational Environmental Satellite IR imagery data from Hurricane Felix (2007), our method is able to detect an approximate daily cycle.
Submitted 10 January, 2020; v1 submitted 16 February, 2019;
originally announced February 2019.
-
Computing Wasserstein Distance for Persistence Diagrams on a Quantum Computer
Authors:
Jesse J. Berwald,
Joel M. Gottlieb,
Elizabeth Munch
Abstract:
Persistence diagrams are a useful tool from topological data analysis which can be used to provide a concise description of a filtered topological space. What makes them even more useful in practice is that they come with a notion of a metric, the Wasserstein distance (closely related to but not the same as the homonymous metric from probability theory). Further, this metric provides a notion of stability; that is, small noise in the input causes at worst small differences in the output. In this paper, we show that the Wasserstein distance for persistence diagrams can be computed through quantum annealing. We provide a formulation of the problem as a Quadratic Unconstrained Binary Optimization problem, or QUBO, and prove correctness. Finally, we test our algorithm, exploring parameter choices and problem size capabilities, using a D-Wave 2000Q quantum annealing computer.
Submitted 2 November, 2018; v1 submitted 17 September, 2018;
originally announced September 2018.
-
Topological Data Analysis for True Step Detection in Piecewise Constant Signals
Authors:
Firas A. Khasawneh,
Elizabeth Munch
Abstract:
This paper introduces a simple yet powerful approach based on topological data analysis (TDA) for detecting the true steps in a piecewise constant (PWC) signal. The signal is a two-state square wave with randomly varying in-between-pulse spacing, and subject to spurious steps at the rising or falling edges which we refer to as digital ringing. We use persistent homology to derive mathematical guarantees for the resulting change detection which enables accurate identification and counting of the true pulses. The approach is described and tested using both synthetic and experimental data obtained using an engine lathe instrumented with a laser tachometer. The described algorithm enables the accurate calculation of the spindle speed with the appropriate error bounds. The results of the described approach are compared to the frequency domain approach via Fourier transform. It is found that both our approach and the Fourier analysis yield comparable results for numerical and experimental pulses with regular spacing and digital ringing. However, the described approach significantly outperforms Fourier analysis when the spacing between the peaks is varied. We also generalize the approach to higher dimensional PWC signals, although utilizing this extension remains an interesting question for future research.
Submitted 1 May, 2018;
originally announced May 2018.
-
Chatter Classification in Turning Using Machine Learning and Topological Data Analysis
Authors:
Firas A. Khasawneh,
Elizabeth Munch,
Jose A. Perea
Abstract:
Chatter identification and detection in machining processes has been an active area of research in the past two decades. Part of the challenge in studying chatter is that machining equations that describe its occurrence are often nonlinear delay differential equations. The majority of the available tools for chatter identification rely on defining a metric that captures the characteristics of chatter, and a threshold that signals its occurrence. The difficulty in choosing these parameters can be somewhat alleviated by utilizing machine learning techniques. However, even with a successful classification algorithm, the transferability of typical machine learning methods from one data set to another remains very limited. In this paper we combine supervised machine learning with Topological Data Analysis (TDA) to obtain a descriptor of the process which can detect chatter. The features we use are derived from the persistence diagram of an attractor reconstructed from the time series via Takens embedding. We test the approach using deterministic and stochastic turning models, where the stochasticity is introduced via the cutting coefficient term. Our results show a 97% successful classification rate on the deterministic model labeled by the stability diagram obtained using the spectral element method. The features gleaned from the deterministic model are then utilized for characterization of chatter in a stochastic turning model where there are very limited analysis methods.
Submitted 23 March, 2018;
originally announced April 2018.
-
The $\ell^\infty$-Cophenetic Metric for Phylogenetic Trees as an Interleaving Distance
Authors:
Elizabeth Munch,
Anastasios Stefanou
Abstract:
There are many metrics available to compare phylogenetic trees since this is a fundamental task in computational biology. In this paper, we focus on one such metric, the $\ell^\infty$-cophenetic metric introduced by Cardona et al. This metric works by representing a phylogenetic tree with $n$ labeled leaves as a point in $\mathbb{R}^{n(n+1)/2}$ known as the cophenetic vector, then comparing the two resulting Euclidean points using the $\ell^\infty$ distance. Meanwhile, the interleaving distance is a formal categorical construction generalized from the definition of Chazal et al., originally introduced to compare persistence modules arising from the field of topological data analysis. We show that the $\ell^\infty$-cophenetic metric is an example of an interleaving distance. To do this, we define phylogenetic trees as a category of merge trees with some additional structure; namely labelings on the leaves plus a requirement that morphisms respect these labels. Then we can use the definition of a flow on this category to give an interleaving distance. Finally, we show that, because of the additional structure given by the categories defined, the map sending a labeled merge tree to the cophenetic vector is, in fact, an isometric embedding, thus proving that the $\ell^\infty$-cophenetic metric is, in fact, an interleaving distance.
Submitted 28 February, 2018;
originally announced March 2018.
-
Convergence between Categorical Representations of Reeb Space and Mapper
Authors:
Elizabeth Munch,
Bei Wang
Abstract:
The Reeb space, which generalizes the notion of a Reeb graph, is one of the few tools in topological data analysis and visualization suitable for the study of multivariate scientific datasets. First introduced by Edelsbrunner et al., it compresses the components of the level sets of a multivariate mapping and obtains a summary representation of their relationships. A related construction called mapper, and a special case of the mapper construction called the Joint Contour Net (JCN), have been shown to be effective in visual analytics. Mapper and JCN are intuitively regarded as discrete approximations of the Reeb space, though without formal proofs or approximation guarantees. An open question has been proposed by Dey et al. as to whether the mapper construction converges to the Reeb space in the limit.
In this paper, we are interested in developing the theoretical understanding of the relationship between the Reeb space and its discrete approximations to support its use in practical data analysis. Using tools from category theory, we formally prove the convergence between the Reeb space and mapper in terms of an interleaving distance between their categorical representations. Given a sequence of refined discretizations, we prove that these approximations converge to the Reeb space in the interleaving distance; this also helps to quantify the approximation quality of the discretization at a fixed resolution.
Submitted 12 April, 2016; v1 submitted 13 December, 2015;
originally announced December 2015.
-
Categorified Reeb Graphs
Authors:
Vin de Silva,
Elizabeth Munch,
Amit Patel
Abstract:
The Reeb graph is a construction which originated in Morse theory to study a real-valued function defined on a topological space. More recently, it has been used in various applications to study noisy data, which creates a desire to define a measure of similarity between these structures. Here, we exploit the fact that the category of Reeb graphs is equivalent to the category of a particular class of cosheaf. Using this equivalence, we can define an 'interleaving' distance between Reeb graphs which is stable under the perturbation of a function. Along the way, we obtain a natural construction for smoothing a Reeb graph to reduce its topological complexity. The smoothed Reeb graph can be constructed in polynomial time.
Submitted 16 January, 2015;
originally announced January 2015.
-
Strong Equivalence of the Interleaving and Functional Distortion Metrics for Reeb Graphs
Authors:
Ulrich Bauer,
Elizabeth Munch,
Yusu Wang
Abstract:
The Reeb graph is a construction that studies a topological space through the lens of a real-valued function. It has been widely used in applications; however, its use on real data means that it is desirable and increasingly necessary to have methods for comparison of Reeb graphs. Recently, several methods to define metrics on the space of Reeb graphs have been presented. In this paper, we focus on two: the functional distortion distance and the interleaving distance. The former is based on the Gromov--Hausdorff distance, while the latter utilizes the equivalence between Reeb graphs and a particular class of cosheaves. However, both are defined by constructing a near-isomorphism between the two graphs of study. In this paper, we show that the two metrics are strongly equivalent on the space of Reeb graphs. In particular, this gives an immediate proof of bottleneck stability for persistence diagrams in terms of the Reeb graph interleaving distance.
Submitted 20 December, 2014;
originally announced December 2014.
-
Probabilistic Fréchet Means for Time Varying Persistence Diagrams
Authors:
Elizabeth Munch,
Katharine Turner,
Paul Bendich,
Sayan Mukherjee,
Jonathan Mattingly,
John Harer
Abstract:
In order to use persistence diagrams as a true statistical tool, it would be very useful to have a good notion of mean and variance for a set of diagrams. In 2011, Mileyko and his collaborators made the first study of the properties of the Fréchet mean in $(\mathcal{D}_p,W_p)$, the space of persistence diagrams equipped with the p-th Wasserstein metric. In particular, they showed that the Fréchet mean of a finite set of diagrams always exists, but is not necessarily unique. The means of a continuously-varying set of diagrams do not themselves (necessarily) vary continuously, which presents obvious problems when trying to extend the Fréchet mean definition to the realm of vineyards.
We fix this problem by altering the original definition of Fréchet mean so that it now becomes a probability measure on the set of persistence diagrams; in a nutshell, the mean of a set of diagrams will be a weighted sum of atomic measures, where each atom is itself a persistence diagram determined using a perturbation of the input diagrams. This definition gives for each $N$ a map $(\mathcal{D}_p)^N \to \mathbb{P}(\mathcal{D}_p)$. We show that this map is Hölder continuous on finite diagrams and thus can be used to build a useful statistic on time-varying persistence diagrams, better known as vineyards.
Submitted 17 November, 2014; v1 submitted 24 July, 2013;
originally announced July 2013.
-
Failure Filtrations for Fenced Sensor Networks
Authors:
Elizabeth Munch,
Michael Shapiro,
John Harer
Abstract:
In this paper we consider the question of sensor network coverage for a 2-dimensional domain. We seek to compute the probability that a set of sensors fails to cover the domain, given only non-metric, local (who is talking to whom) information and a probability distribution of failure of each node. This builds on the work of de Silva and Ghrist, who analyzed this problem in the deterministic situation. We first show that it is part of a slightly larger class of problems which is #P-complete, and thus fast algorithms likely do not exist unless P$=$NP. We then give a deterministic algorithm which is feasible in the case of a small set of sensors, and give a dynamic algorithm for an arbitrary set of sensors failing over time which utilizes a new criterion for coverage based on the one proposed by de Silva and Ghrist. These algorithms build on the theory of topological persistence.
Submitted 29 September, 2011;
originally announced September 2011.