-
A Unified Framework for Rank-based Evaluation Metrics for Link Prediction in Knowledge Graphs
Authors:
Charles Tapley Hoyt,
Max Berrendorf,
Mikhail Galkin,
Volker Tresp,
Benjamin M. Gyori
Abstract:
The link prediction task on knowledge graphs without explicit negative triples in the training data motivates the usage of rank-based metrics. Here, we review existing rank-based metrics and propose desiderata for improved metrics to address the lack of interpretability of existing metrics and their lack of comparability across datasets of different sizes and properties. We introduce a simple theoretical framework for rank-based metrics upon which we investigate two avenues for improvements to existing metrics via alternative aggregation functions and concepts from probability theory. We finally propose several new rank-based metrics that are more easily interpreted and compared, accompanied by a demonstration of their usage in a benchmarking of knowledge graph embedding models.
Submitted 19 April, 2022; v1 submitted 14 March, 2022;
originally announced March 2022.
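As a rough illustration of what "rank-based metrics" means in the entry above, the sketch below computes three commonly used ones (mean rank, mean reciprocal rank, and hits at k) from a list of ranks. The abstract does not say which metrics the paper reviews or which new ones it proposes, so the choice of these three, the function names, and the sample ranks are all assumptions made for illustration.

```python
from statistics import mean


def mean_rank(ranks):
    """Arithmetic mean of the ranks (lower is better)."""
    return mean(ranks)


def mean_reciprocal_rank(ranks):
    """Arithmetic mean of 1/rank (higher is better, bounded by 1)."""
    return mean(1 / r for r in ranks)


def hits_at_k(ranks, k):
    """Fraction of ranks that are at most k (higher is better)."""
    return sum(r <= k for r in ranks) / len(ranks)


# Hypothetical ranks of the true entity for four test triples (1 = best).
ranks = [1, 2, 4, 10]

print(mean_rank(ranks))
print(mean_reciprocal_rank(ranks))
print(hits_at_k(ranks, k=3))
```

Mean rank grows with the number of candidate entities, which is one way the values can become hard to compare between datasets of different sizes.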
-
ChemicalX: A Deep Learning Library for Drug Pair Scoring
Authors:
Benedek Rozemberczki,
Charles Tapley Hoyt,
Anna Gogleva,
Piotr Grabowski,
Klas Karis,
Andrej Lamov,
Andriy Nikolov,
Sebastian Nilsson,
Michael Ughetto,
Yu Wang,
Tyler Derr,
Benjamin M Gyori
Abstract:
In this paper, we introduce ChemicalX, a PyTorch-based deep learning library designed for providing a range of state-of-the-art models to solve the drug pair scoring task. The primary objective of the library is to make deep drug pair scoring models accessible to machine learning researchers and practitioners in a streamlined framework. The design of ChemicalX reuses existing high-level model training utilities, geometric deep learning, and deep chemistry layers from the PyTorch ecosystem. Our system provides neural network layers, custom pair scoring architectures, data loaders, and batch iterators for end users. We showcase these features with example code snippets and case studies to highlight the characteristics of ChemicalX. A range of experiments on real-world drug-drug interaction, polypharmacy side effect, and combination synergy prediction tasks demonstrate that the models available in ChemicalX are effective at solving the pair scoring task. Finally, we show that ChemicalX could be used to train and score machine learning models on large drug pair datasets with hundreds of thousands of compounds on commodity hardware.
Submitted 26 May, 2022; v1 submitted 10 February, 2022;
originally announced February 2022.
-
A Simple Standard for Sharing Ontological Mappings (SSSOM)
Authors:
Nicolas Matentzoglu,
James P. Balhoff,
Susan M. Bello,
Chris Bizon,
Matthew Brush,
Tiffany J. Callahan,
Christopher G Chute,
William D. Duncan,
Chris T. Evelo,
Davera Gabriel,
John Graybeal,
Alasdair Gray,
Benjamin M. Gyori,
Melissa Haendel,
Henriette Harmse,
Nomi L. Harris,
Ian Harrow,
Harshad Hegde,
Amelia L. Hicks,
Charles T. Hoyt,
Dazhi Jiao,
Ernesto Jiménez-Ruiz,
Simon Jupp,
Hyeongsik Kim,
Sebastian Koehler,
et al. (19 additional authors not shown)
Abstract:
Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, are two terms equivalent or merely related? Are they narrow or broad matches? Are they associated in some other way? Such relationships between the mapped terms are often not documented, leading to incorrect assumptions and making them hard to use in scenarios that require a high degree of precision (such as diagnostics or risk prediction). Also, the lack of descriptions of how mappings were done makes it hard to combine and reconcile mappings, particularly curated and automated ones.
The Simple Standard for Sharing Ontological Mappings (SSSOM) addresses these problems by: 1. Introducing a machine-readable and extensible vocabulary to describe metadata that makes imprecision, inaccuracy and incompleteness in mappings explicit. 2. Defining an easy-to-use table-based format that can be integrated into existing data science pipelines without the need to parse or query ontologies, and that integrates seamlessly with Linked Data standards. 3. Implementing open and community-driven collaborative workflows designed to evolve the standard continuously to address changing requirements and mapping practices. 4. Providing reference tools and software libraries for working with the standard.
In this paper, we present the SSSOM standard, describe several use cases, and survey some existing work on standardizing the exchange of mappings, with the goal of making mappings Findable, Accessible, Interoperable, and Reusable (FAIR). The SSSOM specification is at http://w3id.org/sssom/spec.
Submitted 13 December, 2021;
originally announced December 2021.
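To make the idea of a table-based mapping format in the entry above more concrete, here is a minimal sketch of reading a tab-separated mapping table with only the Python standard library and keeping the rows marked as exact matches. The column names (`subject_id`, `predicate_id`, `object_id`), the SKOS predicate values, the skipping of `#` comment lines, and all identifiers in the sample data are assumptions made for illustration; the abstract itself does not specify them, and the SSSOM specification is the authority on the actual format.

```python
import csv
import io

# Hypothetical mapping table; every identifier here is invented.
EXAMPLE_TSV = (
    "# illustrative comment line, skipped by the loader\n"
    "subject_id\tpredicate_id\tobject_id\n"
    "EX:0001\tskos:exactMatch\tOTHER:9001\n"
    "EX:0002\tskos:broadMatch\tOTHER:9002\n"
    "EX:0003\tskos:relatedMatch\tOTHER:9003\n"
)


def load_mappings(text):
    """Parse a tab-separated mapping table into a list of dicts."""
    lines = [line for line in text.splitlines() if not line.startswith("#")]
    return list(csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t"))


def exact_matches(rows):
    """Keep only (subject, object) pairs whose predicate asserts equivalence."""
    return [
        (row["subject_id"], row["object_id"])
        for row in rows
        if row["predicate_id"] == "skos:exactMatch"
    ]


rows = load_mappings(EXAMPLE_TSV)
print(exact_matches(rows))
```

Because the relationship type sits in its own column, a pipeline that needs high precision can filter on it without parsing or querying an ontology.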
-
Probabilistic verification of partially observable dynamical systems
Authors:
Benjamin M. Gyori,
Daniel Paulin,
Sucheendra K. Palaniappan
Abstract:
The construction and formal verification of dynamical models are important in engineering, biology and other disciplines. We focus on non-linear models containing a set of parameters governing their dynamics. The values of these parameters are often unknown and not directly observable through measurements, which are themselves noisy. When treating parameters as random variables, one can constrain their distribution by conditioning on observations, thereby constructing a posterior probability distribution. We aim to perform model verification with respect to this posterior. The main difficulty in performing verification on a model under the posterior distribution is that, in general, it is difficult to obtain independent samples from the posterior, especially for non-linear dynamical models. Standard statistical model checking methods require independent realizations of the system and are therefore not applicable in this context.
We propose a statistical model checking framework based on Markov chain Monte Carlo, which produces a sequence of dependent random realizations of the model dynamics over the parameter posterior. Using this sequence of samples, we use statistical hypothesis tests to verify whether the model satisfies a bounded temporal logic property with a certain probability. We use sample size bounds tailored to the setting of dependent samples for fixed sample size and sequential tests. We apply our method to a case study from the domain of systems biology: a model of the JAK-STAT biochemical pathway. The pathway is modeled as a system of non-linear ODEs containing a set of unknown parameters. Noisy, indirect observations of the system state are available from an experiment. The results show that the proposed method enables probabilistic verification with respect to the parameter posterior with specified error bounds.
Submitted 17 April, 2015; v1 submitted 4 November, 2014;
originally announced November 2014.