A memory-efficient tree edit distance algorithm
Database and Expert Systems Applications: 25th International Conference, DEXA …, 2014•Springer
Hierarchical data are often modelled as trees. An interesting query identifies pairs of similar
trees. The standard approach to tree similarity is the tree edit distance, which has
successfully been applied in a wide range of applications. In terms of runtime, the state-of-
the-art for the tree edit distance is the RTED algorithm, which is guaranteed to be fast
independently of the tree shape. Unfortunately, this algorithm uses twice the memory of the
other, slower algorithms. The memory is quadratic in the tree size and is a bottleneck for the …
trees. The standard approach to tree similarity is the tree edit distance, which has
successfully been applied in a wide range of applications. In terms of runtime, the state-of-
the-art for the tree edit distance is the RTED algorithm, which is guaranteed to be fast
independently of the tree shape. Unfortunately, this algorithm uses twice the memory of the
other, slower algorithms. The memory is quadratic in the tree size and is a bottleneck for the …
Abstract
Hierarchical data are often modelled as trees. An interesting query identifies pairs of similar trees. The standard approach to tree similarity is the tree edit distance, which has successfully been applied in a wide range of applications. In terms of runtime, the state-of-the-art for the tree edit distance is the RTED algorithm, which is guaranteed to be fast independently of the tree shape. Unfortunately, this algorithm uses twice the memory of the other, slower algorithms. The memory is quadratic in the tree size and is a bottleneck for the tree edit distance computation.
In this paper we present a new, memory efficient algorithm for the tree edit distance. Our algorithm runs at least as fast as RTED, but requires only half the memory. This is achieved by systematically releasing memory early during the first step of the algorithm, which computes a decomposition strategy and is the main memory bottleneck. We show the correctness of our approach and prove an upper bound for the memory usage. Our empirical evaluation confirms the low memory requirements and shows that in practice our algorithm performs better than the analytic guarantees suggest.
Springer
Showing the best result for this search. See all results