A linear size index for approximate pattern matching

HL Chan, TW Lam, WK Sung, SL Tam… - … Pattern Matching: 17th …, 2006 - Springer
HL Chan, TW Lam, WK Sung, SL Tam, SS Wong
Combinatorial Pattern Matching: 17th Annual Symposium, CPM 2006, Barcelona …, 2006Springer
This paper revisits the problem of indexing a text S 1.. n to support searching substrings in S
that match a given pattern P 1.. m with at most k errors. A naive solution either has a worst-
case matching time complexity of Ω (mk) or requires Ω (nk) space. Devising a solution with
better performance has been a challenge until Cole et al. 5 showed an O (n log kn)-space
index that can support k-error matching in O (m+ occ+ log kn loglog n) time, where occ is the
number of occurrences. Motivated by the indexing of DNA, we investigate in this paper the …
Abstract
This paper revisits the problem of indexing a text S[1..n] to support searching substrings in S that match a given pattern P[1..m] with at most k errors. A naive solution either has a worst-case matching time complexity of Ω(m k ) or requires Ω(n k ) space. Devising a solution with better performance has been a challenge until Cole et al. [5] showed an O(n log k n)-space index that can support k-error matching in O(m + occ + log k n loglogn) time, where occ is the number of occurrences. Motivated by the indexing of DNA, we investigate in this paper the feasibility of devising a linear-size index that still has a time complexity linear in m. In particular, we give an O(n)-space index that supports k-error matching in O(m + occ + (logn) loglogn) worst-case time. Furthermore, the index can be compressed from O(n) words into O(n) bits with a slight increase in the time complexity.
Springer