Approximate pattern matching with k-mismatches in packed text

Giaquinta, Emanuele; Grabowski, Szymon; Fredriksson, Kimmo

arXiv:1211.5433 (cs)

[Submitted on 23 Nov 2012 (v1), last revised 31 Jul 2013 (this version, v3)]

Abstract: Given strings $P$ of length $m$ and $T$ of length $n$ over an alphabet of size $\sigma$, the string matching with $k$-mismatches problem is to find the positions of all the substrings in $T$ that are at Hamming distance at most $k$ from $P$. If $T$ can be read only one character at a time, the best known bounds are $O(n\sqrt{k\log k})$ and $O(n + n\sqrt{k/w}\log k)$ in the word-RAM model with word length $w$. In the RAM models (including $AC^0$ and word-RAM) it is possible to read up to $\lfloor w / \log \sigma \rfloor$ characters in constant time if the characters of $T$ are encoded using $\lceil \log \sigma \rceil$ bits. The only solution for $k$-mismatches in packed text works in $O((n \log\sigma/\log n)\lceil m \log (k + \log n / \log\sigma) / w \rceil + n^{\varepsilon})$ time, for any $\varepsilon > 0$. We present an algorithm that runs in time $O(\frac{n}{\lfloor w/(m\log\sigma) \rfloor} (1 + \log \min(k,\sigma) \log m / \log\sigma))$ in the $AC^0$ model if $m=O(w / \log\sigma)$ and $T$ is given packed. We also describe a simpler variant that runs in time $O(\frac{n}{\lfloor w/(m\log\sigma) \rfloor}\log \min(m, \log w / \log\sigma))$ in the word-RAM model. The algorithms improve the existing bound for $w = \Omega(\log^{1+\epsilon}n)$, for any $\epsilon > 0$. Based on the introduced technique, we present algorithms for several other approximate matching problems.
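
To make the problem statement and the packed encoding concrete, here is a minimal Python sketch. It is not the algorithm of the paper: it is a naive baseline that packs each character into $\lceil \log \sigma \rceil$ bits, XORs the packed pattern against each packed text window, and counts the fields that differ. The function names (pack, mismatches, k_mismatch_positions) and the sample strings are assumptions made for illustration only. Python integers are unbounded, so a single int stands in for a machine word here; on a real word-RAM the window would need to satisfy $m \lceil \log \sigma \rceil \le w$.

```python
from math import ceil, log2


def pack(s, alphabet):
    """Pack s into one integer using b = ceil(log2(sigma)) bits per character.

    The first character occupies the lowest-order field. Returns (word, b).
    """
    b = max(1, ceil(log2(len(alphabet))))
    code = {c: i for i, c in enumerate(alphabet)}
    word = 0
    for i, c in enumerate(s):
        word |= code[c] << (i * b)
    return word, b


def mismatches(x, y, m, b):
    """Count the b-bit fields (out of m) in which packed words x and y differ."""
    d = x ^ y  # a field is non-zero exactly where the characters differ
    folded = 0
    for j in range(b):
        folded |= d >> j  # OR every bit of a field into that field's lowest bit
    low_bits = sum(1 << (i * b) for i in range(m))  # lowest bit of each field
    return bin(folded & low_bits).count("1")


def k_mismatch_positions(pattern, text, k, alphabet):
    """Naive baseline: all i with Hamming(pattern, text[i:i+m]) <= k."""
    m, n = len(pattern), len(text)
    p, b = pack(pattern, alphabet)
    t, _ = pack(text, alphabet)
    window_mask = (1 << (m * b)) - 1
    return [
        i
        for i in range(n - m + 1)
        if mismatches(p, (t >> (i * b)) & window_mask, m, b) <= k
    ]


if __name__ == "__main__":
    print(k_mismatch_positions("ACGT", "ACGAACTTACGT", 1, "ACGT"))  # [0, 4, 8]
```

The sketch still examines every alignment separately, so it takes time proportional to $n$ word operations per bit of a field. The bounds in the abstract come from processing $\lfloor w/(m\log\sigma) \rfloor$ alignments per word, which this baseline does not attempt.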

Comments:	This paper is an extended version of the article that appeared in Information Processing Letters 113(19-21):693-697 (2013)
Subjects:	Data Structures and Algorithms (cs.DS)
Cite as:	arXiv:1211.5433 [cs.DS]
	(or arXiv:1211.5433v3 [cs.DS] for this version)
	https://doi.org/10.48550/arXiv.1211.5433

Submission history

From: Emanuele Giaquinta
[v1] Fri, 23 Nov 2012 08:30:45 UTC (20 KB)
[v2] Fri, 4 Jan 2013 12:02:24 UTC (23 KB)
[v3] Wed, 31 Jul 2013 12:45:11 UTC (25 KB)
