
Showing 1–49 of 49 results for author: Radoszewski, J

  1. arXiv:2402.14550  [pdf, other]

    cs.DS

    Approximate Circular Pattern Matching under Edit Distance

    Authors: Panagiotis Charalampopoulos, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń, Wiktor Zuba

    Abstract: In the $k$-Edit Circular Pattern Matching ($k$-Edit CPM) problem, we are given a length-$n$ text $T$, a length-$m$ pattern $P$, and a positive integer threshold $k$, and we are to report all starting positions of the substrings of $T$ that are at edit distance at most $k$ from some cyclic rotation of $P$. In the decision version of the problem, we are to check if any such substring exists. Very re…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: Full version of a paper accepted to STACS 2024

  2. arXiv:2308.04289  [pdf, ps, other]

    cs.DS

    Linear Time Construction of Cover Suffix Tree and Applications

    Authors: Jakub Radoszewski

    Abstract: The Cover Suffix Tree (CST) of a string $T$ is the suffix tree of $T$ with additional explicit nodes corresponding to halves of square substrings of $T$. In the CST an explicit node corresponding to a substring $C$ of $T$ is annotated with two numbers: the number of non-overlapping consecutive occurrences of $C$ and the total number of positions in $T$ that are covered by occurrences of $C$ in…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: Accepted to ESA 2023. Abstract abridged to satisfy arxiv requirements

  3. arXiv:2208.08915  [pdf, other]

    cs.DS

    Approximate Circular Pattern Matching

    Authors: Panagiotis Charalampopoulos, Tomasz Kociumaka, Jakub Radoszewski, Solon P. Pissis, Wojciech Rytter, Tomasz Waleń, Wiktor Zuba

    Abstract: We consider approximate circular pattern matching (CPM, in short) under the Hamming and edit distance, in which we are given a length-$n$ text $T$, a length-$m$ pattern $P$, and a threshold $k>0$, and we are to report all starting positions of fragments of $T$ (called occurrences) that are at distance at most $k$ from some cyclic rotation of $P$. In the decision version of the problem, we are to c…

    Submitted 18 August, 2022; originally announced August 2022.

    Comments: Accepted to ESA 2022. Abstract abridged to meet arXiv requirements

  4. arXiv:2107.09206  [pdf, other]

    cs.DS

    Hardness of Detecting Abelian and Additive Square Factors in Strings

    Authors: Jakub Radoszewski, Wojciech Rytter, Juliusz Straszyński, Tomasz Waleń, Wiktor Zuba

    Abstract: We prove 3SUM-hardness (no strongly subquadratic-time algorithm, assuming the 3SUM conjecture) of several problems related to finding Abelian square and additive square factors in a string. In particular, we conclude conditional optimality of the state-of-the-art algorithms for finding such factors. Overall, we show 3SUM-hardness of (a) detecting an Abelian square factor of an odd half-length, (…

    Submitted 19 July, 2021; originally announced July 2021.

    Comments: Accepted to ESA 2021

  5. arXiv:2105.03106  [pdf, other]

    cs.DS

    Faster Algorithms for Longest Common Substring

    Authors: Panagiotis Charalampopoulos, Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski

    Abstract: In the classic longest common substring (LCS) problem, we are given two strings $S$ and $T$, each of length at most $n$, over an alphabet of size $σ$, and we are asked to find a longest string occurring as a fragment of both $S$ and $T$. Weiner, in his seminal paper that introduced the suffix tree, presented an $\mathcal{O}(n \log σ)$-time algorithm for this problem [SWAT 1973]. For polynomially-b…

    Submitted 7 May, 2021; originally announced May 2021.

  6. arXiv:2007.13471  [pdf, ps, other]

    cs.DS

    Internal Quasiperiod Queries

    Authors: Maxime Crochemore, Costas Iliopoulos, Jakub Radoszewski, Wojciech Rytter, Juliusz Straszyński, Tomasz Waleń, Wiktor Zuba

    Abstract: Internal pattern matching requires one to answer queries about factors of a given string. Many results are known on answering internal period queries, asking for the periods of a given factor. In this paper we investigate (for the first time) internal queries asking for covers (also known as quasiperiods) of a given factor. We propose a data structure that answers such queries in…

    Submitted 27 July, 2020; originally announced July 2020.

    Comments: To appear in the SPIRE 2020 proceedings

  7. arXiv:2006.16137  [pdf, other]

    cs.DS

    Pattern Masking for Dictionary Matching

    Authors: Panagiotis Charalampopoulos, Huiping Chen, Peter Christen, Grigorios Loukides, Nadia Pisanti, Solon P. Pissis, Jakub Radoszewski

    Abstract: In the Pattern Masking for Dictionary Matching (PMDM) problem, we are given a dictionary $\mathcal{D}$ of $d$ strings, each of length $\ell$, a query string $q$ of length $\ell$, and a positive integer $z$, and we are asked to compute a smallest set $K\subseteq\{1,\ldots,\ell\}$, so that if $q[i]$, for all $i\in K$, is replaced by a wildcard, then $q$ matches at least $z$ strings from…

    Submitted 8 March, 2024; v1 submitted 29 June, 2020; originally announced June 2020.

    Comments: Published in Algorithmica. Abstract abridged due to arXiv requirements

  8. arXiv:2006.15999  [pdf, ps, other]

    cs.DS cs.DM

    The Number of Repetitions in 2D-Strings

    Authors: Panagiotis Charalampopoulos, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń, Wiktor Zuba

    Abstract: The notions of periodicity and repetitions in strings, and hence these of runs and squares, naturally extend to two-dimensional strings. We consider two types of repetitions in 2D-strings: 2D-runs and quartics (quartics are a 2D-version of squares in standard strings). Amir et al. introduced 2D-runs, showed that there are $O(n^3)$ of them in an $n \times n$ 2D-string and presented a simple constru…

    Submitted 29 June, 2020; originally announced June 2020.

    Comments: To appear in the ESA 2020 proceedings

  9. arXiv:2005.06329  [pdf, ps, other]

    cs.DS

    k-Approximate Quasiperiodicity under Hamming and Edit Distance

    Authors: Aleksander Kędzierski, Jakub Radoszewski

    Abstract: Quasiperiodicity in strings was introduced almost 30 years ago as an extension of string periodicity. The basic notions of quasiperiodicity are cover and seed. A cover of a text $T$ is a string whose occurrences in $T$ cover all positions of $T$. A seed of text $T$ is a cover of a superstring of $T$. In various applications exact quasiperiodicity is still not sufficient due to the presence of erro…

    Submitted 13 May, 2020; originally announced May 2020.

    Comments: accepted to CPM 2020
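
    The definition of a cover quoted in the abstract above can be illustrated with a minimal Python sketch. It checks the exact notion only (not the k-approximate variant the paper studies), and the function name `is_cover` is a hypothetical helper, not taken from the paper.

```python
def is_cover(c: str, t: str) -> bool:
    """Return True if the occurrences of c in t cover every position of t."""
    if not c or len(c) > len(t):
        return False
    covered_up_to = 0  # positions t[0:covered_up_to] are covered so far
    for i in range(len(t) - len(c) + 1):
        if i > covered_up_to:
            return False  # position covered_up_to falls in a gap
        if t.startswith(c, i):
            covered_up_to = i + len(c)
    return covered_up_to == len(t)


if __name__ == "__main__":
    print(is_cover("aba", "abaababaaba"))
    print(is_cover("ab", "abaab"))
```

    This is a naive scan that takes time proportional to the product of the two lengths in the worst case; it is meant only to make the definition concrete.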

  10. arXiv:2005.05681  [pdf, ps, other]

    cs.DS

    Counting Distinct Patterns in Internal Dictionary Matching

    Authors: Panagiotis Charalampopoulos, Tomasz Kociumaka, Manal Mohamed, Jakub Radoszewski, Wojciech Rytter, Juliusz Straszyński, Tomasz Waleń, Wiktor Zuba

    Abstract: We consider the problem of preprocessing a text $T$ of length $n$ and a dictionary $\mathcal{D}$ in order to be able to efficiently answer queries $CountDistinct(i,j)$, that is, given $i$ and $j$ return the number of patterns from $\mathcal{D}$ that occur in the fragment $T[i \mathinner{.\,.} j]$. The dictionary is internal in the sense that each pattern in $\mathcal{D}$ is given as a fragment of…

    Submitted 12 May, 2020; originally announced May 2020.

    Comments: Accepted to CPM 2020

  11. arXiv:2004.13389  [pdf, other]

    cs.DS

    Approximating longest common substring with $k$ mismatches: Theory and practice

    Authors: Garance Gourdel, Tomasz Kociumaka, Jakub Radoszewski, Tatiana Starikovskaya

    Abstract: In the problem of the longest common substring with $k$ mismatches we are given two strings $X, Y$ and must find the maximal length $\ell$ such that there is a length-$\ell$ substring of $X$ and a length-$\ell$ substring of $Y$ that differ in at most $k$ positions. The length $\ell$ can be used as a robust measure of similarity between $X, Y$. In this work, we develop new approximation algorithms…

    Submitted 28 April, 2020; originally announced April 2020.

  12. arXiv:1909.11577  [pdf, ps, other]

    cs.DS

    Internal Dictionary Matching

    Authors: Panagiotis Charalampopoulos, Tomasz Kociumaka, Manal Mohamed, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń

    Abstract: We introduce data structures answering queries concerning the occurrences of patterns from a given dictionary $\mathcal{D}$ in fragments of a given string $T$ of length $n$. The dictionary is internal in the sense that each pattern in $\mathcal{D}$ is given as a fragment of $T$. This way, $\mathcal{D}$ takes space proportional to the number of patterns $d=|\mathcal{D}|$ rather than their total len…

    Submitted 25 September, 2019; originally announced September 2019.

    Comments: A short version of this paper was accepted for presentation at ISAAC 2019

  13. arXiv:1909.11433  [pdf, ps, other]

    cs.DS

    Weighted Shortest Common Supersequence Problem Revisited

    Authors: Panagiotis Charalampopoulos, Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, Juliusz Straszyński, Tomasz Waleń, Wiktor Zuba

    Abstract: A weighted string, also known as a position weight matrix, is a sequence of probability distributions over some alphabet. We revisit the Weighted Shortest Common Supersequence (WSCS) problem, introduced by Amir et al. [SPIRE 2011], that is, the SCS problem on weighted strings. In the WSCS problem, we are given two weighted strings $W_1$ and $W_2$ and a threshold $\mathit{Freq}$ on probability, and…

    Submitted 25 September, 2019; originally announced September 2019.

    Comments: Accepted to SPIRE'19

  14. arXiv:1909.11336  [pdf, other]

    cs.DS

    Experimental Evaluation of Algorithms for Computing Quasiperiods

    Authors: Patryk Czajka, Jakub Radoszewski

    Abstract: Quasiperiodicity is a generalization of periodicity that was introduced in the early 1990s. Since then, dozens of algorithms for computing various types of quasiperiodicity were proposed. Our work is a step towards answering the question: "Which algorithm for computing quasiperiods to choose in practice?". The central notions of quasiperiodicity are covers and seeds. We implement algorithms for co…

    Submitted 25 September, 2019; originally announced September 2019.

  15. arXiv:1908.01664  [pdf, other]

    cs.DS

    On the cyclic regularities of strings

    Authors: Oluwole Ajala, Miznah Alshammary, Mai Alzamel, Jia Gao, Costas Iliopoulos, Jakub Radoszewski, Wojciech Rytter, Bruce Watson

    Abstract: Regularities in strings are often related to periods and covers, which have extensively been studied, and algorithms for their efficient computation have broad application. In this paper we concentrate on computing cyclic regularities of strings, in particular, we propose several efficient algorithms for computing: (i) cyclic periodicity; (ii) all cyclic periodicity; (iii) maximal local cyclic per…

    Submitted 5 August, 2019; originally announced August 2019.

  16. arXiv:1907.01815  [pdf, other]

    cs.DS

    Circular Pattern Matching with $k$ Mismatches

    Authors: Panagiotis Charalampopoulos, Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, Juliusz Straszyński, Tomasz Waleń, Wiktor Zuba

    Abstract: The $k$-mismatch problem consists in computing the Hamming distance between a pattern $P$ of length $m$ and every length-$m$ substring of a text $T$ of length $n$, if this distance is no more than $k$. In many real-world applications, any cyclic rotation of $P$ is a relevant pattern, and thus one is interested in computing the minimal distance of every length-$m$ substring of $T$ and any cyclic ro…

    Submitted 13 January, 2020; v1 submitted 3 July, 2019; originally announced July 2019.

    Comments: Extended version of a paper from FCT 2019

  17. arXiv:1901.11305  [pdf, other]

    cs.DS

    Quasi-Linear-Time Algorithm for Longest Common Circular Factor

    Authors: Mai Alzamel, Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, Juliusz Straszyński, Tomasz Waleń, Wiktor Zuba

    Abstract: We introduce the Longest Common Circular Factor (LCCF) problem in which, given strings $S$ and $T$ of length $n$, we are to compute the longest factor of $S$ whose cyclic shift occurs as a factor of $T$. It is a new similarity measure, an extension of the classic Longest Common Factor. We show how to solve the LCCF problem in $O(n \log^5 n)$ time.

    Submitted 31 January, 2019; originally announced January 2019.

    ACM Class: F.2.2
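
    To make the LCCF problem statement above concrete, here is a brute-force Python sketch. It is not the $O(n \log^5 n)$ algorithm from the paper; it simply tries every factor of $S$ against every rotation, and the function name is a hypothetical choice for illustration.

```python
def longest_common_circular_factor(s: str, t: str) -> str:
    """Return a longest factor of s such that some cyclic shift of it occurs in t."""
    for length in range(min(len(s), len(t)), 0, -1):
        # All length-`length` factors of t, for constant-time membership tests.
        t_factors = {t[j:j + length] for j in range(len(t) - length + 1)}
        for i in range(len(s) - length + 1):
            u = s[i:i + length]
            doubled = u + u
            # Every rotation of u is a length-`length` window of u + u.
            if any(doubled[r:r + length] in t_factors for r in range(length)):
                return u
    return ""


if __name__ == "__main__":
    print(longest_common_circular_factor("abcd", "xcdab"))
```

    The sketch runs in polynomial but far from quasi-linear time, which is the gap the paper's result closes.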

  18. arXiv:1812.08101  [pdf, ps, other]

    cs.DS

    Efficient Representation and Counting of Antipower Factors in Words

    Authors: Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, Juliusz Straszyński, Tomasz Waleń, Wiktor Zuba

    Abstract: A $k$-antipower (for $k \ge 2$) is a concatenation of $k$ pairwise distinct words of the same length. The study of fragments of a word being antipowers was initiated by Fici et al. (ICALP 2016) and first algorithms for computing such fragments were presented by Badkobeh et al. (Inf. Process. Lett., 2018). We address two open problems posed by Badkobeh et al. We propose efficient algorithms for cou…

    Submitted 10 May, 2020; v1 submitted 19 December, 2018; originally announced December 2018.

    Comments: Full version of a paper from LATA 2019
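
    The definition of a $k$-antipower given in the abstract above translates directly into a short check. The following Python sketch tests a single word only; it does not count or report antipower fragments as the paper does, and `is_antipower` is a hypothetical name.

```python
def is_antipower(w: str, k: int) -> bool:
    """Return True if w is a concatenation of k pairwise distinct equal-length words."""
    if k < 2 or not w or len(w) % k != 0:
        return False
    block = len(w) // k
    blocks = [w[i:i + block] for i in range(0, len(w), block)]
    # Pairwise distinct means the set of blocks has exactly k elements.
    return len(set(blocks)) == k


if __name__ == "__main__":
    print(is_antipower("abbaaa", 3))
    print(is_antipower("abbaab", 3))
```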

  19. arXiv:1807.11702  [pdf, ps, other]

    cs.DS

    Efficient Computation of Sequence Mappability

    Authors: Panagiotis Charalampopoulos, Costas S. Iliopoulos, Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, Juliusz Straszyński

    Abstract: In the $(k,m)$-mappability problem, for a given sequence $T$ of length $n$, the goal is to compute a table whose $i$th entry is the number of indices $j \ne i$ such that the length-$m$ substrings of $T$ starting at positions $i$ and $j$ have at most $k$ mismatches. Previous works on this problem focused on heuristics computing a rough approximation of the result or on the case of $k=1$. We present…

    Submitted 16 June, 2021; v1 submitted 31 July, 2018; originally announced July 2018.

    Comments: Accepted to SPIRE 2018

    ACM Class: F.2.2
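
    The $(k,m)$-mappability table defined in the abstract above can be computed naively as follows. This Python sketch is a quadratic-time baseline written only to illustrate the definition, not the algorithm from the paper; both function names are hypothetical, and positions are 0-based.

```python
def hamming_at_most(a: str, b: str, k: int) -> bool:
    """Return True if equal-length strings a and b differ in at most k positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > k:
                return False
    return True


def mappability(t: str, k: int, m: int) -> list[int]:
    """Entry i counts indices j != i whose length-m substring is within k mismatches."""
    starts = range(len(t) - m + 1)
    return [
        sum(1 for j in starts
            if j != i and hamming_at_most(t[i:i + m], t[j:j + m], k))
        for i in starts
    ]


if __name__ == "__main__":
    print(mappability("abab", 0, 2))
```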

  20. arXiv:1807.10483  [pdf, other]

    cs.DS

    Faster Recovery of Approximate Periods over Edit Distance

    Authors: Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, Juliusz Straszyński, Tomasz Waleń, Wiktor Zuba

    Abstract: The approximate period recovery problem asks to compute all $\textit{approximate word-periods}$ of a given word $S$ of length $n$: all primitive words $P$ ($|P|=p$) which have a periodic extension at edit distance smaller than $τ_p$ from $S$, where $τ_p = \lfloor \frac{n}{(3.75+ε)\cdot p} \rfloor$ for some $ε>0$. Here, the set of periodic extensions of $P$ consists of all finite prefixes of…

    Submitted 27 July, 2018; originally announced July 2018.

    Comments: Accepted to SPIRE 2018

  21. arXiv:1804.08731  [pdf, other]

    cs.DS

    Longest Common Substring Made Fully Dynamic

    Authors: Amihood Amir, Panagiotis Charalampopoulos, Solon P. Pissis, Jakub Radoszewski

    Abstract: In the longest common substring (LCS) problem, we are given two strings $S$ and $T$, each of length at most $n$, and we are asked to find a longest string occurring as a fragment of both $S$ and $T$. This is a classical and well-studied problem in computer science with a known $\mathcal{O}(n)$-time solution. In the fully dynamic version of the problem, edit operations are allowed in either of the…

    Submitted 16 July, 2018; v1 submitted 23 April, 2018; originally announced April 2018.

  22. arXiv:1804.06809  [pdf, ps, other]

    cs.DS

    On Abelian Longest Common Factor with and without RLE

    Authors: Szymon Grabowski, Tomasz Kociumaka, Jakub Radoszewski

    Abstract: We consider the Abelian longest common factor problem in two scenarios: when input strings are uncompressed and are of size $n$, and when the input strings are run-length encoded and their compressed representations have size at most $m$. The alphabet size is denoted by $σ$. For the uncompressed problem, we show an $o(n^2)$-time and $\mathcal{O}(n)$-space algorithm in the case of $σ=\mathcal{O}(1)$, making a non-…

    Submitted 18 April, 2018; originally announced April 2018.

    Comments: Submitted to a journal

    MSC Class: 68W32 ACM Class: F.2.2

  23. arXiv:1802.06369  [pdf, ps, other]

    cs.DS

    Linear-Time Algorithm for Long LCF with $k$ Mismatches

    Authors: Panagiotis Charalampopoulos, Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń

    Abstract: In the Longest Common Factor with $k$ Mismatches (LCF$_k$) problem, we are given two strings $X$ and $Y$ of total length $n$, and we are asked to find a pair of maximal-length factors, one of $X$ and the other of $Y$, such that their Hamming distance is at most $k$. Thankachan et al. show that this problem can be solved in $\mathcal{O}(n \log^k n)$ time and $\mathcal{O}(n)$ space for constant $k$.…

    Submitted 18 February, 2018; originally announced February 2018.

    Comments: submitted to CPM 2018

  24. arXiv:1801.01404  [pdf, other]

    cs.DS

    String Periods in the Order-Preserving Model

    Authors: Garance Gourdel, Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, Arseny Shur, Tomasz Waleń

    Abstract: The order-preserving model (op-model, in short) was introduced quite recently but has already attracted significant attention because of its applications in data analysis. We introduce several types of periods in this setting (op-periods). Then we give algorithms to compute these periods in time $O(n)$, $O(n\log\log n)$, $O(n \log^2 \log n/\log \log \log n)$, $O(n\log n)$ depending on the type of…

    Submitted 4 January, 2018; originally announced January 2018.

    Comments: Full version of a paper accepted to STACS 2018

  25. arXiv:1801.01096  [pdf, ps, other]

    cs.DM cs.DS math.CO

    On Periodicity Lemma for Partial Words

    Authors: Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń

    Abstract: We investigate the function $L(h,p,q)$, called here the threshold function, related to periodicity of partial words (words with holes). The value $L(h,p,q)$ is defined as the minimum length threshold which guarantees that a natural extension of the periodicity lemma is valid for partial words with $h$ holes and (strong) periods $p,q$. We show how to evaluate the threshold function in…

    Submitted 3 January, 2018; originally announced January 2018.

    Comments: Full version of a paper accepted to LATA 2018

  26. arXiv:1712.08573  [pdf, ps, other]

    cs.DS

    Longest common substring with approximately $k$ mismatches

    Authors: Tomasz Kociumaka, Jakub Radoszewski, Tatiana Starikovskaya

    Abstract: In the longest common substring problem, we are given two strings of length $n$ and must find a substring of maximal length that occurs in both strings. It is well known that the problem can be solved in linear time, but the solution is not robust and can vary greatly when the input strings are changed even by one character. To circumvent this, Leimeister and Morgenstern introduced the problem of…

    Submitted 18 August, 2018; v1 submitted 22 December, 2017; originally announced December 2017.

    Comments: extended version of a paper from CPM 2016 with corrected proofs

  27. arXiv:1705.04022  [pdf, ps, other]

    cs.DS

    Faster algorithms for 1-mappability of a sequence

    Authors: Mai Alzamel, Panagiotis Charalampopoulos, Costas S. Iliopoulos, Solon P. Pissis, Jakub Radoszewski, Wing-Kin Sung

    Abstract: In the k-mappability problem, we are given a string x of length n and integers m and k, and we are asked to count, for each length-m factor y of x, the number of other factors of length m of x that are at Hamming distance at most k from y. We focus here on the version of the problem where k = 1. The fastest known algorithm for k = 1 requires time O(mn log n/ log log n) and space O(n). We present t…

    Submitted 11 May, 2017; originally announced May 2017.

  28. arXiv:1704.07625  [pdf, ps, other]

    cs.DS

    Indexing Weighted Sequences: Neat and Efficient

    Authors: Carl Barton, Tomasz Kociumaka, Chang Liu, Solon P. Pissis, Jakub Radoszewski

    Abstract: In a weighted sequence, for every position of the sequence and every letter of the alphabet a probability of occurrence of this letter at this position is specified. Weighted sequences are commonly used to represent imprecise or uncertain data, for example, in molecular biology where they are known under the name of Position-Weight Matrices. Given a probability threshold $\frac1z$, we say t…

    Submitted 25 August, 2017; v1 submitted 25 April, 2017; originally announced April 2017.

    Comments: A new, even simpler version of the index

  29. arXiv:1703.08931  [pdf, ps, other]

    cs.DS

    Palindromic Decompositions with Gaps and Errors

    Authors: Michał Adamczyk, Mai Alzamel, Panagiotis Charalampopoulos, Costas S. Iliopoulos, Jakub Radoszewski

    Abstract: Identifying palindromes in sequences has been an interesting line of research in combinatorics on words and also in computational biology, after the discovery of the relation of palindromes in the DNA sequence with the HIV virus. Efficient algorithms for the factorization of sequences into palindromes and maximal palindromes have been devised in recent years. We extend these studies by allowing ga…

    Submitted 27 March, 2017; originally announced March 2017.

    Comments: accepted to CSR 2017

  30. arXiv:1703.00195  [pdf, ps, other]

    cs.FL cs.DM

    Two strings at Hamming distance 1 cannot be both quasiperiodic

    Authors: Amihood Amir, Costas S. Iliopoulos, Jakub Radoszewski

    Abstract: We present a generalization of a known fact from combinatorics on words related to periodicity into quasiperiodicity. A string is called periodic if it has a period which is at most half of its length. A string $w$ is called quasiperiodic if it has a non-trivial cover, that is, there exists a string $c$ that is shorter than $w$ and such that every position in $w$ is inside one of the occurrences o…

    Submitted 1 March, 2017; originally announced March 2017.

    Comments: 6 pages, 3 figures

  31. arXiv:1607.05626  [pdf, ps, other]

    cs.DS

    Streaming k-mismatch with error correcting and applications

    Authors: Jakub Radoszewski, Tatiana Starikovskaya

    Abstract: We present a new streaming algorithm for the $k$-Mismatch problem, one of the most basic problems in pattern matching. Given a pattern and a text, the task is to find all substrings of the text that are at the Hamming distance at most $k$ from the pattern. Our algorithm is enhanced with an important new feature called Error Correcting, and its complexities for $k=1$ and for a general $k$ are compa…

    Submitted 23 April, 2019; v1 submitted 19 July, 2016; originally announced July 2016.

  32. arXiv:1606.08275  [pdf, ps, other]

    cs.DS

    Near-Optimal Computation of Runs over General Alphabet via Non-Crossing LCE Queries

    Authors: Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Ritu Kundu, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń

    Abstract: Longest common extension queries (LCE queries) and runs are ubiquitous in algorithmic stringology. Linear-time algorithms computing runs and preprocessing for constant-time LCE queries have been known for over a decade. However, these algorithms assume a linearly-sortable integer alphabet. A recent breakthrough paper by Bannai et al. (SODA 2015) showed a link between the two notions: all the ru…

    Submitted 27 June, 2016; originally announced June 2016.

    ACM Class: F.2.2

  33. arXiv:1604.07581  [pdf, ps, other]

    cs.DS

    Pattern Matching and Consensus Problems on Weighted Sequences and Profiles

    Authors: Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski

    Abstract: We study pattern matching problems on two major representations of uncertain sequences used in molecular biology: weighted sequences (also known as position weight matrices, PWM) and profiles (i.e., scoring matrices). In the simple version, in which only the pattern or only the text is uncertain, we obtain efficient algorithms with theoretically-provable running times using a variation of the look…

    Submitted 11 July, 2016; v1 submitted 26 April, 2016; originally announced April 2016.

    Comments: 22 pages

  34. arXiv:1604.02238  [pdf, ps, other]

    cs.DM cs.FL

    Maximum Number of Distinct and Nonequivalent Nonstandard Squares in a Word

    Authors: Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń

    Abstract: The combinatorics of squares in a word depends on how the equivalence of halves of the square is defined. We consider Abelian squares, parameterized squares, and order-preserving squares. The word $uv$ is an Abelian (parameterized, order-preserving) square if $u$ and $v$ are equivalent in the Abelian (parameterized, order-preserving) sense. The maximum number of ordinary squares in a word is known…

    Submitted 8 April, 2016; originally announced April 2016.

    Comments: Preliminary version appeared at DLT 2014

    MSC Class: 68R15; 68R05; 68Q45; 05A05 ACM Class: G.2.1

  35. arXiv:1602.01116  [pdf, other]

    cs.DS

    Efficient Index for Weighted Sequences

    Authors: Carl Barton, Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski

    Abstract: The problem of finding factors of a text string which are identical or similar to a given pattern string is a central problem in computer science. A generalised version of this problem consists in implementing an index over the text to support efficient on-line pattern queries. We study this problem in the case where the text is weighted: for every position of the text and every letter of the alph…

    Submitted 2 February, 2016; originally announced February 2016.

    Comments: 14 pages

  36. On the Greedy Algorithm for the Shortest Common Superstring Problem with Reversals

    Authors: Gabriele Fici, Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń

    Abstract: We study a variation of the classical Shortest Common Superstring (SCS) problem in which a shortest superstring of a finite set of strings $S$ is sought containing as a factor every string of $S$ or its reversal. We call this problem Shortest Common Superstring with Reversals (SCS-R). This problem has been introduced by Jiang et al., who designed a greedy-like algorithm with length approximation r…

    Submitted 7 December, 2015; v1 submitted 26 November, 2015; originally announced November 2015.

    Comments: Published in Information Processing Letters

  37. Efficient Ranking of Lyndon Words and Decoding Lexicographically Minimal de Bruijn Sequence

    Authors: Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter

    Abstract: We give efficient algorithms for ranking Lyndon words of length $n$ over an alphabet of size $σ$. The rank of a Lyndon word is its position in the sequence of lexicographically ordered Lyndon words of the same length. The outputs are integers of exponential size, and complexity of arithmetic operations on such large integers cannot be ignored. Our model of computations is the word-RAM, in which ba…

    Submitted 11 December, 2023; v1 submitted 9 October, 2015; originally announced October 2015.

    Comments: Corrected an error in the proof of Theorem 32. Applied comments of reviewers from the journal submission

    Journal ref: SIAM J. Discret. Math. 30(4): 2027-2046 (2016)

  38. arXiv:1412.3696  [pdf, ps, other]

    cs.DS

    Covering Problems for Partial Words and for Indeterminate Strings

    Authors: Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń

    Abstract: We consider the problem of computing a shortest solid cover of an indeterminate string. An indeterminate string may contain non-solid symbols, each of which specifies a subset of the alphabet that could be present at the corresponding position. We also consider covering partial words, which are a special case of indeterminate strings where each non-solid symbol is a don't care symbol. We prove tha…

    Submitted 11 December, 2014; originally announced December 2014.

    Comments: full version (simplified and corrected); preliminary version appeared at ISAAC 2014; 14 pages, 4 figures

    MSC Class: 68W32 (Primary); 68Q25 (Secondary) ACM Class: F.2.2

  39. arXiv:1407.6144  [pdf, ps, other]

    cs.DS

    On the String Consensus Problem and the Manhattan Sequence Consensus Problem

    Authors: Tomasz Kociumaka, Jakub W. Pachocki, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń

    Abstract: In the Manhattan Sequence Consensus problem (MSC problem) we are given $k$ integer sequences, each of length $l$, and we are to find an integer sequence $x$ of length $l$ (called a consensus sequence), such that the maximum Manhattan distance of $x$ from each of the input sequences is minimized. For binary sequences Manhattan distance coincides with Hamming distance, hence in this case the string…

    Submitted 23 July, 2014; originally announced July 2014.

    Comments: accepted to SPIRE 2014

  40. arXiv:1401.0163  [pdf, ps, other]

    cs.DS

    Fast Algorithm for Partial Covers in Words

    Authors: Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, Solon P. Pissis, Tomasz Waleń

    Abstract: A factor $u$ of a word $w$ is a cover of $w$ if every position in $w$ lies within some occurrence of $u$ in $w$. A word $w$ covered by $u$ thus generalizes the idea of a repetition, that is, a word composed of exact concatenations of $u$. In this article we introduce a new notion of $α$-partial cover, which can be viewed as a relaxed variant of cover, that is, a factor covering at least $α$ positi…

    Submitted 31 December, 2013; originally announced January 2014.

  41. arXiv:1312.2381  [pdf, ps, other]

    cs.DS cs.FL

    A Note on the Longest Common Compatible Prefix Problem for Partial Words

    Authors: Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Marcin Kubica, Alessio Langiu, Jakub Radoszewski, Wojciech Rytter, Bartosz Szreder, Tomasz Waleń

    Abstract: For a partial word $w$ the longest common compatible prefix of two positions $i,j$, denoted $lccp(i,j)$, is the largest $k$ such that $w[i,i+k-1]\uparrow w[j,j+k-1]$, where $\uparrow$ is the compatibility relation of partial words (it is not an equivalence relation). The LCCP problem is to preprocess a partial word in such a way that any query $lccp(i,j)$ about this word can be answered in $O(1)$…

    Submitted 9 December, 2013; originally announced December 2013.

  42. Internal Pattern Matching Queries in a Text and Applications

    Authors: Tomasz Kociumaka, Jakub Radoszewski, Wojciech Rytter, Tomasz Waleń

    Abstract: We consider several types of internal queries, that is, questions about fragments of a given text $T$ specified in constant space by their locations in $T$. Our main result is an optimal data structure for Internal Pattern Matching (IPM) queries which, given two fragments $x$ and $y$, ask for a representation of all fragments contained in $y$ and matching $x$ exactly; this problem can be viewed as…

    Submitted 2 May, 2023; v1 submitted 25 November, 2013; originally announced November 2013.

    Comments: 42 pages, 13 figures; an updated version of a paper presented at SODA 2015

    MSC Class: 68W05 (Primary) 68P05 (Secondary) ACM Class: E.1; F.2.2

  43. arXiv:1303.6872  [pdf, other]

    cs.DS

    Order-Preserving Suffix Trees and Their Algorithmic Applications

    Authors: Maxime Crochemore, Costas S. Iliopoulos, Tomasz Kociumaka, Marcin Kubica, Alessio Langiu, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, Tomasz Walen

    Abstract: Recently Kubica et al. (Inf. Process. Lett., 2013) and Kim et al. (submitted to Theor. Comp. Sci.) introduced order-preserving pattern matching. In this problem we are looking for consecutive substrings of the text that have the same "shape" as a given pattern. These results include a linear-time order-preserving pattern matching algorithm for a polynomially bounded alphabet and an extension of this…

    Submitted 27 March, 2013; originally announced March 2013.

  44. arXiv:1208.3313  [pdf, ps, other]

    cs.DS cs.DM

    A Note on Efficient Computation of All Abelian Periods in a String

    Authors: Maxime Crochemore, Costas Iliopoulos, Tomasz Kociumaka, Marcin Kubica, Jakub Pachocki, Jakub Radoszewski, Wojciech Rytter, Wojciech Tyczyński, Tomasz Waleń

    Abstract: We derive a simple, efficient algorithm for computing Abelian periods, given all Abelian squares in a string. An efficient algorithm for the latter problem was given by Cummings and Smyth in 1997. Along the way, we show an alternative algorithm for Abelian squares. We also obtain a linear-time algorithm finding all "long" Abelian periods. The aim of the paper is a (new) reduction of the problem of all Abelian p…

    Submitted 16 August, 2012; originally announced August 2012.

    ACM Class: F.2.2

  45. arXiv:1107.2422  [pdf, ps, other]

    cs.DS

    A Linear Time Algorithm for Seeds Computation

    Authors: Tomasz Kociumaka, Marcin Kubica, Jakub Radoszewski, Wojciech Rytter, Tomasz Walen

    Abstract: A seed in a word is a relaxed version of a period in which the occurrences of the repeating subword may overlap. We show a linear-time algorithm computing a linear-size representation of all the seeds of a word (the number of seeds might be quadratic). In particular, one can easily derive the shortest seed and the number of seeds from our representation. Thus, we solve an open problem stated in th…

    Submitted 13 March, 2019; v1 submitted 12 July, 2011; originally announced July 2011.

    Comments: full version of a paper submitted to SODA 2012 with simplified algorithms and new combinatorial results

  46. arXiv:1104.3153  [pdf, ps, other]

    cs.DS

    Efficient Seeds Computation Revisited

    Authors: Michalis Christou, Maxime Crochemore, Costas S. Iliopoulos, Marcin Kubica, Solon P. Pissis, Jakub Radoszewski, Wojciech Rytter, Bartosz Szreder, Tomasz Walen

    Abstract: The notion of the cover is a generalization of a period of a string, and there are linear-time algorithms for finding the shortest cover. The seed is a more complicated generalization of periodicity: it is a cover of a superstring of a given string, and the shortest seed problem is of much higher algorithmic difficulty. The problem is not well understood, and no linear-time algorithm is known. In the…

    Submitted 15 April, 2011; originally announced April 2011.

    Comments: 14 pages, accepted to CPM 2011

  47. On the maximal sum of exponents of runs in a string

    Authors: Maxime Crochemore, Marcin Kubica, Jakub Radoszewski, Wojciech Rytter, Tomasz Walen

    Abstract: A run is an inclusion-maximal occurrence in a string (as a subinterval) of a repetition $v$ with a period $p$ such that $2p \le |v|$. The exponent of a run is defined as $|v|/p$ and is $\ge 2$. We show new bounds on the maximal sum of exponents of runs in a string of length $n$. Our upper bound of $4.1n$ is better than the best previously known proven bound of $5.6n$ by Crochemore & Ilie (2008). T…

    Submitted 25 March, 2010; originally announced March 2010.

    Comments: 7 pages, 1 figure

  48. On the maximal number of cubic subwords in a string

    Authors: Marcin Kubica, Jakub Radoszewski, Wojciech Rytter, Tomasz Walen

    Abstract: We investigate the problem of the maximum number of cubic subwords (of the form $www$) in a given word. We also consider square subwords (of the form $ww$). The problem of the maximum number of squares in a word is not well understood. Several new results related to this problem are produced in the paper. We consider two simple problems related to the maximum number of subwords which are squares…

    Submitted 6 November, 2009; originally announced November 2009.

    Comments: 14 pages

  49. arXiv:0907.2157  [pdf, ps, other]

    cs.DS cs.DM

    On the maximal number of highly periodic runs in a string

    Authors: Maxime Crochemore, Costas Iliopoulos, Marcin Kubica, Jakub Radoszewski, Wojciech Rytter, Tomasz Walen

    Abstract: A run is a maximal occurrence of a repetition $v$ with a period $p$ such that $2p \le |v|$. The maximal number of runs in a string of length $n$ was studied by several authors and it is known to be between $0.944 n$ and $1.029 n$. We investigate highly periodic runs, in which the shortest period $p$ satisfies $3p \le |v|$. We show the upper bound $0.5n$ on the maximal number of such runs in a st…

    Submitted 13 July, 2009; originally announced July 2009.

    Comments: 8 pages, 2 figures