2013/042 - On Context-Diverse Repeats and their Incremental Computation
Matthias Gallé, Matias Tealdi
8th International Conference on Language and Automata Theory and Applications , LATA 2014, Madrid, Spain, March 10-14, 2014.
The context in which a substring appears is an important notion to identify - for example - its semantic meaning. However, existing classes of repeats fail to take this into account directly. We present here xkcd-repeats, a new family of repeats characterized by the number of different symbols at the left and right of their occurrences. These repeats include as special extreme cases maximal and super-maximal repeats. We give sufficient and necessary condition to bound their number linearly in the size of the sequence, and show an optimal algorithm that computes them in linear time - given a suffix array - independent on the size of the alphabet, as well as two other algorithms that are faster in practice. Additionally, we provide an independent and general framework that allows to compute these (and other) repeats incrementally; extending the application space of repeats in a streaming framework.