1.
Lipták, Zsuzsanna; Parmigiani, Luca
A BWT-Based Algorithm for Random de Bruijn Sequence Construction Proceedings Article
In: LATIN 2024: Theoretical Informatics, pp. 130–145, Springer Nature Switzerland, Cham, 2024, ISSN: 1611-3349.
Abstract | Links | BibTeX | Tags: Stringology
@inproceedings{Liptak2024,
title = {A BWT-Based Algorithm for Random de Bruijn Sequence Construction},
author = {Zsuzsanna Lipták and Luca Parmigiani},
url = {https://github.com/lucaparmigiani/rnd_dbseq},
doi = {10.1007/978-3-031-55598-5_9},
issn = {1611-3349},
year = {2024},
date = {2024-01-01},
urldate = {2024-01-01},
booktitle = {LATIN 2024: Theoretical Informatics},
pages = {130–145},
publisher = {Springer Nature Switzerland},
address = {Cham},
abstract = {A binary de Bruijn sequence (dB sequence) of order k is a circular binary string that contains each k-length word exactly once as a substring. Most existing algorithms construct a specific dB sequence, or members of a specific class of dB sequences, representing only a tiny fraction of the complete set. The only algorithms capable of generating all dB sequences are based on finding Euler cycles in de Bruijn graphs. Here, we present an algorithm for constructing random binary dB sequences which uses the extended Burrows-Wheeler Transform. Our method is simple to implement (less than 120 lines of C++ code) and can produce random dB sequences of any order. Even though it does not output dB sequences uniformly at random, it provably outputs each dB sequence with positive probability. The algorithm runs in linear space and near-linear time in the length of the dB sequence and needs less than one second on a laptop computer for orders up to 23, including outputting the sequence. It can be straightforwardly extended to any constant-size alphabet. To the best of our knowledge, this is the first practical algorithm for generating random dB sequences which is capable of producing all dB sequences. Apart from its immediate usefulness in contexts where it is desirable to use a dB sequence that cannot be guessed easily, we also demonstrate our algorithm's potential in theoretical studies, giving hitherto unknown estimates of the average discrepancy of binary dB sequences. The code is available (in C++ and python) at https://github.com/lucaparmigiani/rnd_dbseq.},
keywords = {Stringology},
pubstate = {published},
tppubtype = {inproceedings}
}
A binary de Bruijn sequence (dB sequence) of order k is a circular binary string that contains each k-length word exactly once as a substring. Most existing algorithms construct a specific dB sequence, or members of a specific class of dB sequences, representing only a tiny fraction of the complete set. The only algorithms capable of generating all dB sequences are based on finding Euler cycles in de Bruijn graphs. Here, we present an algorithm for constructing random binary dB sequences which uses the extended Burrows-Wheeler Transform. Our method is simple to implement (less than 120 lines of C++ code) and can produce random dB sequences of any order. Even though it does not output dB sequences uniformly at random, it provably outputs each dB sequence with positive probability. The algorithm runs in linear space and near-linear time in the length of the dB sequence and needs less than one second on a laptop computer for orders up to 23, including outputting the sequence. It can be straightforwardly extended to any constant-size alphabet. To the best of our knowledge, this is the first practical algorithm for generating random dB sequences which is capable of producing all dB sequences. Apart from its immediate usefulness in contexts where it is desirable to use a dB sequence that cannot be guessed easily, we also demonstrate our algorithm's potential in theoretical studies, giving hitherto unknown estimates of the average discrepancy of binary dB sequences. The code is available (in C++ and python) at https://github.com/lucaparmigiani/rnd_dbseq.
