๐Ÿ•‘ 16:00 CET
๐Ÿ“… January 8, 2024
๐Ÿ“ Online

Title: Movi — a fast and cache-efficient full-text pangenome index

Abstract:

Efficient pangenome indexes are promising tools for rapid classification of nanopore sequencing reads.  Recently, a compressed-index data structure called the “move structure” was proposed as an alternative to other BWT-based indexes like the FM index and r-index. The move structure uniquely achieves both O(r) space and O(1)-time queries, where r is the number of runs in the pangenome BWT. We implemented Movi, an efficient tool for building and querying move-structure pangenome indexes. While the size of the Movi’s index is larger than the r-index, it scales at a smaller rate for pangenome references, as its size is exactly proportional to r, the number of runs in the BWT of the reference. Movi can compute matching queries needed for classification โ€“ such as pseudo-matching lengths โ€“ at least ten times faster than the fastest available methods. Movi achieves this speed by leveraging the move structure’s strong locality of reference, incurring close to the minimum possible number of cache misses for queries against large pangenomes. Movi’s fast constant-time query loop makes it well suited to real-time applications like adaptive sampling for nanopore sequencing, where decisions must be made within small time and resource budgets.  This work is led by postdoctoral fellow Dr. Mohsen Zakeri, with contributions from Ph.D. students Nathaniel Brown and Omar Ahmed and collaborator Dr. Travis Gagie.  The move structure was originally described by Drs. Takaaki Nishimoto and Yasuo Tabei.

Recorded Video: https://youtu.be/t7luD8Wnk7w

Ben Langmead — Movi: a fast and cache-efficient full-text pangenome index