The main goal is to apply and extend techniques from the haplotyping literature to the construction of pan-genome graphs. Given a multiple alignment of pan-genomic references, the founder reconstruction problem is to find a small set of founder sequences that explains the whole alignment with as few crossovers as possible. Finding an optimal solution to this problem, or even an approximation within a constant factor, is known to be NP-hard. In WABI 2018, we devised a linear-time algorithm that finds an optimal solution to a relaxed variant of the problem, and showed that it gives a reasonable solution to founder reconstruction in practice. This project studies how to extend this new framework to create a pan-genome graph of manageable size that still retains the connections present in the original sequences well, so that the original sequences can be succinctly encoded as paths in the graph. Tailored techniques for indexing such pan-genome representations for pattern matching and comparison (WP2) are also studied.
New succinct representations of (core-)pan-genome graphs with tailored indexing algorithms.
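
To make the founder reconstruction objective above concrete, the following minimal sketch counts the crossovers needed to explain an alignment with a given founder set. It only illustrates the cost being minimized; it is not the WABI 2018 algorithm and it does not construct founders. The function names `min_crossovers` and `total_crossovers` and the toy data are hypothetical, and the sketch assumes that all rows and founders are non-empty strings of equal length.

```python
from typing import List, Optional


def min_crossovers(row: str, founders: List[str]) -> Optional[int]:
    """Fewest founder switches needed to spell `row` column by column.

    Returns None if some column of `row` matches no founder.
    Assumes `row` and every founder are non-empty and of equal length.
    """
    inf = float("inf")
    # cost[i] = fewest crossovers for a parse of row[:j+1] ending on founder i
    cost = [0 if f[0] == row[0] else inf for f in founders]
    for j in range(1, len(row)):
        best = min(cost)  # cheapest parse of the previous column, any founder
        cost = [
            # either stay on founder i, or switch to it at the price of one crossover
            min(cost[i], best + 1) if f[j] == row[j] else inf
            for i, f in enumerate(founders)
        ]
    best = min(cost)
    return None if best == inf else int(best)


def total_crossovers(alignment: List[str], founders: List[str]) -> Optional[int]:
    """Sum of per-row crossover counts; None if any row cannot be explained."""
    total = 0
    for row in alignment:
        c = min_crossovers(row, founders)
        if c is None:
            return None
        total += c
    return total


# Toy alignment (hypothetical data): two rows are recombinants of the other two.
alignment = ["AAAA", "CCCC", "AACC", "CCAA"]
founders = ["AAAA", "CCCC"]
print(total_crossovers(alignment, founders))
```

On this toy input, the two recombinant rows each need one crossover, so two founders explain four sequences at a total cost of two. The founder reconstruction problem asks for the founder set itself, which is the hard part that this sketch leaves out.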