Pisa University

Location: Pisa, Italy

Supervisor: Prof. Nadia Pisanti

ESR: Njagi Mwaniki

Objectives

In the specific case of a set of very closely related genomes, a pan-genome can be represented by a so-called (elastic-)degenerate text, which essentially corresponds to the .vcf file format. For slightly less closely related genomes, pan-genomes instead need to be represented with graphs, such as coloured de Bruijn graphs. Several problems can be addressed on such data: pattern matching, where the pattern is a simple sequence, another degenerate string, or a graph; finding regularities or other structural properties; decompositions into palindromes or other constrained features; finding local similarities among pan-genomes and, more generally, defining similarity notions; detecting variants between pan-genomes; etc. Variations of such tasks can be conceived: (i) using weights associated with variants according to frequency or confidence; (ii) designing on-line methods rather than off-line ones such as indexing; (iii) approximate versions of the problems that take sequencing errors into account; (iv) using multiple patterns or texts to extend the range of applications.
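To make the notion of an elastic-degenerate text concrete, the following is a minimal Python sketch of on-line exact pattern matching on such a text. It is an illustration under stated assumptions, not the project's method: the text is assumed to be a list of segments, each segment a tuple of alternative strings (an empty string models a deletion), and the function name `ed_match` and the example data are hypothetical. The sketch is a naive baseline that tracks which pattern prefixes are still alive at each segment boundary.

```python
def ed_match(segments, pattern):
    """Return the indices of segments in which an occurrence of `pattern` ends.

    `segments` is a hypothetical encoding of an elastic-degenerate text:
    a list of tuples of alternative strings, where "" models a deletion.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")

    # Lengths of pattern prefixes matched exactly up to the current boundary.
    active = set()
    ends = []

    for i, alternatives in enumerate(segments):
        new_active = set()
        found = False
        for s in alternatives:
            # Case 1: the whole pattern lies inside one alternative.
            if pattern in s:
                found = True
            # Case 2: extend partial matches that started in earlier segments.
            for j in active:
                rest = pattern[j:]
                if s.startswith(rest):
                    found = True                 # occurrence ends inside s
                elif rest.startswith(s):
                    new_active.add(j + len(s))   # s is consumed entirely
            # Case 3: a proper prefix of the pattern starts inside s
            # and reaches the end of s.
            for k in range(1, min(len(s), m - 1) + 1):
                if s.endswith(pattern[:k]):
                    new_active.add(k)
        if found:
            ends.append(i)
        active = new_active
    return ends


# Toy text spelling ACGTA, ACTTA and ACTA (the last via the empty alternative).
text = [("AC",), ("G", "T", ""), ("TA",)]
print(ed_match(text, "CTA"))
print(ed_match(text, "CC"))
```

Because the segments are consumed one at a time and only the set of live prefix lengths is kept between them, the sketch follows the on-line setting of variation (ii); weights, approximate matching, and multiple patterns from the other variations are not modelled here.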

Expected Results

New algorithms and tools to investigate and compare a set of closely related genomes.

Matching to and between pan-genomes