Objectives
A pan-genome stored as a graph data structure can be characterized by various attributes, such as the graph’s structural properties (average node degree, diameter, density), its sequence attributes (k-mer diversity or distribution), or its functional content (relative size of the core or accessory genome). Such attributes will be used to define alignment-free measures of pan-genome similarity or distance. These measures can then be used to estimate relationships between the input pan-genomes, for example with distance-based phylogenetic tree reconstruction methods. Algorithms to compute these measures will be developed, implemented, tested, and applied to real data. The result will be a software tool for quantitative pan-genome comparison.
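As an illustration of what an alignment-free, sequence-based measure could look like, the following minimal sketch computes pairwise Jaccard distances between the k-mer sets of several pan-genomes. It rests on assumptions that are not part of the project description: each pan-genome is simplified to a plain collection of sequences rather than a graph, the Jaccard distance is only one possible choice of measure, and the function names (`kmer_set`, `jaccard_distance`, `distance_matrix`) and the toy input are hypothetical.

```python
from itertools import combinations


def kmer_set(sequences, k):
    """Collect all distinct k-mers occurring in a collection of sequences."""
    kmers = set()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmers.add(seq[i:i + k])
    return kmers


def jaccard_distance(a, b):
    """Return 1 - |A & B| / |A | B|; defined here as 0.0 for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(pangenomes, k):
    """Pairwise Jaccard distances between named pan-genomes.

    `pangenomes` maps a name to a list of sequences (an assumed,
    simplified stand-in for a pan-genome graph).
    """
    names = sorted(pangenomes)
    sets = {name: kmer_set(pangenomes[name], k) for name in names}
    dist = {name: {name: 0.0} for name in names}
    for x, y in combinations(names, 2):
        d = jaccard_distance(sets[x], sets[y])
        dist[x][y] = d
        dist[y][x] = d
    return names, dist


# Toy input, made up purely for illustration.
toy = {
    "A": ["ACGTAC", "ACGTTC"],
    "B": ["ACGTAC", "ACGGAC"],
    "C": ["TTTTTT"],
}
names, dist = distance_matrix(toy, k=3)
for x in names:
    print(x, [round(dist[x][y], 3) for y in names])
```

A symmetric matrix of this kind is the typical input to distance-based tree reconstruction methods such as neighbor joining. Measures built on structural or functional attributes of the graph could be plugged in by replacing `jaccard_distance` with another function.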
Expected Results
New distance or similarity measures for pan-genomes and efficient software for their computation.