University of Cambridge

Location: Cambridge, United Kingdom

Supervisor: Prof. Richard Durbin

ESR: Pío Sierra

Objectives

The Positional Burrows-Wheeler Transform (PBWT) is a data structure that enables efficient storage of, and local haplotype matching over, large collections of aligned linear genome sequences with genetic variation. Recently, we presented the gPBWT and the tcPBWT, two models that adapt the PBWT to graph-based representations of genomes. The gPBWT supports basic operations such as enhanced indexing of variation graphs, while the tcPBWT extends the approach to constructing ancestral recombination graphs. These graphs make it possible to trace the ancestry of genetic variation at individual genomic sites, within a structure that captures ancestry relationships comprehensively. In this project, we will extend the work on the gPBWT towards reference graph-based imputation and phasing of new genomes, and the work on the tcPBWT so as to reveal and exploit the genealogical relationships underlying pan-genome graphs.
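For readers unfamiliar with the linear PBWT, the sketch below shows its core construction on a binary haplotype matrix: at each site, the haplotype order from the previous site is stably split by allele, so that haplotypes sharing long matching suffixes end up adjacent. This is a minimal illustration of the original linear PBWT only (not the gPBWT or tcPBWT); the function names and the toy input are hypothetical. Counting runs in the permuted allele columns hints at why the structure compresses well.

```python
from typing import List

def pbwt_prefix_arrays(haplotypes: List[List[int]]) -> List[List[int]]:
    """Return positional prefix arrays a_0..a_N for a binary haplotype matrix.

    haplotypes[i][k] is the allele (0 or 1) of haplotype i at site k.
    a_k lists haplotype indices sorted by their reversed prefixes up to site k-1.
    (Hypothetical helper for illustration.)
    """
    m = len(haplotypes)
    n = len(haplotypes[0]) if m else 0
    a = list(range(m))  # a_0: initial order of haplotypes
    arrays = [a]
    for k in range(n):
        zeros, ones = [], []
        for i in a:
            # Stable partition by allele at site k: ties keep the previous
            # order, which is what makes the reverse-prefix sort incremental.
            (zeros if haplotypes[i][k] == 0 else ones).append(i)
        a = zeros + ones
        arrays.append(a)
    return arrays

def pbwt_columns(haplotypes: List[List[int]]) -> List[List[int]]:
    """Return the transformed columns y_k: alleles at site k in a_k order."""
    arrays = pbwt_prefix_arrays(haplotypes)
    n = len(haplotypes[0]) if haplotypes else 0
    return [[haplotypes[i][k] for i in arrays[k]] for k in range(n)]

def count_runs(column: List[int]) -> int:
    """Number of runs of equal alleles; fewer runs means better compression."""
    return sum(1 for j, x in enumerate(column) if j == 0 or x != column[j - 1])

if __name__ == "__main__":
    haps = [[0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1]]
    for k, col in enumerate(pbwt_columns(haps)):
        print(k, col, count_runs(col))
```

On real panels, where many haplotypes share long segments, the permuted columns tend to consist of long runs, which is what run-length encoding exploits.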

Expected Results

Theory and software supporting phasing/haplotyping operations based on the gPBWT, and extensions of the theory and software to highlight the ancestral relationships spanning pan-genome graphs in genomic regions of interest. An important aspect is support for dynamic structures.

Adapting the Positional Burrows-Wheeler Transform to pan-genome graphs