PhD Projects – ALPACA ITN Project

Pan-genome graphs through founder sequences

The main goal is to apply and extend techniques from haplotyping literature to the construction of pan-genome graphs. Given a multiple alignment of pan-genomic references, the founder reconstruction problem is to find a small set... Read more

Pan-genomes of viruses

The role of viruses is key for understanding the environment (e.g., in the sea, the soil or the air) or the functioning of human, animal and plant microbiomes. Despite their comparatively small genome sizes, viruses pose specific challenges for ... Read more

Efficiently merging compacted de Bruijn graphs

Compacted de Bruijn graphs are natural candidates for representing pan-genome graphs. The problem of constructing compacted de Bruijn graphs has been studied extensively, both when the input is a (set of) genomes and when it is raw ... Read more

Pan-genome graph update strategies

We propose to explore distinct approaches when creating or adding information to a pan-genome graph. The simplest approach is to map new sequences, indicating newly discovered variants and annotating existing ones. However, when the graph ... Read more

Measures and algorithms for alignment-free comparative pan-genomics

A pan-genome stored as a graph data structure can be characterized by various attributes, such as the graph’s structural properties (average node degree, diameter, density), sequence attributes (k-mer diversity or distribution), or its functional ... Read more

Pan-genome representations for deep machine learning applications

The number of sequenced genomes, and in many areas of application also the number of annotations, has reached a mass – hundreds of thousands of sequenced genomes – that is critical for the successful application of deep learning pipelines. However, ... Read more

Pan-genomics of complex loci in the human genome

Despite tremendous progress in genome assembly, recalcitrant genomic loci remain whose sequences cannot be resolved. Such regions are often variable in copy number and such copy number variants (CNVs) have been linked to various disorders, including neuropsychiatric conditions and autism ... Read more

Representations for the comparative and hierarchical analysis of pan-genomes

The main goal is to study new representations of pan-genomes that support fast and space-efficient queries of multiple pan-genomes, allowing their comparison and exploiting possible ancestral relationships. We want to overcome the limitations ... Read more

Matching to and between pan-genomes

In the specific case of a set of very closely related genomes, a pan-genome can be represented by a so-called (elastic)-degenerate text that actually corresponds to the .vcf file format or, alternatively for slightly less closely related ... Read more

Comparing/aligning two pan-genomes with applications in transcriptomics and microbiomics

Comparing pan-genomes amounts to comparing two graphs, generalizing the idea of aligning two genomes. We aim at developing algorithms and software for 'whole-pan-genome alignment'. Though for aligning two networks approaches already ... Read more

Graph-based reference to improve quantification of CNV detection in low-coverage data

Growing insights in biomedical research indicate a substantial role of copy number variants (CNVs) in various diseases. CNVs are represented by duplicated or deleted parts of various lengths and can affect multiple genes or change gene dosage, lead to ... Read more

Differential Analysis in the Context of Pan-Genome Graphs

Identifying significant differences between individual samples is a key problem in many areas, including diagnostics (tumor development), metagenomic analysis (differences in sample composition), transcription analysis ... Read more

Adapting the Positional Burrows-Wheeler Transform to pan-genome graphs

The Positional Burrows-Wheeler Transform (PBWT) is a data structure that enables efficient storage and local haplotype matching over large collections of aligned linear genome sequences with genetic variation. Recently, we ... Read more

Integrating the bacterial pan-genome with GWAS and machine learning approaches to study antibiotic resistance

Antibiotic resistance (ABR) is a global threat to public health, and is a property primarily conferred to bacteria either by horizontal transfer of a gene or by mutational evolution. Predictions suggest ABR will kill more people than cancer by 2050, as it ... Read more