The number of sequenced genomes, and in many application areas also the number of annotations, has reached a critical mass (hundreds of thousands of sequenced genomes) for the successful application of deep learning pipelines. However, …
Measures and algorithms for alignment-free comparative pan-genomics
A pan-genome stored as a graph data structure can be characterized by various attributes, such as the graph’s structural properties (average node degree, diameter, density), sequence attributes (k-mer diversity or distribution), or its functional …
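As an illustration of the structural and sequence attributes named above, the following minimal sketch computes average node degree, density, diameter and a simple k-mer diversity score. It assumes a toy undirected graph stored as an adjacency dictionary and defines k-mer diversity as the fraction of distinct k-mers among all k-mers; the function names `graph_stats` and `kmer_diversity` and the toy data are hypothetical and are not taken from the project itself.

```python
from collections import deque


def graph_stats(adj):
    """Structural attributes of an undirected graph.

    adj maps each node to the set of its neighbours; every node must be a key.
    """
    n = len(adj)
    m = sum(len(neighbours) for neighbours in adj.values()) // 2  # each edge counted twice
    avg_degree = 2 * m / n if n else 0.0
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0

    def eccentricity(src):
        # Breadth-first search: longest shortest path starting at src.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return max(dist.values())

    # For a disconnected graph this is the largest diameter within a component.
    diameter = max((eccentricity(v) for v in adj), default=0)
    return {"avg_degree": avg_degree, "density": density, "diameter": diameter}


def kmer_diversity(sequences, k):
    """Fraction of distinct k-mers among all k-mers (assumed definition)."""
    distinct = set()
    total = 0
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            distinct.add(seq[i:i + k])
            total += 1
    return len(distinct) / total if total else 0.0


# Toy "bubble": node 1 branches into nodes 2 and 3, which rejoin at node 4.
toy_graph = {1: {2, 3}, 2: {1, 4}, 3: {1, 4}, 4: {2, 3}}
# Hypothetical haplotype sequences spelled by the two paths through the bubble.
toy_haplotypes = ["ACGTGGA", "ACGCGGA"]

print(graph_stats(toy_graph))
print(kmer_diversity(toy_haplotypes, 3))
```

Real pan-genome graphs are usually directed and bidirected, so measures such as diameter would need to be adapted; the sketch only shows how such attributes can be reduced to a small feature vector for comparison.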
Pan-genome graph update strategies
We propose to explore distinct approaches when creating or adding information to a pan-genome graph. The simplest approach is to map new sequences, indicating newly discovered variants and annotating existing ones. However, when the graph …
Adapting the Positional Burrows-Wheeler Transform to pan-genome graphs
The Positional Burrows-Wheeler Transform (PBWT) is a data structure that enables efficient storage of, and local haplotype matching over, large collections of aligned linear genome sequences with genetic variation. Recently, we …
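For readers unfamiliar with the PBWT on linear sequences, the sketch below builds the positional prefix arrays for a small haplotype panel. It assumes biallelic sites encoded as 0 and 1 and shows only the standard construction for aligned linear haplotypes, not the graph adaptation proposed here; the function name `pbwt_prefix_arrays` and the toy panel are hypothetical.

```python
def pbwt_prefix_arrays(haplotypes):
    """Positional prefix arrays for a panel of aligned binary haplotypes.

    haplotypes is a list of equal-length lists of 0/1 alleles (assumption:
    biallelic sites). The returned list has one entry per prefix length k;
    entry k orders haplotype indices by their reversed prefix of length k.
    """
    m = len(haplotypes)
    n_sites = len(haplotypes[0]) if m else 0
    order = list(range(m))
    arrays = [order[:]]
    for k in range(n_sites):
        # Stable partition on the allele at site k: haplotypes carrying 0
        # come first, and the previous relative order is kept in each group.
        zeros = [h for h in order if haplotypes[h][k] == 0]
        ones = [h for h in order if haplotypes[h][k] == 1]
        order = zeros + ones
        arrays.append(order[:])
    return arrays


# Toy panel: haplotypes 0 and 3 are identical.
panel = [
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]
print(pbwt_prefix_arrays(panel)[-1])  # [0, 3, 1, 2]
```

After the last site, haplotypes 0 and 3 are adjacent in the ordering because they share the longest common suffix. This adjacency is the property that makes local haplotype matching efficient.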
Pan-genomes of viruses
The role of viruses is key to understanding the environment (e.g., the sea, the soil or the air) and the functioning of human, animal and plant microbiomes. Despite their comparatively small genome sizes, viruses pose specific challenges for …
Matching to and between pan-genomes
In the specific case of a set of very closely related genomes, a pan-genome can be represented by a so-called (elastic-)degenerate text, which corresponds to the .vcf file format, or, alternatively, for slightly less closely related …
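To make the correspondence between a variant list and an elastic-degenerate text concrete, the following sketch turns a reference string plus VCF-like records into a list of segments, where each segment is a list of alternative strings. It assumes 0-based positions and non-overlapping variants (real .vcf positions are 1-based), and the names `elastic_degenerate_text` and `spelled_sequences` are hypothetical helpers introduced for illustration only.

```python
from itertools import product


def elastic_degenerate_text(reference, variants):
    """Build an elastic-degenerate text from a reference and variant records.

    variants is a list of (pos, ref_allele, alt_alleles) tuples with 0-based
    positions (assumption). Overlapping variants are rejected in this sketch.
    """
    segments = []
    cursor = 0
    for pos, ref_allele, alts in sorted(variants, key=lambda v: v[0]):
        if pos < cursor:
            raise ValueError("overlapping variants are not handled in this sketch")
        if reference[pos:pos + len(ref_allele)] != ref_allele:
            raise ValueError(f"reference allele mismatch at position {pos}")
        if pos > cursor:
            # Invariant stretch between two variant sites.
            segments.append([reference[cursor:pos]])
        # Degenerate segment: reference allele followed by the alternatives.
        segments.append([ref_allele] + list(alts))
        cursor = pos + len(ref_allele)
    if cursor < len(reference):
        segments.append([reference[cursor:]])
    return segments


def spelled_sequences(segments):
    """All strings spelled by choosing one alternative per segment."""
    return ["".join(choice) for choice in product(*segments)]


# Toy example: one SNP (C to T) and one single-base deletion (TA to T).
ed_text = elastic_degenerate_text("ACGTAC", [(1, "C", ["T"]), (3, "TA", ["T"])])
print(ed_text)                     # [['A'], ['C', 'T'], ['G'], ['TA', 'T'], ['C']]
print(spelled_sequences(ed_text))  # ['ACGTAC', 'ACGTC', 'ATGTAC', 'ATGTC']
```

The segment containing "TA" and "T" is what makes the text elastic, since its alternatives differ in length. Enumerating all spelled sequences grows exponentially with the number of variant sites, so it is only useful for checking small examples.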
Representations for the comparative and hierarchical analysis of pan-genomes
The main goal is to study new representations of pan-genomes that support fast and space-efficient queries over multiple pan-genomes, enabling their comparison and exploiting any ancestral relationships among them. We want to overcome the limitations …
Comparing/aligning two pan-genomes with applications in transcriptomics and microbiomics
Comparing pan-genomes amounts to comparing two graphs, generalizing the idea of aligning two genomes. We aim to develop algorithms and software for ‘whole-pan-genome alignment’. Though approaches for aligning two networks already …
Pan-genomics of complex loci in the human genome
Despite tremendous progress in genome assembly, recalcitrant genomic loci remain whose sequences cannot be resolved. Such regions are often variable in copy number, and these copy number variants (CNVs) have been linked to various disorders, including neuropsychiatric conditions and autism …
Integrating the bacterial pan-genome with GWAS and machine learning approaches to study antibiotic resistance
Antibiotic resistance (ABR) is a global threat to public health. It is a property conferred on bacteria primarily by either horizontal gene transfer or mutational evolution. Predictions suggest ABR will kill more people than cancer by 2050, as it …