The main goal is to apply and extend techniques from the haplotyping literature to the construction of pan-genome graphs. Given a multiple alignment of pan-genomic references, the founder reconstruction problem is to find a small set...
The role of viruses is key for understanding the environment (e.g., in the sea, the soil or the air) or the functioning of human, animal and plant microbiomes. Despite their comparatively small genome sizes, viruses pose specific challenges for ...
Compacted de Bruijn graphs are natural candidates for representing pan-genome graphs. The problem of constructing compacted de Bruijn graphs has been studied extensively, in both cases where the input is a (set of) genomes or raw ...
We propose to explore distinct approaches when creating or adding information to a pan-genome graph. The simplest approach is to map new sequences, indicating newly discovered variants and annotating existing ones. However, when the graph ...
A pan-genome stored as a graph data structure can be characterized by various attributes, such as the graph’s structural properties (average node degree, diameter, density), sequence attributes (k-mer diversity or distribution), or its functional ...
The number of sequenced genomes, and in many areas of application also the number of annotations, has reached a mass (hundreds of thousands of sequenced genomes) that is critical for successful application of deep learning pipelines. However, ...
Despite tremendous progress in genome assembly, recalcitrant genomic loci remain whose sequences cannot be resolved. Such regions are often variable in copy number, and such copy number variants (CNVs) have been linked to various disorders, including neuropsychiatric conditions and autism ...
The main goal is to study new representations of pan-genomes that allow fast and space-efficient queries of multiple pan-genomes, enabling their comparison and exploiting any ancestral relationships between them. We want to overcome the limitations ...
In the specific case of a set of very closely related genomes, a pan-genome can be represented by a so-called (elastic)-degenerate text that actually corresponds to the .vcf file format or, alternatively for slightly less closely related ...
Comparing pan-genomes amounts to comparing two graphs, generalizing the idea of aligning two genomes. We aim at developing algorithms and software for 'whole-pan-genome alignment'. Though for aligning two networks approaches already ...
Growing insights in biomedical research indicate a substantial role of copy number variants (CNVs) in various diseases. CNVs are duplicated or deleted segments of various lengths and can affect multiple genes or change gene dosage, lead to ...
Identifying significant differences between individual samples is a key problem in many areas, including diagnostics (tumor development), metagenomic analysis (differences in sample composition), transcription analysis ...
The Positional Burrows-Wheeler Transform (PBWT) is a data structure that enables efficient storage and local haplotype matching over large collections of aligned linear genome sequences with genetic variation. Recently, we ...
Antibiotic resistance (ABR) is a global threat to public health, and is a property primarily conferred on bacteria either by horizontal transfer of a gene or by mutational evolution. Predictions suggest ABR will kill more people than cancer by 2050, as it ...