Location: Bratislava, Slovakia
Supervisor: Dr. Tomás Vinar
ESR: Alessia Petescia
Identifying significant differences between individual samples is a key problem in many areas, including diagnostics (tumor development), metagenomic analysis (differences in sample composition), transcriptome analysis (differential expression), and epidemiology (strain identification). While it is often possible to assemble sequencing data and compare the actual sequences, such approaches require high coverage and may still be biased for difficult-to-assemble sequences.
Pan-genome graphs, which provide both qualitative and quantitative information about the range of patterns expected in a dataset, will enable us to identify significant differences even between low-coverage samples that would otherwise be impossible to assemble and compare. Although combinatorial approaches have already been proposed in the literature, further development of probabilistic frameworks will increase sensitivity and address the statistical significance of findings. We have already developed a probabilistic graph-based framework for working with low-coverage sequencing datasets that can be adapted for differential analysis. A similar methodology can also be developed for long-read technologies (such as Oxford Nanopore). Such comparative analysis is also essential for applying differential analysis in a medical setting, where the trade-off between cost and sensitivity is a substantial issue.
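To illustrate what "statistical significance" means for low-coverage differential analysis, here is a minimal sketch. It is not the probabilistic graph-based framework referred to above. It assumes that reads have already been assigned unambiguously to pan-genome graph nodes (node IDs such as `"n1"` and all counts are hypothetical). It also assumes that per-node read counts are Poisson with rates proportional to each sample's sequencing depth. Under that assumption, a standard conditional binomial test for comparing two Poisson rates is applied per node, followed by Benjamini-Hochberg correction across nodes. A real framework would also have to model ambiguous read placement and graph structure, which this stand-in ignores.

```python
from math import exp, lgamma, log, log1p


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability P(X = k); computed in log space so large n does not overflow."""
    log_coef = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coef + k * log(p) + (n - k) * log1p(-p))


def two_sided_binom_pvalue(k: int, n: int, p: float) -> float:
    """Exact two-sided binomial test: total probability of all outcomes
    that are no more likely than the observed count k."""
    if n == 0:
        return 1.0
    observed = binom_pmf(k, n, p)
    total = sum(
        q for q in (binom_pmf(i, n, p) for i in range(n + 1))
        if q <= observed * (1 + 1e-7)  # small relative tolerance for ties
    )
    return min(1.0, total)


def node_pvalue(c1: int, c2: int, depth1: int, depth2: int) -> float:
    """Test for equal node abundance in two samples (assumption: Poisson counts).

    Conditional on c1 + c2, the count c1 is Binomial(c1 + c2, depth1 / (depth1 + depth2))
    under the null hypothesis of equal depth-normalized abundance.
    """
    if depth1 <= 0 or depth2 <= 0:
        raise ValueError("sequencing depths must be positive")
    return two_sided_binom_pvalue(c1, c1 + c2, depth1 / (depth1 + depth2))


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (false discovery rate control)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differential_nodes(counts1: dict[str, int], counts2: dict[str, int],
                       depth1: int, depth2: int, alpha: float = 0.05) -> list[str]:
    """Return IDs of (hypothetical) graph nodes whose counts differ at FDR level alpha."""
    nodes = sorted(set(counts1) | set(counts2))
    pvals = [node_pvalue(counts1.get(v, 0), counts2.get(v, 0), depth1, depth2)
             for v in nodes]
    adjusted = benjamini_hochberg(pvals)
    return [v for v, q in zip(nodes, adjusted) if q <= alpha]
```

For example, `differential_nodes({"n1": 20, "n2": 10}, {"n1": 0, "n2": 10}, 1000, 1000)` returns `["n1"]`. With equal depths, 20 versus 0 reads on a node is significant even at this low coverage, while 10 versus 10 is not. Conditioning on the per-node total removes the unknown node abundance from the test, which is why an exact binomial test suffices here.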
New algorithms and tools for differential analysis in the context of pan-genome graphs.