Objectives
Antibiotic resistance (ABR) is a global threat to public health. It is a property conferred on bacteria primarily by horizontal transfer of a gene or by mutational evolution. Predictions suggest ABR will kill more people than cancer by 2050, because effective antibiotics underpin both the treatment of infectious disease and the management of cancer therapy and transplants. The strong selective pressure applied by human use of antibiotics seems to underlie the growth, diversification and spread of resistant strains and plasmids across the world. A major goal for translational genomics is to provide rapid resistance predictions from clinical samples based on genome sequence data. There are three related problems. First, given a genotype-to-resistance map, one must genotype accurately despite the high levels of diversity in bacteria, which confound standard approaches that map to a single reference genome.
The remaining two problems relate to building that genotype-to-resistance map. One can either use genome-wide association studies (GWAS) to try to determine causal genes or mutations, or use hypothesis-free approaches (e.g. "machine learning") to infer predictive features without attempting to infer mechanism. GWAS approaches are challenging in bacteria because of genome-wide linkage disequilibrium, high levels of diversity, and horizontal gene transfer. In this project we seek to use pan-genomic data structures to improve GWAS (by using more informative markers and reducing unnecessary multiple testing), and as a substrate for machine learning.
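As an illustration of how collapsing redundant markers can reduce multiple testing, the following is a minimal Python sketch and not the project's method. It assumes a toy presence/absence matrix of pan-genome markers, groups markers that share an identical pattern across samples, and runs one Fisher's exact test per unique pattern with a Bonferroni threshold. All names (`collapse_patterns`, `association_scan`, `geneA`, `snpB`, `geneC`) and all data are hypothetical. The sketch ignores population structure, which a real bacterial GWAS would have to account for.

```python
from collections import defaultdict
from math import comb

def collapse_patterns(presence):
    """Group markers that share an identical presence/absence pattern."""
    groups = defaultdict(list)
    for marker, pattern in presence.items():
        groups[tuple(pattern)].append(marker)
    return dict(groups)

def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum over all tables at least as extreme as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

def association_scan(presence, resistant, alpha=0.05):
    """Test each unique pattern once; Bonferroni over patterns, not markers."""
    groups = collapse_patterns(presence)
    threshold = alpha / len(groups)
    results = []
    for pattern, markers in groups.items():
        pairs = list(zip(pattern, resistant))
        a = sum(1 for g, r in pairs if g and r)          # present, resistant
        b = sum(1 for g, r in pairs if g and not r)      # present, susceptible
        c = sum(1 for g, r in pairs if not g and r)      # absent, resistant
        d = sum(1 for g, r in pairs if not g and not r)  # absent, susceptible
        p = fisher_two_sided(a, b, c, d)
        results.append((markers, p, p < threshold))
    return sorted(results, key=lambda item: item[1])

# Toy data (hypothetical): 8 isolates, 3 markers, 2 of which are fully linked.
resistant = [1, 1, 1, 1, 0, 0, 0, 0]
presence = {
    "geneA": [1, 1, 1, 1, 0, 0, 0, 0],
    "snpB":  [1, 1, 1, 1, 0, 0, 0, 0],  # same pattern as geneA
    "geneC": [1, 0, 1, 0, 1, 0, 1, 0],
}
results = association_scan(presence, resistant)
for markers, p, significant in results:
    print(markers, round(p, 4), significant)
```

Here three markers collapse to two unique patterns, so the correction is applied over two tests rather than three. In real pan-genome data, where linkage disequilibrium is genome-wide, the reduction could be far larger.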
Expected Results
Algorithms and software for bacterial pan-GWAS; benchmarking of standard machine learning approaches for resistance prediction based on the pan-genome for at least 2 species (M. tuberculosis, E. coli); new machine learning approaches developed, with software made available and benchmarked on the above 2 species.