🕑 11:00 CEST
📅 July 10, 2023
📍 Online
Title: GoldRush: De novo assembly of long reads with linear time complexity
Abstract:
Since the discovery of the double helix structure of DNA and the realization that the sequence of its constituent nucleotides encodes genetic information, researchers have looked for ways to read and assemble those sequences. Modern sequencing technologies, such as those from Pacific Biosciences and Oxford Nanopore Technologies (ONT), read information on ever longer stretches of input DNA molecules. Specialized bioinformatics algorithms can then assemble those reads de novo to reconstruct the sequence of the underlying molecule.
In de novo sequence assembly, one identifies partial alignments between two sequences to merge them into longer sequences, assuming the overlap has originated from the same locus within a larger sequence (e.g., a chromosome). This can be conceptualized as an iterative process, starting with the reads. There are two major assembly paradigms, overlap/layout/consensus (OLC) and de Bruijn graph (dBG), with the former often being the preferred method when assembling long reads. In its naïve form, OLC requires an all-versus-all comparison of $n$ reads, an operation with $O(n^2)$ time complexity. While modern OLC algorithms implement methods that perform faster overlap detection, we note that it is possible to sidestep the operation altogether by selecting, in linear time, a read set that represents a golden path – a read set with non-redundant coverage of the target genome. We have implemented the concept in GoldRush and demonstrated its utility in assembling ONT data describing the genomes of three human cell lines (NA24385, HG01243, and HG02055) and two plant species (Oryza sativa and Solanum lycopersicum).
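To illustrate the idea of selecting a non-redundant read set in a single pass, the following is a minimal sketch and not GoldRush's actual implementation. It assumes a plain Python set of k-mers as the record of sequence already covered. The function names, the k-mer size, and the `max_seen_fraction` threshold are hypothetical choices made for illustration, and reverse complements and sequencing errors are ignored. Each read is examined once, so the work grows linearly with the total number of bases rather than quadratically with the number of reads.

```python
def kmers(seq, k):
    """Yield every substring of length k in seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def select_golden_path(reads, k=5, max_seen_fraction=0.5):
    """Greedy single-pass selection of reads that add new sequence.

    Illustrative assumption: a read is kept when at most
    max_seen_fraction of its k-mers were already contributed by
    previously kept reads. No read is ever compared against
    another read directly.
    """
    seen = set()      # k-mers covered by the reads kept so far
    selected = []
    for read in reads:
        read_kmers = list(kmers(read, k))
        if not read_kmers:
            continue  # read shorter than k, nothing to evaluate
        hits = sum(1 for km in read_kmers if km in seen)
        if hits / len(read_kmers) <= max_seen_fraction:
            selected.append(read)
            seen.update(read_kmers)
    return selected


reads = ["ACGTACGTTGCA", "ACGTACGTTGCA", "GGGGCCCCAAAATT"]
golden = select_golden_path(reads)
```

In this toy input the second read duplicates the first, so only the first and third reads are kept. A production tool would need a memory-efficient membership structure and tolerance for the error rates of long reads, neither of which this sketch attempts.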
About the speaker:
To date, Dr. Birol’s team has accelerated research with over 140 publications, which have practical applications in both ecosystems and human biology, including medical genomics. As of 2018, he is listed among the top 1% of cited scientists in the world by Clarivate Analytics in the Cross-Field category.
Dr. Birol’s research interests include the analysis of data from modern sequencing instruments to study genomes and transcriptomes of model species and humans. He directs the Bioinformatics Technology Lab, which develops bioinformatics tools for de novo sequence assembly, sequence mapping, downstream data analysis, and visualization. He also directs a wet lab at the BC Centre for Disease Control to study antimicrobial resistance and to develop novel strategies as alternatives to conventional antibiotics.
Recorded Video: https://youtu.be/S6cNvx9dJIU