Publications – ALPACA ITN Project

Kenneweg, Philip; Dandinasivara, Raghuram; Luo, Xiao; Hammer, Barbara; Schönhuth, Alexander

Generating Synthetic Genotypes using Diffusion Models Unpublished

2024.

Tags: WP3: Translational CPG

Balvert, Marleen; Cooper-Knock, Johnathan; Stamp, Julian; Byrne, Ross P.; Mourragui, Soufiane; Gils, Juami; Benonisdottir, Stefania; Schlüter, Johannes; Kenna, Kevin; Abeln, Sanne; Iacoangeli, Alfredo; Daub, Joséphine T.; Browning, Brian L.; Taş, Gizem; Hu, Jiajing; Wang, Yan; Alhathli, Elham; Harvey, Calum; Pianesi, Luna; Schulte, Sara C.; González-Domínguez, Jorge; Garrison, Erik; Al-Chalabi, Ammar; Cartes, Jorge Avila; Baaijens, Jasmijn; Berg, Joanna; Bolognini, Davide; Bonizzoni, Paola; Guarracino, Andrea; Koyuturk, Mehmet; Markowska, Magda; Dandinasivara, Raghuram; Bemmelen, Jasper; Vorbrugg, Sebastian; Zhang, Sai; Pasaniuc, Bogdan; Snyder, Michael P.; Schönhuth, Alexander; Sng, Letitia M. F.; Twine, Natalie A.

Considerations in the search for epistasis Journal Article

In: Genome Biology, vol. 25, no. 1, pp. 296, 2024, ISSN: 1474-760X.

Tags: WP3: Translational CPG

Frolova, Daria; Lima, Leandro; Roberts, Leah Wendy; Bohnenkämper, Leonard; Wittler, Roland; Stoye, Jens; Iqbal, Zamin

Applying rearrangement distances to enable plasmid epidemiology with pling Journal Article

In: Microbial Genomics, vol. 10, no. 10, 2024, ISSN: 2057-5858.

Tags: WP3: Translational CPG

@article{Frolova2024,

title = {Applying rearrangement distances to enable plasmid epidemiology with pling},

author = {Daria Frolova and Leandro Lima and Leah Wendy Roberts and Leonard Bohnenkämper and Roland Wittler and Jens Stoye and Zamin Iqbal},

url = {https://github.com/iqbal-lab-org/pling},

doi = {10.1099/mgen.0.001300},

issn = {2057-5858},

year  = {2024},

date = {2024-10-01},

urldate = {2024-10-01},

journal = {Microbial Genomics},

volume = {10},

number = {10},

publisher = {Microbiology Society},

abstract = {Plasmids are a key vector of antibiotic resistance, but the current bioinformatics toolkit is not well suited to tracking them. The rapid structural changes seen in plasmid genomes present considerable challenges to evolutionary and epidemiological analysis. Typical approaches are either low resolution (replicon typing) or use shared k-mer content to define a genetic distance. However, this distance can both overestimate plasmid relatedness by ignoring rearrangements, and underestimate by over-penalizing gene gain/loss. Therefore a model is needed which captures the key components of how plasmid genomes evolve structurally – through gene/block gain or loss, and rearrangement. A secondary requirement is to prevent promiscuous transposable elements (TEs) leading to over-clustering of unrelated plasmids. We choose the ‘Double Cut and Join Indel’ (DCJ-Indel) model, in which plasmids are studied at a coarse level, as a sequence of signed integers (representing genes or aligned blocks), and the distance between two plasmids is the minimum number of rearrangement events or indels needed to transform one into the other. We show how this gives much more meaningful distances between plasmids. We introduce a software workflow pling (https://github.com/iqbal-lab-org/pling), which uses the DCJ-Indel model, to calculate distances between plasmids and then cluster them. In our approach, we combine containment distances and DCJ-Indel distances to build a TE-aware plasmid network. We demonstrate superior performance and interpretability to other plasmid clustering tools on the ‘Russian Doll’ dataset and a hospital transmission dataset.},

keywords = {WP3: Translational CPG},

pubstate = {published},

tppubtype = {article}

}

Taş, Gizem; Westerdijk, Timo; Postma, Eric; Rheenen, Wouter; Bakker, Mark K.; Eijk, Kristel R.; Kooyman, Maarten; Khleifat, Ahmad Al; Iacoangeli, Alfredo; Ticozzi, Nicola; Cooper-Knock, Johnathan; Gromicho, Marta; Chandran, Siddharthan; Morrison, Karen E.; Shaw, Pamela J.; Hardy, John; Sendtner, Michael; Meyer, Thomas; Başak, Nazli; Fogh, Isabella; Chiò, Adriano; Calvo, Andrea; Pupillo, Elisabetta; Logroscino, Giancarlo; Gotkine, Marc; Vourc’h, Patrick; Corcia, Philippe; Couratier, Philippe; Millecamps, Stéphanie; Salachas, François; Pardina, Jesus S. Mora; Rojas-García, Ricardo; Dion, Patrick; Ross, Jay P.; Ludolph, Albert C.; Weishaupt, Jochen H.; Freischmidt, Axel; Bensimon, Gilbert; Tittmann, Lukas; Lieb, Wolfgang; Franke, Andre; Ripke, Stephan; Whiteman, David C.; Olsen, Catherine M.; Uitterlinden, Andre G.; Hofman, Albert; Amouyel, Philippe; Traynor, Bryan; Singleton, Andrew B.; Neto, Miguel Mitne; Cauchi, Ruben J.; Ophoff, Roel A.; Deerlin, Vivianna M.; Grosskreutz, Julian; Graff, Caroline; Brylev, Lev; Rogelj, Boris; Koritnik, Blaž; Zidar, Janez; Stević, Zorica; Drory, Vivian; Povedano, Monica; Blair, Ian P.; Kiernan, Matthew C.; Nicholson, Garth A.; Henders, Anjali K.; Carvalho, Mamede; Pinto, Susana; Petri, Susanne; Weber, Markus; Rouleau, Guy A.; Silani, Vincenzo; Glass, Jonathan; Brown, Robert H.; Landers, John E.; Shaw, Christopher E.; Andersen, Peter M.; Garton, Fleur C.; McRae, Allan F.; McLaughlin, Russell L.; Hardiman, Orla; Kenna, Kevin P.; Wray, Naomi R.; Al-Chalabi, Ammar; Damme, Philip Van; Berg, Leonard H.; Veldink, Jan H.; Schönhuth, Alexander; Balvert, Marleen

Computing linkage disequilibrium aware genome embeddings using autoencoders Journal Article

In: Bioinformatics, vol. 40, no. 6, 2024, ISSN: 1367-4811.

Tags: WP3: Translational CPG

@article{Tas2024,

title = {Computing linkage disequilibrium aware genome embeddings using autoencoders},

author = {Gizem Taş and Timo Westerdijk and Eric Postma and Wouter Rheenen and Mark K. Bakker and Kristel R. Eijk and Maarten Kooyman and Ahmad Al Khleifat and Alfredo Iacoangeli and Nicola Ticozzi and Johnathan Cooper-Knock and Marta Gromicho and Siddharthan Chandran and Karen E. Morrison and Pamela J. Shaw and John Hardy and Michael Sendtner and Thomas Meyer and Nazli Başak and Isabella Fogh and Adriano Chiò and Andrea Calvo and Elisabetta Pupillo and Giancarlo Logroscino and Marc Gotkine and Patrick Vourc’h and Philippe Corcia and Philippe Couratier and Stéphanie Millecamps and François Salachas and Jesus S. Mora Pardina and Ricardo Rojas-García and Patrick Dion and Jay P. Ross and Albert C. Ludolph and Jochen H. Weishaupt and Axel Freischmidt and Gilbert Bensimon and Lukas Tittmann and Wolfgang Lieb and Andre Franke and Stephan Ripke and David C. Whiteman and Catherine M. Olsen and Andre G. Uitterlinden and Albert Hofman and Philippe Amouyel and Bryan Traynor and Andrew B. Singleton and Miguel Mitne Neto and Ruben J. Cauchi and Roel A. Ophoff and Vivianna M. Deerlin and Julian Grosskreutz and Caroline Graff and Lev Brylev and Boris Rogelj and Blaž Koritnik and Janez Zidar and Zorica Stević and Vivian Drory and Monica Povedano and Ian P. Blair and Matthew C. Kiernan and Garth A. Nicholson and Anjali K. Henders and Mamede Carvalho and Susana Pinto and Susanne Petri and Markus Weber and Guy A. Rouleau and Vincenzo Silani and Jonathan Glass and Robert H. Brown and John E. Landers and Christopher E. Shaw and Peter M. Andersen and Fleur C. Garton and Allan F. McRae and Russell L. McLaughlin and Orla Hardiman and Kevin P. Kenna and Naomi R. Wray and Ammar Al-Chalabi and Philip Van Damme and Leonard H. Berg and Jan H. Veldink and Alexander Schönhuth and Marleen Balvert},

editor = {Peter Robinson},

url = {https://github.com/gizem-tas/haploblock-autoencoders},

doi = {10.1093/bioinformatics/btae326},

issn = {1367-4811},

year  = {2024},

date = {2024-05-01},

urldate = {2024-05-01},

journal = {Bioinformatics},

volume = {40},

number = {6},

publisher = {Oxford University Press (OUP)},

abstract = {Motivation: The completion of the genome has paved the way for genome-wide association studies (GWAS), which explained certain proportions of heritability. GWAS are not optimally suited to detect non-linear effects in disease risk, possibly hidden in non-additive interactions (epistasis). Alternative methods for epistasis detection using, e.g. deep neural networks (DNNs) are currently under active development. However, DNNs are constrained by finite computational resources, which can be rapidly depleted due to increasing complexity with the sheer size of the genome. Besides, the curse of dimensionality complicates the task of capturing meaningful genetic patterns for DNNs; therefore necessitates dimensionality reduction. 

Results: We propose a method to compress single nucleotide polymorphism (SNP) data, while leveraging the linkage disequilibrium (LD) structure and preserving potential epistasis. This method involves clustering correlated SNPs into haplotype blocks and training per-block autoencoders to learn a compressed representation of the block’s genetic content. We provide an adjustable autoencoder design to accommodate diverse blocks and bypass extensive hyperparameter tuning. We applied this method to genotyping data from Project MinE, and achieved 99% average test reconstruction accuracy—i.e. minimal information loss—while compressing the input to nearly 10% of the original size. We demonstrate that haplotype-block based autoencoders outperform linear Principal Component Analysis (PCA) by approximately 3% chromosome-wide accuracy of reconstructed variants. To the extent of our knowledge, our approach is the first to simultaneously leverage haplotype structure and DNNs for dimensionality reduction of genetic data. 

Availability and implementation: Data are available for academic use through Project MinE at https://www.projectmine.com/research/data-sharing/, contingent upon terms and requirements specified by the source studies. Code is available at https://github.com/gizem-tas/haploblock-autoencoders.},

keywords = {WP3: Translational CPG},

pubstate = {published},

tppubtype = {article}

}

Lemane, Téo; Lezzoche, Nolan; Lecubin, Julien; Pelletier, Eric; Lescot, Magali; Chikhi, Rayan; Peterlongo, Pierre

Indexing and real-time user-friendly queries in terabyte-sized complex genomic datasets with kmindex and ORA Journal Article

In: Nature Computational Science, vol. 4, no. 2, pp. 104–109, 2024, ISSN: 2662-8457.

Tags: WP1: Primary CPG, WP2: Evolutionary/Comparative CPG, WP3: Translational CPG

Cillari, Nico; Neri, Giuseppe; Pisanti, Nadia; Milazzo, Paolo; Borello, Ugo

RettDb: the Rett syndrome omics database to navigate the Rett syndrome genomic landscape Journal Article

In: Database, vol. 2024, 2024, ISSN: 1758-0463.

Tags: WP3: Translational CPG

Willink, Beatriz; Tunström, Kalle; Nilén, Sofie; Chikhi, Rayan; Lemane, Téo; Takahashi, Michihiko; Takahashi, Yuma; Svensson, Erik I.; Wheat, Christopher West

The genomics and evolution of inter-sexual mimicry and female-limited polymorphisms in damselflies Journal Article

In: Nature Ecology & Evolution, vol. 8, no. 1, pp. 83–97, 2023, ISSN: 2397-334X.

Tags: WP2: Evolutionary/Comparative CPG, WP3: Translational CPG

@article{Willink2023,

title = {The genomics and evolution of inter-sexual mimicry and female-limited polymorphisms in damselflies},

author = {Beatriz Willink and Kalle Tunström and Sofie Nilén and Rayan Chikhi and Téo Lemane and Michihiko Takahashi and Yuma Takahashi and Erik I. Svensson and Christopher West Wheat},

doi = {10.1038/s41559-023-02243-1},

issn = {2397-334X},

year  = {2023},

date = {2023-11-01},

urldate = {2023-11-01},

journal = {Nature Ecology \& Evolution},

volume = {8},

number = {1},

pages = {83–97},

publisher = {Springer Science and Business Media LLC},

abstract = {Sex-limited morphs can provide profound insights into the evolution and genomic architecture of complex phenotypes. Inter-sexual mimicry is one particular type of sex-limited polymorphism in which a novel morph resembles the opposite sex. While inter-sexual mimics are known in both sexes and a diverse range of animals, their evolutionary origin is poorly understood. Here, we investigated the genomic basis of female-limited morphs and male mimicry in the common bluetail damselfly. Differential gene expression between morphs has been documented in damselflies, but no causal locus has been previously identified. We found that male mimicry originated in an ancestrally sexually dimorphic lineage in association with multiple structural changes, probably driven by transposable element activity. These changes resulted in ~900 kb of novel genomic content that is partly shared by male mimics in a close relative, indicating that male mimicry is a trans-species polymorphism. More recently, a third morph originated following the translocation of part of the male-mimicry sequence into a genomic position ~3.5 Mb apart. We provide evidence of balancing selection maintaining male mimicry, in line with previous field population studies. Our results underscore how structural variants affecting a handful of potentially regulatory genes and morph-specific genes can give rise to novel and complex phenotypic polymorphisms.},

keywords = {WP2: Evolutionary/Comparative CPG, WP3: Translational CPG},

pubstate = {published},

tppubtype = {article}

}

Sládeček, Tomáš; Gažiová, Michaela; Kucharík, Marcel; Zaťková, Andrea; Pös, Zuzana; Pös, Ondrej; Krampl, Werner; Tomková, Erika; Hýblová, Michaela; Minárik, Gabriel; Radvánszky, Ján; Budiš, Jaroslav; Szemes, Tomáš

Combination of expert guidelines-based and machine learning-based approaches leads to superior accuracy of automated prediction of clinical effect of copy number variations Journal Article

In: Sci. Rep., vol. 13, no. 1, pp. 10531, 2023, ISSN: 2045-2322.

Tags: WP3: Translational CPG

@article{Sladecek2023-oq,

title = {Combination of expert guidelines-based and machine learning-based approaches leads to superior accuracy of automated prediction of clinical effect of copy number variations},

author = {Tomáš Sládeček and Michaela Gažiová and Marcel Kucharík and Andrea Zaťková and Zuzana Pös and Ondrej Pös and Werner Krampl and Erika Tomková and Michaela Hýblová and Gabriel Minárik and Ján Radvánszky and Jaroslav Budiš and Tomáš Szemes},

url = {https://predict.genovisio.com/},

doi = {10.1038/s41598-023-37352-1},

issn = {2045-2322},

year  = {2023},

date = {2023-06-01},

urldate = {2023-06-01},

journal = {Sci. Rep.},

volume = {13},

number = {1},

pages = {10531},

abstract = {Clinical interpretation of copy number variants (CNVs) is a complex process that requires skilled clinical professionals. General recommendations have been recently released to guide the CNV interpretation based on predefined criteria to uniform the decision process. Several semiautomatic computational methods have been proposed to recommend appropriate choices, relieving clinicians of tedious searching in vast genomic databases. We have developed and evaluated such a tool called MarCNV and tested it on CNV records collected from the ClinVar database. Alternatively, the emerging machine learning-based tools, such as the recently published ISV (Interpretation of Structural Variants), showed promising ways of even fully automated predictions using broader characterization of affected genomic elements. Such tools utilize features additional to ACMG criteria, thus providing supporting evidence and the potential to improve CNV classification. Since both approaches contribute to evaluation of CNVs clinical impact, we propose a combined solution in the form of a decision support tool based on automated ACMG guidelines (MarCNV) supplemented by a machine learning-based pathogenicity prediction (ISV) for the classification of CNVs. We provide evidence that such a combined approach is able to reduce the number of uncertain classifications and reveal potentially incorrect classifications using automated guidelines. CNV interpretation using MarCNV, ISV, and combined approach is available for non-commercial use at https://predict.genovisio.com/.},

keywords = {WP3: Translational CPG},

pubstate = {published},

tppubtype = {article}

}

Liao, Wen-Wei; Asri, Mobin; Ebler, Jana; Doerr, Daniel; Haukness, Marina; Hickey, Glenn; Lu, Shuangjia; Lucas, Julian K; Monlong, Jean; Abel, Haley J; Buonaiuto, Silvia; Chang, Xian H; Cheng, Haoyu; Chu, Justin; Colonna, Vincenza; Eizenga, Jordan M; Feng, Xiaowen; Fischer, Christian; Fulton, Robert S; Garg, Shilpa; Groza, Cristian; Guarracino, Andrea; Harvey, William T; Heumos, Simon; Howe, Kerstin; Jain, Miten; Lu, Tsung-Yu; Markello, Charles; Martin, Fergal J; Mitchell, Matthew W; Munson, Katherine M; Mwaniki, Moses Njagi; Novak, Adam M; Olsen, Hugh E; Pesout, Trevor; Porubsky, David; Prins, Pjotr; Sibbesen, Jonas A; Sirén, Jouni; Tomlinson, Chad; Villani, Flavia; Vollger, Mitchell R; Antonacci-Fulton, Lucinda L; Baid, Gunjan; Baker, Carl A; Belyaeva, Anastasiya; Billis, Konstantinos; Carroll, Andrew; Chang, Pi-Chuan; Cody, Sarah; Cook, Daniel E; Cook-Deegan, Robert M; Cornejo, Omar E; Diekhans, Mark; Ebert, Peter; Fairley, Susan; Fedrigo, Olivier; Felsenfeld, Adam L; Formenti, Giulio; Frankish, Adam; Gao, Yan; Garrison, Nanibaa' A; Giron, Carlos Garcia; Green, Richard E; Haggerty, Leanne; Hoekzema, Kendra; Hourlier, Thibaut; Ji, Hanlee P; Kenny, Eimear E; Koenig, Barbara A; Kolesnikov, Alexey; Korbel, Jan O; Kordosky, Jennifer; Koren, Sergey; Lee, Hojoon; Lewis, Alexandra P; Magalhães, Hugo; Marco-Sola, Santiago; Marijon, Pierre; McCartney, Ann; McDaniel, Jennifer; Mountcastle, Jacquelyn; Nattestad, Maria; Nurk, Sergey; Olson, Nathan D; Popejoy, Alice B; Puiu, Daniela; Rautiainen, Mikko; Regier, Allison A; Rhie, Arang; Sacco, Samuel; Sanders, Ashley D; Schneider, Valerie A; Schultz, Baergen I; Shafin, Kishwar; Smith, Michael W; Sofia, Heidi J; Tayoun, Ahmad N Abou; Thibaud-Nissen, Françoise; Tricomi, Francesca Floriana; Wagner, Justin; Walenz, Brian; Wood, Jonathan M D; Zimin, Aleksey V; Bourque, Guillaume; Chaisson, Mark J P; Flicek, Paul; Phillippy, Adam M; Zook, Justin M; Eichler, Evan E; Haussler, David; Wang, Ting; Jarvis, Erich D; Miga, Karen H; Garrison, Erik; Marschall, Tobias; Hall, Ira M; Li, Heng; Paten, Benedict

A draft human pangenome reference Journal Article

In: Nature, vol. 617, no. 7960, pp. 312–324, 2023, ISSN: 1476-4687.

Tags: WP3: Translational CPG

@article{Liao2023-do,

title = {A draft human pangenome reference},

author = {Wen-Wei Liao and Mobin Asri and Jana Ebler and Daniel Doerr and Marina Haukness and Glenn Hickey and Shuangjia Lu and Julian K Lucas and Jean Monlong and Haley J Abel and Silvia Buonaiuto and Xian H Chang and Haoyu Cheng and Justin Chu and Vincenza Colonna and Jordan M Eizenga and Xiaowen Feng and Christian Fischer and Robert S Fulton and Shilpa Garg and Cristian Groza and Andrea Guarracino and William T Harvey and Simon Heumos and Kerstin Howe and Miten Jain and Tsung-Yu Lu and Charles Markello and Fergal J Martin and Matthew W Mitchell and Katherine M Munson and Moses Njagi Mwaniki and Adam M Novak and Hugh E Olsen and Trevor Pesout and David Porubsky and Pjotr Prins and Jonas A Sibbesen and Jouni Sirén and Chad Tomlinson and Flavia Villani and Mitchell R Vollger and Lucinda L Antonacci-Fulton and Gunjan Baid and Carl A Baker and Anastasiya Belyaeva and Konstantinos Billis and Andrew Carroll and Pi-Chuan Chang and Sarah Cody and Daniel E Cook and Robert M Cook-Deegan and Omar E Cornejo and Mark Diekhans and Peter Ebert and Susan Fairley and Olivier Fedrigo and Adam L Felsenfeld and Giulio Formenti and Adam Frankish and Yan Gao and Nanibaa' A Garrison and Carlos Garcia Giron and Richard E Green and Leanne Haggerty and Kendra Hoekzema and Thibaut Hourlier and Hanlee P Ji and Eimear E Kenny and Barbara A Koenig and Alexey Kolesnikov and Jan O Korbel and Jennifer Kordosky and Sergey Koren and Hojoon Lee and Alexandra P Lewis and Hugo Magalhães and Santiago Marco-Sola and Pierre Marijon and Ann McCartney and Jennifer McDaniel and Jacquelyn Mountcastle and Maria Nattestad and Sergey Nurk and Nathan D Olson and Alice B Popejoy and Daniela Puiu and Mikko Rautiainen and Allison A Regier and Arang Rhie and Samuel Sacco and Ashley D Sanders and Valerie A Schneider and Baergen I Schultz and Kishwar Shafin and Michael W Smith and Heidi J Sofia and Ahmad N Abou Tayoun and Françoise Thibaud-Nissen and Francesca Floriana Tricomi and Justin Wagner and Brian Walenz and Jonathan M D Wood and Aleksey V Zimin and Guillaume Bourque and Mark J P Chaisson and Paul Flicek and Adam M Phillippy and Justin M Zook and Evan E Eichler and David Haussler and Ting Wang and Erich D Jarvis and Karen H Miga and Erik Garrison and Tobias Marschall and Ira M Hall and Heng Li and Benedict Paten},

doi = {10.1038/s41586-023-05896-x},

issn = {1476-4687},

year  = {2023},

date = {2023-05-01},

urldate = {2023-05-01},

journal = {Nature},

volume = {617},

number = {7960},

pages = {312–324},

publisher = {Springer Science and Business Media LLC},

abstract = {Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.},

keywords = {WP3: Translational CPG},

pubstate = {published},

tppubtype = {article}

}

Porubsky, David; Vollger, Mitchell R; Harvey, William T; Rozanski, Allison N; Ebert, Peter; Hickey, Glenn; Hasenfeld, Patrick; Sanders, Ashley D; Stober, Catherine; Consortium, Human Pangenome Reference; Korbel, Jan O; Paten, Benedict; Marschall, Tobias; Eichler, Evan E

Gaps and complex structurally variant loci in phased genome assemblies Journal Article

In: Genome Res., vol. 33, no. 4, pp. 496–510, 2023, ISSN: 1549-5469.

Tags: WP3: Translational CPG

@article{Porubsky2023-ue,

title = {Gaps and complex structurally variant loci in phased genome assemblies},

author = {David Porubsky and Mitchell R Vollger and William T Harvey and Allison N Rozanski and Peter Ebert and Glenn Hickey and Patrick Hasenfeld and Ashley D Sanders and Catherine Stober and Human Pangenome Reference Consortium and Jan O Korbel and Benedict Paten and Tobias Marschall and Evan E Eichler},

doi = {10.1101/gr.277334.122},

issn = {1549-5469},

year  = {2023},

date = {2023-04-01},

urldate = {2023-04-01},

journal = {Genome Res.},

volume = {33},

number = {4},

pages = {496–510},

abstract = {There has been tremendous progress in phased genome assembly production by combining long-read data with parental information or linked-read data. Nevertheless, a typical phased genome assembly generated by trio-hifiasm still generates more than 140 gaps. We perform a detailed analysis of gaps, assembly breaks, and misorientations from 182 haploid assemblies obtained from a diversity panel of 77 unique human samples. Although trio-based approaches using HiFi are the current gold standard, chromosome-wide phasing accuracy is comparable when using Strand-seq instead of parental data. Importantly, the majority of assembly gaps cluster near the largest and most identical repeats (including segmental duplications [35.4%], satellite DNA [22.3%], or regions enriched in GA/AT-rich DNA [27.4%]). Consequently, 1513 protein-coding genes overlap assembly gaps in at least one haplotype, and 231 are recurrently disrupted or missing from five or more haplotypes. Furthermore, we estimate that 6-7 Mbp of DNA are misorientated per haplotype irrespective of whether trio-free or trio-based approaches are used. Of these misorientations, 81% correspond to bona fide large inversion polymorphisms in the human species, most of which are flanked by large segmental duplications. We also identify large-scale alignment discontinuities consistent with 11.9 Mbp of deletions and 161.4 Mbp of insertions per haploid genome. Although 99% of this variation corresponds to satellite DNA, we identify 230 regions of euchromatic DNA with frequent expansions and contractions, nearly half of which overlap with 197 protein-coding genes. Such variable and incompletely assembled regions are important targets for future algorithmic development and pangenome representation.},

keywords = {WP3: Translational CPG},

pubstate = {published},

tppubtype = {article}

}

Luo, Xiao; Kang, Xiongbin; Schönhuth, Alexander

Predicting the prevalence of complex genetic diseases from individual genotype profiles using capsule networks Journal Article

In: Nat. Mach. Intell., vol. 5, no. 2, pp. 114–125, 2023, ISSN: 2522-5839.

Tags: WP3: Translational CPG

@article{Luo2023-le,

title = {Predicting the prevalence of complex genetic diseases from individual genotype profiles using capsule networks},

author = {Xiao Luo and Xiongbin Kang and Alexander Schönhuth},

doi = {10.1038/s42256-022-00604-2},

issn = {2522-5839},

year  = {2023},

date = {2023-02-01},

urldate = {2023-02-01},

journal = {Nat. Mach. Intell.},

volume = {5},

number = {2},

pages = {114–125},

publisher = {Springer Science and Business Media LLC},

abstract = {Diseases that have a complex genetic architecture tend to suffer from considerable amounts of genetic variants that, although playing a role in the disease, have not yet been revealed as such. Two major causes for this phenomenon are genetic variants that do not stack up effects, but interact in complex ways; in addition, as recently suggested, the omnigenic model postulates that variants interact in a holistic manner to establish disease phenotypes. Here we present DiseaseCapsule, as a capsule-network-based approach that explicitly addresses to capture the hierarchical structure of the underlying genome data, and has the potential to fully capture the non-linear relationships between variants and disease. DiseaseCapsule is the first such approach to operate in a whole-genome manner when predicting disease occurrence from individual genotype profiles. In experiments, we evaluated DiseaseCapsule on amyotrophic lateral sclerosis (ALS) and Parkinson’s disease, with a particular emphasis on ALS, which is known to have a complex genetic architecture and is affected by 40% missing heritability. On ALS, DiseaseCapsule achieves 86.9% accuracy on hold-out test data in predicting disease occurrence, thereby outperforming all other approaches by large margins. Also, DiseaseCapsule required sufficiently less training data for reaching optimal performance. Last but not least, the systematic exploitation of the network architecture yielded 922 genes of particular interest, and 644 ‘non-additive’ genes that are crucial factors in DiseaseCapsule, but remain masked within linear schemes.},

keywords = {WP3: Translational CPG},

pubstate = {published},

tppubtype = {article}

}

Teramo, Antonella; Binatti, Andrea; Ciabatti, Elena; Schiavoni, Gianluca; Tarrini, Giulia; Barilà, Gregorio; Calabretto, Giulia; Vicenzetto, Cristina; Gasparini, Vanessa Rebecca; Facco, Monica; Petrini, Iacopo; Grossi, Roberto; Pisanti, Nadia; Bortoluzzi, Stefania; Falini, Brunangelo; Tiacci, Enrico; Galimberti, Sara; Semenzato, Gianpietro; Zambello, Renato

Defining TCRγδ lymphoproliferative disorders by combined immunophenotypic and molecular evaluation Journal Article

In: Nature Communications, vol. 13, no. 1, pp. 3298, 2022, ISSN: 2041-1723.

Tags: WP3: Translational CPG

Sládeček, T.; Gažiová, M.; Pös, O.; Pös, Z.; Budiš, J.; Radvánszky, J.; Szemes, T.

Combination of expert decision systems with artificial intelligence leads to superior accuracy of automated prediction of clinical effect of copy number variation Presentation

Poster Presentation at ESHG, 01.06.2022.

Tags: Misc, WP3: Translational CPG

Luo, Xiao; Kang, Xiongbin; Schönhuth, Alexander

Strainline: full-length de novo viral haplotype reconstruction from noisy long reads Journal Article

In: Genome Biology, vol. 23, iss. 1, no. 29, pp. 1–27, 2022, ISSN: 1474-760X.

Tags: WP1: Primary CPG, WP3: Translational CPG

Gažiová, M.; Sládeček, T.; Pös, O.; Števko, M.; Krampl, W.; Pös, Z.; Hekel, R.; Hlavačka, M.; Kucharík, M.; Radvánszky, J.; Budiš, J.; Szemes, T.

Automated prediction of the clinical impact of structural copy number variations Journal Article

In: Scientific Reports, vol. 12, no. 1, pp. 555, 2022, ISSN: 2045-2322.

Tags: Misc, WP3: Translational CPG

Luo, Xiao; Kang, Xiongbin; Schönhuth, Alexander

phasebook: haplotype-aware de novo assembly of diploid genomes from long reads Journal Article

In: Genome Biology, vol. 22, no. 299, pp. 1–26, 2021, ISSN: 1474-760X.

Tags: WP3: Translational CPG