Publications – ALPACA ITN Project

38 entries (page 1 of 2)

Avila Cartes, Jorge; Bonizzoni, Paola; Ciccolella, Simone; Della Vedova, Gianluca; Denti, Luca

PangeBlocks: customized construction of pangenome graphs via maximal blocks Journal Article

In: BMC Bioinformatics, vol. 25, no. 1, 2024, ISSN: 1471-2105.

Tags: WP2: Evolutionary/Comparative CPG

@article{AvilaCartes2024a,

title = {PangeBlocks: customized construction of pangenome graphs via maximal blocks},

author = {Avila Cartes, Jorge and Bonizzoni, Paola and Ciccolella, Simone and Della Vedova, Gianluca and Denti, Luca},

url = {https://github.com/AlgoLab/pangeblocks},

doi = {10.1186/s12859-024-05958-5},

issn = {1471-2105},

year  = {2024},

date = {2024-11-01},

urldate = {2024-11-01},

journal = {BMC Bioinformatics},

volume = {25},

number = {1},

publisher = {Springer Science and Business Media LLC},

abstract = {\textbf{Background}

The construction of a pangenome graph is a fundamental task in pangenomics. A natural theoretical question is how to formalize the computational problem of building an optimal pangenome graph, making explicit the underlying optimization criterion and the set of feasible solutions. Current approaches build a pangenome graph with some heuristics, without assuming some explicit optimization criteria. Thus it is unclear how a specific optimization criterion affects the graph topology and downstream analysis, like read mapping and variant calling. 



\textbf{Results} 

In this paper, by leveraging the notion of maximal block in a Multiple Sequence Alignment (MSA), we reframe the pangenome graph construction problem as an exact cover problem on blocks called Minimum Weighted Block Cover (MWBC). Then we propose an Integer Linear Programming (ILP) formulation for the MWBC problem that allows us to study the most natural objective functions for building a graph. We provide an implementation of the ILP approach for solving the MWBC and we evaluate it on SARS-CoV-2 complete genomes, showing how different objective functions lead to pangenome graphs that have different properties, hinting that the specific downstream task can drive the graph construction phase. 



\textbf{Conclusion}

We show that a customized construction of a pangenome graph based on selecting objective functions has a direct impact on the resulting graphs. In particular, our formalization of the MWBC problem, based on finding an optimal subset of blocks covering an MSA, paves the way to novel practical approaches to graph representations of an MSA where the user can guide the construction.},

keywords = {WP2: Evolutionary/Comparative CPG},

pubstate = {published},

tppubtype = {article}

}

Parmigiani, Luca; Garrison, Erik; Stoye, Jens; Marschall, Tobias; Doerr, Daniel

Panacus: fast and exact pangenome growth and core size estimation Journal Article

In: Bioinformatics, 2024, ISSN: 1367-4811.

Tags: WP2: Evolutionary/Comparative CPG

Gabory, Esteban; Mwaniki, Moses Njagi; Pisanti, Nadia; Pissis, Solon P.; Radoszewski, Jakub; Sweering, Michelle; Zuba, Wiktor

Pangenome comparison via ED strings Journal Article

In: Frontiers in Bioinformatics, vol. 4, 2024, ISSN: 2673-7647.

Tags: WP2: Evolutionary/Comparative CPG

Brejová, Broňa; Gagie, Travis; Herencsárová, Eva; Vinař, Tomáš

Maximum-scoring path sets on pangenome graphs of constant treewidth Journal Article

In: Frontiers in Bioinformatics, vol. 4, 2024, ISSN: 2673-7647.

Tags: WP2: Evolutionary/Comparative CPG

Sierra, Pío; Durbin, Richard

Identification of transposable element families from pangenome polymorphisms Journal Article

In: Mobile DNA, vol. 15, no. 1, pp. 13, 2024, ISSN: 1759-8753.

Tags: WP2: Evolutionary/Comparative CPG

Parmigiani, Luca; Wittler, Roland; Stoye, Jens

Revisiting pangenome openness with k-mers Journal Article

In: Peer Community Journal, vol. 4, 2024, ISSN: 2804-3871.

Tags: WP2: Evolutionary/Comparative CPG

Almeida, Miguel Vasconcelos; Blumer, Moritz; Yuan, Chengwei Ulrika; Sierra, Pío; Price, Jonathan L.; Quah, Fu Xiang; Friman, Aleksandr; Dallaire, Alexandra; Vernaz, Grégoire; Putman, Audrey L. K.; Smith, Alan M.; Joyce, Domino A.; Butter, Falk; Haase, Astrid D.; Durbin, Richard; Santos, M. Emília; Miska, Eric A.

Dynamic co-evolution of transposable elements and the piRNA pathway in African cichlid fishes Unpublished

2024.

Tags: WP2: Evolutionary/Comparative CPG

Lemane, Téo; Lezzoche, Nolan; Lecubin, Julien; Pelletier, Eric; Lescot, Magali; Chikhi, Rayan; Peterlongo, Pierre

Indexing and real-time user-friendly queries in terabyte-sized complex genomic datasets with kmindex and ORA Journal Article

In: Nature Computational Science, vol. 4, no. 2, pp. 104–109, 2024, ISSN: 2662-8457.

Tags: WP1: Primary CPG, WP2: Evolutionary/Comparative CPG, WP3: Translational CPG

Avila Cartes, Jorge; Bonizzoni, Paola; Ciccolella, Simone; Della Vedova, Gianluca; Denti, Luca; Didelot, Xavier; Monti, Davide Cesare; Pirola, Yuri

RecGraph: recombination-aware alignment of sequences to variation graphs Journal Article

In: Bioinformatics, vol. 40, no. 5, pp. btae292, 2024, ISSN: 1367-4811.

Tags: WP2: Evolutionary/Comparative CPG

@article{Avila2024,

title = {RecGraph: recombination-aware alignment of sequences to variation graphs},

author = {Avila Cartes, Jorge and Bonizzoni, Paola and Ciccolella, Simone and Della Vedova, Gianluca and Denti, Luca and Didelot, Xavier and Monti, Davide Cesare and Pirola, Yuri},

url = {https://github.com/AlgoLab/RecGraph},

doi = {10.1093/bioinformatics/btae292},

issn = {1367-4811},

year  = {2024},

date = {2024-01-01},

urldate = {2024-01-01},

journal = {Bioinformatics},

volume = {40},

number = {5},

pages = {btae292},

abstract = {Bacterial genomes present more variability than human genomes, which requires important adjustments in computational tools that are developed for human data. In particular, bacteria exhibit a mosaic structure due to homologous recombinations, but this fact is not sufficiently captured by standard read mappers that align against linear reference genomes. The recent introduction of pangenomics provides some insights in that context, as a pangenome graph can represent the variability within a species. However, the concept of sequence-to-graph alignment that captures the presence of recombinations has not been previously investigated. In this paper, we present the extension of the notion of sequence-to-graph alignment to a variation graph that incorporates a recombination, so that the latter are explicitly represented and evaluated in an alignment. Moreover, we present a dynamic programming approach for the special case where there is at most a recombination; we implement this case as RecGraph. From a modelling point of view, a recombination corresponds to identifying a new path of the variation graph, where the new arc is composed of two halves, each extracted from an original path, possibly joined by a new arc. Our experiments show that RecGraph accurately aligns simulated recombinant bacterial sequences that have at most a recombination, providing evidence for the presence of recombination events.

Our implementation is open source and available at https://github.com/AlgoLab/RecGraph.},

keywords = {WP2: Evolutionary/Comparative CPG},

pubstate = {published},

tppubtype = {article}

}

10.

Schulz, Tizian; Parmigiani, Luca; Rempel, Andreas; Stoye, Jens

Methods for Pangenomic Core Detection Book Chapter

In: Setubal, João Carlos; Stadler, Peter F.; Stoye, Jens (Ed.): Comparative Genomics: Methods and Protocols, pp. 73–106, Springer US, New York, NY, 2024, ISSN: 1940-6029.

Tags: WP2: Evolutionary/Comparative CPG

11.

Baláž, Andrej; Gagie, Travis; Goga, Adrián; Heumos, Simon; Navarro, Gonzalo; Petescia, Alessia; Sirén, Jouni

Wheeler Maps Proceedings Article

In: Soto, José A.; Wiese, Andreas (Ed.): LATIN 2024: Theoretical Informatics, pp. 178–192, Springer Nature Switzerland, Cham, 2024, ISSN: 1611-3349.

Tags: WP2: Evolutionary/Comparative CPG

12.

Willink, Beatriz; Tunström, Kalle; Nilén, Sofie; Chikhi, Rayan; Lemane, Téo; Takahashi, Michihiko; Takahashi, Yuma; Svensson, Erik I.; Wheat, Christopher West

The genomics and evolution of inter-sexual mimicry and female-limited polymorphisms in damselflies Journal Article

In: Nature Ecology & Evolution, vol. 8, no. 1, pp. 83–97, 2023, ISSN: 2397-334X.

Tags: WP2: Evolutionary/Comparative CPG, WP3: Translational CPG

@article{Willink2023,

title = {The genomics and evolution of inter-sexual mimicry and female-limited polymorphisms in damselflies},

author = {Beatriz Willink and Kalle Tunström and Sofie Nilén and Rayan Chikhi and Téo Lemane and Michihiko Takahashi and Yuma Takahashi and Erik I. Svensson and Christopher West Wheat},

doi = {10.1038/s41559-023-02243-1},

issn = {2397-334X},

year  = {2023},

date = {2023-11-01},

urldate = {2023-11-01},

journal = {Nature Ecology \& Evolution},

volume = {8},

number = {1},

pages = {83--97},

publisher = {Springer Science and Business Media LLC},

abstract = {Sex-limited morphs can provide profound insights into the evolution and genomic architecture of complex phenotypes. Inter-sexual mimicry is one particular type of sex-limited polymorphism in which a novel morph resembles the opposite sex. While inter-sexual mimics are known in both sexes and a diverse range of animals, their evolutionary origin is poorly understood. Here, we investigated the genomic basis of female-limited morphs and male mimicry in the common bluetail damselfly. Differential gene expression between morphs has been documented in damselflies, but no causal locus has been previously identified. We found that male mimicry originated in an ancestrally sexually dimorphic lineage in association with multiple structural changes, probably driven by transposable element activity. These changes resulted in ~900 kb of novel genomic content that is partly shared by male mimics in a close relative, indicating that male mimicry is a trans-species polymorphism. More recently, a third morph originated following the translocation of part of the male-mimicry sequence into a genomic position ~3.5 Mb apart. We provide evidence of balancing selection maintaining male mimicry, in line with previous field population studies. Our results underscore how structural variants affecting a handful of potentially regulatory genes and morph-specific genes can give rise to novel and complex phenotypic polymorphisms.},

keywords = {WP2: Evolutionary/Comparative CPG, WP3: Translational CPG},

pubstate = {published},

tppubtype = {article}

}

13.

Orozco-Arias, Simon; Sierra, Pío; Durbin, Richard; González, Josefa

MCHelper automatically curates transposable element libraries across eukaryotic species Unpublished

2023.

Tags: WP2: Evolutionary/Comparative CPG

@unpublished{OrozcoArias2023,

title = {MCHelper automatically curates transposable element libraries across eukaryotic species},

author = {Simon Orozco-Arias and Pío Sierra and Richard Durbin and Josefa González},

doi = {10.1101/2023.10.17.562682},

year  = {2023},

date = {2023-10-01},

publisher = {Cold Spring Harbor Laboratory},

abstract = {The number of species with high quality genome sequences continues to increase, in part due to scaling up of multiple large scale biodiversity sequencing projects. While the need to annotate genic sequences in these genomes is widely acknowledged, the parallel need to annotate transposable element sequences that have been shown to alter genome architecture, rewire gene regulatory networks, and contribute to the evolution of host traits is becoming ever more evident. However, accurate genome-wide annotation of transposable element sequences is still technically challenging. Several de novo transposable element identification tools are now available, but manual curation of the libraries produced by these tools is needed to generate high quality genome annotations. Manual curation is time-consuming, and thus impractical for large-scale genomic studies, and lacks reproducibility. In this work, we present the Manual Curator Helper tool MCHelper, which automates the TE library curation process. By leveraging MCHelper’s fully automated mode with the outputs from three de novo transposable element identification tools, RepeatModeler2, EDTA and REPET, in fruit fly, rice, hooded crow, zebrafish, maize, and human, we show a substantial improvement in the quality of the transposable element libraries and genome annotations. MCHelper libraries are less redundant, with up to 65% reduction in the number of consensus sequences, have up to 11.4% fewer false positive sequences, and up to ∼48% fewer “unclassified/unknown” transposable element consensus sequences. Genome-wide transposable element annotations were also improved, including larger unfragmented insertions. Moreover, MCHelper is an easy to install and easy to use tool.},

keywords = {WP2: Evolutionary/Comparative CPG},

pubstate = {published},

tppubtype = {unpublished}

}

14.

Herencsárová, Eva; Brejová, Broňa

Identifying Clusters in Graph Representations of Genomes Proceedings Article

In: Brejová, Broňa; Ciencialová, Lucie; Holeňa, Martin; Jajcay, Róbert; Jajcayová, Tatiana; Lexa, Matej; Mráz, František; Pardubská, Dana; Plátek, Martin (Ed.): Proceedings of the 23rd Conference Information Technologies – Applications and Theory (ITAT 2023), pp. 232–241, CEUR Workshop Proceedings, Tatranské Matliare, Slovakia, 2023.

Tags: WP2: Evolutionary/Comparative CPG

15.

Bernardini, Giulia; van Iersel, Leo; Julien, Esther; Stougie, Leen

Constructing phylogenetic networks via cherry picking and machine learning Journal Article

In: Algorithms for Molecular Biology, vol. 18, no. 1, pp. 13, 2023, ISSN: 1748-7188.

Tags: WP2: Evolutionary/Comparative CPG

16.

Cozzi, Davide; Rossi, Massimiliano; Rubinacci, Simone; Gagie, Travis; Köppl, Dominik; Boucher, Christina; Bonizzoni, Paola

μ-PBWT: a lightweight r-indexing of the PBWT for storing and querying UK Biobank data Journal Article

In: Bioinformatics, vol. 39, no. 9, 2023, ISSN: 1367-4811.

Tags: WP2: Evolutionary/Comparative CPG

@article{Cozzi2023-bx,

title = {μ-PBWT: a lightweight r-indexing of the PBWT for storing and querying UK Biobank data},

author = {Davide Cozzi and Massimiliano Rossi and Simone Rubinacci and Travis Gagie and Dominik Köppl and Christina Boucher and Paola Bonizzoni},

url = {https://github.com/dlcgold/muPBWT},

doi = {10.1093/bioinformatics/btad552},

issn = {1367-4811},

year  = {2023},

date = {2023-09-01},

urldate = {2023-09-01},

journal = {Bioinformatics},

volume = {39},

number = {9},

publisher = {Oxford University Press (OUP)},

abstract = {The Positional Burrows–Wheeler Transform (PBWT) is a data structure that indexes haplotype sequences in a manner that enables finding maximal haplotype matches in h sequences containing w variation sites in O(hw) time. This represents a significant improvement over classical quadratic-time approaches. However, the original PBWT data structure does not allow for queries over Biobank panels that consist of several millions of haplotypes, if an index of the haplotypes must be kept entirely in memory.



In this article, we leverage the notion of r-index proposed for the BWT to present a memory-efficient method for constructing and storing the run-length encoded PBWT, and computing set maximal matches (SMEMs) queries in haplotype sequences. We implement our method, which we refer to as μ-PBWT, and evaluate it on datasets of 1000 Genome Project and UK Biobank data. Our experiments demonstrate that the μ-PBWT reduces the memory usage up to a factor of 20% compared to the best current PBWT-based indexing. In particular, μ-PBWT produces an index that stores high-coverage whole genome sequencing data of chromosome 20 in about a third of the space of its BCF file. μ-PBWT is an adaptation of techniques for the run-length compressed BWT for the PBWT (RLPBWT) and it is based on keeping in memory only a succinct representation of the RLPBWT that still allows the efficient computation of set maximal matches (SMEMs) over the original panel.



Our implementation is open source and available at https://github.com/dlcgold/muPBWT. The binary is available at https://bioconda.github.io/recipes/mupbwt/README.html.},

keywords = {WP2: Evolutionary/Comparative CPG},

pubstate = {published},

tppubtype = {article}

}

17.

Ayad, Lorraine A. K.; Chikhi, Rayan; Pissis, Solon P.

Seedability: optimizing alignment parameters for sensitive sequence comparison Journal Article

In: Bioinformatics Advances, vol. 3, no. 1, 2023, ISSN: 2635-0041.

Tags: WP1: Primary CPG, WP2: Evolutionary/Comparative CPG

18.

Lee, Sewon; Kim, Gyuri; Karin, Eli Levy; Mirdita, Milot; Park, Sukhwan; Chikhi, Rayan; Babaian, Artem; Kryshtafovych, Andriy; Steinegger, Martin

Petascale Homology Search for Structure Prediction Unpublished

bioRxiv, 2023.

Tags: WP2: Evolutionary/Comparative CPG

19.

Gabory, Esteban; Mwaniki, Moses Njagi; Pisanti, Nadia; Pissis, Solon P.; Radoszewski, Jakub; Sweering, Michelle; Zuba, Wiktor

Comparing elastic-degenerate strings: Algorithms, lower bounds, and applications Proceedings Article

In: Bulteau, Laurent; Lipták, Zsuzsanna (Ed.): 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023), pp. 11:1–11:20, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, Dagstuhl, Germany, 2023, ISBN: 978-3-95977-276-1.

Tags: WP1: Primary CPG, WP2: Evolutionary/Comparative CPG

@inproceedings{Gabory2023-vh,

title = {Comparing elastic-degenerate strings: Algorithms, lower bounds, and applications},

author = {Esteban Gabory and Moses Njagi Mwaniki and Nadia Pisanti and Solon P. Pissis and Jakub Radoszewski and Michelle Sweering and Wiktor Zuba},

editor = {Bulteau, Laurent and Lipt\'{a}k, Zsuzsanna},

doi = {10.4230/LIPIcs.CPM.2023.11},

isbn = {978-3-95977-276-1},

year  = {2023},

date = {2023-06-01},

urldate = {2023-06-01},

booktitle = {34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)},

volume = {259},

pages = {11:1--11:20},

publisher = {Schloss Dagstuhl - Leibniz-Zentrum für Informatik},

address = {Dagstuhl, Germany},

series = {Leibniz International Proceedings in Informatics (LIPIcs)},

abstract = {An elastic-degenerate (ED) string T is a sequence of n sets T[1],…,T[n] containing m strings in total whose cumulative length is N. We call n, m, and N the length, the cardinality and the size of T, respectively. The language of T is defined as ℒ(T) = {S_1 ⋯ S_n : S_i ∈ T[i] for all i ∈ [1,n]}. ED strings have been introduced to represent a set of closely-related DNA sequences, also known as a pangenome. The basic question we investigate here is: Given two ED strings, how fast can we check whether the two languages they represent have a nonempty intersection? We call the underlying problem the ED String Intersection (EDSI) problem. For two ED strings T₁ and T₂ of lengths n₁ and n₂, cardinalities m₁ and m₂, and sizes N₁ and N₂, respectively, we show the following: - There is no 𝒪((N₁N₂)^{1-ε})-time algorithm, thus no 𝒪((N₁m₂+N₂m₁)^{1-ε})-time algorithm and no 𝒪((N₁n₂+N₂n₁)^{1-ε})-time algorithm, for any constant ε > 0, for EDSI even when T₁ and T₂ are over a binary alphabet, unless the Strong Exponential-Time Hypothesis is false. - There is no combinatorial 𝒪((N₁+N₂)^{1.2-ε}f(n₁,n₂))-time algorithm, for any constant ε > 0 and any function f, for EDSI even when T₁ and T₂ are over a binary alphabet, unless the Boolean Matrix Multiplication conjecture is false. - An 𝒪(N₁log N₁log n₁+N₂log N₂log n₂)-time algorithm for outputting a compact (RLE) representation of the intersection language of two unary ED strings. In the case when T₁ and T₂ are given in a compact representation, we show that the problem is NP-complete. - An 𝒪(N₁m₂+N₂m₁)-time algorithm for EDSI. - An Õ(N₁^{ω-1}n₂+N₂^{ω-1}n₁)-time algorithm for EDSI, where ω is the exponent of matrix multiplication; the Õ notation suppresses factors that are polylogarithmic in the input size. We also show that the techniques we develop have applications outside of ED string comparison.},

keywords = {WP1: Primary CPG, WP2: Evolutionary/Comparative CPG},

pubstate = {published},

tppubtype = {inproceedings}

}

20.

Denti, Luca; Khorsand, Parsoa; Bonizzoni, Paola; Hormozdiari, Fereydoun; Chikhi, Rayan

SVDSS: structural variation discovery in hard-to-call genomic regions using sample-specific strings from accurate long reads Journal Article

In: Nature Methods, vol. 20, no. 4, pp. 550–558, 2023, ISSN: 1548-7105.

Tags: WP2: Evolutionary/Comparative CPG
