We propose to explore different approaches for building a pan-genome graph or adding information to an existing one. The simplest approach is to map new sequences onto the graph, adding newly discovered variants and annotating those already present. However, when the graph becomes too large and/or too complex, it may be preferable to split it into two (or more) sub-graphs. The objective of this ESR is to determine the best strategy for a given data size and complexity, for inputs ranging from high-quality, trusted sequences (perfectly assembled genomes) to lower-quality sequences (poorly assembled data) or even unassembled sequences.
The expected results are methodological: this work will identify the best strategy to adopt as the volume and/or complexity of the indexed data increases. In addition, this ESR will work closely with ESR4 and will contribute new algorithmic techniques related to the plasticity of the data structure.