BEGIN:VCALENDAR VERSION:2.0 PRODID:-//Ä¢¹½ÊÓÆµ//NONSGML v1.0//EN NAME:PhD defence E.A.D. Gabory METHOD:PUBLISH BEGIN:VEVENT DTSTART:20250521T154500 DTEND:20250521T171500 DTSTAMP:20250521T154500 UID:2025/phd-defence-e-a-d-gabory@8F96275E-9F55-4B3F-A143-836282E12573 CREATED:20250502T081048 LOCATION:(1st floor) Auditorium, Main building De Boelelaan 1105 1081 HV Amsterdam SUMMARY:PhD defence E.A.D. Gabory X-ALT-DESC;FMTTYPE=text/html:
Variable strings for pangenomes: Matching, Comparison, Indexing
This dissertation investigates the computational foundations of sequence analysis in the context of pangenomics, a rapidly evolving field in computational biology. Classical string algorithms, originally developed for linear DNA sequences, encounter new challenges when extended to nonlinear, graph-like pangenome representations. To address this, we study variable strings: generalized models for representing sets of similar sequences compactly, including elastic-degenerate strings (ED strings), founder graphs, and weighted sequences. The thesis makes three core contributions. First, it explores exact and approximate pattern matching algorithms for variable strings, establishing tight upper and lower bounds. Second, it introduces novel methods for comparing pangenomic data structures, including algorithms for intersection detection, matching statistics, and distance-based comparisons. Third, it proposes space-efficient indexing strategies for weighted sequences, enabling probabilistic pattern queries under uncertainty.
More information on the
DESCRIPTION: This dissertation investigates the computational foundations of sequence analysis in the context of pangenomics, a rapidly evolving field in computational biology. Classical string algorithms, originally developed for linear DNA sequences, encounter new challenges when extended to nonlinear, graph-like pangenome representations. To address this, we study variable strings: generalized models for representing sets of similar sequences compactly, including elastic-degenerate strings (ED strings), founder graphs, and weighted sequences. The thesis makes three core contributions. First, it explores exact and approximate pattern matching algorithms for variable strings, establishing tight upper and lower bounds. Second, it introduces novel methods for comparing pangenomic data structures, including algorithms for intersection detection, matching statistics, and distance-based comparisons. Third, it proposes space-efficient indexing strategies for weighted sequences, enabling probabilistic pattern queries under uncertainty. More information on the Variable strings for pangenomes: Matching, Comparison, Indexing END:VEVENT END:VCALENDAR