ralphi: a deep reinforcement learning framework for haplotype assembly

Journal article

Enzo Battistella, Anant Maheshwari, Barış Ekim, Bonnie Berger, Victoria Popic
RECOMB (International Conference on Research in Computational Molecular Biology), Best Paper Award, 2025

Cite

APA
Battistella, E., Maheshwari, A., Ekim, B., Berger, B., & Popic, V. (2025). ralphi: a deep reinforcement learning framework for haplotype assembly. RECOMB (International Conference on Research in Computational Molecular Biology), Best Paper Award.

Chicago/Turabian
Battistella, Enzo, Anant Maheshwari, Barış Ekim, Bonnie Berger, and Victoria Popic. “Ralphi: a Deep Reinforcement Learning Framework for Haplotype Assembly.” RECOMB (International Conference on Research in Computational Molecular Biology), Best Paper Award (2025).

MLA
Battistella, Enzo, et al. “Ralphi: a Deep Reinforcement Learning Framework for Haplotype Assembly.” RECOMB (International Conference on Research in Computational Molecular Biology), Best Paper Award, 2025.

BibTeX

@article{enzo2025a,
  title = {ralphi: a deep reinforcement learning framework for haplotype assembly},
  year = {2025},
  journal = {RECOMB (International Conference on Research in Computational Molecular Biology), Best Paper Award},
  author = {Battistella, Enzo and Maheshwari, Anant and Ekim, Barış and Berger, Bonnie and Popic, Victoria}
}

Abstract

Haplotype assembly is the problem of reconstructing the combination of alleles on the maternally and paternally inherited chromosome copies. Individual haplotypes are essential to our understanding of how combinations of different variants impact phenotype. In this work, we focus on read-based haplotype assembly of individual diploid genomes, which reconstructs the two haplotypes directly from read alignments at variant loci. We introduce ralphi, a novel deep reinforcement learning framework for haplotype assembly, which integrates the representational power of deep learning with reinforcement learning to accurately partition read fragments into their respective haplotype sets. To set the reward objective for reinforcement learning, our approach uses the classic reduction of the problem to the maximum fragment cut formulation on fragment graphs, where nodes correspond to reads and edge weights capture the conflict or agreement of the reads at shared variant sites. We trained ralphi on a diverse dataset of fragment graph topologies derived from genomes in the 1000 Genomes Project. We show that ralphi consistently achieves lower error rates than the state of the art, at comparable or longer haplotype block lengths, for both short reads and long ONT reads at varying coverage in standard human genome benchmarks. ralphi is available at https://github.com/PopicLab/ralphi.
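
The fragment graph reduction described in the abstract can be illustrated with a small sketch. This is not ralphi's implementation: the fragment encoding, the function names, and the edge weight (conflicts minus agreements at shared variant sites) are all assumptions chosen for illustration, and the actual weighting used by the tool may differ. The sketch builds a toy fragment graph and evaluates the cut induced by a candidate assignment of reads to the two haplotypes.

```python
from itertools import combinations

# Assumed encoding: each fragment maps a variant-site index to the
# allele observed on that read (0 = reference, 1 = alternate).
Fragment = dict[int, int]


def edge_weight(a: Fragment, b: Fragment):
    """Illustrative weight: conflicts minus agreements over shared sites.

    Returns None when the two fragments share no variant site,
    in which case no edge is created.
    """
    shared = a.keys() & b.keys()
    if not shared:
        return None
    conflicts = sum(a[s] != b[s] for s in shared)
    agreements = len(shared) - conflicts
    return conflicts - agreements


def build_fragment_graph(fragments: list[Fragment]) -> dict[tuple[int, int], int]:
    """Nodes are fragment indices; edges connect fragments sharing a site."""
    edges = {}
    for i, j in combinations(range(len(fragments)), 2):
        w = edge_weight(fragments[i], fragments[j])
        if w is not None:
            edges[(i, j)] = w
    return edges


def cut_value(edges: dict[tuple[int, int], int], side: list[int]) -> int:
    """Total weight of edges whose endpoints fall in different haplotype sets."""
    return sum(w for (i, j), w in edges.items() if side[i] != side[j])


# Toy example: reads 0 and 1 are consistent with one haplotype (0, 0, 1),
# reads 2 and 3 with the complementary haplotype (1, 1, 0).
fragments = [
    {0: 0, 1: 0},
    {1: 0, 2: 1},
    {0: 1, 1: 1},
    {1: 1, 2: 0},
]
edges = build_fragment_graph(fragments)

good_partition = [0, 0, 1, 1]  # separates the two groups of reads
bad_partition = [0, 1, 0, 1]   # mixes reads from both haplotypes

print(cut_value(edges, good_partition))
print(cut_value(edges, bad_partition))
```

In this toy graph the first partition cuts every positive-weight (conflicting) edge and no negative-weight (agreeing) edge, so it attains the maximum cut under the assumed weighting. A cut value of this kind is the sort of quantity a reward objective could be built on, though how ralphi shapes its reward is not specified here.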