Published by George Skinner. Modified over 8 years ago
1
Population sequencing using short reads: HIV as a case study
Vladimir Jojic et al., PSB 13:114-125 (2008)
Presenter: Yong Li
2
Overview
Background
  – Population sequencing & metagenomics
  – Pyrosequencing & classical sequencing
The problem and the challenge
  – Low concentration; short reads; sequencing errors
The model
  – Strain sequences & frequencies; reads
The EM algorithm
Validation
3
Background
Population sequencing & metagenomics
  – Multiple strains vs. multiple species
  – HIV drug resistance from rare variants
Pyrosequencing & chromatographic sequencing
  – Ultra-deep sequencing, 454 sequencing
  – Short reads; high error rate; homopolymer errors
  – Sensitivity: 0.1% (pyrosequencing) vs. 20% (chromatographic)
To clone or not to clone?
  – Two protocols to detect mutational variants
  – Cloning bias; stoichiometry
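To make the 0.1% sensitivity figure concrete, here is a minimal sketch of why deep coverage matters, assuming simple binomial sampling of reads (independent reads, sequencing errors ignored). The function name and the "minimum number of supporting reads" threshold are hypothetical and are not taken from the paper.

```python
from math import comb


def detection_probability(freq: float, coverage: int, min_reads: int) -> float:
    """Probability that at least `min_reads` of `coverage` reads come from a
    variant present at relative frequency `freq`.

    Hypothetical simplification: reads are drawn independently (binomial
    sampling) and sequencing errors are ignored.
    """
    # Probability of seeing fewer than min_reads reads from the variant ...
    p_below = sum(
        comb(coverage, k) * freq**k * (1 - freq) ** (coverage - k)
        for k in range(min_reads)
    )
    # ... so the complement is the chance the variant has enough support.
    return 1.0 - p_below


# A 0.1% variant at a site covered by 10,000 reads (about 10 expected variant reads).
p = detection_probability(0.001, 10_000, 5)
print(p)
```

With these numbers the probability comes out at roughly 0.97, whereas with only a handful of reads per site a 0.1% variant is almost never sampled at all.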
4
Clonal amplification
(Figure: Wang et al., Genome Res. 17:1195-1201, 2007)
5
(Figure: Wang et al., Genome Res. 17:1195-1201, 2007)
6
The computational problem
Given:
  – 454 sequencing reads
Get:
  – The reconstructed population sequences (epitome)
  – The relative quantity of each sequence
Approach: a statistical model
7
The statistical model (1)
[Figure not recoverable; labeled parameters: indel frequency, sequencing error parameter]
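The role of the sequencing error parameter can be illustrated with a simplified read likelihood. This is a minimal sketch under assumptions of our own: errors are substitutions only with rate eps (indels are omitted for brevity), each read is placed without gaps at a uniformly chosen start offset, and freqs holds the strain frequencies. The names and simplifications are hypothetical, not the paper's exact formulation.

```python
import math


def read_log_likelihood(read: str, strain: str, offset: int, eps: float) -> float:
    """log P(read | strain, offset) under a substitution-only error model.

    Each read base equals the strain base at position offset + i with
    probability 1 - eps; otherwise it is one of the 3 other bases (eps / 3 each).
    Indels are deliberately left out of this sketch.
    """
    ll = 0.0
    for i, base in enumerate(read):
        ll += math.log(1 - eps) if base == strain[offset + i] else math.log(eps / 3)
    return ll


def read_likelihood(read: str, strains: list[str], freqs: list[float], eps: float) -> float:
    """Mixture likelihood: sum over strains s of freqs[s] times the average,
    over all ungapped start offsets, of P(read | s, offset)."""
    total = 0.0
    for strain, pi in zip(strains, freqs):
        n_offsets = len(strain) - len(read) + 1
        for off in range(n_offsets):
            total += pi * math.exp(read_log_likelihood(read, strain, off, eps)) / n_offsets
    return total
```

Working in log space per read keeps long reads from underflowing; the mixture sum is taken in ordinary probability space here only because the example reads are short.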
8
The statistical model (2)
9
[Equations not recoverable: hidden variables and model parameters]
Observed variables: the reads x_t, t = 1…T
Estimation: EM algorithm
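The EM idea can be illustrated in its simplest form: strain sequences and the error rate are held fixed, and only the strain frequencies are re-estimated from the reads x_t. This is a hypothetical simplification for illustration (substitution-only errors, ungapped placement); the method presented here also reconstructs the sequences themselves, which this sketch does not attempt.

```python
def strain_likelihood(read: str, strain: str, eps: float) -> float:
    """P(read | strain), averaged over all ungapped start offsets.

    Assumed error model: each base matches with probability 1 - eps,
    otherwise it is one of the 3 other bases (eps / 3 each).
    """
    n_offsets = len(strain) - len(read) + 1
    total = 0.0
    for off in range(n_offsets):
        p = 1.0
        for i, base in enumerate(read):
            p *= (1 - eps) if base == strain[off + i] else eps / 3
        total += p
    return total / n_offsets


def em_frequencies(reads: list[str], strains: list[str], eps: float = 0.01,
                   n_iter: int = 100) -> list[float]:
    """Estimate strain frequencies by EM with strain sequences held fixed."""
    n_strains = len(strains)
    # P(x_t | s) never changes here, because sequences and eps are fixed.
    lik = [[strain_likelihood(x, e, eps) for e in strains] for x in reads]
    pi = [1.0 / n_strains] * n_strains  # uniform initialization
    for _ in range(n_iter):
        # E-step: posterior responsibility of each strain for each read.
        expected_counts = [0.0] * n_strains
        for row in lik:
            weights = [p * l for p, l in zip(pi, row)]
            z = sum(weights)
            for s in range(n_strains):
                expected_counts[s] += weights[s] / z
        # M-step: new frequency = expected fraction of reads from each strain.
        pi = [c / len(reads) for c in expected_counts]
    return pi


reads = ["AAAA"] * 3 + ["CCCC"]
print(em_frequencies(reads, ["AAAAAAAA", "CCCCCCCC"]))
```

For these toy reads the estimate settles close to [0.75, 0.25], the fraction of reads each strain explains.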
10
Computational tricks
One τ
Clustering of reads
Initialization
Determining the number of strains, S
  – Trials
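Read clustering as a starting point for initialization might look like the following sketch. The greedy Hamming-distance scheme, the max_dist threshold, and the assumption that all reads are equal-length and cover the same region (as with amplicon reads) are hypothetical choices for illustration, not the clustering used in the paper.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length reads."""
    return sum(x != y for x, y in zip(a, b))


def greedy_cluster(reads: list[str], max_dist: int) -> list[tuple[str, int]]:
    """Greedily cluster reads; returns (seed sequence, cluster size) pairs.

    Distinct reads are visited from most to least abundant; each joins the
    first cluster whose seed is within `max_dist` mismatches, otherwise it
    seeds a new cluster.
    """
    counts = Counter(reads)
    seeds: list[str] = []
    sizes: list[int] = []
    for read, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for i, seed in enumerate(seeds):
            if hamming(read, seed) <= max_dist:
                sizes[i] += n
                break
        else:  # no seed close enough: start a new cluster
            seeds.append(read)
            sizes.append(n)
    return list(zip(seeds, sizes))


reads = ["ACGTACGT"] * 5 + ["ACGTACGA"] + ["TTTTCCCC"] * 3
clusters = greedy_cluster(reads, max_dist=1)
# → [('ACGTACGT', 6), ('TTTTCCCC', 3)]
total = sum(size for _, size in clusters)
init_freqs = [size / total for _, size in clusters]
```

The seeds can serve as initial strain sequences and the cluster sizes as initial frequencies; the number of clusters is one candidate for S, which could then be checked by trying nearby values.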
11
Validation
Data are partially simulated
  – e (the strain sequences) is composed of real HIV variants
  – Artificial values for [symbol lost in extraction]
  – Reads x generated from the same probabilistic model, with 1% substitution, 2% insertion and 0.5% deletion rates
Two datasets
  1. Varied strain frequencies and coverage
  2. Varied mutation density
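Generating reads with the stated error rates can be sketched as below. The per-base error process (delete, else possibly substitute, then possibly insert a random base) and the uniform choice of read start are assumptions made for illustration; the slide gives only the three rates, not the exact procedure.

```python
import random

BASES = "ACGT"


def mutate(template: str, rng: random.Random, p_sub: float = 0.01,
           p_ins: float = 0.02, p_del: float = 0.005) -> str:
    """Copy `template` base by base, introducing substitutions, insertions and deletions."""
    out = []
    for base in template:
        if rng.random() < p_del:
            continue  # deletion: this base is dropped
        if rng.random() < p_sub:
            base = rng.choice([b for b in BASES if b != base])  # substitution
        out.append(base)
        if rng.random() < p_ins:
            out.append(rng.choice(BASES))  # insertion after this base
    return "".join(out)


def simulate_reads(strains: list[str], freqs: list[float], n_reads: int,
                   read_len: int, rng: random.Random, **error_rates: float) -> list[str]:
    """Draw a strain by frequency, a uniform start position, then add sequencing errors."""
    reads = []
    for _ in range(n_reads):
        strain = rng.choices(strains, weights=freqs, k=1)[0]
        start = rng.randrange(len(strain) - read_len + 1)
        reads.append(mutate(strain[start:start + read_len], rng, **error_rates))
    return reads


rng = random.Random(0)
reads = simulate_reads(["A" * 50, "C" * 50], [0.9, 0.1], n_reads=1000, read_len=20, rng=rng)
```

Varying freqs and n_reads corresponds roughly to the first dataset (strain frequencies and coverage), while varying how different the strain sequences are corresponds to the second (mutation density).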
13
Discussion
High sensitivity compared with the chromatographic approach
  – 0.1% relative abundance
May be applied to metagenomic sequencing
Needs validation on real data
Needs comparison with other methods
14
Questions?