Published by Rachel Gilmore. Modified over 9 years ago.
1
Reconstruction of Haplotype Spectra from NGS Data
Ion Mandoiu
UTC Associate Professor in Engineering Innovation
Department of Computer Science & Engineering
University of Connecticut
2
Haplotype Spectra Reconstruction
Given NGS reads, reconstruct:
– Full length sequences
– Sequence frequencies
Example applications:
– Single individual haplotyping
– Allele specific transcriptome reconstruction
– Viral quasispecies reconstruction
3
Single Individual Haplotyping
Somatic cells are diploid, containing two nearly identical copies of each autosomal chromosome
– Heterozygous loci found by mapping reads to reference genome
– Long haplotype fragments can be generated by sequencing fosmid pools [Duitama et al. 2012]
4
Single Individual Haplotyping
Input: Matrix M of m fragments covering n loci
Locus  1  2  3  4  5  ...  n
f1     *  0  1  1  0  ...  0
f2     1  1  0  *  1  ...  1
f3     0  0  0  1  1  ...  *
...
fm     *  *  1  *  1  ...  1
8
RefHap Algorithm [Duitama et al. 2012]
Reduce the problem to Max-Cut
Solve Max-Cut
Build haplotypes according to the cut
Locus  1  2  3  4  5
f1     *  0  1  1  0
f2     1  1  0  *  1
f3     1  *  *  0  *
f4     *  0  0  *  1
[Figure: fragment graph on f1, f2, f3, f4 with weighted edges; weights 3, 1, 1 shown]
h1 = 00110
h2 = 11001
Chr. 22, 32k SNPs, 14k fragments
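The three steps can be tried on the four-fragment toy matrix from this slide. What follows is only a minimal sketch, not RefHap itself: the edge weight (mismatches minus matches over shared loci), the exhaustive Max-Cut search, and the majority-vote consensus are all assumptions made for illustration, and the names `score`, `max_cut`, and `build_haplotypes` are hypothetical. The slide does not state the scoring or the Max-Cut heuristic actually used.

```python
from itertools import combinations, product

# Toy fragment matrix from the slide: rows are fragments, columns are loci,
# '*' marks a locus the fragment does not cover.
fragments = ["*0110", "110*1", "1**0*", "*00*1"]

def score(a, b):
    """Assumed edge weight: mismatches minus matches over shared loci."""
    s = 0
    for x, y in zip(a, b):
        if x != "*" and y != "*":
            s += 1 if x != y else -1
    return s

def max_cut(frags):
    """Exhaustive Max-Cut (toy sizes only); fragment 0 is pinned to side 0."""
    n = len(frags)
    weights = {(i, j): score(frags[i], frags[j])
               for i, j in combinations(range(n), 2)}
    best_sides, best_value = None, float("-inf")
    for rest in product((0, 1), repeat=n - 1):
        sides = (0,) + rest
        value = sum(w for (i, j), w in weights.items() if sides[i] != sides[j])
        if value > best_value:
            best_sides, best_value = sides, value
    return best_sides, best_value

def build_haplotypes(frags, sides):
    """Majority vote per locus; alleles from side 1 are flipped before voting."""
    h1 = []
    for locus in range(len(frags[0])):
        votes = [0, 0]
        for frag, side in zip(frags, sides):
            if frag[locus] != "*":
                votes[int(frag[locus]) ^ side] += 1
        h1.append("1" if votes[1] > votes[0] else "0")
    h1 = "".join(h1)
    h2 = "".join("1" if c == "0" else "0" for c in h1)  # complement of h1
    return h1, h2

sides, cut_value = max_cut(fragments)
h1, h2 = build_haplotypes(fragments, sides)
print(sides, cut_value, h1, h2)
```

Under these assumptions the best cut separates f1 from f2, f3, f4, and the consensus reproduces the haplotypes shown on the slide. Exhaustive search is exponential in the number of fragments, so it only serves to make the reduction concrete.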
9
Haplotype Spectra Reconstruction
Given short sequence fragments, reconstruct:
– Full length sequences
– Sequence frequencies
Example applications:
– Single individual haplotyping
– Allele specific transcriptome reconstruction
– Viral quasispecies reconstruction
10
Transcriptome Reconstruction Challenge: Alternative Splicing [Griffith and Marra 07]
11
[Figure: gene model 1 7 4 2 3 6 5 with four transcripts]
t1: 1 7 4 3 6 5
t2: 1 7 4 2 3 5
t3: 1 7 4 3 5
t4: 1 7 4 2 3 6 5
12
TRIP: Transcriptome Reconstruction using Integer Programming
Map the RNA-Seq reads to genome
Construct Splice Graph G(V,E)
– V: exons
– E: splicing events
Generate candidate transcripts
– Depth-first search (DFS)
Filter candidate transcripts
– Fragment length distribution (FLD)
– Integer programming
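The candidate-generation step can be sketched as a depth-first enumeration of source-to-sink paths in a splice graph. The graph below is hypothetical: its edges are an assumption chosen so that segments 2 and 6 are optional, which yields four candidate paths like the four transcripts in the alternative-splicing example. The function name `candidate_transcripts` is illustrative and not taken from TRIP.

```python
# Hypothetical splice graph: vertices are exon labels, edges are splicing
# events. The edge set is an assumption made for this illustration.
splice_graph = {
    1: [7],
    7: [4],
    4: [2, 3],
    2: [3],
    3: [6, 5],
    6: [5],
    5: [],
}

def candidate_transcripts(graph, source, sink):
    """Enumerate all source-to-sink paths with an iterative depth-first search."""
    paths = []
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == sink:
            paths.append(path)
            continue
        # Push successors in reverse so they are explored in listed order.
        for nxt in reversed(graph[node]):
            stack.append((nxt, path + [nxt]))
    return paths

transcripts = candidate_transcripts(splice_graph, source=1, sink=5)
for t in transcripts:
    print(" ".join(map(str, t)))
```

The number of paths can grow exponentially with the number of optional exons, which is why a filtering step follows candidate generation.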
13
How to filter?
Select the smallest set of putative transcripts that yields a good statistical fit between the fragment length distribution
– empirically determined during library preparation
– implied by “mapping” read pairs
[Figure: candidate transcripts 1 3 and 1 2 3; lengths 500, 300, 200 shown]
Mean: 500; Std. dev.: 50
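One way to picture the fit is to compute, for a single read pair, the fragment length each candidate transcript would imply and compare it with the library distribution (mean 500, std. dev. 50 on this slide). This is a rough sketch only: the exon coordinates, the read pair, and the use of a z-score are assumptions for illustration, and the integer program that selects the smallest transcript set is not shown.

```python
# Hypothetical exon coordinates (half-open genomic intervals), chosen so that
# exon 2 is 200 bases long.
exons = {1: (0, 200), 2: (1000, 1200), 3: (2000, 2200)}

FLD_MEAN, FLD_STD = 500.0, 50.0  # values stated on the slide

def implied_fragment_length(transcript, left_start, right_end):
    """Bases of [left_start, right_end) that fall inside the transcript's exons."""
    total = 0
    for exon_id in transcript:
        start, end = exons[exon_id]
        total += max(0, min(end, right_end) - max(start, left_start))
    return total

def z_score(length):
    """Standardized deviation from the library fragment length distribution."""
    return (length - FLD_MEAN) / FLD_STD

# One hypothetical read pair: left mate starts in exon 1, right mate ends in exon 3.
pair = (50, 2150)
results = {}
for transcript in ((1, 2, 3), (1, 3)):
    length = implied_fragment_length(transcript, *pair)
    results[transcript] = (length, z_score(length))
    print(transcript, length, z_score(length))
```

With these made-up coordinates, the transcript that includes exon 2 implies a 500-base fragment (z = 0), while the one that skips it implies 300 bases (z = -4), so this pair is far better explained by the longer transcript.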
14
Allele Specific Expression
15
Haplotype Spectra Reconstruction
Given short sequence fragments, reconstruct:
– Full length sequences
– Sequence frequencies
Example applications:
– Single individual haplotyping
– Allele specific transcriptome reconstruction
– Viral quasispecies reconstruction
16
RNA Virus Replication
High mutation rate (~10^-4)
Lauring & Andino, PLoS Pathogens 2011
17
How Are Quasispecies Contributing to Virus Persistence and Evolution?
Variants differ in
– Virulence
– Ability to escape immune response
– Resistance to antiviral therapies
– Tissue tropism
Lauring & Andino, PLoS Pathogens 2011
18
Shotgun vs. Amplicon Reads
Shotgun reads: starting positions distributed ~uniformly
Amplicon reads: predefined start/end positions covering fixed overlapping windows
19
Reconstruction from Shotgun Reads: ViSpA
Input: Shotgun reads
– Read Error Correction
– Read Alignment
– Preprocessing of Aligned Reads
– Read Graph Construction
– Contig Assembly
– Frequency Estimation
Output: Quasispecies sequences w/ frequencies
20
Reconstruction from Amplicon Reads: VirA
Input: Reference in FASTA format; error-corrected read data in SAM/BAM
– Estimate Amplicons
– Amplicon Read Graph
– Max-Bandwidth Paths
– Frequency Estimation
Output: Viral population variants with frequencies
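The max-bandwidth path step can be sketched on a small layered graph. The slide does not define bandwidth, so the sketch assumes the bandwidth of a path is the smallest read count among its vertices. The graph, the counts, and the function name `max_bandwidth_path` are hypothetical and not taken from VirA.

```python
# Hypothetical 3-layer read graph: counts[v] is the number of reads collapsed
# into vertex v; edges link consistent reads in consecutive layers.
layers = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"]]
counts = {"a1": 60, "a2": 40, "b1": 55, "b2": 45, "c1": 70, "c2": 30}
edges = {"a1": ["b1", "b2"], "a2": ["b2"], "b1": ["c1"], "b2": ["c1", "c2"]}

def max_bandwidth_path(layers, counts, edges):
    """Layer-by-layer dynamic program; bandwidth = smallest count on the path.

    Assumes at least one path spans all layers.
    """
    # best[v] = (bandwidth, path) of the widest path ending at v
    best = {v: (counts[v], [v]) for v in layers[0]}
    for _ in layers[1:]:
        nxt = {}
        for u, (bw, path) in best.items():
            for v in edges.get(u, []):
                cand = min(bw, counts[v])
                if v not in nxt or cand > nxt[v][0]:
                    nxt[v] = (cand, path + [v])
        best = nxt
    return max(best.values(), key=lambda item: item[0])

bandwidth, path = max_bandwidth_path(layers, counts, edges)
print(bandwidth, path)
```

Because edges only go from one layer to the next, one pass over the layers is enough; a general graph would need a modified shortest-path search instead.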
21
Amplicon Read Graph
K amplicons represented by K-layer read graph
Vertices ⇔ distinct reads
Edges ⇔ reads with consistent overlap
Vertices have count function c(v)
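A K-layer read graph of this kind could be assembled roughly as follows. This is a sketch under simplifying assumptions: every read spans its whole amplicon, the overlap length between consecutive amplicons is known and positive, and "consistent overlap" means the suffix of one read equals the prefix of the next. The function name and the toy reads are hypothetical.

```python
from collections import Counter

def build_amplicon_read_graph(reads_per_amplicon, overlaps):
    """Return (counts, edges) for a K-layer read graph.

    reads_per_amplicon[k] lists the read sequences of amplicon k, and
    overlaps[k] is the assumed overlap length (> 0) between amplicons k and k+1.
    """
    counts = {}
    layers = []
    for k, reads in enumerate(reads_per_amplicon):
        tally = Counter(reads)            # collapse identical reads into one vertex
        layers.append(sorted(tally))
        for seq, c in tally.items():
            counts[(k, seq)] = c          # count function c(v)
    edges = set()
    for k, ov in enumerate(overlaps):
        for left in layers[k]:
            for right in layers[k + 1]:
                if left[-ov:] == right[:ov]:   # consistent overlap
                    edges.add(((k, left), (k + 1, right)))
    return counts, edges

# Two toy amplicons overlapping by 2 bases.
reads = [
    ["ACGTAC", "ACGTAC", "ACGTTC"],
    ["ACGGA", "TCGGA", "ACGGA"],
]
counts, edges = build_amplicon_read_graph(reads, overlaps=[2])
print(counts)
print(sorted(edges))
```

Vertices are keyed by (layer, sequence), so the same sequence appearing in two amplicons stays two separate vertices.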
22
Read Graph Transformation
Heuristic to reduce edges in dense graphs
Replace bipartite cliques with star subgraphs
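The effect of the transformation can be seen on edges between two layers: a complete bipartite block with p left and q right vertices has p·q edges, while routing it through one new hub vertex needs only p + q. The sketch below is an assumption-laden illustration, not the heuristic from the talk. It only detects bicliques formed by left vertices with identical neighbor sets, and the name `replace_bicliques_with_stars` is hypothetical.

```python
from collections import defaultdict

def replace_bicliques_with_stars(edges):
    """Rewrite edges between two layers so each detected biclique becomes a star.

    Left vertices with identical right-neighbor sets form a complete bipartite
    subgraph with those neighbors; it is rerouted through one new hub vertex
    whenever that lowers the edge count.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
    groups = defaultdict(list)
    for u, nbrs in neighbors.items():
        groups[frozenset(nbrs)].append(u)
    new_edges = set()
    for hub_id, (rights, lefts) in enumerate(groups.items()):
        if len(lefts) * len(rights) > len(lefts) + len(rights):
            hub = ("hub", hub_id)                       # new star center
            new_edges.update((u, hub) for u in lefts)
            new_edges.update((hub, v) for v in rights)
        else:
            new_edges.update((u, v) for u in lefts for v in rights)
    return new_edges

# A 3x3 biclique (9 edges) plus one unrelated edge.
clique_edges = {(u, v) for u in ("a1", "a2", "a3") for v in ("b1", "b2", "b3")}
clique_edges.add(("a4", "b1"))
reduced = replace_bicliques_with_stars(clique_edges)
print(len(clique_edges), "->", len(reduced))
```

Every original left-to-right connection survives as a two-step path through the hub, so path enumeration over the reduced graph yields the same read combinations.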
23
Challenges
Scalability
– Exploit inherent sparsity of biological instances
– E.g., exact scaffolding algorithm using non-serial dynamic programming based on SPQR trees
Flexibility
– Long (noisy) reads + short reads
– Heterogeneous data, e.g., RNA-Seq + TSSeq + PolyA-Seq
Quantifying reconstruction uncertainty
– Compute intensive, e.g., bootstrapping
24
Acknowledgements
Jorge Duitama, Sahar Al Seesi, Mazhar Kahn, Rachel O’Neill, Alexander Artyomenko, Adrian Caciula, Nicholas Mancuso, Serghei Mangul, Bassam Tork, Alex Zelikovsky, Irina Astrovskaya, Pavel Skums