Ion Mandoiu Computer Science and Engineering Department

Slides:



Advertisements
Similar presentations
Marius Nicolae Computer Science and Engineering Department
Advertisements

Combinatorial Algorithms for Haplotype Inference Pure Parsimony Dan Gusfield.
Considerations for Analyzing Targeted NGS Data HLA
Admixture in Horse Breeds Illustrated from Single Nucleotide Polymorphism Data César Torres, Yaniv Brandvain University of Minnesota, Department of Plant.
 Experimental Setup  Whole brain RNA-Seq Data from Sanger Institute Mouse Genomes Project [Keane et al. 2011]  Synthetic hybrids with different levels.
G ENOTYPE AND SNP C ALLING FROM N EXT - GENERATION S EQUENCING D ATA Authors: Rasmus Nielsen, et al. Published in Nature Reviews, Genetics, Presented.
Computational Advances in Next Generation Sequencing Ion Măndoiu (University of Connecticut) Alex Zelikovsky (Georgia State University) April 28, 2011,
Perspectives from Human Studies and Low Density Chip Jeffrey R. O’Connell University of Maryland School of Medicine October 28, 2008.
Genotype and Haplotype Reconstruction from Low- Coverage Short Sequencing Reads Ion Mandoiu Computer Science and Engineering Department University of Connecticut.
High resolution detection of IBD Sharon R Browning and Brian L Browning Supported by the Marsden Fund.
MALD Mapping by Admixture Linkage Disequilibrium.
University of Connecticut
Efficient Algorithms for SNP Genotype Data Analysis using Hidden Markov Models of Haplotype Diversity Justin Kennedy Dissertation Defense for the Degree.
Computational Challenges in Whole-Genome Association Studies Ion Mandoiu Computer Science and Engineering Department University of Connecticut.
LD-Based Genotype and Haplotype Inference from Low-Coverage Short Sequencing Reads Ion Mandoiu Computer Science and Engineering Department University of.
Bioinformatics pipeline for detection of immunogenic cancer mutations by high throughput mRNA sequencing Jorge Duitama 1, Ion Mandoiu 1, and Pramod Srivastava.
Estimation of alternative splicing isoform frequencies from RNA-Seq data Ion Mandoiu Computer Science and Engineering Department University of Connecticut.
A coalescent computational platform for tagging marker selection for clinical studies Gabor T. Marth Department of Biology, Boston College
Genotype Error Detection using Hidden Markov Models of Haplotype Diversity Ion Mandoiu CSE Department, University of Connecticut Joint work with Justin.
Linkage Disequilibrium-Based Single Individual Genotyping from Low-Coverage Short Sequencing Reads Justin Kennedy 1 Joint work with Sanjiv Dinakar 1, Yozen.
Mining SNPs from EST Databases Picoult-Newberg et al. (1999)
The Extraction of Single Nucleotide Polymorphisms and the Use of Current Sequencing Tools Stephen Tetreault Department of Mathematics and Computer Science.
Combinatorial Algorithms for Maximum Likelihood Tag SNP Selection and Haplotype Inference Ion Mandoiu University of Connecticut CS&E Department.
Bioinformatics Pipeline for Fosmid based Molecular Haplotype Sequencing Jorge Duitama1,2, Thomas Huebsch1, Gayle McEwen1, Sabrina Schulz1, Eun-Kyung Suk1,
Workshop in Bioinformatics 2010 Class # Class 8 March 2010.
Genotype Error Detection using Hidden Markov Models of Haplotype Diversity Justin Kennedy, Ion Mandoiu, Bogdan Pasaniuc CSE Department, University of Connecticut.
Optimal Tag SNP Selection for Haplotype Reconstruction Jin Jun and Ion Mandoiu Computer Science & Engineering Department University of Connecticut.
Bioinformatics Tools for Personalized Cancer Immunotherapy
Marius Nicolae Computer Science and Engineering Department University of Connecticut Joint work with Serghei Mangul, Ion Mandoiu and Alex Zelikovsky.
Genotyping of James Watson’s genome from Low-coverage Sequencing Data Sanjiv Dinakar and Yözen Hernández.
Estimation of alternative splicing isoform frequencies from RNA-Seq data Ion Mandoiu Computer Science and Engineering Department University of Connecticut.
Algorithms for Genotype and Haplotype Inference from Low- Coverage Short Sequencing Reads Ion Mandoiu Computer Science and Engineering Department University.
Imputation-based local ancestry inference in admixed populations Ion Mandoiu Computer Science and Engineering Department University of Connecticut Joint.
Inference of Genealogies for Recombinant SNP Sequences in Populations Yufeng Wu Computer Science and Engineering Department University of Connecticut
Estimation of alternative splicing isoform frequencies from RNA-Seq data Ion Mandoiu Computer Science and Engineering Department University of Connecticut.
Genomewide Association Studies.  1. History –Linkage vs. Association –Power/Sample Size  2. Human Genetic Variation: SNPs  3. Direct vs. Indirect Association.
Towards accurate detection and genotyping of expressed variants from whole transcriptome sequencing data Jorge Duitama 1, Pramod Srivastava 2, and Ion.
Informatics challenges and computer tools for sequencing 1000s of human genomes Gabor T. Marth Boston College Biology Department Cold Spring Harbor Laboratory.
Reconstruction of Haplotype Spectra from NGS Data Ion Mandoiu UTC Associate Professor in Engineering Innovation Department of Computer Science & Engineering.
GeVab: Genome Variation Analysis Browsing Server Korean BioInformation Center, KRIBB InCoB2009 KRIBB
The medical relevance of genome variability Gabor T. Marth, D.Sc. Department of Biology, Boston College
Computational research for medical discovery at Boston College Biology Gabor T. Marth Boston College Department of Biology
PERFORMANCE COMPARISON OF NEXT GENERATION SEQUENCING PLATFORMS Bekir Erguner 1,3, Duran Üstek 2, Mahmut Ş. Sağıroğlu 1 1Advanced Genomics and Bioinformatics.
Variables: – T(p) - set of candidate transcripts on which pe read p can be mapped within 1 std. dev. – y(t) -1 if a candidate transcript t is selected,
National Taiwan University Department of Computer Science and Information Engineering Haplotype Inference Yao-Ting Huang Kun-Mao Chao.
The medical relevance of genome variability Gabor T. Marth, D.Sc. Department of Biology, Boston College Medical Genomics Course – Debrecen,
Targeted next generation sequencing for population genomics and phylogenomics in Ambystomatid salamanders Eric M. O’Neill David W. Weisrock Photograph.
National Taiwan University Department of Computer Science and Information Engineering Pattern Identification in a Haplotype Block * Kun-Mao Chao Department.
10cM - Linkage Mapping Set v2 ABI Median intermarker distance: 4.7 Mb Mean intermarker distance: 5.6 Mb Mean genetic gap distance: 8.9 cM Average Heterozygosity.
Serghei Mangul Department of Computer Science Georgia State University Joint work with Irina Astrovskaya, Marius Nicolae, Bassam Tork, Ion Mandoiu and.
Sahar Al Seesi and Ion Măndoiu Computer Science and Engineering
Finnish Genome Center Monday, 16 November Genotyping & Haplotyping.
Methods in genome wide association studies. Norú Moreno
Linear Reduction Method for Tag SNPs Selection Jingwu He Alex Zelikovsky.
California Pacific Medical Center
February 20, 2002 UD, Newark, DE SNPs, Haplotypes, Alleles.
The International Consortium. The International HapMap Project.
Imputation-based local ancestry inference in admixed populations
Paul VanRaden and Chuanyu Sun Animal Genomics and Improvement Lab USDA-ARS, Beltsville, MD, USA National Association of Animal Breeders Columbia, MO, USA.
Biostatistics-Lecture 19 Linkage Disequilibrium and SNP detection
Linkage Disequilibrium and Recent Studies of Haplotypes and SNPs
Lectures 7 – Oct 19, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday 12:00-1:20 Johnson Hall.
Scalable Algorithms for Next-Generation Sequencing Data Analysis Ion Mandoiu UTC Associate Professor in Engineering Innovation Department of Computer Science.
Current Data And Future Analysis Thomas Wieland, Thomas Schwarzmayr and Tim M Strom Helmholtz Zentrum München Institute of Human Genetics Geneva, 16/04/12.
Meiotic gene conversion in humans: rate, sex ratio, and GC bias Amy L. Williams June 19, 2013 University of Chicago.
Analysis of Next Generation Sequence Data BIOST /06/2015.
Inferences on human demographic history using computational Population Genetic models Gabor T. Marth Department of Biology Boston College Chestnut Hill,
Constrained Hidden Markov Models for Population-based Haplotyping
Imputation-based local ancestry inference in admixed populations
Identification of Paralogs in RADseq data
Presentation transcript:

Linkage Disequilibrium Based SNP Genotype Calling from Short Sequencing Reads Ion Mandoiu Computer Science and Engineering Department University of Connecticut Joint work with S. Dinakar, J. Duitama, Y. Hernández, J. Kennedy, and Y. Wu

Ultra-High Throughput Sequencing Recent massively parallel sequencing technologies deliver orders of magnitude higher throughput compared to classic Sanger sequencing -SBS: Sequencing by Synthesis -SBL: Sequencing by Ligation -Challenges in Genome Assembly: The short read lengths and absence of paired ends make it difficult for assembly software to disambiguate repeat regions, therefore resulting in fragmented assemblies. -New Type of sequencing error: in 454 including incorrect estimates of homopolymer lengths, ‘transposition-like’ insertions (a base identical to a nearby homopolymer is inserted in a nearby nonadjacent location) and errors caused by multiple templates attached to the same bead Roche/454 FLX Titanium 400bp reads 400Mb/10h run ABI SOLiD 2.0 25-35bp reads 3-4Gb/6 day run Helicos HeliScope 25-55bp reads >1Gb/day Illumina Genome Analyzer II 35-50bp reads 1.5Gb/2.5 day run 2

Personal Genomes: The Future is Now! C.Venter Sanger@7.5x J. Watson 454@7.4x NA18507 Illumina@36x SOLiD@12x -SBS: Sequencing by Synthesis -SBL: Sequencing by Ligation -Challenges in Genome Assembly: The short read lengths and absence of paired ends make it difficult for assembly software to disambiguate repeat regions, therefore resulting in fragmented assemblies. -New Type of sequencing error: in 454 including incorrect estimates of homopolymer lengths, ‘transposition-like’ insertions (a base identical to a nearby homopolymer is inserted in a nearby nonadjacent location) and errors caused by multiple templates attached to the same bead 3

Challenges for Genomic Medicine at Single-Base Resolution Medical sequencing focuses on genetic variation (SNPs, CNVs, genome rearrangements) Requires accurate determination of both alleles at variable loci This is limited by coverage depth due to random nature of shotgun sequencing For the Venter and Watson genomes (both sequenced at ~7.5x average coverage), comparison with SNP genotyping chips has shown only ~75% accuracy for sequencing based calls of heterozygous SNPs [Levy et al 07, Wheeler et al 08] [Wendl&Wilson 08] predict that 21x coverage is required for sequencing of normal tissue samples based on idealized theory that “neglects any heuristic inputs” What heuristic inputs help? How much can we gain from improved data analysis? 4

Linkage Disequilibrium: Sources & Modeling HMM model of haplotype frequencies F1 F2 Fn … H1 H2 Hn Fi = founder haplotype at locus i, Hi = observed allele at locus i P(Fi), P(Fi | Fi-1) and P(Hi | Fi) estimated from reference panel such as Hapmap For given haplotype h with n SNPs, P(H=h|M) can be computed in O(nK2) using forward algorithm, where K=#founders

Pipeline for LD-Based Genotype Calling Read sequences Reference genome sequence GTCGCCCAGGCTGGTGTGCAGTGGTGCAACCTCAGCTCACTGCAACCTCTGCCTCCAGGTTCAAGCAATT TCAGTGAGGGTTTTTGTTTTGTTTTGTTTTGTTTTGTTTTGTTTTGTTTTTGAGACAGAATTTTGCTCTT >gnl|ti|1779718824 name:EI1W3PE02ILQXT >gnl|ti|1779718825 name:EI1W3PE02GTXK0 TAGTAAAGATGGGGTTTCACTACGTTGGCTGAGCTGTTCTCGAACTCCTGACCTCAAATGAC CTCTGCCTCAGCCTCCCAAGTAGCTGGGATTACAGGCGGGCGCCACCACGCCCAGCTAATTTTGTATTGT AGGTACTTTGAGTCTGGGGGAGACAAAGGAGTTAGAAAGAGAGAGAATAAGCACTTAAAAGGCGGGTCCA TAATATGTTTATTTGTTTTGCTGCTGTTGAGTTGTACAATGTTGGGGAAAACAGTCGCACAACACCCGGC TCAGAATACCTGTTGCCCATTTTTATATGTTCCTTGGAGAAATGTCAATTCAGAGCTTTTGCTCAGCTTT GGGGGCCCGAGCATCGGAGGGTTGCTCATGGCCCACAGTTGTCAGGCTCCACCTAATTAAATGGTTTACA Mapped reads GTCGCCCAGGCTGGTGTGCAGTGGTGCAACCTCAGCTCACTGCAACCTCTGCCTCCAGGTTCAAGCAATT TCAGTGAGGGTTTTTGTTTTGTTTTGTTTTGTTTTGTTTTGTTTTGTTTTTGAGACAGAATTTTGCTCTT >gnl|ti|1779718824 name:EI1W3PE02ILQXT >gnl|ti|1779718825 name:EI1W3PE02GTXK0 TAGTAAAGATGGGGTTTCACTACGTTGGCTGAGCTGTTCTCGAACTCCTGACCTCAAATGAC CTCTGCCTCAGCCTCCCAAGTAGCTGGGATTACAGGCGGGCGCCACCACGCCCAGCTAATTTTGTATTGT AGGTACTTTGAGTCTGGGGGAGACAAAGGAGTTAGAAAGAGAGAGAATAAGCACTTAAAAGGCGGGTCCA TAATATGTTTATTTGTTTTGCTGCTGTTGAGTTGTACAATGTTGGGGAAAACAGTCGCACAACACCCGGC TCAGAATACCTGTTGCCCATTTTTATATGTTCCTTGGAGAAATGTCAATTCAGAGCTTTTGCTCAGCTTT GGGGGCCCGAGCATCGGAGGGTTGCTCATGGCCCACAGTTGTCAGGCTCCACCTAATTAAATGGTTTACA GTCGCCCAGGCTGGTGTGCAGTGGTGCAACCTCAGCTCACTGCAACCTCTGCCTCCAGGTTCAAGCAATT TCAGTGAGGGTTTTTGTTTTGTTTTGTTTTGTTTTGTTTTGTTTTGTTTTTGAGACAGAATTTTGCTCTT >gnl|ti|1779718824 name:EI1W3PE02ILQXT >gnl|ti|1779718825 name:EI1W3PE02GTXK0 TAGTAAAGATGGGGTTTCACTACGTTGGCTGAGCTGTTCTCGAACTCCTGACCTCAAATGAC CTCTGCCTCAGCCTCCCAAGTAGCTGGGATTACAGGCGGGCGCCACCACGCCCAGCTAATTTTGTATTGT AGGTACTTTGAGTCTGGGGGAGACAAAGGAGTTAGAAAGAGAGAGAATAAGCACTTAAAAGGCGGGTCCA TAATATGTTTATTTGTTTTGCTGCTGTTGAGTTGTACAATGTTGGGGAAAACAGTCGCACAACACCCGGC TCAGAATACCTGTTGCCCATTTTTATATGTTCCTTGGAGAAATGTCAATTCAGAGCTTTTGCTCAGCTTT GGGGGCCCGAGCATCGGAGGGTTGCTCATGGCCCACAGTTGTCAGGCTCCACCTAATTAAATGGTTTACA >gi|88943037|ref|NT_113796.1|Hs1_111515 Homo sapiens chromosome 1 genomic contig, reference assembly CTTTGAAGTATTCTGAGACTTGTAGGAAGGTGAAGTAAATATCTAATATAATTGTAACAAGTAGTGCTTG GAATTCTGTGAAAGCCTGTAGCTATAAAAAAATGTTGAGCCATAAATACCATCAGAAATAACAAAGGGAG CTCCTAATTCTGGAGTAGGGGCTAGGCTAGAATGGTAGAATGCTCAAAAGAATCCAGCGAAGAGGAATAT AATACAGATGGATTCAGGAGAGGTACTTCCAGGGGGTCAAGGGGAGAAATACCTGTTGGGGGTCAATGCC GATTGTATGTTTTTGATTATTTTTTGTTAGGCTGTGATGGGCTCAAGTAATTGAAATTCCTGATGCAAGT TCCTTACTAAATTGATGAGACTTAAACCCATGAAAACTTAACAGCTAAACTCCCTAGTCAACTGGTTTGA AATGTACTTTCTCAGATACAGAACACCCTTGGTCAATTGAATACAGATCAATCACTTTAAGTAAGCTAAG TTCTGAGATAATAAATAGGACTGTCCCATATTGGAGGCCTTTTTGAACAGTTGTTGTATGGTGACCCTGA ATCTACTTCTCCAGCAGCTGGGGGAAAAAAGGTGAGAGAAGCAGGATTGAAGCTGCTTCTTTGAATTTAC >gi|88943037|ref|NT_113796.1|Hs1_111515 Homo sapiens chromosome 1 genomic contig, reference assembly CTTTGAAGTATTCTGAGACTTGTAGGAAGGTGAAGTAAATATCTAATATAATTGTAACAAGTAGTGCTTG GAATTCTGTGAAAGCCTGTAGCTATAAAAAAATGTTGAGCCATAAATACCATCAGAAATAACAAAGGGAG CTCCTAATTCTGGAGTAGGGGCTAGGCTAGAATGGTAGAATGCTCAAAAGAATCCAGCGAAGAGGAATAT AATACAGATGGATTCAGGAGAGGTACTTCCAGGGGGTCAAGGGGAGAAATACCTGTTGGGGGTCAATGCC GATTGTATGTTTTTGATTATTTTTTGTTAGGCTGTGATGGGCTCAAGTAATTGAAATTCCTGATGCAAGT TCCTTACTAAATTGATGAGACTTAAACCCATGAAAACTTAACAGCTAAACTCCCTAGTCAACTGGTTTGA AATGTACTTTCTCAGATACAGAACACCCTTGGTCAATTGAATACAGATCAATCACTTTAAGTAAGCTAAG TTCTGAGATAATAAATAGGACTGTCCCATATTGGAGGCCTTTTTGAACAGTTGTTGTATGGTGACCCTGA ATCTACTTCTCCAGCAGCTGGGGGAAAAAAGGTGAGAGAAGCAGGATTGAAGCTGCTTCTTTGAATTTAC >gi|88943037|ref|NT_113796.1|Hs1_111515 Homo sapiens chromosome 1 genomic contig, reference assembly CTTTGAAGTATTCTGAGACTTGTAGGAAGGTGAAGTAAATATCTAATATAATTGTAACAAGTAGTGCTTG GAATTCTGTGAAAGCCTGTAGCTATAAAAAAATGTTGAGCCATAAATACCATCAGAAATAACAAAGGGAG CTCCTAATTCTGGAGTAGGGGCTAGGCTAGAATGGTAGAATGCTCAAAAGAATCCAGCGAAGAGGAATAT AATACAGATGGATTCAGGAGAGGTACTTCCAGGGGGTCAAGGGGAGAAATACCTGTTGGGGGTCAATGCC GATTGTATGTTTTTGATTATTTTTTGTTAGGCTGTGATGGGCTCAAGTAATTGAAATTCCTGATGCAAGT TCCTTACTAAATTGATGAGACTTAAACCCATGAAAACTTAACAGCTAAACTCCCTAGTCAACTGGTTTGA AATGTACTTTCTCAGATACAGAACACCCTTGGTCAATTGAATACAGATCAATCACTTTAAGTAAGCTAAG TTCTGAGATAATAAATAGGACTGTCCCATATTGGAGGCCTTTTTGAACAGTTGTTGTATGGTGACCCTGA ATCTACTTCTCCAGCAGCTGGGGGAAAAAAGGTGAGAGAAGCAGGATTGAAGCTGCTTCTTTGAATTTAC Quality scores 27 42 35 21 6 28 43 36 22 10 27 42 35 20 6 28 43 36 22 9 28 28 28 28 26 28 28 40 34 14 44 36 23 13 2 27 42 35 21 7 >gnl|ti|1779718824 name:EI1W3PE02ILQXT 26 26 37 29 28 28 28 28 27 28 28 28 37 28 27 27 28 36 28 37 40 34 18 3 28 28 28 27 33 24 26 28 28 28 40 33 14 28 36 27 28 43 36 22 9 28 44 36 24 14 4 28 28 28 27 28 26 26 35 26 28 28 28 24 28 37 29 28 19 28 26 37 29 26 39 33 13 37 28 28 28 33 23 28 33 23 28 36 27 33 23 28 35 25 28 28 36 27 36 27 28 28 28 27 28 28 28 24 28 28 27 28 28 37 29 36 27 27 28 27 28 21 24 28 27 41 34 15 28 36 27 26 28 24 35 27 28 40 34 15 27 42 35 21 6 28 43 36 22 10 27 42 35 20 6 28 43 36 22 9 28 28 28 28 26 28 28 40 34 14 44 36 23 13 2 27 42 35 21 7 >gnl|ti|1779718824 name:EI1W3PE02ILQXT 26 26 37 29 28 28 28 28 27 28 28 28 37 28 27 27 28 36 28 37 40 34 18 3 28 28 28 27 33 24 26 28 28 28 40 33 14 28 36 27 28 43 36 22 9 28 44 36 24 14 4 28 28 28 27 28 26 26 35 26 28 28 28 24 28 37 29 28 19 28 26 37 29 26 39 33 13 37 28 28 28 33 23 28 33 23 28 36 27 33 23 28 35 25 28 28 36 27 36 27 28 28 28 27 28 28 28 24 28 28 27 28 28 37 29 36 27 27 28 27 28 21 24 28 27 41 34 15 28 36 27 26 28 24 35 27 28 40 34 15 27 42 35 21 6 28 43 36 22 10 27 42 35 20 6 28 43 36 22 9 28 28 28 28 26 28 28 40 34 14 44 36 23 13 2 27 42 35 21 7 >gnl|ti|1779718824 name:EI1W3PE02ILQXT 26 26 37 29 28 28 28 28 27 28 28 28 37 28 27 27 28 36 28 37 40 34 18 3 28 28 28 27 33 24 26 28 28 28 40 33 14 28 36 27 28 43 36 22 9 28 44 36 24 14 4 28 28 28 27 28 26 26 35 26 28 28 28 24 28 37 29 28 19 28 26 37 29 26 39 33 13 37 28 28 28 33 23 28 33 23 28 36 27 33 23 28 35 25 28 28 36 27 36 27 28 28 28 27 28 28 28 24 28 28 27 28 28 37 29 36 27 27 28 27 28 21 24 28 27 41 34 15 28 36 27 26 28 24 35 27 28 40 34 15 SNP genotype calls Hapmap genotypes … rs12095710 T T 9.988139e-01 rs12127179 C T 9.986735e-01 rs11800791 G G 9.977713e-01 rs11578310 G G 9.980062e-01 rs1287622 G G 8.644588e-01 rs11804808 C C 9.977779e-01 rs17471528 A G 5.236099e-01 rs11804835 C C 9.977759e-01 rs11804836 C C 9.977925e-01 rs1287623 G G 9.646510e-01 rs13374307 G G 9.989084e-01 rs12122008 G G 5.121655e-01 rs17431341 A C 5.290652e-01 rs881635 G G 9.978737e-01 rs9700130 A A 9.989940e-01 rs11121600 A A 6.160199e-01 rs12121542 A A 5.555713e-01 rs11121605 T T 8.387705e-01 rs12563779 G G 9.982776e-01 rs11121607 C G 5.639239e-01 rs11121608 G T 5.452936e-01 rs12029742 G G 9.973527e-01 rs562118 C C 9.738776e-01 rs12133533 A C 9.956655e-01 rs11121648 G G 9.077355e-01 rs9662691 C C 9.988648e-01 rs11805141 C C 9.928786e-01 rs1287635 C C 6.113270e-01 NOTE: P(g|r) is NP-Hard… 90 209342 16 F 0 0 21100012010021001001100?100201?10111110111?0212000 18 F 15 16 2110001?0100210010011002122201210211?1221220212000 21120010012001201001120010110101011111011110212000 15 M 0 0 8 F 0 0 2110001001000200122110001111011100111?121210222000 7 M 0 0 011202100120022012211200101101210211122111?0120000 9 M 0 0 21100010010002001221100010110111001112121210220000 12 F 9 10 11 M 7 8 011?001?012002201221120010?10121021112211110120000 21100210010002001221100012110111001112121210222000 90 209342 16 F 0 0 21100012010021001001100?100201?10111110111?0212000 18 F 15 16 2110001?0100210010011002122201210211?1221220212000 21120010012001201001120010110101011111011110212000 15 M 0 0 8 F 0 0 2110001001000200122110001111011100111?121210222000 7 M 0 0 011202100120022012211200101101210211122111?0120000 9 M 0 0 21100010010002001221100010110111001112121210220000 12 F 9 10 11 M 7 8 011?001?012002201221120010?10121021112211110120000 21100210010002001221100012110111001112121210222000 90 209342 16 F 0 0 21100012010021001001100?100201?10111110111?0212000 18 F 15 16 2110001?0100210010011002122201210211?1221220212000 21120010012001201001120010110101011111011110212000 15 M 0 0 8 F 0 0 2110001001000200122110001111011100111?121210222000 7 M 0 0 011202100120022012211200101101210211122111?0120000 9 M 0 0 21100010010002001221100010110111001112121210220000 12 F 9 10 11 M 7 8 011?001?012002201221120010?10121021112211110120000 21100210010002001221100012110111001112121210222000 6

Genotype Calling Accuracy vs. Coverage Watson/454 reads NA18507/Illumina reads

Conclusions & Ongoing Work Exploiting LD information yields significant improvements in genotyping calling accuracy and/or cost reduction Accuracy achieved by previously proposed binomial test is achieved by HMM-based posterior decoding algorithm using less than 1/4 of the reads Ongoing work Modeling ambiguities in read mapping Haplotype inferrence Extension to population sequencing data (removing need for reference panels) ACKNOWLEDGEMENTS This work was supported in part by NSF under awards IIS-0546457 and DBI-0543365 to IM and IIS-0803440 to YW. SD and YH performed this research as part of the Summer REU program “Bio-Grid Initiatives for Interdisciplinary Research and Education" funded by NSF under award CCF-0755373.