Published by Tyrone Sims. Modified over 9 years ago.
A Novel Approach To Diploid Base Calling

Aaron R. Quinlan (quinlaaa@bc.edu) and Gabor T. Marth (marth@bc.edu), Department of Biology, Boston College, Chestnut Hill, MA 02467
http://bioinformatics.bc.edu/marthlab/

Objective: To enhance with an accurate diploid base calling algorithm

Background

Our SNP detection method detects SNPs across clonal reads based on base composition and quality.
Inputs: Base Call/Quality, Polymorphism Rate, Base Composition, Depth of Alignment. Output: Probability of polymorphism.
Figure label: SNP (A/G) Found Across Multiple Clonal Reads.

We are integrating diploid base calling (heterozygote detection) into
Figure labels: PCR-based sequences of diploid individuals. Calls = (CC, CT, TT). C/T, C/T. P(TT|R) = .9991, P(CT|R) = .96, P(CC|R) = .9995, P(CT|R) = .003.

Method for Diploid Base Calling (Support Vector Machine-based)

1. Collect heterozygous and homozygous training examples.
2. Calculate indicative features that separate heterozygotes from homozygotes.
3. Trained SVM can separate unseen homozygotes and heterozygotes. (Figure labels: SNP 1, SNP 2, …, SNP N; SVM.)
4. Make diploid base calls on unseen alignments. (Figure labels: P(CT|R) = .34, P(CT|R) = .01, P(AC|R) = .999, P(AT|R) = .001.)

P(S1 S2 | R) = probability of the allelic combination given the read.

SVMs learn a function to distinguish between positive and negative examples based on the statistics of the features in the training examples.

Convert SVM Score to P(Het): a positive SVM score indicates a heterozygote and a negative score indicates a homozygote. (Figure axes: SVM Score, from - to +; P(Het), from 0 to 1.)

The result is the probability of each possible diploid base call (AA, CC, GG, TT, AC, AG, AT, CG, CT, GT).

Probability of each genotype from a diploid base call (example):
P(CT) = .9
P(CC) = .045
P(TT) = .045
P(Others) = .01

Individual Genotype Call

Utilizing multiple reads per individual, we can make an individual genotype call.
Inputs: Each Possible Diploid Base Call/Probability, Prior Probability of Each Diploid Genotype, Depth of Alignment, Observed Diploid Variations/Probabilities.

Example:
Forward Read: P(GT | Read) = .98
Reverse Read: P(GT | Read) = .87
Prior(GT Frequency) = .34
Individual Genotype Call: P(GT) = .993

Rationale: The accuracy of the consensus diploid base call for an individual increases with the number of reads available for that individual.

Assessing the Accuracy of the Initial Prototype: "Unseen" Alignments

Number of Alignments Analyzed: 993
Total Number of Read Positions: 231874
Total Number of Heterozygotes: 31411
Total Number of Homozygotes: 143370

Figure labels: Accuracy by P(Het) Score; Accuracy; Sensitivity; 21851 Data.

Assessing the Genotyping Accuracy of the Initial Prototype

Polyphred 5 was tested with the following settings: quality = 21, score = 99, source, ref_comp 0
Note: Polyphred was tested on alignments created by PolyBayes. This allowed Polyphred to analyze a larger fraction of reads, as compared to Phrap alignments.

Summary:
1. We built a diploid base calling prototype from the ground up. The initial prototype's performance is similar to Polyphred 5.
2. We are currently compiling a larger example set to improve accuracy.
3. Our method incorporates information from multiple reads for a given individual in a statistically rigorous fashion.
4. This prototype represents the first major expansion of.
5. We are currently working to expand the prototype to a production-ready application.
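The poster names two steps without giving formulas: converting a signed SVM score to P(Het), and combining per-read probabilities such as P(GT | Read) with a prior genotype frequency into one individual genotype call. The sketch below shows one way these steps could look; it is not the authors' implementation. It makes several assumptions: the score is mapped with a logistic function whose `slope` and `offset` are hypothetical parameters, reads are treated as conditionally independent given the genotype, each per-read probability is treated as if computed under a uniform prior, and the helper `spread` (also hypothetical) distributes the leftover probability evenly over the other nine genotypes. Under these assumptions the output need not match the P(GT) value shown on the poster.

```python
from math import exp, prod

# The ten diploid base calls listed on the poster.
GENOTYPES = ["AA", "CC", "GG", "TT", "AC", "AG", "AT", "CG", "CT", "GT"]


def svm_score_to_p_het(score, slope=1.0, offset=0.0):
    """Map a signed SVM score to (0, 1) with a logistic curve.

    ASSUMPTION: the poster does not say how the score is converted;
    slope and offset are hypothetical and would be fit on training data.
    """
    return 1.0 / (1.0 + exp(-(slope * score + offset)))


def spread(genotype, p):
    """Hypothetical helper: give `genotype` probability p and split
    the remaining mass evenly across the other genotypes."""
    rest = (1.0 - p) / (len(GENOTYPES) - 1)
    return {g: (p if g == genotype else rest) for g in GENOTYPES}


def combine_reads(read_posteriors, prior):
    """Combine per-read genotype probabilities with a genotype prior.

    ASSUMPTION: reads are independent given the genotype, so the
    unnormalized posterior is prior(G) times the product of the
    per-read probabilities of G; the result is renormalized to sum to 1.
    """
    unnormalized = {
        g: prior[g] * prod(read[g] for read in read_posteriors)
        for g in GENOTYPES
    }
    total = sum(unnormalized.values())
    return {g: value / total for g, value in unnormalized.items()}


# Inputs taken from the poster's example; everything else is illustrative.
forward = spread("GT", 0.98)
reverse = spread("GT", 0.87)
prior = spread("GT", 0.34)
call = combine_reads([forward, reverse], prior)
print(max(call, key=call.get), call["GT"])
```

With two reads that agree, the combined probability for the shared genotype exceeds either single-read probability, which is the behavior the poster's rationale describes: confidence in the consensus call grows with the number of reads available for the individual.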