Aaron R. Quinlan and Gabor T. Marth Department of Biology, Boston College, Chestnut Hill, MA 02467


A Novel Approach To Diploid Base Calling

Aaron R. Quinlan (quinlaaa@bc.edu) and Gabor T. Marth (marth@bc.edu), Department of Biology, Boston College, Chestnut Hill, MA 02467
http://bioinformatics.bc.edu/marthlab/

Objective: To enhance [...] with an accurate diploid base calling algorithm.

Our SNP detection method detects SNPs across clonal reads based on base composition and quality. It takes the base call/quality, base composition, depth of alignment and polymorphism rate, and reports the probability of polymorphism.
[Figure: SNP (A/G) found across multiple clonal reads]

We are integrating diploid base calling (heterozygote detection) into [...]. The target data are PCR-based sequences of diploid individuals. At a C/T site the possible calls are (CC, CT, TT), for example P(CC|R) = .9995 and P(CT|R) = .003.

Method for Diploid Base Calling (Support Vector Machine-based)
1. Collect heterozygous and homozygous training examples (SNP 1, SNP 2, ..., SNP N).
2. Calculate indicative features that separate heterozygotes from homozygotes.
3. The trained SVM can separate unseen homozygotes and heterozygotes.
4. Make diploid base calls on unseen alignments.

SVMs learn a function that distinguishes between positive and negative examples based on the statistics of the features in the training examples.

P(S1 S2 | R) = probability of the allelic combination S1 S2 given the read R.
[Figure: example calls on unseen alignments: P(TT|R) = .9991, P(CT|R) = .96, P(CT|R) = .34, P(CT|R) = .01, P(AC|R) = .999, P(AT|R) = .001]

Convert SVM Score to P(Het)
The sign of the SVM score gives the class: + is Het, - is Hom. The score is converted to P(Het), a value between 0 and 1. The result is a probability for each possible diploid base call (AA, CC, GG, TT, AC, AG, AT, CG, CT, GT).

Probability of Each Genotype From a Diploid Base Call
P(CT) = .9, P(CC) = .045, P(TT) = .045, P(Others) = .01

Individual Genotype Call
Utilizing multiple reads per individual, we can make an individual genotype call.
[Figure labels: each possible diploid base call/probability; prior probability of each diploid genotype; depth of alignment; observed diploid variations/probabilities]
Forward read: P(GT | Read) = .98
Reverse read: P(GT | Read) = .87
Prior(GT frequency) = .34
Individual genotype call: P(GT) = .993

Rationale: The accuracy of the consensus diploid base call for an individual increases with the number of reads available for that individual.

Assessing the Accuracy of the Initial Prototype: "Unseen" Alignments
Assessing the Genotyping Accuracy of the Initial Prototype
[Figure: Accuracy by P(Het) Score; axes: Accuracy, Sensitivity; label: 21851 Data]
Number of alignments analyzed: 993
Total number of read positions: 231874
Total number of heterozygotes: 31411
Total number of homozygotes: 143370

Polyphred 5 was tested with the following settings: quality = 21, score = 99, source, ref_comp 0.
Note: Polyphred was tested on alignments created by PolyBayes. This allowed Polyphred to analyze a larger fraction of reads than Phrap alignments would have.

Summary:
1. We built a diploid base calling prototype from the ground up. The initial prototype's performance is similar to that of Polyphred 5.
2. We are currently compiling a larger example set to improve accuracy.
3. Our method incorporates information from multiple reads for a given individual in a statistically rigorous fashion.
4. This prototype represents the first major expansion of [...].
5. We are currently working to expand the prototype into a production-ready application.
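The poster gives numbers for three steps (SVM score to P(Het), P(Het) to a probability for each diploid genotype, and several reads plus a prior to one individual genotype call) but not the formulas behind them. The Python sketch below shows one plausible way to chain those steps; it is not the authors' implementation. Everything in it is an assumption made for illustration: the logistic mapping and its `slope` and `offset` parameters, the equal split of the non-heterozygote mass between the two homozygotes, the default `p_other` of .01, and the treatment of reads as independent with per-read probabilities computed under a flat prior. All function names are hypothetical.

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"
# The ten unordered diploid genotypes: AA, AC, AG, AT, CC, CG, CT, GG, GT, TT.
GENOTYPES = ["".join(p) for p in combinations_with_replacement(BASES, 2)]


def svm_score_to_p_het(score, slope=1.0, offset=0.0):
    """Map a signed SVM score (+ is het, - is hom) to a value in (0, 1).

    Assumption: a logistic curve. slope and offset are hypothetical
    parameters that would have to be fitted on held-out examples.
    """
    return 1.0 / (1.0 + math.exp(-(slope * score + offset)))


def genotype_distribution(p_het, allele1, allele2, p_other=0.01):
    """Spread P(Het) at a two-allele site over all ten genotypes.

    Assumption: the two homozygotes share the remaining mass equally and
    p_other is divided evenly among the seven other genotypes.
    """
    if allele1 == allele2:
        raise ValueError("expected two different alleles")
    het = "".join(sorted(allele1 + allele2))
    homs = [allele1 * 2, allele2 * 2]
    p_het = min(p_het, 1.0 - p_other)  # keep the total at 1
    p_hom = (1.0 - p_het - p_other) / 2.0
    rest = [g for g in GENOTYPES if g != het and g not in homs]
    dist = {g: p_other / len(rest) for g in rest}
    dist[het] = p_het
    for hom in homs:
        dist[hom] = p_hom
    return dist


def combine_reads(per_read, prior):
    """Posterior over genotypes for one individual from several reads.

    Assumption: reads are independent and each per-read distribution was
    computed under a flat prior, so it is proportional to a likelihood.
    """
    post = {}
    for g in GENOTYPES:
        p = prior.get(g, 0.0)
        for dist in per_read:
            p *= dist.get(g, 0.0)
        post[g] = p
    total = sum(post.values())
    if total == 0.0:
        raise ValueError("prior and reads share no supported genotype")
    return {g: p / total for g, p in post.items()}


if __name__ == "__main__":
    # Forward and reverse reads from the poster's G/T example.
    forward = genotype_distribution(0.98, "G", "T")
    reverse = genotype_distribution(0.87, "G", "T")
    # Prior(GT) = .34 as on the poster; the even spread of the remainder
    # over the other nine genotypes is an assumption.
    prior = {g: 0.66 / 9 for g in GENOTYPES}
    prior["GT"] = 0.34
    print(combine_reads([forward, reverse], prior)["GT"])
```

With `p_het = 0.9` at a C/T site, `genotype_distribution` returns the split shown on the poster (.9, .045, .045, and .01 for all others combined). `combine_reads` illustrates the rationale that agreeing reads raise confidence in the call, but under these assumptions it is not expected to reproduce the poster's P(GT) = .993, which depends on details the transcript does not preserve.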

