SNP Scores

Overall Score
Overall Score = Coverage Score × 4 optional scores
▫ Read Balance Score = 1 if reads are balanced in each direction
▫ Allele Balance Score = 1 if the SNP count is balanced in relation to the read count in each direction
▫ Homopolymer Score = 1 if the SNP is not an indel in a homopolymer
▫ Mismatch Score = 1 if there are fewer than 3 SNPs present within 10 bp on either side of the SNP that occur in a minimum number of reads
Maximum score = 30 × 1 × 1 × 1 × 1 = 30
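The multiplication described on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `overall_score`, the dictionary keys, and the `ignored` parameter are hypothetical and are not taken from the software itself.

```python
from math import prod

def overall_score(coverage_score, optional_scores, ignored=()):
    """Multiply the coverage score by every optional score.

    optional_scores maps a score name to a value between 0 and 1.
    Names listed in `ignored` are treated as 1 (no penalty).
    All names here are illustrative assumptions.
    """
    factors = [1.0 if name in ignored else value
               for name, value in optional_scores.items()]
    return coverage_score * prod(factors)

# Hypothetical values for one SNP
example = {
    "read_balance": 0.8,
    "allele_balance": 1.0,
    "homopolymer": 0.5,
    "mismatch": 1.0,
}
with_all = overall_score(30, example)
without_homopolymer = overall_score(30, example, ignored={"homopolymer"})
```

Because every optional score is at most 1, the overall score can never exceed the coverage score.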

Interpreting the Score
Scores are an empirical estimate of how likely it is that a given SNP is real and not an artifact of sequencing or alignment. The score is based on Phred scores:
▫ 30 = 1 in 1000 are not real
▫ 20 = 1 in 100 are not real
▫ 10 = 1 in 10 are not real
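The three values above follow the standard Phred relationship Q = −10 · log10(P), where P is the probability that the call is wrong. A minimal sketch of the conversion (the function names are chosen for this example only):

```python
import math

def phred_to_error_probability(q):
    """Probability that a call with Phred-scaled score q is wrong."""
    return 10 ** (-q / 10)

def error_probability_to_phred(p):
    """Phred-scaled score for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)

for q in (30, 20, 10):
    p = phred_to_error_probability(q)
    print(f"score {q}: about 1 in {round(1 / p)} are not real")
```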

Interpreting the Score
A low score does not mean the mutation is more likely to be false; it only means the mutation cannot be confidently called as a true mutation. Even real SNPs will have low scores if the coverage is low.

Optional Scores
The optional scores can be ignored (set equal to 1) in the final score calculation by adjusting the settings for the mutation report. The homopolymer score is always ignored unless the data is Roche data.

Optional Score
You may want to ignore certain optional scores depending on your data. For example, if your data is all (or nearly all) one-directional, you can choose to ignore the Read Balance score, because even real SNPs will not be balanced. Homopolymer scores are automatically ignored unless Roche data is being analyzed.

Coverage Score
If the SNP count is greater than 50 (example: 50% SNP at 100 coverage) then the score is 30. Otherwise the score is calculated according to a formula [formula not preserved in this transcript], where:
▫ % = SNP allele percentage
▫ # = SNP count

Coverage Score
This score is based on the Gompertz function, where a, b, and c have been adjusted to achieve the desired distribution.
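For reference, the general Gompertz function is y = a · exp(−b · exp(−c · x)), where a is the upper asymptote, b shifts the curve along the x axis, and c sets the growth rate. The sketch below evaluates that general form only. The parameter values and the choice of input are hypothetical, since the transcript does not give the tuned a, b, and c or state exactly how % and # enter the formula.

```python
import math

def gompertz(x, a, b, c):
    """General Gompertz curve: a * exp(-b * exp(-c * x)).

    a: upper asymptote (the curve approaches a as x grows)
    b: displacement along the x axis
    c: growth rate
    """
    return a * math.exp(-b * math.exp(-c * x))

# Hypothetical parameters, NOT the values used by the program:
# an asymptote of 30 matches the maximum coverage score.
A, B, C = 30.0, 5.0, 0.1

curve = [gompertz(x, A, B, C) for x in range(0, 101, 10)]
```

With these illustrative parameters the curve rises slowly at small x, climbs steeply in the middle, and flattens out just below 30, which is the general shape a saturating score needs.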

Coverage Score Distribution
[Plot: Different SNP Percentages vs Read Coverage; label: Score 10]
Higher % SNPs are more reliable at low coverage.

Coverage Score Distribution
[Plot: Different Levels of Coverage vs SNP %]
Low coverage will limit the score even if a SNP occurs in a high percentage of reads.

Read Balance Score
If the numbers of forward and reverse reads are within 1 of each other, then the score is 1. If not, the score is calculated according to a formula [formula not preserved in this transcript], where:
▫ #F = number of forward reads
▫ C = coverage

Read Balance Score
Sequence data with reads in both directions is more reliable because the base quality is averaged out between the high-quality 5’ end and the low-quality 3’ end. A score of 1 means there is no penalty. A score below 1 reduces the score from the Coverage Score.

Read Balance Score Distribution
[Plots: Levels of Coverage vs Percent of Reads in the Forward Direction; Percent of Reads in One Direction vs Coverage]
Lower coverage results in a higher penalty because the balance is more likely to be random.

Allele Balance
The Allele Balance score penalizes SNPs that occur at different frequencies in the forward and reverse directions because they are more likely to be sequencing or alignment errors. The score is based on Yates's chi-square test (the chi-square test with Yates's continuity correction), which is less likely than the uncorrected chi-square test to reject the null hypothesis due to a lack of data (low coverage in this case).

Allele Balance
First a variable is calculated:
▫ W = |(#F SNP) × (#R non-SNP) − (#R SNP) × (#F non-SNP)| − C/2
If this variable is negative then the score is 1. Otherwise, the score is calculated according to an equation [equation not preserved in this transcript], where:
▫ #F = number of forward reads
▫ #R = number of reverse reads
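W has the same form as the numerator term of the standard Yates-corrected chi-square statistic for a 2×2 table. The sketch below computes W and that textbook statistic, assuming C is the total number of reads across the four cells. It does not reproduce the program's conversion from the statistic to a score, which is not given here, and the function names are illustrative.

```python
def allele_balance_w(f_snp, f_non, r_snp, r_non):
    """W = |F_snp * R_non - R_snp * F_non| - C/2.

    Assumes C (coverage) is the total read count of the 2x2 table.
    """
    c = f_snp + f_non + r_snp + r_non
    return abs(f_snp * r_non - r_snp * f_non) - c / 2

def yates_chi_square(f_snp, f_non, r_snp, r_non):
    """Textbook Yates-corrected chi-square for a 2x2 table.

    chi2 = C * W^2 / (row1 * row2 * col1 * col2)
    Returns 0.0 when W is not positive or a margin is empty.
    """
    c = f_snp + f_non + r_snp + r_non
    w = allele_balance_w(f_snp, f_non, r_snp, r_non)
    margins = ((f_snp + f_non) * (r_snp + r_non)
               * (f_snp + r_snp) * (f_non + r_non))
    if w <= 0 or margins == 0:
        return 0.0
    return c * w ** 2 / margins

# Balanced SNP counts give a negative W (no penalty);
# strongly unbalanced counts give a large statistic.
balanced_w = allele_balance_w(50, 50, 50, 50)
unbalanced_chi2 = yates_chi_square(40, 10, 10, 40)
```

Subtracting C/2 shrinks the statistic, so small imbalances at low coverage fall below zero and carry no penalty, which matches the rule that a negative W gives a score of 1.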

Allele Balance Distribution
[Plots:
▫ Vary Imbalance: Score vs Number of Forward SNPs (100 reads in each direction, 50% SNPs)
▫ Vary Coverage: Score vs Coverage (balanced reads, 2:1 SNP balance, 30% SNPs)
▫ Vary SNP Percentage: Score vs percent of reads with a SNP allele (300 reads in each direction, 2:1 SNP balance)]

Homopolymer Score
The homopolymer score penalizes indels in homopolymer regions when analyzing Roche pyrosequencing data because they are usually sequencing errors. The penalty is higher for longer homopolymer regions because errors are more likely.

Homopolymer Score
The program first determines which length of homopolymer region is present more often (A) and which is present less often (B). If A or B is not ≥ 3 then the score is 1. Otherwise the score is calculated according to a formula [formula not preserved in this transcript].
Example: a deletion from 4 bases to 3 bases that occurs less than half of the time: A = 4, B = 3, Score = 0.5

Mismatch Score
Several SNPs occurring very close together are usually the result of an alignment error. This score penalizes a SNP if there are other SNPs nearby. The program first looks for SNPs that occur in a minimum percentage of reads in the 10 bp on either side of the SNP being scored. The number of nearby SNPs is used to calculate the score. If the number of nearby SNPs is less than 3 there is no penalty.
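The counting step can be sketched as follows. The sketch assumes the list of SNP positions has already been filtered to SNPs that meet the minimum read percentage, and it only reports whether a penalty applies; the penalty formula itself is not reproduced. Function and parameter names are hypothetical.

```python
def count_nearby_snps(position, snp_positions, window=10):
    """Count other SNPs within `window` bp on either side of `position`.

    snp_positions is assumed to contain only SNPs that already pass
    the minimum read-percentage filter.
    """
    return sum(1 for p in snp_positions
               if p != position and abs(p - position) <= window)

def mismatch_penalty_applies(position, snp_positions, window=10, threshold=3):
    """True when the nearby SNP count reaches the penalty threshold."""
    return count_nearby_snps(position, snp_positions, window) >= threshold

# Hypothetical positions: 92, 95 and 105 lie within 10 bp of 100,
# while 111 and 130 lie outside the window.
positions = [92, 95, 100, 105, 111, 130]
nearby = count_nearby_snps(100, positions)
penalized = mismatch_penalty_applies(100, positions)
```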

Mismatch Score Distribution
After the number of nearby SNPs is determined, the score is calculated according to a formula [formula and resulting distribution plot not preserved in this transcript].