IGES 2003 How many markers are necessary to infer correct familial relationships in follow-up studies? Silvano Presciuttini 1,3, Chiara Toni 2, Fabio Marroni.


IGES 2003

How many markers are necessary to infer correct familial relationships in follow-up studies?

Silvano Presciuttini (1,3), Chiara Toni (2), Fabio Marroni (1), Isabella Spinetti (2), Ranieri Domenici (2), and Joan E. Bailey-Wilson (3)

1. Center of Statistical Genetics, University of Pisa, Italy
2. Unit of Legal Medicine, University of Pisa, Italy
3. Inherited Disease Research Branch, NHGRI, NIH, Baltimore, USA

Background of this study

Inferring the biological relationship between pairs of individuals from genetic markers plays a central role in many areas of human and applied genetics.

- In forensic science, deficiency paternity cases arise when the alleged parent of a claimant is not available; these cases often reduce to determining the likelihood ratio of two alternative hypotheses about the relationship between a single pair of individuals.
- In follow-up linkage studies, genome-wide scan data are usually not available, and relationships among relatives can be verified only by genotyping additional markers outside the candidate region.

Using computer simulations, we investigated the number of unlinked STR markers necessary to infer the true relationship between a pair of individuals against several alternative hypotheses.

Design of the study

We focused on the number of markers (M) necessary to reach predefined levels of power in discriminating relationships (1-β = 80%, 90%, 95%, and 99%) at various significance levels (α = 5%, 1%, and 0.1%).

The following relationships were considered: 1) parent-child (PC); 2) full sibs (FS); 3) second degree (2D, including half-sibs, grandparent-grandchild and avuncular pairs); 4) first cousins (FC); and 5) non-relatives (NR).

We investigated:

1. The "exact" method for inferring relationships
2. An approximate method based on IBS allele sharing
3. The reduction in the total number of genotypes achieved by using a sequential test

The exact method of inferring relationships

The usual, long-established method of inferring relationships between individuals is based on the population frequencies of the observed alleles and on the conditional probabilities of the observed genotypes, given any two alternative hypothesized relationships.

Table: Probabilities of the seven possible combinations of genotypes for five common relationships
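The genotype-pair probabilities behind the exact method can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes Hardy-Weinberg proportions, no mutation or genotyping error, and the standard IBD coefficients (k0, k1, k2) for the five relationships. The function names and the allele frequencies in the usage lines are hypothetical.

```python
from itertools import combinations_with_replacement

# Standard IBD coefficients (k0, k1, k2): probability that a pair shares
# 0, 1, or 2 alleles identical by descent under each relationship.
IBD = {
    "PC": (0.0, 1.0, 0.0),
    "FS": (0.25, 0.5, 0.25),
    "2D": (0.5, 0.5, 0.0),
    "FC": (0.75, 0.25, 0.0),
    "NR": (1.0, 0.0, 0.0),
}

def genotype_prob(g, p):
    """Hardy-Weinberg probability of an unordered genotype g = (a, b)."""
    a, b = g
    return p[a] ** 2 if a == b else 2 * p[a] * p[b]

def one_shared_prob(x, g, p):
    """P(genotype g | allele x is inherited IBD, the other allele is random)."""
    c, d = g
    if c == d:
        return p[c] if x == c else 0.0
    return (p[d] if x == c else 0.0) + (p[c] if x == d else 0.0)

def pair_prob(g1, g2, rel, p):
    """Joint probability of two genotypes at one locus given a relationship."""
    k0, k1, k2 = IBD[rel]
    g1, g2 = tuple(sorted(g1)), tuple(sorted(g2))
    p1 = genotype_prob(g1, p)
    ibd0 = p1 * genotype_prob(g2, p)
    ibd1 = p1 * 0.5 * (one_shared_prob(g1[0], g2, p) + one_shared_prob(g1[1], g2, p))
    ibd2 = p1 if g1 == g2 else 0.0
    return k0 * ibd0 + k1 * ibd1 + k2 * ibd2

def likelihood_ratio(g1, g2, rel_a, rel_b, p):
    """Single-locus LR of rel_a versus rel_b (undefined if rel_b excludes the pair)."""
    return pair_prob(g1, g2, rel_a, p) / pair_prob(g1, g2, rel_b, p)

def all_genotypes(p):
    """All unordered genotypes that can be formed from the alleles in p."""
    return list(combinations_with_replacement(sorted(p), 2))

# Hypothetical three-allele locus.
freqs = {"A": 0.5, "B": 0.3, "C": 0.2}
print(likelihood_ratio(("A", "A"), ("A", "A"), "PC", "NR", freqs))  # 2.0, i.e. 1 / freq of A
```

For unlinked markers, the single-locus LRs multiply across loci, so the log(LR) values add.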

Computer simulations

For each comparison, we simulated the genotypes of 10,000 pairs of relatives of the five considered relationships, and another set of 10,000 pairs of the "false" relationship to be tested.

Simulations were carried out separately for 25 commonly used markers (including the 13 CODIS markers and a second set of 12 markers commonly used in forensic practice), and they were repeated a second time to reach a total of 50 markers, thus representing a possible future expansion of validated markers.

The LR formula appropriate for each true relationship was applied to all 20,000 simulated pairs (including true and false relatives) for an increasing number of markers (1 to 50), and the resulting distributions of the log(LR) were analyzed in terms of a standard power analysis.
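The simulation and power analysis described above could be sketched roughly as follows. Everything specific here is an assumption made for illustration: a single hypothetical locus with 10 equifrequent alleles reused for every marker (the study used real marker frequencies), 1,000 pairs per group instead of 10,000, the FS versus 2D comparison only, and a cutoff taken as the empirical (1-α) quantile of the log(LR) among false pairs.

```python
import math
import random

# IBD coefficients (k0, k1, k2) for the two relationships compared here.
IBD = {"FS": (0.25, 0.5, 0.25), "2D": (0.5, 0.5, 0.0)}

def geno_prob(g, p):
    a, b = g
    return p[a] ** 2 if a == b else 2 * p[a] * p[b]

def pair_prob(g1, g2, rel, p):
    """P(g1, g2 | rel) for sorted genotypes at one locus."""
    k0, k1, k2 = IBD[rel]
    c, d = g2
    def one_ibd(x):  # P(g2 | allele x shared IBD, other allele random)
        if c == d:
            return p[c] if x == c else 0.0
        return (p[d] if x == c else 0.0) + (p[c] if x == d else 0.0)
    return geno_prob(g1, p) * (
        k0 * geno_prob(g2, p)
        + k1 * 0.5 * (one_ibd(g1[0]) + one_ibd(g1[1]))
        + k2 * (1.0 if g1 == g2 else 0.0)
    )

def simulate_pair(rel, p, rng):
    """Draw one genotype pair at one locus under the given relationship."""
    k0, k1, k2 = IBD[rel]
    draw = lambda: rng.choices(range(len(p)), weights=p)[0]
    g1 = (draw(), draw())
    u = rng.random()
    if u < k2:                      # both alleles IBD
        g2 = g1
    elif u < k2 + k1:               # one allele IBD
        g2 = (rng.choice(g1), draw())
    else:                           # no allele IBD
        g2 = (draw(), draw())
    return tuple(sorted(g1)), tuple(sorted(g2))

def log_lr(rel, n_markers, p, rng):
    """Summed log10 LR (FS vs 2D) over unlinked markers for one simulated pair."""
    total = 0.0
    for _ in range(n_markers):
        g1, g2 = simulate_pair(rel, p, rng)
        total += math.log10(pair_prob(g1, g2, "FS", p) / pair_prob(g1, g2, "2D", p))
    return total

def power(true_scores, false_scores, alpha):
    """Fraction of true pairs above the (1 - alpha) quantile of false pairs."""
    ordered = sorted(false_scores)
    cutoff = ordered[min(int((1 - alpha) * len(ordered)), len(ordered) - 1)]
    return sum(s > cutoff for s in true_scores) / len(true_scores)

rng = random.Random(0)
p = [0.1] * 10                      # hypothetical equifrequent alleles
results = {}
for m in (5, 15, 30):
    fs = [log_lr("FS", m, p, rng) for _ in range(1000)]
    sd = [log_lr("2D", m, p, rng) for _ in range(1000)]
    results[m] = power(fs, sd, alpha=0.05)
    print(m, "markers: power", round(results[m], 3))
```

Scanning the number of markers until the estimated power first exceeds a target such as 90% gives the kind of "markers needed" figure reported in the results.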

Discriminating full sibs from half sibs

As an example of applying the exact method, we show the power in discriminating full sibs from 2nd degree relationships.

Figure: Number of markers needed to reach a given false-positive rate (α) at various percentages of true positives (1-β) in discriminating 2D relatives from FS

Results of the exact method

Number of markers required to discriminate true relatives from false relatives at various combinations of α and 1-β

The IBS method

Instead of calculating an LR based on the genotypes of the individuals in a pair, it is possible to calculate an LR based on the number of alleles (0, 1, or 2) that the pair share identical by state.

We have previously shown that the probabilities of sharing 0, 1, or 2 alleles (z0, z1, and z2) for a given relationship depend on locus heterozygosity (H) and are scarcely affected by variation in the distribution of allele frequencies.

This allowed us to obtain empirical curves relating the z_i values to H for a series of common relationships.

This means that the LR for a pair of alternative relationships between any two individuals, given their genotypes at a locus, is a function of a single parameter, H.
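A small sketch of an IBS-based LR is given below. It does not reproduce the empirical curves in H mentioned above. Instead, as an assumption made for illustration, it computes z0, z1, and z2 exactly by enumeration from a hypothetical allele-frequency vector, using standard IBD coefficients. All function names are hypothetical.

```python
from itertools import product

# Standard IBD coefficients (k0, k1, k2) for the five relationships.
IBD = {
    "PC": (0.0, 1.0, 0.0),
    "FS": (0.25, 0.5, 0.25),
    "2D": (0.5, 0.5, 0.0),
    "FC": (0.75, 0.25, 0.0),
    "NR": (1.0, 0.0, 0.0),
}

def ibs_count(g1, g2):
    """Number of alleles (0, 1, or 2) that two genotypes share identical by state."""
    remaining = list(g2)
    shared = 0
    for allele in g1:
        if allele in remaining:
            remaining.remove(allele)
            shared += 1
    return shared

def heterozygosity(p):
    """Expected heterozygosity H = 1 - sum of squared allele frequencies."""
    return 1.0 - sum(q * q for q in p)

def ibs_probs(rel, p):
    """Exact [z0, z1, z2] for a relationship, by enumerating allele draws."""
    k0, k1, k2 = IBD[rel]
    alleles = range(len(p))
    z = [0.0, 0.0, 0.0]
    # No allele IBD: four independent allele draws.
    for a, b, c, d in product(alleles, repeat=4):
        z[ibs_count((a, b), (c, d))] += k0 * p[a] * p[b] * p[c] * p[d]
    # One allele IBD: allele a is shared, b and c are independent draws.
    for a, b, c in product(alleles, repeat=3):
        z[ibs_count((a, b), (a, c))] += k1 * p[a] * p[b] * p[c]
    # Two alleles IBD: the genotypes are identical, so IBS = 2.
    z[2] += k2
    return z

def ibs_lr(g1, g2, rel_a, rel_b, p):
    """LR of rel_a vs rel_b based only on the IBS count (undefined if z is 0 under rel_b)."""
    i = ibs_count(g1, g2)
    return ibs_probs(rel_a, p)[i] / ibs_probs(rel_b, p)[i]

p = [0.5, 0.5]                       # hypothetical two-allele locus
print(heterozygosity(p))             # 0.5
print(ibs_probs("NR", p))            # [0.125, 0.5, 0.375]
```

Replacing `ibs_probs` with fitted functions of H would remove the dependence on allele-frequency estimates, which is the point of the approximate method.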

Relationship between H and z_i

The figure shows a plot of the exact probabilities of sharing 0, 1 or both alleles at 19 loci as a function of locus heterozygosity for three common relationships (full sibs, 2nd degree and non-relatives). Lines represent third-order polynomial regression curves.

Power comparison of the exact and the IBS methods

Figure: Number of markers needed in the IBS method and in the exact method to discriminate full sibs from 2nd degree relatives

When reliable allele frequency estimates are not available, the IBS method can be applied without losing much power.

Usefulness of a sequential test

When genotyping is performed specifically to verify relationships, a sequential test may save considerable resources.

TABLE: Mean number of markers per individual needed to discriminate full sibs from 2nd degree relatives in a sequential test with an error rate of 0.05
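The slide does not say which stopping rule was used. As one possible reading, the sketch below applies Wald's sequential probability ratio test to per-marker log10(LR) values, adding markers in small batches and stopping as soon as the cumulative log(LR) crosses either boundary. The function name, the batch size, and the default error rates are assumptions made for illustration.

```python
import math

def sequential_test(log_lrs, alpha=0.05, beta=0.05, batch=1):
    """Wald-style sequential test on per-marker log10 LRs.

    log_lrs: log10 LR of the numerator vs the denominator relationship,
             one value per unlinked marker, in genotyping order.
    Returns (decision, number_of_markers_used).
    """
    upper = math.log10((1 - beta) / alpha)   # accept the numerator relationship
    lower = math.log10(beta / (1 - alpha))   # accept the denominator relationship
    total = 0.0
    used = 0
    for start in range(0, len(log_lrs), batch):
        chunk = log_lrs[start:start + batch]
        total += sum(chunk)
        used += len(chunk)
        if total >= upper:
            return "accept numerator", used
        if total <= lower:
            return "accept denominator", used
    return "undecided", used

# Hypothetical per-marker log10 LRs for one pair.
print(sequential_test([0.5, 0.5, 0.5, 0.5]))   # ('accept numerator', 3)
```

Averaging the number of markers used over many simulated pairs would give a mean comparable in kind to the one tabulated above, although the values depend on the stopping rule actually chosen.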

Conclusions

- Verifying reported relationships in follow-up familial studies is often based on typing a small number of markers in new families.
- Unlinked markers of the type used in forensic science provide sufficient power to discriminate the most critical relationships.
- When no reliable estimate of allele frequencies is available, a method based on IBS may be used instead of the conventional exact method.
- A sequential test based on small sets of markers added at each step may make the task of verifying relationships highly efficient.