HGEN Thanks to Fruhling Rijsdijk

Slides:



Advertisements
Similar presentations
STATISTICAL ANALYSIS. Your introduction to statistics should not be like drinking water from a fire hose!!
Advertisements

Statistical Techniques I
Analysis of Categorical Data Nick Jackson University of Southern California Department of Psychology 10/11/
Logistic Regression Psy 524 Ainsworth.
Chapter 13: The Chi-Square Test
Copyright (c) 2004 Brooks/Cole, a division of Thomson Learning, Inc. Chapter 14 Goodness-of-Fit Tests and Categorical Data Analysis.
Aaker, Kumar, Day Seventh Edition Instructor’s Presentation Slides
Computer vision: models, learning and inference
Chapter 26: Comparing Counts. To analyze categorical data, we construct two-way tables and examine the counts of percents of the explanatory and response.
Raw data analysis S. Purcell & M. C. Neale Twin Workshop, IBG Colorado, March 2002.
BCOR 1020 Business Statistics
The General LISREL MODEL and Non-normality Ulf H. Olsson Professor of Statistics.
1 Multivariate Normal Distribution Shyh-Kang Jeng Department of Electrical Engineering/ Graduate Institute of Communication/ Graduate Institute of Networking.
1 Psych 5500/6500 Chi-Square (Part Two) Test for Association Fall, 2008.
Elements of Financial Risk Management Second Edition © 2012 by Peter Christoffersen 1 Distributions and Copulas for Integrated Risk Management Elements.
Chi-Square as a Statistical Test Chi-square test: an inferential statistics technique designed to test for significant relationships between two variables.
● Final exam Wednesday, 6/10, 11:30-2:30. ● Bring your own blue books ● Closed book. Calculators and 2-page cheat sheet allowed. No cell phone/computer.
CHI SQUARE TESTS.
Threshold Liability Models (Ordinal Data Analysis) Frühling Rijsdijk MRC SGDP Centre, Institute of Psychiatry, King’s College London Boulder Twin Workshop.
Chapter 11: Chi-Square  Chi-Square as a Statistical Test  Statistical Independence  Hypothesis Testing with Chi-Square The Assumptions Stating the Research.
Copyright © 2011, 2005, 1998, 1993 by Mosby, Inc., an affiliate of Elsevier Inc. Chapter 19: Statistical Analysis for Experimental-Type Research.
1 Chi-square Test Dr. T. T. Kachwala. Using the Chi-Square Test 2 The following are the two Applications: 1. Chi square as a test of Independence 2.Chi.
Chapter 14 – 1 Chi-Square Chi-Square as a Statistical Test Statistical Independence Hypothesis Testing with Chi-Square The Assumptions Stating the Research.
Frühling Rijsdijk & Kate Morley
Categorical Data Frühling Rijsdijk 1 & Caroline van Baal 2 1 IoP, London 2 Vrije Universiteit, A’dam Twin Workshop, Boulder Tuesday March 2, 2004.
Statistical Fundamentals: Using Microsoft Excel for Univariate and Bivariate Analysis Alfred P. Rovai Dependent t-Test PowerPoint Prepared by Alfred P.
REGRESSION MODEL FITTING & IDENTIFICATION OF PROGNOSTIC FACTORS BISMA FAROOQI.
Chapter 15 Analyzing Quantitative Data. Levels of Measurement Nominal measurement Involves assigning numbers to classify characteristics into categories.
CFA with Categorical Outcomes Psych DeShon.
Introduction to Marketing Research
Nonparametric Statistics
Chapter 12 Analysis of count data.
Introduction to Regression Analysis
Notes on Logistic Regression
The Statistical Imagination
Making Use of Associations Tests
CH 5: Multivariate Methods
Hypothesis testing. Chi-square test
APPROACHES TO QUANTITATIVE DATA ANALYSIS
Categorical Data Aims Loglinear models Categorical data
Community &family medicine
Threshold Liability Models (Ordinal Data Analysis)
Georgi Iskrov, MBA, MPH, PhD Department of Social Medicine
Data Analysis for Two-Way Tables
Chapter 11 Goodness-of-Fit and Contingency Tables
Nonparametric Statistics
Linkage in Selected Samples
Analysis of count data 1.
Basic Statistical Terms
Chi Square Two-way Tables
Econometric Models The most basic econometric model consists of a relationship between two variables which is disturbed by a random error. We need to use.
Hypothesis testing. Chi-square test
Liability Threshold Models
LEARNING OUTCOMES After studying this chapter, you should be able to
Lecture 41 Section 14.1 – 14.3 Wed, Nov 14, 2007
BIVARIATE ANALYSIS: Measures of Association Between Two Variables
Statistics II: An Overview of Statistics
BIVARIATE ANALYSIS: Measures of Association Between Two Variables
Parametric versus Nonparametric (Chi-square)
Mathematical Foundations of BME
UNIT V CHISQUARE DISTRIBUTION
S.M.JOSHI COLLEGE, HADAPSAR
Brad Verhulst & Sarah Medland
RES 500 Academic Writing and Research Skills
Multivariate Genetic Analysis: Introduction
Georgi Iskrov, MBA, MPH, PhD Department of Social Medicine
Producing good data through sampling and experimentation
Hypothesis Testing - Chi Square
Probabilistic Surrogate Models
Presentation transcript:

HGEN619 2006 Thanks to Fruhling Rijsdijk Threshold Models HGEN619 2006 Thanks to Fruhling Rijsdijk

Categorical Data Measuring instrument is able to only discriminate between two or a few ordered categories : e.g. absence or presence of a disease Data therefore take the form of counts, i.e. the number of individuals within each category Underlying normal distribution of liability with one or more thresholds (cut-offs) is assumed

Standard Normal Distribution Liability is a latent variable, the scale is arbitrary, distribution is, therefore, assumed to be a (SND) Standard Normal Distribution or z-distribution: mean () = 0 and SD () = 1 z-values are number of SD away from mean area under curve translates directly to probabilities > Normal Probability Density function () -3 3 -1 1 2 -2 68%

Cumulative Probability -3 3 z0 Area=P(z  z0) Z0 Area 0 .50 50% .2 .42 42% .4 .35 35% .6 .27 27% .8 .21 21% 1 .16 16% 1.2 .12 12% 1.4 .08 8% 1.6 .06 6% 1.8 .036 3.6% 2 .023 2.3% 2.2 .014 1.4% 2.4 .008 .8% 2.6 .005 .5% 2.8 .003 .3% 2.9 .002 .2% Mathematically, the area under a curve can be derived by integral calculus. However, we don’t need to worry about integration of the SND, it has already been done by statisticians and we can look up these values in standard statistical tables. Standard Normal Cumulative Probability in right-hand tail (For negative z values, areas are found by symmetry)

Univariate Normal Distribution For one variable it is possible to find a z-value (threshold) on SND, so that proportion exactly matches observed proportion of sample i.e if from sample of 1000 individuals, 150 have met a criteria for a disorder (15%): the z-value is 1.04 -3 3 1.04

Two Categorical Traits With two categorical traits, data are represented in a Contingency Table, containing cell counts that can be translated into proportions 0=absent 1=present Trait2 Trait1 1 00 01 10 11 Trait2 Trait1 1 545 (.76) 75 (.11) 56 (.08) 40 (.05)

Categorical Data for Twins With dichotomous measured trait i.e. disorder either present or absent in unselected sample cell a: number of pairs concordant for unaffected cell d: number of pairs concordant for affected cell b/c: number of pairs discordant for disorder Twin2 Twin1 1 00 a 01 b 10 c 11 d 0 = unaffected 1 = affected

Joint Liability Model for Twins Assumed to follow a bivariate normal distribution Shape of bivariate normal distribution is determined by correlation between traits Expected proportions under distribution can be calculated by numerical integration with mathematical subroutines R=.00 R=.90

Expected Proportions for BN Liab 2 Liab 1 1 .87 .05 .03 R=0.6 Th1=1.4 Th2=1.4

Correlated Dimensions Correlation (shape) and two thresholds determine relative proportions of observations in 4 cells of CT Conversely, sample proportions in 4 cells can be used to estimate correlation and thresholds Twin2 Twin1 1 00 a 01 b 10 c 11 d c c d d a b a b

ACE Liability Twin Model 1 Variance decomposition (A, C, E) can be applied to liability, where correlations in liability are determined by path model This leads to an estimate of heritability of liability 1/.5 E C A A C E L L 1 1 Unaf Twin 1 ¯ Aff Unaf Twin 2 ¯ Aff

Fit to Ordinal Data in Mx Summary Statistics: Contingency Tables Built-in function for maximum likelihood analysis of 2 way contingency tables Limited to two variables Raw Data Built-in function for raw data ML More flexible: multivariate, missing data, moderator variables Frequency Data

Model Fitting to CT Fit function is twice the log-likelihood of the observed frequency data calculated as: Where nij is the observed frequency in cell ij And pij is the expected proportion in cell ij Expected proportions calculated by numerical integration of the bivariate normal over two dimensions: the liabilities for twin1 and twin2

Expected Proportions Probability that both twins are affected: where Φ is bivariate normal probability density function, L1 and L2 are liabilities of twin1 and twin2, with means 0, and  is correlation matrix of two liabilities (proportion d) Example: for correlation of .9 and thresholds of 1, d is around .12 Probability that both twins are below threshold: is given by integral function with reversed boundaries (proportion a) a is around .80 in this example B L1

Chi-square Statistic log-likelihood of the data under the model subtracted from log-likelihood of the observed frequencies themselves The model’s failure to predict the observed data i.e. a bad fitting model, is reflected in a significant χ²

Model Fitting to Raw Data ordinal ordinal Zyg response1 response2 1 0 0 1 0 1 2 1 0 2 0 0 1 1 1 2 . 1 2 0 . 2 0 1

Expected Proportions Likelihood of a vector of ordinal responses is computed by Expected Proportion in corresponding cell of Multivariate normal Distribution (MN), e.g. where  is MN pdf, which is function of , correlation matrix of variables Expected proportion are calculated by numerical integration of MN over n dimensions. In this example two, the liabilities for twin1 and twin2. By maximizing the likelihood of data under a MN distribution, ML estimate of correlation matrix and thresholds are obtained

Numerical Integration (0 0) (1 1) (0 1) (1 0)