Frühling Rijsdijk & Kate Morley

Slides:

Advertisements

Similar presentations

Bivariate analysis HGEN619 class 2007.

Advertisements

TYPES OF DATA. Qualitative vs. Quantitative Data A qualitative variable is one in which the “true” or naturally occurring levels or categories taken by.

Introduction to Data Analysis

Statistical Tests Karen H. Hagglund, M.S.

Lecture 3: Chi-Sqaure, correlation and your dissertation proposal Non-parametric data: the Chi-Square test Statistical correlation and regression: parametric.

Multivariate Genetic Analysis: Introduction(II) Frühling Rijsdijk & Shaun Purcell Wednesday March 6, 2002.

Statistical Analysis SC504/HS927 Spring Term 2008 Week 17 (25th January 2008): Analysing data.

Univariate Analysis in Mx Boulder, Group Structure Title Type: Data/ Calculation/ Constraint Reading Data Matrices Declaration Assigning Specifications/

Continuous heterogeneity Shaun Purcell Boulder Twin Workshop March 2004.

Path Analysis Frühling Rijsdijk SGDP Centre Institute of Psychiatry King’s College London, UK.

Thresholds and ordinal data Sarah Medland – Boulder 2010.

Introduction to Multivariate Genetic Analysis Kate Morley and Frühling Rijsdijk 21st Twin and Family Methodology Workshop, March 2008.

Path Analysis Frühling Rijsdijk. Biometrical Genetic Theory Aims of session:  Derivation of Predicted Var/Cov matrices Using: (1)Path Tracing Rules (2)Covariance.

Raw data analysis S. Purcell & M. C. Neale Twin Workshop, IBG Colorado, March 2002.

Data Analysis Statistics. Levels of Measurement Nominal – Categorical; no implied rankings among the categories. Also includes written observations and.

Thomas Songer, PhD with acknowledgment to several slides provided by M Rahbar and Moataza Mahmoud Abdel Wahab Introduction to Research Methods In the Internet.

Descriptive Statistics Used to describe the basic features of the data in any quantitative study. Both graphical displays and descriptive summary statistics.

1 1 Slide © 2008 Thomson South-Western. All Rights Reserved Chapter 15 Multiple Regression n Multiple Regression Model n Least Squares Method n Multiple.

Statistics 11 Correlations Definitions: A correlation is measure of association between two quantitative variables with respect to a single individual.

Institute of Psychiatry King’s College London, UK

Statistical analysis Prepared and gathered by Alireza Yousefy(Ph.D)

Applied Quantitative Analysis and Practices LECTURE#11 By Dr. Osman Sadiq Paracha.

Descriptive Research: Quantitative Method Descriptive Analysis –Limits generalization to the particular group of individuals observed. –No conclusions.

Introduction to Quantitative Research Analysis and SPSS SW242 – Session 6 Slides.

Analysis of Qualitative Data Dr Azmi Mohd Tamil Dept of Community Health Universiti Kebangsaan Malaysia FK6163.

CHI SQUARE TESTS.

Going from data to analysis Dr. Nancy Mayo. Getting it right Research is about getting the right answer, not just an answer An answer is easy The right.

Threshold Liability Models (Ordinal Data Analysis) Frühling Rijsdijk MRC SGDP Centre, Institute of Psychiatry, King’s College London Boulder Twin Workshop.

In Stat-I, we described data by three different ways. Qualitative vs Quantitative Discrete vs Continuous Measurement Scales Describing Data Types.

N318b Winter 2002 Nursing Statistics Specific statistical tests Chi-square (  2 ) Lecture 7.

Univariate Analysis Hermine Maes TC21 March 2008.

Chapter 2: Levels of Measurement. Researchers classify variables according to the extent to which the values of the variable measure the intended characteristics.

Mx modeling of methylation data: twin correlations [means, SD, correlation] ACE / ADE latent factor model regression [sex and age] genetic association.

Means, Thresholds and Moderation Sarah Medland – Boulder 2008 Corrected Version Thanks to Hongyan Du for pointing out the error on the regression examples.

Mx Practical TC20, 2007 Hermine H. Maes Nick Martin, Dorret Boomsma.

Measurements Statistics WEEK 6. Lesson Objectives Review Descriptive / Survey Level of measurements Descriptive Statistics.

Categorical Data Frühling Rijsdijk 1 & Caroline van Baal 2 1 IoP, London 2 Vrije Universiteit, A’dam Twin Workshop, Boulder Tuesday March 2, 2004.

THE CHI-SQUARE TEST BACKGROUND AND NEED OF THE TEST Data collected in the field of medicine is often qualitative. --- For example, the presence or absence.

Welcome  Log on using the username and password you received at registration  Copy the folder: F:/sarah/mon-morning To your H drive.

Nonparametric Statistics

Hypothesis Testing Procedures Many More Tests Exist!

More on thresholds Sarah Medland. A plug for OpenMx? Very few packages can handle ordinal data adequately… OpenMx can also be used for more than just.

REGRESSION MODEL FITTING & IDENTIFICATION OF PROGNOSTIC FACTORS BISMA FAROOQI.

Dr.Rehab F.M. Gwada. Measures of Central Tendency the average or a typical, middle observed value of a variable in a data set. There are three commonly.

QTL Mapping Using Mx Michael C Neale Virginia Institute for Psychiatric and Behavioral Genetics Virginia Commonwealth University.

Chapter 15 Analyzing Quantitative Data. Levels of Measurement Nominal measurement Involves assigning numbers to classify characteristics into categories.

March 7, 2012M. de Moor, Twin Workshop Boulder1 Copy files Go to Faculty\marleen\Boulder2012\Multivariate Copy all files to your own directory Go to Faculty\kees\Boulder2012\Multivariate.

2 NURS/HSCI 597 NURSING RESEARCH & DATA ANALYSIS GEORGE MASON UNIVERSITY.

Chapter 13 LOGISTIC REGRESSION. Set of independent variables Categorical outcome measure, generally dichotomous.

Multivariate Genetic Analysis (Introduction) Frühling Rijsdijk Wednesday March 8, 2006.

Categorical Data HGEN

Nonparametric Statistics

HGEN Thanks to Fruhling Rijsdijk

Ordinal Data Sarah Medland.

Introduction to Multivariate Genetic Analysis

BINARY and CATEGORICAL TRAITS

Heterogeneity HGEN619 class 2007.

APPROACHES TO QUANTITATIVE DATA ANALYSIS

Threshold Liability Models (Ordinal Data Analysis)

More on thresholds Sarah Medland.

Nonparametric Statistics

Basic Statistical Terms

Liability Threshold Models

More on thresholds Sarah Medland.

BOULDER WORKSHOP STATISTICS REVIEWED: LIKELIHOOD MODELS

Brad Verhulst & Sarah Medland

Multivariate Genetic Analysis: Introduction

Presentation transcript:

Frühling Rijsdijk & Kate Morley Categorical Data Frühling Rijsdijk & Kate Morley Twin Workshop, Boulder Tuesday March 7, 2006

Aims Introduce Categorical Data Define liability and describe assumptions of the liability model Show how heritability of liability can be estimated from categorical twin data Practical exercises

Measurement Scales of Outcome Variables Dichotomous, Binary (2 categories) only two possible outcomes: e.g. yes / no response on an item; depression / no depression. Ordinal (Ranking) Polychotomous , (>2 categories) Qualitative, Qualitative, number of children in a family, Income level Categorical, Categorical, Discrete Discrete Nominal (no Ranking) Gender: 1=males, 2 = females marital status: 1=married, 2=divorced, 3=separated e.g. IQ, Temperature: outcomes mutually exclusive, logically ordered, differences meaningful, but zero point is arbitrary (e.g. 0 ºC is melting point water). We cannot say 80 ºC is twice as warm as 40 ºC, or having an IQ of 100 means being twice as smart as one of 50. Interval Interval Scale Scale Quantitative, Quantitative, Continuous Continuous Ratio Ratio Highest level of measurement e.g. height, weight. Outcomes are mutually exclusive, there is a logical order, differences are meaningful and zero means absence of the trait. We can say e.g. a tree of 3 M is twice as high as one of 1.5 Scale Scale

Ordinal data Measuring instrument is able to only discriminate between two or a few ordered categories e.g. absence or presence of a disease. Data take the form of counts, i.e. the number of individuals within each category: ‘no’ ‘yes’ So far we have talked about the analyses of continuous variables and transformation of skewed data into a more normal distribution. We now deal with the situation that, instead of making measurements on a continuous scale, we are able to discriminate between only a small number of ordered categories with our measuring instrument. For example, the data may be the presence or absence of a disease, or the responses to a single item on a questionnaire. In such cases the data take the form of counts, i.e. the number of individuals within each category of response. Of 100 individuals: 90 ‘no’ 10 ‘yes’ 55 19 ‘no’ ‘yes’ 18 8

Univariate Normal Distribution of Liability Assumptions: (1) Underlying normal distribution of liability (2) The liability distribution has 1 or more thresholds (cut-offs) One approach to analyze ordinal data is to assume that the ordered categories reflect an imprecise measurement of an underlying normal distribution of liability. The liability distribution is further assumed to have one or more thresholds (cut-offs) to discriminate between the categories.

The standard Normal distribution Liability is a latent variable, the scale is arbitrary, distribution is, therefore, assumed to be a Standard Normal Distribution (SND) or z-distribution: mean () = 0 and SD () = 1 z-values are the number of SD away from the mean area under curve translates directly to probabilities > Normal Probability Density function () Since the liability is a latent (unmeasured) variable, its location or scale are arbitrary, and it is customary to assume that it has a standard normal distribution (SND) or z-distribution which has a Mean of 0, and a Standard Deviation (SD) of 1. The z-distribution is described mathematically by the standard normal probability density function ( = phi), which is a bell-shaped curve. The z-values are the number of SDs away from the mean; the area under the curve between any two z-values can be directly interpreted as a probability: e.g. the area between 1 and +1 SD around the mean corresponds to 68%; between 2 and +2 with 95%. -3 3 - 1 2 -2 68%

Standard Normal Cumulative Probability in right-hand tail (For negative z values, areas are found by symmetry) Z0 Area 0 .50 50% .2 .42 42% .4 .35 35% .6 .27 27% .8 .21 21% 1 .16 16% 1.2 .12 12% 1.4 .08 8% 1.6 .06 6% 1.8 .036 3.6% 2 .023 2.3% 2.2 .014 1.4% 2.4 .008 .8% 2.6 .005 .5% 2.8 .003 .3% 2.9 .002 .2% -3 3 zT Area=P(z  zT) Mathematically, the area under a curve can be derived by integral calculus. The mathematical notation for integration is shown here. First: the integral symbol with the thresholds enclosing the area we are after; Secondly, the function of the liability distribution, the PDF, indicated by the phi symbol. Between brackets we indicate some distributional characteristics , mean=0, variance=1. Given any threshold (z-value), it’s possible to determine the probability between threshold and infinity, or threshold and minus infinity (which is 1- prob threshold to infinity). However, we don’t need to worry about integration of the SND, it has already been done by statisticians and we can look up these values in standard statistical tables. Here is a table with areas for a couple of z-values. Values for negative z-values are found by symmetry.

Example: From counts find z-value in Table For one variable it is possible to find a z-value (threshold) on the SND, so that the proportion exactly matches the observed proportion of the sample e.g. if from a sample of 1000 individuals, 120 have met a criteria for a disorder (12%): the z-value is 1.2 Z0 Area .6 .27 27% .8 .21 21% 1 .16 16% 1.2 .12 12% 1.4 .08 8% 1.6 .055 6% 1.8 .036 3.6% 2 .023 2.3% 2.2 .014 1.4% 2.4 .008 .8% 2.6 .005 .5% 2.8 .003 .3% 2.9 .002 .2% We can perform the inverse operation of finding a z-value (threshold) on the SND, such that the proportion above the threshold exactly matches the observed proportion of the sample. Thus, if from a sample of 1000 individuals, 120 subjects meet the criteria for a specific disorder (12%), the threshold z-value (cut-off) is found to be 1.2. In other words, the threshold value is estimated to be 1.2, from the observed counts of 120 affected individuals and 880 unaffected individuals. -3 1.2 3 unaff aff Counts: 880 120

Two categorical traits: Data from twins In an unselected sample of twins > Contingency Table with 4 observed cells: cell a:number of pairs concordant for unaffected cell d: number of pairs concordant for affected cell b/c: number of pairs discordant for the disorder Twin1 Twin2 1 545 (.76) 75 (.11) 56 (.08) 40 (.05) What happens when we have Categorical Data for Twins ? When the measured trait is dichotomous i.e. a disorder is either present or not, we can partition our observations into pairs concordant for not having the disorder (a), pairs concordant for the disorder (d) and discordant pairs in which one is affected and one is unaffected (b and c). These frequencies are summarized in a 2x2 contingency table: 0 = unaffected 1 = affected

Joint Liability Model for twin pairs Assumed to follow a bivariate normal distribution, where both traits have a mean of 0 and standard deviation of 1, but the correlation between them is unknown. The shape of a bivariate normal distribution is determined by the correlation between the traits When we have twin pair data, the assumption now is that the joint distribution of liabilities of twin pairs follows a bivariate normal distribution, where both traits have a mean of 0 and standard deviation 1, but the correlation between them is unknown. The shape of such a bivariate normal distribution is determined by the correlation.

Bivariate Normal r =.90 r =.00

Bivariate Normal (R=0. 6) partitioned at threshold 1 Bivariate Normal (R=0.6) partitioned at threshold 1.4 (z-value) on both liabilities

How are expected proportions calculated? By numerical integration of the bivariate normal over two dimensions: the liabilities for twin1 and twin2 e.g. the probability that both twins are affected : Φ is the bivariate normal probability density function, L1 and L2 are the liabilities of twin1 and twin2, with means 0, and  is the correlation matrix of the two liabilities T1 is threshold (z-value) on L1, T2 is threshold (z-value) on L2

(0 0) (1 1) (0 1) (1 0)

How is numerical integration performed? There are programmed mathematical subroutines that can do these calculations Mx uses one of them Like the SND the expected proportion under the curve between any ranges of values of the two traits can be calculated by means of numerical integration. There are programmed mathematical subroutines that can do these calculations. Mx uses one of them.

Expected Proportions of the BN, for R=0.6, Th1=1.4, Th2=1.4 Liab 2 1 Liab 1 .87 .05 .05 .03 1

How can we estimate correlations from CT? The correlation (shape) of the BN and the two thresholds determine the relative proportions of observations in the 4 cells of the CT. Conversely, the sample proportions in the 4 cells can be used to estimate the correlation and the thresholds. Twin2 Twin1 1 a b c d This Figure shows the correlated dimensions for twin pair data with the liabilities of twin1 and twin2 on the axes. The correlation (contour) and the two thresholds (cut-offs on the liability distribution) determine the relative proportions (a, b, c and d) of observations in the 4 cells of the CT. Conversely, the sample proportions in the 4 cells can be used to estimate the correlation and the thresholds. c c d d b a a b

Summary It is possible to estimate a correlation between categorical traits from simple counts (CT), because of the assumptions we make about their joint distributions: The Bivariate Normal The relative sample proportions in the 4 cells are translated to proportions under the BN so that the most likely correlation and the thresholds are derived

ACE Liability Model 1 1/.5 E C A A C E L L Unaf Unaf Aff Aff Twin 1 Not a proper path diagram, but shows the underlying assumptions of liability, thresholds etc. Unaf ¯ Aff Unaf ¯ Aff Twin 1 Twin 2

How can we fit ordinal data in Mx? Summary statistics: CT Mx has a built-in fit function for the maximum-likelihood analysis of 2-way Contingency Tables >analyses limited to only two variables Raw data analyses - multivariate - handles missing data - moderator variables

ML of RAW Ordinal data Is the sum of the likelihood of all observations. The likelihood of an observation is the expected proportion in the corresponding cell of the MN. The sum of the log-likelihoods of all observations is a value that (like for continuous data) is not very interpretable, unless we compare it with the LL of other models or a saturated model to get a chi-square index.

Raw Ordinal Data ordinal ordinal Zyg respons1 respons2 1 0 0 1 0 1 1 0 0 1 0 1 2 1 0 2 0 0 1 1 1 2 . 1 2 0 . 2 0 1 NOTE: smallest category should always be 0 !!

SORT ! We can speed up computation time considerably when the data is sorted since if case i+1 = case i, then likelihood is NOT recalculated. In e.g. the bivariate, 2 category case, there are only 4 possible vectors of observations : 1 1, 0 1, 1 0, 00 and, therefore, only 4 integrals for Mx to calculate if the data file is sorted.

Practical

Sample and Measures Australian Twin Registry data (QIMR) Self-report questionnaire Non-smoker, ex-smoker, current smoker Age of smoking onset Large sample of adult twins + family members Today using MZMs (785 pairs) and DZMs (536 pairs)

Variable: age at smoking onset, including non-smokers Ordered as: Non-smokers / late onset / early onset

Practical Exercise Analysis of age of onset data - Estimate thresholds - Estimate correlations - Fit univariate model Observed counts from ATR data: MZM DZM 0 1 2 0 1 2 0 368 24 46 0 203 22 63 1 26 15 21 1 17 5 16 2 54 22 209 2 65 12 133

Threshold Specification in Mx 2 Categories Matrix T: 1 x 2 T(1,1) T(1,2) threshold 1 for twin1 & twin2 -3 -1 3 Threshold Model T /

Threshold Specification in Mx 3 Categories Matrix T: 2 x 2 T(1,1) T(1,2) threshold 1 for twin1 & twin2 T(2,1) T(2,2) increment 2.2 -3 -1 1.2 3 Threshold Model L*T / 1 0 1 1 t11 t12 t21 t22 t11 t12 t11 + t21 t12 + t22 * =

polycor_smk.mx #define nvarx2 2 #define nthresh 2 #ngroups 2 G1: Data and model for MZM correlation DAta NInput_vars=3 Missing=. Ordinal File=smk_prac.ord Labels zyg ageon_t1 ageon_t2 SELECT IF zyg = 2 SELECT ageon_t1 ageon_t2 / Begin Matrices; R STAN nvarx2 nvarx2 FREE T FULL nthresh nvarx2 FREE L Lower nthresh nthresh End matrices; Value 1 L 1 1 to L nthresh nthresh

polycor_smk.mx #define nvarx2 2 ! Number of variables x number of twins #define nthresh 2 ! Number of thresholds=num of cat-1 #ngroups 2 G1: Data and model for MZM correlation DAta NInput_vars=3 Missing=. Ordinal File=smk_prac.ord ! Ordinal data file Labels zyg ageon_t1 ageon_t2 SELECT IF zyg = 2 SELECT ageon_t1 ageon_t2 / Begin Matrices; R STAN nvarx2 nvarx2 FREE T FULL nthresh nvarx2 FREE L Lower nthresh nthresh End matrices; Value 1 L 1 1 to L nthresh nthresh

polycor_smk.mx #define nvarx2 2 ! Number of variables per pair #define nthresh 2 ! Number of thresholds=num of cat-1 #ngroups 2 G1: Data and model for MZM correlation DAta NInput_vars=3 Missing=. Ordinal File=smk_prac.ord ! Ordinal data file Labels zyg ageon_t1 ageon_t2 SELECT IF zyg = 2 SELECT ageon_t1 ageon_t2 / Begin Matrices; R STAN nvarx2 nvarx2 FREE ! Correlation matrix T FULL nthresh nvarx2 FREE L Lower nthresh nthresh End matrices; Value 1 L 1 1 to L nthresh nthresh

polycor_smk.mx #define nvarx2 2 ! Number of variables per pair #define nthresh 2 ! Number of thresholds=num of cat-1 #ngroups 2 G1: Data and model for MZM correlation DAta NInput_vars=3 Missing=. Ordinal File=smk_prac.ord ! Ordinal data file Labels zyg ageon_t1 ageon_t2 SELECT IF zyg = 2 SELECT ageon_t1 ageon_t2 / Begin Matrices; R STAN nvarx2 nvarx2 FREE ! Correlation matrix T FULL nthresh nvarx2 FREE ! thresh tw1, thresh tw2 L Lower nthresh nthresh ! Sums threshold displacements End matrices; Value 1 L 1 1 to L nthresh nthresh ! initialize L

COV R / Thresholds L*T / Bound 0.01 1 T 1 1 T 1 2 Bound 0.1 5 T 2 1 T 2 2 Start 0.2 T 1 1 T 1 2 Start 0.2 T 2 1 T 2 2 Start .6 R 2 1 Option RS Option func=1.E-10 END

COV R / ! Predicted Correlation matrix for MZ pairs Thresholds L*T / ! Threshold model, to ensure t1>t2>t3 etc....... Bound 0.01 1 T 1 1 T 1 2 Bound 0.1 5 T 2 1 T 2 2 Start 0.2 T 1 1 T 1 2 Start 0.2 T 2 1 T 2 2 Start .6 R 2 1 Option RS Option func=1.E-10 END

COV R / ! Predicted Correlation matrix for MZ pairs Thresholds L*T / ! Threshold model, to ensure t1>t2>t3 etc....... Bound 0.01 1 T 1 1 T 1 2 Bound 0.1 5 T 2 1 T 2 2 ! Ensures positive threshold displacement Start 0.2 T 1 1 T 1 2 ! Starting values for the 1st thresholds Start 0.2 T 2 1 T 2 2 ! Starting values for the 2nd thresholds Start .6 R 2 1 ! Starting value for the correlation Option RS Option func=1.E-10 !function precision is less than usual END

! Test equality of thresholds between Tw1 and Tw2 EQ T 1 1 1 T 1 1 2 !constrain TH1 to be equal across Tw1 and Tw2 MZM EQ T 1 2 1 T 1 2 2 !constrain TH2 to be equal across Tw1 and Tw2 MZM EQ T 2 1 1 T 2 1 2 !constrain TH1 to be equal across Tw1 and Tw2 DZM EQ T 2 2 1 T 2 2 2 !constrain TH2 to be equal across Tw1 and Tw2 DZM End Get cor.mxs ! Test equality of thresholds between MZM & DZM EQ T 1 1 1 T 1 1 2 T 2 1 1 T 2 1 2 !constrain TH1 to be equal across all Males EQ T 1 2 1 T 1 2 2 T 2 2 1 T 2 2 2 !constrain TH2 to be equal across all Males

Exercise I Fit saturated model Test equality of thresholds Estimates of thresholds Estimates of polychoric correlations Test equality of thresholds Examine differences in threshold and correlation estimates for saturated model and sub-models Examine correlations What model should we fit? Raw ORD File: smk_prac.dat Script: polychor_smk.mx Location: kate\Ordinal_Practical

Estimates: smoking age-at-onset -2LL df Twin 1 Twin 2 Saturated 5128.185 3055 Th1 MZ 0.09 0.12 DZ 0.03 0.05 Th2 0.31 0.33 0.24 0.26 Cor 1 0.81 0.55

Estimates: smoking age-at-onset ΔХ2 Δ df P Twin 1 Twin 2 Sub-model 1 Th1 MZ DZ Th2 Cor Sub-model 2

Estimates: smoking age-at-onset ΔХ2 Δ df P Twin 1 Twin 2 Sub-model 1 0.77 4 0.94 Th1 MZ 0.10 DZ 0.04 Th2 0.32 0.25 Cor 1 0.81 0.55 Sub-model 2 2.44 6 0.88 0.07 0.29

ACEcat_smk.mx #define nvar 1 ! number of variables per twin #define nvarx2 2 ! number of variables per pair #define nthresh 1 ! number of thresholds=num of cat-1 #ngroups 4 ! number of groups in script G1: Parameters for the Genetic model Calculation Begin Matrices; X Low nvar nvar FREE ! Additive genetic path coefficient Y Low nvar nvar FREE ! Common environmental path coefficient Z Low nvar nvar FREE ! Unique environmental path coefficient End matrices; Begin Algebra; A=X*X' ; !Additive genetic variance (path X squared) C=Y*Y' ; !Common Environm variance (path Y squared) E=Z*Z' ; !Unique Environm variance (path Z squared) End Algebra; start .6 X 1 1 Y 1 1 Z 1 1 !starting value for X, Y, Z Interval @95 A 1 1 C 1 1 E 1 1 !requests the 95%CI for h2, c2, e2 End

G2: Data and model for MZ pairs DAta NInput_vars=3 Missing=. Ordinal File=prac_smk.ord Labels zyg ageon_t1 ageon_t2 SELECT IF zyg = 2 SELECT ageon_t1 ageon_t2 / Matrices = group 1 T FULL nthresh nvarx2 FREE ! Thresh tw1, thresh tw2 L Lower nthresh nthresh COV ! Predicted covariance matrix for MZ pairs ( A + C + E | A + C _ A + C | A + C + E ) / Thresholds L*T / !Threshold model Bound 0.01 1 T 1 1 T 1 2 ! Ensures positive threshold displacement Bound 0.1 5 T 2 1 T 2 2 Start 0.1 T 1 1 T 1 2 ! Starting values for the 1st thresholds Start 0.2 T 1 1 T 1 2 ! Starting values for the 2nd thresholds Option rs End

G3: Data and model for DZ pairs DAta NInput_vars=4 Missing=. Ordinal File=prac_smk.ord Labels zyg ageon_t1 ageon_t2 SELECT IF zyg = 4 SELECT ageon_t1 ageon_t2 / Matrices = group 1 T FULL nthresh nvarx2 FREE ! Thresh tw1, thresh tw2 L Lower nthresh nthresh H FULL 1 1 ! .5 COVARIANCE ! Predicted covariance matrix for DZ pairs ( A + C + E | H@A + C _ H@A + C | A + C + E ) / Thresholds L*T / !Threshold model Bound 0.1 1 T 1 1 T 1 2 ! Ensures positive threshold displacement Bound 0.1 5 T 2 1 T 2 2 Start 0.1 T 1 1 T 1 2 ! Starting values for the 1st thresholds Start 0.2 T 1 1 T 1 2 ! Starting values for the 2nd thresholds Option rs End

Constraint groups and degrees of freedom G4: CONSTRAIN VARIANCES OF OBSERVED VARIABLES TO 1 CONSTRAINT Matrices = Group 1 I UNIT 1 1 CO A+C+E= I / !constrains the total variance to equal 1 Option func=1.E-10 End Constraint groups and degrees of freedom As the total variance is constrained to unity, we can estimate one VC from the other two, giving us one less independent parameter: A + C + E = 1 therefore E = 1 - A - C So each constraint group adds a degree of freedom to the model.

Exercise II Fit ACE model Raw ORD File: smk_prac.dat What does the threshold model look like? Change it to reflect the findings from exercise I Raw ORD File: smk_prac.dat Script: ACEcat_smk.mx Location: kate\Ordinal_Practical