Rejoinder
Patterson, Dayton & Graubard, ASA, August 2002, NYC


Purpose of Study
I. To use latent variable methods to develop a new and useful definition of low-consumption and high-consumption groups among vegetable consumers.
II. To extend conventional methods of latent class analysis to data from complex survey designs that incorporate weights and clusters.

Points of Contention: Binary Scale
The binary scale of measurement was judged to be consistent with the reliability (and perhaps the validity) of the dietary intake data available in CSFII (Continuing Survey of Food Intakes by Individuals). Dichotomizing the data allows relatively sophisticated analysis under weak distributional assumptions.
Other options include, for example:
Continuous measurement (grams): fit a normal mixture model.
Servings (counts): fit a mixture of Poisson processes.
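The slide lists a mixture of Poisson processes as an alternative for servings counts but does not fit one. A minimal sketch, assuming a single servings count per respondent and two components (low and high consumers), is an EM algorithm for a two-component Poisson mixture. The function name `poisson_mixture_em`, the starting values, and the simulated data are illustrative assumptions, not part of the study.

```python
import math

import numpy as np


def poisson_mixture_em(y, n_iter=1000, tol=1e-10):
    """EM for a two-component Poisson mixture (illustrative sketch).

    Returns (weights, means, loglik) with components sorted by mean, so
    component 0 plays the role of the low-consumption group.
    """
    y = np.asarray(y, dtype=float)
    log_fact = np.array([math.lgamma(v + 1.0) for v in y])  # log(y!)
    # Start one component below and one above the overall mean.
    lam = np.array([0.5 * y.mean() + 0.1, 1.5 * y.mean() + 0.1])
    w = np.array([0.5, 0.5])
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: log[w_k * Poisson(y_i | lam_k)] for each respondent and component.
        log_comp = np.log(w) + y[:, None] * np.log(lam) - lam - log_fact[:, None]
        log_tot = np.logaddexp(log_comp[:, 0], log_comp[:, 1])
        resp = np.exp(log_comp - log_tot[:, None])  # posterior component probabilities
        loglik = log_tot.sum()  # log-likelihood at the parameters used in this E-step
        # M-step: closed-form updates for the mixing weights and Poisson means.
        nk = resp.sum(axis=0)
        w = nk / y.size
        lam = (resp * y[:, None]).sum(axis=0) / nk
        if loglik - prev < tol:
            break
        prev = loglik
    order = np.argsort(lam)
    return w[order], lam[order], loglik


# Simulated servings counts: 30% low consumers (mean 1), 70% high (mean 6).
rng = np.random.default_rng(42)
low = rng.random(20000) < 0.3
y = np.where(low, rng.poisson(1.0, 20000), rng.poisson(6.0, 20000))
weights, means, loglik = poisson_mixture_em(y)
print(weights.round(2), means.round(2))
```

Unlike dichotomizing, this model assumes the counts within each group are Poisson, which is exactly the kind of distributional assumption the binary scale avoids.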

Points of Contention: Two-Class Model
With four binary variables there are 16 unique response patterns. In conventional LCA, an unrestricted 2-class model requires estimation of 9 unique parameters (1 class proportion plus 2 × 4 conditional probabilities) and leaves 15 - 9 = 6 degrees of freedom for assessing fit. An unrestricted 3-class model requires estimation of 14 unique parameters (2 + 3 × 4) and appears to leave 1 degree of freedom for assessing fit. However, this model is unidentified: its asymptotic covariance matrix has rank 13, not 14. Hence, there is no unique unrestricted 3-class model for 4 binary responses. We did not view this as a concern, since our primary interest was in identifying low- and high-consumption groups.
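The parameter and degrees-of-freedom counts can be checked mechanically. A minimal sketch, assuming conditionally independent binary items: it counts free parameters as (C - 1) + C·J, computes df = 2^J - 1 - params, and evaluates the rank of the Jacobian of the 2^J pattern probabilities at randomly drawn parameter values. A rank below the parameter count signals that the model is not (locally) identified. The helper names are hypothetical, and the Jacobian check is a standard substitute for, not a reproduction of, the authors' covariance-matrix computation.

```python
import itertools

import numpy as np


def lca_free_params(n_items, n_classes):
    # (C - 1) free class proportions + C * J conditional probabilities.
    return (n_classes - 1) + n_classes * n_items


def lca_df(n_items, n_classes):
    # 2^J response patterns whose probabilities sum to 1.
    return 2 ** n_items - 1 - lca_free_params(n_items, n_classes)


def pattern_jacobian(pi, p):
    """Jacobian of P(pattern) with respect to the free LCA parameters.

    pi: class proportions (length C); the last is treated as 1 - sum(others).
    p:  C x J array of P(item j = 1 | class c).
    """
    n_classes, n_items = p.shape
    patterns = np.array(list(itertools.product([0, 1], repeat=n_items)))
    # terms[s, c, j] = P(item j takes its value in pattern s | class c)
    terms = np.where(patterns[:, None, :] == 1, p[None, :, :], 1.0 - p[None, :, :])
    f = terms.prod(axis=2)  # f[s, c] = P(pattern s | class c)
    # Derivatives with respect to the free class proportions.
    cols = [f[:, c] - f[:, -1] for c in range(n_classes - 1)]
    # Derivatives with respect to each conditional probability.
    sign = np.where(patterns == 1, 1.0, -1.0)
    for c in range(n_classes):
        for j in range(n_items):
            others = np.delete(terms[:, c, :], j, axis=1).prod(axis=1)
            cols.append(pi[c] * sign[:, j] * others)
    return np.column_stack(cols)


rng = np.random.default_rng(0)
ranks = {}
for n_classes in (2, 3):
    pi = rng.dirichlet(np.ones(n_classes))
    p = rng.uniform(0.1, 0.9, size=(n_classes, 4))
    ranks[n_classes] = np.linalg.matrix_rank(pattern_jacobian(pi, p), tol=1e-10)
    print(n_classes, lca_free_params(4, n_classes), lca_df(4, n_classes), ranks[n_classes])
```

For the 2-class model the rank matches the 9 parameters; for the 3-class model it falls one short of 14, consistent with the rank-13 finding on the slide.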

Points of Contention: Two-Class Model (Cont'd)
Constrained models: Because day 1 used a face-to-face interview and days 2-4 were completed by telephone, it seems reasonable to constrain the conditional probabilities in a corresponding manner. In fact, a differential pattern was apparent for the low-consumption class but not for the high-consumption class. We did not pursue this post-hoc observation with additional modeling, but our methods could be used to obtain suitable estimates (model comparison is a more challenging issue).
Separating vegetables: The data table can be enlarged by disaggregating types of vegetables, which would allow exploration of models with more than two latent classes. However, with a sample size of 1028, the resulting table would rapidly become sparse and pose analytical problems. On the other hand, a few categories of vegetables might be defined (e.g., deep yellow, dark-green leafy), and the resulting analyses could be of interest.
Finally, it is tempting to model the actual 6 measurement occasions while taking missingness into account. Conventional Markov latent class analysis, for example, could be (relatively) easily adapted for complex survey designs. We elected not to pursue this avenue because the 6 data points do not represent consistent time intervals during the year for different respondents. Also, for some respondents, observations were randomly deleted to yield 4 usable measures, whereas for other respondents data were missing. However, the mechanism for missingness at the respondent level is not reported in the database available for analysis.
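A sketch of how such equality constraints could be imposed, assuming conventional (unweighted) LCA fitted by EM: items that share a group label share one conditional probability per class, so the M-step pools successes and trials across those items. With groups [0, 1, 1, 1] (day 1 free, days 2-4 equal), the constrained two-class model has 1 + 2 × 2 = 5 parameters and leaves 15 - 5 = 10 degrees of freedom. The function name, the simulated data, and the clipping bound are illustrative assumptions; the authors' estimation code is not described in the slides.

```python
import numpy as np


def constrained_lca(Y, groups, n_classes=2, n_iter=2000, tol=1e-10, seed=0):
    """EM for a binary-item latent class model with within-class equality constraints.

    Items with the same label in `groups` share one P(item = 1 | class).
    Returns (class proportions, C x G probabilities, log-likelihood).
    """
    Y = np.asarray(Y, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    idx = np.searchsorted(labels, groups)  # map each item to its group column
    rng = np.random.default_rng(seed)
    pi = np.full(n_classes, 1.0 / n_classes)
    q = rng.uniform(0.25, 0.75, size=(n_classes, labels.size))
    prev = -np.inf
    for _ in range(n_iter):
        p = q[:, idx]  # expand group probabilities to a C x J matrix
        # E-step: log P(y_i, class c), then normalize to posterior memberships.
        log_joint = np.log(pi) + Y @ np.log(p).T + (1.0 - Y) @ np.log(1.0 - p).T
        log_lik_i = np.logaddexp.reduce(log_joint, axis=1)
        post = np.exp(log_joint - log_lik_i[:, None])
        loglik = log_lik_i.sum()
        # M-step: class proportions, then pooled proportions within each group.
        pi = post.mean(axis=0)
        for k, g in enumerate(labels):
            cols = groups == g
            successes = post.T @ Y[:, cols].sum(axis=1)
            trials = post.sum(axis=0) * cols.sum()
            q[:, k] = successes / trials
        q = np.clip(q, 1e-6, 1.0 - 1e-6)  # keep the logs finite
        if loglik - prev < tol:
            break
        prev = loglik
    return pi, q, loglik


# Hypothetical data: day 1 behaves differently from days 2-4 in each class.
rng = np.random.default_rng(1)
n = 2000
low = rng.random(n) < 0.3
P = np.where(low[:, None], [0.2, 0.4, 0.4, 0.4], [0.8, 0.9, 0.9, 0.9])
Y = (rng.random((n, 4)) < P).astype(int)

pi_c, q_c, ll_c = constrained_lca(Y, groups=[0, 1, 1, 1])
pi_f, q_f, ll_f = constrained_lca(Y, groups=[0, 1, 2, 3])  # unrestricted model
print(pi_c.round(3), q_c.round(3), round(ll_c - ll_f, 3))
```

Because the constrained model is nested in the unrestricted one without any boundary restriction, twice the log-likelihood difference is the kind of difference chi-square that is legitimate in conventional LCA.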

Points of Contention: Classification
Given estimates of the latent class parameters, a Bayes rule can be applied to classify each respondent as a low- or high-consumer. Given these classifications, additional analyses can be conducted using class membership as the dependent variable (e.g., logistic regression). In conventional LCA, this method constitutes a two-stage “approximation” to the covariate latent class model presented by Dayton & Macready (1988) in JASA. The linear logistic covariate model for the proportion in latent class 1, conditional on covariates Z, is of the form:
$$\theta_{1 \mid \mathbf{Z}} = \frac{\exp(\beta_0 + \boldsymbol{\beta}'\mathbf{Z})}{1 + \exp(\beta_0 + \boldsymbol{\beta}'\mathbf{Z})}$$
Other than a recent study by Kuo (2001), very little is known about how well the two-stage strategy approximates the covariate model, and nothing is known in the context of complex survey designs.
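The Bayes rule and the logistic covariate model can be sketched as follows, assuming conventional LCA with conditionally independent binary items. Posterior membership is P(c | y) = π_c f_c(y) / Σ_k π_k f_k(y), and each respondent is assigned to the class with the largest posterior (modal assignment). The function names and parameter values are hypothetical, not the CSFII estimates, and index 0 is simply labeled the low-consumption class.

```python
import numpy as np


def posterior_class_probs(Y, pi, p):
    """Bayes rule: P(class c | y_i) for each respondent (rows) and class (columns)."""
    Y = np.asarray(Y, dtype=float)
    p = np.asarray(p, dtype=float)
    log_joint = np.log(pi) + Y @ np.log(p).T + (1.0 - Y) @ np.log(1.0 - p).T
    log_joint -= log_joint.max(axis=1, keepdims=True)  # guard against underflow
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)


def modal_class(Y, pi, p):
    """Assign each respondent to the class with the largest posterior probability."""
    return posterior_class_probs(Y, pi, p).argmax(axis=1)


def class1_prob_given_z(Z, beta0, beta):
    """Linear logistic covariate model: exp(b0 + b'Z) / (1 + exp(b0 + b'Z))."""
    eta = beta0 + np.asarray(Z, dtype=float) @ np.asarray(beta, dtype=float)
    return 1.0 / (1.0 + np.exp(-eta))


# Hypothetical two-class parameters (class 0 = low consumption).
pi = np.array([0.18, 0.82])
p = np.array([[0.2, 0.3, 0.3, 0.3],
              [0.7, 0.8, 0.8, 0.8]])
Y = np.array([[0, 0, 0, 0], [1, 1, 1, 1], [1, 0, 1, 0]])
print(posterior_class_probs(Y, pi, p).round(3))
print(modal_class(Y, pi, p))
```

In the two-stage approach, the output of `modal_class` would serve as the response in an ordinary logistic regression on Z, whereas the covariate model estimates the β coefficients jointly with the latent class parameters.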

Points of Contention: Sample Weights
The sampling design for CSFII is complex, with stratified multistage cluster sampling as well as weighting adjustments for non-response and other factors. IF a homogeneous latent class model were appropriate (i.e., equal conditional probabilities across strata and clusters), THEN, on average, the use of weights would have no effect on the estimates of conditional probabilities. It seems unlikely that such homogeneity characterizes the population, and there is no direct method to assess this assumption statistically given the complex design. Our model incorporating weights is aimed at estimating an overall population model (“census” model). In fact, the patterns of conditional probabilities with and without weights are very similar, with some notably depressed estimates for the low-consumption latent class. However, the estimates of the proportion in the low-consumption class differ widely (.33 unweighted versus .18 weighted).
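Weighted (pseudo-likelihood) estimation can be sketched as an EM algorithm in which each respondent's log-likelihood contribution is multiplied by its sample weight; only relative weights affect the point estimates. This is a minimal sketch assuming unrestricted conventional LCA, not the authors' code. The simulation is hypothetical: the conditional probabilities are homogeneous, but the low-consumption class is oversampled (about 33% of the sample versus 18% of the population, numbers chosen only to echo the slide) and, in this toy setup, the weights are tied directly to the simulated class to undo the oversampling.

```python
import numpy as np


def weighted_lca(Y, w, n_classes=2, n_iter=2000, tol=1e-12, seed=0):
    """Pseudo-maximum-likelihood EM for binary-item LCA with sample weights.

    Returns (class proportions, C x J conditional probabilities, pseudo-loglik),
    with classes ordered by mean endorsement (class 0 = lowest).
    """
    Y = np.asarray(Y, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()  # rescaling the weights leaves the estimates unchanged
    rng = np.random.default_rng(seed)
    pi = np.full(n_classes, 1.0 / n_classes)
    p = rng.uniform(0.25, 0.75, size=(n_classes, Y.shape[1]))
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: posterior class memberships for each respondent.
        log_joint = np.log(pi) + Y @ np.log(p).T + (1.0 - Y) @ np.log(1.0 - p).T
        log_lik_i = np.logaddexp.reduce(log_joint, axis=1)
        post = np.exp(log_joint - log_lik_i[:, None])
        pll = w @ log_lik_i  # weighted pseudo-log-likelihood (normalized weights)
        # M-step: weighted versions of the usual closed-form updates.
        wpost = post * w[:, None]
        pi = wpost.sum(axis=0)
        p = np.clip((wpost.T @ Y) / pi[:, None], 1e-6, 1.0 - 1e-6)
        if pll - prev < tol:
            break
        prev = pll
    order = np.argsort(p.mean(axis=1))
    return pi[order], p[order], pll


# Hypothetical design: class 0 is 18% of the population but about 33% of the sample.
rng = np.random.default_rng(7)
n = 4000
low = rng.random(n) < 0.33
P = np.where(low[:, None], [0.2, 0.3, 0.3, 0.3], [0.8, 0.85, 0.85, 0.85])
Y = (rng.random((n, 4)) < P).astype(int)
w = np.where(low, 0.18 / 0.33, 0.82 / 0.67)

pi_u, p_u, _ = weighted_lca(Y, np.ones(n))
pi_w, p_w, _ = weighted_lca(Y, w)
print("unweighted:", pi_u.round(3), "weighted:", pi_w.round(3))
print("max |p_w - p_u|:", np.abs(p_w - p_u).max().round(3))
```

In this toy setting the two fits give similar conditional probabilities but different class proportions; it illustrates the mechanism only and says nothing about why the CSFII estimates differ.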

Points of Contention: Sampling Variances
We chose to explore jackknife estimates of the sampling variances of the latent class parameter estimates. The jackknife is relatively easy to program, has wide applicability, and can be implemented to capture the characteristics of a complex survey design. As expected, jackknife variances tend to slightly overestimate the “true” variances, but the coverage of confidence intervals is close to the nominal value (.95). In the context of the present study, other methods for estimating sampling variances should certainly be explored, including linearization, balanced half-sample replication, and bootstrapping.
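One common jackknife for stratified cluster samples is the delete-one-PSU (JKn) variant: drop one primary sampling unit (PSU) at a time, inflate the weights of the remaining PSUs in that stratum by n_h/(n_h - 1), re-estimate, and combine v = Σ_h (n_h - 1)/n_h Σ_j (θ̂_(hj) - θ̂)². The slides do not say which variant the authors used, so this is a generic sketch under that assumption. Here `estimator` is any function of the data and weights (for the present study it would refit the weighted latent class model), and the toy data are hypothetical.

```python
import numpy as np


def jackknife_variance(estimator, data, weights, strata, psu):
    """Stratified delete-one-PSU (JKn) jackknife variance of estimator(data, weights)."""
    weights = np.asarray(weights, dtype=float)
    strata = np.asarray(strata)
    psu = np.asarray(psu)
    theta = np.asarray(estimator(data, weights), dtype=float)
    var = np.zeros_like(theta)
    for h in np.unique(strata):
        in_h = strata == h
        units = np.unique(psu[in_h])
        n_h = units.size
        if n_h < 2:
            continue  # a single-PSU stratum gives no replicate information
        for j in units:
            dropped = in_h & (psu == j)
            w_rep = weights.copy()
            w_rep[dropped] = 0.0  # delete PSU j of stratum h
            w_rep[in_h & ~dropped] *= n_h / (n_h - 1.0)  # reweight the rest of stratum h
            theta_rep = np.asarray(estimator(data, w_rep), dtype=float)
            var += (n_h - 1.0) / n_h * (theta_rep - theta) ** 2
    return theta, var


def weighted_mean(y, w):
    return np.sum(w * y) / np.sum(w)


# Toy example: 2 strata, 3 PSUs each, 4 respondents per PSU (hypothetical data).
rng = np.random.default_rng(3)
strata = np.repeat([0, 1], 12)
psu = np.tile(np.repeat([0, 1, 2], 4), 2)
y = rng.binomial(1, 0.3, size=24).astype(float)
w = rng.uniform(0.5, 2.0, size=24)
est, v = jackknife_variance(weighted_mean, y, w, strata, psu)
print(est.round(3), np.sqrt(v).round(3))
```

As a sanity check, with one stratum, one respondent per PSU, and equal weights, this reduces to the textbook result that the jackknife variance of a mean equals s²/n.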

Future Research Directions
Assessing model fit: In conventional LCA, observed and expected frequencies can be used to compute a Pearson or likelihood-ratio chi-square goodness-of-fit statistic. The performance of such tests for complex survey data, with estimation based on pseudo-likelihoods, is not known. We used a Wald test to assess fit, although its performance for latent class models in the present context has not been studied.
Comparing models: In conventional LCA, difference chi-square statistics can be used to compare nested models only if no parameter must be restricted to a boundary value to reduce the more complex model to the simpler one. For example, restricting the conditional probabilities for times 2-4 to equality would be a legitimate application. However, as is true for mixture models in general, boundary constraints violate the asymptotics required for the chi-square tests. Thus, for example, an unrestricted two-class model cannot be compared to an unrestricted three-class model with a difference chi-square test. Information criteria such as AIC or BIC have been suggested for this latter purpose, but few simulation results are currently available for these methods. For complex survey data, even the former case is questionable, although Wald tests may be useful here.
Bayesian methods: If one were willing to assume prior information about the parameters, modern computer-intensive methods such as Markov chain Monte Carlo (MCMC) could be used to simulate distributional results and offer new opportunities for analysis.
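For conventional (unweighted) LCA, the Pearson and likelihood-ratio statistics can be computed from observed and expected pattern frequencies, and AIC and BIC from the maximized log-likelihood. A minimal sketch follows. As noted above, the reference distributions of these statistics under pseudo-likelihood estimation with complex survey data are not established, so the sketch assumes simple random sampling. The function names and toy numbers are illustrative, and the example evaluates the statistics at hypothetical parameter values rather than at fitted estimates, purely to show the arithmetic.

```python
import itertools

import numpy as np


def expected_pattern_counts(pi, p, n):
    """Expected frequencies n * P(pattern) for all 2^J binary response patterns."""
    p = np.asarray(p, dtype=float)
    patterns = np.array(list(itertools.product([0, 1], repeat=p.shape[1])))
    f = np.exp(patterns @ np.log(p).T + (1 - patterns) @ np.log(1.0 - p).T)  # S x C
    return patterns, n * (f @ np.asarray(pi, dtype=float))


def fit_statistics(observed, expected, n_params, loglik, n_obs):
    """Pearson X^2, likelihood-ratio G^2, residual df, AIC and BIC."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    x2 = np.sum((observed - expected) ** 2 / expected)
    nz = observed > 0  # empty cells contribute 0 to G^2
    g2 = 2.0 * np.sum(observed[nz] * np.log(observed[nz] / expected[nz]))
    return {
        "X2": x2,
        "G2": g2,
        "df": observed.size - 1 - n_params,
        "AIC": -2.0 * loglik + 2.0 * n_params,
        "BIC": -2.0 * loglik + n_params * np.log(n_obs),
    }


# Hypothetical two-class parameters and n = 1028 respondents.
pi = [0.2, 0.8]
p = [[0.2, 0.3, 0.3, 0.3], [0.7, 0.8, 0.8, 0.8]]
rng = np.random.default_rng(5)
patterns, expected = expected_pattern_counts(pi, p, 1028)
observed = rng.multinomial(1028, expected / expected.sum())
loglik = np.sum(observed * np.log(expected / 1028))  # multinomial log-likelihood
stats = fit_statistics(observed, expected, n_params=9, loglik=loglik, n_obs=1028)
print(stats)
```

The df entry reproduces the 6 degrees of freedom of the unrestricted two-class model; AIC and BIC are only meaningful when compared across models fitted to the same data.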