Examining Data

Constructing a variable

1. Assemble a set of items that might work together to define a construct/variable.
2. Hypothesize the hierarchy of these items along that construct.
3. Choose a response format.
4. Investigate how well the hierarchy holds for members of your response frame.
5. Ensure that your scale is unidimensional.

Unidimensionality

Always remember: unidimensionality is never perfect; it is always approximate. You need to ask: "Is the dimensionality in the data big enough to merit dividing the items into separate tests, or constructing new tests, one for each dimension?" It may be that two or three off-dimension items have been included in your instrument and should be dropped. The question then becomes: "Is the lack of unidimensionality in my data sufficiently large to threaten the validity of my results?"

Do my items fall along a unidimensional scale?

We can investigate this through:
- Person and item fit statistics
- The principal components analysis of residuals (PCAR)

A Rasch Assumption

The Rasch model is based on the specification of "local independence": after the contribution of the measures to the data has been removed, all that is left is random, normally distributed noise. When a residual is divided by its model standard deviation, it should have the characteristics of a sample from a unit normal distribution.
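
As a sketch of that specification for the dichotomous Rasch model (the polytomous case replaces the Bernoulli variance with the model variance of the observed rating):

```latex
% Dichotomous Rasch model and standardized residual
P_{ni} = \frac{e^{\theta_n - \delta_i}}{1 + e^{\theta_n - \delta_i}},
\qquad
z_{ni} = \frac{x_{ni} - P_{ni}}{\sqrt{P_{ni}\,(1 - P_{ni})}}
```

Here x_ni is person n's observed response to item i, theta_n the person measure, and delta_i the item difficulty. Under local independence, the z_ni behave approximately like independent unit-normal draws, which is what the residual PCA below exploits.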

Residual-based Principal Components Analysis

This is not a typical factor analysis. The intention of PCAR is to explain variance: specifically, it looks for the factor in the residuals that explains the most variance.
- If that factor is at the "noise" level, there is no shared second dimension.
- If it is above the "noise" level, it is the "second" dimension in the data.
- A third dimension is investigated in the same way, and so on.
A minimal sketch of the computation follows.
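
The sketch below illustrates PCAR with NumPy only. The data, person measures, and item difficulties are hypothetical placeholders; a real analysis would take these from Rasch software such as Winsteps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dichotomous data: 200 persons x 25 items.
theta = rng.normal(0.0, 1.0, size=(200, 1))   # person measures (logits)
delta = np.linspace(-2.0, 2.0, 25)            # item difficulties (logits)
p = 1.0 / (1.0 + np.exp(-(theta - delta)))    # Rasch expected values
x = (rng.random(p.shape) < p).astype(float)   # simulated responses

# Standardized residuals: (observed - expected) / model SD.
z = (x - p) / np.sqrt(p * (1.0 - p))

# PCA of the item-by-item correlation matrix of the residuals.
r = np.corrcoef(z, rowvar=False)                    # 25 x 25
eigvals, eigvecs = np.linalg.eigh(r)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # descending order

print("Eigenvalue of first contrast:", round(eigvals[0], 2))
# Because these data were simulated to fit the Rasch model, the first
# contrast should sit near the noise level (see the rules of thumb below).
```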

Example: Table 23

Table of STANDARDIZED RESIDUAL variance (in Eigenvalue units)
                                                Empirical
Total variance in observations      =  127.9    100.0%
Variance explained by measures      =  102.9     80.5%
Unexplained variance (total)        =   25.0     19.5%   (100%)
Unexpl var explained by 1st factor  =    4.6      3.6%   (18.5%)

The Rasch dimension explains 80.5% of the variance in the data. Is this good? The largest secondary dimension, "the first factor in the residuals," explains 3.6% of the variance. What do you think?
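
The percentages follow directly from the eigenvalue units, as this worked check shows:

```latex
% Worked check of the Table 23 percentages (eigenvalue units)
\frac{102.9}{127.9} \approx 80.5\%
\qquad
\frac{25.0}{127.9} \approx 19.5\%
\qquad
\frac{4.6}{127.9} \approx 3.6\%
\qquad
\frac{4.6}{25.0} \approx 18.4\%
```

The table shows 18.5% rather than 18.4% because it is computed from unrounded eigenvalues. Note also that the total unexplained variance, 25.0, equals the number of items, as expected: each item's standardized residuals have unit variance, so each item contributes one eigenvalue unit.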

Table of STANDARDIZED RESIDUAL variance

Empirical: variance components for the observed data.
Model: variance components expected if the data fit the Rasch model exactly.
Total variance in observations: total variance in the observations around their Rasch expected values, in standardized residual units.
Variance explained by measures: variance explained by the item difficulties, person abilities, and rating scale structures.
Unexplained variance (total): variance not explained by the Rasch measures.
Unexplained variance (explained by 1st, 2nd, ... factor): size of the first, second, ... component in the principal component decomposition of the residuals.

Unexplained variance explained by 1st factor

The eigenvalue of the biggest residual dimension is 4.6, indicating that it has the strength of almost 5 items. In other words, the contrast between the strongly positively loading items and the strongly negatively loading items on the first factor in the residuals has the strength of about 5 items. Since the sign of a loading is arbitrary, it is necessary to look at the items at both the top and the bottom of the factor plot. Are those items substantively different, to the point that they merit the construction of two separate tests? A sketch of how to list the two poles follows.
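
Continuing the PCAR sketch above (it reuses z, eigvals, and eigvecs; the item labels are hypothetical), this lists the items at the two poles of the first contrast:

```python
# Loadings of each item on the first residual contrast.
labels = [f"item{i+1:02d}" for i in range(z.shape[1])]
loadings = eigvecs[:, 0] * np.sqrt(eigvals[0])

order = np.argsort(loadings)
print("Most negative loadings:", [labels[i] for i in order[:5]])
print("Most positive loadings:", [labels[i] for i in order[-5:]])
# Read the item content at the two poles: do they tell different stories?
```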

How Big is Big? Rules of Thumb

- A "secondary dimension" must have the strength of at least 3 items. If the first factor has an eigenvalue less than 3, the test is probably unidimensional. (Individual items may still misfit.)
- Simulation studies indicate that an eigenvalue less than 1.4 is at the random level; larger values indicate that some structure is present (R. Smith).
- There are no established criteria for when a deviation becomes a dimension. PCA is indicative, not definitive.

Consider Liking for Science Output…

Do the items at the top differ substantively from those at the bottom?

If still in doubt…

- Split your items into two subtests, based on positive and negative loadings on the first residual factor.
- Measure everyone on the two subtests and cross-plot the measures. What is their correlation? Do you see two versions of the same story about the persons?
- If only a few people are noticeably off-diagonal, you have a substantively unidimensional test. A sketch of this check follows.
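
Continuing the sketch (it reuses x and loadings from the earlier blocks), this splits the items by loading sign and compares person measures from the two subtests. For brevity it uses a crude logit-of-raw-score measure; a real check would re-estimate Rasch person measures for each subtest and cross-plot them.

```python
pos = loadings > 0   # items loading positively on the first contrast
neg = ~pos           # items loading negatively

def crude_measure(resp):
    """Logit of the proportion of success, clipped away from 0 and 1."""
    prop = np.clip(resp.mean(axis=1), 0.02, 0.98)
    return np.log(prop / (1.0 - prop))

m_pos = crude_measure(x[:, pos])
m_neg = crude_measure(x[:, neg])

r_sub = np.corrcoef(m_pos, m_neg)[0, 1]
print(f"Cross-subtest correlation: {r_sub:.2f}")
# A high correlation, with few persons far off the diagonal in a
# cross-plot, supports treating the test as substantively unidimensional.
```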