Examining Data

Constructing a variable
1. Assemble a set of items that might work together to define a construct/variable.
2. Hypothesize the hierarchy of these items along that construct.
3. Choose a response format.
4. Investigate how well the hierarchy holds for members of your response frame.
5. Ensure that your scale is unidimensional.

Unidimensionality
Always remember: unidimensionality is never perfect; it is always approximate. The questions to ask are:
- "Is the dimensionality in the data big enough to merit dividing the items into separate tests, or constructing new tests, one for each dimension?"
- "Is the lack of unidimensionality in my data sufficiently large to threaten the validity of my results?"
It may be that only two or three off-dimension items have been included in your instrument, in which case they should simply be dropped.

Do my items fall along a unidimensional scale?
We can investigate this through:
- Person and item fit statistics
- Principal components analysis of residuals (PCAR)

A Rasch Assumption
The Rasch model is based on the specification of "local independence": once the contribution of the measures to the data has been removed, all that remains should be random, normally distributed noise. Accordingly, when a residual is divided by its model standard deviation, it should have the characteristics of a draw from a unit normal distribution.
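
In symbols, a minimal statement of this assumption (the subscript convention here follows common Rasch usage, where n indexes persons and i items; X is the observation, E its model expectation, and W its model variance):

```latex
Z_{ni} \;=\; \frac{X_{ni} - E_{ni}}{\sqrt{W_{ni}}}, \qquad
Z_{ni} \sim N(0,\,1) \ \text{approximately, under the model}
```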

Residual-based Principal Components Analysis
This is not a typical factor analysis. The intention of PCAR is to explain variance: specifically, it looks for the factor in the residuals that explains the most variance.
- If that factor is at the "noise" level, there is no shared second dimension.
- If it is above the noise level, it is the "second" dimension in the data.
- A third dimension is investigated in the same way, and so on.
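
A minimal sketch of the computation, assuming Z is a complete (n_persons × n_items) matrix of standardized residuals; the function name and setup are illustrative, not the API of any particular Rasch package:

```python
import numpy as np

def residual_pca_eigenvalues(Z):
    """Eigenvalues, largest first, of the inter-item correlation
    matrix of standardized Rasch residuals Z (persons x items)."""
    R = np.corrcoef(Z, rowvar=False)    # item-by-item residual correlations
    return np.linalg.eigvalsh(R)[::-1]  # eigvalsh returns ascending order

# The first value is the "unexplained variance explained by the
# 1st factor" in eigenvalue units: one unit = one item's worth of
# residual variance.
```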

Example: Table 23

Table of STANDARDIZED RESIDUAL variance (in Eigenvalue units)

                                          Empirical
  Total variance in observations      =     100.0%
  Variance explained by measures      =      80.5%
  Unexplained variance (total)        =      19.5%   (100%)
  Unexpl var explained by 1st factor  =       3.6%   (18.5%)

The Rasch dimension explains 80.5% of the variance in the data. Is this good? The largest secondary dimension, "the first factor in the residuals," explains 3.6% of the variance. What do you think?
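
The parenthesized column re-expresses each unexplained component as a share of the unexplained variance alone, rather than of the total:

```latex
\frac{3.6\%}{19.5\%} \approx 18.5\%
```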

Table of STANDARDIZED RESIDUAL variance
- Empirical: variance components for the observed data.
- Model: variance components expected if the data fit the Rasch model exactly.
- Total variance in observations: total variance in the observations around their Rasch expected values, in standardized residual units.
- Variance explained by measures: variance explained by the item difficulties, person abilities, and rating-scale structures.
- Unexplained variance (total): variance not explained by the Rasch measures.
- Unexplained variance (explained by 1st, 2nd, ... factor): size of the first, second, ... component in the principal component decomposition of the residuals.
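
One property worth making explicit, since it underlies the "strength in items" language on the next slide (a general fact about PCA of a correlation matrix, not something specific to any Rasch package): with L items, the residual eigenvalues sum to L, because the eigenvalues of an L × L correlation matrix sum to its trace:

```latex
\sum_{k=1}^{L} \lambda_k = \operatorname{tr}(R) = L
```

So each eigenvalue unit corresponds to one item's worth of residual variance.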

Unexplained variance explained by 1st factor
The eigenvalue of the biggest residual dimension is 4.6, indicating that it has the strength of almost 5 items. In other words, the contrast between the strongly positively loading items and the strongly negatively loading items on the first residual factor has the strength of about 5 items. Since positive versus negative loading is arbitrary, it is necessary to look at the items at the top and the bottom of the factor plot.
- Are those items substantively different?
- To the point that they merit the construction of two separate tests?

How Big is Big? Rules of Thumb
- A "secondary dimension" must have the strength of at least 3 items. If the first factor has an eigenvalue less than 3, the test is probably unidimensional (individual items may still misfit).
- Simulation studies indicate that an eigenvalue below about 1.4 is at the random level; larger values indicate some structure is present (R. Smith).
- There are no established criteria for when a deviation becomes a dimension: PCA is indicative, but not definitive.
These thresholds can be collected into a small triage helper, sketched below.
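
A sketch of the slide's rules of thumb only; the cutoffs are the ones quoted above, not defaults from any software:

```python
def interpret_first_eigenvalue(ev):
    """Read the first residual eigenvalue against the rules of thumb."""
    if ev < 1.4:
        return "at the random/noise level: no evidence of structure"
    if ev < 3.0:
        return ("some structure, but weaker than 3 items: "
                "probably unidimensional; check individual item fit")
    return "candidate secondary dimension: inspect the contrasting items"
```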

Consider the Liking for Science output…
Do the items at the top differ substantively from those at the bottom?

If still in doubt…
Split your items into two subtests, based on their positive and negative loadings on the first residual factor. Measure everyone on both subtests and cross-plot the measures.
- What is their correlation?
- Do you see two versions of the same story about the persons?
- If only a few people are noticeably off-diagonal, then you have a substantively unidimensional test.
A sketch of this check follows.
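
This sketch assumes you already have the first-factor loadings and some way to estimate person measures on a subset of items; estimate_measures is a hypothetical callable (e.g., a wrapper around your Rasch software), not a real package API:

```python
import numpy as np
import matplotlib.pyplot as plt

def crossplot_subtests(responses, loadings, estimate_measures):
    """Split items on the sign of their first-residual-factor loading,
    measure every person on each subtest, and cross-plot the measures."""
    pos = responses[:, loadings > 0]   # positively loading subtest
    neg = responses[:, loadings < 0]   # negatively loading subtest
    m_pos = estimate_measures(pos)     # one measure per person
    m_neg = estimate_measures(neg)

    r = np.corrcoef(m_pos, m_neg)[0, 1]
    plt.scatter(m_pos, m_neg)
    plt.xlabel("measure on positive-loading subtest")
    plt.ylabel("measure on negative-loading subtest")
    plt.title(f"Subtest cross-plot, r = {r:.2f}")
    plt.show()

# Persons far from the diagonal are told two different stories by the
# two subtests; if only a few are, the test is substantively unidimensional.
```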