A Top Ten List of Measurement-related Errors
Alan D. Mead, Illinois Institute of Technology


1: Ignoring error
You...
- Tested two candidates for a job; one scored 77 and one scored 74. Which would you pick?
- Found validities for tests A and B of r_AC = .05 and r_BC = .25 with N = 30. Which would you use?
- Recruited people for a developmental intervention if they were in the bottom 10% of a pre-test. Will your intervention successfully raise a post-test score?
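The third question above is about regression to the mean. A minimal simulation sketch, assuming classical test theory (observed = true + error), a hypothetical reliability of .70, and no intervention at all; the function name and all parameter values are illustrative, not from the slides:

```python
import random
import statistics

def simulate_bottom_decile(n=10_000, reliability=0.7, seed=1):
    """Simulate pre/post scores with NO intervention and return the mean
    pre- and post-test scores of people in the bottom 10% on the pre-test."""
    rng = random.Random(seed)
    # Classical test theory: observed = true + error, where
    # reliability = var(true) / var(observed). Observed variance is fixed at 1.
    true_sd = reliability ** 0.5
    error_sd = (1 - reliability) ** 0.5
    people = []
    for _ in range(n):
        t = rng.gauss(0, true_sd)
        pre = t + rng.gauss(0, error_sd)
        post = t + rng.gauss(0, error_sd)  # same true score, fresh error
        people.append((pre, post))
    # Select the bottom 10% on the pre-test only.
    people.sort(key=lambda p: p[0])
    selected = people[: n // 10]
    pre_mean = statistics.fmean(p[0] for p in selected)
    post_mean = statistics.fmean(p[1] for p in selected)
    return pre_mean, post_mean

pre_mean, post_mean = simulate_bottom_decile()
print(f"bottom 10% pre-test mean:    {pre_mean:.2f}")
print(f"same people, post-test mean: {post_mean:.2f}")
```

Under these assumptions the selected group's post-test mean moves back toward the overall mean even though nothing was done to them, so a raw pre/post gain in such a group is not evidence that an intervention worked.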

2: Being too precise
- Maximizing internal consistency maximizes reliability if (and only if) errors are uncorrelated
- In many situations, you would be better off with more, shorter (less precise) measures
- Validity is probably more important than reliability, but 98% of psychometric theory is about maximizing reliability

3: Creating narrow measures
- Our tools (e.g., internal consistency reliability, factor analysis) encourage us to create unidimensional scales
- Excessive internal consistency can rob your measure of content validity
- Many criteria are multidimensional

4: Assuming R^2 goes to 1.0
- Our best predictors have validities near .50; that accounts for only 25% of the variability of the criterion
- Should we be sad?
- What's the maximum of R^2? What's the most variance that we can account for in the criterion?
  - 1.0 (100%) is only the theoretical maximum
  - How predictable is behavior? I'd guess “not very”

5: Capitalizing on chance
- Taking a non-cross-validated statistic as Gospel
- Examples include...
  - R^2
  - Reliability of a test after selecting items
  - Validity of an empirically-keyed test
- Worse problems when...
  - You select a few elements from a large set
  - Statistics are imprecise (e.g., small N)
  - Elements vary greatly in “base rate” of “goodness”
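The "select a few elements from a large set" case can be sketched with a small simulation. This is an illustrative setup only: 50 pure-noise predictors, N = 30, and 200 replications are assumed values, and the helper names are hypothetical. The best-looking predictor is picked in one sample and then re-checked in a fresh sample.

```python
import random

def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def one_replication(rng, n_people=30, n_predictors=50):
    """Pick the best of many pure-noise predictors, then cross-validate it."""
    def sample():
        y = [rng.gauss(0, 1) for _ in range(n_people)]
        xs = [[rng.gauss(0, 1) for _ in range(n_people)]
              for _ in range(n_predictors)]
        return xs, y

    xs, y = sample()
    rs = [pearson_r(x, y) for x in xs]
    best = max(range(n_predictors), key=lambda j: abs(rs[j]))
    # Fresh sample: every predictor is still unrelated to the criterion.
    xs_new, y_new = sample()
    r_new = pearson_r(xs_new[best], y_new)
    sign = 1 if rs[best] >= 0 else -1
    # Return the selected |r| and the cross-validated r in the same direction.
    return abs(rs[best]), sign * r_new

rng = random.Random(2024)
results = [one_replication(rng) for _ in range(200)]
mean_selected = sum(r[0] for r in results) / len(results)
mean_cross = sum(r[1] for r in results) / len(results)
print(f"mean validity of the selected predictor: {mean_selected:.2f}")
print(f"mean cross-validated validity:           {mean_cross:.2f}")
```

Because every predictor here is noise, the selected validity is pure capitalization on chance; the cross-validated value averages out near zero.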

6: Misusing NHST
- Using NHST when it's not needed
  - World is round, p < .05
  - Simulation studies
- Having too little power
  - What is the power to detect a true correlation of .35 with N = 100 (and otherwise reasonably favorable conditions)? [See Schmidt, Hunter & Urry, 1976]
- “Proving” H0
  - DIF, comparability

7: Relying on back-translation
- Carefully scrutinizing a back-translation to check the translation:
  - The spirit is willing, but the flesh is weak
  - The vodka is good, but the meat is rotten
- Can you just “carefully scrutinize” a test, rather than pilot testing?
  - No, people are not very good at this
- Also, some concepts back-translate well, but are meaningless (e.g., ice hockey)

8: Ignoring model-data fit
- There is a “scientific beauty” to statistical models that is hard to explain
- But things turn ugly if the model fails to closely describe real-world data
  - Assumptions of the model (and robustness to violations)
  - Identifiability of the model (can we fit our model?)
  - Global fit (e.g., AIC, GFI, Chi-Square)
  - Local fit (individual elements of the model)

9: Drinking the empiricism Kool-Aid
- Theory is more important than data
  - Data are often flawed
  - Data are always sampled with error (often substantial)
- Theory plus appropriate data (and analysis) is the strongest
  - But it's a lot less common than you might suppose
  - Most of our individual studies are either (1) very narrow or (2) flawed

10: Validating without power
- When feasible, test validation is expensive and difficult
- How big a sample do you need to validate?
  - To achieve 90% power when true validity is .35 and criterion reliability is .60, you need at least N = 276
  - A sample of N = 105 would only provide 50% power!
  - Multiply by 2-10 if there is moderate to severe range restriction or a much less reliable criterion
- And if you do not reject H0, then you know essentially nothing!

Bonus: Assuming linearity
- Relationships are often well approximated by linear models
- We also know that cognitive ability tests generally have linear relationships with the criterion
- However, it does not follow that all predictors will always have linear relations with criteria
- Consider 16PF Factor G, Rule-Consciousness
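As a toy illustration of why linearity should be checked rather than assumed, the sketch below uses made-up data (not 16PF data) in which the criterion depends perfectly on the predictor through an inverted U, yet the linear correlation is zero. The data and helper name are hypothetical.

```python
def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical predictor scores centered on zero.
x = list(range(-5, 6))
# Criterion is highest at moderate predictor levels (inverted U).
y = [-(v ** 2) for v in x]

r_linear = pearson_r(x, y)
# Correlating the criterion with the squared predictor recovers the relation.
r_quadratic = pearson_r([v ** 2 for v in x], y)
print(f"linear r:    {r_linear:.2f}")
print(f"quadratic r: {r_quadratic:.2f}")
```

A validity coefficient computed on the raw predictor would suggest no relationship here, which is why plotting the data or testing a curvilinear term can be worthwhile for predictors where extremes in either direction may be undesirable.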


Errors cause problems
- Measurement error (i.e., lack of perfect reliability) attenuates relations
- Sampling error makes interpretations hard
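Attenuation can be made concrete with the classical test theory formula: the expected observed correlation is the true correlation times the square root of the product of the two reliabilities. A minimal sketch, using the true validity of .35 and criterion reliability of .60 mentioned in the deck and assuming a perfectly reliable predictor; the function name is illustrative:

```python
import math

def attenuated_r(true_r, rel_x=1.0, rel_y=1.0):
    """Expected observed correlation given the reliabilities of both measures
    (classical attenuation formula: r_obs = r_true * sqrt(rel_x * rel_y))."""
    return true_r * math.sqrt(rel_x * rel_y)

# True validity .35, criterion reliability .60, predictor assumed error-free.
observed = attenuated_r(0.35, rel_y=0.60)
print(round(observed, 3))  # about 0.271
```

So a true validity of .35 would be expected to show up as roughly .27 in observed data before any sampling error or range restriction is considered.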

Ignoring error (cont.)
- If you tested two candidates for a job, and one scored 77 and one scored 74, which would you pick?

- If you found validities for tests A and B of r_AC = .05 and r_BC = .25 with N = 30, which would you use?
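One way to see how little N = 30 tells us is to put confidence intervals around the two validities. The sketch below uses the Fisher z approximation with a 95% two-sided interval; that choice of method is an assumption for illustration, not something stated in the slides, and the function name is hypothetical.

```python
import math

def r_confidence_interval(r, n, z_crit=1.96):
    """Approximate confidence interval for a correlation via Fisher's z.
    z = atanh(r) is roughly normal with standard error 1 / sqrt(n - 3)."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

ci_a = r_confidence_interval(0.05, 30)  # test A
ci_b = r_confidence_interval(0.25, 30)  # test B
print(f"Test A: r = .05, 95% CI = ({ci_a[0]:.2f}, {ci_a[1]:.2f})")
print(f"Test B: r = .25, 95% CI = ({ci_b[0]:.2f}, {ci_b[1]:.2f})")
```

With this approximation both intervals include zero and overlap almost entirely, so the sample gives little basis for preferring test B over test A.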