Psychometrics 101: Foundational Knowledge for Testing Professionals Steve Saladin, Ph.D. University of Idaho.



What we are going to cover. Understanding test scores: Damn the statistics & full speed ahead! Accuracy of the test: Error, error everywhere, so what's my score? Validity: To use or not to use, that is the question!

Criterion-referenced vs norm-referenced. Is performance rated against pre-established cut points, or is it based on comparisons with others? Classroom grading is generally criterion-based: 90% right = A, 80% = B, 70% = C, etc. It is typically reported as a percentage correct or P/F. Grading on the curve means the grade is based on comparison with the rest of the class (norm-referenced): 80% might be a B, an A, a C, or something else.

Criterion-referenced vs norm-referenced. Standardized tests are typically norm-referenced (SAT, ACT, GRE, IQ tests) and are typically reported as a percentile or standard score. Certification exams are often criterion-referenced (proctor certification, licensing exams) and are typically reported as percentage correct or P/F. Sometimes you get a mix: the GED uses norms to establish cut scores. It is important to note the difference between a percentile (the share of the comparison group scoring at or below a given score) and percentage correct (the share of items answered correctly).

Damn the Statistics & full speed ahead! Testing is all about quantifying something about people (skills, knowledge, behavior, etc.). Stats are just a way to describe the numbers: they make them more understandable and reveal relationships. To understand norm-referenced test scores, you need to know two general things: What is the typical score? To what degree did others score differently?

What's typical? Mean = arithmetic average = 30. Median = # in the middle = 30. Mode = most frequently occurring # = 40. How different are the scores? Range = highest - lowest = 40. Variance = average of squared differences from the mean. Standard Deviation = square root of the Variance.
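
The definitions above can be sketched in a few lines of Python. The score list is hypothetical: it was chosen only so that the mean, median, mode, and range match the values on the slide, and it is not the data set the presenter used. The variance follows the slide's definition (a plain average of squared differences), which is the population form.

```python
import statistics

# Hypothetical scores (an assumption for illustration), picked so that
# mean = 30, median = 30, mode = 40, and range = 40 as on the slide.
scores = [10, 15, 25, 30, 40, 40, 50]

mean = statistics.mean(scores)          # arithmetic average
median = statistics.median(scores)      # middle value of the sorted scores
mode = statistics.mode(scores)          # most frequently occurring value
score_range = max(scores) - min(scores) # highest minus lowest

# Population variance: average of squared differences from the mean.
variance = statistics.pvariance(scores)
# Standard deviation: square root of the variance.
std_dev = statistics.pstdev(scores)

print("mean:", mean, "median:", median, "mode:", mode, "range:", score_range)
print("variance:", round(variance, 2), "standard deviation:", round(std_dev, 2))
```

If the scores are treated as a sample from a larger population, `statistics.variance` and `statistics.stdev` (which divide by n - 1) would be the usual choice instead.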

Standard Normal Distribution (Normal Curve). Assumes the trait is normally distributed in the population. Described by its mean and standard deviation.

Examples. SAT/GRE scores are based on a scale with Mean = 500, SD = 100. The Wechsler IQ test: Mean = 100, SD = 15. ACT scores range from 1 to 36: Mean = 18, SD = 6.

The Normal Curve. [Figure: normal curve with percentile ranks (<1%, 2.5%, 16%, 50%, 84%, 97.5%, 99.5%) lined up against the GRE, SAT, IQ, and ACT score scales.]
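
The link between a standard score and a percentile can be sketched as follows, assuming scores are normally distributed and using the means and standard deviations quoted in the examples above. The function names and the `SCALES` table are illustrative choices, not part of the presentation.

```python
from statistics import NormalDist

# (mean, standard deviation) for each scale, as quoted in the examples above.
SCALES = {
    "SAT/GRE": (500, 100),
    "IQ": (100, 15),
    "ACT": (18, 6),
}

def z_score(score, mean, sd):
    """Number of standard deviations a score lies above (+) or below (-) the mean."""
    return (score - mean) / sd

def percentile(score, mean, sd):
    """Percentage of a normal population scoring at or below the given score."""
    return 100 * NormalDist(mean, sd).cdf(score)

def convert(score, from_scale, to_scale):
    """Re-express a score on another scale by holding its z-score constant."""
    z = z_score(score, *SCALES[from_scale])
    to_mean, to_sd = SCALES[to_scale]
    return to_mean + z * to_sd

# One standard deviation above the mean lands near the 84th percentile on every scale.
for name, (mean, sd) in SCALES.items():
    one_sd_above = mean + sd
    print(name, one_sd_above, round(percentile(one_sd_above, mean, sd)))
```

Under these assumptions, `convert(600, "SAT/GRE", "IQ")` lines a 600 up with an IQ of 115, since both sit one standard deviation above their means.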

How are these things related? GRE scores and grad school grades. CLEP scores and final exam scores. Compass/Accuplacer scores and success in entry classes. Motivation and cheating. Correlation tells us whether things vary or change in a related way: higher GRE scores tend to go with higher grades; lower motivation tends to go with higher levels of cheating.

Some Facts About Correlation. Ranges from +1.0 to -1.0. The sign tells you the direction of the correlation: + means that as A gets bigger, so does B; - means that as A gets bigger, B gets smaller.
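
A minimal sketch of the Pearson correlation coefficient shows where the sign and the -1.0 to +1.0 range come from. All four data lists are made up for illustration; they are not results from any study cited in the presentation.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations: positive when x and y move together.
    co_dev = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return co_dev / math.sqrt(ss_x * ss_y)

# Hypothetical data, for illustration only.
gre = [450, 500, 550, 600, 650]
grades = [2.8, 3.0, 3.3, 3.5, 3.9]
motivation = [1, 2, 3, 4, 5]
cheating = [9, 7, 6, 4, 2]

print("GRE vs grades:", round(pearson_r(gre, grades), 2))               # positive
print("motivation vs cheating:", round(pearson_r(motivation, cheating), 2))  # negative
```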

How To Lie With Statistics! Test Taking Linked to Longevity! A recent study found that people who had taken more tests during early adulthood tended to live longer. The number of tests taken between the ages of 16 and 30 correlated strongly with age at death. The more tests you take, the longer you will live!

Some Facts About Correlation. It is not causation, but it can be used to predict. Small samples may miss a relationship. Heterogeneous samples may miss a relationship.

Error, Error Everywhere. No test is perfect; no measurement is perfect. You can get more precise, but never exact. Score = Truth + Error.

Error, Error Everywhere. Error can be lots of things, including the environment, the test-taker, procedural variations, and the test itself. Since error makes scores inconsistent or unreliable, a measure of the reliability of scores is important.

Reliability. Test-Retest: test a group on two different occasions and correlate the results. Are results stable over time? Internal Consistency: correlate the score on each item with the total. Are the items all measuring the same thing? Alternate Forms: develop two versions of the same test and correlate scores on each. Are your versions comparable? All of these are correlations, so they are subject to the same problems.

So what's good? The GRE has reported reliability of 0.89 (Quantitative) and 0.92 (Verbal) (GRE Guide to Use of Scores). The ACT Technical Manual reports a Composite score reliability of .97. The SAT reports reliabilities of … (Test Characteristics of the SAT). COMPASS alternate forms reliability is reported to be …

Reliability & Error

Theoretical distribution of scores. [Figure: normal curve centered on the obtained score, marked off in SEM units, with areas of 2%, 14%, 34%, 34%, 14%, and 2%.] 1 SEM below to 1 SEM above = 68% confidence. 2 SEM below to 2 SEM above = 95% confidence.

SEM for some tests. GRE: Verbal .34, Quantitative .51, so the 68% confidence interval for a score of 500 is … for Verbal, … for Quantitative (scores are only reported in increments of 10) (GRE Guide to Use of Scores). ACT: Composite SEM .91, so the 68% confidence interval for a score of 20 is … (ACT Technical Manual). WAIS-IV: FSIQ SEM is 2.16, so the 68% confidence interval for a score of 100 is …
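
The band arithmetic is simple enough to sketch: the interval is the obtained score plus or minus one SEM (about 68%) or two SEMs (about 95%). The SEM values below are the ACT and WAIS-IV figures quoted above; the function names are illustrative. The second function uses the standard classical test theory relationship SEM = SD × √(1 − reliability), which is an addition here rather than something stated on the slides.

```python
import math

def confidence_interval(score, sem, n_sem=1):
    """Band of n_sem SEMs around an obtained score (1 is about 68%, 2 about 95%)."""
    return (score - n_sem * sem, score + n_sem * sem)

def sem_from_reliability(sd, reliability):
    """Classical test theory estimate (assumed here): SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# SEM values as quoted above for the ACT Composite and the WAIS-IV FSIQ.
act_68 = confidence_interval(20, 0.91)
wais_68 = confidence_interval(100, 2.16)
wais_95 = confidence_interval(100, 2.16, n_sem=2)

print("ACT 20, 68% band:", tuple(round(v, 2) for v in act_68))
print("WAIS-IV 100, 68% band:", tuple(round(v, 2) for v in wais_68))
print("WAIS-IV 100, 95% band:", tuple(round(v, 2) for v in wais_95))
```

Published score bands are usually rounded to the scale's reporting units, so they may look slightly different from these raw values.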

Does Reliability = Validity? NO! Getting a consistent result means reliability. Having that result be meaningful is validity. Validity is based on the inferences you make from results. A test has to be reliable to be valid, but it does not have to be valid to be reliable.

Validity. Any evidence that a test measures what it says it is measuring. Any evidence that inferences made from the test are useful and meaningful. 3 types of evidence: Content, Criterion-Related, Construct.

Content Validity. Think of a test as a sample of possible problems/items. A 4th grade spelling test should be a representative sample of 4th grade spelling words. GRE Quantitative should be a representative sample of the math problems a grad school applicant might be expected to solve. Content validity should be part of the design: identify the # of algebra, trig, calculus, etc. items that should be on the test (table of specifications). It is frequently evaluated by item analysis or expert opinion.

Criterion-Related Validity. How does the test score correlate with some external measure (criterion)? Examples: placement test score and performance in class; admission test score and GPA for the first semester. Sometimes called Predictive or Concurrent Validity. The correlation is affected by error in the test and error in the criterion. It is also affected by restricted range: only top students take the GRE, and graduate school grades are restricted.

To use or not to use…. Depends on the question…. What is the impact of the decision? What is the cost of using? Of not using? Decision Theory can be a guide to determining incremental validity: the net gain in using scores.

Decision Theory: maximize success. [Figure: GPA plotted against GRE score, divided into False negative, True positive, True negative, and False positive regions, with candidate cut scores A, B, and C.]

Decision Theory: maximize opportunity. [Figure: GPA plotted against GRE score, divided into False negative, True positive, True negative, and False positive regions, with candidate cut scores A, B, and C.]

Predictive Utility. Effectiveness = (True Positive + True Negative) / (True Pos + False Pos + True Neg + False Neg). Have to weigh effectiveness against cost.
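
A short sketch of how a cut score produces the four outcomes and the effectiveness ratio above. The applicant records and the three cut scores are hypothetical, and `classify` and `effectiveness` are illustrative names rather than anything defined in the presentation.

```python
def classify(records, cut_score):
    """Count decision outcomes when everyone at or above cut_score is admitted."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for score, succeeded in records:
        admitted = score >= cut_score
        if admitted and succeeded:
            counts["TP"] += 1   # admitted and succeeded
        elif admitted and not succeeded:
            counts["FP"] += 1   # admitted but did not succeed
        elif not admitted and not succeeded:
            counts["TN"] += 1   # rejected and would not have succeeded
        else:
            counts["FN"] += 1   # rejected but would have succeeded
    return counts

def effectiveness(counts):
    """Share of all decisions that were correct: (TP + TN) / all four outcomes."""
    return (counts["TP"] + counts["TN"]) / sum(counts.values())

# Hypothetical (GRE score, succeeded in the program) records, for illustration only.
applicants = [
    (420, False), (450, False), (480, True), (500, False), (520, False),
    (550, True), (580, True), (600, True), (640, True), (700, True),
]

for cut in (450, 550, 600):
    outcome = classify(applicants, cut)
    print(cut, outcome, effectiveness(outcome))
```

With this made-up data, the lowest cut score leaves no false negatives (no one who would have succeeded is turned away), while the higher cut scores leave no false positives, which mirrors the maximize-opportunity versus maximize-success trade-off.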

Construct Validity. Most important for psychological tests, where what you are measuring is abstract or theoretical: intelligence, personality characteristics, attitudes and beliefs. Usually involves multiple pieces of evidence.

Construct Validity. Convergent: correlates with measures of the same thing. Divergent: does not correlate with measures of something else. Scores show expected changes after treatment, education, maturation, etc. Factor analysis supports the expected factor structure.

Things to remember: the normal curve, correlation, reliability, Standard Error of Measurement, validity, decision theory.

Not all scales are created equal. Nominal Scale: sex, race. Ordinal Scale: percentile rank, letter grades. Interval Scales: IQ, temperature, SAT. Ratio Scales: speed, weight.