UNIT I INTRODUCTION TO MEASUREMENT THEORY CHAP 1: WHAT IS TEST THEORY CHAP 2: STATISTICAL CONCEPTS FOR TEST THEORY CHAP 3: INTRODUCTION TO SCALING CHAP 4: PROCESS OF TEST CONSTRUCTION CHAP 5: TEST SCORES AS COMPOSITES
UNIT II RELIABILITY CHAP 6: RELIABILITY AND THE CLASSICAL TRUE SCORE MODEL CHAP 7: PROCEDURES FOR ESTIMATING RELIABILITY CHAP 8: INTRODUCTION TO GENERALIZABILITY THEORY CHAP 9: RELIABILITY COEFFICIENTS FOR CRITERION-REFERENCED TESTS
UNIT III VALIDITY CHAP 10: INTRODUCTION TO VALIDITY CHAP 11: STATISTICAL PROCEDURES FOR PREDICTION AND CLASSIFICATION CHAP 12: BIAS IN SELECTION CHAP 13: FACTOR ANALYSIS
UNIT IV ITEM ANALYSIS IN TEST DEVELOPMENT CHAP 14: ITEM ANALYSIS CHAP 15: INTRODUCTION TO ITEM RESPONSE THEORY CHAP 16: DETECTING ITEM BIAS
UNIT V TEST SCORING AND INTERPRETATION CHAP 17: CORRECTING FOR GUESSING AND OTHER SCORING METHODS CHAP 18: SETTING STANDARDS CHAP 19: NORMS AND STANDARD SCORES CHAP 20: EQUATING SCORES FROM DIFFERENT TESTS
Introduction to Classical and Modern Test Theory Chapter 1
*What is Test Theory? The study of measurement problems, influence of these measurement problems on tests or inventories, and how to create methods to minimize these problems
Historic Origins Pioneer countries in test theory are: Germany, England, France, and the United States
Germany Wilhelm Wundt, Ernst Weber, and Gustav Fechner used procedures for collecting observations in a standard way for all subjects, such as reading the instructions at the top of the test page (see next slide).
Germany Multiple Choice Identify the choice that best completes the statement or answers the question.
1. The type of sensation you experience depends on which area of the brain is activated. This is known as a. sensory localization. b. transduction. c. sensory adaptation. d. cerebralization.
2. A hypnic jerk usually occurs during a. light sleep. b. deep sleep. c. episodes of hypersomnia. d. episodes of sleep apnea.
See p. 14, Exercise 4-b
Germany p. 14, Exercise 4-b 4. Consider the following testing practices and indicate which nineteenth-century psychological researcher should probably be credited with their origin. b. A teacher about to give a test reads aloud from the test manual: “Please read the instructions at the top of the page silently while I read them aloud...” (see previous slide)
England Karl Pearson: Pearson Correlation. Charles Spearman: Spearman Correlation; used Factor Analysis in his “Theory of Intelligence.” Francis Galton: Categorizing; half cousin to Charles Darwin.
France Alfred Binet & Theodore Simon (1905) Developed the first IQ test. IQ = (MA / CA) x 100, where MA = Mental Age and CA = Chronological Age. *The Difference between Ratio IQ and Deviation IQ or Normative IQ
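A minimal Python sketch of the two scoring ideas named on this slide. The ratio IQ function follows the slide's formula. The deviation IQ function assumes the common convention of a mean of 100 and a standard deviation of 15, which is not stated in the slides. The function names and the example values are hypothetical.

```python
def ratio_iq(mental_age, chronological_age):
    """Ratio IQ from the slide: IQ = MA / CA x 100."""
    return mental_age / chronological_age * 100


def deviation_iq(z_score, mean=100, sd=15):
    """Deviation (normative) IQ: standing relative to same-age peers.

    Assumption (not from the slides): scores are scaled to mean 100, SD 15.
    """
    return mean + sd * z_score


# Hypothetical child: mental age 10, chronological age 8.
print(ratio_iq(10, 8))       # 125.0
# Hypothetical examinee scoring 1 SD above the age-group mean.
print(deviation_iq(1.0))     # 115.0
```

The ratio IQ compares mental age with chronological age, while the deviation IQ only asks how far a score sits from the mean of the reference group.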
United States *James McKeen Cattell: “Mental Testing.” Thorndike: An Introduction to the Theory of Mental and Social Measurement; Trial and Error, A Theory of Learning
Key Terms Test Optimal Performance Typical Performance Observable Performance Constructs Measurement
Key Terms Test: A test is a procedure for obtaining a sample of an individual’s performance. Optimal Performance: Refers to performance on aptitude tests (GRE, SAT, ACT) or achievement tests (WRAT, WIAT).
Key Terms Typical Performance: Refers to performance on questionnaires and inventories on which people report their feelings, attitudes, interests, or reactions to a situation. Observable Performance: Refers to performance observed directly as behavior (e.g., watching children interact with each other; naturalistic observation).
Key Terms Measurement: Assigning a quantitative value to an observable behavior. See Exercises 1 & 2 on p. 14.
Confounding Variables Confounding variables are variables that the researcher failed to control or eliminate, damaging the internal validity of an experiment. Also known as a third variable, a confounding variable can distort the apparent relation between the independent variable and the dependent variable. Ex. Next slide
Heavy drinkers die at a younger age
Ex. A research group might design a study to determine if heavy drinkers die at a younger age. Heavy drinkers may be more likely to smoke, or eat junk food, all of which could be factors in reducing longevity. A third variable may have adversely influenced the results.
Intervening Variables A variable that explains a relation or provides a causal link between other variables. Also called “Mediating Variable” or “intermediary variable.” Ex. Next slide
Intervening Variables Ex: The statistical association between income and longevity needs to be explained because just having money does not make one live longer. Other variables intervene between money and long life. People with high incomes tend to have better medical care than those with low incomes. Medical care is an intervening variable. It mediates the relation between income and longevity.
Extraneous Variables These variables are undesirable because they add error to an experiment. A major goal in research design is to decrease or control the influence of extraneous variables as much as possible. Ex: In a study examining the effect of post-secondary education on lifetime earnings, some extraneous variables might be gender, ethnicity, social class, genetics, intelligence, age, and so forth.
Key Terms Constructs: Constructs are hypothetical concepts or psychological attributes/traits, such as personality, anxiety, depression, etc. They are difficult to measure. Constructs are not physical attributes such as height and weight.
*Why do we have Measurement Problems in Psychology?
1. There is no single universal way of defining a psychological construct.
2. Psychological measurements are based on samples of behavior.
3. Sampling of behavior results in errors in measurement.
4. The units (scales) of measurement are not well defined.
5. The measurements must have a demonstrated relationship to other variables to have meaning.
Role of Test Theory in Research & Evaluation Selecting a Problem Operational Definitions of Variables Instruments Accuracy of the Instruments Data Collection Use of Statistics Optometrists and Ophthalmologists
Merriam-Webster Dictionary and Thesaurus Definition of Short-Sighted 1. Nearsighted (myopia) 2. Lacking foresight 3. Lacking the power of foreseeing 4. Inability to look forward My Operational Definition: 5. A person who is able to see near things more clearly than distant ones and needs to wear corrective eyeglasses prescribed (measured) by an ophthalmologist.
The American Heritage Dictionary Definition of Intelligent 1. Having or indicating a high or satisfactory degree of intelligence and mental capacity My Operational Definition of Intelligent: 2. Revealing or reflecting good judgment or sound thought; skillful. It is measured by the IQ score from the Stanford-Binet V IQ Test (in the Method section of the research paper, we write about the reliability and validity of this instrument), or by the WAIS or WISC.
Statistical Concepts for Test Theory Chapter 2
Population Sample
Population and Sample Population: Population is the set of all individuals of interest for a particular study. Measurements related to Population are PARAMETERS. Sample: Sample is a set of individuals selected from a population. Measurements related to sample are STATISTICS.
Sample The people chosen for a study are its subjects or participants, collectively called a sample. The sample must be representative of the population.
Statistics Descriptive: Describes the distribution of scores with values such as the mean, median, and mode. Inferential: Infers or draws a conclusion about a population from a sample.
Key Terms Constant: e.g., temperature held fixed in a study of learning and hunger. Variable: the IV is manipulated; the DV is measured. Discrete Numbers: 1, 2, 3, 14. Continuous Numbers: 1.3, 3.6.
CONTINUOUS VERSUS DISCRETE VARIABLES Discrete variables (categorical) Values are defined by category boundaries E.g., gender Continuous variables Values can range along a continuum E.g., height
Statistics Scales of Measurement Frequency Distributions and Graphs Measures of Central Tendency Standard Deviations and Variances Z Score 1- Pearson Correlations 2- Spearman
Scales of Measurement (NOIR)
NOMINAL SCALE
Qualities: Assignment of labels
Examples: Gender (male or female); Preference (like or dislike); Voting record (for or against)
What You Can Say: Each observation belongs in its own category
What You Can’t Say: An observation represents “more” or “less” than another observation
ORDINAL SCALE
Qualities: Assignment of values along some underlying dimension (order)
Examples: Rank in college; Order of finishing a race
What You Can Say: One observation is ranked above or below another
What You Can’t Say: The amount that one variable is more or less than another
INTERVAL SCALE
Qualities: Equal distances between points; arbitrary zero
Examples: Number of words spelled correctly; Intelligence test scores; Temperature
What You Can Say: One score differs from another on some measure that has equally appearing intervals
What You Can’t Say: The amount of difference is an exact representation of differences of the variable being studied
RATIO SCALE
Qualities: Meaningful and non-arbitrary zero (absolute zero)
Examples: Age; Weight; Time?
What You Can Say: One value is twice as much as another, or no quantity of that variable can exist
What You Can’t Say: Not much!
LEVELS OF MEASUREMENT
Level of Measurement   For Example                               Quality of Level
Ratio                  Rachael is 5’ 10” and Gregory is 5’ 5”     Absolute zero
Interval               Rachael is 5” taller than Gregory          An inch is an inch is an inch
Ordinal                Rachael is taller than Gregory             Greater than
Nominal                Rachael is tall and Gregory is short       Different from
Variables are measured at one of these four levels. Qualities of one level are characteristic of the next level up. The more precise (higher) the level of measurement, the more accurate is the measurement process.
WHAT IS ALL THE FUSS? Measurement should be as precise as possible. In psychology, most variables are probably measured at the nominal or ordinal level. But how a variable is measured can determine the level of precision.
Frequency Distributions and Graphs
histogram
*Histogram for Test Scores
Quiz 1. Frequency distributions of test scores are frequently illustrated by which kind of graph? a. a histogram b. a scatterplot c. a pie chart d. a bar graph
Quiz 14. Frequency distributions of test scores are frequently illustrated by which kind of graph? *a. a histogram b. a scatterplot c. a pie chart d. a bar graph
Polygon
PERCENTILES When the results of a test for a specific person are presented in terms of Percentiles, we have direct information about that person’s performance relative to a group.
Quartiles and Z-Score
Platykurtic, Mesokurtic, Leptokurtic
Frequency Distributions Scores: 2, 4, 3, 2, 5, 3, 6, 1, 1, 3, 5, 2, 4, 2 Σf = N = 14 P = f/N (P = proportion) % = P x 100
Frequency Distributions
X   f   fX   P=f/N       %=P x 100   Cum%
6   1   6    1/14=.07    7%
5   2
4   2
3   3
2   4
1   2
Frequency Distribution Table
X   f   fX   P=f/N       %=P x 100   Cumulative %
6   1        1/14=.07    7%
5   2   10   2/14=.14    14%         21%
4       8                            35%
How do you Calculate Cumulative Percent? Add each new individual percent to the running tally of the percentages that came before it. For example, if your dataset consisted of the four numbers 100, 200, 150, 50, then their individual values, expressed as a percent of the total (in this case 500), are 20%, 40%, 30%, and 10%. For each value, compute 1. the proportion and 2. the percentage (e.g., 100/500 = 0.2; 0.2 x 100 = 20%). The cumulative percent would be:
100: 20%
200: 20% from the step before + 40% = 60%
150: 60% from the step before + 30% = 90%
50: 90% from the step before + 10% = 100%
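The running tally described on this slide can be sketched in a few lines of Python. This is only an illustration using the slide's four numbers; the variable names are our own.

```python
from itertools import accumulate

values = [100, 200, 150, 50]           # example data from the slide
total = sum(values)                     # 500

# Each value expressed as a percent of the total.
percents = [v * 100 / total for v in values]

# Running tally: each percent is added to the sum of those before it.
cumulative = list(accumulate(percents))

for v, p, c in zip(values, percents, cumulative):
    print(f"{v}: {p:.0f}%  cumulative {c:.0f}%")
```

The last cumulative value should always reach 100%, which is a quick check that no value was left out.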
Frequency Distributions P = f/N, % = P x 100
X=2, f=4, N=14: P = 4/14 = .29, % = 29%
X=3, f=3, N=14: P = 3/14 = .21, % = 21%
Mean from a frequency table: μ = ΣfX/Σf
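As a rough check on the hand calculations, the frequency table for the 14 scores listed earlier can be built with Python's standard library. This is a sketch with variable names of our own choosing; the scores are the ones given on the Frequency Distributions slide.

```python
from collections import Counter

scores = [2, 4, 3, 2, 5, 3, 6, 1, 1, 3, 5, 2, 4, 2]   # data from the slides
freq = Counter(scores)             # f for each distinct score X
n = sum(freq.values())             # Σf = N

print("X  f  fX  P     %")
for x in sorted(freq, reverse=True):       # highest score first, as in the table
    f = freq[x]
    p = f / n                              # P = f / N
    print(f"{x}  {f}  {f * x:>2}  {p:.2f}  {p * 100:.0f}%")

sum_fx = sum(x * f for x, f in freq.items())
mean = sum_fx / n                  # μ = ΣfX / Σf
print(f"ΣfX = {sum_fx}, mean = {mean:.2f}")
```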
Measures of Central Tendency Mean (interval or ratio scale): The sum of the values divided by the number of values, often called the "average." μ=ΣX/N Add all of the values together. Divide by the total number of values to obtain the mean. Example: X = 7, 12, 24, 20, 19. Mean = ????
Mean The Mean is: μ=ΣX/N= 82/5=16.4 (7 + 12 + 24 + 20 + 19) / 5 = 16.4.
Measures of Central Tendency Median or middle value (ordinal scale): Divides the values into two equal halves, with half of the values being lower than the median and half higher than the median. Sort the values into ascending order. If you have an odd number of values, the median is the middle value. If you have an even number of values, the median is the arithmetic mean (see above) of the two middle values. Ex: The median of the same five numbers (7, 12, 24, 20, 19) is ???.
Mode The median is 19. Mode (nominal scale): The most frequently occurring value (or values). Calculate the frequencies for all of the values in the data. The mode is the value (or values) with the highest frequency. Example: For individuals having the following ages (18, 18, 19, 20, 20, 20, 21, and 23), the mode is ????
CHARACTERISTICS OF MODE Nominal Scale Discrete Variable Describing Shape
The Range The mode is 20. The Range: For discrete (whole-number) scores, the range is the highest number minus the lowest number, plus 1. 2, 4, 7, 8, and 10 -> Discrete Numbers (range = 10 - 2 + 1 = 9). 2, 4.6, 7.3, 8.4, and 10 -> Continuous Numbers. For continuous scores, the range is the difference between the upper real limit of the highest number and the lower real limit of the lowest number.
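The worked examples from the mean, median, mode, and range slides can be reproduced with Python's built-in `statistics` module. This is a small sketch; the variable names are our own, and the range line applies the "highest minus lowest plus 1" rule for whole-number scores.

```python
import statistics

scores = [7, 12, 24, 20, 19]                 # mean and median example
ages = [18, 18, 19, 20, 20, 20, 21, 23]      # mode example
discrete = [2, 4, 7, 8, 10]                  # range example

mean = statistics.mean(scores)               # 82 / 5
median = statistics.median(scores)           # middle of the sorted values
modes = statistics.multimode(ages)           # a list, in case of ties

# Range for whole-number scores: highest - lowest + 1.
inclusive_range = max(discrete) - min(discrete) + 1

print(mean, median, modes, inclusive_range)  # 16.4 19 [20] 9
```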
Variability
Variability Range, Interquartile Range, Semi-Interquartile Range, Standard Deviation, and Variance are the Measures of Variability. Variability is a measure of dispersion or spreading of scores around the mean, and has 2 purposes: 1. Describes the distribution. Next slide
Variability 2. How well an individual score or group of scores represents the entire distribution. (i.e. in Z Score) Ex. In inferential statistics we collect information from a small sample then, generalize the results obtained from the sample to the entire population. Next slide
Variability SS, Standard Deviations and Variances
Data: X = 1, 2, 4, 5
SS = Sum of Squared Deviations from the Mean
Definition: SS = Σ(X - μ)²
Computation: SS = ΣX² - (ΣX)²/N
Population: variance σ² = SS/N; standard deviation σ = √(SS/N)
Sample: variance s² = SS/(n - 1) = SS/df; standard deviation s = √(SS/df)
Variance (σ²) is the Mean of Squared Deviations = MS
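A short Python sketch that applies these formulas to the four scores on the slide (1, 2, 4, 5). It computes SS both ways to show that the definitional and computational formulas agree; the variable names are our own.

```python
import math

x = [1, 2, 4, 5]                     # X column from the slide
n = len(x)
mu = sum(x) / n

# Definitional formula: SS = Σ(X - μ)²
ss_definition = sum((xi - mu) ** 2 for xi in x)

# Computational formula: SS = ΣX² - (ΣX)² / N
ss_computation = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n

pop_variance = ss_definition / n             # σ² = SS / N
pop_sd = math.sqrt(pop_variance)             # σ
sample_variance = ss_definition / (n - 1)    # s² = SS / df, treating x as a sample
sample_sd = math.sqrt(sample_variance)       # s

print(ss_definition, ss_computation)         # 10.0 10.0
print(pop_variance, round(sample_variance, 2))
```

Dividing by n - 1 instead of N makes the sample variance slightly larger than the population variance computed from the same scores.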
MEASURES OF VARIABILITY Variability is the degree of dispersion/spreading of scores in a set of scores (data). Standard Deviation: roughly the average distance of each score from the mean (the square root of the variance). Variance: the mean of the squared deviations of the scores from the mean.
Suppose you earned a score of X = 54 on an exam. Which set of parameters would give you the highest grade? a. μ= 50 and σ= 2 σ²=4 b. μ= 50 and σ= 4 σ²=16 c. μ= 54 and σ= 2 σ²=4 d. μ= 54 and σ= 4 σ²=16
Suppose you earned a score of X = 46 on an exam. Which set of parameters would give you the highest grade? a. μ= 50 and σ= 2 σ²=4 b. μ= 50 and σ= 4 σ²=16 c. μ= 54 and σ= 2 σ²=4 d. μ= 54 and σ= 4 σ²=16
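One way to reason about these two quiz items is to express the score as a z-score, its distance from the mean in standard deviation units, under each option. The sketch below assumes the grade depends only on relative standing, so the option with the highest z-score is the most favorable. The helper names are our own.

```python
def z_score(x, mu, sigma):
    """Distance of x from the mean, in standard deviation units."""
    return (x - mu) / sigma


# (μ, σ) for each answer option on the slides.
options = {"a": (50, 2), "b": (50, 4), "c": (54, 2), "d": (54, 4)}


def best_option(x):
    """Return the option label with the highest z-score, plus all z-scores."""
    z = {label: z_score(x, mu, sigma) for label, (mu, sigma) in options.items()}
    return max(z, key=z.get), z


print(best_option(54))   # score above or at the mean
print(best_option(46))   # score below the mean
```

When the score is above the mean, a small σ helps; when it is below the mean, a large σ and a nearby mean hurt the least.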
Covariance
Covariance Correlation is based on a statistic called covariance (COVxy or Sxy): COVxy = SP/(N - 1). Correlation: r = SP/√(SSx · SSy). Covariance is a number that reflects the degree to which 2 variables vary together.
Original Data
X  Y
8  1
1  0
3  6
0  1
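The covariance and correlation formulas on this slide can be traced step by step in Python. This sketch assumes the slide's data are the four (X, Y) pairs (8, 1), (1, 0), (3, 6), (0, 1), read row by row; the variable names are our own.

```python
import math

x = [8, 1, 3, 0]
y = [1, 0, 6, 1]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# SP: sum of products of deviations (needs both variables).
sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# SS: sum of squared deviations (needs one variable at a time).
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)

cov_xy = sp / (n - 1)                  # COVxy = SP / (N - 1)
r = sp / math.sqrt(ss_x * ss_y)        # r = SP / √(SSx · SSy)

print(f"SP={sp}, SSx={ss_x}, SSy={ss_y}")
print(f"cov={cov_xy:.3f}, r={r:.3f}")
```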
Spearman Correlation Rank-order the data, then proceed with the correlation formula on the ranks.
X  Y
1  1
2  3
3  2
4  4
Ranking/Monotonic Transformation
Score   Rank position   Final Rank
3       1               1.5
3       2               1.5
5       3               3
6       4               5
6       5               5
6       6               5
12      7               7
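The final ranks in this table give tied scores the mean of the rank positions they occupy. A minimal Python sketch of that rule follows; `average_ranks` is a hypothetical helper name, not something from the slides.

```python
def average_ranks(scores):
    """Rank scores from lowest (1) to highest; ties share the mean of their positions."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + 1 + end + 1) / 2     # mean of rank positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


print(average_ranks([3, 3, 5, 6, 6, 6, 12]))   # [1.5, 1.5, 3.0, 5.0, 5.0, 5.0, 7.0]
```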
Z Scores Single score: Z = (X - μ)/σ. Sample mean (for research): Z = (M - μ)/σm, where σm = σ/√n. We use the Z score when σ is known.
Z-Scores X= σ(Z)+µ µ= X- σZ σ= (X-µ)/Z If X=60 µ=50 σ=5 Z=?
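The conversions on this slide, and the question it poses, can be checked with a short Python sketch. The function names are our own.

```python
def z_from_x(x, mu, sigma):
    """Z = (X - μ) / σ"""
    return (x - mu) / sigma


def x_from_z(z, mu, sigma):
    """X = σZ + μ"""
    return sigma * z + mu


z = z_from_x(60, 50, 5)        # the slide's example: X=60, μ=50, σ=5
print(z)                       # 2.0
print(x_from_z(z, 50, 5))      # 60.0, recovering the original score
```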
Computations/ Calculations / Collect Data and Compute test Statistics Z Score for a Sample M=115, n=25
Z Score for Research Standard Error (σm )
*Stanines Stanines are used to compare an individual student’s achievement with the results obtained by a national reference sample chosen to represent a certain year level (e.g., 2nd level, 3rd level). A stanine is a nine-point scale used for normalized test scores, with 1-3 below average, 4-6 average, and 7-9 above average. It is a standard-score scale with a mean of 5 and an SD of 2.
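A rough Python sketch of how a z-score could be mapped onto this nine-point scale. It assumes each stanine band is half a standard deviation wide and centred on stanine 5, which follows from the mean of 5 and SD of 2. Operational tests usually assign stanines from percentile bands of the normalized scores, so treat this as an approximation. The function names are hypothetical.

```python
import math


def stanine_from_z(z):
    """Approximate stanine: scale z to mean 5, SD 2, round to a band, clip to 1..9."""
    return max(1, min(9, math.floor(2 * z + 5.5)))


def stanine_band(stanine):
    """Verbal label from the slide: 1-3 below average, 4-6 average, 7-9 above average."""
    if stanine <= 3:
        return "below average"
    if stanine <= 6:
        return "average"
    return "above average"


for z in (-3.0, -1.0, 0.0, 1.0, 3.0):
    s = stanine_from_z(z)
    print(z, s, stanine_band(s))
```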
The Correlational Method Correlational data can be graphed and a “line of best fit” can be drawn 1- Pearson Correlations 2-Spearman
The Correlational Method Correlation is the degree to which events or characteristics vary with each other. It measures the strength of a relationship. It does not imply cause and effect.
The Correlational Method Correlation has 3 characteristics: 1. The Form of the Relationship 2. The Direction of the Relationship 3. The strength or Consistency of the Relationship
1. The Form of the Relationship The most common use of correlation is to measure straight-line (linear form) relationship. However, other forms of relationships do exist and there are special correlations used to measure them.
2. The Direction of the Relationship Correlational data can be graphed and a “line of best fit” can be drawn
Positive correlation = variables change in the same direction
Positive Correlation
Negative correlation = variables change in the opposite direction
Negative Correlation
No Correlation Unrelated = No consistent relationship
No Correlation
The Correlational Method The magnitude (strength) of a correlation is also important High magnitude = variables which vary closely together; fall close to the line of best fit Low magnitude = variables which do not vary as closely together; loosely scattered around the line of best fit
3. The Strength or Consistency of the Relationship The direction and magnitude of a correlation are often calculated statistically. The result is called the “Correlation Coefficient,” symbolized by the letter “r.” The sign (+ or -) indicates direction. The number (from 0.00 to 1.00) indicates magnitude: 0.00 = no consistent relationship; +1.00 = perfect positive correlation; -1.00 = perfect negative correlation. Most correlations found in psychological research fall far short of “perfect.”
The Correlational Method Whether a correlation can be trusted is judged by statistical probability. “Statistical significance” means that the finding is unlikely to have occurred by chance. By convention, if there is less than a 5% probability that the findings are due to chance (p < 0.05), results are considered “significant” and thought to reflect the larger population. Generally, confidence increases with the size of the sample (n) and the magnitude of the correlation (r).
The Correlational Method Advantages of correlational studies: They have high external validity; findings can be generalized; studies can be repeated (replicated) on other samples. Difficulties with correlational studies: They lack internal validity; results describe but do not explain a relationship.
External & Internal Validity *External Validity External validity addresses the ability to generalize your study to other people and other situations. *Internal Validity Internal validity addresses the "true" causes of the outcomes that you observed in your study. Strong internal validity means that you not only have reliable measures of your independent and dependent variables BUT a strong justification that causally links your independent variables (IV) to your dependent variables (DV).
The Correlational Method Pearson r = SP/√(SSx · SSy)
Original Data
X  Y
1  3
2  6
4  4
5  7
SP requires 2 sets of data. SS requires only one set of data.
The Correlational Method Spearman r = SP/√(SSx · SSy), computed on the ranks
Original Data   Ranks
X  Y            X  Y
1  3            1  1
2  6            2  3
4  4            3  2
5  7            4  4
SP requires 2 sets of data. SS requires only one set of data.
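The Pearson and Spearman slides use the same four pairs of scores, so one Python sketch can cover both: Pearson r on the raw scores, then the same formula on the ranks. The helper names are our own, and the simple ranking function assumes no tied values, which holds for this data set.

```python
import math


def pearson_r(x, y):
    """r = SP / √(SSx · SSy)"""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)


def ranks(values):
    """Rank from 1 (lowest). Assumes no tied values, as in the slide data."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


x = [1, 2, 4, 5]
y = [3, 6, 4, 7]

print(pearson_r(x, y))                    # Pearson on the original scores
print(pearson_r(ranks(x), ranks(y)))      # Spearman: same formula on the ranks
```

For these scores the Pearson value works out to 0.6 and the Spearman value to 0.8, up to floating-point rounding.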
Regression and Prediction Y=bX+a Regression Line e
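A minimal Python sketch of fitting the regression line Y = bX + a by least squares. The slope and intercept formulas used here (b = SP/SSx, a = mean of Y minus b times mean of X) are the standard ones but do not appear on the slide, and the data are borrowed from the Pearson correlation slide purely as an illustration.

```python
x = [1, 2, 4, 5]     # assumption: reusing the Pearson example data
y = [3, 6, 4, 7]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_x = sum((xi - mean_x) ** 2 for xi in x)

b = sp / ss_x                  # slope
a = mean_y - b * mean_x        # intercept


def predict(x_new):
    """Predicted Y on the regression line for a given X."""
    return b * x_new + a


print(f"Y = {b:.2f}X + {a:.2f}")
print(f"Predicted Y at X = 3: {predict(3):.2f}")
```

The line always passes through the point (mean of X, mean of Y), which is a handy check on the arithmetic.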
Three Levels of Analysis for Prediction/Validity INPUTS PROCESSES OUTCOMES Ex. Stress (INPUT) is an unpleasant psychological process (PROCESS) that occurs in response to environmental pressures (e.g., a job) and can lead to withdrawal or quitting the job (OUTCOME).
prognosis
Please read Chapters 3 and 4 for next week.