Unit 2: Test Worthiness and Making Meaning out of Raw Scores


1 Unit 2: Test Worthiness and Making Meaning out of Raw Scores
Common Assessment Instruments for Today’s World

2 Test Worthiness: What Does It Take?
Four requirements of test worthiness:
1) Validity: The test measures what it is supposed to measure
2) Reliability: The score is an accurate measure of the test-taker's true score
3) Cross-Cultural Fairness: The score is a true reflection of the individual, not a function of cultural bias inherent in the test
4) Practicality: The test is appropriate for the situation

3 Correlation Coefficient
Correlation Coefficient: The relationship between two sets of test scores; ranges from -1.0 to +1.0
Positive Correlation: Scores tend to move in the same direction
Negative Correlation: Scores tend to move in opposite directions (inverse relationship)

4 Strong Correlation (Relationship)
Indication of a strong relationship: Coefficients near -1.0 or +1.0
Weak or no relationship: Coefficients near 0
Scatterplot: Graph plotting two sets of test scores against each other
Positive correlation: Points trend along a diagonal rising from left to right
Negative correlation: Points trend along a diagonal falling from left to right
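The correlation coefficient defined above can be illustrated with a short computation. This is a plain implementation of the standard Pearson formula; the study-hours data are hypothetical:

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two score lists (-1.0 to +1.0)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: hours studied and exam scores trend together,
# so the coefficient is strongly positive (near +1.0)
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 65, 71, 80]
print(round(pearson_r(hours, scores), 3))  # 0.995
```

Feeding in scores that move in opposite directions (e.g., `[1, 2, 3]` against `[6, 4, 2]`) yields -1.0, a perfect negative correlation.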

5 Scatterplot: Positive Correlation
Negative Correlation

6 Scatterplot: Weak or No Correlation

7 Coefficient of Determination: Shared Variance
Coefficient of Determination: The proportion of common factors that account for a relationship; equal to the correlation coefficient squared
Example: Tests of depression and anxiety correlate at .85
Square .85: .85 x .85 = .7225
.7225 x 100 = 72.25, or about 72%
This shows that anxiety and depression share a large number of factors, but not all factors.
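The slide's arithmetic can be checked directly; this minimal sketch just squares the correlation from the example:

```python
r = 0.85                    # correlation between the depression and anxiety tests
shared = r ** 2             # coefficient of determination (shared variance)
print(round(shared, 4))     # 0.7225
print(round(shared * 100))  # 72 (percent of shared factors)
```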

8 Test Worthiness: Validity
Validity: The degree to which a test measures what it’s supposed to measure
Forms of Validity:
1) Content Validity
2) Criterion-related Validity
   a) Concurrent Validity
   b) Predictive Validity
3) Construct Validity
   a) Experimental Design Validity
   b) Convergent Validity
   c) Discriminant Validity

9 Validity: Content Validity
Content Validity: The content of the test is appropriate for what the test intends to measure
Face Validity: The superficial appearance of the test; a valid test may or may not have face validity
*Face validity is not a true measure of validity

10 Validity: Criterion-related Validity
Criterion-related Validity: The relationship between test scores and another standard
Concurrent Validity: Relationship between test scores and another currently obtainable benchmark
Predictive Validity: Relationship between test scores and a future standard
Standard Error of Estimate: The range within which a predicted score is likely to lie
False Positive: The test incorrectly predicts a test-taker will have an attribute or be successful
False Negative: The test incorrectly predicts a test-taker will not have an attribute or be successful

11 Validity: Construct Validity
Construct Validity: Evidence that an idea or concept is actually being measured by the test (Is the test for intelligence truly measuring intelligence?)
Evidence used to establish construct validity:
a) Experimental design: Using experimentation to show that a test measures a concept
b) Factor analysis: Statistically examining the relationship between subscales and the larger construct (between individual subject areas and the test as a whole)

12 Validity: Construct Validity
Convergent Validity: A strong relationship between a test and other tests of similar constructs (highly correlated, say in the .75 range)
Discriminant Validity: A lack of relationship between a test and tests of unrelated concepts (e.g., between a depression test and an anxiety test)

13 Reliability
Reliability: The degree to which test scores are free from errors of measurement
“Perfect world” scenario: The test is well-made, the environment is optimal, and the test-taker is at his/her best
Reliability Coefficient: Indicates whether test scores are consistent and dependable

14 Reliability: Measuring Reliability
Test-retest Reliability: Relationship between test scores from one test given at two different administrations to the same people The closer the two sets of scores, the more reliable the test Test-retest reliability is more effective in areas that are less likely to change over time

15 Reliability: Measuring Reliability
Alternate Forms Reliability: The relationship between scores from two similar versions of the same test
The examiner designs an alternate, parallel, or equivalent form of the original test and administers this alternate form as the second test
One problem is ensuring that both forms are truly equivalent

16 Reliability: Internal Consistency
Internal Consistency: Reliability measured statistically by going “within” the test (how scores on individual items relate to each other or to the test as a whole)
Types of Internal Consistency:
1) Split-half (odd-even)
2) Cronbach’s Coefficient Alpha
3) Kuder-Richardson

17 Reliability: Internal Consistency
Split-half Reliability: Correlating one half of a test against the other half
Advantages of split-half:
1) Only one test administration is needed
2) No separate alternate form must be created
Disadvantages of split-half:
1) A false reliability estimate if the two halves are not parallel or equivalent
2) Splitting makes each half only half as long as the full test (shortening a test tends to decrease reliability)

18 Reliability: Internal Consistency
Spearman-Brown Equation: Mathematical compensation for the shortened test length created by splitting the test in half
Spearman-Brown reliability = 2r_hh / (1 + r_hh), where r_hh is the split-half reliability estimate
*If a test manual states that split-half reliability was used, check whether the Spearman-Brown formula was applied. If not, the test may be more reliable than the reported estimate suggests.
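The formula is a one-liner; the .60 input below is an illustrative assumption, not from the slide:

```python
def spearman_brown(r_hh):
    """Project full-test reliability from a split-half correlation r_hh."""
    return 2 * r_hh / (1 + r_hh)

# A half-test correlation of .60 corresponds to a full-length reliability of .75,
# showing why the uncorrected split-half estimate understates reliability
print(round(spearman_brown(0.60), 2))  # 0.75
```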

19 Reliability: Internal Consistency
Cronbach’s Coefficient Alpha and Kuder-Richardson: Methods that estimate the reliability of all the possible split-half combinations by correlating the scores for each item on the test with the total score and finding the average correlation for all of the items
Kuder-Richardson can only be used with tests that have right and wrong answers (e.g., achievement tests)
Coefficient Alpha can be used with tests having various types of responses (e.g., rating scales)
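As a rough sketch of what such an estimate computes, here is a plain implementation of the standard Cronbach's alpha formula; the score matrix is hypothetical:

```python
def cronbach_alpha(item_scores):
    """Cronbach's alpha from a score matrix: one row per person,
    one column per item."""
    k = len(item_scores[0])                      # number of items

    def var(xs):                                 # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [var([row[i] for row in item_scores]) for i in range(k)]
    total_var = var([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Four people, three perfectly consistent items -> alpha of 1.0
print(cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]))
```

With dichotomous (0/1) items, this same formula reduces to the Kuder-Richardson (KR-20) estimate.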

20 Reliability: Item Response Theory
Item Response Theory: Examines each item individually for its ability to measure the trait being examined Item Characteristic Curve: Assumes that as people’s abilities increase, their probability of answering an item correctly increases

21 Reliability: Item Characteristic Curve
If the “S” flattens out: The item has less ability to discriminate, providing a narrow range of probabilities of a correct or incorrect response across ability levels
If the “S” is steep: The item differentiates strongly across ability levels
[Figure: item characteristic curve; y-axis: probability of a correct answer (0.0 to 1.0); x-axis: IQ/ability (55 to 145)]
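The S-shaped curve can be sketched with a logistic function. This assumes the common two-parameter logistic (2PL) form, which the slide does not name explicitly:

```python
import math

def icc(theta, a, b):
    """Item characteristic curve (2PL logistic form): probability of a
    correct answer given ability theta, discrimination a, difficulty b."""
    return 1 / (1 + math.exp(-a * (theta - b)))

# A steep item (a = 2.0) separates low and high abilities sharply;
# a flat item (a = 0.3) gives nearly the same probability to everyone
for theta in (-1.0, 0.0, 1.0):
    print(round(icc(theta, 2.0, 0.0), 2), round(icc(theta, 0.3, 0.0), 2))
```

At the item's difficulty (theta = b) the probability is .50 for either item; away from it, only the steep item discriminates.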

22 Cross-cultural Fairness
Cross-cultural Fairness: Degree to which cultural background, class, disability, and gender do not affect test results Tests must be carefully selected to prevent bias Test scores must be interpreted in light of the cultural, ethnic, disability, or linguistic factors that may impact scores

23 Practicality
Practicality: Feasibility considerations in test selection and administration
Major practical concerns:
1) Time: Amount of time to administer
2) Cost: Budgeting issues
3) Format: Print quality, type of questions
4) Readability: Understandability of the text
5) Ease of administration, scoring, and interpretation

24 Selecting & Administering a Good Test
1) Determine the goals of your client
2) Choose an instrument type suited to those goals
3) Access information about possible instruments
   a) Source books on testing:
      - Buros Mental Measurements Yearbook
      - Tests in Print
4) Examine the validity, reliability, cross-cultural fairness, and practicality of the possible instruments
5) Choose an instrument wisely

25 Unit 2: Statistical Concepts
Making Meaning Out of Raw Scores

26 Raw Scores are Meaningless
Raw Scores: Untreated scores before any manipulation or processing
Norm group comparisons are helpful because they:
1) Tell us relative position within the norm group
2) Allow us to compare results among test-takers
3) Allow us to compare results on two or more different tests taken by the same person

27 Procedures for Normative Comparisons
Frequency Distribution: A list of scores and the number of times each score occurred
Orders a set of scores from highest to lowest and lists the corresponding frequency of each score
Allows identification of the most frequent scores and helps show where an individual’s score falls relative to the rest of the group
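A minimal sketch of building a frequency distribution from hypothetical raw scores:

```python
from collections import Counter

scores = [85, 90, 85, 70, 90, 85, 100, 70, 85]
freq = Counter(scores)                            # score -> number of occurrences
for score, count in sorted(freq.items(), reverse=True):  # highest to lowest
    print(score, count)
```

The output immediately shows the most frequent score (85, occurring four times) and where any individual score sits relative to the group.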

28 Histograms & Frequency Polygons
Histogram: Bar graph of class intervals & frequency of a set of scores Class Intervals: Grouping scores by a pre-determined range Frequency Polygon: Line graph of class intervals & frequency of a set of scores

29 Cumulative Distributions (Ogive Curve)
Cumulative Distribution: Line graph to examine percentile rank of a set of scores Applications: Good for conveying information about percentile rank
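The percentile-rank information an ogive conveys can be sketched numerically. Textbooks differ on how ties are counted, so this uses one common "at or below" convention with hypothetical scores:

```python
def percentile_rank(scores, x):
    """Percent of scores at or below x (one common convention;
    some texts count only half of the scores tied at x)."""
    return 100 * sum(1 for s in scores if s <= x) / len(scores)

scores = [60, 70, 70, 80, 90]
print(percentile_rank(scores, 70))  # 60.0 -> a score of 70 is at the 60th percentile
```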

30 Normal Curves & Skewed Curves
Normal Curve: Bell-shaped curve along which human traits tend to fall; a predictable pattern that occurs whenever we measure human traits and abilities
Skewed Curves: Distributions of test scores that do not fall along a normal curve
Negatively Skewed Curve: Majority of scores at the upper end
Positively Skewed Curve: Majority of scores at the lower end

31 Measures of Central Tendency
Central Tendency: Gives you a sense of where the middle of the distribution lies
Three measures of central tendency:
1) Mean: Arithmetic average of all scores (add all scores and divide by the number of scores)
2) Median: Middle score (50% fall above; 50% fall below)
3) Mode: Most frequently occurring score
*In a skewed distribution, the median is a better measure of central tendency.
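The three measures can be computed with Python's standard library (the scores are hypothetical); the second, skewed set shows why the median is preferred when a distribution has outliers:

```python
import statistics

scores = [70, 80, 80, 90, 100]
print(statistics.mean(scores))    # 84
print(statistics.median(scores))  # 80
print(statistics.mode(scores))    # 80

# One extreme score pulls the mean toward the tail, but the median stays put:
skewed = [70, 80, 80, 90, 200]
print(statistics.mean(skewed), statistics.median(skewed))
```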

32 Measures of Variability
Measures of Variability: How much scores vary in a distribution
Three measures of variability:
1) Range: Difference between the highest and lowest score, plus 1
2) Interquartile Range: Middle 50% of scores around the median
3) Standard Deviation: How scores vary from the mean

33 Measures of Variability: Range
Range: Tells you the distance from the highest to lowest score Calculated by subtracting the lowest score from the highest score and adding 1

34 Measures of Variability: Interquartile Range
Interquartile Range: Provides the range of the middle 50% of scores around the median
Useful with skewed curves because it offers a more representative picture of where a large percentage of scores fall
To calculate: Subtract the score one quarter of the way from the bottom (Q1) from the score three quarters of the way from the bottom (Q3) and divide by 2 (the semi-interquartile range). Then add this number to and subtract it from the median.

35 Measures of Variability: Standard Deviation
Standard Deviation: Describes how scores vary from the mean
In all normal curves, the percentage of scores between standard deviation units is the same
About 99.7% of people fall within three standard deviations of the mean
*What counts as an adequate score is in the “eye of the beholder”
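A short sketch combining the measures of variability from the preceding slides (the scores are hypothetical; the "+ 1" follows the slides' inclusive definition of range):

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]
rng = max(scores) - min(scores) + 1   # inclusive range, per the slide
sd = statistics.pstdev(scores)        # population standard deviation
print(rng)  # 8
print(sd)   # 2.0
```

Here the mean is 5, so in a roughly normal distribution nearly all scores would be expected between 5 - 3(2) = -1 and 5 + 3(2) = 11.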

36 Common Assessments: Situation Specific
Developmental Disabilities: Impairment in cognitive, communication, social/emotional, and adaptive (daily living skills) functioning
Assessments used:
1) Bayley Scales of Infant Development
2) Wechsler Preschool & Primary Scale of Intelligence, 3rd Ed.
3) Wechsler Intelligence Scale for Children, 4th Ed.
4) Autism Diagnostic Observation Schedule
5) Vineland Adaptive Behavior Scales, 2nd Ed.

37 Common Assessments: Situation Specific
Learning Disabilities: Disorders that affect a broad range of academic and functional skills, e.g., speaking, listening, reading, writing, spelling, and completing math calculations; a deficit in one or more of the ways the brain processes information
Assessments:
1) Wechsler Preschool & Primary Scale of Intelligence
2) Wechsler Intelligence Scale for Children, 4th Ed.
3) Wechsler Adult Intelligence Scale, 3rd Ed.
4) Wechsler Individual Achievement Test, 2nd Ed.

38 Learning Disabilities Assessments, Continued
5) Wechsler Memory Scale, 3rd Ed.
6) Woodcock-Johnson Test of Achievement, 3rd Ed.
7) Comprehensive Test of Phonological Processing
8) Attention Deficit Disorder Evaluation Scale (Home, Self-report, & School versions)
9) Beck Depression Inventory, 2nd Ed.
10) Beck Anxiety Inventory

39 Common Assessments: Situation Specific
Attention Deficit/Hyperactivity Disorder
Assessments:
1) Wechsler Intelligence Scale for Children, 4th Ed. (Processing Speed Index)
2) Wechsler Adult Intelligence Scale, 3rd Ed.
3) Woodcock-Johnson Test of Achievement, 3rd Ed. (Understanding Directions subtest)
4) Attention Deficit Disorder Evaluation Scale (Home, Self-report, & School versions)
5) Behavior Assessment System for Children, 2nd Ed. (Parent report, Teacher report, Self-report)

40 Common Assessments: Situation Specific
Gifted and Talented Evaluation: Individuals who are so gifted or advanced that they need special provisions to meet their educational needs
Assessments:
1) Wechsler Preschool & Primary Scale of Intelligence, 3rd Ed.
2) Wechsler Intelligence Scale for Children, 4th Ed.
3) Wechsler Adult Intelligence Scale, 3rd Ed.

