Unit 2: Test Worthiness and Making Meaning out of Raw Scores


1 Unit 2: Test Worthiness and Making Meaning out of Raw Scores
Common Assessment Instruments for Today’s World

2 Test Worthiness: What Does It Take?
Four requirements of test worthiness:
1) Validity: The test measures what it is supposed to measure
2) Reliability: The score is an accurate measure of the test-taker's true score
3) Cross-Cultural Fairness: The score is a true reflection of the individual, not a function of cultural bias inherent in the test
4) Practicality: The test is appropriate for the situation

3 Correlation Coefficient
Correlation Coefficient: The relationship between two sets of test scores; ranges from -1.0 to +1.0
Positive Correlation: Scores tend to move in the same direction
Negative Correlation: Scores tend to move in opposite directions (inverse relationship)

4 Strong Correlation (Relationship)
Indication of a strong relationship: Coefficients near -1.0 or +1.0
Weak or no relationship: Coefficients near 0
Scatterplot: Graph plotting two sets of test scores against each other
Positive correlation: Points trend along a diagonal rising from left to right
Negative correlation: Points trend along a diagonal falling from left to right
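The correlation coefficient defined above can be illustrated with a short computation. This is a plain implementation of the standard Pearson formula; the study-hours data are hypothetical:

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two score lists (-1.0 to +1.0)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: hours studied and exam scores trend together,
# so the coefficient is strongly positive (near +1.0)
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 65, 71, 80]
print(round(pearson_r(hours, scores), 3))  # 0.995
```

Feeding in scores that move in opposite directions (e.g., `[1, 2, 3]` against `[6, 4, 2]`) yields -1.0, a perfect negative correlation.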

5 Scatterplot: Positive Correlation
Negative Correlation

6 Scatterplot: Weak or No Correlation

7 Coefficient of Determination: Shared Variance
Coefficient of Determination: The proportion of common factors that account for a relationship; equal to the correlation coefficient squared
Example: Tests of depression and anxiety correlate at .85
Square .85: .85 x .85 = .7225
.7225 x 100 = 72.25, or about 72%
This shows that anxiety and depression share a large number of factors, but not all factors.
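The slide's arithmetic can be checked directly; this minimal sketch just squares the correlation from the example:

```python
r = 0.85                    # correlation between the depression and anxiety tests
shared = r ** 2             # coefficient of determination (shared variance)
print(round(shared, 4))     # 0.7225
print(round(shared * 100))  # 72 (percent of shared factors)
```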

8 Test Worthiness: Validity
Validity: The degree to which a test measures what it’s supposed to measure
Forms of Validity:
1) Content Validity
2) Criterion-related Validity
   a) Concurrent Validity
   b) Predictive Validity
3) Construct Validity
   a) Experimental Design Validity
   b) Convergent Validity
   c) Discriminant Validity

9 Validity: Content Validity
Content Validity: The content of the test is appropriate for what the test intends to measure
Face Validity: The superficial appearance of the test; a valid test may or may not have face validity
*Face validity is not a true measure of validity

10 Validity: Criterion-related Validity
Criterion-related Validity: The relationship between test scores and another standard
Concurrent Validity: Relationship between test scores and another currently obtainable benchmark
Predictive Validity: Relationship between test scores and a future standard
Standard Error of Estimate: The range within which a predicted score is likely to lie
False Positive: The test incorrectly predicts a test-taker will have an attribute or be successful
False Negative: The test incorrectly predicts a test-taker will not have an attribute or be successful

11 Validity: Construct Validity
Construct Validity: Evidence that an idea or concept is actually being measured by the test (Is the test for intelligence truly measuring intelligence?)
Evidence used to establish construct validity:
a) Experimental design: Using experimentation to show that a test measures a concept
b) Factor analysis: Statistically examining the relationship between subscales and the larger construct (between individual subject areas and the test as a whole)

12 Validity: Construct Validity
Convergent Validity: A strong relationship between a test and other tests of similar constructs (highly correlated, say in the .75 range)
Discriminant Validity: A lack of relationship between a test and tests of unrelated concepts (e.g., between a depression test and an anxiety test)

13 Reliability
Reliability: The degree to which test scores are free from errors of measurement
“Perfect world” scenario: The test is well-made, the environment is optimal, and the test-taker is at his/her best
Reliability Coefficient: Indicates whether test scores are consistent and dependable

14 Reliability: Measuring Reliability
Test-retest Reliability: Relationship between test scores from one test given at two different administrations to the same people The closer the two sets of scores, the more reliable the test Test-retest reliability is more effective in areas that are less likely to change over time

15 Reliability: Measuring Reliability
Alternate Forms Reliability: The relationship between scores from two similar versions of the same test
The examiner designs an alternate, parallel, or equivalent form of the original test and administers this alternate form as the second test
One problem is ensuring that both forms are truly equivalent

16 Reliability: Internal Consistency
Internal Consistency: Reliability measured statistically by going “within” the test (how scores on individual items relate to each other or to the test as a whole)
Types of Internal Consistency:
1) Split-half (odd-even)
2) Cronbach’s Coefficient Alpha
3) Kuder-Richardson

17 Reliability: Internal Consistency
Split-half Reliability: Correlating one half of a test against the other half
Advantages of split-half:
1) Only one test administration is needed
2) No separate alternate form must be created
Disadvantages of split-half:
1) A false reliability estimate if the two halves are not parallel or equivalent
2) Splitting makes each half only half as long as the full test (shortening a test tends to decrease reliability)

18 Reliability: Internal Consistency
Spearman-Brown Equation: Mathematical compensation for the shortened test length created by splitting the test in half
Spearman-Brown reliability = 2r_hh / (1 + r_hh), where r_hh is the split-half reliability estimate
*If a test manual states that split-half reliability was used, check whether the Spearman-Brown formula was applied. If not, the test may be more reliable than the reported estimate suggests.
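The formula is a one-liner; the .60 input below is an illustrative assumption, not from the slide:

```python
def spearman_brown(r_hh):
    """Project full-test reliability from a split-half correlation r_hh."""
    return 2 * r_hh / (1 + r_hh)

# A half-test correlation of .60 corresponds to a full-length reliability of .75,
# showing why the uncorrected split-half estimate understates reliability
print(round(spearman_brown(0.60), 2))  # 0.75
```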

19 Reliability: Internal Consistency
Cronbach’s Coefficient Alpha and Kuder-Richardson: Methods that estimate the reliability of all the possible split-half combinations by correlating the scores for each item on the test with the total score and finding the average correlation for all of the items
Kuder-Richardson can only be used with tests that have right and wrong answers (e.g., achievement tests)
Coefficient Alpha can be used with tests having various types of responses (e.g., rating scales)
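As a rough sketch of what such an estimate computes, here is a plain implementation of the standard Cronbach's alpha formula; the score matrix is hypothetical:

```python
def cronbach_alpha(item_scores):
    """Cronbach's alpha from a score matrix: one row per person,
    one column per item."""
    k = len(item_scores[0])                      # number of items

    def var(xs):                                 # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [var([row[i] for row in item_scores]) for i in range(k)]
    total_var = var([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Four people, three perfectly consistent items -> alpha of 1.0
print(cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]))
```

With dichotomous (0/1) items, this same formula reduces to the Kuder-Richardson (KR-20) estimate.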

20 Reliability: Item Response Theory
Item Response Theory: Examines each item individually for its ability to measure the trait being examined Item Characteristic Curve: Assumes that as people’s abilities increase, their probability of answering an item correctly increases

21 Reliability: Item Characteristic Curve
If the “S” flattens out: The item has less ability to discriminate, providing a narrow range of probabilities of a correct or incorrect response across ability levels
If the “S” is steep: The item differentiates strongly across ability levels
[Figure: item characteristic curve; y-axis: probability of a correct answer (0.0 to 1.0); x-axis: IQ/ability (55 to 145)]
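The S-shaped curve can be sketched with a logistic function. This assumes the common two-parameter logistic (2PL) form, which the slide does not name explicitly:

```python
import math

def icc(theta, a, b):
    """Item characteristic curve (2PL logistic form): probability of a
    correct answer given ability theta, discrimination a, difficulty b."""
    return 1 / (1 + math.exp(-a * (theta - b)))

# A steep item (a = 2.0) separates low and high abilities sharply;
# a flat item (a = 0.3) gives nearly the same probability to everyone
for theta in (-1.0, 0.0, 1.0):
    print(round(icc(theta, 2.0, 0.0), 2), round(icc(theta, 0.3, 0.0), 2))
```

At the item's difficulty (theta = b) the probability is .50 for either item; away from it, only the steep item discriminates.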

22 Cross-cultural Fairness
Cross-cultural Fairness: Degree to which cultural background, class, disability, and gender do not affect test results Tests must be carefully selected to prevent bias Test scores must be interpreted in light of the cultural, ethnic, disability, or linguistic factors that may impact scores

23 Practicality
Practicality: Feasibility considerations in test selection and administration
Major practical concerns:
1) Time: Amount of time to administer
2) Cost: Budgeting issues
3) Format: Print quality, type of questions
4) Readability: Understandability of the text
5) Ease of administration, scoring, and interpretation

24 Selecting & Administering a Good Test
1) Determine the goals of your client
2) Choose an instrument type suited to those goals
3) Access information about possible instruments
   a) Source books on testing:
      - Buros Mental Measurements Yearbook
      - Tests in Print
4) Examine the validity, reliability, cross-cultural fairness, and practicality of the possible instruments
5) Choose an instrument wisely

25 Unit 2: Statistical Concepts
Making Meaning Out of Raw Scores

26 Raw Scores are Meaningless
Raw Scores: Untreated scores before any manipulation or processing
Norm group comparisons are helpful because they:
1) Tell us relative position within the norm group
2) Allow us to compare results among test-takers
3) Allow us to compare results on two or more different tests taken by the same person

27 Procedures for Normative Comparisons
Frequency Distribution: A list of scores and the number of times each score occurred
Orders a set of scores from highest to lowest and lists the corresponding frequency of each score
Allows identification of the most frequent scores and helps show where an individual’s score falls relative to the rest of the group
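A minimal sketch of building a frequency distribution from hypothetical raw scores:

```python
from collections import Counter

scores = [85, 90, 85, 70, 90, 85, 100, 70, 85]
freq = Counter(scores)                            # score -> number of occurrences
for score, count in sorted(freq.items(), reverse=True):  # highest to lowest
    print(score, count)
```

The output immediately shows the most frequent score (85, occurring four times) and where any individual score sits relative to the group.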

28 Histograms & Frequency Polygons
Histogram: Bar graph of class intervals & frequency of a set of scores Class Intervals: Grouping scores by a pre-determined range Frequency Polygon: Line graph of class intervals & frequency of a set of scores

29 Cumulative Distributions (Ogive Curve)
Cumulative Distribution: Line graph to examine percentile rank of a set of scores Applications: Good for conveying information about percentile rank
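The percentile-rank information an ogive conveys can be sketched numerically. Textbooks differ on how ties are counted, so this uses one common "at or below" convention with hypothetical scores:

```python
def percentile_rank(scores, x):
    """Percent of scores at or below x (one common convention;
    some texts count only half of the scores tied at x)."""
    return 100 * sum(1 for s in scores if s <= x) / len(scores)

scores = [60, 70, 70, 80, 90]
print(percentile_rank(scores, 70))  # 60.0 -> a score of 70 is at the 60th percentile
```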

30 Normal Curves & Skewed Curves
Normal Curve: Bell-shaped curve along which human traits tend to fall; a predictable pattern that occurs whenever we measure human traits and abilities
Skewed Curves: Distributions of test scores that do not fall along a normal curve
Negatively Skewed Curve: Majority of scores at the upper end
Positively Skewed Curve: Majority of scores at the lower end

31 Measures of Central Tendency
Central Tendency: Gives you a sense of where the middle of the distribution lies
Three measures of central tendency:
1) Mean: Arithmetic average of all scores (add all scores and divide by the number of scores)
2) Median: Middle score (50% fall above; 50% fall below)
3) Mode: Most frequently occurring score
*In a skewed distribution, the median is a better measure of central tendency.
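The three measures can be computed with Python's standard library (the scores are hypothetical); the second, skewed set shows why the median is preferred when a distribution has outliers:

```python
import statistics

scores = [70, 80, 80, 90, 100]
print(statistics.mean(scores))    # 84
print(statistics.median(scores))  # 80
print(statistics.mode(scores))    # 80

# One extreme score pulls the mean toward the tail, but the median stays put:
skewed = [70, 80, 80, 90, 200]
print(statistics.mean(skewed), statistics.median(skewed))
```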

32 Measures of Variability
Measures of Variability: How much scores vary in a distribution
Three measures of variability:
1) Range: Difference between the highest and lowest score, plus 1
2) Interquartile Range: Middle 50% of scores around the median
3) Standard Deviation: How scores vary from the mean

33 Measures of Variability: Range
Range: Tells you the distance from the highest to lowest score Calculated by subtracting the lowest score from the highest score and adding 1

34 Measures of Variability: Interquartile Range
Interquartile Range: Provides the range of the middle 50% of scores around the median
Useful with skewed curves because it offers a more representative picture of where a large percentage of scores fall
To calculate: Subtract the score one quarter of the way from the bottom (Q1) from the score three quarters of the way from the bottom (Q3) and divide by 2 (the semi-interquartile range). Then add this number to and subtract it from the median.

35 Measures of Variability: Standard Deviation
Standard Deviation: Describes how scores vary from the mean
In all normal curves, the percentage of scores between standard deviation units is the same
About 99.7% of people fall within three standard deviations of the mean
*What counts as an adequate score is in the “eye of the beholder”
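A short sketch combining the measures of variability from the preceding slides (the scores are hypothetical; the "+ 1" follows the slides' inclusive definition of range):

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]
rng = max(scores) - min(scores) + 1   # inclusive range, per the slide
sd = statistics.pstdev(scores)        # population standard deviation
print(rng)  # 8
print(sd)   # 2.0
```

Here the mean is 5, so in a roughly normal distribution nearly all scores would be expected between 5 - 3(2) = -1 and 5 + 3(2) = 11.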

36 Common Assessments: Situation Specific
Developmental Disabilities: Impairment in cognitive, communication, social/emotional, and adaptive (daily living skills) functioning
Assessments used:
1) Bayley Scales of Infant Development
2) Wechsler Preschool & Primary Scale of Intelligence, 3rd Ed.
3) Wechsler Intelligence Scale for Children, 4th Ed.
4) Autism Diagnostic Observation Schedule
5) Vineland Adaptive Behavior Scales, 2nd Ed.

37 Common Assessments: Situation Specific
Learning Disabilities: Disorders that affect a broad range of academic and functional skills, e.g., speaking, listening, reading, writing, spelling, and completing math calculations; a deficit in one or more of the ways the brain processes information
Assessments:
1) Wechsler Preschool & Primary Scale of Intelligence
2) Wechsler Intelligence Scale for Children, 4th Ed.
3) Wechsler Adult Intelligence Scale, 3rd Ed.
4) Wechsler Individual Achievement Test, 2nd Ed.

38 Learning Disabilities Assessments, Continued
5) Wechsler Memory Scale, 3rd Ed.
6) Woodcock-Johnson Test of Achievement, 3rd Ed.
7) Comprehensive Test of Phonological Processing
8) Attention Deficit Disorder Evaluation Scale (Home, Self-report, & School versions)
9) Beck Depression Inventory, 2nd Ed.
10) Beck Anxiety Inventory

39 Common Assessments: Situation Specific
Attention Deficit/Hyperactivity Disorder
Assessments:
1) Wechsler Intelligence Scale for Children, 4th Ed. (Processing Speed Index)
2) Wechsler Adult Intelligence Scale, 3rd Ed.
3) Woodcock-Johnson Test of Achievement, 3rd Ed. (Understanding Directions subtest)
4) Attention Deficit Disorder Evaluation Scale (Home, Self-report, & School versions)
5) Behavior Assessment System for Children, 2nd Ed. (Parent report, Teacher report, Self-report)

40 Common Assessments: Situation Specific
Gifted and Talented Evaluation: Individuals who are so gifted or advanced that they need special provisions to meet their educational needs
Assessments:
1) Wechsler Preschool & Primary Scale of Intelligence, 3rd Ed.
2) Wechsler Intelligence Scale for Children, 4th Ed.
3) Wechsler Adult Intelligence Scale, 3rd Ed.

