Assessment Theory and Models Part II

1 Assessment Theory and Models Part II
Red Book Ch. 2

2 Test Structure Norm-Referenced Tests
Based, to an extent, on generalizability theory. Samples the client's attitudes or functional ability and compares the scores against the scores received by the general population.
Examples:
- SATs, GREs
- Cardiovascular endurance
- Pulse rate
- Strength
- Endurance
- Breathing rate

3 Test Structure Criterion-Referenced Tests
Samples the client's attitudes or functional abilities and then compares the scores (or performance) to the actual task or attitude, not necessarily to the scores of others. Example: a driver's education test.

4 Test Structure Table 2.8 (pg. 27)
Comparison of Criterion-Referenced vs. Norm-Referenced Tests

5 Statistics
Biometric: the measurement of organisms (bio). Psychometric: the measurement of behaviors and thought processes.

6 Statistics Three Principles Related to Reliability and Validity
1. It is incorrect to state flatly that a testing tool is "valid" or "reliable"; it is not a yes-or-no property. Reliability and validity are stated as the degree to which the test is reliable or valid.
2. A testing tool may have outstanding reliability and validity but still be inappropriate.
3. Validity and reliability are interconnected. If a test has good validity, there is a good chance it also has good reliability. BUT the opposite is not true: just because a test has good reliability does not mean it has good validity.

7 Measuring Validity
Gather information empirically: gather information through observation, not through reasoning or "reading between the lines." Write things operationally: describe the action or process in such a way that it can be observed and measured.

8 Measuring Validity Content Validity
Tells us how well the test measures the scope of the subject matter and behavior under consideration. Does it appropriately measure the whole of what we need to measure?

9 Measuring Validity Criterion-Related Validity
How well the test scores compare to what is being measured. To measure criterion-related validity, we compare our measurements with another way of measuring the same thing to see if we come up with the same results: compare the client's scores on the new tool to those of an established, standardized tool that measures the same thing. This is difficult due to the lack of standardized tools that have endured the appropriate statistical testing.

10 Measuring Validity Construct Validity
How well we have operationalized the different elements of our content so that they can be measured accurately. Asks whether we selected the right way to measure the content and criterion information. Example: test content intended to measure how well an individual can socialize with his/her peers.

11 Measuring Validity Clinical Validity
Measures how well test results can be used to predict future performance and health care outcomes. To do this, we take the measured level of performance as a baseline, then determine the possible meanings of the outcomes of treatment by reviewing the client's performance over time.

12 Measuring Validity Sampling Techniques and Validity
When selecting tests to use, it's important for the CTRS to understand the characteristics of the people who were (a) used to help measure the validity of the theory used to create the test and (b) used to define the scoring thresholds (criterion-referenced) or percentiles (norm-referenced).

13 Measuring Validity Sampling Techniques and Validity
Non-probability sampling: use of groups that do not represent the general population mix (assume this is the sampling used unless stated otherwise). Probability sampling: use of a sample in which there is an equal chance that any one member of the population could be selected. Very few of the testing tools used by CTRSs are based on probability sampling.
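The distinction above can be sketched in a few lines of Python. This is a minimal illustration, not from the Red Book; the population of 100 client IDs is invented:

```python
import random

population = list(range(1, 101))  # hypothetical population of 100 client IDs

random.seed(42)  # fixed seed so the sketch is reproducible

# Probability (simple random) sample: every member of the population
# has an equal chance of being selected.
prob_sample = random.sample(population, 10)

# Non-probability (convenience) sample: e.g., the first 10 clients who
# happened to be available -- not representative of the whole population.
convenience_sample = population[:10]

print(sorted(prob_sample))
print(convenience_sample)
```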

14 Determining Reliability
In statistics, the term reliability addresses the quality of performance of the testing tool, whereas validity addresses the quality of performance of the theory or concept that the test is based upon. Reliability coefficient: the degree to which a testing tool is able to demonstrate consistency.

15 Determining Reliability
Test-retest reliability is one of the most common methods of measuring reliability. It makes sure a client's scores do not change when there was no actual change in what was being measured: the same test is given twice to the same group, with a time interval ranging from a few minutes to a few years.

16 Determining Reliability
Inter-rater reliability: different therapists come up with the same findings when they observe the same situation. It generally requires two things: (1) that the test is written so that multiple professionals interpret performance exactly the same way, and (2) that the professionals administering and interpreting the test have been trained to always follow the same protocols and rating systems.
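The simplest way to quantify how often two raters agree is percent agreement. The chapter does not name a specific statistic, and fuller treatments use Cohen's kappa to correct for chance agreement; the ratings below are invented:

```python
# Ratings from two hypothetical therapists observing the same ten sessions
rater_a = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail", "pass", "pass"]
rater_b = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "fail", "fail", "pass"]

# Percent agreement: the fraction of sessions where both raters
# gave the same rating.
agreements = sum(a == b for a, b in zip(rater_a, rater_b))
percent_agreement = agreements / len(rater_a)
print(percent_agreement)  # 8 of 10 ratings match -> 0.8
```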

17 Determining Reliability
Equivalent forms reliability (AKA alternate form reliability): two forms of the same test are given to the same group in close succession, and the scores from the two forms are then compared. Uses the coefficient of equivalence.

18 Determining Reliability
Statistical methods have been developed so that a test's consistency can be measured after only one administration: Kuder-Richardson reliability, alpha reliability, and split-half reliability. Each measures the internal consistency of a testing tool: the test is given only once to a group, and the formula is applied to the scores on the test.

19 Determining Reliability
Alpha reliability (AKA Cronbach's alpha coefficient) indicates the degree to which the individual items in a test relate to one another. Split-half reliability: the test is given once, two equivalent halves are scored, and the scores are compared using the Spearman-Brown formula; this measures the coefficient of internal consistency and works best on tests that are not criterion-referenced.

20 Fairness Removing bias or stereotypes from the assessment that might significantly lower the scores of an individual or a subgroup. Example: the SAT's English/Reading section.

21 Correlation Coefficients
A correlation is a relationship between two different things such that when there is a change in one, there is a predictable change in the second. Correlations may be positive or negative; perfect scores are +1.0 or -1.0.

22 Correlation Coefficients
Factor analysis: a type of correlation coefficient that helps the developers of testing tools tell whether individual items within the test relate to each other, with either a positive or negative correlation. Therapists should look for tests with subscales to have subscale reliabilities of at least .70 (preferably .80 or better).

23 Correlation Coefficients
Bias is an error in measurement. Response bias: e.g., social desirability. Absolute threshold bias: occurs when the tester must rely on the client's impression of when a threshold has been passed, instead of being able to directly observe the client passing the threshold. Examples: pancakes; uphill hiking.

