Unit 5: Improving and Assessing the Quality of Behavioral Measurement


1 Unit 5: Improving and Assessing the Quality of Behavioral Measurement
PS 522: Behavioral Measures and Interpretation of Data Lisa R. Jackson, Ph.D.

2 Indicators of Trustworthy Measurement
Validity
Directly measures a socially significant behavior
Measures a dimension of the behavior relevant to the question
Helps you determine that you are testing what you think you are testing

3 Indicators of Trustworthy Measurement
Accuracy
Observed values match the true values of an event
Reliability
Measurement yields the same values across repeated measurements of the same event
This is a measure of consistency and stability
Does it reliably produce the same results over time? Is your assessment reliable within itself? If you use different versions, are they all equally reliable?

4 Types of Validity: Face Validity
This is all about how the test looks, not what it really measures: does the test measure what the test items suggest it will?
High face validity can increase test-takers' confidence, interest, and motivation
Low face validity can disguise the real purpose of the test to decrease self-report bias; good for measuring something people don't want to admit to
It is the least scientifically rigorous form of validity

5 Types of Validity: Content Validity
How well the test samples behavior representative of the whole universe of behavior it is designed to sample
Ex: A test of depression needs to measure thoughts, emotions, motivation, and behavior, not just mood
Ex: An assessment for autism cannot just assess language skills
Affected by who writes the test

6 Types of Validity: Criterion Related
This is an external measure: how well does the test measure up against another standard? How well can a score on this test be used to infer an individual's likely standing on the criterion?
Concurrent: How well is a score on this test correlated with an individual's standing on a criterion of interest in the present? Ex: If you score low on one depression test, do you also score low on the Beck Depression Inventory?
Predictive: How well is a score on this test correlated with an individual's standing on a criterion of interest in the future? Ex: Does the SAT predict college achievement?

7 Types of Validity: Construct Validity
How appropriate are conclusions drawn from test scores regarding an individual’s standing on a construct? Does this test really assess the construct? This takes time to determine. Ex: Over time, the Beck Depression Inventory has acquired construct validity. Most experts agree that it tests for depression.

8 Threats to Measurement Validity
Indirect measurement
Measuring a behavior other than the behavior of interest
Example: Using children's responses to a questionnaire as a measure of how often and how well they get along with their classmates
Measuring a dimension that is irrelevant or ill-suited to the reason for measuring behavior
Example: Using a ruler in a pot of water to measure temperature
Example: Trying to measure reading endurance in oral reading by counting the number of correct and incorrect words read, but not counting how long the student read

9 Types of Reliability: Internal Consistency
This one is really all about the questions, not the test as a whole. The goal is to see whether the questions are consistent with one another.
Do test items reliably produce the same results?
Do all of the questions measure the same idea?
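One common statistic for internal consistency is Cronbach's alpha. The slide does not name a specific statistic, so this is an added illustration, not the course's prescribed method; the data are made up.

```python
def cronbach_alpha(item_scores):
    """Cronbach's alpha for rows of per-respondent item scores."""
    k = len(item_scores[0])               # number of items
    def var(xs):                          # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    item_vars = [var([row[i] for row in item_scores]) for i in range(k)]
    total_var = var([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Four respondents answering three items on a 1-5 scale (made-up data).
# High alpha here means the items rise and fall together.
scores = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [1, 2, 1]]
alpha = cronbach_alpha(scores)
```

Values near 1 indicate that the items measure the same idea; values near 0 indicate they do not.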

10 Types of Reliability: Split Half
Administer the test once, then divide the items into two equal halves and score each half separately
The goal is to judge the consistency of the test
Are the two halves strongly correlated with each other? If they are, the test is reliable
One way is odd/even reliability: questions 1, 3, 5, 7 in one half; questions 2, 4, 6, 8 in the other
Use when it is impractical to create two tests or schedule two administrations; saves time and expense
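The odd/even split can be sketched in code: correlate the two half-scores, then apply the Spearman-Brown correction (a standard adjustment for test length, not named on the slide) to estimate full-test reliability. The data are made up.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def split_half_reliability(item_scores):
    """Odd/even split-half reliability with Spearman-Brown correction."""
    odd = [sum(row[0::2]) for row in item_scores]    # items 1, 3, ...
    even = [sum(row[1::2]) for row in item_scores]   # items 2, 4, ...
    r = pearson_r(odd, even)
    return 2 * r / (1 + r)   # corrects the half-length correlation upward

# Each row: one person's scores on a four-item test (made-up data)
scores = [[1, 1, 1, 1], [3, 2, 3, 2], [5, 4, 5, 5], [2, 2, 1, 2]]
rel = split_half_reliability(scores)
```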

11 Types of Reliability: Test/Retest
Using the same instrument to measure the same thing at two different points in time
The goal is to determine the consistency of the test over time
Does it produce similar results each time? If so, the test is reliable
Good for a stable construct such as personality
Won't work to assess a reading test if subjects improve in reading between administrations

12 Types of Reliability: Parallel and Alternate Forms
A make-up test is an example: you might not want to give the same test twice
Minimizes memory effects when administering a test to the same person again; the person won't have memorized answers from one testing to the next
The mean score on each form must be equivalent to the original
Can be time-consuming and expensive

13 Assessing the Reliability of Measurement
Measurement is reliable when it yields the same values across repeated measures of the same event
Not the same as accuracy
Reliable application of the measurement system is important
Requires permanent products for re-measurement
Low reliability signals suspect data

14 Using Interobserver Agreement to Assess Behavioral Measurement
The degree to which two or more independent observers report the same values for the same events
Determine competence of new observers
Detect observer drift
Judge clarity of definitions and system
Increase believability of data

15 Requisites for IOA Observers must:
Use the same observation code and measurement system Observe and measure the same participants and events Observe and record independently of one another

16 Methods for Calculating IOA
Percentage of agreement is the most common way to calculate IOA
Event recording methods compare:
Total count recorded by each observer
Mean count-per-interval
Exact count-per-interval
Trial-by-trial
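Total count IOA is conventionally calculated as the smaller total divided by the larger total, times 100. A minimal sketch (the counts are made up):

```python
def total_count_ioa(count_a, count_b):
    """Total count IOA: smaller total / larger total * 100."""
    if count_a == count_b:        # includes the case where both are zero
        return 100.0
    return min(count_a, count_b) / max(count_a, count_b) * 100

# Observer A recorded 45 responses in a session; observer B recorded 50
ioa = total_count_ioa(45, 50)
```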

17 Methods for Calculating IOA
Timing recording methods:
Total duration IOA
Mean duration-per-occurrence IOA
Latency-per-response
Mean IRT-per-response
Interval recording and time sampling:
Interval-by-interval IOA (point-by-point)
Scored-interval IOA
Unscored-interval IOA
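Interval-by-interval IOA divides the number of intervals in which the two observers agree by the total number of intervals (agreements plus disagreements) and multiplies by 100. A minimal sketch with made-up interval records:

```python
def interval_by_interval_ioa(obs_a, obs_b):
    """Agreements / (agreements + disagreements) * 100 over matched intervals."""
    agreements = sum(1 for a, b in zip(obs_a, obs_b) if a == b)
    return agreements / len(obs_a) * 100

# 1 = behavior scored in that interval, 0 = not scored (made-up session)
a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
b = [1, 0, 1, 0, 0, 0, 1, 1, 1, 1]
ioa = interval_by_interval_ioa(a, b)   # observers disagree on 2 of 10 intervals
```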

18 Considerations in IOA
Obtain and report IOA at the same levels at which researchers will report and discuss results in the study
For each behavior
For each participant
In each phase of intervention or baseline

19 Considerations in IOA
Believability of data increases as agreement approaches 100%
There is a history of using 80% agreement as an acceptable benchmark
The acceptable level depends upon the complexity of the measurement system

20 Considerations in IOA: Reporting IOA
Narrative form
Table
Graphs
In all formats, report how, when, and how often IOA was assessed

21 Assessing the Accuracy and Reliability of Behavioral Measurement
First, design a good measurement system
Second, train observers carefully
Third, evaluate the extent to which data are accurate and reliable (measure the measurement system)
Accuracy means the observed values match the true values of an event
You can't base research conclusions or treatment decisions on faulty data

22 Assessing the Accuracy of Measurement
Four purposes of accuracy assessment:
Determine if data are good enough to make decisions
Discover and correct measurement errors
Reveal consistent patterns of measurement error
Assure consumers that data are accurate

23 Accuracy Assessment Procedures
Measurement is accurate when observed values match true values
Accuracy is determined by calculating the correspondence of each data point with its true value
The process for determining true values must differ from the measurement procedures themselves
Accuracy assessment should be reported in research
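The correspondence calculation can be sketched as the percentage of observed data points that match their true values. The function name, the tolerance parameter, and the data are my own illustration, not a procedure from the slides:

```python
def accuracy(observed, true_values, tolerance=0):
    """Percentage of observed data points matching their true values.

    tolerance lets a near-miss (e.g. within 1 unit) count as a match
    when exact equality is too strict for the measurement scale.
    """
    matches = sum(1 for o, t in zip(observed, true_values)
                  if abs(o - t) <= tolerance)
    return matches / len(true_values) * 100

# Observer's counts vs. true counts from a permanent product (made-up data)
acc = accuracy([12, 9, 15, 7], [12, 10, 15, 7])
```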

24 Threats to Measurement Accuracy and Reliability
Inadequate observer training
Training should be explicit and systematic
Select observers carefully
Train observers to a competency standard
Provide ongoing training to minimize observer drift

25 Threats to Measurement Accuracy and Reliability
Unintended influences on observers
Observer expectations of what the data should look like
Observer reactivity when he or she is aware that others are evaluating the data
Measurement bias
Feedback to observers about how their data relate to the goals of intervention

26 Final Points
The data that you gather are used to help improve the lives of real people
Others may use your data as the basis of intervention
Measurements need to be accurate, consistent, and relevant

27 Questions?? Thanks for participating!
I am sure you have been asking questions here in seminar! Great job! But if you have more, email me. These slides are posted in the Doc Sharing area for your review.

