Validity and Reliability in Instrumentation


Validity and Reliability in Instrumentation
47.469: Research I: Basics
Dr. Leonard, February 24, 2010

Recap
Research design can be…
- experimental or non-experimental (maybe quasi-experimental)
- basic or applied research
- laboratory or field setting
- quantitative or qualitative data collection
Research must be based in solid theory and testable hypotheses.
Research must include clear conceptual and operational definitions.

Quasi-experimental
- Increasingly common in psychology
- Applies experimental principles, like cause and effect or group comparison, in field or less controlled settings
- More like correlational research
- Less control over extraneous variables, but can take place outside the lab, which may decrease the artificial feeling
- Interpretation of results is not as clean as in experimental research, but closer to "real world" application

Scientific method
1. Formulate theories √
2. Develop testable hypotheses (operational definitions) √
3. Conduct research, gather data √
4. Evaluate hypotheses based on data
5. Cautiously draw conclusions

Next steps… gather data
Once you have explicitly clear conceptual and operational definitions to guide the research, you must develop your measures for collecting data.
- The operational definition proposes the type of measure.
- Instrumentation is the process of selecting or creating measures for a study (the measure is your instrument).
Two overarching goals for instrumentation:
- Validity: the extent to which a measure (operationally defined) taps the concept it is designed to measure and not some other concept
- Reliability: the consistency or stability of a measure, i.e., whether the same results are obtained if the measure is used again

Caveats
- We can never be certain of the validity (or reliability) of our instruments, so we try to estimate the degree of validity; we might claim "modest" or "partial" validity.
- It is hard to capture the true essence of a concept/construct, and some concepts/constructs are more elusive than others!
- An estimate of the validity of our measures depends on the purpose of the study; keep focused on the hypotheses and operational definitions!
Two types of validity we estimate:
- Judgmental validity
- Empirical validity

Types of validity: Judgmental Content validity: whether the concept being measured is a real concept AND whether the measurement being used is the most appropriate one to be using Is our operationally defined variable (concrete) really capturing the hypothetical concept (abstract) we are interested in studying? Are we capturing the central meaning? Concept Variable/ Measure Intelligence - what is central construct…mental flexibility, problem solving, speed of processing, knowledge of many content areas? Measure - ACT, SAT? THINK: Does our operationalization actually reflect the true theoretical meaning of the variable: Shyness scale– in which we ask people whether they enjoy and get energized from being with others– does that kind of question actually get at this construct of shyness.

Types of validity: Judgmental
Content validity, or any other single type of validity alone, is never enough to determine whether our measure is valid, so we consider other types…
Face validity: the measure seems valid because it makes sense; on the surface, it appears to tap the construct of interest.
- Face validity is neither sufficient nor absolutely necessary for overall validity, but it is a helpful clue.
- A measure could have high face validity but low content validity!
In face validity, you look at the operationalization and see whether "on its face" it seems like a good translation of the construct. This is probably the weakest way to try to demonstrate construct validity. For instance, you might look at a measure of math ability, read through the questions, and decide that, yes, it seems like a good measure of math ability (i.e., the label "math ability" seems appropriate for this measure).
Example: "What Is Your Emotional Intelligence Quotient?" Situation: You find out that the promotion you were hoping for was given to someone else. Do you:
- Lock yourself in your office and cry.
- Obsess over what the other person had that you didn't and compare yourself to him or her unmercifully.
- Continue to do your best; you know the next promotion is yours.
- Forget about it. You didn't want the job that much anyway.
Does this measure seem to have good face validity for capturing EQ?

Good face validity? Rosenberg Self-Esteem Scale
1 = Strongly Disagree, 7 = Strongly Agree
_____ 1. I feel that I am a person of worth, at least on an equal basis with others.
_____ 2. I feel that I have a number of good qualities.
_____ 3. All in all, I am inclined to think that I am a failure.*
_____ 4. I am able to do things as well as most people.
_____ 5. I feel that I do not have much to be proud of.*
_____ 6. I take a positive attitude towards myself.
_____ 7. On the whole, I am satisfied with myself.
_____ 8. I wish I could have more respect for myself.*
_____ 9. I certainly feel useless at times.*
_____ 10. At times I think I am no good at all.*
*Reverse scored
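
A minimal sketch of how the starred items would be reverse-scored before summing (assuming Python; the function name and the example responses are hypothetical, not from the scale's authors). On a 1-7 scale, reverse scoring maps a rating r to 8 - r:

REVERSED_ITEMS = {3, 5, 8, 9, 10}  # the starred items above

def total_self_esteem(responses):
    # responses: dict mapping item number (1-10) to a rating from 1 to 7
    total = 0
    for item, rating in responses.items():
        # Reverse-scored items flip the scale: 1 -> 7, 2 -> 6, ..., 7 -> 1
        total += (8 - rating) if item in REVERSED_ITEMS else rating
    return total

# A respondent who endorses every positive item (7) and rejects every
# negative item (1) gets the maximum total of 70.
example = {1: 7, 2: 7, 3: 1, 4: 7, 5: 1, 6: 7, 7: 7, 8: 1, 9: 1, 10: 1}
print(total_self_esteem(example))  # 70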

Types of validity: Empirical
Criterion-related validity: the extent to which your measure of a concept relates to a theoretically meaningful criterion for that concept, a "gold standard" for that concept.
- Predictive validity: the measure should be able to predict future behavior that is related to the concept. E.g., a job skills test and future ratings of performance.
- Concurrent (convergent) validity: the measure should be meaningfully related or correlated to some other measure of the behavior. E.g., scores on two different job skills tests.
Predictive or concurrent validity coefficient: a number (0-1) based on correlation that quantifies whether the measure is in fact related to other measures it should be related to.

Predictive Validity
Diagram: a job skills test (an indicator of qualification for the job) predicting future performance ratings; correlation coefficient = .60.
In predictive validity, we assess the operationalization's ability to predict something it should theoretically be able to predict. A high correlation would provide evidence for predictive validity; it would show that our measure can correctly predict something that we theoretically think it should be able to predict.
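
As a sketch of how such a validity coefficient is obtained (assuming Python; the six score pairs are invented for illustration and do not reproduce the .60 in the diagram), the coefficient is simply the Pearson correlation between test scores and the later criterion:

from statistics import mean, stdev

def pearson_r(x, y):
    # Pearson correlation: covariance divided by the product of the SDs
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

# Hypothetical data: job skills test at hiring vs. supervisor
# performance ratings collected a year later.
test_scores = [72, 85, 60, 90, 78, 66]
ratings = [3.1, 4.0, 2.8, 4.5, 3.6, 3.0]
print(round(pearson_r(test_scores, ratings), 2))  # 0.97: strong predictive validity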

Concurrent (convergent) Validity
Diagram: two job skills tests (A and B), both indicators of qualification for the job, correlated with each other.
In convergent validity, we examine the degree to which the operationalization is similar to (converges on) other operationalizations that it theoretically should be similar to.
BUT it is not enough just to show convergent validity. You also need to make sure the measure doesn't correlate with a bunch of other things it shouldn't correlate with. Take a more disconfirmatory approach: make sure it is unrelated to things it should be unrelated to. For example, if you developed a measure of social anxiety and found convergent validity because it correlated with a shyness measure, before you get excited you should make sure it doesn't correlate with other forms of anxiety, like test anxiety or other non-social forms. For the measure to be valid, it has to discriminate… which leads us to DISCRIMINANT VALIDITY.

Types of validity: Judgmental-Empirical
Construct validity represents a combined approach for estimating validity, using:
1) a subjective prediction about what other concepts (indicators) the concept being measured should relate to (it may relate positively OR negatively), and
2) an empirical test of whether the concept is in fact related to those other indicators.
E.g., depression should be linked to disengagement from schoolwork among college students, so test the relationship between depression scores and GPA in a sample of students. THE RELATIONSHIP COULD BE POSITIVE OR NEGATIVE.

Construct Validity exercise
- Take your heart rate for 30 seconds and multiply by 2; record it on a separate paper.
- Repeat.
- Average the two heart rate measurements.
- Turn in the paper.
- Complete the Manifest Anxiety Scale.
- Score it.
- Turn in the sheet.
Why is this a test of construct validity?

Reliability
The consistency or stability of a measure; easier to establish when the measure is unidimensional.
Related to validity? Yes! Generally, more valid measures tend to be more reliable, BUT you could have a highly reliable measure that is low in validity. Think of the example of a gun shooting at a target.
Like validity, reliability can be estimated by a correlation coefficient (0-1). Generally, to be respectable in the scientific community, reliability should be .80 (80%) or higher.
ASK: Can we think of examples in which a test is not valid, but it is reliable? (Measurement of eyesight as a measure of intelligence, as was actually done by Galton.)
Now that we have discussed these concepts generally, I want to look at them in more detail. This is a greater level of detail than is offered in your text; I think this is one place where telling you more will make MORE sense, so here we go.
Another way of thinking about reliability is to consider it from the other side. No psychological measure gives us exact results time after time, even when the underlying concept, what we think we are measuring, doesn't change. Fluctuations occur. These fluctuations are called "measurement errors." Highly reliable tests have little measurement error, whereas unreliable tests have a lot of measurement error.
Notice here that we have not established that the age-ism measure actually measures age-ism; we just learn whether it measures something consistently. Based on our assessment of reliability, we cannot make conclusions about what that SOMETHING is, only that it is being measured consistently.

Relationship between reliability and validity
- Is our measure RELIABLE? Does it have consistency and stability in measurement?
- Is our measure VALID? Does it measure what it's supposed to measure?
Validity is more important to a research study; reliability can't tell us whether we are measuring the correct concept, only whether we are measuring something consistently.

Classical Test Theory
X = T + E
An observed measurement (or score, X) is comprised of a true score (T, the score that would be obtained if there were no measurement error) and some random measurement error (E).
- X is the observed score
- T is the true score
- E is the measurement error
Reliability is the degree to which a measurement is consistent (reflects the true score, T) and does not contain measurement error (E).
T is theoretical and assumed to be a fixed value; E, the measurement error, will vary, and thus X, the observed score, will vary.
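
To make X = T + E concrete, here is a small simulation (a sketch assuming Python 3.10+ for statistics.correlation; all numbers are invented). Each person's true score T is fixed; each administration adds fresh random error E, and the smaller the error, the more two administrations of the same measure agree:

import random
from statistics import correlation  # Python 3.10+

random.seed(1)
true_scores = [random.gauss(100, 15) for _ in range(500)]  # fixed T per person

def administer(error_sd):
    # Observed score X = T + E, with E drawn anew on every administration
    return [t + random.gauss(0, error_sd) for t in true_scores]

for error_sd in (2, 10, 25):
    r = correlation(administer(error_sd), administer(error_sd))
    print(f"error SD {error_sd:>2}: test-retest r = {r:.2f}")
# Less measurement error -> X tracks T more closely -> higher reliability.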

Types of Reliability
- Test-retest reliability: consistent results from the same measure under the same conditions at two times (across time).
- Inter-rater reliability: consistent results when the same measure is given twice but with different test givers, or when two independent observers code some behavior (across raters or observers).
- Alpha reliability: individual items/questions from a scale measuring the same concept are correlated (across items).
- Split-half reliability: items from one part of a scale are correlated with, and measure the same concept as, items from another part.

Test-retest reliability (across time)

ID   Time 1   Time 2
1    18       19
2    12       13
3    29       28
4    25       —
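
Computing the test-retest coefficient for the complete pairs above (ID 4's Time 2 score is missing in the source, so it is left out), a sketch assuming Python 3.10+:

from statistics import correlation  # Python 3.10+

time1 = [18, 12, 29]  # IDs 1-3; ID 4 has no Time 2 score above
time2 = [19, 13, 28]
print(round(correlation(time1, time2), 2))  # ~1.0: scores very stable across time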

Inter-rater reliability (across raters)

ID   Rater 1   Rater 2   Rater 3
1    18        19        20
2    12        13        14
3    29        28        27
4    25        —         24

(The Patten text calls this INTER-OBSERVER reliability!)

Alpha reliability (across items)

ID   Item 1 (X1)   Item 2 (X2, reversed)   Item 3 (X3)   Item 4 (X4)   Item 5 (X5)   Item 6 (X6)
1
2
3
4
5

I WILL CALCULATE THIS FOR THE MAS FOR NEXT TIME, ALONG WITH CONSTRUCT VALIDITY OF MAS AND HEART RATE.
Sometimes called internal consistency.
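
The MAS numbers are promised for next time, but as a sketch of how coefficient alpha is computed in general (assuming Python; the five respondents and three items are invented), using the standard formula alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores):

from statistics import pvariance

def cronbach_alpha(items):
    # items: one list per item, each holding that item's scores across
    # respondents (reverse-scored items already flipped)
    k = len(items)
    item_var_sum = sum(pvariance(scores) for scores in items)
    totals = [sum(person) for person in zip(*items)]  # per-respondent totals
    return (k / (k - 1)) * (1 - item_var_sum / pvariance(totals))

# Invented responses: 5 people x 3 items on the same 1-5 scale
items = [
    [4, 2, 5, 3, 4],  # item 1
    [5, 2, 4, 3, 4],  # item 2
    [4, 1, 5, 2, 5],  # item 3
]
print(round(cronbach_alpha(items), 2))  # 0.92: items hang together well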

Split-half reliability (across items)

ID   First 1/2 of items (e.g., items 1-5)   Second 1/2 of items (e.g., items 6-10)
1
2
3

Sometimes called internal consistency.
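
A sketch of the split-half computation (assuming Python 3.10+; the half-scale totals are invented). Because each half is only half as long as the full scale, the half-half correlation is usually stepped up with the Spearman-Brown formula, a standard correction not named on the slide:

from statistics import correlation  # Python 3.10+

# Invented totals for each respondent on the two halves of the scale
first_half = [12, 9, 15, 7, 11, 14]
second_half = [11, 10, 14, 8, 12, 13]

r_half = correlation(first_half, second_half)
r_full = (2 * r_half) / (1 + r_half)  # Spearman-Brown: full-scale reliability
print(round(r_half, 2), round(r_full, 2))  # ~0.96 -> ~0.98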

The more, the better
As with validity, it is always better if you can estimate or test for multiple forms of reliability!
A related approach is parallel-forms reliability, used when the measure is available in more than one version: give both versions and then compare the results.

Valid? Reliable?
Concept: Parental engagement in child's academic development
How often do you help your child with his/her homework? (please check one)
_ Never   _ Rarely   _ Sometimes   _ Often   _ Every day

Concept? Valid? Reliable?
Is there a chance that you could get HIV/AIDS? (circle one number)
1 = Not at all   2 = Small chance   3 = Yes, definitely
Do you worry about getting HIV/AIDS? (circle one number)
1 = Never   2 = Almost never   3 = Sometimes   4 = Often   5 = Very often

Concept? Valid? Reliable?
How important is financial success to you?
_ Very important   _ Somewhat important   _ Not at all important
How important is it for you to have nice things?

Our total MAS scores and average heart rate were only correlated at -.04; correlational relationships can have a magnitude from 0 to 1 but also a direction (+ or -).
Good construct validity? Why?
Good predictive validity? Why?

Beginning APA style for your proposal
Author last name, author first and middle initials. (Year published). Title of article. Title of Journal, Volume number(Issue number), pp.-pp.
Morelli, G. A., Rogoff, B., Oppenheim, D., & Goldsmith, D. (1992). Cultural variations in infants' sleeping arrangements: Questions of independence. Developmental Psychology, 28(4), 604-613.
Put the following three articles into APA-style references.

Three APA references
Samuolis, J., Layburn, K., & Schiaffino, K. M. (2001). Identity development and attachment to parents in college students. Journal of Youth and Adolescence, 30(3), 373-383.
Tripodi, S. J., Bender, K., Litschge, C., & Vaughn, M. G. (2010). Interventions for reducing adolescent alcohol abuse: A meta-analytic review. Archives of Pediatrics & Adolescent Medicine, 164(1), 85-91.