Measuring Research Variables


Measuring Research Variables
Chapter 11, Research Methods in Physical Activity

Validity
Validity refers to the soundness of the interpretation of scores from a test; it is the most important consideration in measurement. It is the degree to which a test or instrument measures what it purports to measure. Because measures serve different purposes, there are different kinds of validity. The four basic types are logical, content, criterion, and construct validity.
Logical validity: the degree to which a measure obviously involves the performance being measured; also known as face validity.
Content validity: the degree to which a test (usually in an educational setting) adequately samples what was covered in the course.
Criterion validity: the degree to which scores on a test are related to some recognized standard or criterion. The two main types of criterion validity are concurrent validity and predictive validity.

Validity
Criterion validity (continued):
Concurrent validity: a type of criterion validity in which a measuring instrument is correlated with a criterion that is administered concurrently, at about the same time.
Predictive validity: the degree to which scores on predictor variables accurately predict criterion scores. Note that shrinkage, a reduction in predictive ability, can occur when a prediction equation formed from one sample is applied to another sample. Shrinkage becomes important when the two samples differ in demographics or when the original prediction formula was derived from a small sample. It can be addressed by cross-validating the prediction equation from the original sample on the new sample, as sketched below.
Cross-validation: a technique for assessing the accuracy of a prediction formula in which the formula is applied to a sample not used in developing it.
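
A minimal sketch of cross-validation and shrinkage. The data, the use of NumPy, and the least-squares fit are all assumptions for illustration; the text does not prescribe an implementation. A prediction equation is formed on one sample and then applied, unchanged, to a second sample; the drop in the correlation R between predicted and actual criterion scores is the shrinkage.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical data: a criterion score predicted from two predictor variables.
    X_dev = rng.normal(size=(50, 2))                  # development sample
    y_dev = 2.0 * X_dev[:, 0] - X_dev[:, 1] + rng.normal(size=50)
    X_new = rng.normal(size=(50, 2))                  # cross-validation sample
    y_new = 2.0 * X_new[:, 0] - X_new[:, 1] + rng.normal(size=50)

    # Form the prediction equation by least squares on the development sample only.
    A_dev = np.column_stack([np.ones(len(X_dev)), X_dev])
    coefs, _, _, _ = np.linalg.lstsq(A_dev, y_dev, rcond=None)

    def multiple_r(X, y, b):
        """Correlation between predicted and actual criterion scores."""
        y_hat = np.column_stack([np.ones(len(X)), X]) @ b
        return np.corrcoef(y_hat, y)[0, 1]

    r_dev = multiple_r(X_dev, y_dev, coefs)   # R in the original sample
    r_new = multiple_r(X_new, y_new, coefs)   # R with the same equation applied to the new sample
    print(f"R development = {r_dev:.3f}, R cross-validation = {r_new:.3f}")
    print(f"Shrinkage = {r_dev - r_new:.3f}")  # typically positive, larger for small development samples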

Validity
Construct validity: the degree to which a test measures a hypothetical construct, usually established by relating test results to some behavior. For example, certain behaviors are expected of someone with a high degree of sportsmanship, such as complimenting the opponent on shots made during a tennis match. For an indication of construct validity, a test maker could compare how often a person scoring high on a sportsmanship test compliments the opponent with how often a person scoring lower does so.
The known-groups difference method is sometimes used to establish construct validity: the test scores of groups that should differ on a trait or ability are compared. For example, if sprinters and jumpers score significantly better than distance runners on a test designed to measure anaerobic power, this finding provides some evidence that the test measures anaerobic power.
An experimental approach is occasionally used to demonstrate construct validity. For example, a test of cardiovascular fitness might be assumed to have construct validity if it reflects gains in fitness following a conditioning program.

Reliability
An integral part of validity is reliability, which pertains to the consistency, or repeatability, of a measure. A test cannot be valid if it is not reliable: if you cannot depend on successive trials to yield similar results, the test cannot be trusted.
Test reliability is often discussed in terms of observed, true, and error scores:
The test score obtained by an individual is the observed score.
An observed score theoretically consists of the person's true score plus an error score.
Expressed as variance, observed score variance consists of true score variance plus error score variance.
The goal of the tester is to remove error so that the true score remains. Because true score variance is never known, it is estimated by subtracting error variance from observed score variance. The reliability coefficient (discussed later) is the ratio of true score variance to observed score variance; it reflects the degree to which the measurement is free of error variance.
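
A worked example with hypothetical variance figures, showing how the reliability coefficient follows directly from the decomposition above:

    observed_variance = 100.0   # variance of the observed scores
    error_variance = 20.0       # estimated error variance
    # True variance is never known directly; it is estimated by subtraction.
    true_variance = observed_variance - error_variance

    reliability = true_variance / observed_variance
    print(reliability)   # 0.8: 80% of observed score variance is true score variance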

Reliability: Sources of Error
Measurement error can come from four sources: the participant, the testing, the scoring, and the instrumentation.
Participant error: measurement error associated with the participant, including mood, motivation, fatigue, health, fluctuations in memory and performance, previous practice, specific knowledge, and familiarity with the test items.
Testing error: error related to how clear and complete the directions are, how rigidly the instructions are followed, and whether supplementary directions or motivation are applied.

Reliability: Sources of Error
Scoring error: errors in scoring relate to the competence, experience, and dedication of the scorers and to the nature of the scoring itself. The extent to which the scorer is familiar with the behavior being tested and with the test items can greatly affect scoring accuracy; carelessness and inattention to detail produce measurement error.
Instrumentation error: measurement error due to instrumentation includes such obvious causes as inaccuracy and lack of calibration of mechanical and electronic equipment. It also covers the inadequacy of a test to discriminate between abilities and the difficulty of scoring some tests.

Reliability Coefficient
Expression of reliability: the degree of reliability is expressed by a correlation coefficient ranging from 0.00 to 1.00. The closer the coefficient is to 1.00, the less error variance it reflects and the more the true score is assessed.
Interclass correlation: a bivariate statistic, meaning it is used to correlate two different variables. It is not appropriate for establishing reliability, because in reliability two values of the same variable are being correlated (when a test is given twice, the scores on the first administration are correlated with the scores on the second to determine their consistency).
Intraclass correlation: the procedures leading to the intraclass correlation (R) are the same as those of simple ANOVA with repeated measures (see Table 11.2, p. 199). The F statistic for trials determines whether there was a significant difference among the three trials of the same measure. The intraclass correlation is calculated on p. 200; note that the best way to increase R is to decrease the residual, that is, to remove unexplained variance.
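
A minimal sketch of the repeated-measures ANOVA route to R, using hypothetical scores for five participants over three trials. The final formula, R = (MS_subjects - MS_residual) / MS_subjects, is one common form and is an assumption here; the text's Table 11.2 develops its own calculation.

    import numpy as np

    # Rows = participants, columns = trials (hypothetical scores).
    scores = np.array([
        [9.0, 9.5, 9.0],
        [7.0, 7.5, 8.0],
        [6.0, 6.5, 6.0],
        [8.0, 8.5, 9.0],
        [5.0, 5.5, 5.0],
    ])
    n, k = scores.shape
    grand_mean = scores.mean()

    # Repeated-measures ANOVA sums of squares.
    ss_subjects = k * ((scores.mean(axis=1) - grand_mean) ** 2).sum()
    ss_trials = n * ((scores.mean(axis=0) - grand_mean) ** 2).sum()
    ss_total = ((scores - grand_mean) ** 2).sum()
    ss_residual = ss_total - ss_subjects - ss_trials

    ms_subjects = ss_subjects / (n - 1)
    ms_residual = ss_residual / ((n - 1) * (k - 1))

    # The smaller the residual (unexplained) variance, the larger R.
    R = (ms_subjects - ms_residual) / ms_subjects
    print(f"R = {R:.3f}")   # 0.988 for these data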

Methods of Establishing Reliability
Stability: a coefficient of reliability measured by the test-retest method on different days. In the test-retest method, the test is given one day and then repeated a day or so later. Intraclass correlation should be used to compute the coefficient of stability of the scores on the two administrations.
Alternate-forms method: a method of establishing reliability that involves constructing two tests that supposedly sample the same material; it is sometimes called the parallel-forms method or the equivalence method. The two tests are given to the same individuals, ordinarily with some time elapsing between the two administrations, and the scores on the two tests are then correlated to obtain a reliability coefficient.

Methods of Establishing Reliability
Internal consistency: an estimate of reliability that represents the consistency of scores within a test.
Same-day test-retest method: a method of establishing reliability in which a test is given twice to the same participants on the same day.
Split-half technique: a method of estimating reliability in which the test is divided in two, usually by making the odd-numbered items one half and the even-numbered items the other half, and the two halves are then correlated, as sketched below.
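
A minimal sketch of the split-half technique on hypothetical right/wrong (1/0) item scores. The Spearman-Brown step-up applied at the end is an assumption: it is the usual correction for the fact that each half is only half as long as the full test, but the text may present a different correction (such as the Flanagan method on the next slide).

    import numpy as np

    # Rows = examinees, columns = test items (hypothetical dichotomous scores).
    items = np.array([
        [1, 1, 0, 1, 1, 0, 1, 1],
        [1, 0, 0, 1, 0, 0, 1, 0],
        [1, 1, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 1, 0, 0, 0, 1],
        [1, 1, 0, 0, 1, 1, 1, 0],
        [1, 1, 1, 1, 0, 1, 1, 1],
    ])
    odd_half = items[:, 0::2].sum(axis=1)    # odd-numbered items (1, 3, 5, 7)
    even_half = items[:, 1::2].sum(axis=1)   # even-numbered items (2, 4, 6, 8)
    r_halves = np.corrcoef(odd_half, even_half)[0, 1]

    # The half-test correlation understates full-test reliability;
    # the Spearman-Brown formula estimates reliability at full length.
    r_full = 2 * r_halves / (1 + r_halves)
    print(f"r between halves = {r_halves:.3f}, estimated full-test r = {r_full:.3f}")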

Methods of Establishing Reliability
Internal consistency (continued):
Flanagan method: a process for estimating reliability in which the test is split into two halves and the variances of the halves are analyzed in relation to the total variance of the test (see example 11.3, p. 202).
Kuder-Richardson (KR) method of rational equivalence: formulas developed for estimating the reliability of a test from a single administration. Only one administration is required, and no correlation is calculated; the resulting coefficient represents an average of all possible split-half reliability coefficients.
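
A minimal sketch of KR-20, one of the Kuder-Richardson formulas, on hypothetical right/wrong item scores. The choice of KR-20 specifically and the use of the sample variance are assumptions; conventions vary across texts.

    import numpy as np

    # Rows = examinees, columns = items scored 1 (correct) or 0 (incorrect).
    items = np.array([
        [1, 1, 0, 1, 1, 0, 1, 1],
        [1, 0, 0, 1, 0, 0, 1, 0],
        [1, 1, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 1, 0, 0, 0, 1],
        [1, 1, 0, 0, 1, 1, 1, 0],
        [1, 1, 1, 1, 0, 1, 1, 1],
    ])
    k = items.shape[1]                     # number of items
    p = items.mean(axis=0)                 # proportion answering each item correctly
    q = 1.0 - p
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total test scores

    kr20 = (k / (k - 1)) * (1 - (p * q).sum() / total_var)
    print(f"KR-20 = {kr20:.3f}")   # about 0.81 for these data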

Methods of Establishing Reliability
Intertester reliability (objectivity): the degree to which different testers can achieve the same scores on the same subjects. It can be established by having more than one tester gather data and then analyzing the scores with intraclass correlation techniques to obtain an intertester reliability coefficient. With observational data, this approach typically involves a coding instrument and the computation of interobserver agreement (see formula 11.4, p. 203).
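
A minimal sketch of interobserver agreement on hypothetical coded observations. The text's formula 11.4 is not reproduced above; the form used here, agreements divided by agreements plus disagreements, is the conventional one and is an assumption.

    # Two raters code the same six observed behaviors (hypothetical data).
    codes_rater_a = ["run", "walk", "jump", "walk", "run", "jump"]
    codes_rater_b = ["run", "walk", "jump", "run",  "run", "jump"]

    agreements = sum(a == b for a, b in zip(codes_rater_a, codes_rater_b))
    disagreements = len(codes_rater_a) - agreements
    ioa = agreements / (agreements + disagreements)
    print(f"Interobserver agreement = {ioa:.2f}")   # 5/6 = 0.83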

Standard Scores to Compare Performance
(Also refer to Table 2 in the appendix.)
z scores (see p. 205 for an example): the basic standard score is the z score. The z scale converts raw scores to units of standard deviation, with a mean of 0 and a standard deviation of 1.0. The formula is z = (X - M) / s.
T scale (see p. 205 for an example): a type of standard score that sets the mean at 50 and the standard deviation at 10, removing the decimals found in z scores and making all scores positive.
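
A short sketch converting hypothetical raw scores to z and T scores; the step T = 50 + 10z follows directly from the T-scale definition above.

    import numpy as np

    raw = np.array([55.0, 62.0, 47.0, 70.0, 66.0])   # hypothetical raw scores
    M = raw.mean()
    s = raw.std(ddof=1)     # sample standard deviation

    z = (raw - M) / s       # z scale: mean 0, standard deviation 1
    T = 50 + 10 * z         # T scale: mean 50, standard deviation 10
    print(np.round(z, 2))
    print(np.round(T, 1))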

Measuring Affective Behavior
To be continued. (Exam three will include information up to this point. The remaining information from Chapter 11, on scales for measuring affective behavior, will be covered in class along with the information from Chapter 15 on survey research, and will be included in exam four.) Chapter 11 information continues on the next slide.

Measuring Affective Behavior
Affective behavior includes attitudes, personality, anxiety, self-concept, social behavior, and sportsmanship.
Scales for measuring affective behavior:
Likert-type scale: a type of closed question that requires the participant to respond by choosing one of several scaled responses; the intervals between response categories are assumed to be equal. Example: "I prefer quiet recreational activities such as chess, cards, or checkers rather than activities such as running, tennis, or basketball." Response options: strongly agree, agree, undecided, disagree, strongly disagree.

Measuring Affective Behavior
Benefits of Likert-type scales: a principal advantage of scaled responses such as the Likert type is that they permit a wider range of expression than responses such as "always" or "never," or "yes" or "no." The five, seven, or more intervals may help increase the reliability of the instrument.
Semantic differential scale: a scale used to measure affective behavior in which the respondent makes judgments about certain concepts by choosing one of seven intervals between bipolar adjectives (see the example in the text, p. 208).

Measuring Affective Behavior
Rating scales: measures of behavior that involve a subjective evaluation based on a checklist of criteria. Raters are usually experts on the criterion measure. When more than one judge rates performances, some common standards must be set.
Rating errors:
Leniency: the tendency of observers to be overly generous in their ratings.
Central tendency error: the inclination of a rater to give an inordinate number of ratings in the middle of the scale, avoiding its extremes.
Halo effect: a threat to internal validity wherein raters allow previous impressions or knowledge about an individual to influence all ratings of that individual's behaviors.

Measuring Affective Behavior
Rating errors (continued):
Proximity error: the inclination of a rater to consider behaviors to be more nearly the same when they are listed close together on a scale than when they are separated by some distance. For example, if the qualities "active" and "friendly" are listed side by side, proximity errors result if raters evaluate performers as more similar on those characteristics than they would if the two qualities were listed several lines apart.
Observer bias error: the inclination of a rater to be influenced by his or her own characteristics and prejudices. Observer bias errors are directional: they produce errors that are consistently too high or too low.

Measuring Affective Behavior
Rating errors (continued):
Observer expectation error: the inclination of a rater to see evidence of expected behaviors and to interpret observations in the expected direction. Expectations can contaminate ratings because a person who expects certain behaviors is already inclined to see evidence of them. In a research setting, observer expectation errors are likely when the observer knows the experimental hypotheses and is thus inclined to watch for those outcomes more closely than an observer unaware of the expected outcomes would.

Measuring Knowledge: Item Analysis
Item analysis: the process of evaluating the suitability of the items on a knowledge test and their ability to discriminate. The purpose of item analysis is to determine which test items are suitable and which need to be rewritten or discarded. Its two important parts are:
analyzing the difficulty of the items on the test, and
determining the degree of item discrimination.

Measuring Knowledge: Item Analysis
Item difficulty: an analysis of the difficulty of each item on a knowledge test, determined by dividing the number of people who answered the item correctly by the total number of people who responded to it. (The more difficult the item, the lower its difficulty index.) Most testing authorities recommend eliminating questions with difficulty indices below .10 or above .90; the best questions have difficulty indices around .50.
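
A worked example of the difficulty index with hypothetical counts:

    answered_correctly = 30     # hypothetical: people who answered the item correctly
    total_responses = 50        # hypothetical: people who responded to the item
    difficulty_index = answered_correctly / total_responses
    print(difficulty_index)     # 0.60 -- near the ideal .50, so the item would be kept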

Measuring Knowledge: Item Analysis
Item discrimination: the degree to which a test item discriminates between people who did well on the entire test and those who did poorly; also called the index of discrimination. It can be calculated by dividing the completed tests into a high-scoring group and a low-scoring group and then applying the formula
index of discrimination = (nH - nL) / n
where nH is the number of high scorers who answered the item correctly, nL is the number of low scorers who answered the item correctly, and n is the number in either the high or the low group. For example, with 30 in the high group and 30 in the low group, if 20 of the high scorers and 10 of the low scorers answered an item correctly, the index of discrimination would be (20 - 10) / 30 = .33.
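
The text's worked example, expressed in code:

    n_high_correct = 20   # high scorers who answered the item correctly
    n_low_correct = 10    # low scorers who answered the item correctly
    n_per_group = 30      # number in either the high or the low group
    discrimination = (n_high_correct - n_low_correct) / n_per_group
    print(round(discrimination, 2))   # 0.33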

End of Presentation