Reliability and Validity


Reliability and Validity. Chapter 4. Terry Overton, Assessing Learners with Special Needs, 5e. Copyright ©2006 by Pearson Education, Inc., Upper Saddle River, New Jersey 07458. All rights reserved.

Reliability: having confidence in the consistency of the test results. Validity: having confidence that the test is measuring what it is supposed to measure.

Correlation: a statistical method of observing the degree of relationship between two sets of data or two sets of variables. Correlation coefficient: the numerical representation of the strength and direction of the relationship between two sets of variables or data, ranging from −1.00 to +1.00 (it can be positive or negative).

General Principles of Research: Correlational Studies. A positive correlation (+) means that as one variable increases, so does the other. A zero or near-zero correlation means that the variables have no linear relationship.

The direction of the line plotted on the scatter plot provides a clue about the relationship. If the relationship is positive, the line slopes upward from left to right.

General Principles of Research: Correlational Studies. The value of the correlation coefficient can range from −1.00 to +1.00. The higher the absolute value, the stronger the relationship, regardless of direction. A negative correlation (−) means that as one variable increases, the other decreases.

If the data represent a negative correlation, the line in the scatter plot slopes downward from left to right.

Concept Check: The greater the score on a depression inventory, the lower the score on a memory test. What is the direction of this relationship? Answer: negative.

Concept Check: Which relationship is stronger: +.85, −.90, or 1.2? Answer: −.90 (1.2 is not a possible coefficient, since r cannot exceed 1.00 in absolute value). Which is stronger: .25, −.60, or +.63? Answer: +.63.

Figure 2.9: A strong correlation between depression and impaired sleep does not tell us whether depression interferes with sleep, whether poor sleep leads to depression, or whether another problem leads to both.

Positive Correlation. In investigating how data are related, it is important to determine whether two sets of data represent a positive, negative, or no correlation. In a positive correlation, a student who scores high on the first variable or test also scores high on the second measure. The data below illustrate a positive correlation:

Student 1: Set 1 = 75, Set 2 = 72
Student 2: Set 1 = 88, Set 2 = 89
Student 3: Set 1 = 90, Set 2 = 93
Student 4: Set 1 = 63, Set 2 = 64

Negative Correlation. In a negative correlation, a student who scores high on one variable or test scores low on the other variable or test. The data below illustrate a negative correlation:

Student 1: Set 1 = 88, Set 2 = 32
Student 2: Set 1 = 99, Set 2 = 12
Student 3: Set 1 = 56, Set 2 = 45
Student 4: Set 1 = 97, Set 2 = 15

Correlation: Height and Temperament of 20 Men. [Table: height in inches and temperament score for each of 20 subjects; the data columns did not survive the transcript.]

[Figure: scatterplot of height in inches (x-axis, roughly 55 to 85) against temperament scores (y-axis, roughly 25 to 95).]

Pearson correlation coefficient. r, the Pearson coefficient, measures the amount that two variables (X and Y) vary together (i.e., covary), taking into account how much they vary separately. Pearson's r is the most common correlation coefficient; there are others.

Computing the Pearson correlation coefficient. Conceptually, r is the degree to which X and Y vary together divided by the degree to which X and Y vary separately. To put it another way: r = (covariability of X and Y) / (variability of X and Y separately). In symbols: r = SP / √(SS_X × SS_Y).

Sum of Products of Deviations. Measuring X and Y individually (the denominator): compute the sum of squares for each variable, SS_X = Σ(X − M_X)² and SS_Y = Σ(Y − M_Y)². Measuring X and Y together: the sum of products, SP. Definitional formula: SP = Σ(X − M_X)(Y − M_Y). Computational formula: SP = ΣXY − (ΣX)(ΣY)/n, where n is the number of (X, Y) pairs.

Correlation coefficient: the equation for Pearson's r is r = SP / √(SS_X × SS_Y). In expanded form: r = [nΣXY − (ΣX)(ΣY)] / √([nΣX² − (ΣX)²][nΣY² − (ΣY)²]).

Example: What is the correlation between study time and test score? [The worked example's data table, and the slides calculating the SS and SP values and r from it, did not survive the transcript.]
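Since the worked example's numbers did not survive the transcript, the calculation can be sketched with hypothetical study-time and test-score data (the values below are invented for illustration), using the SS and SP formulas from the preceding slides:

```python
from math import sqrt

# Hypothetical data standing in for the lost example table:
# hours of study time (X) and test scores (Y) for five students.
X = [1, 2, 3, 4, 5]          # study time in hours (invented values)
Y = [55, 65, 70, 80, 85]     # test scores (invented values)
n = len(X)

mean_x = sum(X) / n
mean_y = sum(Y) / n

# Sums of squares for each variable (definitional formulas)
SS_x = sum((x - mean_x) ** 2 for x in X)
SS_y = sum((y - mean_y) ** 2 for y in Y)

# Sum of products of deviations
SP = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))

# Pearson correlation coefficient: r = SP / sqrt(SS_x * SS_y)
r = SP / sqrt(SS_x * SS_y)
print(round(r, 3))  # → 0.993
```

Any (X, Y) pairs can be substituted; the same steps produce the SS_X, SS_Y, and SP values described on the preceding slides.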

Correlation Coefficient Interpretation (based on the absolute value of r):
.00–.20: Very Low
.20–.40: Low
.40–.60: Moderate
.60–.80: High Moderate
.80–1.00: Very High
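These ranges can be expressed as a small helper. Note that the table does not say which label a value exactly on a boundary receives, so treating each upper bound as inclusive here is an assumption:

```python
def strength_label(r):
    """Return the interpretation-table label for a correlation
    coefficient, judged on the absolute value of r. Boundary values
    get the lower band's label (an assumption; the table does not
    specify)."""
    a = abs(r)
    if a > 1.0:
        raise ValueError("r cannot exceed 1.00 in absolute value")
    if a <= 0.20:
        return "Very Low"
    if a <= 0.40:
        return "Low"
    if a <= 0.60:
        return "Moderate"
    if a <= 0.80:
        return "High Moderate"
    return "Very High"

print(strength_label(-0.90))  # → Very High
```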

Methods of Studying Reliability. Test-retest reliability: a study that employs the readministration of a single instrument to check for consistency across time. Equivalent forms reliability: consistency of a test using like forms that measure the same skill, domain, or trait; also known as alternate forms reliability.

Methods of Studying Reliability, Continued. Internal consistency: methods to study reliability across the items of the test; examples include split-half reliability, K-R 20, and coefficient alpha. Split-half reliability: a method of measuring internal consistency by comparing the data of the two halves of the test.
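A minimal sketch of a split-half study, assuming an odd/even split of the items and the Spearman-Brown correction (commonly applied so the half-test correlation reflects full-test length, though the slide does not name it). The item scores below are invented:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson r via the SS/SP formulas."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / sqrt(ssx * ssy)

def split_half_reliability(item_scores):
    """item_scores: one row per examinee, one column per item.
    Split into odd- and even-numbered items, correlate the half
    scores, then apply the Spearman-Brown correction."""
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson(odd, even)
    return 2 * r_half / (1 + r_half)  # Spearman-Brown correction

# Invented item scores: four examinees, six right/wrong items
scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0],
]
print(round(split_half_reliability(scores), 2))
```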

Methods of Studying Reliability, Continued. Interrater reliability: the consistency of a test in measuring a skill, trait, or domain across examiners. This type of reliability is most important when responses are subjective or open-ended. Reliability coefficients may vary across the age and grade levels of a specific instrument.

Methods of Studying Reliability, Continued. Kuder-Richardson 20 (K-R 20): a formula used to check consistency across the items of an instrument whose items are scored 1 or 0 (right/wrong). Coefficient alpha: a formula used to check consistency across the items of instruments with responses of varying credit; for example, items may be scored as 0, 1, 2, or 3 points.
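Coefficient alpha can be sketched directly from its standard formula, α = (k / (k − 1)) × (1 − Σ item variances / total-score variance); with items scored 0/1, the same computation serves as K-R 20. The scores below are invented:

```python
def coefficient_alpha(item_scores):
    """Cronbach's coefficient alpha for a score matrix
    (rows = examinees, columns = items). With 0/1 items this is
    equivalent to K-R 20. Population variances (dividing by n) are
    used; the variance ratio is the same with sample variances."""
    k = len(item_scores[0])  # number of items

    def var(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / len(values)

    item_vars = [var([row[i] for row in item_scores]) for i in range(k)]
    total_var = var([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Invented right/wrong scores: four examinees, three items
scores = [[1, 1, 1],
          [1, 1, 0],
          [1, 0, 0],
          [0, 0, 0]]
print(coefficient_alpha(scores))  # → 0.75
```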

Standard Error of Measurement. Each test score is made of two parts: true score and error (Obtained Score = True Score + Error). A student's true score can only be estimated. The standard error of measurement is a method used to estimate the amount of error in a test; it represents the typical amount of error in any obtained score, and it is used to estimate a range of scores within which the student's true score exists.

Standard Error of Measurement, Continued. The standard error of measurement is calculated using the following formula: SEM = SD × √(1 − r), where SEM is the standard error of measurement, SD is the standard deviation of the norm-group scores obtained during the development of the instrument, and r is the reliability coefficient.

Example of Calculating SEM. For a specific test, the standard deviation is 4 and the reliability coefficient is .89. The SEM would be: SEM = 4 × √(1 − .89) = 4 × √.11 ≈ 4 × .33 = 1.32. This represents the amount of error on this test instrument.

Applying the SEM. The SEM was 1.32 and a student's obtained score was 89. To determine the range of possible true scores, add and subtract the SEM from the obtained score: 89 + 1.32 = 90.32 and 89 − 1.32 = 87.68. The range of possible scores is 87.68 to 90.32.
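The two slides above fold into a short sketch. Computing √.11 at full precision gives a band of roughly 87.67 to 90.33, slightly wider than the slides' 87.68 to 90.32, which round √.11 to .33 before multiplying:

```python
from math import sqrt

def sem(sd, reliability):
    """Standard error of measurement: SEM = SD * sqrt(1 - r)."""
    return sd * sqrt(1 - reliability)

def true_score_range(obtained, sd, reliability):
    """Estimated range for the true score: the obtained score plus
    or minus one SEM, as on the slides."""
    e = sem(sd, reliability)
    return obtained - e, obtained + e

# The slides' example: SD = 4, r = .89, obtained score = 89
low, high = true_score_range(89, 4, 0.89)
print(round(low, 2), round(high, 2))  # → 87.67 90.33
```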

Selecting the Best Test Instruments. When considering which tests will be the most reliable, it is important to select a test that has the highest reliability coefficient and the smallest standard error of measurement. The results obtained are then more consistent with the student's true ability, and the obtained score contains less error.

Test Validity. Validity indicates the degree of quality of the test instrument: a measure of how accurately the test measures what it is designed to measure. Methods that indicate the validity of a test include criterion-related validity, content validity, and construct validity.

Criterion-Related Validity. There are two ways to study criterion-related validity. Concurrent validity: when a test is compared with another measure at the same time. Predictive validity: when a test is compared with a measure taken in the future; for example, when college entrance exams are compared with later student performance in college (GPAs).

Content Validity. In order for a test to have good content validity, it must have items that are representative of the domain or skill being assessed. During the development of the test, items are selected after careful study of the items and the domain they represent.

Construct Validity. Construct validity means that the instrument has the ability to assess the psychological construct it was meant to measure. A construct is a psychological trait or characteristic, such as creativity or mathematical ability.

Studying Construct Validity. Construct validity can be studied by investigating developmental changes: if the construct the test measures is an ability expected to change across time, testing across time will show changes in performance on the test as the ability responds to developmental change.

Construct Validity, Continued. Correlation with other tests: if the construct has been measured successfully by other instruments, a correlation with those instruments is evidence of construct validity. Factor analysis: a statistical method that allows the investigator to see whether items thought to measure the same construct are answered in the same manner, providing further evidence of construct validity.

Construct Validity, Continued. Internal consistency: this measure also provides evidence of construct validity. Convergent and discriminant validity: these measures indicate that performance is consistent with like measures (convergent) or differs from measures of constructs other than the one being assessed (discriminant).

Construct Validity, Continued. Experimental interventions: when the construct being assessed can be influenced by an intervention or treatment, assessment before and after the intervention provides evidence of construct validity.