Presentation transcript:

 A description of the ways a researcher will observe and measure a variable, so called because it specifies the operations that will be taken into account to measure the variable  Typical research lingo: "How will you operationalize that variable?"

 Identify operational definitions of the following latent constructs: ◦ Intelligence of an individual ◦ Market value of a firm ◦ Employee theft ◦ Organizational performance ◦ Accounting Fraud ◦ Customer retention ◦ Team diversity

 Consistency, dependability, or stability in measurements over observations or time  Degree to which we measure something without error (It is a theory of error)  Necessary but not sufficient condition for validity  Reliability forms the upper bound for validity

X = T + E
T = score or measurement that would be obtained under perfect conditions (the true score)
E = error, because measurement is never perfect; error is assumed to be random
μ_x is expected to equal μ_t because μ_e = 0, thus E(X) = E(T)

σ²_x = σ²_t + σ²_e
In other words, the total variance associated with any measure is equal to true score variance plus error variance. Where does error variance come from?

r²_xt = σ²_t / σ²_x
In other words, the theoretical definition of reliability: reliability is the portion of the overall variance in a measure which is true score variance.

 Correlation equal to the ratio of the standard deviation of true scores to the standard deviation of observed scores  Squared correlation indicates proportion of variance in observed scores due to true differences among people: r²_xt = r_xx = σ²_t / σ²_x

 Since r²_xt = r_xx = σ²_t / σ²_x  Reliability defined as ratio of true score variance to observed-score variance  Square root of reliability is the correlation between observed and true scores - called the reliability index

σ²_x = σ²_t + σ²_e, so σ²_t = σ²_x − σ²_e
reliability = σ²_t / σ²_x
so by substitution, reliability = (σ²_x − σ²_e) / σ²_x
so, reliability = 1 − (σ²_e / σ²_x)
so, reliability can range from 0 to 1, and if r_xx = .80 then 80% of the variance is systematic
Does not distinguish between true variance and systematic error
In reliability theory, unsystematic variance = error
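
The decomposition above is easy to verify numerically. Below is a minimal Python sketch (assuming NumPy is available) that simulates hypothetical true scores and random error, then recovers reliability both as σ²_t / σ²_x and as 1 − σ²_e / σ²_x; all values are illustrative.

import numpy as np

# Minimal simulation of X = T + E; all values are hypothetical.
rng = np.random.default_rng(0)
n = 100_000

true_var, error_var = 80.0, 20.0                 # chosen so reliability = 80 / (80 + 20) = .80
t = rng.normal(50, np.sqrt(true_var), size=n)    # true scores
e = rng.normal(0, np.sqrt(error_var), size=n)    # random error with mean 0
x = t + e                                        # observed scores

print(round(t.var() / x.var(), 2))               # sigma^2_t / sigma^2_x, approximately .80
print(round(1 - e.var() / x.var(), 2))           # same value via 1 - sigma^2_e / sigma^2_x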

 Equivalent Forms  Test-Retest  Internal Consistency  Interrater/Interobserver (not considered a true model of reliability)  Differ in how they treat error

 Test-Retest - Observations taken at two different time periods using the same measure are correlated  Coefficient of stability  Error includes anything that changes between one administration of the measure and the next (including real changes)  Not good for unstable constructs
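
As a concrete illustration of the test-retest approach, the following Python sketch correlates two administrations of the same measure; the scores are invented for illustration only.

import numpy as np

# Hypothetical scores for the same eight people at two points in time.
time1 = np.array([12, 15, 9, 20, 17, 11, 14, 18])
time2 = np.array([13, 14, 10, 19, 18, 10, 15, 17])

# Test-retest reliability: the correlation between the two administrations.
r_test_retest = np.corrcoef(time1, time2)[0, 1]
print(round(r_test_retest, 2))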

 Equivalent Forms - Observations using two different forms of measurement are correlated  Coefficient of equivalence (and stability if the forms are administered at different times)  Error includes any changes between administrations plus anything that varies between one form and the next (including real changes over time)  Not good for unstable constructs (e.g., mood) unless measures are taken within a short period of time

 Internal Consistency - Items within a given measure are correlated  Coefficient of equivalence ◦ Split-half  Even-odd  Early-late  Random split ◦ Coefficient alpha  Does not assess stability of a measure over time

 Can be viewed as a variation of equivalent forms  Measures within a given form are split and correlated  Because the correlation is based on half of the items, it can be corrected using the Spearman-Brown formula: r_xx = 2·r_hh / (1 + r_hh), where r_hh is the correlation between the two halves  Flawed approach (many possible halves)
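
A small Python sketch of the split-half idea, using hypothetical item responses, an even-odd split, and the Spearman-Brown correction applied to the half-test correlation:

import numpy as np

# Hypothetical item responses: rows = respondents, columns = items.
items = np.array([
    [4, 5, 4, 5, 3, 4],
    [2, 3, 2, 2, 3, 2],
    [5, 5, 4, 5, 5, 4],
    [3, 2, 3, 3, 2, 3],
    [4, 4, 5, 4, 4, 5],
])

half1 = items[:, 0::2].sum(axis=1)    # odd-numbered items (columns 0, 2, 4)
half2 = items[:, 1::2].sum(axis=1)    # even-numbered items (columns 1, 3, 5)

r_hh = np.corrcoef(half1, half2)[0, 1]    # correlation between the two halves
r_xx = 2 * r_hh / (1 + r_hh)              # Spearman-Brown corrected full-length reliability
print(round(r_hh, 2), round(r_xx, 2))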

 Can be derived from the Spearman-Brown formula  Assumes measures are homogeneous  Average of all possible split-half reliability coefficients of a given measure  Does NOT tell you if there is a single factor or construct in your scale (does NOT indicate unidimensionality)

α = [k / (k − 1)] · [1 − (Σσ²_i / σ²_x)]
k = number of items (or measures); σ²_i = variance of item i; σ²_x = variance of the total score
Alternatively, the formula can be written in terms of ΣC, the sum of all the elements in the item covariance matrix (which equals σ²_x).
With this formula, what effect should k have?
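
A minimal Python sketch of coefficient alpha computed directly from this formula, using a small invented data matrix (rows = respondents, columns = items):

import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for a data matrix with rows = people and columns = items."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)   # equals the sum of the covariance matrix
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical responses of five people to four items, for illustration only.
data = np.array([
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [3, 2, 3, 3],
    [4, 4, 5, 4],
])
print(round(cronbach_alpha(data), 2))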

 Increasing the number of items usually increases internal consistency reliability (if the items are of equal quality)  Higher correlations between items increase internal consistency reliability  With enough items, alpha can be high even when the correlations between items are low (see the sketch below)  Alpha should NOT be taken to indicate that the scale is unidimensional  Benchmark for "acceptable" alpha = .70 (Nunnally and Bernstein, 1994)
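
The effect of k can be seen with the standardized form of alpha, α = k·r̄ / [1 + (k − 1)·r̄], where r̄ is the average inter-item correlation. The short Python sketch below (illustrative values only) shows alpha climbing past .8 as items are added, even though the items correlate only .15 on average.

# Standardized alpha: alpha = k * r_bar / (1 + (k - 1) * r_bar),
# where r_bar is the average inter-item correlation (values below are illustrative).
def standardized_alpha(k, r_bar):
    return k * r_bar / (1 + (k - 1) * r_bar)

for k in (5, 10, 20, 40):
    print(k, round(standardized_alpha(k, 0.15), 2))
# 5 0.47, 10 0.64, 20 0.78, 40 0.88: with enough weak items, alpha still looks "acceptable".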

 Interobserver or Interrater - Measures between different raters are correlated  Coefficient of equivalence for raters  Not the same as agreement (agreement is absolute, but reliability is correlational/relational)  Error includes anything that leads to differences between observer/judge ratings

 Often we need a measure of agreement - kappa is a measure of agreement between two judges on categories  Researchers often use % agreement - not appropriate  % agreement doesn’t take into account chance levels (e.g., two categories, chance = 50%)  kappa deals with this by using a contingency table  Use kappa if you need to measure interrater agreement on categorical coding

 kappa - a measure of agreement between two observers taking into account agreement that could occur by chance (expected agreement).
kappa = (Observed agreement − Expected agreement) / (100% − Expected agreement)
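
A minimal Python sketch of this calculation for two raters assigning hypothetical categorical codes; it computes observed agreement, chance-expected agreement from the marginal proportions, and kappa.

import numpy as np

def cohens_kappa(codes1, codes2):
    """Cohen's kappa for two raters' categorical codes."""
    codes1, codes2 = np.asarray(codes1), np.asarray(codes2)
    categories = np.union1d(codes1, codes2)

    p_observed = np.mean(codes1 == codes2)
    # Chance-expected agreement from the two raters' marginal proportions.
    p_expected = sum(np.mean(codes1 == c) * np.mean(codes2 == c) for c in categories)
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical codes from two judges classifying the same ten behaviors.
rater_a = ["on-task", "off-task", "on-task", "on-task", "off-task",
           "on-task", "on-task", "off-task", "on-task", "off-task"]
rater_b = ["on-task", "off-task", "on-task", "off-task", "off-task",
           "on-task", "on-task", "on-task", "on-task", "off-task"]
print(round(cohens_kappa(rater_a, rater_b), 2))   # .58, well below the .80 raw agreement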

 High reliability, low agreement (job applicants rated by interviewers on a 1-5 scale)

               App 1   App 2   App 3
Interviewer 1    5       4       4
Interviewer 2    4       3       3
Interviewer 3    3       2       2
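
The table above makes the reliability/agreement distinction concrete. In the Python sketch below, the interviewers' ratings are perfectly correlated (reliability is high) even though they almost never assign the same number (agreement is low):

import numpy as np

# Ratings of three applicants (columns) by three interviewers (rows), from the table above.
ratings = np.array([
    [5, 4, 4],
    [4, 3, 3],
    [3, 2, 2],
])

# Reliability is about relative standing: every pair of interviewers correlates 1.0.
print(np.corrcoef(ratings))

# Agreement is absolute: interviewers 1 and 2 never assign the same rating.
print(np.mean(ratings[0] == ratings[1]))   # 0.0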

 Nature of construct being measured  Sources of error relevant to construct ◦ Test/retest - all changes due to time ◦ Equivalent forms - all changes due to time and all differences between forms ◦ Internal consistency - differences between items within a measure ◦ Interrater/Interobserver - differences between observers and judges

 In the early stages of research in an area, lower reliabilities may be acceptable  Higher reliabilities are required when measures are used to differentiate among groups  Extremely high reliabilities are required when making important decisions  Rule of thumb = .70 minimum reliability, preferably .80+ (Nunnally & Bernstein, 1994), but .95+ may be TOO high

 Attenuation of correlations: r_xy = r*_xy · √(r_xx · r_yy), where r*_xy = correlation between the true scores of X and Y  Correction for attenuation: r*_xy = r_xy / √(r_xx · r_yy)
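
A short worked example of the correction for attenuation in Python, with illustrative (invented) values for the observed correlation and the two reliabilities:

import math

# Illustrative (invented) values.
r_xy = 0.30    # observed correlation between X and Y
r_xx = 0.70    # reliability of the X measure
r_yy = 0.60    # reliability of the Y measure

# Correction for attenuation: estimated correlation between the true scores.
r_xy_true = r_xy / math.sqrt(r_xx * r_yy)
print(round(r_xy_true, 2))                            # about .46
# Running the attenuation formula forward recovers the observed value:
print(round(r_xy_true * math.sqrt(r_xx * r_yy), 2))   # .30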

 Look at the hypothesis statements you wrote last week  For each variable (independent, dependent, mediator, moderator) from last week, discuss how you would operationally define and measure this variable. After you have identified your measure, indicate how you would assess the reliability of your measure, if possible. If it is not possible to assess reliability, why not?