EPIDEMIOLOGY 4 RELIABILITY AND VALIDITY (TRAINING AND CALIBRATION)

EPIDEMIOLOGY 4 RELIABILITY AND VALIDITY (TRAINING AND CALIBRATION)
Dr. Seyed Ebrahim Jabarifar, Associate Professor, Isfahan University of Medical Sciences, Department of Community Dentistry. Date: 1388 (2010).

RELIABILITY AND VALIDITY OF DATA
Two main reasons for variability in scoring (WHO, 1997):
1. Difficulty in scoring the different levels of oral diseases, particularly dental caries and periodontal diseases.
2. Physical and psychological factors (fatigue, fluctuations in interest in the study, variations in visual acuity and tactile sense) that affect the judgement of examiners from time to time and to a different degree.

RELIABILITY AND VALIDITY OF DATA
What is the principal problem with this variability? To decide whether examiners are sufficiently close to each other in their interpretation and application of the clinical criteria. Only then can data from their samples be pooled to provide area/district estimates whose variances reflect true inter-subject differences in oral health, not an inflation due to examiner differences (Pine et al. 1997).

RELIABILITY AND VALIDITY OF DATA
Objectives of standardisation and calibration (WHO, 1997):
To ensure uniform interpretation, understanding and application by all examiners of the codes and criteria for the various diseases and conditions to be observed and recorded.
To ensure that each examiner can examine consistently.

RELIABILITY AND VALIDITY OF DATA
How can this problem be tackled?
Training of examiners and interviewers
Calibration exercise
Repeat examinations

TRAINING EXERCISE
What do we mean by a training exercise? The training exercise aims to teach the survey examiners, thoroughly and intensively, the logistics of the examination protocol and the agreed interpretation of the diagnostic criteria. In practical terms, the full range of diagnostic situations are presented and discussed in detail: a) on slides, b) on actual subjects. It takes place before the survey and requires at least 2 days of intensive work. It may be repeated at specific intervals during the survey.

CALIBRATION EXERCISE
What do we mean by a calibration exercise? The calibration exercise completes the training and provides a formal measure of how well each examiner can interpret the criteria, compared with the "gold standard" set by the trainer. It takes place before the survey and may be repeated annually.

CALIBRATION EXERCISE
How does this happen in practice? Some subjects are examined by several (or even all) examiners and by the gold-standard examiner, and the data are compared. The exercise is repeated annually to ensure consistency in the interpretation of criteria and familiarity with new measures. It should include a sufficient number of cases (≥20 subjects) on which a wide range of diagnostic decisions have to be made (i.e. treated and untreated caries, as well as caries-free subjects).

CALIBRATION EXERCISE
What action is taken? "Outlier" examiners and the specific areas of over- or under-scoring are identified. The issue is discussed and thoroughly clarified, and a repeat calibration exercise is undertaken. If results remain unsatisfactory, the outlier may be excluded from the survey (despite the practical difficulties). NB: the ability to standardise clinical examination results is not a measure of clinical skill (clarify this in advance).

REPEAT EXAMINATIONS
What do we mean by repeat examinations? Repeat examinations can be carried out: a) by the same examiner, to monitor intra-examiner diagnostic consistency (single examiner), or b) by the gold-standard examiner, to ensure inter-examiner diagnostic consistency (group of examiners). In practical terms, this implies performing duplicate examinations on 5-10% of the survey sample (≥25 subjects). They should take place at various stages of the survey (beginning, half-way, end).

TRAINING AND CALIBRATION OF EXAMINERS
1. Intensive training in the examination protocol and criteria, guided by gold-standard examiner(s).
2. Calibration exercise for key measures.
3. Identification of problems, clarification with the respective examiners.
4. Final training session and meeting with interviewers before each wave of examinations (refresh knowledge, highlight key problematic areas).
5. Repeat examinations by the same examiner (single examiner) or by the gold-standard examiner (group of examiners).

TRAINING OF INTERVIEWERS
1. Familiarisation with the procedure and appropriate order of the clinical examination (gold-standard examiner).
2. Training in the administration of the questionnaire (explanation, instructions on the format and administration of questions, practical exercises).
3. Final meeting with examiners before each wave of fieldwork (meet examiners, highlight key points, discuss issues raised during fieldwork in previous waves).
4. Re-training for interviewers who have not participated in the survey for a predefined period (e.g. 1 month).

ASSESSMENT OF REPRODUCIBILITY: METHODS
1. Use of master sheets.
2. Calculation of mean indices by examiner and the size and direction of deviation from the gold-standard examiner.
3. Calculation of group means and 95% confidence limits.
4. Assessment of the percentage of agreement between examiner and gold-standard examiner.
5. Sensitivity and specificity estimations.
6. Dice's concordance index.
7. Kappa and weighted Kappa statistics.

DEVIATION FROM GOLD-STANDARD EXAMINER
1. Establish an arbitrary cut-off point for acceptable deviation from the gold-standard examiner (e.g. ±0.5 dmft/DMFT).
2. Calculate the mean dmft/DMFT for the gold-standard examiner.
3. Estimate the size and direction of each examiner's deviation from the gold-standard examiner and compare it with the chosen level of acceptance. (A sketch of this check follows.)
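A minimal sketch of this check; the examiner names and DMFT means are hypothetical illustration values, not survey data.

```python
# Sketch: size and direction of each examiner's deviation from the
# gold-standard mean DMFT, against a +/-0.5 cut-off.
gold_mean_dmft = 3.2                               # hypothetical
examiner_mean_dmft = {"A": 3.4, "B": 2.5, "C": 3.1}  # hypothetical
CUTOFF = 0.5

for name, mean in examiner_mean_dmft.items():
    deviation = mean - gold_mean_dmft              # sign gives direction
    verdict = "OK" if abs(deviation) <= CUTOFF else "outside cut-off"
    print(f"examiner {name}: {deviation:+.2f} DMFT ({verdict})")
```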

GROUP MEAN AND 95% CONFIDENCE LIMITS
The basic concept is to identify the outliers, if any, whose mean scores fall outside the 95% confidence limits of the mean score for all examiners. The calculation of the group mean score excludes the gold-standard examiner. The value of t varies according to the number of examiners. The general formula for the 95% confidence limits is: group mean ± t(0.05, df = n−1) × sd
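A minimal sketch of this screen, assuming the per-examiner means are already computed (the values below are made up); it uses scipy.stats for the t quantile.

```python
# Sketch: flag examiners whose mean DMFT falls outside the slide's
# limits: group mean +/- t(0.05, df=n-1) * sd.
import statistics
from scipy import stats

examiner_means = [2.8, 3.1, 2.9, 3.4, 2.7, 4.2]  # hypothetical data
n = len(examiner_means)
mean = statistics.mean(examiner_means)
sd = statistics.stdev(examiner_means)            # sample sd, df = n-1
t_crit = stats.t.ppf(0.975, df=n - 1)            # two-sided 95%

lower, upper = mean - t_crit * sd, mean + t_crit * sd
outliers = [m for m in examiner_means if not lower <= m <= upper]
print(f"limits: ({lower:.2f}, {upper:.2f}); outliers: {outliers}")
```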

PERCENTAGE OF AGREEMENT
Estimated as the exact number of agreements expressed as a percentage of the total.
Very simple.
Takes no account of where in the table the agreement occurs.
Some agreement is expected even by chance.
Lacks accuracy when the prevalence of the disease or condition is rather low.
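A short sketch of the calculation, on hypothetical tooth-level codes (0 = sound, 1 = decayed):

```python
# Sketch: percentage of exact agreement between an examiner and the
# gold standard; the call lists are invented for illustration.
examiner = [0, 1, 0, 0, 1, 1, 0, 1]
gold     = [0, 1, 0, 1, 1, 0, 0, 1]

agree = sum(a == b for a, b in zip(examiner, gold))
print(f"agreement: {100 * agree / len(gold):.1f}%")
```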

SENSITIVITY AND SPECIFICITY
Sensitivity refers to the ability to correctly identify the true positive cases; it is the proportion of true positive cases that test positive. Specificity refers to the ability to detect the true negative cases; it is the proportion of true negative cases that test negative.
Sensitivity = TP / (TP + FN), Specificity = TN / (TN + FP)
Both are affected by disease experience and treatment provision (e.g. caries experience and the proportion of lesions restored).
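A minimal sketch of the two formulas, using hypothetical 2x2 cell counts for an examiner scored against the gold standard:

```python
# Sketch: sensitivity and specificity from hypothetical counts.
tp, fn = 45, 5    # gold-standard positives: correctly / wrongly called
tn, fp = 140, 10  # gold-standard negatives: correctly / wrongly called

sensitivity = tp / (tp + fn)   # proportion of true positives detected
specificity = tn / (tn + fp)   # proportion of true negatives detected
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```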

DICE'S CONCORDANCE INDEX
Appropriate when only one outcome is the object of interest (e.g. decayed teeth).
Quick and easy.
Does NOT use all available data.
D = 2a / (2a + b + c)

                 Examiner B
                   +     -
Examiner A   +     a     b
             -     c     d
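A minimal sketch of the index on hypothetical cell counts from the table above; note that cell d (both examiners calling the tooth sound) never enters the formula.

```python
# Sketch: Dice's concordance index for a single outcome of interest
# (e.g. decayed teeth), from hypothetical 2x2 cell counts.
a, b, c = 30, 4, 6   # a = both call decayed; b, c = one examiner only

dice = 2 * a / (2 * a + b + c)
print(f"Dice concordance: {dice:.2f}")   # d is deliberately unused
```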

KAPPA (K) STATISTIC
Kappa (Cohen, 1960) is a measure of agreement that can be calculated between a pair of examiners (examiner and gold-standard examiner) and takes chance agreement into account: it reflects the chance-corrected proportional agreement. It may involve a comparison at surface or tooth level, or even on aggregate indices (e.g. DMF). It may also include all possible codes for a condition, as well as different groupings of data (flexibility in application).

KAPPA CALCULATION
K = (Po − Pe) / (1 − Pe)
Po = (a + d) / n
Pe = [(a + c)(a + b) + (b + d)(c + d)] / n²
Po reflects the proportion of observed agreement and Pe the proportion of agreement that could be expected by chance.

                         Examiner 1
                      Caries    Sound    Total
Examiner 2  Caries      a         b       a+b
            Sound       c         d       c+d
            Total      a+c       b+d       n
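A minimal sketch of the calculation from the 2x2 table above, with hypothetical cell counts:

```python
# Sketch: Cohen's kappa from 2x2 cell counts (invented for illustration).
a, b, c, d = 30, 4, 6, 160    # a, d = agreements; b, c = disagreements
n = a + b + c + d

po = (a + d) / n                                      # observed agreement
pe = ((a + c) * (a + b) + (b + d) * (c + d)) / n**2   # chance agreement
kappa = (po - pe) / (1 - pe)
print(f"Po={po:.3f}, Pe={pe:.3f}, kappa={kappa:.3f}")
```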

Kappa does NOT take into account the degree of disagreement. For ordinal variables it is preferable to use the weighted Kappa, which weights disagreements according to the magnitude of the discrepancy (the closer to the diagonal, the better). Kappa and weighted Kappa represent the best approach to measuring variability, although "statistics cannot provide a simple substitute to clinical judgement" (Altman, 1991).
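A minimal sketch of the weighted variant using scikit-learn's cohen_kappa_score; the ordinal severity codes (0-3) and ratings are hypothetical.

```python
# Sketch: unweighted vs linearly weighted kappa on ordinal codes
# (e.g. caries severity 0-3); near-diagonal disagreements are
# penalised less by the weighted version.
from sklearn.metrics import cohen_kappa_score

examiner = [0, 1, 2, 2, 3, 1, 0, 2]   # hypothetical ratings
gold     = [0, 1, 1, 2, 3, 2, 0, 3]

print(cohen_kappa_score(examiner, gold))                    # unweighted
print(cohen_kappa_score(examiner, gold, weights="linear"))  # weighted
```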

KAPPA INTERPRETATION
Strength of agreement   Value of K
Poor                    <0.20
Fair                    0.21-0.40
Moderate                0.41-0.60
Good                    0.61-0.80
Very good               0.81-1.00
(Landis and Koch, 1977)
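A small helper, purely for illustration, that maps a kappa value to the Landis and Koch bands in the table above:

```python
# Sketch: classify a kappa value per Landis and Koch (1977).
def agreement_strength(kappa: float) -> str:
    bands = [(0.20, "Poor"), (0.40, "Fair"), (0.60, "Moderate"),
             (0.80, "Good"), (1.00, "Very good")]
    for upper, label in bands:       # first band whose upper bound holds
        if kappa <= upper:
            return label
    return "Very good"

print(agreement_strength(0.72))      # -> "Good"
```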

CORRELATION
Correlation is an expression of how much two variables vary together; it does not reflect their proximity to 1:1 correspondence. Correlation measures the strength of the association between two variables, not their agreement. Consequently, correlation should be avoided in the analysis of calibration exercises.

TRAINING AND CALIBRATION KEY POINTS
Use the minimum number of examiners in surveys.
Carry out training and a calibration exercise at baseline, repeated at later stages.
Follow standardised procedures and agreed criteria.
Include a sufficient number of cases in calibration, so as to cover a wide range of diagnostic decisions.
Determine the key clinical variables and appropriate data groupings to be included in the calibration exercise.
Calculate and interpret Kappa scores.
Re-calibrate or exclude outliers.
Plan repeat examinations during the survey.