Accuracy, Reliability, and Validity of Freesurfer Measurements
David H. Salat

Why Talk About This?
This is not meant to imply that everything is perfect in FreeSurfer processing; it is a sample of the types of procedures that we and others have used to provide information about what works and what doesn’t, and to enhance confidence in our results. The information here should be used as a guide for how to assess the data in your own projects.

What is Accuracy?
Accuracy: the degree of closeness of a measured or calculated quantity to its actual (true) value (e.g., a physical property such as length or thickness).
MRI measures are indirect. We may be able to measure morphometry accurately given the contrast of the MR image; however, this contrast may not correspond exactly to the tissue itself, so image-based measurements may differ from measurements of the actual tissue.

What is Reliability?
Reliability: agreement between measures obtained for the same individual on two different days, close enough together in time to avoid a biological influence on the reliability measure.
– Reliability of a labeling procedure on the same scan
– Reliability of the labeling procedure on two different scans
– Reliability of the labeling procedure on two different scans collected on two different scanners
The reliability of an overall effect can be assessed by replicating the experiment in an independent sample. This is a general principle that applies to all types of data: structural, functional, cognitive, etc.
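Test-retest reliability of a morphometric measure (for example, a regional thickness value from two scan sessions) is often summarized with an intraclass correlation coefficient. The sketch below assumes one common form, ICC(2,1) in the Shrout and Fleiss scheme (two-way random effects, absolute agreement, single measurement), computed from a subjects-by-sessions table. The slides do not say which reliability statistic was used, and `icc_2_1` is a hypothetical helper, not a FreeSurfer function.

```python
from typing import Sequence


def icc_2_1(data: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    data[i][j] is the value for subject i in session (or scanner) j.
    Hypothetical helper; assumes n >= 2 subjects, k >= 2 sessions, no missing values.
    """
    n = len(data)
    k = len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(data[i][j] for i in range(n)) / n for j in range(k)]

    # Partition total variation into subjects (rows), sessions (columns), and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Identical values in both sessions: perfect absolute agreement.
print(icc_2_1([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]))  # 1.0
# Constant offset between sessions: perfectly correlated, but agreement drops.
print(icc_2_1([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]))  # 0.666...
```

Because ICC(2,1) measures absolute agreement, the second call shows a constant session offset lowering the coefficient even though the sessions are perfectly correlated; a consistency-type ICC would ignore that offset.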

What is Validity?
Validity: the extent to which an indirect measurement is representative of what it is supposed to measure. For example, in fMRI we use blood flow as an indirect measure of neural activity. Is this a valid measure of neural activity?

Validity Examples
– Internal validity: What is the strength of the overall experimental design, study sample size, analysis procedures, etc.?
– External validity: Would the measured effect generalize to another sample? (replication)
– Ecological validity: Can the results be applied in the real world outside of the experimental setting? (clinical application)
– Construct validity: Does the totality of evidence support the validity of a single measure? (Do the data fit with what is known?)
– Face validity: Does the measure seem to be a good measure?
– Convergent validity: How well does the measure correlate with other types of measures that it should theoretically be correlated with? (Do the data correlate with ‘gold standards’?)
– Discriminant validity: Is the measure uncorrelated with measures it should not be correlated with? (ICV/age)
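Convergent and discriminant validity are commonly quantified as correlations. A minimal sketch, assuming per-subject values are already paired across measures (e.g., automated versus manual thickness for the same subjects). `pearson_r` and `validity_check` are hypothetical helpers, the numbers are invented for illustration, and what counts as a "high" or "near-zero" correlation is a study-specific judgment.

```python
import math
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


def validity_check(measure: Sequence[float],
                   gold_standard: Sequence[float],
                   unrelated: Sequence[float]) -> Tuple[float, float]:
    """Return (convergent r, discriminant r) for a candidate measure.

    Convergent validity: correlation with a 'gold standard' should be high.
    Discriminant validity: correlation with a measure it should not track
    should be near zero.
    """
    return pearson_r(measure, gold_standard), pearson_r(measure, unrelated)


# Hypothetical per-subject values, for illustration only.
automated = [2.1, 2.4, 2.8, 3.0, 2.6]
manual = [2.0, 2.5, 2.7, 3.1, 2.6]
unrelated = [1.4, 1.6, 1.5, 1.5, 1.6]
convergent_r, discriminant_r = validity_check(automated, manual, unrelated)
```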

One does not necessarily ensure the other
A measure can be perfectly reliable (e.g., you get exactly the same value every time) but neither accurate nor valid. We can measure morphometry very precisely, but the validity of this measure depends on the quality of the input data. Conversely, if an experiment is not reliable, it is likely to be inaccurate and invalid as well.
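The difference between reliability and accuracy can be made concrete by repeatedly measuring something whose true value is known: the spread of the repeats reflects reliability, while their mean offset from the truth (bias) reflects accuracy. A minimal sketch under that assumption; `bias_and_precision` and the numbers in the usage lines are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def bias_and_precision(repeats: Sequence[float], true_value: float) -> Tuple[float, float]:
    """Return (bias, repeatability SD) for repeated measurements of a known quantity.

    Bias (mean error) reflects accuracy; the SD of the repeats reflects reliability.
    A small SD says nothing about whether the bias is small.
    """
    bias = statistics.mean(repeats) - true_value
    sd = statistics.stdev(repeats)
    return bias, sd


# Hypothetical repeated thickness measurements (mm) of an object known to be 2.0 mm:
# very consistent, but consistently too thick.
bias, sd = bias_and_precision([2.50, 2.51, 2.49, 2.50], true_value=2.0)
```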

Types of Error
Random error: unknown and unpredictable changes in the measurement
– Should be unbiased
– Accuracy, reliability, and validity are all limited by error
Systematic error: predictable offset or scaling of the data
– Typically comes from some aspect of data acquisition or analysis
– Can be identified and corrected by analyzing standards that closely match the real sample (e.g., do you get the same values at 1.5T as at 3T?)
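Unlike random error, systematic error can be modeled and removed. A minimal sketch, assuming a linear (offset plus scale) error model and a set of standards whose true values are known. The same fitting step could in principle map values from one acquisition (e.g., 1.5T) onto another (e.g., 3T) when the same subjects were scanned on both, but that use is an assumption, not something the slide specifies. `fit_calibration` and `correct` are hypothetical helpers.

```python
from typing import Sequence, Tuple


def fit_calibration(true_values: Sequence[float],
                    measured: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of measured = offset + scale * true, using standards of known value."""
    n = len(true_values)
    mt = sum(true_values) / n
    mm = sum(measured) / n
    sxx = sum((t - mt) ** 2 for t in true_values)
    sxy = sum((t - mt) * (m - mm) for t, m in zip(true_values, measured))
    scale = sxy / sxx
    offset = mm - scale * mt
    return offset, scale


def correct(measured_value: float, offset: float, scale: float) -> float:
    """Invert the fitted systematic error. Random error is NOT removed by this step."""
    return (measured_value - offset) / scale


# Hypothetical standards: the system reads 5% high plus a 0.1 mm offset.
standards = [1.0, 2.0, 3.0, 4.0]
readings = [0.1 + 1.05 * t for t in standards]
offset, scale = fit_calibration(standards, readings)
estimate = correct(0.1 + 1.05 * 2.5, offset, scale)  # recovers a value close to 2.5
```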

How do poor reliability and validity affect your studies?
Poor reliability increases variance across individuals and across timepoints.
Validity is directly tied to interpretation. You may have a valid measure of ‘cortical thickness’, but ‘cortical thickness’ might not be a valid measure of degeneration (e.g., it can also be influenced by normal variation and hydration).
Many studies would benefit from the ability to measure minute changes across time.
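One way to see why poor reliability hurts a study: under classical test theory, reliability is the ratio of true-score variance to observed variance, so random measurement error inflates the observed variance and, with it, the sample size needed to detect a given group difference. A simplified sketch, assuming unbiased random error and the standard normal-approximation sample-size formula for a two-sided, two-sample comparison of means. `n_per_group` is a hypothetical helper and the effect size in the usage lines is arbitrary.

```python
import math
from statistics import NormalDist


def n_per_group(delta: float, sd_true: float, reliability: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group n for a two-sided, two-sample comparison of means.

    n = 2 * (z_{1 - alpha/2} + z_{power})^2 * sd_obs^2 / delta^2, where
    sd_obs^2 = sd_true^2 / reliability (reliability = true / observed variance).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    sd_obs_sq = sd_true ** 2 / reliability
    return math.ceil(2 * z ** 2 * sd_obs_sq / delta ** 2)


print(n_per_group(delta=0.5, sd_true=1.0, reliability=1.0))  # 63
print(n_per_group(delta=0.5, sd_true=1.0, reliability=0.5))  # 126
```

In this model, halving reliability roughly doubles the required sample size, which is one reason minute longitudinal changes are hard to detect with noisy measures.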

Accuracy and Validity of Spherical Averaging for Labeling Structural and Functional Anatomy (Fischl et al., 1999)

Anatomical Labeling (Fischl et al., 1999)

Functional Labeling (Fischl et al., 1999)

Enhanced Statistical Power (Fischl et al., 1999)

Face Validity: Results Fall Within the Expected Range
Consistent with published findings:
– crowns of gyri are thicker than the fundi of sulci
– sensory areas are among the thinnest in the cortex
(Fischl et al., 1999)
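Face-validity checks like these can be automated as simple sanity tests on each new dataset. A minimal sketch, assuming per-vertex thickness values and a hypothetical per-vertex flag marking gyral crowns versus sulcal fundi; how that flag is derived is outside this sketch. The expected-range bounds are supplied by the user, and the ones in the usage lines are placeholders, not published norms.

```python
import statistics
from typing import List, Sequence


def gyral_thicker_than_sulcal(thickness: Sequence[float], is_gyral: Sequence[bool]) -> bool:
    """True if mean thickness at gyral crowns exceeds mean thickness at sulcal fundi.

    `is_gyral` is a hypothetical per-vertex flag (True = gyral crown, False = sulcal fundus).
    """
    gyral = [t for t, g in zip(thickness, is_gyral) if g]
    sulcal = [t for t, g in zip(thickness, is_gyral) if not g]
    return statistics.mean(gyral) > statistics.mean(sulcal)


def out_of_range(values: Sequence[float], low: float, high: float) -> List[int]:
    """Indices of values falling outside a user-supplied expected [low, high] range."""
    return [i for i, v in enumerate(values) if not (low <= v <= high)]


# Hypothetical per-vertex values (mm) and placeholder bounds, for illustration only.
thickness = [3.0, 2.8, 2.0, 2.1]
flags = [True, True, False, False]
passes = gyral_thicker_than_sulcal(thickness, flags)
suspect = out_of_range(thickness, low=1.0, high=5.0)
```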

Validation against manual measurements of imaging data from another study (Fischl et al., 1999)

Automated measures are similar in size and region to manual measures and predict who will develop Alzheimer’s disease (AD) (Fischl et al., 2002)

Comparison with Postmortem Measures (Rosas et al., 2002)

Manual Measurements: can only be done in regions where the folding pattern is appropriate. Calcarine measures are also consistent across studies. Panels: Orbitofrontal, Calcarine (Kuperberg et al., 2003; Salat et al., 2004)

Comparison to Manually Labeled Data: one volume-based and two surface-based labeling schemes. Figure: percent of subjects labeled correctly at each location across the surface (panels: Volume Atlas, Surface Atlas, Surface Atlas 2) (Fischl et al., 2004; Desikan et al., 2006)
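A "percent of subjects labeled correctly at each location" map can be computed by comparing automatic and manual labels location by location. A minimal sketch, assuming both label sets have already been brought into vertex correspondence across subjects (e.g., on a common surface). `percent_correct_per_location` is a hypothetical helper, and a real analysis would work on label arrays rather than Python lists.

```python
from typing import List, Sequence


def percent_correct_per_location(auto: Sequence[Sequence[str]],
                                 manual: Sequence[Sequence[str]]) -> List[float]:
    """Percent of subjects whose automatic label matches the manual label at each location.

    auto[s][v] and manual[s][v] are label names for subject s at location (vertex) v,
    assumed to be in correspondence across subjects.
    """
    n_subjects = len(auto)
    n_locations = len(auto[0])
    result = []
    for v in range(n_locations):
        matches = sum(1 for s in range(n_subjects) if auto[s][v] == manual[s][v])
        result.append(100.0 * matches / n_subjects)
    return result


# Hypothetical labels for two subjects at two locations, for illustration only.
auto_labels = [["precentral", "postcentral"], ["precentral", "supramarginal"]]
manual_labels = [["precentral", "postcentral"], ["precentral", "postcentral"]]
agreement = percent_correct_per_location(auto_labels, manual_labels)
```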

Replication of Result: Split Sample. Concordant results across independent subsamples are unlikely to be due to statistical error. Comparison of the current study with 5 samples used in prior literature (Salat et al., 2004)
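A split-sample replication can be sketched as: randomly divide the subjects into two halves, estimate the same effect in each half, and check whether the estimates agree. The sketch below assumes the effect of interest is a linear slope (e.g., thickness against age). `slope` and `split_sample_slopes` are hypothetical helpers, the data are simulated, and a same-sign check is only the crudest notion of concordance; comparing magnitudes or confidence intervals is a stronger test.

```python
import random
from typing import Sequence, Tuple


def slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least-squares slope of y regressed on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def split_sample_slopes(x: Sequence[float], y: Sequence[float],
                        seed: int = 0) -> Tuple[float, float]:
    """Randomly split subjects into two halves and estimate the slope separately in each."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)  # fixed seed so the split is reproducible
    half = len(idx) // 2
    first, second = idx[:half], idx[half:]
    return (slope([x[i] for i in first], [y[i] for i in first]),
            slope([x[i] for i in second], [y[i] for i in second]))


# Simulated ages and a noisy, declining measure, for illustration only.
rng = random.Random(1)
ages = [20 + 3 * i for i in range(20)]
values = [3.0 - 0.01 * a + rng.gauss(0, 0.05) for a in ages]
s1, s2 = split_sample_slopes(ages, values)
concordant = s1 * s2 > 0  # same direction of effect in both halves
```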

Cross-Sequence Parameters (Fischl et al., 2004)

Comparison across time, scanner, field strength, number of scans, sequence type, scanner upgrade, and scanner manufacturer (Han et al., 2006)
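For cross-scanner or cross-field-strength comparisons, one common summary is a Bland-Altman analysis: the mean within-subject difference between conditions (a systematic offset) and the 95% limits of agreement around it (reflecting random disagreement). This is offered as an illustrative technique, not necessarily the method used by Han et al.; `bland_altman` is a hypothetical helper and the values are invented.

```python
import statistics
from typing import Sequence, Tuple


def bland_altman(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower, upper): mean difference a - b and its 95% limits of agreement.

    a[i] and b[i] are the same subject's measure under two conditions (e.g. two scanners).
    Limits are bias +/- 1.96 * SD of the differences, assuming roughly normal differences.
    """
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical regional thickness (mm) for the same five subjects on two scanners.
scanner_a = [2.51, 2.62, 2.40, 2.75, 2.58]
scanner_b = [2.47, 2.60, 2.33, 2.70, 2.55]
bias, lower, upper = bland_altman(scanner_a, scanner_b)
```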

Effects of Pulse Sequence, Voxel Geometry, and Parallel Imaging (Wonderlick et al., 2008)

Replication of Effects in the Same Participants Across Scanning Conditions (Dickerson et al., 2008)

WMPARC: the same subjects scanned at different times, i.e., test-retest (Salat et al., 2008)

Replicable results across sex and hemisphere. Panels: Men, Women (Salat et al., 2008)

Consistent Findings Across 5 Samples Used to Identify Regions with Predictive Validity: regional measures predict who will progress to AD (Dickerson et al., 2008)
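Predictive validity is often summarized by how well a baseline measure separates those who later progress from those who do not. A minimal sketch, assuming lower regional thickness indicates greater risk and using the pairwise (Mann-Whitney) form of the area under the ROC curve. This is an illustrative choice, not necessarily the analysis in Dickerson et al.; `auc_lower_is_risk` is a hypothetical helper and the values are invented.

```python
from typing import Sequence


def auc_lower_is_risk(progressors: Sequence[float], stable: Sequence[float]) -> float:
    """Probability that a random progressor has a lower value than a random stable subject.

    This pairwise (Mann-Whitney) form equals the area under the ROC curve when
    lower values (e.g. thinner cortex in a region of interest) are treated as
    indicating higher risk. Ties count as 0.5.
    """
    total = 0.0
    for p in progressors:
        for s in stable:
            if p < s:
                total += 1.0
            elif p == s:
                total += 0.5
    return total / (len(progressors) * len(stable))


# Hypothetical baseline thickness (mm) in one region, for illustration only.
later_progressed = [2.10, 2.25, 2.32]
remained_stable = [2.30, 2.45, 2.50, 2.60]
auc = auc_lower_is_risk(later_progressed, remained_stable)
```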

Conclusions
– Any tool used for MR analysis should be rigorously tested for accuracy, reliability, and validity.
– Most of the measures from FreeSurfer have good accuracy, reliability, and validity across a range of conditions.
– These results depend on optimal input data and correct implementation.
– These data provide confidence but do not substitute for using similar procedures to check the data from each new study.