Applied Psychometric Strategies Lab, Applied Quantitative and Psychometric Series
Abbey Love, MS, & Dani Rosenkrantz, MS, EdS
Guiding Steps for the Evaluation or Creation of a Scale: A Starter Kit


Guiding Steps for the Evaluation or Creation of a Scale: A Starter Kit. March 20, 2018. DANI. Abstract: This presentation is an application-focused talk on (1) how to determine whether an existing scale should be used, and (2) how to develop a scale when you want to measure a psychological construct. The first half of the discussion advises on where to look for measures and what a reliable and valid scale looks like. The second half focuses on the steps needed to develop a measure, should you find this necessary. We will discuss the measurement issues that arise from poor tools and suggest best practices in scale development.

What is a “bad” scale? “I saw a newly published scale in someone’s dissertation…” It depends! DANI

But, will bad measurement really hurt my study? DANI

Yes! That’s why we are here. Big picture: poor measurement is an ethical concern. If the measurement is problematic, the reliability of our findings is compromised. In other words, our degree of trust in our results is in question, which means our statistical conclusion validity is in question!
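
To see how unreliability erodes trust in results, consider Spearman's classic correction for attenuation: the correlation you observe between two scales is the true correlation between the constructs, shrunk by the square root of the product of the two scores' reliabilities. A minimal Python sketch with hypothetical numbers:

```python
# Spearman's attenuation: observed_r = true_r * sqrt(rel_x * rel_y).
# All values below are hypothetical, for illustration only.
true_r = 0.50               # true correlation between the two constructs
rel_x, rel_y = 0.60, 0.70   # score reliabilities of the two scales

observed_r = true_r * (rel_x * rel_y) ** 0.5
print(f"observed r = {observed_r:.2f}")  # ~0.32, well short of 0.50
```

Even a moderate true effect can shrink below conventional detection thresholds when both measures are noisy.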

Sample-specific measurement challenges that may occur:
- Low reliability or large measurement error around scores
- Uncertainty about whether to use a total score or subscale scores for each person
- Not all response options being used, or low item response variability
- Poor factor structure solutions due to cross-loading on multiple factors, low loadings on factors, and/or influences such as item phrasing
DANI
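
Two of these challenges, unused response options and low item variability, are easy to screen for in your own sample. A minimal pandas sketch, assuming a hypothetical CSV of Likert-type responses with one column per item:

```python
import pandas as pd

df = pd.read_csv("scale_responses.csv")  # hypothetical item-level data

for item in df.columns:
    counts = df[item].value_counts().sort_index()  # which options were endorsed
    print(item, counts.to_dict(), "variance:", round(df[item].var(), 2))
    # Flag items where some response options were never chosen,
    # or where variance is near zero (almost everyone answered the same way).
```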

Examples From Applied Work
- Cognitive Flexibility Inventory (for a dissertation): only the original paper explored the factor structure; cross-loading issues in the original; poor recovery of the factor structure in my sample; had to reduce items for better fit, a controversial decision
- Internal structure assessment of the Objectified Body Consciousness Scale: poor fit with trans women, indicating it is inappropriate to use without further study
DANI

Why does good measurement matter? DANI

If you use a well-established measure, you will likely find the following:
- High reliability, which results in low measurement error, accurate effect size estimates (d, R²), better capture of effects of interest (beta, path coefficients), and improved inferential techniques (more accurate SEs and, ultimately, better statistical decisions)
- Confidence in how to score the scale (total and/or subscales)
- All response options being used
- Strong recovery of the factor structure solution, with near-zero cross-loadings across factors, high loadings on intended factors, and/or minimal influence due to method factors
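
Score reliability is often summarized with Cronbach's alpha. Rather than lean on any particular package, here is a minimal NumPy sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: a respondents-by-items matrix of scored responses."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of the item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of the total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)
```

Keep in mind that alpha describes the scores in a given sample, not the instrument itself, which is why checking it in your own data matters.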

Looking for a Scale? Guiding Steps for the Evaluation or Creation of a Scale:
1.) Evaluation of psychological scales: Should I use a scale I found?
2.) Scale development: I can't find a scale. What steps do I need to develop a scale to measure a psychological construct?
DANI: We will be giving you a global picture of both, not talking specifics.

How do I know if my scale is “good?” DANI

Good scales… have ongoing and multiple sources of evidence that can be used to evaluate the validity of the interpretation of the scale for a particular use. ABBEY

Sources of Validity in Instrument Development. Evidence based on...
- Test content: Am I measuring what I planned to measure?
- Response processes: Are my participants understanding the items on my scale in an expected way?
- Relations to other variables: Do the items I have chosen to represent my construct relate to other variables in an expected way? This can include convergent and discriminant evidence.
- Internal structure: What is the degree to which the items on my scale conform to the construct and to how I intend to interpret the scale?
ABBEY: As recommended by AERA, APA, and NCME (2014), we followed multiple steps to establish evidence for the validity of the newly created scale. Each source of evidence was used to increase the degree to which these student-specific teaching self-efficacy scores can be accepted as reliable and valid.

Sources of Validity in Instrument Development. Evidence based on...
- Test content: literature review, content specification, expert judges
- Response processes: cognitive interviews
- Relations to other variables: analysis of the relationship of the scale scores to variables external to the scale (correlational evidence)
- Internal structure: factor analysis, measurement invariance
ABBEY: See the Standards for Educational and Psychological Testing.
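
For the internal-structure evidence, an exploratory factor analysis lets you inspect loadings and cross-loadings directly. A sketch using the third-party factor_analyzer package (assumed installed; the data file is hypothetical):

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer  # pip install factor_analyzer

df = pd.read_csv("scale_responses.csv")     # hypothetical item-level data

fa = FactorAnalyzer(n_factors=2, rotation="oblimin")  # oblique rotation
fa.fit(df)
print(fa.loadings_)  # look for high loadings on the intended factors
                     # and near-zero cross-loadings on the others
```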

Determining If You Should Use An Instrument
- Length and content: Does the scale represent the breadth of the construct?
- Reliability: Is the score reliability from your scale reasonable, using similar samples?* (*Depends on the seriousness/specificity of the reliability issue…)
DANI

Determining If You Should Use An Instrument
- Previous samples: Has the scale been used with samples similar to your sample of interest?
- Intended performance: Has the scale previously performed as intended, based on a review of past psychometric analyses (EFA, CFA, correlational, SEM)?
DANI

Determining If You Should Use An Instrument
- Scoring: How has the scale been scored in the past? Was sufficient testing done to evaluate the appropriateness of using a total score, if needed? (A minimal scoring sketch follows this slide.)
DANI: Suppose 10 studies report 4 factors with cross-loadings. Were those analyses done well, or did the authors just run an EFA or CFA? Consistency across papers is not enough; if I see consistency, that is not enough. Papers that use a total score are not enough.
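
If the validation literature supports subscale scores, a scoring step can make that decision explicit rather than defaulting to a total. A minimal pandas sketch; the subscale-to-item map below is hypothetical and would come from the scale's validation papers:

```python
import pandas as pd

df = pd.read_csv("scale_responses.csv")  # hypothetical item-level data

# Hypothetical item map, taken from the scale's validation literature.
subscales = {
    "alternatives": ["item1", "item3", "item5"],
    "control":      ["item2", "item4", "item6"],
}

scores = pd.DataFrame({name: df[items].sum(axis=1)
                       for name, items in subscales.items()})
# Only compute a total if the psychometric evidence supports one.
scores["total"] = scores.sum(axis=1)
```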

Is one EFA okay? When is it enough? Consider whether there is psychometric evidence for your specific sample. ABBEY: Some will argue against this slide.
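
One way to check whether a published structure holds in your specific sample is a confirmatory factor analysis. A sketch using the third-party semopy package (assumed installed; the model string and data file are hypothetical):

```python
import pandas as pd
import semopy  # pip install semopy

df = pd.read_csv("scale_responses.csv")  # hypothetical item-level data

# Hypothetical two-factor structure from the original validation paper.
desc = """
Alternatives =~ item1 + item3 + item5
Control      =~ item2 + item4 + item6
"""
model = semopy.Model(desc)
model.fit(df)
print(semopy.calc_stats(model))  # fit indices (e.g., CFI, RMSEA) in your sample
```

Poor fit in your sample, as in the Objectified Body Consciousness Scale example earlier, is a signal that the scale needs further study before use.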

Where can I find good scales? ABBEY

Where To Find Instruments: Literature. Review the literature on your construct and the scales that measure it, paying close attention to:
- Definitions of the construct
- Reliability
- Factor structure: subscales vs. total scores, exploratory factor analysis, confirmatory factor analysis, validation sample, measurement invariance
ABBEY

Where To Find Instruments: Reviews. Mental Measurements Yearbook (MMY):
- A tool to locate information about commercial tests and measures
- Issues from 1938 to 2017
- Provides factual information on published tests
- Critical test reviews written by professionals and psychometricians in education, psychology, speech/language/hearing, law, health care, and other related fields
ABBEY: The MMY includes timely, consumer-oriented test reviews, providing evaluative information to promote and encourage informed test selection. The latest version was July 2017.

Developing an Instrument If Needed ABBEY

Best Practices In Instrument Development: Recognize instrument development as an ongoing process, not a one-time event. ABBEY

Helpful References for Scale Construction
American Educational Research Association (AERA), American Psychological Association (APA), & National Council on Measurement in Education (NCME). (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
DeVellis, R. F. (2012). Scale development: Theory and applications. Los Angeles, CA: Sage.
Kline, P. (1986). Making tests reliable II: Personality inventories. In P. Kline (Ed.), A handbook of test construction: Introduction to psychometric design (pp. 59-76). London, United Kingdom: Methuen.
Thorndike, R. M., & Thorndike-Christ, T. (2010). Measurement and evaluation in psychology and education. Boston, MA: Prentice Hall.
Willis, G. B., & Artino, A. R. (2013). What do our respondents think we're asking? Using cognitive interviewing to improve medical education surveys. Journal of Graduate Medical Education, 5, 353-356. doi:10.4300/JGME-D-13-00154.1
ABBEY

What things did we want to get into, but did not have time for?
- Best practices in CFA and EFA
- Bifactor analyses
- SEM
- Using IRT
- Measurement invariance or DIF
- Cognitive diagnostic models
- Multilevel SEM and IRT
DANI

What did I learn?
- Think twice before using a scale to measure a construct of interest
- Good measurement matters
- Instrument development is an ongoing process
- Consider gathering psychometric evidence to support the intended use of the scale within your study
DANI