
Scale Construction and Halo Effect in Secondary Student Ratings of Teacher Performance
Ph.D. Dissertation Defense
Eric Paul Rogers
Department of Instructional Psychology and Technology, Brigham Young University
30 June 2005

Background

Teacher evaluation has been a frustrating process for both teachers and administrators, and scholars have identified a variety of inadequacies in its practice. The use of rating scales has been the target of particularly severe criticism, and employing students as raters has been one of the most controversial aspects of rating scale use. Much of the criticism is justified, given how poorly rating scales have often been designed and implemented in teacher evaluation.

Background (continued)

It is proposed that these criticisms can be effectively mitigated by the careful design, implementation, and interpretation of student ratings of teacher performance. This study explores efforts to address these criticisms using student ratings of teachers in religious education settings sponsored by the Church Educational System (CES) of The Church of Jesus Christ of Latter-day Saints. In addition, among the various threats to the validity of decisions based on ratings, halo effect is considered ubiquitous. This study therefore employs several approaches to diagnosing halo effect and asks whether males and females exhibit differing degrees of halo in their ratings of teachers, a question not previously addressed in the research literature.

Overview

This study employs a combination of qualitative and quantitative research techniques to answer five specific research questions:

1. What are the key areas of teacher performance valued by CES administrators, teachers, and students?
2. In what ways do students conceptualize these areas of valued teacher performance?
3. To what degree do the items derived from student conceptualizations function to produce reliable ratings from which valid conclusions may be drawn about teacher performance?
4. In what ways should items and scales be revised to improve reliability and validity?
5. To what extent do male and female seminary students exhibit differing degrees of halo effect in their ratings of teachers?

Results: Research Question 1

What are the key areas of teacher performance valued by CES administrators, teachers, and students?

An effective CES teacher:
- Teaches students the Gospel of Jesus Christ
- Teaches by the Spirit
- Teaches by example
- Establishes and maintains an appropriate setting
- Helps students accept responsibility for gospel learning
- Effectively decides what to teach
- Effectively decides how to teach
- Effectively uses scripture study skills
- Effectively uses teaching skills
- Relates well with students
- Prepares young people for effective church service
- Has high expectations for students

Results: Research Question 2

In what ways do students conceptualize these areas of valued teacher performance?

Examples of responses elicited during student focus group interviews for the area "Teaches students the Gospel of Jesus Christ":
- They teach from the scriptures.
- They teach less opinion and more doctrine.
- They avoid expressing personal opinions.
- They recognize their own opinions.
- They teach what the prophets teach.

Results: Research Question 3

To what degree do the items derived from student conceptualizations function to produce reliable ratings from which valid conclusions may be drawn about teacher performance?

Although twelve scales were originally developed, only three are defensible based on established psychometric standards:
- Student-Teacher Rapport Scale (STRS)
- Scripture Mastery Expectation Scale (SMES)
- Spiritual Learning Environment Scale (SLES)
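As a rough illustration of the internal-consistency evidence that underlies such psychometric judgments, the sketch below computes Cronbach's alpha for a small item-response matrix. The data, the four-item scale, and the number of raters are hypothetical and are not taken from the study.

```python
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """Cronbach's alpha for a (raters x items) matrix of ratings."""
    k = item_scores.shape[1]                          # number of items in the scale
    item_vars = item_scores.var(axis=0, ddof=1)       # variance of each item
    total_var = item_scores.sum(axis=1).var(ddof=1)   # variance of raters' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical ratings: six students rating one teacher on four rapport-type items (1-5 scale).
ratings = np.array([
    [4, 5, 4, 4],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 3],
    [4, 4, 5, 5],
    [3, 4, 3, 3],
])
print(f"Cronbach's alpha: {cronbach_alpha(ratings):.2f}")
```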

Results: Research Question 4

In what ways should items and scales be revised to improve reliability and validity?

Semantic changes to improve item performance:
- Unidimensionality: improved factor loadings, reduced or eliminated secondary factor loadings, improved fit statistics
- Local item independence: reduced or eliminated error correlations

Response category changes:
- Better alignment of item difficulties with person measures
- Better coverage of the upper end of the scales
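As one way to operationalize the unidimensionality checks listed above, an exploratory factor analysis of the item responses can flag items with weak primary loadings or non-trivial secondary loadings. The sketch below is a minimal illustration using scikit-learn's FactorAnalysis on simulated ratings; the simulated data, the 0.30 flag threshold, and the two-factor setup are assumptions for demonstration, not the study's procedure (the study also drew on fit statistics and error correlations that this sketch does not reproduce).

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Simulated ratings: 200 students x 6 items driven mostly by one latent trait.
trait = rng.normal(size=(200, 1))
ratings = trait @ np.full((1, 6), 0.8) + rng.normal(scale=0.6, size=(200, 6))

# Fit a two-factor model; items on a unidimensional scale should load strongly
# on the first factor and show negligible loadings on the second.
fa = FactorAnalysis(n_components=2).fit(ratings)
for i, (primary, secondary) in enumerate(fa.components_.T, start=1):
    flag = "  <- review wording" if abs(secondary) > 0.30 else ""
    print(f"item {i}: primary={primary:+.2f}  secondary={secondary:+.2f}{flag}")
```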

Results: Research Question 5

To what extent do male and female seminary students exhibit differing degrees of halo effect in their ratings of teachers?

Traditional approaches to halo diagnosis suggest that males are more likely to exhibit halo than females. The results of the Rasch model approaches to halo diagnosis are mixed, but they also suggest that males are more likely to exhibit halo than females.
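One traditional halo index is the within-rater standard deviation across rating dimensions: a rater who gives nearly identical ratings on conceptually distinct scales may be responding to a global impression rather than to the separate traits. The sketch below compares that index by gender for a handful of hypothetical raters; the values, and the use of the three retained scales as the dimensions, are illustrative assumptions rather than the study's data or exact procedure.

```python
import pandas as pd

# Hypothetical data: each row is one student's ratings of a teacher on the three scales.
df = pd.DataFrame({
    "gender": ["M", "M", "M", "F", "F", "F"],
    "STRS":   [5, 4, 5, 5, 3, 4],
    "SMES":   [5, 4, 5, 3, 4, 2],
    "SLES":   [5, 5, 5, 4, 2, 3],
})

scales = ["STRS", "SMES", "SLES"]
# Smaller within-rater SD means less differentiation among dimensions,
# a classic (if imperfect) sign of halo.
df["within_rater_sd"] = df[scales].std(axis=1, ddof=1)
print(df.groupby("gender")["within_rater_sd"].mean())
```

A lower mean within-rater SD for one group is only suggestive; as noted under Study Limitations below, genuinely uniform teacher performance (restricted variability) can mimic halo.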

Study Limitations
- Representativeness
- Generalizability
- Limited variability versus halo effect

Instructional Design

Conclusions about the effectiveness of instruction, whatever the setting or the instructional design model, are based upon evidence that objectives have been achieved. Despite the criticism of some scholars, rating scales that are carefully designed, developed, and implemented provide a basis for making valid judgments about instruction and about the design models upon which that instruction is based. Threats to the validity of conclusions about instructional interventions abound; instructional designers should be aware of these threats and take appropriate steps to diagnose and mitigate them when assessing instruction. This study notes the requirements of fundamental measurement when applying statistical analyses to assessment data and highlights the influence of halo effect on ratings.
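For readers unfamiliar with the fundamental-measurement framework invoked here, the Rasch rating scale model (given in the standard Andrich formulation for context, not taken from the slides) expresses the probability that rater $n$ assigns rating category $x$ on item $i$ in terms of a person measure $\beta_n$, an item difficulty $\delta_i$, and category thresholds $\tau_k$:

$$P(X_{ni} = x) = \frac{\exp\left[\sum_{k=0}^{x} (\beta_n - \delta_i - \tau_k)\right]}{\sum_{m=0}^{M} \exp\left[\sum_{k=0}^{m} (\beta_n - \delta_i - \tau_k)\right]}, \qquad \tau_0 \equiv 0,$$

where $M$ is the highest rating category. Item difficulties and person measures expressed on this common logit scale are what the earlier point about aligning item difficulties with person measures refers to.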

Study Contributions

This study provides a developmental framework for scale construction that integrates Classical Test Theory, Item Response Theory, and factor analytic techniques in a way that leads to defensibly reliable data from which valid conclusions may be drawn. It also establishes a firm basis for three scales that measure traits of importance to CES and meet widely accepted psychometric standards. Finally, this study provides evidence that, among secondary students, males exhibit halo to a greater degree than females on the traits examined. Although this finding is not generalizable to other traits or instructional settings, it raises a caution about drawing conclusions about teachers from ratings produced by rater groups with differing gender distributions.

Future Research
- Do the twelve scales developed in this study function as desired with more mature raters (e.g., adult students, peer teachers, supervisors, trainers)?
- How does improved scale function impact the diagnosis of halo effect and the apparent gender-based differences revealed in this study?
- How can researchers meaningfully differentiate between restricted variability and halo effect?