Exploring the Equivalence and Rater Bias in AC Ratings Prof Gert Roodt – Department of Industrial Psychology and People Management, University of Johannesburg.

Presentation transcript:

Exploring the Equivalence and Rater Bias in AC Ratings
Prof Gert Roodt, Department of Industrial Psychology and People Management, University of Johannesburg
Sandra Schlebusch, The Consultants
ACSG Conference, 17-19 March 2010

Presentation Overview
- Background and Objectives of the Study
- Research Method
- Results
- Discussion and Conclusions
- Recommendations

Background
Construct Validity has Long been a Problem in ACs (Jones & Born, 2008). Perhaps the Mental Models that the Raters Use are Part of the Problem. However, Other Factors that Influence Reliability Should not be Neglected.

Background (cont.)
To Increase Reliability, Focus on All Aspects of the Design Model (Schlebusch & Roodt, 2007):
- Analysis
- Design
- Implementation
  - Context
  - Participants
  - Process Owners (Simulation Administrator; Raters; Role-players)

Background (cont.)
- Analysis (International Guidelines, 2009)
  - Competencies / Dimensions
  - Also Characteristics of Dimensions (Jones & Born, 2008)
  - Situations
  - Trends/Issues in the Organisation
  - Technology

Background (cont.)
- Design of Simulations
  - Fidelity
  - Elicit Behaviour
  - Pilot

Background (cont.)
- Implementation
  - Context: Purpose
  - Participants
  - Simulation Administration (Potosky, 2008)
    - Instructions
    - Resources
    - Test Room Conditions

Background (cont.)
- Raters
  - Background
  - Characteristics
  - “What are Raters Thinking About When Making Ratings?” (Jones & Born, 2008)

Sources of Rater Bias
- Rater Differences (background, experience, etc.)
- Rater Predisposition (attitude, ability, knowledge, skills, etc.)
- Mental Models

Objective of the Study
The Focus of this Study is on Equivalence and Rater Bias in AC Ratings, more specifically on:
- Regional Differences
- Age Differences
- Tenure Differences
- Rater Differences

Research Method
Participants (Ratees): Region
Frequency table of ratees by region (Western, Central, Eastern), with Frequency, Percent, Valid Percent and Cumulative Percent columns. [Table values not preserved in the transcript.]

Research Method (cont.)
Participants (Ratees): Age (Recoded)
Frequency table of ratees across four age bands, from “30 years or less” to the oldest band, plus missing cases. [Band boundaries and table values not preserved in the transcript.]

Research Method (cont.)
Participants (Ratees): Years of Service (Recoded)
Frequency table of ratees across four tenure bands, from “10 years or less” to the longest-serving band, plus missing cases. [Band boundaries and table values not preserved in the transcript.]
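For readers who want to rebuild these participant breakdowns from raw data, the sketch below shows one way to produce an SPSS-style frequency table with pandas. The data frame, the "region" column name and the category values are illustrative assumptions, not the study's actual data file.

```python
import pandas as pd

# Hypothetical ratee records; "region" and its labels are assumed for
# illustration only, not taken from the study's data.
ratees = pd.DataFrame(
    {"region": ["Western", "Central", "Eastern", "Western", "Central", "Western"]}
)

counts = ratees["region"].value_counts()
freq = pd.DataFrame({
    "Frequency": counts,
    "Percent": (counts / len(ratees) * 100).round(1),
})
# With no missing values, Valid Percent equals Percent in SPSS output.
freq["Cumulative Percent"] = freq["Percent"].cumsum()
print(freq)
```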

Research Method (cont.)
Measurement: An In-Basket Test Measuring Six Dimensions: Initiative, Information Gathering, Judgement, Providing Direction, Empowerment and Management Control, plus an Overall In-Basket Rating.

Research Method (cont.)
Procedure: Ratings were Conducted by 3 Raters on 1057 Ratees.
Frequency table of ratees per observer (rater), with Frequency, Percent, Valid Percent and Cumulative Percent columns. [Table values not preserved in the transcript.]

Results
Initiative: Frequency distribution of ratings across the ND and R categories (Frequency, Percent, Valid Percent, Cumulative Percent). [Table values not preserved in the transcript.]

Results (cont.)
Reliability Statistics: Initiative: Cronbach's Alpha = .556, N of Items = 4.
Reliability Statistics: Initiative per Observer: Cronbach's Alpha and N of Items for each observer. [Per-observer values not preserved in the transcript.]
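The alpha values on this and the following reliability slides can be reproduced from a ratees-by-items (or ratees-by-raters) score matrix. A minimal sketch, using purely illustrative scores rather than the study's data:

```python
import numpy as np

def cronbach_alpha(ratings: np.ndarray) -> float:
    """Cronbach's alpha for a (ratees x items) matrix.

    The "items" could equally be the three raters' scores on one
    dimension, one common way to index inter-rater consistency.
    """
    k = ratings.shape[1]
    item_vars = ratings.var(axis=0, ddof=1)      # variance of each item/rater
    total_var = ratings.sum(axis=1).var(ddof=1)  # variance of the summed score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Toy example: 6 ratees scored on 3 items (values are illustrative only).
scores = np.array([
    [3, 4, 3],
    [2, 2, 3],
    [4, 4, 5],
    [1, 2, 1],
    [3, 3, 4],
    [5, 4, 4],
])
print(round(cronbach_alpha(scores), 3))
```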

Results (cont.)
Information Gathering: Frequency distribution of ratings across the ND and R categories. [Table values not preserved in the transcript.]

Results (cont.)
Reliability Statistics: Information Gathering: Cronbach's Alpha = .485, N of Items = 3.
Reliability Statistics: Information Gathering per Observer: [per-observer values not preserved in the transcript].

Results (cont.)
Judgement: Frequency distribution of ratings across the ND, R and E categories. [Table values not preserved in the transcript.]

Results (cont.)
Reliability Statistics: Judgement: Cronbach's Alpha = .813, N of Items = 5.
Reliability Statistics: Judgement per Observer: [per-observer values not preserved in the transcript].

Results (cont.)
Providing Direction: Frequency distribution of ratings across the ND, R, E and HE categories. [Table values not preserved in the transcript.]

Results (cont.)
Reliability Statistics: Providing Direction: Cronbach's Alpha = .745, N of Items = 5.
Reliability Statistics: Providing Direction per Observer: [per-observer values not preserved in the transcript].

Results (cont.)
Empowerment: Frequency distribution of ratings across the ND, R, E and HE categories. [Table values not preserved in the transcript.]

Results (cont.)
Reliability Statistics: Empowerment: Cronbach's Alpha = .749, N of Items = 3.
Reliability Statistics: Empowerment per Observer: [per-observer values not preserved in the transcript].

Results (cont.)
Control: Frequency distribution of ratings across the ND, R, E and HE categories, with one case missing (System: 1, .1%). [Remaining table values not preserved in the transcript.]

Results (cont.)
Reliability Statistics: Control: Cronbach's Alpha = .748, N of Items = 5.
Reliability Statistics: Control per Observer: [per-observer values not preserved in the transcript].

Results (cont.)
Reliability Statistics: Overall In-Basket Rating: Cronbach's Alpha = .768, N of Items = 6.
Reliability Statistics: In-Basket per Observer: [per-observer values not preserved in the transcript].

Results (cont.)
Regional Differences: Robust Tests of Equality of Means (Brown-Forsythe) for Initiative, Info Gathering, Judgement, Providing Direction, Empowerment, Control and In-Basket, reporting Statistic(a), df1, df2 and Sig. for each measure. [Table values not preserved in the transcript.]
a. Asymptotically F distributed.
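SPSS reports this as the Brown-Forsythe F* statistic, which is asymptotically F distributed (footnote a); the same test is used for the age and rater comparisons below. SciPy only ships the Brown-Forsythe variance test (scipy.stats.levene with center='median'), so the sketch below hand-rolls the means test; the function name and group data are illustrative assumptions, and df2 follows the standard Satterthwaite-type approximation.

```python
import numpy as np
from scipy import stats

def brown_forsythe_means(*groups):
    """Brown-Forsythe robust test of equality of means (F*),
    for groups with possibly unequal variances and sizes."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    n = np.array([len(g) for g in groups])
    means = np.array([g.mean() for g in groups])
    variances = np.array([g.var(ddof=1) for g in groups])
    N, k = n.sum(), len(groups)
    grand_mean = np.concatenate(groups).mean()

    # F* = sum n_i (m_i - m)^2 / sum (1 - n_i/N) s_i^2
    numerator = np.sum(n * (means - grand_mean) ** 2)
    weights = (1 - n / N) * variances
    f_star = numerator / weights.sum()

    # Satterthwaite-type approximation for the denominator df.
    c = weights / weights.sum()
    df1 = k - 1
    df2 = 1.0 / np.sum(c ** 2 / (n - 1))
    p = stats.f.sf(f_star, df1, df2)
    return f_star, df1, df2, p

# Illustrative call with invented regional scores:
western, central, eastern = [3, 4, 2, 5, 3], [2, 2, 3, 1, 2], [4, 5, 4, 3, 5]
print(brown_forsythe_means(western, central, eastern))
```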

Results (cont.)
Age Differences: Robust Tests of Equality of Means (Brown-Forsythe) for Initiative, Info Gathering, Judgement, Providing Direction, Empowerment, Control and In-Basket, reporting Statistic(a), df1, df2 and Sig. for each measure. [Table values not preserved in the transcript.]
a. Asymptotically F distributed.

Results (cont.)
Tenure Differences: One-way ANOVA (Between Groups / Within Groups Sums of Squares, df, Mean Square, F, Sig.) for Initiative, Info Gathering, Judgement, Providing Direction, Empowerment, Control and In-Basket. [Table values not preserved in the transcript.]
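For the tenure comparison the deck uses a classical one-way ANOVA, which SciPy provides directly. A minimal sketch with invented scores and assumed band labels (the study's actual recoded bands are not preserved):

```python
from scipy import stats

# Hypothetical Initiative scores grouped by tenure band; both the band
# labels and the scores are assumptions for illustration.
tenure_bands = {
    "10 years or less": [3, 2, 4, 3],
    "11-20 years":      [4, 4, 3, 5],
    "21 years or more": [2, 3, 2, 3],
}
f_stat, p_value = stats.f_oneway(*tenure_bands.values())
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
```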

Results (cont.)
Rater Differences: Robust Tests of Equality of Means (Brown-Forsythe) for Initiative, Info Gathering, Judgement, Providing Direction, Empowerment, Control and In-Basket, reporting Statistic(a), df1, df2 and Sig. for each measure. [Table values not preserved in the transcript.]
a. Asymptotically F distributed.

Results (cont.)
Post Hoc Tests: Judgement. Multiple Comparisons (Dunnett T3), Dependent Variable: Judgement, comparing observer pairs (I, J) on Mean Difference (I-J), Std. Error, Sig. and the 95% Confidence Interval (Lower Bound, Upper Bound); four comparison rows are flagged as significant. [Table values not preserved in the transcript.]
* The mean difference is significant at the .05 level.
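Dunnett's T3 relies on the studentized maximum modulus distribution and has no off-the-shelf SciPy implementation. As a rough stand-in in the same spirit (pairwise comparisons that do not assume equal variances, with a multiplicity correction), the sketch below runs Bonferroni-adjusted Welch t-tests between observers; it is not the exact SPSS procedure, and the ratings are invented.

```python
from itertools import combinations
from scipy import stats

# Hypothetical Judgement ratings per observer (values illustrative only).
observers = {
    1: [3.2, 3.5, 2.8, 3.9, 3.1],
    2: [2.5, 2.9, 2.2, 2.7, 2.4],
    3: [3.8, 4.1, 3.6, 3.9, 4.2],
}

# Pairwise Welch t-tests (unequal variances), Bonferroni-adjusted.
pairs = list(combinations(observers, 2))
for i, j in pairs:
    t, p = stats.ttest_ind(observers[i], observers[j], equal_var=False)
    p_adj = min(p * len(pairs), 1.0)
    print(f"Observer {i} vs {j}: t = {t:.2f}, adjusted p = {p_adj:.3f}")
```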

Results (cont.) [Chart slide; figure not preserved in the transcript.]

Post Hoc Tests: Providing Direction. Multiple Comparisons (Dunnett T3), Dependent Variable: Providing Direction, comparing observer pairs (I, J) on Mean Difference (I-J), Std. Error, Sig. and the 95% Confidence Interval; six comparison rows are flagged as significant. [Table values not preserved in the transcript.]
* The mean difference is significant at the .05 level.

Results (cont.) [Chart slide; figure not preserved in the transcript.]

Post Hoc Tests: Empowerment. Multiple Comparisons (Dunnett T3), Dependent Variable: Empowerment, comparing observer pairs (I, J) on Mean Difference (I-J), Std. Error, Sig. and the 95% Confidence Interval; four comparison rows are flagged as significant. [Table values not preserved in the transcript.]
* The mean difference is significant at the .05 level.

Results (cont.) [Chart slide; figure not preserved in the transcript.]

Post Hoc Tests: Control. Multiple Comparisons (Dunnett T3), Dependent Variable: Control, comparing observer pairs (I, J) on Mean Difference (I-J), Std. Error, Sig. and the 95% Confidence Interval; two comparison rows are flagged as significant. [Table values not preserved in the transcript.]
* The mean difference is significant at the .05 level.

Results (cont.) [Chart slide; figure not preserved in the transcript.]

Post Hoc Tests: In-Basket. Multiple Comparisons (Dunnett T3), Dependent Variable: In-Basket, comparing observer pairs (I, J) on Mean Difference (I-J), Std. Error, Sig. and the 95% Confidence Interval; two comparison rows are flagged as significant. [Table values not preserved in the transcript.]
* The mean difference is significant at the .05 level.

Results (cont.) [Chart slide; figure not preserved in the transcript.]

Non-Parametric Correlations

                     Init.    Info G.  Judg.    Prov.D.  Empow.   Control  In-Basket
Initiative           1.000
Info Gathering       .813**   1.000
Judgement            .448**   .445**   1.000
Providing Direction  .554**   .506**   .493**   1.000
Empowerment          .441**   .428**   .479**   .469**   1.000
Control              .491**   .535**   .419**   .431**   .400**   1.000
In-Basket            .475**   .418**   .761**   .679**   .814**   .595**   1.000
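The deck does not say which non-parametric coefficient was used; assuming Spearman rank correlations (a common choice for ordinal AC ratings), a matrix like the one above can be produced directly with pandas. The column names mirror the table; the values are illustrative only.

```python
import pandas as pd

# Hypothetical dimension ratings; values are invented for illustration.
ratings = pd.DataFrame({
    "Initiative":     [3, 2, 4, 1, 3, 5],
    "Info Gathering": [3, 2, 4, 2, 3, 4],
    "Judgement":      [4, 2, 4, 1, 3, 4],
})
# Spearman rank-correlation matrix, rounded as in the slide.
print(ratings.corr(method="spearman").round(3))
```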

Discussion
Clear Regional, Age and Tenure Differences Do Exist among Participants.
Possible Sources of the Differences:
- Regional Administration of the In-Basket, thus Differences in the Administration Medium (Potosky, 2008)
  - Different Administrators (Explaining Purpose; Giving Instructions; Answering Questions)
  - Different Resources
  - Different Test Room Conditions

Discussion (cont.)
Differences Between Participants Regionally:
- English Language Ability (not tested)
- Motivation to Participate in the Assessment (not tested)
- Differences in Employee Selection Processes as well as Training Opportunities (Burroughs et al., 1973)
- Simulation Fidelity (not tested)

Discussion (cont.)
Clear Regional, Age and Tenure Differences Do Exist among Participants, Supporting Findings by Burroughs et al. (1973):
- Age does Significantly Influence AC Performance
- Participants from Certain Departments Perform Better

Discussion (cont.)
Appropriateness of the In-Basket for Ratees:
- Level of Complexity
- Situation Fidelity
Recommendations:
- Ensure Documented Evidence (Analysis Phase of the Design Model)
- Pilot the In-Basket on Target Ratees (Design Phase of the Design Model)
- Shared Responsibility of the Service Provider and the Client Organisation

Discussion (cont.)
Context in Which the In-Basket is Administered:
- Purpose Communicated
Recommendations:
- Ensure Participants (Ratees) and Process Owners Understand and Buy into the Purpose

Discussion (cont.)
Consistent Simulation Administration:
- Instructions Given Consistently
- Interaction with the Administrator
- Appropriate Resources Available During Administration
- Test Room Conditions Appropriate for Testing
Recommendations:
- Ensure All Administrators are Trained
- Standardise Test Room Conditions

Discussion (cont.)
Rater Differences do Exist.
Possible Sources of Rater Differences:
- Background (All from a Psychology Background, with Management Experience)
- Characteristics such as Personality (Bartels & Doverspike)
- Cognitive Load on Raters
- Differences in Mental Models (Jones & Born, 2008)

Discussion (cont.)
Possible Sources of Rater Differences (cont.):
- Training
  - All Received Behaviour-Oriented Rater Training
  - Frames of Reference Differed

Recommendations:
- Frame of Reference Training on:
  - Dimensions
  - Management-Leadership Behaviour
  - Norms
- Project Management
- Personality Assessment of Raters
- Sub-dimension Differences

Questions?

Summary
- Found Rater Bias
- Need to Research the Source of the Bias
- Recommend Frame of Reference Training, Project Management, Communication of the Purpose, and Administrator Training