The extent to which an experiment, test or any measuring procedure shows the same result on repeated trials.

Presentation transcript:

Reliability: the extent to which an experiment, test, or any measuring procedure shows the same result on repeated trials.

Scores obtained on Thursday, and the scores which would have been obtained on the following day (Friday):

TEST A
Student   Score obtained   Score on the following day
Bill           68                28
Mary           46                34
Ann            19                67
Harry          89                63
Don            43                59
Colin          56                35
Sue            27                23
Kate           76                62
Sam            82                —

TEST B
Student   Score obtained   Score on the following day
Bill           65                69
Mary           48                52
Ann            23                21
Harry          85                90
Don            44                39
Colin          56                59
Sue            38                35
Kate           19                16
Sam            67                62
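The contrast between the two tests above can be made concrete with a short calculation. This is an illustrative sketch using the scores as they appear on the slide (for Test A, only the eight students with both scores recorded are included; the pairing of scores to students follows the slide's order):

```python
# Day-to-day score pairs (score obtained, score on the following day)
# read off the hypothetical Test A and Test B tables above.
test_a = [(68, 28), (46, 34), (19, 67), (89, 63),
          (43, 59), (56, 35), (27, 23), (76, 62)]
test_b = [(65, 69), (48, 52), (23, 21), (85, 90), (44, 39),
          (56, 59), (38, 35), (19, 16), (67, 62)]

def mean_abs_shift(pairs):
    """Average absolute change in score between the two days."""
    return sum(abs(day1 - day2) for day1, day2 in pairs) / len(pairs)

print(mean_abs_shift(test_a))  # large average shift: Test A is unreliable
print(mean_abs_shift(test_b))  # small average shift: Test B is reliable
```

A student who scores 19 one day and 67 the next has not changed in ability; the difference is error, and Test A is full of it while Test B is not.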

Scores on an interview using a five-point scale, for the same nine students (Bill, Mary, Ann, Harry, Don, Colin, Sue, Kate, Sam): the score obtained and the score which would have been obtained on the following day, each between 1 and 5.

Types of Reliability
1. Student- (or person-) related reliability
2. Rater- (or scorer-) related reliability
   a. Intra-rater reliability
   b. Inter-rater reliability
3. Test administration reliability
4. Test (or instrument-related) reliability

1-Student-Related Reliability
The source of the error score is the test takers themselves:
- Temporary illness
- Fatigue
- Anxiety
- Other physical or psychological factors
- Test-wiseness (i.e., strategies for efficient test taking)

2-Rater (or Scorer) Reliability
Fluctuations include human error, subjectivity, and bias.
Principles:
- Use experienced, trained raters.
- Use more than one rater.
- Raters should carry out their assessments independently.
Two kinds of rater reliability:
- Intra-rater reliability
- Inter-rater reliability

A-Intra-Rater Reliability
Sources of fluctuation within a single rater:
- Unclear scoring criteria
- Fatigue
- Bias toward particular good and bad students
- Simple carelessness

B-Inter-Rater Reliability
Fluctuations include:
- Lack of attention to scoring criteria
- Inexperience
- Inattention
- Preconceived biases

3-Test Administration Reliability
Sources of error in the conditions of administration:
- Street noise (e.g., during a listening comprehension test)
- Photocopying variations
- Lighting
- Variations in temperature
- Condition of desks and chairs
- Monitors

4-Test Reliability
Measurement errors that come from the test itself:
- The test is too long
- The test has a time limit
- The test format allows for guessing
- Ambiguous test items
- Items with more than one correct answer

THE RELIABILITY COEFFICIENT

Reliability Coefficient (r)
Quantifies the reliability of a test and allows us to compare the reliability of different tests.
0 ≤ r ≤ 1 (the ideal is r = 1, which means the test gives precisely the same results for particular testees regardless of when it happens to be administered).
If r = 1, the test is 100% reliable.
A good achievement test: r ≥ .90.
If r < .70, the test should not be used.

How to make a test MORE reliable?
Question: How many components of test reliability do we have to consider?
Answer: Two.
Question: What are they?
Answer:
1- The consistency of candidates' performance from occasion to occasion
2- The reliability of the scoring

How to make a test MORE reliable?
- Achieve consistent performances from candidates.
- Achieve reliable scoring.

How can performance of candidates be made reliable?
1- Take enough samples of behavior
2- Do not allow candidates too much freedom
3- Write unambiguous items
4- Provide clear and explicit instructions

How can performance of candidates be made reliable? (continued)
5- Ensure that tests are well laid out and perfectly legible
6- Make sure candidates are familiar with the format and testing techniques
7- Provide uniform and non-distracting conditions of administration

How can performance of candidates be made reliable?
1- Take enough samples of behavior
- Other things being equal, the more items a test has, the more reliable it will be.
- "The more the merrier" has its limits: do not cause boredom with long lists of questions, as that decreases the reliability level.
- It can be estimated how many extra items, similar to the ones already in the test, will be needed to raise the reliability coefficient to a required level.
- The extra items should be genuinely similar to the existing ones, e.g.: "Visual materials are very useful to learn something." / "I love to see teachers use PowerPoints, posters and visual materials."
- Every additional item represents a fresh start for the candidate.
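The question of how many extra items are needed is usually answered with the Spearman-Brown prophecy formula, which predicts the reliability of a test lengthened by a factor n with similar items. A minimal sketch (the 20-item example is hypothetical):

```python
def lengthened_reliability(current_r, n):
    """Spearman-Brown: predicted reliability after multiplying test length by n."""
    return (n * current_r) / (1 + (n - 1) * current_r)

def lengthening_factor(current_r, target_r):
    """Spearman-Brown solved for n: by what factor must the test grow?"""
    return (target_r * (1 - current_r)) / (current_r * (1 - target_r))

# e.g. a 20-item test with r = .70 that should reach r = .90:
n = lengthening_factor(0.70, 0.90)
print(round(n, 2))    # factor by which the item count must grow
print(round(20 * n))  # total items needed in the lengthened test
```

The formula assumes the new items are of the same quality and independence as the old ones, which is exactly why each item must be "a fresh start" rather than a rephrasing of an existing one.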

How can performance of candidates be made reliable?
2- Do not allow candidates too much freedom
- Do not offer candidates a choice of questions.
- Restrict how candidates may answer, in terms of the possible answers.

How can performance of candidates be made reliable?
2- Do not allow candidates too much freedom
Example: Which version is more reliable?
1. Write a composition on tourism.
2. Write a composition on tourism in this country.
3. Write a composition on how we might develop the tourist industry in this country.
4. Discuss the following measures intended to increase the number of foreign tourists coming to this country:
   1- Better advertising and more information (where and in what form)
   2- Improved facilities (hotels, transportation, communication...)
   3- Training of responsible people (guides, hotel managers...)
The more the task is restricted, the more comparable (and therefore more reliable) the candidates' performances become.

How can performance of candidates be made reliable?
3- Write unambiguous items
- The meaning of items should be clear.
- The acceptable answers should also be clear, both to the test writer and to the test taker.
- Leave no room for a test taker to interpret the question in different ways on different occasions.

How can performance of candidates be made reliable?
3- Write unambiguous items
How can we prepare such clear items?
- Write a draft of the items.
- Give them to a peer or colleague to check.
- Pilot them among non-participant students.
- If there is no chance for piloting, the scorers will have to take on the responsibility.

How can performance of candidates be made reliable?
4- Provide clear and explicit instructions
- Peer/colleague correction and criticism helps avoid complaints that the students are unintelligent when the instructions were at fault.
5- Ensure that the tests are well laid out (edited) and perfectly legible (easy to read)
- No badly typed items or illegible handwriting
- No squeezing too much text into too small a space
- No poorly reproduced copies
Why? All of these lower reliability.

How can performance of candidates be made reliable?
6- Candidates should be familiar with the format and testing techniques
- Ask questions in formats and structures that candidates are already familiar with.
7- Provide uniform and non-distracting conditions of administration
- Administer the test uniformly every time: the same timing (with the same strictness), good acoustic conditions, and a quiet setting with no distracting sounds or movements.

How can score reliability be achieved?
1- Use items that permit scoring that is as objective as possible
2- Make comparisons between candidates as direct as possible
3- Provide a detailed scoring key
4- Train scorers

How can score reliability be achieved? (continued)
5- Agree on acceptable responses and appropriate scores at the outset of scoring
6- Identify candidates by number, not name
7- Employ multiple, independent scoring

How can score reliability be achieved?
1- Use items that permit scoring that is as objective as possible
How can objectivity be accomplished?
- Multiple choice (though not easy to prepare)
- Open-ended questions with a unique, possibly one-word, correct response that candidates produce (spelling may be problematic)
Example, in a listening test:
Q: What was different about the results?
A) _________________________________________.
B) _______ was more closely associated with _______.

How can score reliability be achieved?
2- Make comparisons between candidates as direct as possible
How can this be done?
- Ask the same questions of everyone and expect the same answers.
- Limit how students may answer, and do not let them choose the questions: not too much freedom.

How can score reliability be achieved?
3- Provide a detailed scoring key
- Give clear explanations of the possible acceptable answers.
- Give clear scoring for the correct answers.
- Prepare as detailed an answer key as possible: which answers are partially or totally accepted, and how each is graded (half or full point?).
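In code form, a detailed scoring key is simply an explicit mapping from acceptable answers to credit, agreed before scoring starts. A minimal sketch with a hypothetical item (q1) and hypothetical acceptable answers:

```python
# Hypothetical scoring key: each acceptable answer and the credit it earns,
# including partially accepted answers worth half a point.
scoring_key = {
    "q1": {"rainfall": 1.0, "rain": 1.0, "precipitation": 0.5},
}

def score(item, answer):
    """Normalise the answer and look it up; unlisted answers earn 0."""
    return scoring_key[item].get(answer.strip().lower(), 0.0)

print(score("q1", "Rainfall"))       # totally accepted: full point
print(score("q1", "precipitation"))  # partially accepted: half point
print(score("q1", "snow"))           # not in the key: no credit
```

Because every scorer applies the same mapping, two scorers marking the same script cannot disagree, which is the point of the detailed key.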

How can score reliability be achieved?
4- Train scorers
- If the scoring will be subjective, as in marking writing, scorers should be familiar with the material and the questions; this needs special training.
- Double marking should be used.

How can score reliability be achieved?
5- Agree on acceptable responses and appropriate scores at the outset of scoring
- Every scorer should agree on the scoring for the test before scoring starts.
- Be open to new possible answers, and be flexible.
- If new acceptable answers emerge, all scorers should be told about them and they should be added to the answer key.

How can score reliability be achieved?
6- Identify candidates by number, not name
- Knowing students' names, gender, nationality, or physical appearance affects the scoring process.
- Blind reading keeps scoring objective.
7- Employ multiple, independent scoring
- Scoring should be done by two different, independent scorers (double marking).
- Discrepancies should be examined, and a final decision made by a third person.
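The double-marking procedure above can be sketched as follows; the tolerance for "close enough" marks and the adjudication rule are illustrative assumptions, not fixed policy:

```python
def final_score(mark_1, mark_2, third_marker, tolerance=1):
    """Average two close independent marks; refer discrepancies to a third marker."""
    if abs(mark_1 - mark_2) <= tolerance:
        return (mark_1 + mark_2) / 2
    return third_marker(mark_1, mark_2)

# Hypothetical third marker: makes the final decision on discrepant scripts.
def adjudicate(mark_1, mark_2):
    return round((mark_1 + mark_2) / 2)

print(final_score(4, 5, adjudicate))  # close marks are averaged: 4.5
print(final_score(2, 5, adjudicate))  # discrepancy: third marker decides
```

The key design point is that the two markers score independently first; only after both marks exist is the discrepancy observed and, if necessary, referred on.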

THANK YOU
Gürşen SARIALTIN & Tuba DEMİR