BASIC PRINCIPLES OF ASSESSMENT: RELIABILITY & VALIDITY


TSL 3112 – LANGUAGE ASSESSMENT

LECTURE'S OBJECTIVES
- Explain the difference between validity and reliability.
- Distinguish the different types of validity and reliability in tests and other instruments in language assessment.
- Suggest ways to ensure reliability and validity in language assessment.
(Main reference: Brown, H. Douglas, 2004. Language Assessment: Principles and Classroom Practices.)

WHAT IS RELIABILITY & VALIDITY?
Reliability is the degree to which an assessment tool produces stable and consistent results. Validity refers to how well a test measures what it is purported to measure.
Why are both necessary? Reliability is necessary, but it alone is not sufficient: a reliable test is not automatically valid. For example, if your scale is off by 3 kg, it reads your weight every day with an excess of 3 kg. The scale is reliable because it consistently reports the same weight every day, but it is not valid because it adds 3 kg to your true weight. It is not a valid measure of your weight.
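A quick numerical illustration of the scale example (a minimal sketch in Python; the 70 kg true weight and the number of readings are invented for illustration):

```python
import statistics

true_weight = 70.0  # the person's actual weight in kg (invented)
bias = 3.0          # the scale's constant error in kg

# Ten daily readings from the miscalibrated scale
readings = [true_weight + bias for _ in range(10)]

# Zero spread across readings: perfectly consistent, hence reliable
print(statistics.stdev(readings))               # 0.0
# Constant 3 kg systematic error: hence not valid
print(statistics.mean(readings) - true_weight)  # 3.0
```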

1- Reliability
A reliable test is consistent and dependable: if you give the test to the same students on two different occasions, it should yield the same results. Four factors might contribute to the unreliability of a test:
a. Student-related reliability
b. Rater reliability (scoring)
c. Test administration reliability
d. Test reliability
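The "same students, two different occasions" idea is commonly estimated as a test-retest correlation. A minimal sketch (the scores below are invented; the closer r is to 1.0, the more consistent the test appears):

```python
from statistics import correlation  # Python 3.10+

# Hypothetical scores for the same five students on two sittings of one test
first_sitting = [72, 85, 64, 90, 78]
second_sitting = [70, 88, 66, 91, 75]

# Pearson r between the two sittings as a test-retest reliability estimate
r = correlation(first_sitting, second_sitting)
print(f"test-retest reliability estimate: r = {r:.2f}")
```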

1- Reliability
a. Student-related reliability concerns factors within the students themselves, such as illness, fatigue, anxiety, or other physical and psychological factors that might make the observed score unreflective of the student's actual performance. It also includes test-wiseness, i.e. strategies for efficient test-taking.
b. Rater reliability concerns human subjectivity, bias, and error. Inter-rater unreliability occurs when two or more scorers yield inconsistent scores for the same test, due to lack of attention, inexperience, or even biases. Intra-rater unreliability is common among classroom teachers because of unclear criteria or biases toward particular weak or strong students. (How can this problem be solved? See page 28.)
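Inter-rater consistency on categorical marks is often quantified with Cohen's kappa, which corrects raw agreement for agreement expected by chance. A minimal sketch (the two teachers, the A-D scale, and the ratings are invented for illustration):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items (sketch;
    assumes chance-expected agreement is below 1.0)."""
    n = len(rater_a)
    # Proportion of items on which the two raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Two teachers grading the same ten essays on an A-D scale (invented data)
teacher_1 = ["A", "B", "B", "C", "A", "D", "B", "C", "C", "A"]
teacher_2 = ["A", "B", "C", "C", "A", "D", "B", "B", "C", "A"]

print(f"kappa = {cohens_kappa(teacher_1, teacher_2):.2f}")  # ~0.72 here
```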

1- Reliability
c. Test administration reliability concerns the conditions in which the test is administered. Factors such as heat, light, noise, confusing directions, and different amounts of testing time allowed to different students can affect students' scores.
d. Test reliability concerns the test itself: Is it too long? Is the time limit appropriate? Is the test poorly written? Is there more than one answer to a question? Is the wording of the test clear? Is it a subjective or an objective test?
Applying the reliability principle: read page 41.

1- Reliability
Factors that can affect the reliability of a test:
- Test length factors: longer tests produce higher reliabilities (see the sketch below).
- Teacher-student factors: a good teacher-student relationship helps increase the consistency of the results.
- Environment factors: a favourable environment will improve the reliability of the test.
- Test administration factors: test administrators should strive to provide clear and accurate instructions.
- Marking factors
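The test-length effect is classically described by the Spearman-Brown prophecy formula, which predicts the reliability of a test lengthened by a given factor. A minimal sketch (the 0.60 starting reliability and the doubling are invented numbers):

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by `length_factor`
    (Spearman-Brown prophecy formula)."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# A test with reliability 0.60, doubled in length (invented numbers)
print(f"{spearman_brown(0.60, 2):.2f}")  # 0.75 -> longer test, higher reliability
```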

2- Validity
Tests themselves are not valid or invalid; instead, we validate the use of a test score. Validity is a matter of degree, not all or none. Test validity refers to "the extent to which inferences made from the assessment results are appropriate, meaningful, and useful in terms of the purpose of the assessment". A test of reading ability must actually measure reading ability. Does the test content/items match the course content or unit being taught? (Read Brown, page 30.)

2- Validity
a. Content-related validity is achieved when students are asked to perform the behaviour that is being measured. For example, if you are assessing a student's ability to speak a second language but you ask him/her to answer paper-and-pencil multiple-choice questions that require grammatical judgement, then you are not achieving content validity (direct vs. indirect testing). Does the test follow the logic of the lesson or unit? Are the objectives of the lessons clear and present in the test? A test should assess real course objectives and test performance directly. (Read pages 30-32.)

2- Validity
b. Criterion-related validity is demonstrated when the test is effective in predicting a criterion, or indicators, of a construct. It looks at the relationship between a test score and an outcome. For example, SAT scores are used to determine whether a student will be successful in college; first-year grade point average becomes the criterion for success. Looking at the relationship between test scores and the criterion can tell you how valid the test is for determining success in college. The criterion can be any measure of success for the behaviour of interest.
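In practice, criterion-related validity is often reported as a validity coefficient: the correlation between test scores and the criterion measure. A minimal sketch (the admission scores and first-year GPAs below are invented):

```python
from statistics import correlation  # Python 3.10+

# Invented data: admission test scores and first-year GPA for six students
test_scores = [1100, 1250, 980, 1400, 1180, 1320]
first_year_gpa = [2.8, 3.2, 2.5, 3.8, 3.0, 3.5]

# The validity coefficient: how strongly the test predicts the criterion
print(f"validity coefficient: r = {correlation(test_scores, first_year_gpa):.2f}")
```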

2- Validity
A criterion-related validation study can be either predictive of later behaviour or a concurrent measure of behaviour or knowledge.
i) Concurrent validity: a test has concurrent validity if its results are supported by other concurrent performance beyond the assessment itself. Statements of concurrent validity indicate the extent to which test scores may be used to estimate an individual's present standing on a criterion. E.g., a high score on the final exam of an English course will be substantiated by actual proficiency in the language.

2- Validity
ii) Predictive validity is the form of criterion-related validity that indexes the degree to which a test score predicts some criterion measure obtained at a future time. It is used to assess a test-taker's likelihood of future success and is especially important in the case of placement tests.

2- Validity
c. Construct-related validity refers to the degree to which a test or other measure assesses the underlying theoretical construct it is supposed to measure (i.e., the test is measuring what it is purported to measure). It is concerned with the theoretical relationships among constructs and the corresponding observed relationships among measures. (Read page 33.)

2- Validity
c. Construct-related validity (example): think about a general knowledge test of basic algebra. If a test is designed to assess knowledge of facts concerning rate, time, distance, and their interrelationships, but the test questions are phrased in long and complex reading passages, then perhaps reading skills are inadvertently being measured instead of factual knowledge of basic algebra. To demonstrate construct validity, evidence that the test measures what it purports to measure (in this case, basic algebra) and evidence that the test does not measure irrelevant attributes (reading ability) are both required.

2- Validity
d. Consequential validity (impact): some professionals feel that, in the real world, the consequences that follow from the use of assessments are important indications of validity. There are two levels of impact:
i) macro level: the effect on society and educational systems;
ii) micro level: the effect on individual test-takers.

2- Validity
e. Face validity: a test is said to have face validity if it "looks like" it is going to measure what it is supposed to measure. Face validity is not empirical; one is saying that the test "appears it will work", as opposed to "it has been shown to work". It refers to the extent to which students view the assessment as fair, relevant, and useful for improving learning. Students will judge a test to be face valid if:
- the directions are clear
- it is organized in a logical way
- there are no surprises
- the time allowed is appropriate
- it is appropriate in difficulty

2- Validity
These types of validity are summarized by Davies (1968) as follows:

Type of validity | The test...
Face | looks like a good one to the learner/layman.
Content | accurately reflects the syllabus it is based on.
Predictive | accurately predicts future performance.
Concurrent | gives similar results to already validated tests or other immediate external criteria (e.g., a teacher's subjective assessment).
Construct | reflects closely a valid theory of foreign language learning that it takes as its model.

Tutorial
1. Read Exercise 6 (page ___); in small groups, evaluate the scenarios provided.
2. Based on samples of formative and summative assessments, discuss aspects of reliability/validity that must be considered in these assessments.
3. Discuss measures that a teacher can take to ensure high validity of language assessment for the primary classroom.

References
Brown, H. Douglas (2004). Language Assessment: Principles and Classroom Practices. Pearson Education, Inc.
Chitravelu, Nesamalar (2005). ELT Methodology: Principles and Practice. Penerbit Fajar Bakti Sdn. Bhd.
http://www.indiana.edu/~best/bweb3/test-reliability/
https://www.uni.edu/chfasoa/reliabilityandvalidity.htm
http://research.collegeboard.org/services/aces/validity/handbook/test-validity