NRTs and CRTs Group members: Camila, Ariel, Annie, William.

Similar presentations
Assessing Student Performance

Assessment types and activities
Item Analysis.
Chapter 3 Reliability and Objectivity (McGraw-Hill Higher Education).
What is a Good Test? Validity: Does the test measure what it is supposed to measure? Reliability: Are the results consistent? Objectivity: Can two or more.
Using Test Item Analysis to Improve Students’ Assessment
Test Construction Processes: 1. Determining the function and the form; 2. Planning (content: table of specifications); 3. Preparing (knowledge and experience)
Item Analysis Ursula Waln, Director of Student Learning Assessment
STANDARDIZED TESTING: MEASUREMENT AND EVALUATION. All teaching involves evaluation. We compare information to criteria and then make judgments. Measurement.
Lesson Seven Item Analysis. Contents: Item Analysis; Item difficulty (item facility).
Uses of Language Tests.
Lesson Nine Item Analysis.
Lesson Three Kinds of Test and Testing. Yun-Pi Yuan. Contents: Kinds of Tests Based on Purposes (classroom use; external examination).
ANALYZING AND USING TEST ITEM DATA
How to Take Tests I Background On Testing.
Classroom Assessment A Practical Guide for Educators by Craig A
Evaluating a Norm-Referenced Test Dr. Julie Esparza Brown SPED 510: Assessment Portland State University.
LG675 Session 5: Reliability II Sophia Skoufaki 15/2/2012.
Assessment in Language Teaching: part 2 Today’s # 24.
1 The New York State Education Department New York State’s Student Reporting and Accountability System.
Formative and Summative Assessment
COMPASS National and Local Norming Sandra Bolt, M.S., Director Student Assessment Services South Seattle Community College February 2010.
Chap. 3 Designing Classroom Language Tests
Topic 4: Formal assessment
Office of Institutional Research, Planning and Assessment January 24, 2011 UNDERSTANDING THE DIAGNOSTIC GUIDE.
Aptitude Tests - LAB. Paul Pimsleur developed the Language Aptitude Battery in the 1960s, with 6 subtests. Grade-Point Average in academic areas other than foreign.
KINDS of TESTS (Session 14)
Classroom Assessments Checklists, Rating Scales, and Rubrics
Identifying and Addressing Student Learning Difficulties in Calorimetry and Thermodynamics Ngoc-Loan Nguyen and David E. Meltzer Department of Physics.
The Genetics Concept Assessment: a new concept inventory for genetics Michelle K. Smith, William B. Wood, and Jennifer K. Knight Science Education Initiative.
Winda Needs analysis Goals and objectives Testing Curriculum design.
Chapter 7 Item Analysis In constructing a new test (or shortening or lengthening an existing one), the final set of items is usually identified through.
Eileen Boyce Toni Tessier Waterford Public Schools Literacy Specialists.
Grading and Reporting Chapter 15
Data Vocabulary Language Arts Summer Cadre 2006 Migdalia Rosario Varsity Lakes Middle Jennifer Miller Varsity Lakes Middle Pat Zubal Dunbar Middle School.
Techniques to improve test items and instruction
Teaching Today: An Introduction to Education 8th edition
Lesson Three Kinds of Test and Testing. Contents: Kinds of Tests Based on Purposes (classroom use; external examination). Kinds of Testing:
Lab 5: Item Analyses. Quick Notes Load the files for Lab 5 from course website –
Assessment and Evaluation. Formal assessment is often criticized for relying on numerical scores without knowing a student’s underlying reasoning,
Item specifications and analysis
Module 6 Testing & Assessment Part 1
Assessment What is it? Collection of relevant information for the purpose of making reliable curricular decisions and discriminations among students (Gallahue,
Descriptive Statistics. Prepared by: Asma Qassim Al-jawarneh, Ati Sardarinejad, Reem Suliman. Dr. Balakrishnan Muniandy, PTPM-USM.
Grading and Analysis Report For Clinical Portfolio 1.
The Teaching Process. Problem/condition Analyze Design Develop Implement Evaluate.
Assessment Information from multiple sources that describes a student’s level of achievement Used to make educational decisions about students Gives feedback.
Assessment. Workshop Outline Testing and assessment Why assess? Types of tests Types of assessment Some assessment task types Backwash Qualities of a.
A Study on Junior High Students’ English Learning Achievement in Taiwan. Camila, Shelly. January 15, 2010.
An Integral Part of Professional Learning Communities.
ASSESSMENT CRITERIA Jessie Johncock Mod. 2 SPE 536 October 7, 2012.
The Great Divide a norm-referenced test (NRT)
Assessment Assessment is the collection, recording and analysis of data about students as they work over a period of time. This should include, teacher,
Monitoring and Assessment Presented by: Wedad Al –Blwi Supervised by: Prof. Antar Abdellah.
Evaluation and Assessment Evaluation is a broad term which involves the systematic way of gathering reliable and relevant information for the purpose.
Norm-Referenced: Your score can be compared with others. 75th Percentile. Normed.
Ch4-1. Testing.
Information for Parents Key Stage 3 Statutory Assessment Arrangements
ARDHIAN SUSENO CHOIRUL RISA PRADANA P.
Assessment in Language Teaching: part 1 Lecture # 23
Kind of Test Based on Purposes
Greg Miller Iowa State University
Classroom Assessment Ways to improve tests.
Using statistics to evaluate your test Gerard Seinhorst
Jayhawkville Central High School
Analyzing test data using Excel Gerard Seinhorst
Statistical Analysis and Unit Improvement Plan Book pgs
Why do we assess?.
Tests are given for 4 primary reasons.
Presentation transcript:

NRTs and CRTs Group members: Camila, Ariel, Annie, William

NRTs: Norm-referenced Item Analysis
• Item facility
• Item discrimination

Why do we do item analysis?
• Assemble a large number of items of the type you want on the test
• Make sure the items are well written and clear
• Pilot the items
• Analyze the results of the pilot testing
• Select the most effective items: get rid of the ineffective items or revise the weak ones

The basic purpose of NRTs
• To spread students out along a general continuum of language abilities for making aptitude, proficiency, or placement decisions

Two item statistics used in IA of NRTs
• Item facility (IF): the proportion of students who answered a particular item correctly. Example: 45/50 = .90, so 90% of the students answered the item correctly; the item is very easy.
• Item discrimination (ID): calculate IF for the upper group and the lower group (in a spreadsheet, e.g. AVERAGE(C2:C6) and AVERAGE(C15:C19)), then ID = IF_upper - IF_lower (e.g. IF_upper - IF_lower = .20).
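A minimal sketch of the IF computation in Python (not from the slides; the 0/1 scoring convention and function name are illustrative):

    # Item facility (IF): proportion of students answering the item correctly.
    # Assumes each response is scored 1 (correct) or 0 (incorrect).
    def item_facility(item_scores):
        return sum(item_scores) / len(item_scores)

    # The slide's example: 45 of 50 students answered correctly.
    scores = [1] * 45 + [0] * 5
    print(item_facility(scores))  # 0.9, so the item is very easy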

Item Discrimination (ID)
• Can be calculated by first figuring out who the upper and lower students are on the test
• Use their total scores to sort the students from the highest score to the lowest
• Split them into three groups with equal numbers of students: high/middle/low
• ID = IF_upper - IF_lower
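Under the same illustrative assumptions, the upper/lower-group calculation might look like this; the student-by-item score matrix is an assumed layout:

    # Item discrimination (ID) for one item, using high and low thirds.
    # `matrix` holds one list of 0/1 item scores per student.
    def item_discrimination(matrix, item):
        ranked = sorted(matrix, key=sum, reverse=True)  # highest total first
        third = len(matrix) // 3
        upper, lower = ranked[:third], ranked[-third:]
        if_upper = sum(student[item] for student in upper) / third
        if_lower = sum(student[item] for student in lower) / third
        return if_upper - if_lower

    # Tiny demo: 6 students x 2 items; item 0 separates highs from lows.
    matrix = [[1, 1], [1, 1], [1, 0], [0, 1], [0, 0], [0, 0]]
    print(item_discrimination(matrix, 0))  # 1.0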

Ideal items in an NRT
• Should have an average IF of .50: 50% of the students answered correctly and 50% of them answered incorrectly
• In reality, items rarely have an IF of exactly .50
• Those that fall in a range between .30 and .70 are usually considered acceptable for NRT purposes

Items within the .30 to .70 range
• The items among them that have the highest IDs should then be selected for inclusion in the revised test
• This process helps the test designer keep only those items that are well centered and discriminate well between the high- and low-scoring students
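As a sketch of this selection step (the IF and ID values below are made up for illustration):

    # Keep well-centered items (.30 <= IF <= .70), then prefer the
    # highest-ID items among them for the revised test.
    IF = {1: 0.85, 2: 0.55, 3: 0.40, 4: 0.10, 5: 0.65}
    ID = {1: 0.10, 2: 0.45, 3: 0.20, 4: 0.05, 5: 0.35}

    candidates = [i for i in IF if 0.30 <= IF[i] <= 0.70]
    ranked = sorted(candidates, key=lambda i: ID[i], reverse=True)
    print(ranked)  # [2, 5, 3]: items 2 and 5 discriminate best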

Conclusion for NRTs
• IF and ID are only appropriate for developing and analyzing norm-referenced tests
• Used at the institutional level: overall English language proficiency tests or placement tests
• Not appropriate for developing and analyzing classroom-oriented criterion-referenced tests like diagnostic, progress, and achievement tests

CRTs: Criterion-referenced Item Analysis
• Purpose of CRTs: to measure the amount (or percent) of material in a course or program of study that students know
• Usually used for making diagnostic, progress, or achievement decisions

Two item statistics used in IA of CRTs
1. The difference index (DI): the item facility on a particular item for the posttest minus the item facility for that same item on the pretest. Example: pretest 10/50 = .20; posttest 45/50 = .90; DI = .90 - .20 = .70
• DI tells how much the students are improving between the pretest and the posttest on each item
• The higher the value of the DI, the better
• A value of 1.00 is a perfect difference index
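A minimal sketch of the DI computation, using the slide's pretest/posttest counts (the function name and argument layout are illustrative):

    # Difference index (DI): posttest IF minus pretest IF for one item.
    def difference_index(pre_correct, post_correct, n_students):
        return post_correct / n_students - pre_correct / n_students

    # Slide example: 10/50 correct on the pretest, 45/50 on the posttest.
    print(round(difference_index(10, 45, 50), 2))  # .90 - .20 = 0.7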

Two item statistics used in IA of CRTs
2. The B-index: the item facility on a particular item for the students who passed the test minus the item facility for the students who failed
• The B-index shows how well each item is contributing to the pass/fail decisions that are often made with CRTs. Example: B-index = IF_pass - IF_fail = 14/14 - 0/6 = 1.00 - .00 = 1.00
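A sketch of the B-index under the same illustrative assumptions, with the cut score as a parameter:

    # B-index: IF among passers minus IF among failers at a given cut score.
    # `item` holds 0/1 scores for one item; `totals` are total test scores.
    def b_index(item, totals, cut_point):
        passed = [s for s, t in zip(item, totals) if t >= cut_point]
        failed = [s for s, t in zip(item, totals) if t < cut_point]
        return sum(passed) / len(passed) - sum(failed) / len(failed)

    # Slide example: all 14 passers got the item right, all 6 failers missed it.
    item = [1] * 14 + [0] * 6
    totals = [80] * 14 + [40] * 6   # hypothetical totals around a cut of 60
    print(b_index(item, totals, 60))  # 14/14 - 0/6 = 1.0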

P. 21
• The B-index tells us how well each item is contributing to the pass/fail decision on this test at the cut-point
• Like the ID and DI statistics, the higher the B-index, the better
• A perfect B-index would be 1.00

Conclusion: CRTs
• When should these indices be used? To analyze the items on a CRT for purposes of revising the test
• In both cases, DI and BI, the items with the highest values should generally be kept
• Making these decisions is not as simple as it is for NRT development, because a CRT item may not be performing well in terms of these statistics for many reasons

Reasons that a CRT item may not be performing well
• The item is written or working poorly
• The objective the item is testing is vague
• Students are not ready to learn this particular objective
• One or all of the teachers did not teach this particular objective, or taught it poorly
• The materials are confusing with regard to this particular objective
• The statistics cannot tell you exactly what is wrong, even though they can point you to places in your curriculum where something is not working well

• Some common-sense analysis of the entire situation needs to be done
• Then revise the CRT or other aspects of your curriculum, such as the objectives, the materials, the teaching, etc.

How do the statistics help you?
• They help you figure out where to focus your energies
• The DI will tell you how well each item fits the objectives of your curriculum
• The BI will tell you how each item is contributing to the pass/fail decision that you must make at whatever cut-point you are using

2 Definitions for “Criterion” in CRT
1. The material being taught in the course: a CRT would assess the particular learning points of a particular course or program
• This definition fits very well with the difference index, which indicates how well each item fits the objectives of the curriculum

2 Definitions for “Criterion” in CRT
2. The standard of performance (or cut-point for decision making) that is expected for passing the test/course: CRTs are used to assess whether students pass or fail at a certain criterion level (or cut-point)
• This definition fits very well with the B-index, which indicates how well each item is contributing to the pass/fail decision that you must make at whatever cut-point you are using

What should you focus on?
• If you are interested in the degree to which your items reflect the material in your course: the DI
• If you are interested in the degree to which your items help you make decisions at a certain cut-point: the BI
• If you are interested in both: the DI and the BI

When should they not be used?
• DI and BI should not be used to analyze the effectiveness of norm-referenced items

What is the Ultimate Goal?
• To produce a curriculum and CRTs that match each other, such that you get high difference indexes and high B-indexes