Session A: Psychometrics 101: The Foundations and Terminology of Quality Assessment Design. Date: May 14th from 3-5 PM

Presentation transcript:

Session A: Psychometrics 101: The Foundations and Terminology of Quality Assessment Design. Date: May 14th from 3-5 PM
Session B: Psychometrics 101: Test Blueprints (Standards Alignment, Working Backwards, and Documentation). Date: May 16th from 3-5 PM
Session C: Psychometrics 101: Understanding and Decreasing Threats to Validity, AKA What's the purpose of my assessment? Date: May 30th from 3-5 PM
Session D: Psychometrics 101: Understanding and Decreasing Threats to Reliability, AKA What does it mean to have noisy assessments? Date: June 4th from 3-5 PM
Session E: Designing Quality Qualitative Measures: Understanding, Interpreting, and Using Survey Data. Date: June 18th from 3-5 PM
Session G: Putting the cart behind the horse: Designing action research and inquiry questions to inform teaching and learning. Date: June 25th from 3-5 PM

Reliability
An indication of how consistently an assessment measures its intended target and the extent to which scores are relatively free of error. Low reliability means that scores cannot be trusted for decision making. Reliability is a necessary but not sufficient condition for validity.

How consistent are my assessment results? Reliability refers to the consistency, or stability, of results when the assessment is given at different times, scored by different teachers, or given in a different way.

There are multiple ways of estimating reliability: alternate-form reliability, split-half reliability coefficients, the Spearman-Brown double-length formula, the Kuder-Richardson reliability coefficient, the Pearson product-moment correlation coefficient, etc.

and three general ways to collect evidence of reliability:
Stability: How consistent are the results of an assessment when given at two time-separated occasions?
Alternate Form: How consistent are the results of an assessment when given in two different forms?
Internal Consistency: How consistently do the test's items function?
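To make internal consistency concrete, here is a minimal sketch, not from the original slides, that computes two of the statistics named above: split-half reliability stepped up with the Spearman-Brown formula, and Cronbach's alpha (equivalent to Kuder-Richardson KR-20 when items are scored right/wrong). The item-score matrix and the odd/even split are assumptions for illustration.

import numpy as np

# Rows = students, columns = items; 1 = correct, 0 = incorrect (hypothetical data).
scores = np.array([
    [1, 1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
    [1, 1, 1, 0, 1, 1],
])

# Split-half: correlate each student's total on the odd-numbered items with the total
# on the even-numbered items, then step up to full test length with Spearman-Brown.
odd_total = scores[:, 0::2].sum(axis=1)
even_total = scores[:, 1::2].sum(axis=1)
r_half = np.corrcoef(odd_total, even_total)[0, 1]
spearman_brown = (2 * r_half) / (1 + r_half)

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total scores).
k = scores.shape[1]
alpha = (k / (k - 1)) * (1 - scores.var(axis=0, ddof=1).sum() / scores.sum(axis=1).var(ddof=1))

print(f"Split-half r = {r_half:.2f}, Spearman-Brown = {spearman_brown:.2f}, alpha = {alpha:.2f}")

Both coefficients answer the internal-consistency question: how consistently do the items function as a set?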

Noise

1. Formatting
Do students have enough space to write their response?

Text or features that pull the student out of the test create noise:
Question stem on one page, choices on another
Three choices on one page, fourth choice on second page

2. Typos
Typos popped up in every department. They happen. “Final Eyes” are the best way to avoid them.

(Slide shows a side-by-side comparison of the test from Period 1 and the test from Period 2.)

What accommodations can be made to ensure there is quality control?

3. Having to hunt for the right answer

Compare with...

4. Using the question to answer the question
Two options in the word bank were two-word phrases, so I know they are the right answer for one of these two items.

Don’t need to know the answer to know it’s not her... or her... and we can be pretty sure the president of France isn’t like Bono

5. Not having one clear answer

6. Unclear Questions
As compared to what? If a student needs to infer what you want, there's noise.

One assessment does not an assessment system make.

Fairness and Bias
Fair tests are accessible and enable all students to show what they know. Bias emerges when features of the assessment itself impede students’ ability to demonstrate their knowledge or skills.

In 1876, General George Custer and his troops fought Lakota and Cheyenne warriors at the Battle of the Little Big Horn. If there had been a scoreboard on hand at the end of that battle, which of the following scoreboard representations would have been most accurate?
A. Soldiers > Indians
B. Soldiers = Indians
C. Soldiers < Indians
D. All of the above scoreboards are equally accurate

My mother’s field is court reporting. Choose the sentence below in which the word field means the same as it does in the sentence above.
A. The first baseman knew how to field his position.
B. Farmer Jones added fertilizer to his field.
C. What field will you enter when school is complete?
D. The doctor checked my field of vision.

What are other attributes of quality assessments?

Implications
Generally speaking, schools should perform at least two statistical tests to document evidence of reliability: a correlation coefficient and the standard error of measurement (SEM). Nitko and Brookhart (2011) recommend a coefficient of 0.85-0.95 for multiple-choice-only tests and a lower range for extended-response tests. In the example above, the user will need to understand that a coefficient of 0.80 is low for MC but high for extended response.
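As an illustrative sketch rather than a prescribed procedure, the snippet below shows one way to produce the correlation-coefficient evidence described above: it correlates hypothetical scores from two administrations of the same test. The score lists are made up, and the benchmark check uses the multiple-choice range from the Nitko and Brookhart recommendation cited above.

import numpy as np

# Hypothetical scores for the same ten students on two administrations of the test.
admin_1 = np.array([78, 85, 62, 90, 71, 88, 67, 94, 73, 81])
admin_2 = np.array([75, 88, 65, 87, 70, 90, 64, 92, 76, 79])

# Pearson product-moment correlation serves as the stability (test-retest) reliability coefficient.
r = np.corrcoef(admin_1, admin_2)[0, 1]
print(f"Test-retest reliability: r = {r:.2f}")

# Compare against the 0.85-0.95 range recommended for multiple-choice-only tests.
if 0.85 <= r <= 0.95:
    print("Within the recommended range for MC-only tests.")
else:
    print("Outside the recommended range for MC-only tests; interpret with the test format in mind.")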

Standard Error of Measurement
An estimate of how consistent a student's score would be if the student retook the test innumerable times.

How is the SEM calculated?
The standard error of measurement is calculated from the spread of the scores and the test's reliability: SEM = SD × √(1 − reliability coefficient). This relationship is worth remembering, as it can help you interpret published data. (It is distinct from the standard error of the mean, which divides the SD by the square root of N and describes the precision of a group average rather than of an individual student's score.)
Calculating the SEM with Excel
Excel does not have a function to compute the standard error of measurement, but it is easy enough to compute from the SD and the reliability coefficient, using this formula:
=STDEV(range)*SQRT(1-reliability)
For example, if the scores are in cells B1 through B10 and the reliability coefficient is in cell D1, use this formula:
=STDEV(B1:B10)*SQRT(1-D1)
If you already know the reliability coefficient, you can enter it directly. For a coefficient of 0.80, the formula becomes:
=STDEV(B1:B10)*SQRT(1-0.8)
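For anyone working outside Excel, here is the same calculation as a short Python sketch; the class scores and the 0.80 reliability coefficient are made-up values, and the final lines show how the SEM is commonly used to build a band around an individual student's observed score.

import numpy as np

# Hypothetical class scores and an assumed reliability coefficient for this test.
scores = np.array([78, 85, 62, 90, 71, 88, 67, 94, 73, 81])
reliability = 0.80

# Standard error of measurement: SD of the observed scores times the square root of (1 - reliability).
sd = scores.std(ddof=1)
sem = sd * np.sqrt(1 - reliability)

# Roughly 68% of a student's retest scores would fall within +/- 1 SEM of the observed score.
observed = 85
print(f"SEM = {sem:.1f}")
print(f"Score band for an observed score of {observed}: {observed - sem:.1f} to {observed + sem:.1f}")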