Assessing the Assessments: Physics
An Initial Analysis of the June 2003 NYS Regents Physics Exam
J. Zawicki, SUNY Buffalo State College
M. Jabot, SUNY Fredonia
July 2004

Assessment Purposes
- Measure knowledge
- Measure gain in knowledge
- Measure preparation (predict success)
- Sorting (grading)
- Degree requirements (benchmarks)
- …

Curriculum, Assessment, and Instruction
- Curriculum Standards: frameworks, syllabi, guides, blueprints, benchmarks
- Assessment/Evaluation System: objective tests, performance assessments, portfolios, teacher observations, group activities, program evaluations
- Instructional Program: instructional styles, print materials, equipment, facilities, technology, community
- The three components are linked by alignment, validity, and correlation

Concepts (Continued)
- Difficulty: the percentage or proportion of students who are successful on an item
  - Facility
  - Difficulty
- Discrimination: how well does the item differentiate between students who understand the subject and those who do not?
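As an illustration of these classical indices, here is a minimal sketch (in Python, using a small hypothetical 0/1-scored response matrix; none of these numbers come from the Regents exam) of how facility, difficulty, and a discrimination index could be computed:

```python
import numpy as np

# Hypothetical scored responses: rows = students, columns = items (1 = correct).
responses = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
])

# Facility: proportion of students answering each item correctly.
facility = responses.mean(axis=0)
# Difficulty is conventionally reported as the complement of facility.
difficulty = 1 - facility

# Discrimination: item-rest correlation, i.e. the correlation between an item
# and the total score on the remaining items.
rest = responses.sum(axis=1, keepdims=True) - responses
discrimination = np.array([
    np.corrcoef(responses[:, i], rest[:, i])[0, 1]
    for i in range(responses.shape[1])
])

print("facility:      ", np.round(facility, 2))
print("difficulty:    ", np.round(difficulty, 2))
print("discrimination:", np.round(discrimination, 2))
```

An operational item analysis would use the actual scored response file, and might report an upper-lower group discrimination index rather than the item-rest correlation used here.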

Concepts
- Validity: how well the items measure the target construct. May be qualified as:
  - Construct
  - Content (face)
  - Criterion-related
- Typically determined by a panel of experts
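Construct and content (face) validity are judged qualitatively by the expert panel; criterion-related validity, by contrast, is commonly summarized as a correlation between test scores and an external criterion. A minimal sketch with entirely hypothetical scores and a hypothetical criterion measure (e.g., a later course grade):

```python
import numpy as np

exam_scores = np.array([78, 85, 62, 90, 70, 55, 88, 73])   # hypothetical exam scores
criterion   = np.array([3.1, 3.6, 2.4, 3.8, 2.9, 2.2, 3.5, 3.0])  # hypothetical criterion

# Criterion-related validity coefficient: Pearson correlation.
validity_coefficient = np.corrcoef(exam_scores, criterion)[0, 1]
print(f"criterion-related validity coefficient: {validity_coefficient:.2f}")
```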

Concepts (Continued)
- Reliability: can the results be replicated?
  - Inter-rater (do two or more raters agree on the score for an item?)
  - Test/re-test (will a student earn similar scores on different administrations?)
  - Internal consistency
- Criterion-referenced tests: have the students met the “standard”?
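Two of these reliability notions lend themselves to simple calculations. A minimal sketch (hypothetical data only) of internal consistency via Cronbach's alpha and of inter-rater agreement on a constructed-response item:

```python
import numpy as np

# Hypothetical 0/1-scored responses: rows = students, columns = items.
scores = np.array([
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
])

# Internal consistency: Cronbach's alpha.
k = scores.shape[1]                          # number of items
item_var = scores.var(axis=0, ddof=1)        # variance of each item
total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
alpha = (k / (k - 1)) * (1 - item_var.sum() / total_var)
print(f"Cronbach's alpha: {alpha:.2f}")

# Inter-rater reliability: percent agreement between two raters scoring the
# same constructed-response item (hypothetical 0-2 point scores).
rater_a = np.array([2, 1, 2, 0, 2, 1])
rater_b = np.array([2, 1, 1, 0, 2, 1])
print(f"percent agreement: {(rater_a == rater_b).mean():.0%}")
```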

Concepts (Continued)
- Latency: how long do students take to complete the test?
- Equitable (fair)
- Timed tests (power tests)

Types of Analysis
- Traditional (difficulty, discrimination)
- Rasch analysis (item difficulty and student ability are placed on a common scale)
- Cognitive level (Bloom’s taxonomy simplified: knowing, using, integrating)
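For the Rasch approach, a minimal sketch of the dichotomous Rasch model is shown below. The ability and difficulty values are hypothetical; an operational Rasch analysis would estimate them from the response data (e.g., by maximum likelihood).

```python
import numpy as np

def rasch_probability(ability, difficulty):
    """Probability that a student of the given ability answers the item correctly."""
    return 1.0 / (1.0 + np.exp(-(ability - difficulty)))

abilities = np.array([-1.0, 0.0, 1.0])      # hypothetical student abilities (logits)
item_difficulties = np.array([-0.5, 0.5])   # hypothetical item difficulties (logits)

# Probability matrix: rows = students, columns = items.
probs = rasch_probability(abilities[:, None], item_difficulties[None, :])
print(np.round(probs, 2))
```

Because ability and difficulty sit on the same logit scale, a student whose ability equals an item's difficulty has a 0.5 probability of success on that item.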

Initial Analysis: Physics 2004
Item-analysis table with columns: Item #, Format, Credit, Key, R/C 0, R/C 1, R/C 2, R/C 3, R/C 4, R/C (%), Facility, Difficulty, DEP. Rows: items 01, 39, 46, 20 (multiple choice) and items 60, 63, 68, 55 (constructed response). [Numeric table entries not preserved in the transcript.]
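A minimal sketch of how the response-choice (R/C) percentages and facility in a table like this could be tabulated from raw multiple-choice responses; the responses and answer key below are hypothetical.

```python
import numpy as np

# Hypothetical raw multiple-choice responses: 0 = omit, 1-4 = answer choices.
raw = np.array([
    [3, 1, 2, 4],
    [3, 2, 2, 4],
    [3, 1, 4, 1],
    [1, 1, 2, 0],
    [3, 3, 2, 4],
])
key = np.array([3, 1, 2, 4])   # keyed correct choice for each item

n_students, n_items = raw.shape
for item in range(n_items):
    counts = np.bincount(raw[:, item], minlength=5)   # choices 0 through 4
    pct = 100 * counts / n_students                    # R/C 0 ... R/C 4 (%)
    facility = (raw[:, item] == key[item]).mean()
    print(f"Item {item + 1:02d}: key={key[item]}  "
          f"R/C%={np.round(pct, 1).tolist()}  "
          f"facility={facility:.2f}  difficulty={1 - facility:.2f}")
```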

Difficulty Rankings
- Easier MC: 1, 39
- More difficult MC: 46, 20
- Easier CR: 60, 63
- More difficult CR: 68, 55

Easier Multiple Choice (items 1 and 39) [exam items shown on slides, not reproduced in transcript]

More Difficult Multiple Choice (items 46 and 20) [exam items shown on slides, not reproduced in transcript]

Easier Constructed Response (items 60 and 63) [exam items shown on slides, not reproduced in transcript]

More Difficult Constructed Response (items 68 and 55) [exam items shown on slides, not reproduced in transcript]

Common Threads – Easier Items
- Graphing
- Using charts, tables, graphs
- Classification (vectors, scalars, …)

Common Threads – More Difficult Items
- Mathematical relationships (modeling)
  - Direct
  - Inverse
  - Linear
  - Exponential
- Electric, magnetic, and gravitational interactions; fields (CSEM, Modeling, CASTLE, Knight)
- Energy vs. force (FCI, PET, Modeling)

Next Steps
- Questions
- Implications for classrooms (program review)
- Additional resources:
  - BSC: SCI685, Evaluation in Science Education
  - SUNY Fredonia: EDU503, Evaluation in the Schools
  - SUNY Buffalo: LAI534, Measurement & Evaluation of Science Instruction
  - STANYS 2004 Annual Conference, November 7-9, 2004
  - Rochester Science Educator’s Conference
  - NYSSELA Perspectives