Improving the Validity of Measures by Focusing on Learning
Eva L. Baker
CRESST National Conference: Research Goes to School
Los Angeles, September 10, 2002
UCLA Graduate School of Education & Information Studies
Center for the Study of Evaluation
National Center for Research on Evaluation, Standards, and Student Testing
“High stakes should not be associated with the results of any assessment until the qualities of validity, reliability, and fairness have been addressed.”
Raising Standards for American Education, National Council on Education Standards and Testing, 1992 (p. 27)
Tests and Assessments Are Intended to Be:
- The operational arm of reform, directing attention to standards
- The target that productively motivates student, teacher, and administrator performance
- The basis for rewards, help, and sanctions
- Systematic signals to the public that their schools are providing quality education
- An integral part of the process of educational and instructional design and improvement: a major validity issue
Theory of Action of Assessment Systems: “Knowledge Is Power”
- Assessments are standards-based, sensitive to quality instruction, and responsive to legitimate changes in actions
- The results reported are accurate
- The results are validly interpreted
- The responsible individuals are willing to act and can motivate action by team members
- Practical actions to improve the situation are known and available
Theory of Action of Assessment Systems (Cont’d)
- Cognizant individuals and team members possess the requisite knowledge to apply alternative methods
- The selected actions are adequately implemented
- The actions will improve subsequent results
- Barriers to improvement are weaker than the desire to achieve goals, and clear and powerful incentives support positive actions
Checking How Well Tests and Assessments Represent the Underlying Reality of Learning and Performance
- How well do the tests extract key elements known to be essential for competence in the domain?
- What is the relationship of the test design to other evidence of learning in the domain?
- Does performance, or some of its attributes, transfer to other subject matters (generalize)?
- Does performance really predict next-level and/or exit criteria (vertical transfer)?
Imagine That Generalization and Transfer Were Our Real Goals (They Are!)
- Most tests used for accountability are general and sample content only lightly
- Are test results valid for the “Standards” rather than just for the included items?
- Are tests designed using domain-independent and domain-specific research knowledge as well as magical psychometric properties?
- How do multiple measures get used?
Using the Same Measures for Different Purposes
- Instruction, monitoring, accountability
- Too much testing, too much cost
- Little evidence of validity for multiple purposes
- Can the situation be fixed? Options:
  - Design multi-purpose tests
  - Aggregate up from teacher assessment (NRC, 2001), with capacity built within districts and supported by technology
Using Multiple Measures to Improve Validity
- Multiple ways to measure standards: validity, fairness, transfer
- Common framework for all assessments
- Multiple levels: classroom, district, state
- Technology options: models, templates, objects
Ideal Assessment Design Requirements
- Operational specification of the domain
- Domain-independent cognitive demands
- Domain-dependent learning model
- Well-sampled content, including prior knowledge requirements
- Task templates and situation descriptions
- Process and criteria for scoring open-ended performance
Ideal Assessment Design Requirements (Cont’d)
Evidence of:
- Horizontal transfer across situations, formats, and similar content (standards)
- Vertical relationships predicting development and progress
- Standards (cut scores) set at real boundaries
- Differential sensitivity to test prep vs. teaching significant content and intellectual skills
Research Rarely Used in Validity Discussions
Summary of findings from research on learning and instruction:
- Learning is highly specific
- If transfer is expected, it must be taught
- No procedures or strategies are provided for teachers or learners