Presentation transcript:

1 Measuring the Link Between Learning and Performance
Eva L. Baker
UCLA Graduate School of Education & Information Studies
National Center for Research on Evaluation, Standards, and Student Testing (CRESST)
Supported by the Naval Education and Training Command, the Office of Naval Research, and the Institute of Education Sciences
July 27, 2005 – Arlington, VA
The findings and opinions expressed in this presentation do not reflect the positions or policies of the Naval Education and Training Command, the Office of Naval Research, or the Institute of Education Sciences.

2 Goals for the Presentation
- Consider methods to strengthen the link between learning and performance
- Use cognitively based assessment to structure and measure objectives during instruction, post-training, and on the job
- Emphasize design of a core architecture and reusable tools to build and measure effective, lifelong competencies
- Identify benefits and savings for the Navy

3 National Center for Research on Evaluation, Standards, and Student Testing (CRESST)
- Consortium of R&D performers led by UCLA: USC, Harvard, Stanford, RAND, UC Santa Barbara, Colorado
- CRESST partners with other R&D organizations

4 National Center for Research on Evaluation, Standards, and Student Testing (CRESST) [Cont'd]
- Mission
  – R&D in measurement, evaluation, and technology leading to improvement in learning and performance settings
  – Set the national agenda for R&D in the field
  – Validity, usability, credibility
  – Focus on rapidly usable solutions and tools
  – Tools allow reduced cycle time from requirements to use

5 National Center for Research on Evaluation, Standards, and Student Testing (CRESST) [Cont'd]
- President-Elect, AERA; 7 former presidents
- Chair, Board on Testing and Assessment, National Research Council, The National Academies
- Standards for Educational and Psychological Testing (1999)
- Army Science Board, Defense Science Board task forces
- History of DoD R&D: ONR, NETC, OSD, ARI, TRADOC, ARL, U.S. Marine Corps; NATO
- Congressional councils and testimony
- Multidisciplinary staff

6 Assessment in Practice
[image © 1965 Fantasy Records]

7 State of Testing in the States
- External, varying standards and tests from States
- Range of targets (AYP)
- Short timeline to serious sanctions
- Raised scores only "OK" evidence of learning
- Are there incentives to measure "high standards"?
- Are there incentives to create assessments that respond to quality instruction?
- Growing enthusiasm for use of classroom assessment for accountability
- Benchmark tests
- Need for new ways to think about the relationship of accountability, long-term learning, and performance

8 Language Check
- Cognitive model: research synthesis used to create the architecture for tests and measures (and for instruction)
- Ontology: formal knowledge representation (in software) of a domain of knowledge, showing relationships; built from sources such as experts, texts, and observation; used in tools for assessment design (illustrated below)
- Formative assessment: assessment information used to pinpoint needs (gaps, misconceptions) for improvement during instruction or on the job
- Transfer: ability to use knowledge in different contexts
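To make the ontology definition above concrete, here is a minimal, hypothetical sketch (not CRESST's actual tooling) of a tiny domain ontology represented in software as subject-relation-object triples that an assessment-design tool could query; all concept and relation names are invented for illustration.

```python
# Hypothetical sketch: a tiny domain ontology stored as subject-relation-object
# triples, the kind of machine-readable structure an assessment-design tool
# could query when selecting content for a measure.

TRIPLES = [
    ("sight_alignment", "is_part_of", "aiming"),
    ("trigger_control", "is_part_of", "shot_execution"),
    ("aiming", "is_prerequisite_for", "shot_execution"),
    ("wind_reading", "influences", "point_of_impact"),
]

def related(concept, relation, triples=TRIPLES):
    """Return every concept linked to `concept` by `relation`."""
    return [obj for subj, rel, obj in triples if subj == concept and rel == relation]

# A test author could ask which higher-level skill a concept feeds into:
print(related("aiming", "is_prerequisite_for"))  # ['shot_execution']
```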

9 Learning Research
- Efficient learning demands understanding of principles or big ideas (schemas) and their relationships (mental models)
- Learning design needs to take into account the limits of working memory
- Strong evidence for formative assessment: motivated practice with informative feedback
- Assessment design needs to link pre-, formative, end-of-training, and refresher measures
- Specification of the full domain and potential transfer areas

10 Measure Design: Learning Research
- Focus first on what is known about improved learning as the way to design measures: acquisition, retention, expertise, automaticity, transfer
- Science-based, domain-independent cognitive-demand objects (reusable), paired with content and context to achieve desired knowledge and skills
- Criterion performance is based on expertise models (not simply rater judgments)
- The design and arrangement of objects is the architecture for learning and measurement

11 Measurement Purposes (5-Vector)
System or Program
- Needs sensing
- System monitoring
- Evaluation
- Improvement
- Accountability
Individual/Team
- Selection/Placement
- Opt out
- Diagnosis
- Formative/Progress
- Achievement
- Certification/Career
- Skill retention
- Transfer of learning

12 Changes in Measurement/Assessment Policy and Practices
- From: one purpose, one measure
- To: multiple purposes—well-designed measure(s) with proficiency standards
- Difficult to retrofit a measure designed for one purpose to serve another
- Evidence of technical quality? Methods of aggregation? Scaling? Fairness?

13 5-Vector Implications
- More than one purpose for data from tests, performance records, and assessments
  – improvement of trainee KSAs
  – improvement of program effectiveness; evaluation of program or system readiness/effectiveness
  – certification of individual/team performance
  – personnel uses
- Challenge: comparability

14 Multipurpose Measurement/Metrics*
- Place higher demands on technical quality of measures
- Suggest more front-end design, to support adaptation and repurposing
- Full representation (in ontologies or other software-supported structures) to link goals, enabling objectives, and content
- A shift in the way to think about learning and training
* Metrics are measures placed in a framework for interpretation, e.g., a ratio of achievement to time, cost, or benchmarks
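As a hedged illustration of the footnote's definition of a metric as a ratio of achievement to time or cost, the sketch below computes such a ratio; the function name and numbers are invented, not part of the original presentation.

```python
# Hypothetical sketch: turning a raw achievement measure into a metric by
# expressing it relative to the resources spent to produce it.

def achievement_per_unit(score_gain: float, resource_spent: float) -> float:
    """Ratio of achievement gain to a resource such as training hours or cost."""
    if resource_spent <= 0:
        raise ValueError("resource_spent must be positive")
    return score_gain / resource_spent

# Illustrative numbers only: a 12-point gain after 40 hours of training.
print(achievement_per_unit(score_gain=12.0, resource_spent=40.0))  # 0.3 points per hour
```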

15 CRESST Model-Based Assessment
- Reusable measurement objects to be linked to skill objects
- First, depends upon cognitive analysis (domain independent, e.g., problem solving)
- Essential to instantiate in a well-represented content or skill area (strategies and knowledge developed from experts)
- May use different forms of cognitive analysis
- May use different behavioral formats and templates
  – multiple choice, simulated performance, AAR, game settings, written responses, knowledge representations (maps), traces of procedures in technology, checklists

16 Cognitive Human Capital Model-Based Assessment
[diagram: Content Understanding, Problem Solving, Teamwork and Collaboration, Metacognition, Communication, Learning]

17 CRESST Approach
- Summarize scientific knowledge about learning
- Find cognitive elements that can be adapted and reused across different topics, subjects, and age levels; these elements make a "family" of models
- Embed the model in subject matter
- Focus on "big" content ideas to support learning and application
- Create templates, scoring schemes, training, and reporting systems (authoring systems available)
- Conduct research (we do) to assure technical quality and fairness

18 Alignment: Weak
[image copyright 2004 DK Cavanaugh; U.S. Department of Energy Human Genome Program]

19 Generally, How HCMBA Works
- Understanding a procedure
  – Knowing what the components of the procedure are
  – Knowing when to execute the procedure, including symptom detection and search strategies to confirm the problem
  – Knowing principles underlying the procedure
  – Knowing how to execute the procedure
  – Knowing when the procedure is off task or not working
  – Repair options
  – Ability to explain the task completed AND describe steps for a different system (transfer)
- Embed in content and context
  – Worked example
  – Executing the procedure with feedback loops
  – Criterion testing—comparison benchmarks

20 Content/Skill Ontology

21 Examples of Model-Based Assessment
- Risk Assessment (EDO)
  – Cognitive demands of the skill include problem identification, judging urgency, constraints, and costs
  – Content demands involve prior knowledge of the task (e.g., ship repair), knowledge needed to find alternatives, vendors, conflicting missions, etc., and principles of optimization vs. cycle time

22 EDO Risk Management Simulation*
* CRESST/USC/BTL's iRides

23 Ontology of M-16 Marksmanship

24 Model-Based Example: M-16 Marksmanship
Building on the science of measures of performance: Marksmanship Inventory, Knowledge Assessment, Knowledge Mapping, Evaluation of Shooter Positions, Shot-to-Shot Analysis (arrayed by cognitive demand and fidelity).
Current work: performance sensing and diagnosis/prescription... using technologies – sensors, ontologies, and Bayes nets – to identify knowledge gaps and determine remediation and feedback.
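The "current work" item above mentions sensors, ontologies, and Bayes nets for identifying knowledge gaps and driving remediation. The sketch below is a minimal, hypothetical illustration of the Bayesian idea only: updating the probability of one knowledge gap from one observed shot pattern. All probabilities and variable names are invented and do not describe CRESST's actual models.

```python
# Hypothetical sketch: a single Bayesian update. Hidden state: does the shooter
# have a sight-alignment knowledge gap? Observation: rounds grouping left of center.
# All numbers below are invented for illustration.

P_GAP = 0.20                 # prior probability of a sight-alignment gap
P_LEFT_GIVEN_GAP = 0.70      # P(left grouping | gap)
P_LEFT_GIVEN_NO_GAP = 0.10   # P(left grouping | no gap)

def posterior_gap(observed_left_grouping: bool) -> float:
    """P(gap | observation) via Bayes' rule."""
    lik_gap = P_LEFT_GIVEN_GAP if observed_left_grouping else 1 - P_LEFT_GIVEN_GAP
    lik_no_gap = P_LEFT_GIVEN_NO_GAP if observed_left_grouping else 1 - P_LEFT_GIVEN_NO_GAP
    numerator = lik_gap * P_GAP
    return numerator / (numerator + lik_no_gap * (1 - P_GAP))

p = posterior_gap(observed_left_grouping=True)
print(f"P(sight-alignment gap | left grouping) = {p:.2f}")      # ~0.64
if p > 0.5:
    print("Prescribe remediation content on sight alignment.")  # prescription step
```

A full system would chain many such variables in a Bayes net and draw its evidence from sensor data, but the update step illustrated here is the same in spirit.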

25 M-16 Marksmanship Example
Scenario: "The shooter is calling right but his rounds are hitting left of the target."
Task: "Diagnose and then correct the shooter's problem."
Information sources:
- Position
- Target
- Shooter's notebook
- Rifle
- Mental state, gear, fatigue, anxiety
- Wind flags

26 M-16 Marksmanship Improvement
[diagram: sensing and assessment information feed diagnosis and prescription, which produce individualized feedback and content]

27 Language Check
- Validity: appropriate inferences are drawn from test(s)
- Reliability: assessments give consistent and stable findings
- Accuracy: respondents are placed in categories where they belong
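As a toy numerical illustration of the last two terms (not a CRESST procedure), the sketch below estimates reliability as test-retest consistency and accuracy as the proportion of respondents assigned to their true category; all data are invented.

```python
# Hypothetical sketch with invented data.
# Reliability illustrated as a test-retest Pearson correlation;
# accuracy as the share of respondents placed in the category where they belong.

from statistics import correlation  # requires Python 3.10+

first_administration = [72, 85, 90, 64, 78]
second_administration = [70, 88, 86, 60, 80]
print(f"test-retest r = {correlation(first_administration, second_administration):.2f}")

true_category = ["proficient", "proficient", "novice", "novice", "proficient"]
assigned_category = ["proficient", "novice", "novice", "novice", "proficient"]
accuracy = sum(t == a for t, a in zip(true_category, assigned_category)) / len(true_category)
print(f"classification accuracy = {accuracy:.2f}")  # 0.80
```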

28 CRESST Evidence-Based Validity Criteria for HC Assessment Models*
- Cognitive complexity
- Reliable or dependable
- Accuracy of content/skill domain
- Instructionally sensitive
- Transfer and generalization
- Learning focused
- Validity evidence reported for each purpose
- Fair
- Credible
* Baker, O'Neil, & Linn, American Psychologist, 1993

29 Interplay of Model-Based Design, Development, and Validity Evidence
- Experiment on prompt specificity
- Studies of extended embedded assessments
- Studies of rater agreement and training
- Studies of collaborative assessment
- Studies of utility across age ranges and subjects
- Reusable models (without CRESST hands-on)
- Scaling up to thousands of examinees in a formal context
- Experimental studies of prior knowledge
- Criterion validity studies
- Studies of generalizability within subject domains
- Studies of L1 (first-language) impact
- Studies of OTL (opportunity to learn)
- Studies of instructor's knowledge
- Cost and feasibility studies*
- Prediction of distal outcomes
- Experimental studies of instructional sensitivity

30 Report Objects

31 Measure Authoring Screenshot

32 Summary of Tools
- Tools include cognitive demands for particular classes of KSAs, to be applied in templates, objects, or other formats represented in authoring systems
- Specific domain or task ontology (knowledge representation of content)
- Ontological knowledge fills slots in the templates or objects (see the sketch below)
- Commercial ontology systems available
- Measurement authoring systems for HC Assessment Models (with evidence)
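A minimal, hypothetical sketch of the slot-filling idea in the list above: a reusable, domain-independent item template whose content slots are filled from a toy domain representation. The template wording, slot names, and domain entries are invented and are not CRESST's authoring system.

```python
# Hypothetical sketch: one reusable assessment-item template whose slots are
# filled with concepts drawn from a domain representation (here, plain dicts).

from string import Template

# Domain-independent template for an explanation task.
EXPLANATION_TEMPLATE = Template(
    "Explain how $component affects $outcome, and describe how you would "
    "detect a problem with $component."
)

# Slot values pulled from two toy domain representations.
MARKSMANSHIP = {"component": "trigger control", "outcome": "shot placement"}
SHIP_REPAIR = {"component": "pump alignment", "outcome": "repair cycle time"}

for domain in (MARKSMANSHIP, SHIP_REPAIR):
    print(EXPLANATION_TEMPLATE.substitute(domain))
```

The same template yields items in both domains, which is the sense in which a measurement object is reusable.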

33 OUTCOME 1: Coherence
- Coherent macro architecture for training and operations and measurement
- Coherent view from the sailor, management, and system perspectives, to support training, retraining, and assessment occurring in new environments (distance learning) (5-Vector)

34 OUTCOME 2: Cost Savings
- Each model has reusable templates and objects, empirically validated, to match cognitive requirements
- Freestanding measures do not need to be designed and revalidated anew for each task
- Cost of design drops, cost of measures drops, throughout the life cycle
- Common framework supports retention and transfer of learning
- Common HCA objects will simplify demands on the trainer
- Multiple-purposed measures will need different reporting metrics but should have a common reporting framework

35 OUTCOME 3: More Trustworthy Evidence of Effectiveness, Readiness, or Individual or Team Performance
- Common frameworks for assessment
- Ontology (full representation of content)
- Instructional strategies to support learning and transfer
- Aggregation of outcomes using common metrics
- Standard reporting formats for each assessment purpose

36 OUTCOME 4: Flexibility and Reduced Volatility Within a General Structure
- Plenty of room for differential preferences by leaders of different configurations or those with different training goals
- Evidence in Navy projects, engineering courses, and academic topics, across trainees with different backgrounds, in different settings, and with different levels of instructor skill
- Easy-to-use guidelines and tools as exemplars

37 Social/Organizational Capital in Knowledge Management: 5-Vector Implications
[diagram: Trust, Efficacy, Networks, Effort, Transparency, Learning Organization, Teamwork Skills]

38 Revolution = Opportunities and Constraints
- Navy needs a common framework so that work can be easily integrated
- Navy needs common metrics to assess effectiveness and tools to interpret data
- Navy needs to provide vendors with a framework that permits integration of HCMA achievement and performance results from multiple sources

39 CRESST Web Site

40 Backup

41 Marksmanship Knowledge Inventory: Diagnosis and Prescription
[output of the recommender: areas needing remediation and prescribed content]