RADAR EVALUATION Goals, Targets, Review & Discussion Jaime Carbonell & soon Full SRI/CMU/IET RADAR Team 1-February-2005 School of Computer Science Supported.

Presentation transcript:

RADAR EVALUATION
Goals, Targets, Review & Discussion
Jaime Carbonell & soon Full SRI/CMU/IET RADAR Team
1-February-2005
School of Computer Science
Supported By DARPA IPTO PAL Program: “Personalized Assistant That Learns”

Carnegie Mellon University

Outline: Radar Evaluation
Brief Review of Radar Challenge Task
Evaluation Objectives: Obligation and Desiderata
Evaluation Components: Radar Tasks
Radar Metrics: Tasks → Meaningful Measures
Putting it all together: Tin-man formula proposal

The original plan has been disrupted. Conference wing A is no longer available. Other rooms may be affected.
The resolver needs to replan: gather information, commandeer other rooms, change schedules, post to websites, inform participants.
Test: Radar will assist a conference planner in a crisis situation. The test will be evaluated on quality and completeness of the new plan and on the successful completion of related tasks.
[Diagram labels: Crisis Resolver; RADAR; NLP; Planning & Scheduling; Handler; Learning; Knowledge Base; Conference Participants; Website; Conference Organizers; Wing A; Wing B]

Conference Re-planning Tasks
Situation Assessment
  – Which resources have become unavailable
  – What alternative resources exist and at what price
Tentative re-planning of conference schedule
  – Elicit and satisfy as many preferences as possible
Validating conference schedule & resource allocation
  – Securing buy-in from key stakeholders (requires meeting)
  – Awaiting external confirmations (or default assumptions)
  – Modifying plan as/when needed
Informing all stakeholders
  – Briefings to VIPs, Update website for participants
Cope with background tasks (time permitting)

Scoring Criteria (Adapted from Garvey)
Task Realism
  – Must reflect RADAR challenge performance
Sensitive to Learning
  – Must allow headroom beyond Y2 (no low ceiling)
  – Must include measurement of learning effects
Auditable with Pride
  – Objective, Simple, Clear, Transparent, Statistically Sound, Replicable, …
Comprehensive & Research-Useful
  – All RADAR modules included, albeit differentially
  – Responsive to RADAR scientific objectives

Evaluation Components
All RADAR Modules (Sched quality)
  – Time-Space Planning (TSP): Schedule quality
  – Meeting Scheduling (CMRadar): Meetings, bumps
  – Webmaster + Briefing Assistant (VIO)
  – + NLP: Other tasks completed: background
Additional Learning Targets (?)
  – Relevant facts & preferences acquired
  – Strategic knowledge (when/how to apply K)
Combination Function (Utility-like)
  – Linear weighted sum with +/- terms

Example: Schedule Quality Metric
W = Weight = importance of the session (e.g. keynote > posters)
P = Penalty for distance from ideal (e.g. room smaller than target), linear or step fn
f = factors of sessions (e.g. room size, duration, equipment, …)
r = resource (e.g. ballroom at Flagstaff)
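The slide's formula did not survive in this transcript, so the following is only a minimal sketch of one way the four symbols could fit together. It assumes quality is the importance-weighted sum over sessions of (1 minus the mean penalty across factors). The class and function names, the clipping of penalties to [0, 1], and the averaging over factors are all hypothetical choices, not the RADAR team's definition.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical representation: a factor map such as {"room_size": 400.0}.
Factors = Dict[str, float]


@dataclass
class Session:
    name: str
    weight: float    # W: importance of the session (keynote > posters)
    ideal: Factors   # target value for each factor f


def linear_penalty(ideal: float, actual: float) -> float:
    """P as a linear function: proportional shortfall, clipped to [0, 1]."""
    if ideal <= 0:
        return 0.0
    return min(1.0, max(0.0, ideal - actual) / ideal)


def step_penalty(ideal: float, actual: float) -> float:
    """P as a step function: full penalty whenever the target is missed."""
    return 0.0 if actual >= ideal else 1.0


def schedule_quality(
    sessions: List[Session],
    assignment: Dict[str, Factors],  # session name -> factors of its resource r
    penalty: Callable[[float, float], float] = linear_penalty,
) -> float:
    """Assumed form: sum over sessions of W * (1 - mean penalty over factors)."""
    total = 0.0
    for s in sessions:
        resource = assignment[s.name]
        penalties = [
            penalty(target, resource.get(factor, 0.0))
            for factor, target in s.ideal.items()
        ]
        mean_penalty = sum(penalties) / len(penalties) if penalties else 0.0
        total += s.weight * (1.0 - mean_penalty)
    return total


sessions = [
    Session("keynote", weight=3.0, ideal={"room_size": 400.0}),
    Session("posters", weight=1.0, ideal={"room_size": 100.0}),
]
assignment = {
    "keynote": {"room_size": 300.0},  # room smaller than target
    "posters": {"room_size": 100.0},  # meets target
}
quality_linear = schedule_quality(sessions, assignment)
quality_step = schedule_quality(sessions, assignment, penalty=step_penalty)
```

Under these assumptions the linear penalty gives the undersized keynote room partial credit, while the step penalty gives it none, which illustrates the slide's "linear or step fn" choice.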

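Putting It All Together
Normalizing components: [equation not preserved in transcript]
Summing: [equation not preserved in transcript] or [alternative equation not preserved in transcript]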
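The normalizing and summing formulas are missing from this transcript. As a hedged illustration only, the sketch below assumes min-max normalization of each component to [0, 1], followed by the "linear weighted sum with +/- terms" named under Evaluation Components. The component names, weights, and worst/best bounds are invented for the example.

```python
from typing import Dict


def normalize(raw: float, worst: float, best: float) -> float:
    """Min-max scale a raw component score to [0, 1] (assumed scheme)."""
    if best == worst:
        raise ValueError("best and worst must differ")
    scaled = (raw - worst) / (best - worst)
    return max(0.0, min(1.0, scaled))


def combined_score(components: Dict[str, float], weights: Dict[str, float]) -> float:
    """Linear weighted sum; a negative weight makes a component a penalty term."""
    return sum(weights[name] * value for name, value in components.items())


# Hypothetical normalized components and weights.
components = {
    "schedule_quality": normalize(75.0, worst=50.0, best=100.0),
    "meetings_scheduled": 1.0,
    "bumps": 0.25,  # bumped meetings count against the score
}
weights = {
    "schedule_quality": 0.5,
    "meetings_scheduled": 0.25,
    "bumps": -0.5,
}
score = combined_score(components, weights)
```

The clipping in `normalize` imposes a fixed ceiling. Dropping it would give an open-ended scale, which is a different design choice from the one sketched here.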

Next Steps for Evaluation Metrics
Metrics for Other components
Metrics for Learning Boost
Discuss/Refine/Redo Combination
  – True open-ended scale?
  – Something other than weighted sum?
  – Quality metric w/o penalties (+’s only)
Test in a full walk-through scenario
  – Refine the details
  – Don’t lose sight of objectives