Justification/Explanation Evaluation Breakout Session, 6/13/02. Stefano Bertolo, Richard Fikes. AQUAINT PI Meeting, Monterey, California, June 11-13, 2002.



Knowledge Systems Laboratory, Stanford University

Straw Man Proposal
• General Evaluation Principles
  • Scope of the evaluation
  • Independence of correctness and justification
• Required Characteristics
  • Accountability
  • Meaningful ranking of justifications
  • Understandability of justifications
• Desirable Characteristics
  • Natural language presentation
  • Justification clustering
  • Justification persistence
  • Agent-accessible API

General Evaluation Principles
• Scope of the evaluation
  • Evaluating the quality of the justification(s) the system provides in support of the answer(s) it has returned for a given question
  • Not evaluating answers
  • Would be an add-on to other evaluations
• Independence of correctness and justification
  • Evaluate justifications whether or not the answer they justify is correct

Required Characteristics
• Source Identification
  • A justification must identify the sources on which it depends
  • If a justification has multiple "steps" (where the meaning of "step" is system-dependent), it must identify the sources on which each step depends
• Understandability
  • Justifications should be easily and quickly understandable
  • Understandability will be assessed by a panel of human evaluators
  • The modality of the presentation is left undetermined and need not be fluent English
• Interpretation of question
  • Must provide an understandable description of the system’s interpretation of the question
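The source-identification requirement above can be read as a check on the structure of a justification. The following is a minimal sketch under that reading, not a format proposed in the slides: the class names, field names, and the placeholder source string are all hypothetical, and what counts as a "step" remains system-dependent.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One step of a justification (hypothetical structure)."""
    method: str                                   # how the step was derived
    text: str = ""                                # supporting passage, if any
    sources: list[str] = field(default_factory=list)


@dataclass
class Justification:
    """A justification for one answer (hypothetical structure)."""
    question_interpretation: str                  # the system's reading of the question
    answer: str
    steps: list[Step] = field(default_factory=list)


def steps_missing_sources(j: Justification) -> list[int]:
    """Return the indexes of steps that identify no source."""
    return [i for i, step in enumerate(j.steps) if not step.sources]


def meets_source_identification(j: Justification) -> bool:
    """True when there is at least one step and every step names a source."""
    return bool(j.steps) and not steps_missing_sources(j)


# Illustrative data only; "hypothetical-source-1" is a placeholder.
example = Justification(
    question_interpretation="Is aluminum a metal?",
    answer="Yes",
    steps=[
        Step(
            method="Match words in query to words in sentence in source",
            text="Aluminum is not a metal.",
            sources=["hypothetical-source-1"],
        ),
        Step(method="Illustrative second step with no source recorded"),
    ],
)

print(steps_missing_sources(example))
print(meets_source_identification(example))
```

The check says nothing about whether the answer is right or the rationale sound; it only reports whether each step names the sources it depends on.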

Evaluation Principles
• What are we evaluating?
  • Quality of description of system’s rationale?
  • Quality of system’s rationale?
• Quality of description
  • Query: Is aluminum a metal? Answer: Yes
  • Justification: ✗
    > Source: National Enquirer …
  • Justification: ✓
    > Text: “Aluminum is not a metal.”
    > Source: …
    > Method: Match words in query to words in sentence in source

Evaluation Principles
• What are we evaluating?
  • Quality of description of system’s rationale?
  • Quality of system’s rationale?
• Quality of rationale
  • Query: Is aluminum a metal? Answer: Yes
  • Justification:
    > Text: “Aluminum is not a metal.”
    > Source: …
    > Method: Match words in query to words in sentence in source
  • Justification:
    > Source: National Enquirer …
  ✓ ✗

Evaluation Principles
• What are we evaluating?
  • Quality of description of system’s rationale?
  • Quality of system’s rationale?
• Quality of rationale
  • How well does the justification support the answer?
  • General criteria:
    > Relevancy of sources
    > Quality of sources
    > Strength of inferential links
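One way a panel might record judgments against the three general criteria is sketched below. This is an illustration only: the slides do not define a rating scale or an aggregation rule, so the 1-to-5 scale, the class and function names, and the use of a plain mean are all assumptions. Answer correctness is deliberately left out of the record, in line with evaluating justifications independently of whether the answer is correct.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class RationaleRating:
    """One evaluator's ratings for one justification (assumed 1-5 scale)."""
    relevancy_of_sources: int
    quality_of_sources: int
    strength_of_inferential_links: int


def panel_summary(ratings: list[RationaleRating]) -> dict[str, float]:
    """Average each criterion across the panel (assumed aggregation rule)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return {
        "relevancy_of_sources": mean(r.relevancy_of_sources for r in ratings),
        "quality_of_sources": mean(r.quality_of_sources for r in ratings),
        "strength_of_inferential_links": mean(
            r.strength_of_inferential_links for r in ratings
        ),
    }


# Illustrative ratings from a two-person panel.
summary = panel_summary([RationaleRating(2, 4, 1), RationaleRating(4, 4, 2)])
print(summary)
```

Keeping the criteria as separate fields, instead of collapsing them into one score, leaves open how (or whether) they should be weighted against each other.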

Next Steps
• Write proposal for 2003 evaluation
• Determine likely participants in an evaluation
• Establish an e-mail list for proposal discussions
  • Send subscribe message to Stefano