AQUAINT Pilot evaluation for knowledge-oriented systems April 20, 2005.

Presentation transcript:

AQUAINT Pilot evaluation for knowledge-oriented systems April 20, 2005

AQUAINT Represented Projects: Arizona, Brandeis, Cycorp, ICSI, Illinois, ISI, LCC, MIT, Monmouth, PARC, Stanford, UT Dallas

AQUAINT Tentative Agenda
10:00 - 10:15  Welcome
10:15 - 11:15  Survey/prioritization of general issues and controversies
11:15 - 12:15  Discussion/resolution of most significant disagreements
12:15 - 1:15   Lunch
1:15 - 2:00    Resolution of other general issues (if needed)
2:00 - 3:00    Examination/discussion of particular examples (about 5 examples per team)
3:00 - 3:15    Break
3:15 - 4:15    Continue examination/discussion of team examples
4:15 - 4:45    Technical issues: data formats, answer coding, scoring
4:45 - 5:30    Next steps, goals for June PI meeting
5:30           Depart

AQUAINT Decisions
Form of challenge: inference-based questions (yes-no, choice, wh-, ...)
Development and test sets:
– Open domain, but no reliance on specialist knowledge; general common sense
– Include long chains of inference, but shorter ones too
– Allow constructed and natural examples
– Not aiming to produce training data
System output:
– Mandatory: response, strict/plausible
– Optional: system justifications (e.g., linguistic/world-knowledge), human explanations, system confidence
Annotation:
– Mandatory: passage, question, response, strict/plausible, linguistic/world-knowledge, True/False/Unknown (to allow for distractors)
– Optional: characterize the particular knowledge (world, linguistic) that an answer depends on, assumptions (including which interpretation if ambiguous), context-type (belief, plan, ...), annotator's confidence, provenance
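
To make the annotation scheme concrete, here is a minimal sketch of how one record carrying these mandatory and optional fields might be represented in code. Everything here is an assumption for illustration: the AnnotationRecord name, the field names, and the value conventions are placeholders, not the schema the guidelines subcommittee was chartered to produce.

from dataclasses import dataclass
from typing import Optional

@dataclass
class AnnotationRecord:
    # Mandatory fields from the Decisions slide
    passage: str              # source passage
    question: str             # inference-based question (yes-no, choice, wh-, ...)
    response: str             # the answer being annotated
    entailment: str           # "strict" or "plausible"
    knowledge_type: str       # "linguistic" or "world-knowledge"
    truth_value: str          # "True", "False", or "Unknown" (allows distractors)
    # A few of the optional fields
    assumptions: Optional[str] = None      # e.g. which interpretation, if ambiguous
    context_type: Optional[str] = None     # e.g. "belief", "plan"
    annotator_confidence: Optional[float] = None
    provenance: Optional[str] = None

Keeping the mandatory fields free of defaults means a record cannot be constructed unless every required judgment has actually been made, which mirrors the mandatory/optional split above.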

AQUAINT Next Steps
Annotation guidelines by May 20; subcommittee: Crouch (PARC), Sauri (Brandeis), Fowler (LCC)
Uniform format: XML, to be specified by Kaplan (PARC) and Fowler (LCC) by May 25
Data validation: a small number of examples passed around by June 1
Evaluation and scoring: discussion at the June PI meeting
– What do we want to learn?
– How is it done?
– How is it presented?
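
Below is a minimal sketch, under the same assumptions as the record class above, of how such a record might be serialized to the planned uniform XML format. The element and attribute names are hypothetical; the actual specification was left to Kaplan (PARC) and Fowler (LCC).

import xml.etree.ElementTree as ET

def record_to_xml(rec: AnnotationRecord) -> str:
    # Element names below are placeholders, not the agreed specification.
    root = ET.Element("annotation")
    for name in ("passage", "question", "response", "entailment",
                 "knowledge_type", "truth_value"):
        ET.SubElement(root, name).text = getattr(rec, name)
    # Emit optional fields only when present
    if rec.annotator_confidence is not None:
        root.set("confidence", str(rec.annotator_confidence))
    return ET.tostring(root, encoding="unicode")

example = AnnotationRecord(
    passage="All of the senators supported the bill.",
    question="Did any senator support the bill?",
    response="yes",
    entailment="strict",
    knowledge_type="linguistic",
    truth_value="True",
)
print(record_to_xml(example))

A round-trip check along these lines, with a handful of records passed among the teams, is one way the June 1 data-validation step could be exercised.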
