Talk Schedule Question Answering from Email. Bryan Klimt, July 28, 2005.


Talk Schedule Question Answering from Email. Bryan Klimt, July 28, 2005

Project Goals
–To build a practical working question answering system for personal email
–To learn about the technologies that go into QA (IR, IE, NLP, MT)
–To discover which techniques work best and when

System Overview

Dataset
–18 months of email (Sept 2003 to Feb 2005)
–4799 total emails
–196 are talk announcements, hand labelled and annotated
–478 questions and answers

A new email arrives… Is it a talk announcement? If so, we should index it.

Classifier (diagram: email data feeds logistic regression classifiers, whose decisions are combined)

Classification Performance
–precision = 0.81, recall = 0.66 (previous work achieved better performance)
–Top features: abstract, bio, speaker, copeta, multicast, esm, donut, talk, seminar, cmtv, broadcast, speech, distinguish, ph, lectur, ieee, approach, translat, professor, award
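
The slide reports precision and recall but no combined score; as a quick sanity check, the corresponding F1 score can be computed directly from the two numbers above.

```python
# Harmonic mean of precision and recall (F1); the 0.81 / 0.66
# values are taken from the slide above.
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.81, 0.66), 3))
```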

Annotator
Use Information Extraction techniques to identify certain types of data in the emails:
–speaker names and affiliations
–dates and times
–locations
–lecture series and titles

Annotator

Rule-based Annotator
Combine regular expressions and dictionary lookups:
defSpanType date =: ...[re('^\d\d?$') ai(dayEnd)? ai(month)]... ;
matches “23rd September”
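
The same idea can be sketched in plain Python: a one- or two-digit day number, an optional ordinal suffix, then a month name drawn from a dictionary. This is a minimal illustration of the rule above, not the talk's actual rule language; the names here are illustrative.

```python
import re

# Dictionaries standing in for the rule's dayEnd / month lookups.
MONTHS = ("january february march april may june july august "
          "september october november december").split()
DAY_ENDINGS = ("st", "nd", "rd", "th")

# Day digits, optional ordinal suffix, whitespace, then a month name.
date_re = re.compile(
    r"\b(\d{1,2})(%s)?\s+(%s)\b" % ("|".join(DAY_ENDINGS), "|".join(MONTHS)),
    re.IGNORECASE,
)

def find_dates(text):
    """Return (day, month) pairs matched by the rule."""
    return [(int(m.group(1)), m.group(3).lower())
            for m in date_re.finditer(text)]

print(find_dates("The talk is on 23rd September in Wean Hall."))
```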

Conditional Random Fields
–Probabilistic framework for labelling sequential data
–Known to outperform HMMs (relaxation of independence assumptions) and MEMMs (avoids the “label bias” problem)
–Allow for multiple output features at each node in the sequence
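
At decoding time, a linear-chain CRF picks the highest-scoring label sequence with Viterbi search. The toy sketch below shows only that decoding step, with a hand-set scoring function instead of learned weights and labels chosen for illustration; it is not the talk's implementation.

```python
LABELS = ["O", "DATE"]

def score(prev, cur, token):
    """Toy potential: reward DATE on digit tokens and O elsewhere,
    plus a small bonus for keeping the same label as the previous node."""
    s = 1.0 if (cur == "DATE") == token.isdigit() else -1.0
    if prev is not None and prev == cur:
        s += 0.5
    return s

def viterbi(tokens):
    """Return the best label sequence under the toy potentials."""
    best = {lab: (score(None, lab, tokens[0]), [lab]) for lab in LABELS}
    for tok in tokens[1:]:
        nxt = {}
        for cur in LABELS:
            p, (ps, path) = max(
                ((prev, best[prev]) for prev in LABELS),
                key=lambda kv: kv[1][0] + score(kv[0], cur, tok))
            nxt[cur] = (ps + score(p, cur, tok), path + [cur])
        best = nxt
    return max(best.values(), key=lambda v: v[0])[1]

print(viterbi(["Talk", "on", "Sept", "23"]))
```

A real CRF would learn the potentials from the hand-labelled announcements rather than hard-coding them.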

Rule-based vs. CRFs

–Both results are much higher than in the previous study
–For dates, times, and locations, rules are easy to write and perform extremely well
–For names, titles, affiliations, and series, rules are very difficult to write, and CRFs are preferable

Template Filler
–Creates a database record for each talk announced in the email
–This database is used by the NLP answer extractor

Filled Template
Seminar {
  title = “Keyword Translation from English to Chinese for Multilingual QA”
  name = Frank Lin
  time = 5:30pm
  date = Thursday, Sept. 23
  location = 4513 Newell Simon Hall
  affiliation =
  series =
}

Search Time
Now the email is indexed. The user can ask questions.

IR Answer Extractor
–Performs a traditional IR (TF-IDF) search using the question as a query
–Determines the answer type from simple heuristics (“Where” -> LOCATION)
Example: “Where is Frank Lin’s talk?”
search[468:473]: "frank"  search[474:477]: "lin"  search[580:583]: "lin"
search[2025:2030]: "frank"  search[2283:2286]: "lin"
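
The retrieval step can be sketched as a plain TF-IDF ranker over the indexed emails. The two toy documents and the whitespace tokenizer below are illustrative, not the talk's corpus or implementation.

```python
import math
from collections import Counter

# Toy "indexed email" collection standing in for the real corpus.
docs = {
    "email1": "talk by frank lin on keyword translation sept 23",
    "email2": "seminar on dolphin communication in the bahamas",
}

def tokens(text):
    return text.lower().split()

N = len(docs)
# Document frequency of each term across the collection.
df = Counter(t for d in docs.values() for t in set(tokens(d)))

def tfidf_score(query, doc):
    """Sum of tf * idf over query terms that occur in the collection."""
    tf = Counter(tokens(doc))
    return sum(tf[t] * math.log(N / df[t]) for t in tokens(query) if t in df)

def search(query):
    """Rank emails by TF-IDF similarity to the question."""
    return sorted(docs, key=lambda d: tfidf_score(query, docs[d]), reverse=True)

print(search("where is frank lin's talk"))
```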

IR Answer Extractor

NL Question Analyzer
Uses the Tomita Parser to fully parse questions and translate them into a structured query language:
“Where is Frank Lin’s talk?”
((FIELD LOCATION) (FILTER (NAME “FRANK LIN”)))

NL Answer Extractor
Simply executes the structured query produced by the Question Analyzer:
((FIELD LOCATION) (FILTER (NAME “FRANK LIN”)))
select LOCATION from seminar_templates where NAME=“FRANK LIN”;
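
The translation from the structured query to SQL is mechanical. A hypothetical sketch, representing the query as nested tuples (the field/filter shapes and the seminar_templates table name follow the slide; everything else is illustrative):

```python
def to_sql(query):
    """Translate ((FIELD f) (FILTER (name value) ...)) into a SELECT."""
    (_, field), (_, *filters) = query
    where = " and ".join('%s="%s"' % (name, value) for name, value in filters)
    return "select %s from seminar_templates where %s;" % (field, where)

q = (("FIELD", "LOCATION"), ("FILTER", ("NAME", "FRANK LIN")))
print(to_sql(q))
# -> select LOCATION from seminar_templates where NAME="FRANK LIN";
```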

Results
NL Answer Extractor ->
IR Answer Extractor -> 0.755

Results
Both answer extractors have similar (good) performance
IR-based extractor:
–easy to implement (1-2 days)
–better on questions w/ titles and names
–very bad on yes/no questions
NLP-based extractor:
–more difficult to implement (4-5 days)
–better on questions w/ dates and times

Examples
“Where is the lecture on dolphin language?”
–NLP Answer Extractor: Fails to find any talk
–IR Answer Extractor: Finds the correct talk
–Actual title: “Natural History and Communication of Spotted Dolphin, Stenella Frontalis, in the Bahamas”
“Who is speaking on September 10?”
–NLP Extractor: Finds the correct record(s)
–IR Extractor: Extracts the wrong answer
–A talk at “10 am, November 10” ranks higher than one on “Sept 10th”

Future Work
–Add an annotation “feedback loop” for the classifier
–Add a planner module to decide which answer extractor to apply to each individual question
–Tune parameters for the classifier and TF-IDF search engine
–Integrate into a mail client!

Conclusions
–Overall performance is good enough for the system to be helpful to end users
–Both rule-based and automatic annotators should be used, but for different types of annotations
–Both IR-based and NLP-based answer extractors should be used, but for different types of questions

DEMO