iNAGO Project: Automatic Knowledge Base Generation from Text for Interactive Question Answering


Brief Introduction
In collaboration with iNAGO Inc.
YorkU team: Elnaz Delpisheh (Post Doc), Heidar Davoudi (Ph.D.), Emad Gohari (Masters)

iNAGO Project: Automatic Q/A Generation

Steps and timeline
Sentence simplification
Named entity information
Semantic role labeling
Generate questions and answers
Importance of generated questions
Context issues
Human evaluations

Sentence Simplification
Sentences may have a complex grammatical structure with multiple embedded clauses.
We simplify complex sentences in order to generate more accurate questions.
Pre-processing and data cleaning are done.
Complex sentence: Apple’s first logo, designed by Jobs and Wayne, depicts Sir Isaac Newton sitting under an apple tree.
Simple sentences: Apple’s first logo depicts Sir Isaac Newton sitting under an apple tree. Apple’s first logo is designed by Jobs and Wayne.
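
The slides do not say how the simplification is implemented. As a minimal sketch, assuming a single hand-written rule for a comma-delimited participial phrase, the example above can be reproduced like this. The regular expression and the function name `simplify` are illustrative assumptions; a real system would more likely work from a parse tree.

```python
import re

# Assumed toy rule: "<subject>, <verb>ed <rest>, <predicate>"
# Only covers one embedded participial phrase set off by commas.
PATTERN = re.compile(
    r"^(?P<subject>[^,]+), (?P<participle>\w+ed) (?P<rest>[^,]+), (?P<predicate>.+)$"
)


def simplify(sentence):
    """Split a sentence into simple sentences, or return it unchanged."""
    match = PATTERN.match(sentence.strip())
    if not match:
        return [sentence]
    main = f"{match['subject']} {match['predicate']}"
    embedded = f"{match['subject']} is {match['participle']} {match['rest']}."
    return [main, embedded]


complex_sentence = (
    "Apple's first logo, designed by Jobs and Wayne, "
    "depicts Sir Isaac Newton sitting under an apple tree."
)
for simple in simplify(complex_sentence):
    print(simple)
```

Sentences that do not match the rule are returned as they are, so the rule can be applied safely to a whole document.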

Named Entity Information
An NE tagger labels plain text with named entities (people, organizations, locations, things).
Once the body of text is tagged, we apply general-purpose rules to create basic questions.
Example: Apple’s first logo depicts Sir [PER Isaac Newton] sitting under an apple tree. Apple’s first logo is designed by [PER Jobs] and [PER Wayne].
Questions: Who is Isaac Newton? Who is Jobs? Who is Wayne?
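
The general-purpose rules are not listed on the slide. The sketch below assumes the simplest possible rule set: one question template per entity tag, applied to text that is already tagged in the bracket notation of the example. The `TEMPLATES` table is hypothetical; only the "Who is ...?" pattern for PER is taken from the slide.

```python
import re

# Hypothetical rule table: entity tag -> question template.
# Only the PER template is confirmed by the slide's example.
TEMPLATES = {
    "PER": "Who is {}?",
    "ORG": "What is {}?",
    "LOC": "Where is {}?",
}

# Matches spans such as "[PER Isaac Newton]".
TAG_PATTERN = re.compile(r"\[(\w+)\s+([^\]]+?)\s*\]")


def questions_from_tags(tagged_text):
    """Generate one basic question per distinct tagged entity."""
    questions = []
    for tag, mention in TAG_PATTERN.findall(tagged_text):
        template = TEMPLATES.get(tag)
        if template is None:
            continue  # no rule for this entity type
        question = template.format(mention.strip(" ."))
        if question not in questions:
            questions.append(question)
    return questions


tagged = (
    "Apple's first logo depicts Sir [PER Isaac Newton] sitting under an apple tree. "
    "Apple's first logo is designed by [PER Jobs] and [PER Wayne]."
)
print(questions_from_tags(tagged))
```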

Semantic Role Labeling
Semantic role labeling assigns semantic labels to phrases.
It provides a structured representation of a text’s meaning.
Semantic role labeling knowledge bases: PropBank, FrameNet

Semantic Role Labeling
"The NYSE is prepared to open tomorrow on generator power if necessary," the statement said.
0: [ARG0 The NYSE] is prepared to [TARGET open] [ARGM-TMP tomorrow] on generator power [ARGM-ADV if necessary] the statement said
0: [ARG1 The NYSE is prepared to open tomorrow on generator power if necessary] [ARG0 the statement] [TARGET said]
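
To illustrate how labeled output of this kind can feed question generation, the sketch below parses the bracket notation into a role dictionary and forms a question by replacing one argument with a wh-word. The helper names and the fixed choice of "What" are assumptions for illustration; the slides do not describe the actual generation rules.

```python
import re

# Matches spans such as "[ARGM-TMP tomorrow]" or "[TARGET open ]".
ROLE = re.compile(r"\[([A-Z0-9-]+)\s+([^\]]+?)\s*\]")


def parse_roles(labeled):
    """Return a {label: text} dictionary for one labeled frame."""
    return {label: text for label, text in ROLE.findall(labeled)}


def ask_for_role(labeled, role="ARG0", wh_word="What"):
    """Replace the chosen role with a wh-word; return (question, answer)."""
    roles = parse_roles(labeled)
    if role not in roles:
        return None

    def render(match):
        label, text = match.group(1), match.group(2)
        return wh_word if label == role else text

    body = re.sub(r"\s+", " ", ROLE.sub(render, labeled)).strip()
    return body + "?", roles[role]


frame = (
    "[ARG0 The NYSE] is prepared to [TARGET open ] [ARGM-TMP tomorrow] "
    "on generator power [ARGM-ADV if necessary]"
)
print(parse_roles(frame))
print(ask_for_role(frame))
```

Choosing between "Who" and "What" would need the named entity information from the previous step; here the wh-word is simply passed in.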

Q/A from Semantic Role Labeling

Generate Questions and Answers
Questions and answers are generated from the named entity information and the semantic role labels.

Importance of Generated Questions
Find the topic of each section.
Compute topic-question similarity and prune low-similarity Q/A pairs.
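
The slide does not name the similarity measure. A minimal sketch, assuming cosine similarity over bag-of-words counts and a hypothetical pruning threshold of 0.2, could look like this:

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def prune(topic, qa_pairs, threshold=0.2):
    """Keep Q/A pairs whose similarity to the topic meets the threshold."""
    topic_vec = bag_of_words(topic)
    kept = []
    for question, answer in qa_pairs:
        score = cosine(topic_vec, bag_of_words(question + " " + answer))
        if score >= threshold:
            kept.append((question, answer, score))
    # Most topic-relevant pairs first.
    return sorted(kept, key=lambda item: item[2], reverse=True)


topic = "apple logo design history"  # assumed topic words for one section
qa_pairs = [
    ("What does Apple's first logo depict?",
     "Sir Isaac Newton sitting under an apple tree"),
    ("Who is Wayne?", "A co-founder"),
]
for question, answer, score in prune(topic, qa_pairs):
    print(f"{score:.2f}  {question}")
```

The second pair shares no words with the topic and is pruned. The threshold and the topic string are placeholders; in practice the topic would come from the section's topic model.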

Coreferencing
Coreference resolution is the task of finding all expressions that refer to the same entity in a text.

Problem: Vague noun phrases
Noun phrases can refer to previous information in the discourse, leading to potentially vague questions.
Example: The show boosted the studio to the top of the TV cartoon field . . . .
Q: What boosted the studio to the top of the TV cartoon field?
A: The show.

Solution 1: Vague noun phrases (in progress)
Paragraph segmentation
Assumption: the content within the same topic is interrelated.
Hearst’s TextTiling algorithm
Text clustering using topic modeling (hierarchical LDA)
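
The core idea of TextTiling is to compare the vocabulary on either side of each gap in the text and place a topic boundary where lexical similarity dips sharply. The sketch below is a deliberately simplified version under stated assumptions: it uses whole sentences as units, skips stopword removal and smoothing, and borrows only the depth-score idea and the mean-minus-half-standard-deviation cutoff. It is not Hearst's full algorithm, and all function names are illustrative.

```python
import math
import re
from collections import Counter
from statistics import mean, pstdev


def word_counts(sentence):
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def gap_similarities(sentences, block=1):
    """Similarity between the blocks before and after each sentence gap."""
    sims = []
    for gap in range(1, len(sentences)):
        left = sum((word_counts(s) for s in sentences[max(0, gap - block):gap]), Counter())
        right = sum((word_counts(s) for s in sentences[gap:gap + block]), Counter())
        sims.append(cosine(left, right))
    return sims


def depth_scores(sims):
    """How far each gap's similarity sits below the peaks on either side."""
    depths = []
    for i, sim in enumerate(sims):
        left_peak = right_peak = sim
        for j in range(i - 1, -1, -1):
            if sims[j] < left_peak:
                break
            left_peak = sims[j]
        for j in range(i + 1, len(sims)):
            if sims[j] < right_peak:
                break
            right_peak = sims[j]
        depths.append((left_peak - sim) + (right_peak - sim))
    return depths


def segment(sentences, block=1):
    depths = depth_scores(gap_similarities(sentences, block))
    if not depths:
        return [list(sentences)]
    cutoff = mean(depths) - pstdev(depths) / 2
    segments, start = [], 0
    for gap, depth in enumerate(depths, start=1):
        if depth > 0 and depth > cutoff:
            segments.append(sentences[start:gap])
            start = gap
    segments.append(sentences[start:])
    return segments


sentences = [
    "Adjust the seat before driving.",
    "The seat lever moves the seat forward.",
    "Check the engine oil level monthly.",
    "Low oil can damage the engine.",
]
for part in segment(sentences):
    print(part)
```

On this toy input the only similarity valley lies between the second and third sentences, so the text is split into a seat segment and an engine-oil segment.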

Solution 2: Vague noun phrases (in progress)
Identifying the intents of sentences.
Example: “Before starting your vehicle, adjust your seat, adjust the inside and outside mirrors, fasten your seat belt.”
Intent: Things to do before starting your car.
We propose to classify intent into six categories:
State (internal or external state)
Parts (part of a vehicle)
Feature (specific mode of a vehicle)
Problem
Procedures

Human Evaluations (in progress)
Native English speakers judge the quality of the top-ranked 20% of questions using two criteria: topic relevance; clarity and syntactic correctness.

iNAGO Project: Criteria-Value Extraction

Criteria-value extraction
A semantic representation of the Q/A dataset in the form of attribute-value pairs.
Goals:
Complete representation of the different aspects of questions
Enabling interactive conversation for question answering

Steps and timeline
Phrase mining and concept identification
Question clustering and question intent detection
Identifying frames from patterns
Evaluation of generated criteria-values

Phrase mining and concept identification
Phrase mining: finding topical phrases in a large text corpus
  Finding domain-specific phrases
  Entity recognition
  Enhancing parsing results
Concept identification: identifying the set of terms that represents a concept in questions
  Detecting important terms from words and phrases in questions
  Using a clustering algorithm to find concepts
  Concept pruning and labeling

Phrase mining and concept identification
Concept identification process
Measuring similarity with word embeddings
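
The slide does not specify the embedding model or the clustering algorithm. As a rough sketch, assuming cosine similarity between term vectors and a simple greedy grouping around seed terms, concept identification could be illustrated as follows. The three-dimensional vectors are made-up toy values, not output from a trained model, and the 0.9 threshold is arbitrary.

```python
import math

# Toy, hand-made vectors standing in for real word embeddings (assumption).
EMBEDDINGS = {
    "seat": (0.9, 0.1, 0.0),
    "seat belt": (0.8, 0.2, 0.1),
    "mirror": (0.1, 0.9, 0.1),
    "rear-view mirror": (0.2, 0.8, 0.0),
    "engine": (0.0, 0.1, 0.9),
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_terms(embeddings, threshold=0.9):
    """Greedy grouping: join the first concept whose seed term is similar enough."""
    concepts = []  # each concept is a list of terms; the first term is its seed
    for term, vector in embeddings.items():
        for concept in concepts:
            if cosine(vector, embeddings[concept[0]]) >= threshold:
                concept.append(term)
                break
        else:
            concepts.append([term])
    return concepts


for concept in cluster_terms(EMBEDDINGS):
    print(concept)
```

A greedy pass like this depends on term order; it is used here only because it is short, and any standard clustering algorithm could take its place.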

Question clustering and question intent detection
Clustering questions based on similar intent
Extracting features with semantic and syntactic parsing:
  Heuristic question patterns
  Entity recognition
  Constituent and dependency parse trees
  Semantic role labeling

Identifying frames from patterns (in progress)
Frame: a grouping of the criteria-values for questions with the same intent into a generalized form
Finding semantic patterns in question clusters
Detecting patterns based on shallow semantic parsing and SRL
Using external resources such as the FrameNet semantic dictionary
Generalization of semantic patterns for frame identification

Evaluation of criteria-values (in progress)
Defining quality metrics for criteria-values:
  Completeness: possibility of reconstructing a unique question from the criteria-values
  Informativeness: no redundancy in the criteria-values
  Consistency: criteria should be consistent across the whole Q/A dataset
Designing a user study to measure the above qualities
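
The metrics are still being defined, so the following is only one possible reading of two of them, offered as a sketch. It assumes completeness can be approximated by checking that no two questions share an identical set of criteria-values, and informativeness by checking that no value is repeated within a single question. The function names and the sample dataset are hypothetical.

```python
from collections import Counter


def completeness(dataset):
    """Fraction of questions whose criteria-value set is unique in the dataset."""
    if not dataset:
        return 0.0
    signatures = Counter(frozenset(cv.items()) for cv in dataset.values())
    unique = sum(
        1 for cv in dataset.values() if signatures[frozenset(cv.items())] == 1
    )
    return unique / len(dataset)


def informativeness(dataset):
    """Fraction of questions with no value repeated across their criteria."""
    if not dataset:
        return 0.0
    clean = sum(1 for cv in dataset.values() if len(set(cv.values())) == len(cv))
    return clean / len(dataset)


# Hypothetical Q/A dataset entries: question -> {criterion: value}.
dataset = {
    "How do I adjust the seat?":
        {"intent": "procedure", "part": "seat", "action": "adjust"},
    "How do I adjust the mirror?":
        {"intent": "procedure", "part": "mirror", "action": "adjust"},
    "How can I adjust the mirrors?":
        {"intent": "procedure", "part": "mirror", "action": "adjust"},
}
print(f"completeness:    {completeness(dataset):.2f}")
print(f"informativeness: {informativeness(dataset):.2f}")
```

Here two questions collapse to the same criteria-values, so neither can be reconstructed uniquely and completeness drops to one third. Whether paraphrases should count against completeness is a design decision the slides leave open.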