1 iNAGO Project: Automatic Knowledge Base Generation from Text for Interactive Question Answering

2 Brief Introduction
In collaboration with iNAGO Inc.
YorkU team: Elnaz Delpisheh (post-doc), Heidar Davoudi (Ph.D.), Emad Gohari (Master's)

3 Automatic Q/A generation
iNAGO Project: Automatic Q/A Generation

4 Steps and timeline
Sentence Simplification
Named Entity Information
Semantic Role Labeling
Generate Questions and Answers
Importance of Generated Questions
Context Issues
Human Evaluations

5 Sentence Simplification
Sentences may have a complex grammatical structure with multiple embedded clauses. We simplify complex sentences in order to generate more accurate questions; pre-processing and data cleaning are performed first.
Complex sentence: Apple's first logo, designed by Jobs and Wayne, depicts Sir Isaac Newton sitting under an apple tree.
Simple sentences: Apple's first logo depicts Sir Isaac Newton sitting under an apple tree. Apple's first logo is designed by Jobs and Wayne.
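A minimal sketch of one way such a split can be made, assuming spaCy's dependency parser and a single hand-written rule for comma-delimited participial clauses; the rule and the model name are illustrative, not the project's actual simplifier.

```python
# Minimal sketch: split a comma-delimited participial clause ("..., designed by X, ...")
# into its own sentence using spaCy's dependency parse. Illustrative only: the single
# rule and the model name are assumptions, not the project's actual simplifier.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def simplify(sentence):
    doc = nlp(sentence)
    for tok in doc:
        # A past-participle clause attached to a noun (e.g. "designed" modifying "logo").
        if tok.dep_ == "acl" and tok.tag_ == "VBN" and tok.head.pos_ in ("NOUN", "PROPN"):
            clause = list(tok.subtree)
            clause_text = " ".join(t.text for t in clause)
            head_np = " ".join(t.text for t in tok.head.subtree
                               if t not in clause and t.text != ",")
            # Main sentence without the embedded clause, plus a new sentence for the
            # clause (detokenization kept naive for brevity).
            main = " ".join(t.text for t in doc if t not in clause and t.text != ",")
            return [main, f"{head_np} is {clause_text}."]
    return [sentence]

print(simplify("Apple's first logo, designed by Jobs and Wayne, "
               "depicts Sir Isaac Newton sitting under an apple tree."))
```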

6 Named Entity Information
An NE tagger tags plain text with named entities (people, organizations, locations, things). Once the body of text is tagged, we use general-purpose rules to create basic questions.
Example: Apple's first logo depicts Sir [PER Isaac Newton] sitting under an apple tree. Apple's first logo is designed by [PER Jobs] and [PER Wayne].
Questions: Who is Isaac Newton? Who is Jobs? Who is Wayne?
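For illustration, this kind of rule can be prototyped with an off-the-shelf tagger; the sketch below assumes spaCy's NER model (not necessarily the tagger used in the project) and emits a "Who is X?" question for each person mention.

```python
# Illustrative sketch: emit a "Who is X?" question for every person entity found by an
# off-the-shelf NER tagger (spaCy here; the project's actual tagger may differ).
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def who_is_questions(text):
    doc = nlp(text)
    questions = [f"Who is {ent.text}?" for ent in doc.ents if ent.label_ == "PERSON"]
    return list(dict.fromkeys(questions))  # deduplicate while preserving order

text = ("Apple's first logo depicts Sir Isaac Newton sitting under an apple tree. "
        "Apple's first logo is designed by Jobs and Wayne.")
print(who_is_questions(text))
# Roughly: ["Who is Isaac Newton?", "Who is Jobs?", "Who is Wayne?"]
```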

7 Semantic Role Labeling
Semantic role labeling: assigning semantic labels to phrases.
Provides a structured representation of a text's meaning.
Knowledge bases for semantic role labeling: PropBank, FrameNet.

8 Semantic Role Labeling
Example: "The NYSE is prepared to open tomorrow on generator power if necessary," the statement said.
Frame for "open": [ARG0 The NYSE] is prepared to [TARGET open] [ARGM-TMP tomorrow] on generator power [ARGM-ADV if necessary] the statement said
Frame for "said": [ARG1 The NYSE is prepared to open tomorrow on generator power if necessary] [ARG0 the statement] [TARGET said]
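Labels of this form can be obtained from a pretrained SRL model; the sketch below assumes AllenNLP's publicly released SRL-BERT predictor, and the model URL and output handling are assumptions about that library, not part of the slides.

```python
# Sketch: obtain PropBank-style role labels from a pretrained SRL model. The archive URL
# points to AllenNLP's public SRL-BERT release; treat the URL and the output handling as
# assumptions. Requires: pip install allennlp allennlp-models
from allennlp.predictors.predictor import Predictor

predictor = Predictor.from_path(
    "https://storage.googleapis.com/allennlp-public-models/"
    "structured-prediction-srl-bert.2020.12.15.tar.gz"
)

result = predictor.predict(
    sentence="The NYSE is prepared to open tomorrow on generator power if necessary, "
             "the statement said."
)
# One frame per predicate; "description" contains bracketed spans such as
# [ARG0: The NYSE] ... [V: open] [ARGM-TMP: tomorrow] and [ARG0: the statement] [V: said].
for frame in result["verbs"]:
    print(frame["verb"], "->", frame["description"])
```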

9 Q/A from Semantic Role Labeling

10 Generate Questions and Answers
Given the named entity information and the semantic role labels, questions and answers are generated.
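A hedged sketch of what template-based generation over such role-labeled frames might look like; the frame format and the question templates are illustrative assumptions.

```python
# Illustrative sketch: turn a role-labeled predicate frame into question/answer pairs
# with simple templates keyed on the available arguments. The frame format and the
# templates are assumptions, not the project's actual generation rules.
def qa_from_frame(frame):
    pairs = []
    predicate = frame["V"]
    if "ARG0" in frame and "ARG1" in frame:
        pairs.append((f"Who or what {predicate} {frame['ARG1']}?", frame["ARG0"]))
    if "ARG0" in frame and "ARGM-TMP" in frame:
        pairs.append((f"When does {frame['ARG0']} {predicate}?", frame["ARGM-TMP"]))
    return pairs

frame = {"ARG0": "the statement", "V": "said",
         "ARG1": "the NYSE is prepared to open tomorrow on generator power if necessary"}
for question, answer in qa_from_frame(frame):
    print("Q:", question)
    print("A:", answer)
```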

11 Importance of Generated Questions
Find the topic of each section. Compute topic-question similarity and prune the Q/A pairs accordingly.
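For example, a TF-IDF cosine-similarity filter along the following lines could rank questions against a section's topic and drop the least relevant ones; the vectorizer choice and the threshold are assumptions.

```python
# Sketch: rank generated questions by cosine similarity to the section topic and prune
# the least relevant ones. TF-IDF features and the 0.1 threshold are illustrative
# assumptions. Requires: pip install scikit-learn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def prune_questions(topic, questions, threshold=0.1):
    vectors = TfidfVectorizer().fit_transform([topic] + questions)
    scores = cosine_similarity(vectors[0:1], vectors[1:]).ravel()
    ranked = sorted(zip(questions, scores), key=lambda pair: pair[1], reverse=True)
    return [(q, round(s, 3)) for q, s in ranked if s >= threshold]

topic = "the history of Apple's first logo"
questions = ["Who designed Apple's first logo?",
             "Who is Isaac Newton?",
             "What is an apple tree?"]
print(prune_questions(topic, questions))
```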

12 Coreferencing
Coreference resolution is the task of finding all expressions that refer to the same entity in a text.

13 Problem: Vague noun phrases
Noun phrases can refer to previous information in the discourse, leading to potentially vague questions.
Example: The show boosted the studio to the top of the TV cartoon field.
Q: What boosted the studio to the top of the TV cartoon field? A: The show.

14 Solution 1: Vague noun phrases (In progress)
Paragraph segmentation. Assumption: the content within the same topic is interrelated.
Hearst's TextTiling algorithm (see the sketch below)
Text clustering using topic modeling (hierarchical LDA)
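A rough illustration of the TextTiling step, assuming NLTK's implementation of Hearst's algorithm; the project may use a different implementation or parameters.

```python
# Rough illustration: segment a document into topical paragraphs with NLTK's
# TextTilingTokenizer (Hearst's algorithm). Assumes the NLTK 'stopwords' corpus is
# available (nltk.download('stopwords')) and that the input text contains blank-line
# paragraph breaks, which the algorithm requires.
from nltk.tokenize import TextTilingTokenizer

def segment(document):
    tt = TextTilingTokenizer()
    return tt.tokenize(document)

# Each returned string is one topical segment; a vague noun phrase such as "the show"
# can then be grounded against antecedents within its own segment.
```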

15 Solution 2: Vague noun phrases (In progress)
Identifying the intents of sentences (a baseline classifier sketch follows this list).
Example: "Before starting your vehicle, adjust your seat, adjust the inside and outside mirrors, fasten your seat belt." Intent: things to do before starting your car.
We propose to classify intent into six categories, including:
State (internal or external state)
Parts (part of a vehicle)
Feature (specific mode of a vehicle)
Problem
Procedures
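A minimal supervised baseline for such an intent classifier might look like the sketch below; the tiny hand-written training set and the TF-IDF plus logistic-regression pipeline are illustrative assumptions, not the project's model.

```python
# Illustrative baseline: classify sentence intent with a TF-IDF + logistic-regression
# pipeline. The tiny hand-written training set and the model choice are assumptions for
# the sketch only, not the project's classifier. Requires: pip install scikit-learn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_sentences = [
    "The engine temperature warning light is on.",                                # State
    "The rear seats fold flat to extend the cargo area.",                         # Parts
    "Cruise control maintains a set speed without the accelerator.",              # Feature
    "The vehicle pulls to one side when braking.",                                # Problem
    "Before starting your vehicle, adjust your seat and fasten your seat belt.",  # Procedures
]
train_intents = ["State", "Parts", "Feature", "Problem", "Procedures"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_sentences, train_intents)
print(model.predict(["Adjust the inside and outside mirrors before driving."]))
```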

16 Human Evaluations (in progress)
Native English speakers judge the quality of the top-ranked 20% of questions using two criteria: (1) topic relevance and (2) clarity and syntactic correctness.

17 Criteria-Value Extraction
iNAGO Project: Criteria-Value Extraction

18 Criteria-value extraction
A semantic representation of the Q/A dataset in the form of attribute-value pairs (an illustrative example follows this list).
Goals:
Complete representation of the questions' different aspects
Enabling interactive conversation for question answering
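For instance, a question such as "How do I turn on cruise control?" might be represented roughly as follows; the attribute names are hypothetical, not the project's actual schema.

```python
# Hypothetical attribute-value representation of one question; the attribute names
# (intent, feature, action) are illustrative, not the project's actual schema.
question = "How do I turn on cruise control?"
criteria_values = {
    "intent": "Procedures",
    "feature": "cruise control",
    "action": "turn on",
}
```

Pairs of this kind are what the later clustering and frame-identification steps operate on.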

19 Steps and timeline
Phrase mining and concept identification
Question clustering and question intent detection
Identifying frames from patterns
Evaluation of generated criteria-values

20 Phrase mining and concept identification
Phrase mining: finding topical phrases in a large text corpus (see the phrase-mining sketch after this list)
Finding domain-specific phrases
Entity recognition
Enhancing parsing results
Concept identification: identifying sets of terms that represent a concept in questions
Detecting important terms among the words and phrases in questions
Using a clustering algorithm to find concepts
Concept pruning and labeling
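As a simple stand-in for topical phrase mining, frequent collocations can be detected with gensim's Phrases model; the toy corpus and the thresholds below are illustrative assumptions.

```python
# Simple stand-in for phrase mining: detect frequent collocations ("seat_belt",
# "cruise_control") with gensim's Phrases model. The toy corpus and the thresholds are
# illustrative; the real pipeline may use a dedicated topical phrase-mining method.
# Requires: pip install gensim
from gensim.models.phrases import Phrases

corpus = [
    ["fasten", "your", "seat", "belt", "before", "driving"],
    ["the", "seat", "belt", "warning", "light", "stays", "on"],
    ["cruise", "control", "keeps", "a", "set", "speed"],
    ["turn", "off", "cruise", "control", "when", "towing"],
]
bigrams = Phrases(corpus, min_count=1, threshold=1)
print([bigrams[sentence] for sentence in corpus])
# Bigrams seen more than once ("seat belt", "cruise control") come back as single tokens.
```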

21 Phrase mining and concept identification
Concept identification process: measuring similarity with word embeddings.
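A sketch of concept identification via embedding similarity, assuming a small public GloVe model loaded through gensim's downloader and an illustrative KMeans cluster count.

```python
# Sketch: group candidate terms into concepts by clustering their word embeddings.
# A small public GloVe model is loaded through gensim's downloader (an assumption; the
# project may use different embeddings), and the KMeans cluster count is illustrative.
# Requires: pip install gensim scikit-learn numpy
import gensim.downloader as api
import numpy as np
from sklearn.cluster import KMeans

embeddings = api.load("glove-wiki-gigaword-50")  # 50-dimensional GloVe vectors

terms = ["engine", "brake", "transmission", "bluetooth", "navigation", "radio"]
vectors = np.array([embeddings[t] for t in terms])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
for term, label in zip(terms, labels):
    print(label, term)
# Pairwise similarity is also available directly, e.g. embeddings.similarity("engine", "brake").
```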

22 Question clustering and question intent detection
Clustering questions based on similar intent (a simplified clustering sketch follows this list)
Features extracted with semantic and syntactic parsing:
Heuristic question patterns
Entity recognition
Constituent and dependency parse trees
Semantic role labeling
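A simplified clustering sketch using only TF-IDF features; the actual pipeline combines entity, parse-tree, and SRL features as listed above, and the example questions and cluster count are illustrative.

```python
# Simplified sketch: cluster questions into intent groups using only TF-IDF features.
# The actual pipeline combines entity, constituency/dependency, and SRL features; the
# example questions and cluster count are illustrative. Requires: pip install scikit-learn
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

questions = [
    "How do I turn on cruise control?",
    "How do I enable lane assist?",
    "Why is the engine warning light on?",
    "Why does the brake warning light stay on?",
]
features = TfidfVectorizer(stop_words="english").fit_transform(questions)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
for question, label in zip(questions, labels):
    print(label, question)
```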

23 Identifying frames from patterns (in progress)
Frame: grouping the criteria-values of questions with the same intent into a generalized form
Finding semantic patterns in question clusters
Detecting patterns based on shallow semantic parsing and SRL
Using external resources such as the FrameNet semantic dictionary
Generalization of semantic patterns for frame identification (a grouping sketch follows this list)
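An illustrative sketch of the grouping step: patterns that share a predicate and role signature are collapsed into one frame. The pattern format is a hypothetical assumption about how earlier steps might emit their output.

```python
# Illustrative sketch: collapse role-labeled question patterns that share a predicate and
# role signature into a single generalized frame. The pattern format is a hypothetical
# assumption about how earlier steps might emit their output.
from collections import defaultdict

patterns = [
    {"predicate": "turn_on", "roles": ("feature",),   "example": "How do I turn on cruise control?"},
    {"predicate": "turn_on", "roles": ("feature",),   "example": "How do I turn on lane assist?"},
    {"predicate": "mean",    "roles": ("indicator",), "example": "What does the engine warning light mean?"},
]

frames = defaultdict(list)
for pattern in patterns:
    frames[(pattern["predicate"], pattern["roles"])].append(pattern["example"])

for signature, examples in frames.items():
    print(signature, "->", examples)
```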

24 Evaluation of criteria-values (in progress)
Defining quality metrics for criteria-values:
Completeness: possibility of reconstructing a unique question from the criteria-values
Informativeness: no redundancy in the criteria-values
Consistency: criteria should be consistent across the whole Q/A dataset
Designing a user study for measuring the above qualities

