iNAGO Project: Automatic Knowledge Base Generation from Text for Interactive Question Answering
Brief Introduction
In collaboration with iNAGO Inc.
YorkU team: Elnaz Delpisheh (Postdoc), Heidar Davoudi (Ph.D.), Emad Gohari (Master's)
iNAGO Project: Automatic Q/A Generation
Steps and Timeline
- Sentence Simplification
- Named Entity Information
- Semantic Role Labeling
- Generate Questions and Answers
- Importance of Generated Questions
- Context Issues
- Human Evaluations
Sentence Simplification
- Sentences may have a complex grammatical structure with multiple embedded clauses.
- We simplify complex sentences in order to generate more accurate questions.
- Pre-processing and data cleaning are done first.
- Complex sentence: Apple's first logo, designed by Jobs and Wayne, depicts Sir Isaac Newton sitting under an apple tree.
- Simple sentences: Apple's first logo depicts Sir Isaac Newton sitting under an apple tree. Apple's first logo is designed by Jobs and Wayne.
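As a toy illustration of the splitting step, the sketch below peels off a comma-delimited "...ed by ..." clause with a regular expression. The `simplify` function and its pattern are invented for this example; the real pipeline would rely on a full syntactic parse.

```python
import re

def simplify(sentence):
    # Toy rule: split a comma-delimited past-participle clause into its own
    # simple sentence; real simplification would use a syntactic parse.
    m = re.search(r"^(.*?), (\w+ed by [^,]+), (.*)$", sentence)
    if not m:
        return [sentence]
    subject, clause, rest = m.groups()
    return [f"{subject} {rest}", f"{subject} is {clause}."]

print(simplify("Apple's first logo, designed by Jobs and Wayne, "
               "depicts Sir Isaac Newton sitting under an apple tree."))
# ["Apple's first logo depicts Sir Isaac Newton sitting under an apple tree.",
#  "Apple's first logo is designed by Jobs and Wayne."]
```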
Named Entity Information
- An NE tagger tags plain text with named entities (people, organizations, locations, things).
- Once the body of text is tagged, we use general-purpose rules to create basic questions.
- Example: Apple's first logo depicts Sir [PER Isaac Newton] sitting under an apple tree. Apple's first logo is designed by [PER Jobs] and [PER Wayne].
- Questions: Who is Isaac Newton? Who is Jobs? Who is Wayne?
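A minimal sketch of this rule-based step, assuming spaCy and its `en_core_web_sm` model are installed; the template table is a simplified stand-in for the project's actual rules, and the entities found for a given sentence depend on the model.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this model is installed

# Simplified question templates keyed by entity label (illustrative only).
TEMPLATES = {
    "PERSON": "Who is {}?",
    "ORG": "What is {}?",
    "GPE": "Where is {}?",
}

def ne_questions(text):
    """Create basic questions from the named entities found in the text."""
    doc = nlp(text)
    return [TEMPLATES[ent.label_].format(ent.text)
            for ent in doc.ents if ent.label_ in TEMPLATES]

print(ne_questions("Apple's first logo is designed by Jobs and Wayne."))
```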
Semantic Role Labeling
- Semantic role labeling assigns semantic labels to phrases.
- Provides a structured representation of a text's meaning.
- Semantic role labeling knowledge bases: PropBank, FrameNet.
Semantic Role Labeling: Example
Input: "The NYSE is prepared to open tomorrow on generator power if necessary," the statement said.
- Predicate "open": [ARG0 The NYSE] is prepared to [TARGET open] [ARGM-TMP tomorrow] on generator power [ARGM-ADV if necessary], the statement said.
- Predicate "said": [ARG1 The NYSE is prepared to open tomorrow on generator power if necessary], [ARG0 the statement] [TARGET said].
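One way to obtain labels like these is AllenNLP's pretrained SRL predictor. A minimal sketch follows; the model archive URL is an example and should be checked against AllenNLP's current model listing.

```python
from allennlp.predictors.predictor import Predictor

# Example model archive; check the AllenNLP model listing for a current URL.
predictor = Predictor.from_path(
    "https://storage.googleapis.com/allennlp-public-models/"
    "structured-prediction-srl-bert.2020.12.15.tar.gz"
)

result = predictor.predict(
    sentence="The NYSE is prepared to open tomorrow on generator power "
             "if necessary, the statement said."
)
# One frame per predicate; each frame marks spans such as ARG0, ARG1, ARGM-TMP.
for frame in result["verbs"]:
    print(frame["verb"], "->", frame["description"])
```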
Q/A from Semantic Role Labeling
Generate Questions and Answers
- Given the named entity information and semantic role labels, questions and answers are generated.
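As a toy sketch of how an SRL frame can be turned into a question-answer pair, the snippet below asks about the agent of a hand-written frame. The frame dictionary and the `question_for_agent` helper are invented for illustration and do not reflect the project's actual templates.

```python
# Hand-written frame mirroring the simplified sentence from the earlier slides.
frame = {
    "TARGET": "designed",
    "ARG0": "Jobs and Wayne",       # agent
    "ARG1": "Apple's first logo",   # thing designed
}

def question_for_agent(frame):
    """Ask about the agent (ARG0): 'Who <verb> <ARG1>?' with ARG0 as the answer."""
    question = f"Who {frame['TARGET']} {frame['ARG1']}?"
    return question, frame["ARG0"]

q, a = question_for_agent(frame)
print("Q:", q)   # Q: Who designed Apple's first logo?
print("A:", a)   # A: Jobs and Wayne
```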
Importance of Generated Questions
- Find the topic of each section.
- Compute topic-question similarity and prune the Q/A pairs.
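A minimal sketch of topic-question similarity and pruning, using TF-IDF cosine similarity as a stand-in for whatever similarity measure the project actually uses; the topic string, questions, and threshold are illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

topic = "Apple logo history design"          # topic of the section (illustrative)
questions = [
    "Who designed Apple's first logo?",
    "What is a generator?",
]

vec = TfidfVectorizer().fit([topic] + questions)
scores = cosine_similarity(vec.transform(questions), vec.transform([topic])).ravel()

THRESHOLD = 0.1  # hypothetical pruning threshold
kept = [q for q, s in zip(questions, scores) if s >= THRESHOLD]
print(kept)      # questions judged relevant to the topic
```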
Coreference Resolution
- Coreference resolution is the task of finding all expressions that refer to the same entity in a text.
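A short sketch using AllenNLP's pretrained coreference model; the archive URL is an example, and the exact clusters returned depend on the model.

```python
from allennlp.predictors.predictor import Predictor

# Example model archive; check the AllenNLP model listing for a current URL.
coref = Predictor.from_path(
    "https://storage.googleapis.com/allennlp-public-models/"
    "coref-spanbert-large-2021.03.10.tar.gz"
)

out = coref.predict(document="The show boosted the studio. It aired for ten years.")
# "clusters" groups token spans that the model believes refer to the same entity,
# e.g. "The show" and "It".
print(out["clusters"])
```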
Problem: Vague Noun Phrases
- Noun phrases can refer to previous information in the discourse, leading to potentially vague questions.
- Example: "The show boosted the studio to the top of the TV cartoon field..."
- Q: What boosted the studio to the top of the TV cartoon field?
- A: The show.
Solution 1: Vague Noun Phrases (in progress)
- Paragraph segmentation. Assumption: the content within the same topic is interrelated.
- Hearst's TextTiling algorithm.
- Text clustering using topic modeling (hierarchical LDA).
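A small sketch of paragraph segmentation with NLTK's implementation of Hearst's TextTiling algorithm; the input file name is hypothetical, and TextTiling expects blank-line paragraph breaks in its input.

```python
import nltk
from nltk.tokenize import TextTilingTokenizer

nltk.download("stopwords")  # required by the tokenizer

tt = TextTilingTokenizer()
text = open("manual_section.txt").read()  # hypothetical input with blank-line breaks
segments = tt.tokenize(text)

for i, segment in enumerate(segments):
    print(f"--- segment {i} ---")
    print(segment[:80], "...")
```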
Solution 2: Vague Noun Phrases (in progress)
- Identify the intent of each sentence.
- Example: "Before starting your vehicle, adjust your seat, adjust the inside and outside mirrors, fasten your seat belt." Intent: things to do before starting your car.
- We propose to classify intent into six categories:
  - State (internal or external state)
  - Parts (part of a vehicle)
  - Feature (specific mode of a vehicle)
  - Problem
  - Procedures
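A minimal sketch of sentence-intent classification with a linear model. The tiny training set and its labels are invented for illustration; a real classifier would be trained on annotated manual text.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented examples, one per intent, purely for illustration.
train_sentences = [
    "Before starting your vehicle, adjust your seat and fasten your seat belt.",
    "The brake warning light stays on after the engine starts.",
    "The rear camera shows guidelines when reversing.",
]
train_intents = ["Procedures", "Problem", "Feature"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(train_sentences, train_intents)

print(clf.predict(["How do I adjust the outside mirrors?"]))
```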
Human Evaluations (in progress)
- We ask native English speakers to judge the quality of the top-ranked 20% of questions using two criteria: topic relevance, and clarity and syntactic correctness.
iNAGO Project: Criteria-Value Extraction
Criteria-Value Extraction
- A semantic representation of the Q/A dataset in the form of attribute-value pairs.
- Goals:
  - Complete representation of the different aspects of questions.
  - Enabling interactive conversation for question answering.
Steps and Timeline
- Phrase mining and concept identification
- Question clustering and question intent detection
- Identifying frames from patterns
- Evaluation of generated criteria-values
Phrase Mining and Concept Identification
- Phrase mining: finding topical phrases in a large text corpus.
  - Finding domain-specific phrases
  - Entity recognition
  - Enhancing parsing results
- Concept identification: identifying sets of terms that represent a concept in questions.
  - Detecting important terms among the words and phrases in questions
  - Using a clustering algorithm to find concepts
  - Concept pruning and labeling
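As a very small stand-in for a full phrase-mining pipeline, the sketch below extracts repeated bigram collocations with NLTK; the toy corpus is invented.

```python
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

corpus = ("fasten your seat belt adjust your seat belt "
          "check the air bag check the air bag warning light")  # toy corpus
tokens = corpus.lower().split()

finder = BigramCollocationFinder.from_words(tokens)
finder.apply_freq_filter(2)  # keep bigrams seen at least twice
print(finder.nbest(BigramAssocMeasures.pmi, 5))  # candidate domain phrases
```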
Phrase Mining and Concept Identification (continued)
- Concept identification process
- Measuring similarity with word embeddings
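A minimal sketch of the clustering step, assuming gensim's downloadable GloVe vectors ("glove-wiki-gigaword-50") and an invented phrase list: phrases are embedded by averaging word vectors and grouped with k-means.

```python
import numpy as np
import gensim.downloader as api
from sklearn.cluster import KMeans

vectors = api.load("glove-wiki-gigaword-50")  # pretrained word embeddings

# Invented candidate phrases from a vehicle-manual domain.
phrases = ["seat belt", "air bag", "oil change", "tire rotation"]

def embed(phrase):
    """Average the word vectors of the in-vocabulary words of a phrase."""
    words = [w for w in phrase.split() if w in vectors]
    return np.mean([vectors[w] for w in words], axis=0)

X = np.stack([embed(p) for p in phrases])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for phrase, label in zip(phrases, labels):
    print(label, phrase)   # phrases sharing a label form one candidate concept
```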
Question Clustering and Question Intent Detection
- Clustering questions based on similar intent.
- Extracting features with semantic and syntactic parsing:
  - Heuristic question patterns
  - Entity recognition
  - Constituent and dependency parse trees
  - Semantic role labeling
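A minimal sketch of intent-based question clustering using TF-IDF features and agglomerative clustering; the questions are invented, and the real feature set would come from the parses and SRL listed above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering

questions = [
    "How do I change a flat tire?",
    "How do I replace a flat tire?",
    "What does the oil pressure light mean?",
    "What does the battery warning light mean?",
]

X = TfidfVectorizer().fit_transform(questions).toarray()
labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)
for question, label in zip(questions, labels):
    print(label, question)   # questions with the same label share an intent cluster
```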
Identifying Frames from Patterns (in progress)
- Frame: grouping the criteria-values of questions with the same intent into a generalized form.
- Finding semantic patterns in question clusters:
  - Detecting patterns based on shallow semantic parsing and SRL
  - Using external resources such as the FrameNet semantic dictionary
- Generalizing semantic patterns for frame identification.
Evaluation of Criteria-Values (in progress)
- Defining quality metrics for criteria-values:
  - Completeness: a unique question can be reconstructed from its criteria-values.
  - Informativeness: no redundancy in the criteria-values.
  - Consistency: criteria should be consistent across the whole Q/A dataset.
- Designing a user study to measure the above qualities.