Automatic Detection of Causal Relations for Question Answering


1 Automatic Detection of Causal Relations for Question Answering
Roxana Girju, Baylor University, 2003

2 Contents
Background, Main Task, Result Analysis, Application in QA

3 Background
Causation relations expressed in English can be explicit (cause, lead to, kill, dry, …) or implicit (no keyword). Previous work relied on knowledge-based inferences.

4 Main Task
A machine-learning classifier. Input: a sentence containing a <NP1, verb, NP2> pattern, turned into a feature vector (vectorization). Output: YES or NO (causal or not).
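In sketch form (hypothetical names, not the paper's code; the vectorize helper and the trained model are filled in on the following slides):

def is_causal(np1, verb, np2, model):
    """Main task: map a <NP1, verb, NP2> triple to YES or NO."""
    features = vectorize(np1, verb, np2)   # feature format: see slide 5
    return "YES" if model.predict([features])[0] == "YES" else "NO"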

5 Vectorization
Training example format: (entity_NP1, psychological-feature_NP1, abstraction_NP1, state_NP1, event_NP1, act_NP1, group_NP1, possession_NP1, phenomenon_NP1; verb; entity_NP2, psychological-feature_NP2, abstraction_NP2, state_NP2, event_NP2, act_NP2, group_NP2, possession_NP2, phenomenon_NP2; target)

6 Sentence: Earthquake generates Tsunami
Vector: <f, f, f, f, f, f, f, f, t, generate, f, f, f, f, f, t, f, f, f>. The complete training example: <f, f, f, f, f, f, f, f, t, generate, f, f, f, f, f, t, f, f, f, YES>.
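A minimal sketch of this vectorization, assuming NLTK's WordNet interface (the paper used WordNet 1.7; NLTK ships a newer WordNet, so hierarchy membership may differ slightly from the slide's example):

from nltk.corpus import wordnet as wn

# The nine WordNet top noun hierarchies used as semantic features.
TOP_CLASSES = ["entity", "psychological_feature", "abstraction", "state",
               "event", "act", "group", "possession", "phenomenon"]

def semantic_features(noun):
    """Nine booleans: does the noun fall under each top hierarchy?"""
    hypernyms = set()
    for synset in wn.synsets(noun, pos=wn.NOUN):
        for path in synset.hypernym_paths():
            hypernyms.update(s.name().split(".")[0] for s in path)
    return [cls in hypernyms for cls in TOP_CLASSES]

def vectorize(np1, verb, np2):
    """Build the 19-element vector: 9 NP1 flags, the verb, 9 NP2 flags."""
    return semantic_features(np1) + [verb] + semantic_features(np2)

# The slide's example: "Earthquake generates tsunami."
print(vectorize("earthquake", "generate", "tsunami"))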

7 How to build the training set?
Step 1: find the sentences (where does the data come from?). Step 2: select the features (how to vectorize?).

8 Find the Sentences
Step 1: find NP pairs that stand in a causation relationship. WordNet 1.7 contains 429 such NP pairs, the most frequent domain being medicine (about 58.28%).

9 Step 2: For each pair of causation nouns determined above, search the Internet and retain only the sentences containing the pair. From these sentences, automatically determine all the patterns <NP1 verb/verb_expression NP2>.
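A rough sketch of this pattern extraction, assuming NLTK's POS tagger and a simple regular-expression NP chunker (the slide does not detail the actual extraction procedure):

import nltk

# Simple NP chunker: optional determiner, adjectives, then nouns.
chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")

def extract_triples(sentence):
    """Yield (NP1, verb, NP2) triples from an NP - verb - NP sequence."""
    tokens = nltk.pos_tag(nltk.word_tokenize(sentence))
    items = []
    for node in chunker.parse(tokens):
        if isinstance(node, nltk.Tree):        # an NP chunk
            items.append(("NP", " ".join(w for w, _ in node.leaves())))
        elif node[1].startswith("VB"):         # a verb token
            items.append(("V", node[0]))
    for (t1, np1), (t2, v), (t3, np2) in zip(items, items[1:], items[2:]):
        if (t1, t2, t3) == ("NP", "V", "NP"):
            yield np1, v, np2

print(list(extract_triples("The earthquake generated a huge tsunami.")))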

10 Step 3: Search the text collection, retaining 120 sentences per verb; with 60 verbs, 60 × 120 = 7200 sentences (corpus A).
Step 4: Extract 6523 relationships of the type <NP1 verb NP2> from these sentences; 2101 are causal relations and 4422 are not (manually annotated).

11 Select Features Both lexical and semantic features
Lexical features: the verb/verb_expression. Semantic features: the nine top noun hierarchies in WordNet: entity, psychological feature, abstraction, state, event, act, group, possession, and phenomenon.

12 Training Algorithm C4.5 decision tree
Inductive bias: a preference for shorter trees that place high-information-gain attributes closer to the root.
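A minimal training sketch on toy data. scikit-learn has no C4.5 implementation, so criterion="entropy" (information gain) on its CART trees serves as a stand-in for C4.5's gain-ratio splits; the verb is one-hot encoded by hand so all features are numeric:

from sklearn.tree import DecisionTreeClassifier

# Toy rows in the slide's format: 9 NP1 flags, verb, 9 NP2 flags; label.
rows = [
    ([0,0,0,0,0,0,0,0,1], "generate", [0,0,0,0,0,1,0,0,0], "YES"),
    ([1,0,0,0,0,0,0,0,0], "see",      [1,0,0,0,0,0,0,0,0], "NO"),
]
verbs = sorted({verb for _, verb, _, _ in rows})

def encode(np1_flags, verb, np2_flags):
    """Numeric feature vector: NP1 flags, one-hot verb, NP2 flags."""
    return np1_flags + [int(verb == v) for v in verbs] + np2_flags

X = [encode(a, v, b) for a, v, b, _ in rows]
y = [label for _, _, _, label in rows]

clf = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(clf.predict([encode([0,0,0,0,0,0,0,0,1], "generate",
                          [0,0,0,0,0,1,0,0,0])]))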

13 Result Analysis
Tested on 683 relationships of the type <NP1 verb NP2> in corpus B; 102 / (115 + 38) = 66.67%.

14 Reasons for Errors
Mostly the high ambiguity of the causal pattern; also incorrect parsing of noun phrases, the use of rules with lower accuracy (e.g., 63%), and the lack of named entities in WordNet.

15 Application in QA
On 50 test questions: 61% precision for the QA system with the causation module, versus 36% precision without the module.

16 Thanks!

