Published by Barrie Goodman. Modified over 9 years ago.
1
Recognizing Textual Entailment
Progress towards RTE 4
Scott Settembre
University at Buffalo, SNePS Research Group
ss424@cse.buffalo.edu
2
Recognizing Textual Entailment Challenge (RTE) - Overview
The task is to develop a system that determines whether the first sentence of a given pair “entails” the second sentence
The pair of sentences is called the Text-Hypothesis pair (or T-H pair)
Participants are provided with 800 sample T-H pairs annotated with the correct entailment answers
The final test set consists of 800 non-annotated pairs
3
Development set examples
Example of a YES result
Text: As much as 200 mm of rain have been recorded in portions of British Columbia, on the west coast of Canada since Monday.
Hypothesis: British Columbia is located in Canada.
Example of a NO result
Text: Blue Mountain Lumber is a subsidiary of Malaysian forestry transnational corporation, Ernslaw One.
Hypothesis: Blue Mountain Lumber owns Ernslaw One.
4
Entailment Task Types
There are 4 different entailment tasks:
–“IE” or Information Extraction
Text: “An Afghan interpreter, employed by the United States, was also wounded.”
Hypothesis: “An interpreter worked for Afghanistan.”
–“IR” or Information Retrieval
Text: “Catastrophic floods in Europe endanger lives and cause human tragedy as well as heavy economic losses”
Hypothesis: “Flooding in Europe causes major economic losses.”
5
Entailment Task Types - continued
The two remaining entailment tasks are:
–“SUM” or Multi-document summarization
Text: “Sheriff's officials said a robot could be put to use in Ventura County, where the bomb squad has responded to more than 40 calls this year.”
Hypothesis: “Police use robots for bomb-handling.”
–“QA” or Question Answering
Text: “Israel's prime Minister, Ariel Sharon, visited Prague.”
Hypothesis: “Ariel Sharon is the Israeli Prime Minister.”
6
RTE3 - 2007 Results
Our two runs submitted this year (2007) scored:
–62.62% (501 correct out of 800)
–61.00% (488 correct out of 800)
For the 3rd RTE Challenge of 2007, 62.62% ties for 12th out of 26 teams.
–Top scores were 80%, 72%, 69%, and 67%.
–Median: 61.75%
–Range: 49% to 80% (top score up from 75.38% last year)
7
RTE3 - 2007 Results
Category breakdown consistent with last year
–QA (question answering) average was 71% [75%]
–IR (information retrieval) average was 66% [63%]
–SUM (summarization) average was 58% [61.5%]
–IE (information extraction) average was 52% [51%]
This ordering of the entailment categories was consistent across the groups as well.
8
Best Performers
Hickl, one of the top performers, used techniques like these:
–Lexical relationships, using WordNet *
–N-gram, word similarity *
–Anaphora resolution
–Machine learning techniques *
Entailment corpora, more than provided by RTE
–Logical inference
Using background knowledge
* Also used by our submission
9
Best Performers
Another top performer, Tatu (72%), focused mainly on these techniques:
–Lexical relationships, using WordNet
–Anaphora resolution
–Logical inference
Using background knowledge
A good performance also came out of LSA, Latent Semantic Analysis
–A 67% score came out of using LSA (top 4 performer)
–Only 3 teams used LSA; 2 of them scored low (58%, 55%)
10
List of Techniques Used
Lexical similarity, using a dictionary/thesaurus source
–WordNet, DIRT, and MSOffice dictionary used
N-gram, word similarity (also “bag of words”)
Syntactic matching and aligning
Semantic role labeling
–FrameNet, PropBank, VerbNet
Corpus (web-based) statistics
–LSA – Latent Semantic Analysis
Machine learning classification
–ANNs (neural networks), HMMs, SVMs (support vector machines)
Anaphora resolution
Entailment corpora, background knowledge
Logical inference
11
Logical Inference Techniques Used
SNePS should be here!
Extended WordNet or WordNet 3.0
–Expresses word relationships in logic rules
DIRT – a paraphrase database of world knowledge
–Expresses equivalent paraphrases in terms of rules
–e.g. X kills Y → X attacks Y
–note: this rule did not contain (“and Y dies”)
FrameNet
–Uses a frame to express a relationship between “objects” in a script along with other “objects”, like roles, situations, events
Use of specifically developed semantic inference modules
Oddly, no one used OpenCyc
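A paraphrase rule of the kind shown for DIRT can be pictured as a lookup from one relation to the relations it licenses. The sketch below is only an illustration under stated assumptions: the `Relation` type, the `PARAPHRASE_RULES` table, and the `follows` function are hypothetical names, and real DIRT rules are defined over dependency paths rather than bare verbs.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    """A subject-verb-object triple (hypothetical simplification)."""
    subject: str
    verb: str
    obj: str


# Hypothetical rule table: each verb maps to the verbs it licenses.
# Only the single rule from the slide is encoded here.
PARAPHRASE_RULES = {"kills": {"attacks"}}


def follows(text: Relation, hypothesis: Relation) -> bool:
    """True if the hypothesis matches the text directly or via one rule."""
    if (text.subject, text.obj) != (hypothesis.subject, hypothesis.obj):
        return False
    if text.verb == hypothesis.verb:
        return True
    return hypothesis.verb in PARAPHRASE_RULES.get(text.verb, set())


print(follows(Relation("X", "kills", "Y"), Relation("X", "attacks", "Y")))
print(follows(Relation("X", "attacks", "Y"), Relation("X", "kills", "Y")))
```

Because the table holds no rule concluding that Y dies, a hypothesis such as "Y dies" cannot be derived from "X kills Y" in this sketch, which mirrors the note on the slide.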
12
New Technique for our RTE 4 Submission
Latent Semantic Analysis – LSA
LSI technique developed back in 1988, addressing search indexing
LSA improved upon LSI in the 1990s, applied to summary and evaluation
Important for us because the result can be expressed as a metric or a feature vector, which fits right into the RTE Tool
Helps overcome the “poverty of the stimulus” problem by “accommodating a very large number of local co-occurrence relations simultaneously” [Landauer, T. K., Foltz, P. W., & Laham, D. 1998]
13
How LSA Works
The process includes
–Setting up a matrix of words to words or words to documents
–Performing a Singular Value Decomposition (SVD) on that matrix
–Reducing the three resulting matrices by dropping the rows and columns tied to zero or near-zero singular values
–Reconstructing an approximation of the original matrix from the reduced ones, which relates words (cells) that had not been directly related to each other initially and redistributes the correlation between them
–Then, depending on what relationship one is trying to find, extracting the feature vectors we wish to compare and calculating the cosine between them
–Uh huh, so what does this all mean… Let’s look at an oversimplified example
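The steps on this slide can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the implementation described in the talk (which was being written in Lisp): the toy word-by-document counts, the choice of k = 2, and the helper names `lsa_reconstruct` and `cosine` are all assumptions made for the example.

```python
import numpy as np


def lsa_reconstruct(counts: np.ndarray, k: int) -> np.ndarray:
    """Rank-k approximation of a count matrix via truncated SVD."""
    u, s, vt = np.linalg.svd(counts, full_matrices=False)
    # Keep only the k largest singular values and their vectors.
    return (u[:, :k] * s[:k]) @ vt[:k, :]


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors (0.0 if either is zero)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


# Hypothetical toy data: rows are words, columns are three documents.
words = ["ship", "boat", "ocean", "tree", "wood"]
counts = np.array([
    [1, 0, 0],  # ship  appears only in document 0
    [0, 1, 0],  # boat  appears only in document 1
    [1, 1, 0],  # ocean appears in documents 0 and 1
    [0, 0, 1],  # tree  appears only in document 2
    [0, 0, 1],  # wood  appears only in document 2
], dtype=float)

approx = lsa_reconstruct(counts, k=2)

before = cosine(counts[0], counts[1])  # ship vs boat, raw counts
after = cosine(approx[0], approx[1])   # ship vs boat, after reduction
print(round(before, 3), round(after, 3))
```

In the raw counts "ship" and "boat" never share a document, so their cosine is 0. After the rank-2 reduction they become nearly identical, because both co-occur with "ocean", while "ship" and "tree" stay unrelated.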
14
LSA - Oversimplified Example – part 1
We have two documents: D1 is about dogs, D2 is about cats
D1 contains the words “dog” “pet” “leash” “walk” “bark”
D2 contains the words “cat” “roam” “jump” “purr” “pet”
At this level, we may not know if any of these words are related, especially if we have many documents and many words
But we can see that the word “pet” is in both documents
This “may” imply that there is a relationship between some words in D1 and D2, simply because “pet” occurred in both
15
LSA - Oversimplified Example – part 2
If we construct a matrix, it would look like this:

     dog  pet  leash  walk  bark  cat  roam  jump  purr
D1   1    1    1      1     1     0    0     0     0
D2   0    1    0      0     0     1    1     1     1

In the “pet” column we can see there is a commonality
After applying SVD, the matrix may smooth to something like this (only four values per row survive from the slide):

D1   .7   .6   .7   .1
D2   .1   .6   .1   .7

We can see now there is some relation between the documents, no longer concentrated on just one common word
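For readers who want to try the smoothing on the dog/cat matrix itself, here is a small NumPy sketch. The rank of 1 is an assumption chosen for illustration, and the numbers it produces are not the slide's figures, which were only meant to show the general effect.

```python
import numpy as np

words = ["dog", "pet", "leash", "walk", "bark", "cat", "roam", "jump", "purr"]
counts = np.array([
    [1, 1, 1, 1, 1, 0, 0, 0, 0],  # D1
    [0, 1, 0, 0, 0, 1, 1, 1, 1],  # D2
], dtype=float)

u, s, vt = np.linalg.svd(counts, full_matrices=False)

k = 1  # assumed rank for this illustration
smoothed = (u[:, :k] * s[:k]) @ vt[:k, :]

for name, row in zip(["D1", "D2"], smoothed):
    print(name, np.round(row, 2))
```

With only two documents, a rank-1 reduction is as far as the matrix can be reduced, and it assigns both documents the same row: 1.0 for "pet" and 0.5 for every other word. Weight that sat only on dog words in D1 is now spread onto cat words too, which is the redistribution the slide describes, in a cruder form than its illustrative values.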
16
LSA – How I Plan to Apply
I will be creating two matrices
–One matrix will contain data ONLY from entailed sentence pairs
–The second matrix will be for non-entailed pairs
Each matrix vector will contain word-to-word comparisons
–Each row will contain a word that has been used in successful entailment
–Each column will contain the passage to be entailed from
–SVD is performed on each, reduced, then combined again
Now, to determine if a new pair is entailed
–We calculate the feature vector associated with each word and each matrix
–We then calculate the cosine (COS) between each word/matrix vector
–Then “combine” the COS for each vector set (vector set from each matrix)
–Apply a linear discriminant function to classify entailment (from the RTE Tool of RTE3)
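One possible reading of this plan is sketched below in Python with NumPy. It makes several simplifying assumptions that are not in the slides: sentences are compared as whole bags of words rather than word by word, each matrix yields a single cosine feature, and the discriminant weights are made-up placeholders rather than trained values. All function names and the toy passages are hypothetical.

```python
import numpy as np


def build_space(passages, k):
    """Word-by-passage count matrix reduced to a k-dimensional basis."""
    vocab = sorted({w for p in passages for w in p.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(passages)))
    for j, passage in enumerate(passages):
        for w in passage.lower().split():
            counts[index[w], j] += 1
    u, _, _ = np.linalg.svd(counts, full_matrices=False)
    return index, u[:, :k]


def embed(sentence, index, basis):
    """Project a sentence's bag of words into the latent space."""
    bag = np.zeros(len(index))
    for w in sentence.lower().split():
        if w in index:  # words unseen in this matrix are ignored
            bag[index[w]] += 1
    return basis.T @ bag


def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def pair_features(text, hypothesis, spaces):
    """One cosine per matrix: [entailed space, non-entailed space]."""
    return np.array([
        cosine(embed(text, idx, basis), embed(hypothesis, idx, basis))
        for idx, basis in spaces
    ])


def is_entailed(feats, weights, bias):
    """Linear discriminant: classify by the sign of w . x + b."""
    return bool(weights @ feats + bias > 0)


# Hypothetical toy passages standing in for the annotated T-H pairs.
entailed = ["british columbia is on the west coast of canada",
            "ariel sharon is the israeli prime minister"]
not_entailed = ["blue mountain lumber is a subsidiary of ernslaw one",
                "an afghan interpreter was wounded"]

spaces = [build_space(entailed, k=2), build_space(not_entailed, k=2)]
feats = pair_features("ariel sharon visited prague",
                      "ariel sharon is the israeli prime minister", spaces)

weights, bias = np.array([1.0, -1.0]), 0.0  # placeholders, not trained
print(feats, is_entailed(feats, weights, bias))
```

In practice the weights and bias would be fitted on the annotated development pairs; the fixed values here only show where the two cosine features enter the decision.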
17
LSA – Progress
Developing LSA in ACL
Using Matrix package from http://matlisp.sourceforge.net/
Benefits of using LSA
–No need to program rules or compare sentence structures
–Mimics performance that humans have [see ref from before]
–No need to consider all information, since correlations can be created between words even if a specific word has not been seen before
Drawbacks of using LSA
–I need a lot more data, unsure how much (I may be able to calculate later)
–Linear algebra is complicated for my small symbolic brain
–I’m at the mercy of the literature, though I made some innovation in LSA use
18
RTE Challenge - Final Notes
See the continued progress at: http://www.cse.buffalo.edu/~ss424/rte3_challenge.html
RTE Web Site: http://www.pascal-network.org/Challenges/RTE3/
Textual Entailment resource pool: http://aclweb.org/aclwiki/index.php?title=Textual_Entailment_Resource_Pool
Actual ranking released in June 2007 at: http://acl.ldc.upenn.edu/W/W07/W07-14.pdf
November 15, 2007, SNeRG Meeting, Scott Settembre