Published by Kellie Sims. Modified over 8 years ago.
1
Progress Update Lin Ziheng 6/26/2016
2
Outline
  Update Summarization
  Opinion Summarization
  Discourse Analysis
3
Update Summarization
The TAC 2008 update summarization task differs slightly from the DUC 2007 update task:
  The documents come from the AQUAINT-2 collection rather than the AQUAINT collection.
  Cluster format:
    Each cluster has only two sets (Set A and Set B).
    Each document set has exactly 10 documents.
  The summary for document Set A should be a regular topic-focused summary.
  The summary for Set B should be written under the assumption that the user has already read all the documents in Set A.
4
Tarsqi: a tool for event/time anchoring and ordering
  Recognizes events and times
  Creates event/event, event/time, and time/time temporal links
Example sentences:
  John fell after Mary pushed him.
  They heard an explosion on Monday, but not in 2007.
  This reminded them of the 1968 war, which ravaged the countryside in 1969.
  He slept on Friday night.
  She hopes to succeed before noon.
  Gonzalez said he would resign on Tuesday.
  He thought it was a great deal.
  John leaves today.
  John does not leave today.
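One way to picture the temporal links listed above is as labeled edges between events and times. The sketch below is illustrative only: the `TemporalLink` class, the relation labels, and the `as_precedence_edges` helper are hypothetical names chosen for this example, not Tarsqi's actual output format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemporalLink:
    """Hypothetical container for one temporal link (not Tarsqi's format)."""
    source: str    # an event or time expression
    target: str    # an event or time expression
    relation: str  # assumed label such as "BEFORE", "AFTER", "IS_INCLUDED"


# Hand-written links for two of the example sentences.
links = [
    # "John fell after Mary pushed him." (event/event link)
    TemporalLink("fell", "pushed", "AFTER"),
    # "He slept on Friday night." (event/time link)
    TemporalLink("slept", "Friday night", "IS_INCLUDED"),
]


def as_precedence_edges(links):
    """Turn BEFORE/AFTER links into (earlier, later) edges; skip other relations."""
    edges = []
    for link in links:
        if link.relation == "BEFORE":
            edges.append((link.source, link.target))
        elif link.relation == "AFTER":
            edges.append((link.target, link.source))
    return edges


print(as_precedence_edges(links))  # [('pushed', 'fell')]
```

Normalizing every ordering link to an (earlier, later) pair gives a directed graph over events, which is the kind of structure a layering step can work on.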
5
Graph Layering
[Figure: items 1 to 9 from documents D1 and D2 are passed through Tarsqi, and the resulting graph over nodes 1 to 9 is arranged in layers.]
6
D0703A-A
7
BFS
8
Topmost layering
9
Optimal layering
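The slides name the layering variants without showing how they are computed. As a rough illustration of what layering a directed graph can mean, the sketch below contrasts a BFS-depth layering with a longest-path layering. Both functions, their names, and the toy graph are assumptions made for this example, and the input is assumed to be acyclic; this is not the presenter's implementation.

```python
from collections import deque


def _successors_and_indegrees(nodes, edges):
    """Build adjacency lists and in-degree counts for a directed graph."""
    succ = {n: [] for n in nodes}
    indeg = {n: 0 for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    return succ, indeg


def bfs_layering(nodes, edges):
    """Layer of a node = BFS depth from the nearest source (node with no incoming edge)."""
    succ, indeg = _successors_and_indegrees(nodes, edges)
    layer = {n: 0 for n in nodes if indeg[n] == 0}
    queue = deque(layer)
    while queue:
        u = queue.popleft()
        for v in succ[u]:
            if v not in layer:
                layer[v] = layer[u] + 1
                queue.append(v)
    return layer


def longest_path_layering(nodes, edges):
    """Layer of a node = length of the longest path from any source (assumes a DAG)."""
    succ, indeg = _successors_and_indegrees(nodes, edges)
    layer = {n: 0 for n in nodes}
    queue = deque(n for n in nodes if indeg[n] == 0)
    while queue:  # Kahn-style topological traversal
        u = queue.popleft()
        for v in succ[u]:
            layer[v] = max(layer[v], layer[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return layer


# Toy graph: 1 -> 2, 1 -> 3, 2 -> 3, 3 -> 4
nodes = [1, 2, 3, 4]
edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
print(bfs_layering(nodes, edges))           # {1: 0, 2: 1, 3: 1, 4: 2}
print(longest_path_layering(nodes, edges))  # {1: 0, 2: 1, 3: 2, 4: 3}
```

On the toy graph, BFS depth puts nodes 2 and 3 in the same layer even though there is an edge 2 -> 3, while the longest-path version guarantees that every edge points to a strictly later layer.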
10
Opinion Summarization
Input:
Output: a summary for each target that summarizes the answers to the questions
  Why did readers support Time's inclusion of Bono for Person of the Year?
  Why did readers not support the inclusion of Bill Gates as Person of the Year?
  Why did readers not support the inclusion of Melinda Gates as Person of the Year?
11
Existing opinion corpus: Movie Review corpus
  Document level: 1000 positive documents and 1000 negative documents
    Problem: too coarse-grained
  Sentence level: 5331 positive sentences and 5331 negative sentences
    Problem: not enough data
We collected data from productreview.com.au and rateitall.com
  Fine-grained:
    Productreview.com.au: each review has pros, cons, overall, and a rating
    Rateitall.com: each review has a rating
  Large datasets:
    Productreview.com.au: 2.4G
    Rateitall.com: 2.0G
  http://wing.comp.nus.edu.sg/~hung/productreview/
  http://wing.comp.nus.edu.sg/~hung/rateitall/
12
Discourse Analysis
Penn Discourse Treebank 2.0
  Based on PTB 2
  18459 Explicit relations, 16053 Implicit relations
TEMPORAL (950::3696)
  Asynchronous (697::2090)
    precedence
    succession
  Synchronous (251::1594)
CONTINGENCY (4255::3417)
  Cause (4172::2240)
    reason
    result
  Pragmatic Cause (83::13)
    justification
  Condition (1::1416)
    hypothetical
    general
    unreal present
    unreal past
    factual present
    factual past
  Pragmatic Condition (1::67)
    relevance
    implicit assertion
COMPARISON (2503::5589)
  Contrast (2120::3928)
    juxtaposition
    opposition
  Pragmatic Contrast (4::32)
  Concession (223::1213)
    expectation
    contra-expectation
  Pragmatic Concession (1::15)
EXPANSION (8861::6423)
  Conjunction (3534::5320)
  Instantiation (1445::302)
  Restatement (3206::162)
    specification
    equivalence
    generalization
  Alternative (185::351)
    conjunctive
    disjunctive
    chosen alternative
  Exception (2::14)
  List (400::250)
13
Marcu and Echihabi baseline
  Used word pairs in a Naive Bayes model
Wellner et al. baseline
  Used 7 feature classes in total
  Claimed that proximity (prox) and connective (conn) are the most useful feature classes
    prox: 0.60
    prox + conn: 0.7677
I implemented only prox and conn in the baseline system
Accuracy
  exp: 0.3466
  imp: 0.5474
  exp+imp: 0.4119
Accuracy by feature class (columns: prox, conn, prox+conn)
  exp: 0.3488, 0.9404, 0.9414
  imp: 0.5435
  exp+imp: 0.4373, 0.76
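The word-pair idea behind the Marcu and Echihabi baseline can be sketched in a few lines: every word of the first argument is paired with every word of the second, and the pairs are the features of a Naive Bayes classifier. The code below is a minimal sketch under stated assumptions, not the system evaluated on the slide: the `WordPairNB` class, whitespace tokenization, add-one smoothing, and the four toy training examples are all invented for illustration.

```python
import math
from collections import Counter, defaultdict
from itertools import product


def word_pairs(arg1, arg2):
    """Cross product of the lowercased, whitespace-split words of two arguments."""
    return list(product(arg1.lower().split(), arg2.lower().split()))


class WordPairNB:
    """Toy multinomial Naive Bayes over word-pair features (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # add-alpha smoothing (assumption)
        self.pair_counts = defaultdict(Counter)
        self.label_counts = Counter()
        self.vocab = set()

    def fit(self, examples):
        for arg1, arg2, label in examples:
            self.label_counts[label] += 1
            for pair in word_pairs(arg1, arg2):
                self.pair_counts[label][pair] += 1
                self.vocab.add(pair)
        return self

    def predict(self, arg1, arg2):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            score = math.log(n / total)         # log prior
            denom = sum(self.pair_counts[label].values()) + self.alpha * len(self.vocab)
            for pair in word_pairs(arg1, arg2):
                if pair in self.vocab:          # pairs never seen in training are skipped
                    score += math.log((self.pair_counts[label][pair] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data; real training would use PDTB argument spans and sense labels.
train = [
    ("the plan was good", "the result was bad", "Contrast"),
    ("sales rose", "profits fell", "Contrast"),
    ("it rained", "the game was cancelled", "Cause"),
    ("the road was icy", "the car crashed", "Cause"),
]
model = WordPairNB().fit(train)
print(model.predict("prices rose", "demand fell"))  # Contrast
```

In the toy run the only pair shared with the training data is ("rose", "fell"), which was seen under Contrast, so that label wins. With real data the pair vocabulary grows very large, which is one reason this kind of model needs a lot of training text.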