Progress Update Lin Ziheng 6/26/2016
Outline
  Update Summarization
  Opinion Summarization
  Discourse Analysis
Update Summarization
  The TAC 2008 update summarization task differs slightly from the DUC 2007 update task
  The documents will be from the AQUAINT-2 collection rather than the AQUAINT collection
  Cluster format:
    There will be only two sets per cluster (Set A and Set B)
    Each document set will have exactly 10 documents
  The summary for document Set A should be a regular topic-focused summary
  The summary for Set B should be written under the assumption that the user has already read all the documents in Set A
Tarsqi: a tool for event/time anchoring and ordering
  Recognizes events and times
  Creates event/event, event/time, and time/time temporal links
  Example sentences:
    John fell after Mary pushed him.
    They heard an explosion on Monday, but not in
    This reminded them of the 1968 war, which ravaged the countryside in
    He slept on Friday night.
    She hopes to succeed before noon.
    Gonzalez said he would resign on Tuesday.
    He thought it was a great deal.
    John leaves today.
    John does not leave today.
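As a rough illustration of the kind of output listed on this slide, the sketch below stores events, times, and temporal links as plain Python data. The class names, the relation labels (`AFTER`, `IS_INCLUDED`), and the hand-built links are assumptions made for illustration; this is not Tarsqi's API or its actual output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """An event or a time expression (hypothetical representation)."""
    id: str
    kind: str  # "event" or "time"
    text: str


@dataclass(frozen=True)
class TLink:
    """A directed temporal link: source --relation--> target."""
    source: Node
    relation: str
    target: Node

    def link_type(self) -> str:
        # e.g. "event/event" or "event/time"
        return f"{self.source.kind}/{self.target.kind}"


# "John fell after Mary pushed him."
fell = Node("e1", "event", "fell")
pushed = Node("e2", "event", "pushed")

# "He slept on Friday night."
slept = Node("e3", "event", "slept")
friday_night = Node("t1", "time", "Friday night")

# Links written by hand for illustration, not produced by a tool.
links = [
    TLink(fell, "AFTER", pushed),
    TLink(slept, "IS_INCLUDED", friday_night),
]

link_types = [link.link_type() for link in links]
for link in links:
    print(link.source.text, link.relation, link.target.text, f"({link.link_type()})")
```

Links of this shape can be read as labeled edges, so a set of them forms a directed graph over the events and times of a document.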
[Figure: pipeline diagram with labels D1, D, Tarsqi, Graph, Layering]
D0703A-A
BFS
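A minimal sketch of what a BFS-based layering could look like, assuming the graph is stored as a dictionary from each node to its successors and that a node's layer is its breadth-first distance from the chosen root nodes. The function name and the toy graph are hypothetical; the slide does not specify how the layering was computed.

```python
from collections import deque


def bfs_layers(graph, roots):
    """Assign each reachable node its BFS depth from the nearest root."""
    layer = {root: 0 for root in roots}
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for succ in graph.get(node, []):
            if succ not in layer:  # first visit gives the shortest distance
                layer[succ] = layer[node] + 1
                queue.append(succ)
    return layer


# Toy graph (hypothetical): a -> b, a -> c, b -> c, c -> d
graph = {"a": ["b", "c"], "b": ["c"], "c": ["d"], "d": []}
layers = bfs_layers(graph, roots=["a"])
print(layers)
```

In this toy graph the edge b -> c joins two nodes that both land in layer 1, so a BFS layering does not guarantee that every edge points from one layer to a strictly later one.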
Topmost layering
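The slide does not define "topmost layering". One plausible reading, taken here as an assumption, is the longest-path layering used in layered graph drawing: every node is placed as close to the top as possible while each edge still points to a strictly lower layer. The sketch below computes that for an acyclic graph; the function name and the toy graph are hypothetical.

```python
from collections import deque


def longest_path_layers(graph):
    """Layer of a node = length of the longest path reaching it from a source.

    `graph` maps each node to a list of successors and must be acyclic.
    """
    indegree = {node: 0 for node in graph}
    for succs in graph.values():
        for succ in succs:
            indegree[succ] = indegree.get(succ, 0) + 1

    layer = {node: 0 for node in indegree}
    queue = deque(node for node, deg in indegree.items() if deg == 0)
    while queue:  # process nodes in topological order (Kahn's algorithm)
        node = queue.popleft()
        for succ in graph.get(node, []):
            layer[succ] = max(layer[succ], layer[node] + 1)
            indegree[succ] -= 1
            if indegree[succ] == 0:
                queue.append(succ)
    return layer


# Toy graph (hypothetical): a -> b, a -> c, b -> c, c -> d
graph = {"a": ["b", "c"], "b": ["c"], "c": ["d"], "d": []}
topmost = longest_path_layers(graph)
print(topmost)
```

Under this reading, node c moves down to layer 2 because it must sit below b, which keeps every edge pointing downward.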
Optimal layering
Opinion Summarization
  Input:
  Output: a summary for each target that summarizes the answers to the questions
    Why did readers support Time's inclusion of Bono for Person of the Year?
    Why did readers not support the inclusion of Bill Gates as Person of the Year?
    Why did readers not support the inclusion of Melinda Gates as Person of the Year?
Existing opinion corpus: Movie Review corpus
  Document level: 1000 +ve documents and 1000 –ve documents
    Problem: too coarse-grained
  Sentence level: 5331 +ve sentences and 5331 –ve sentences
    Problem: not enough data
We collected data from productreview.com.au and rateitall.com
  Fine-grained:
    Productreview.com.au: each review has pros, cons, overall, and a rating
    Rateitall.com: each review has a rating
  Large datasets:
    Productreview.com.au: 2.4 GB
    Rateitall.com: 2.0 GB
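One way reviews with this structure could be turned into fine-grained labeled examples is sketched below. The field names, the 1 to 5 rating scale, the labeling rule, and the sample review are all assumptions made for illustration; the slide does not say how the collected data was processed.

```python
def review_to_examples(review, max_rating=5):
    """Turn one review into (text, label) pairs.

    Assumed rule: text under 'pros' is positive, text under 'cons' is
    negative, and the 'overall' text takes its polarity from the rating.
    """
    examples = []
    if review.get("pros"):
        examples.append((review["pros"], "+ve"))
    if review.get("cons"):
        examples.append((review["cons"], "-ve"))

    overall, rating = review.get("overall"), review.get("rating")
    if overall and rating is not None:
        midpoint = (1 + max_rating) / 2
        if rating != midpoint:  # skip neutral ratings
            examples.append((overall, "+ve" if rating > midpoint else "-ve"))
    return examples


# Invented review record, not taken from either site.
review = {
    "pros": "Quiet and easy to clean.",
    "cons": "The lid feels flimsy.",
    "overall": "Happy with it for the price.",
    "rating": 4,
}
examples = review_to_examples(review)
for text, label in examples:
    print(label, text)
```

A review that carries only a rating, as described for Rateitall.com, would yield at most one example under this rule, provided its text is stored in the `overall` field.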
Discourse Analysis
  Penn Discourse Treebank 2.0
    Based on PTB 2
    Explicit relations, 16053 Implicit relations
  TEMPORAL (950::3696)
    Asynchronous (697::2090)
      precedence
      succession
    Synchronous (251::1594)
  CONTINGENCY (4255::3417)
    Cause (4172::2240)
      reason
      result
    Pragmatic Cause (83::13)
      justification
    Condition (1::1416)
      hypothetical
      general
      unreal present
      unreal past
      factual present
      factual past
    Pragmatic Condition (1::67)
      relevance
      implicit assertion
  COMPARISON (2503::5589)
    Contrast (2120::3928)
      juxtaposition
      opposition
    Pragmatic Contrast (4::32)
    Concession (223::1213)
      expectation
      contra-expectation
    Pragmatic Concession (1::15)
  EXPANSION (8861::6423)
    Conjunction (3534::5320)
    Instantiation (1445::302)
    Restatement (3206::162)
      specification
      equivalence
      generalization
    Alternative (185::351)
      conjunctive
      disjunctive
      chosen alternative
    Exception (2::14)
    List (400::250)
Marcu and Echihabi baseline
  Used word pairs in a Naive Bayes model
Wellner et al. baseline
  Used 7 feature classes in total
  Claimed that proximity and connective are the most useful feature classes
    prox: 0.60
    prox + conn:
I implemented only prox and conn in the baseline system
  Accuracy by feature set (prox, conn, prox+conn) on exp, imp, and exp+imp
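The word-pair idea behind the Marcu and Echihabi baseline can be sketched as a small Naive Bayes classifier whose features are pairs of words, one drawn from each argument of a relation. The class below is a toy illustration under stated assumptions: whitespace tokenization, add-one smoothing, ignoring unseen pairs, and four invented training examples. It is not the baseline system reported on this slide.

```python
import math
from collections import Counter, defaultdict
from itertools import product


def word_pairs(arg1, arg2):
    """All (w1, w2) pairs with w1 from the first span and w2 from the second."""
    return list(product(arg1.lower().split(), arg2.lower().split()))


class WordPairNB:
    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # additive smoothing constant
        self.pair_counts = defaultdict(Counter)  # label -> pair -> count
        self.label_counts = Counter()
        self.vocab = set()

    def fit(self, examples):
        for arg1, arg2, label in examples:
            self.label_counts[label] += 1
            for pair in word_pairs(arg1, arg2):
                self.pair_counts[label][pair] += 1
                self.vocab.add(pair)
        return self

    def predict(self, arg1, arg2):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.pair_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n / total)          # log prior
            for pair in word_pairs(arg1, arg2):
                if pair in self.vocab:           # ignore pairs never seen in training
                    score += math.log((counts[pair] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data; labels borrow relation type names for readability.
train = [
    ("the plan was good", "it failed anyway", "Contrast"),
    ("sales rose", "profits fell", "Contrast"),
    ("it rained", "the game was cancelled", "Cause"),
    ("the road was icy", "the car crashed", "Cause"),
]
model = WordPairNB().fit(train)
prediction = model.predict("prices rose", "demand fell")
print(prediction)
```

Here the only pair shared with the training data is (rose, fell), which was seen under Contrast, so that label gets the higher score. On real data the number of distinct pairs grows very quickly, which is one reason this kind of model needs a large training set.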