Incident Threading for News Passages (CIKM 09) Speaker: Yi-lin,Hsu Advisor: Dr. Koh, Jia-ling. Date:2010/06/14.

Slides:



Advertisements
Similar presentations
Supervised Learning Techniques over Twitter Data Kleisarchaki Sofia.
Advertisements

Characteristic Identifier Scoring and Clustering for Classification By Mahesh Kumar Chhaparia.
Sequence Clustering and Labeling for Unsupervised Query Intent Discovery Speaker: Po-Hsien Shih Advisor: Jia-Ling Koh Source: WSDM’12 Date: 1 November,
An Introduction to Rhetoric: Using the Available Means
Predicting the Semantic Orientation of Adjective Vasileios Hatzivassiloglou and Kathleen R. McKeown Presented By Yash Satsangi.
HOW TO EXCEL ON ESSAY EXAMS San José State University Writing Center Dr. Jim Lobdell.
Recommender systems Ram Akella November 26 th 2008.
©2007 Pearson Education, Inc. publishing as Longman Publishers Breaking Through: College Reading, 8/e by Brenda Smith Chapter 5: Supporting Details and.
Writing a Synthesis Essay
Modern Systems Analysis and Design Third Edition
Patterns for Developing Ideas in Writing
Description; compare-contrast; narrative; definition; opinion; cause- effect; classification; process.
Dr. MaLinda Hill Advanced English C1-A Designing Essays, Research Papers, Business Reports and Reflective Statements.
RECOGNIZING AUTHORS’ WRITING PATTERNS
Using Friendship Ties and Family Circles for Link Prediction Elena Zheleva, Lise Getoor, Jennifer Golbeck, Ugur Kuter (SNAKDD 2008)
1 A study on automatically extracted keywords in text categorization Authors:Anette Hulth and Be´ata B. Megyesi From:ACL 2006 Reporter: 陳永祥 Date:2007/10/16.
Synthesising Identify supporting ideas and contradictory ideas. Check the grouping of ideas? Synthesis is how you integrate and combine materials gathered.
Adaptive News Access Daniel Billsus Presented by Chirayu Wongchokprasitti.
Unit 3 Narrative Essay.
 Person Name Disambiguation by Bootstrapping SIGIR’10 Yoshida M., Ikeda M., Ono S., Sato I., Hiroshi N. Supervisor: Koh Jia-Ling Presenter: Nonhlanhla.
Illinois-Coref: The UI System in the CoNLL-2012 Shared Task Kai-Wei Chang, Rajhans Samdani, Alla Rozovskaya, Mark Sammons, and Dan Roth Supported by ARL,
Organizing Your Information
When Experts Agree: Using Non-Affiliated Experts To Rank Popular Topics Meital Aizen.
Querying Structured Text in an XML Database By Xuemei Luo.
Chapter 5: Patterns of Organization
南台科技大學 資訊工程系 A web page usage prediction scheme using sequence indexing and clustering techniques Adviser: Yu-Chiang Li Speaker: Gung-Shian Lin Date:2010/10/15.
Major objective of this course is: Design and analysis of modern algorithms Different variants Accuracy Efficiency Comparing efficiencies Motivation thinking.
FINDING RELEVANT INFORMATION OF CERTAIN TYPES FROM ENTERPRISE DATA Date: 2012/04/30 Source: Xitong Liu (CIKM’11) Speaker: Er-gang Liu Advisor: Dr. Jia-ling.
Exploiting Context Analysis for Combining Multiple Entity Resolution Systems -Ramu Bandaru Zhaoqi Chen Dmitri V.kalashnikov Sharad Mehrotra.
Enhancing Cluster Labeling Using Wikipedia David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab (SIGIR’09) Date: 11/09/2009 Speaker: Cho, Chin.
The Modes of Discourse. Bell Work: Parts of speech A noun is person, place, animal, thing, or idea. A verb shows action. For example: Ms. Dorra.
LOGO Summarizing Conversations with Clue Words Giuseppe Carenini, Raymond T. Ng, Xiaodong Zhou (WWW ’07) Advisor : Dr. Koh Jia-Ling Speaker : Tu.
A Scalable Machine Learning Approach for Semi-Structured Named Entity Recognition Utku Irmak(Yahoo! Labs) Reiner Kraft(Yahoo! Inc.) WWW 2010(Information.
LOGO 1 Corroborate and Learn Facts from the Web Advisor : Dr. Koh Jia-Ling Speaker : Tu Yi-Lang Date : Shubin Zhao, Jonathan Betz (KDD '07 )
Narrative Narrative Tips: Set the scene Who or What, When, Where
English Language Services
ANALYSIS OF DRAMA In Preparation for a Production of Beowulf Mrs. Martyn Comp/Lit 12.
UNIT 5.  The related activities of sorting, searching and merging are central to many computer applications.  Sorting and merging provide us with a.
A Classification-based Approach to Question Answering in Discussion Boards Liangjie Hong, Brian D. Davison Lehigh University (SIGIR ’ 09) Speaker: Cho,
Patterns of Development The arrangement of an essay, speech, or story according to its purpose. These notes cover the wide range of logical ways to organize.
1 A Fuzzy Logic Framework for Web Page Filtering Authors : Vrettos, S. and Stafylopatis, A. Source : Neural Network Applications in Electrical Engineering,
1 Adaptive Subjective Triggers for Opinionated Document Retrieval (WSDM 09’) Kazuhiro Seki, Kuniaki Uehara Date: 11/02/09 Speaker: Hsu, Yu-Wen Advisor:
Divided Pretreatment to Targets and Intentions for Query Recommendation Reporter: Yangyang Kang /23.
Event-Based Extractive Summarization E. Filatova and V. Hatzivassiloglou Department of Computer Science Columbia University (ACL 2004)
CONTEXTUAL SEARCH AND NAME DISAMBIGUATION IN USING GRAPHS EINAT MINKOV, WILLIAM W. COHEN, ANDREW Y. NG SIGIR’06 Date: 2008/7/17 Advisor: Dr. Koh,
TO Each His Own: Personalized Content Selection Based on Text Comprehensibility Date: 2013/01/24 Author: Chenhao Tan, Evgeniy Gabrilovich, Bo Pang Source:
LEARNING FROM THE PAST: ANSWERING NEW QUESTIONS WITH PAST ANSWERS Date: 2012/11/22 Author: Anna Shtok, Gideon Dror, Yoelle Maarek, Idan Szpektor Source:
An evolutionary approach for improving the quality of automatic summaries Constantin Orasan Research Group in Computational Linguistics School of Humanities,
MMM2005The Chinese University of Hong Kong MMM2005 The Chinese University of Hong Kong 1 Video Summarization Using Mutual Reinforcement Principle and Shot.
NARRATIVES - STORIES A narrative can be written in the first or the third person and describes a series of events, either imaginary or based on your own.
Rhetorical Modes of Delivery AKA Patterns of Development.
Unit 3 Narrative Essay Part 1. Nov. 3, What is Narrative Essay? A narrative essay is a story. A narrative essay is a piece of writing that recreates.
Instance Discovery and Schema Matching With Applications to Biological Deep Web Data Integration Tantan Liu, Fan Wang, Gagan Agrawal {liut, wangfa,
CHAPTER SEVEN Becoming an Effective Reader PowerPoint by Mary Dubbé Thomas Nelson Community College PART ONE Transitions and Thought Patterns 7 7 Copyright.
Investigate Plan Design Create Evaluate (Test it to objective evaluation at each stage of the design cycle) state – describe - explain the problem some.
Ontology Engineering and Feature Construction for Predicting Friendship Links in the Live Journal Social Network Author:Vikas Bahirwani 、 Doina Caragea.
State Modeling. Introduction A state model describes the sequences of operations that occur in response to external stimuli. As opposed to what the operations.
Academic Writing Fatima AlShaikh. A duty that you are assigned to perform or a task that is assigned or undertaken. For example: Research papers (most.
Getting the Most from Writing
Using Friendship Ties and Family Circles for Link Prediction
Transitions in Narrative Writing
Clustering Algorithms for Noun Phrase Coreference Resolution
Getting the Most from Writing
Edexcel – GCSE History – Paper 1
How does a speaker achieve purpose?
Chapter 5: Patterns of Organization
Identifying the Elements of A Plot Diagram
Organizational structures
Text Features 2A.
FCAT Boot Camp Week 2.
Presentation transcript:

Incident Threading for News Passages (CIKM 09) Speaker: Yi-lin,Hsu Advisor: Dr. Koh, Jia-ling. Date:2010/06/14

Outline Introduction Definition Implementations of Incident Threading Evaluation Experiment Result

Introduction Associated with the fast development of modern technologies, the amount of accessible information is increasing in an exponential manner.

Introduction A good system should group news according to the main topic discussed. It is not advisable that the system provide duplicate information. It would be preferable if the system takes similar actions to link related (but not duplicate) events, because people are very likely to be interested in both (or neither).

Introduction

Definition 1. News story: Any news in text format is disseminated in units, and each of them is a news story. A story has a unique ID and a series of characters containing its content. A news story usually describes one or more real-world occurrences (events). 2. Main characters (WHO): The most important named entities that show who or what is involved in the description of an event. 3. Time stamps (WHEN): Two time features are considered for a news report. One is called publication time, which is when the news is released. The other is activity time that contains the time stamp of the described occurrence, which may be a time point or a period. 4. Location (WHERE): It describes where the occurrence happened, which is usually an absolute geographical position or a relative reference towards another location. 5. Action (WHAT): The key verb(s) in the description of the event. Sometimes it may also be a noun.

Definition Definition : An incident is something that happens in the real world. It involves certain main characters, occurs at a definite time or during a period, happens at a geographical location, and includes a specific action. Definition 2: An incident network is one or more incidents connected by edges that represent certain types of contextual dependency.

Definition Definition : Incident threading is the process of identifying incidents in a news stream and generating an incident network.

Definition There are three main classes of connections in an incident network. The first class is logical relations. ◦ A connection of this type is established between two incidents that have logical causal relations. ◦ This class includes Prediction, Comment, Reaction, Analysis, Background, and Consequence. The second class called progressions. ◦ The only relation type in this class is named follow-up, and the sequence is decided by the time order of the incidents. The third class is called weak relations. ◦ From the usual perspective, two incidents with a link in this class do not have any direct relation, except that they mention something in common. The overlapping factor may be the same main character, the same geographical location, the same type of occurrence, etc.

Implementations of Incident Threading In an incident threading system, there are two important decisions to make. The first is the selection of the basic text unit. The second choice is on the contextual links.

Implementations of Incident Threading Story threading selects a news story as the basic semantic unit and ignores link types. When type information is considered, it becomes relation-oriented story threading. Passage threading analyzes news at a smaller granularity, and limits the range of news to a specific subject (violent actions in the experiment).

Evaluation We evaluate the clustering performance using concentration and purity scores. n: the number of clusters I: an incident p i : the number of passages in a cluster i. m: the number of annotated incident C: a cluster q i : the number of passages in a cluster C in an incident i.

Evaluation

Evaluation Members of the same incidents Link to Link from Unrelated

Experiments The baseline algorithm basic semantic units: passage. A passage is represented by a tf·idf vector. The most similar cluster pair, evaluated by average link, is merged. When to stop: All pairs have similarity below a predefined threshold. The final clusters become incidents. Links are generated among the incidents, based on their similarity.

Experiments The baseline algorithm Basic semantic units: passage. A passage is represented by a tf·idf vector. The most similar cluster pair, evaluated by average link, is merged. When to stop: All pairs have similarity below a predefined threshold. The final clusters become incidents.

Experiments Links are generated among the incidents, based on their similarity. If the average similarity between two incidents is over the threading threshold, a directed arrow is formed that points from the earlier incident to the later one, where the order is determined by the time stamp of the earliest passage in each.

Three-Stage Algorithm The first step tells if there exists any violent action in the current paragraph The second annotates these actions in detail and shows their co-reference. The last step creates links between the key incidents and all others.

Three-Stage Algorithm Passage Classification Number of terms in the passage Number of terms that appear in the main characters Number of terms describing locations Number of terms in the time stamps Percentage of action verbs that describe violence-related Events Percentage of terms for all variances of “be,” “do,” “have” and “say” Percentage of terms that express certain extent of uncertainty, e.g., likely, may, can, often, sometimes, etc. Combinations of the three features above. The full text of the passage. It is available only to BoosTexter.

Three-Stage Algorithm Incident Formation The second stage runs a clustering algorithm on the “violent” passages. ◦ Similarity of all terms ◦ Similarity of main characters ◦ Similarity of geographical locations ◦ Match between time stamps. In the earlier work that utilizes multiple features, a weighted sum of similarities from various elements is calculated to determine the resemblance between two stories. matches in all attributes must be achieved for two passages to be declared similar

Three-Stage Algorithm Linking Incidents A link is created between two incidents only when all thresholds are met. Most contextual links in the violence subject belong ◦ either to consequence or reaction in the logical category ◦ or follow-up in the progressional form. Links usually contain two incidents, which happen at different yet close time, ◦ involve the same geographical location ◦ mention similar main characters, but often show poor term overlap.

Experiment Result

Fleiss’ Kappa among 4 annotators is Cohen’s Kappa for any two annotators is between Fleiss’ Kappa in a similar annotation attempt for general incidents is only Cohen’s Kappa ranges from to

Experiment Result The first step is for the annotator to walk through each paragraph and identify if it contains any description of a violent action. The second step is to mark the individual violent actions and find co-references of the same activity. The last step scans an incomplete list of incident pairs and the annotator is required to determine if there is any logical or progressional relation in each pair, and mark the direction if such a link exists.

Conclusion In this paper we ant to create an incident threading by passage level. We choose two kinds of algorithm and compare the result. With a richer background, type-specific relation analysis is a foreseeable outcome, and it will certainly help the comprehension of news evolution at a higher level.