Published by Phoebe Pitts. Modified over 8 years ago.
1
Bursty Event Detection from Text Streams for Disaster Management
2012-04-17
Sungjun Lee, Sangjin Lee, Kwanho Kim, and Jonghun Park
goalwisk@snu.ac.kr
Information Management Lab., Dept. of Industrial Engineering, Seoul National University
2
Introduction
Identify disaster-related bursty events from multiple text streams.
Characterize bursty terms in terms of
-Skewness, consistency, periodicity, and variation.
Score each term to determine whether or not it is a bursty term.
[Figure: real-world states (normal, disaster) produce streams (Stream 1, Stream 2, …, Stream K); the observed terms (happy, good, nice, fine, bad, die, catastrophe, nightmare) are labeled as normal terms or disaster-related terms.]
3
Motivation example
The distribution of the frequency of terms observed in the AP news stream on Feb. 27, 2010 and Mar. 1, 2010.
On Feb. 27, 2010, an earthquake hit Chile.
On Mar. 1, 2010, the trial of a Bosnian politician, Radovan Karadzic, began.
4
Skewness feature
A bursty term appears intensively during the specific time period in which the corresponding event occurs.
[Figure: the change in the term frequency distribution of “tsunami”; both panels plot probability against term frequency during L days.]
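The skewness idea can be illustrated with a small sketch. The slides do not give the exact formula, so this assumes the plain sample skewness (third standardized moment) of a term's daily counts over a window of L days; the function name `sample_skewness` and the count data are hypothetical.

```python
def sample_skewness(counts):
    """Sample skewness (third standardized moment) of daily term counts.

    Assumption: this is one plausible skewness score, not necessarily
    the one used by the authors.
    """
    n = len(counts)
    mean = sum(counts) / n
    # Second and third central moments (population form).
    m2 = sum((c - mean) ** 2 for c in counts) / n
    m3 = sum((c - mean) ** 3 for c in counts) / n
    if m2 == 0:
        return 0.0  # a constant series has no skew
    return m3 / m2 ** 1.5


# Hypothetical daily counts over L = 8 days.
steady_counts = [3, 4, 3, 4, 3, 4, 3, 4]   # term used evenly every day
bursty_counts = [0, 0, 0, 0, 0, 0, 0, 40]  # term concentrated on one day

steady_score = sample_skewness(steady_counts)
bursty_score = sample_skewness(bursty_counts)
```

Under this assumption, a term whose occurrences pile up on a few days gets a large positive score, while an evenly used term scores near zero.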
5
Consistency feature
The frequency of a bursty term soars across multiple streams.
[Figure: the change in the appearance of the term “tsunami” across Stream 1, Stream 2, …, Stream K, for example a Twitterer focusing on tsunami research and a Twitterer focusing on travel. Legend: article containing “tsunami”, article not containing “tsunami”.]
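One way to make "soars across multiple streams" concrete is sketched here. This is an assumption for illustration, not the scoring used in the slides: the score is the fraction of streams in which the term's count in the current window exceeds a multiple of its earlier baseline. The function name `consistency` and the `ratio` threshold are hypothetical.

```python
def consistency(baseline, current, ratio=2.0):
    """Fraction of streams in which a term's count has risen sharply.

    baseline[i] and current[i] are the term's counts in stream i for an
    earlier window and the current window. Both the rule and the default
    ratio are illustrative assumptions.
    """
    if len(baseline) != len(current):
        raise ValueError("baseline and current must cover the same streams")
    rising = sum(
        1
        for before, now in zip(baseline, current)
        # max(before, 1) keeps a zero baseline from making any count a "rise"
        if now > ratio * max(before, 1)
    )
    return rising / len(baseline)


# Hypothetical counts of one term in four streams.
everywhere = consistency([1, 0, 2, 1], [30, 12, 25, 18])  # rises in all streams
one_stream = consistency([1, 0, 2, 1], [30, 0, 2, 1])     # rises in one stream only
```

A term that rises in every stream scores 1.0, while a term that rises in a single stream scores low, which matches the intuition that a real event is reported by many sources at once.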
6
Periodicity feature
Periodic terms are less likely to be bursty terms.
Penalize terms that exhibit periodicity.
[Figure panels, in order: periodicity of “Sunday”, periodicity of “earthquake”. Period labels, in order: period=6.8966, period=3.4843.]
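The slides do not say how the period of a term is estimated. A common approach, assumed here purely for illustration, is to take the discrete Fourier transform of the daily counts and report the period of the strongest frequency component. The function name `dominant_period` and the sample counts are hypothetical.

```python
import cmath


def dominant_period(counts):
    """Period (in days) of the strongest DFT component, or None if flat.

    Assumption: a plain periodogram peak; the authors' actual
    periodicity estimator may differ.
    """
    n = len(counts)
    mean = sum(counts) / n
    centered = [c - mean for c in counts]  # remove the zero-frequency offset
    best_k, best_power = None, 0.0
    # Only frequencies 1..n/2 are distinct for a real-valued series.
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    if best_k is None:
        return None
    return n / best_k


# Hypothetical counts of a weekly term over four weeks (peak every 7th day).
weekly_counts = [5, 1, 1, 1, 1, 1, 2] * 4
period = dominant_period(weekly_counts)
```

A term with a strong peak at a short, regular period (such as a weekday name) could then be penalized, while a term with no clear peak would not be.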
7
Variation feature
Copes with different writing styles among streams.
Reduces the chance that a term with high frequency in only one specific stream is identified as a bursty term.
[Figure: the change in the appearance of the term “AP” across Stream 1, Stream 2, …, Stream K; one stream (AP news) starts to publish articles with a fixed signature “AP news”. Legend: article containing “AP”, article not containing “AP”.]
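A rough sketch of the variation idea follows. It assumes the coefficient of variation (standard deviation divided by mean) of a term's per-stream rate as the measure; the slides do not state the formula, and the function name and rates below are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(rates):
    """Standard deviation divided by mean of a term's per-stream rates.

    Assumption: used here as an illustrative variation measure only.
    """
    mu = mean(rates)
    if mu == 0:
        return 0.0  # term absent from every stream
    return pstdev(rates) / mu


# Hypothetical share of articles containing the term in each of six streams.
signature_rates = [0.9, 0.0, 0.0, 0.0, 0.0, 0.0]     # e.g. a fixed signature in one stream
event_rates = [0.30, 0.28, 0.33, 0.31, 0.29, 0.30]   # e.g. an event covered by all streams

signature_cv = coefficient_of_variation(signature_rates)
event_cv = coefficient_of_variation(event_rates)
```

Under this assumption, a high value flags a term whose frequency comes from a single stream's writing style, so it can be scored down relative to a term that is spread evenly across streams.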
8
Putting them all together to measure burstiness
Combine the four feature scores, which are based on different rationales and scales.
The final term weighting scheme, burst, is defined as follows: [equation not preserved in the slide text]
9
Experiment setting
Articles were collected from six news channels.
-Sources: CNN, AP, Reuters, Times Online, Wall Street Journal, New York Times
-Category: World news
-Period: 1 Oct. 2009 – 15 Mar. 2010
-Source type: RSS feed
[Figure: data collection diagram with the components data channels, Google Reader repository, Google Reader API, and experiment DB.]
10
Example of bursty terms
1 A strong aftershock to Chile's deadly earthquake provoked a brief panic in the city of Concepcion, but no tsunami warning was issued and no injuries or damage have been reported....
2 Tsunami waves of up to 1.5 meters (5 feet) hit far-flung Pacific regions from the Russian far east and Japan to New Zealand's Chatham Islands on Sunday after a powerful earthquake struck Chile, but there were no reports of injuries or serious damage.
3 Former member of the Bosnian wartime presidency Ejup Ganic was arrested at London's Heathrow airport on Monday on behalf of Serbian authorities, British police said.
4 A tsunami generated by a 8.8 magnitude earthquake in Chile hit beaches in eastern Australia on Sunday, witnesses and officials said, but there were no initial reports of damage.
5 British police arrested a former senior Bosnian leader in London Monday on a Serbian warrant alleging he committed war crimes, to the outrage of Bosnian leaders who said the move undermined Bosnian sovereignty....
…
11
Experiment results
Comparison of bursty term detection results with the methods proposed by Whitney et al. (2009), Fung et al. (2005), Chen et al. (2007), and He et al. (2005).
-Bold terms: bursty terms assumed to be correct.
-Underlined terms: topical terms.
-Starred terms: general terms.
12
Experiment results
Comparison of performance in retrieving documents related to bursty events.
13
Further work
[Figure: bursty terms as the union of “statistically sufficient” conditions: Chi-Square, MI, CV, KL Divergence, Skewness, Self-Similarity, Chernoff Divergence.]
14
Conclusion
Focus on identifying bursty terms to detect disaster-related bursty events.
Bursty terms can help people react properly in decision-critical situations.
Bursty terms can be characterized from four perspectives.
-Skewness, consistency, periodicity, and variation.
A final scoring function for detecting bursty terms is proposed.
The experiment results showed that the proposed approach is effective at detecting bursty terms compared to the existing alternatives.