Rui Yan, Yan Zhang Peking University


1 Rui Yan, Yan Zhang Peking University
Evolutionary Timeline Summarization: a Balanced Optimization Framework via Iterative Substitution Rui Yan, Yan Zhang Peking University

2 Evolutionary Timeline Summarization
Motivation: Given a massive collection of time-stamped web documents related to a general news query, ETS aims to return the evolution trajectory along the timeline, consisting of individual but correlated summaries for each date.
ETS: an optimization problem solved via iterative substitution
Balance coherence/diversity measurement and local/global summary quality
Four key requirements: relevance, coverage, coherence, diversity

3 Outline 1 Related Work 2 Problem Formulation 3 Optimization Framework
4 Experiments and Evaluation

4 Multi-document Summarization (MDS)
Related Work
Multi-document Summarization (MDS)
extractive/abstractive
Extractive summarization methods: centroid-based/graph-based ranking
Problem: they miss the temporal dimension
Timeline construction methods: clusters of noun phrases and named entities; usefulness and novelty; interest and burstiness
Problem: evolutionary characteristics are not considered
ETS improvement: generate component summaries which have influence on "neighbors".

5 Topic detection and tracking (TDT)
Understanding News
Topic detection and tracking (TDT): lexical similarity, temporal proximity, query relevance, clustering techniques, etc.
News correlation: named entities, data or place information, domain knowledge
ETS: does not seek to cluster "topics" as in TDT, but utilizes the evolutionary correlations of news (coherence/diversity) for summarization

6 Problem Formulation
Input: Given a general query Q = {q1, q2, ..., q|Q|} from users, where qi is a query word, we obtain a sentence collection C from query-related documents. We cluster the sentences into {C1, C2, ..., C|T|} by their associated publish dates T = {t1, t2, ..., t|T|}, where ti is the timestamp of sub-collection Ci.
Output: An evolutionary timeline which consists of a series of individual but correlated summary items, i.e. I = {I1, I2, ..., I|T|}, where Ii on date ti is a subset of Ci (Ii ⊆ Ci).
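The clustering step in the input can be sketched as a simple grouping of sentences by publish date. This is only an illustration: the function name `group_by_date` and the (date, text) pair representation are assumptions, not taken from the slides.

```python
from collections import defaultdict
from datetime import date


def group_by_date(sentences):
    """Partition (publish_date, text) pairs into sub-collections C_i keyed by t_i."""
    groups = defaultdict(list)
    for publish_date, text in sentences:
        groups[publish_date].append(text)
    # Sort by timestamp so the keys form the timeline T in chronological order.
    return dict(sorted(groups.items()))


# Hypothetical toy collection C; real input would be query-related news sentences.
collection = [
    (date(2010, 4, 22), "Sentence A"),
    (date(2010, 4, 20), "Sentence B"),
    (date(2010, 4, 22), "Sentence C"),
]
sub_collections = group_by_date(collection)
```

Each value in `sub_collections` plays the role of one Ci, and a summary item Ii would then be chosen as a subset of that list.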

7 4 theoretical measures
An effective summary should properly consider the following four key requirements:
Relevance: be related to the query
Coverage: keep alignment with the source collection
Coherence: consistency among component summaries
Diversity: few redundant sentences
Related formula:

8 Relevance Coverage Coherence Diversity

9 Objective Function Utility: Given the source collection, the utility of an individual summary item Ii is evaluated based on the weighted combination of these requirements. The ETS task is to predict the optimized sentence subset of Ii* from the space of all combinations for all dates. The objective function is as follows:
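A minimal sketch of a utility built as a weighted combination of the four requirements follows. The slide's actual formula is not preserved in this transcript, so the linear form, the function name `utility`, and the example scores and weights are all assumptions made for illustration.

```python
REQUIREMENTS = ("relevance", "coverage", "coherence", "diversity")


def utility(scores, weights):
    """Weighted sum of the four requirement scores for one summary item I_i.

    Both arguments are dicts keyed by requirement name. A linear combination
    is assumed here; the original formula is not shown in the transcript.
    """
    return sum(weights[name] * scores[name] for name in REQUIREMENTS)


# Hypothetical scores for one candidate summary, with equal weights.
example_scores = {"relevance": 0.8, "coverage": 0.4, "coherence": 0.6, "diversity": 0.2}
equal_weights = {name: 0.25 for name in REQUIREMENTS}
example_utility = utility(example_scores, equal_weights)
```

Under this reading, the optimized subset Ii* would be whichever candidate subset scores highest under `utility`.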

10 Sentence Selection for Summaries
Ii(n-1): sentences generated in the (n-1)-th iteration
Si(n): top-ranked sentences in the n-th iteration
an intersection set:
a substitutable sentence set:
a candidate sentence set:
During every iteration, our goal is to find a substitutive pair <xi,yi> for Ii:
The performance of such a substitution can be measured by the utility gain function:
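One substitution step might look like the sketch below. The set definitions are not preserved in the transcript, so this assumes the natural reading: the intersection set is the overlap of the current summary and the top-ranked sentences, the substitutable set is the rest of the current summary, and the candidate set is the rest of the top-ranked sentences. The utility gain is assumed to be the utility after the swap minus the utility before it. `best_substitution` and `toy_utility` are hypothetical names.

```python
from itertools import product


def best_substitution(current, top_ranked, utility):
    """Return (gain, x, y) for the best single swap, or None if no swap helps."""
    intersection = current & top_ranked
    substitutable = current - intersection  # in the summary, not top-ranked now
    candidates = top_ranked - intersection  # top-ranked, not yet in the summary
    base = utility(current)
    best = None
    for x, y in product(sorted(substitutable), sorted(candidates)):
        # Assumed gain: utility with x replaced by y, minus the current utility.
        gain = utility((current - {x}) | {y}) - base
        if gain > 0 and (best is None or gain > best[0]):
            best = (gain, x, y)
    return best


# Toy utility: sum of hypothetical per-sentence scores.
sentence_scores = {"a": 3, "b": 1, "c": 2, "d": 5}


def toy_utility(summary):
    return sum(sentence_scores[s] for s in summary)


step = best_substitution({"a", "b"}, {"a", "c", "d"}, toy_utility)
```

Here `step` names the sentence to drop and the sentence to add; repeating the step until it returns `None` gives an iterative substitution loop.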

11 Balanced Optimization
The objective function becomes the maximization of the utility gain from substituting <xi,yi> during each iteration, formally:
To make a tradeoff between global optimization and local optimization, the utility for Ii can be rewritten as follows:

12 Interpolative Optimization

13 Does the algorithm permit extreme situations? A significant rise in local utility that offsets a large global utility loss still yields an acceptable selection, and vice versa.
Local Optimization Global Optimization
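The extreme situation can be illustrated with a small worked example. The interpolation formula is not preserved in the transcript, so a plain linear interpolation between local and global gain is assumed; `interpolated_gain` and `weight` are hypothetical names.

```python
def interpolated_gain(local_gain, global_gain, weight):
    """Linear interpolation of local and global utility gain (assumed form)."""
    return weight * local_gain + (1.0 - weight) * global_gain


# A large local rise masks a substantial global loss: the combined gain is
# still positive, so a purely interpolative criterion would accept the swap.
masked_loss = interpolated_gain(local_gain=0.9, global_gain=-0.5, weight=0.5)

# The mirror case: a global rise masks a local loss.
masked_local_loss = interpolated_gain(local_gain=-0.5, global_gain=0.9, weight=0.5)
```

Both values are positive even though one component drops sharply, which is the weakness that motivates constraining local and global gain separately.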

14 A new balanced maximization framework enforcing both local and global optimization is proposed.

15 A straightforward understanding is that we find a maximized overall utility at the j-th state space on date ti, while at the same time the global utility and local utility satisfy the four constraints.
: all possible <x,y> pairs
ML :
MG:
A[a][b][c] = max{Mj,a}, where a records the column being processed, b records how many Mj,iG < 0 occur before column a on the path, and c records the sum of Mj,iG before column a on the path.
P[a][b][c]: records the path information
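The table A[a][b][c] suggests a dynamic program over columns whose state tracks the count of negative global gains and their running sum. The sketch below is one possible reading, not the authors' Algorithm 2: the four constraints are not stated in the transcript, so two stand-in constraints are assumed (a cap on the number of negative global gains and a floor on the total global gain), and the overall gain of a pair is assumed to be local plus global gain. All names are hypothetical.

```python
def best_path(columns, max_negative, min_global_sum):
    """columns[a] lists (local_gain, global_gain) pairs available at column a.

    Picks one pair per column. Returns (total_gain, chosen_indices), or None
    if no path satisfies the two assumed constraints.
    """
    # states maps (b, c) -> (best total gain, path): b counts negative global
    # gains so far and c is the running sum of global gains, mirroring
    # A[a][b][c] (best value) and P[a][b][c] (path information).
    states = {(0, 0): (0, [])}
    for pairs in columns:
        next_states = {}
        for (b, c), (total, path) in states.items():
            for j, (local_gain, global_gain) in enumerate(pairs):
                nb = b + (1 if global_gain < 0 else 0)
                if nb > max_negative:
                    continue  # early pruning of an infeasible state path
                key = (nb, c + global_gain)
                cand = (total + local_gain + global_gain, path + [j])
                if key not in next_states or cand[0] > next_states[key][0]:
                    next_states[key] = cand
        states = next_states
    feasible = [v for (b, c), v in states.items() if c >= min_global_sum]
    return max(feasible, key=lambda v: v[0]) if feasible else None


# Two dates, two candidate pairs each (integer gains keep the state keys exact).
toy_columns = [[(5, -2), (1, 1)], [(9, -3), (2, 0)]]
constrained = best_path(toy_columns, max_negative=1, min_global_sum=-2)
```

Without the cap, the best path would take the first pair in both columns for a total of 9; with at most one negative global gain allowed, the program settles for a total of 8 instead.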

16

17 Experiment: Dataset
10251 news articles from 10 selected sources.
6 topics belonging to different categories

18 Experimental System Setup
Preprocessing: discarding non-event texts and filtering out events not relevant to any query word.
Compression Rate: the compression rate on ti is set as
Off-line Systems vs On-line System: off-line systems are optimized based on neighboring summaries on dates both before and after them, while the on-line system considers only neighboring summaries previously generated.

19 Algorithms for Comparison
Random: selects sentences randomly.
Centroid: extracts sentences according to three parameters (centroid value, positional value, first-sentence overlap).
GMDS: a graph-based method which constructs a connectivity graph among sentences and applies a graph-based ranking algorithm to rank sentences.
Chieu: a similar timeline system, utilizing interest and burstiness ranking.
ETS: ETS1 for the off-line system and ETS2 for the on-line system.

20 Overall Performance

21

22 Strategy Selection

23

24 Constraints Selection
From Figure 4, we notice Constraint 1 and Constraint 2 are useful. Both Constraint 3 and Constraint 4 are beneficial in iteration count performance because they reduce the available search space and facilitate early pruning for state paths in Algorithm 2.

25 Conclusion
Advantage: The objective function fully accounts for four properties. In particular, coherence is taken into account, which indicates that neighboring information is essential in an evolutionary timeline trajectory.
Disadvantage: The time span of each sub-collection is not flexible. Burstiness may be applied to decide the deadline of each sub-collection.

26 Thank You !

27 Web-based Event Detecting, Tracking and Analyzing (EDTA) (Present)
Research of Yan Zhang
Yan's general research areas are in databases, massive information processing and Web technologies, with particular emphasis on Web information processing systems. Specifically, his research work includes the following. (Please look at his publications page to read some of his recent papers.)
Search Precisely and Accurately (Present): We propose a new technique, "search wikily", which can help users to understand search results more logically and holistically. Furthermore, we try to search semantically with the help of semantic networks, such as WordNet and Wikipedia. This work is currently supported by the National Key Technology R&D Pillar Program in the 11th Five-year Plan of China (Research No.: 2009BAH47B05).
Web-based Event Detecting, Tracking and Analyzing (EDTA) (Present): The goal of this project is to help people better understand what has happened and what is happening in the real world. This work is currently supported by NSFC (with Grant No ), Guangdong - MOE Cooperation Funding Scheme (Project No. 2009B ).
Large-scale and Distributed Searching (Present): Search results suffer from the rapid increase in the number of web pages. So far there are some approaches: clustering, vertical searching and user behavior analysis, just to name a few. Yan and his group are interested in improving the fundamental algorithms.
CLARITY is a Science Foundation Ireland (SFI) Centre for Science, Engineering and technology (CSET) and is a partnership between University College Dublin and Dublin City University, supported by research at the Tyndall National Institute, Cork.
CLARITY is a research centre that will focus on the intersection between two important research areas: Adaptive Sensing and Information Discovery, to develop innovative new technologies of critical importance to Ireland's future industry base and contribute to improving the quality of life of people in areas such as personal health, digital media and management of our environment. The overarching theme of CLARITY's research programme -bringing information to life- refers to the harvesting and harnessing of large volumes of sensed information, from both the physical world in which we live, and the digital world of modern communications & computing.

28 Research
Rui Yan has a broad interest in real world problems related to text information, social networks, web application, scientific literature, and multimedia. Rui's research focuses on Information Retrieval, Natural Language Processing/Computational Linguistics, Knowledge Management and Artificial Intelligence. More specifically, he is now conducting research into summarization, social network mining and event detection.

