
1 T-Scroll: Visualizing Trends in a Time-Series of Documents for Interactive User Exploration Yoshiharu Ishikawa and Mikine Hasegawa Nagoya University,


1 T-Scroll: Visualizing Trends in a Time-Series of Documents for Interactive User Exploration
Yoshiharu Ishikawa and Mikine Hasegawa
Nagoya University, Japan
ishikawa@itc.nagoya-u.ac.jp

2 Outline
- Background and objective
- Related work
- Novelty-based document clustering
- Overview of T-Scroll system
- Evaluation
- Conclusions and future work

3 Background
- Time-series of documents
  - Examples: news articles delivered on the Internet, online academic journals
  - Delivered continually, every day
- Problems
  - A large number of documents: appropriate summarization is required
  - Topics change over time: topic detection/tracking and trend extraction are useful

4 Objectives
- Development and evaluation of T-Scroll (Trend/Topic-Scroll)
  - User interface for visualizing the transition of topics extracted from a time-series of documents
- System features
  - Built on a document clustering system that outputs new clustering results periodically
  - Clusters are displayed along the time axis like a scroll
  - Links are shown between related clusters to represent topic transitions
  - Several features for interactive exploratory analysis



7 Visualization of a time-series of documents
- A few systems visualize trends in a time-series of documents
- ThemeRiver (Havre et al., IEEE TVCG, 2002) [4]
  - Visualizes topic streams like a river
  - Focuses on visual impact
  - No features for analysis and browsing
- TimeMine (Swan and Allan, SIGIR ’00) [5]
  - Extracts topics from a time-series of documents
  - Displays timelines to represent topics on the screen

8 ThemeRiver
[Figure: analysis of articles related to Cuba (1960–1961)]

9 TimeMine
- Swan & Allan (U. of Massachusetts)

10 Analysis of time-dependent clusters
- Mei & Zhai (KDD ’05) [6]
  - Statistical approach for discovering major topics from a time-series of documents
  - Probabilistic modeling
- MONIC (Spiliopoulou et al., KDD ’06) [7]
  - Detects various types of patterns from cluster transitions
    - Examples: splitting/merging of clusters, cluster size changes
  - Based on the analysis of historical snapshots of clusters


12 Novelty-based document clustering (1)
- Developed by our group (ECDL ’01 [8], WWW Journal 2007 [10], etc.)
- Clusters documents incrementally based on their similarity and novelty
- Features
  - Similarity considers novelty
    - Assigns high weights to recent documents and low weights to old ones
  - Document weights decay as time passes: based on the concept of obsolescence (aging)
    - Old documents whose weights fall below a threshold are deleted
  - Incremental processing: low update cost
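
The aging and deletion steps on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the exponential decay described on the "Document similarity (1)" slide, and the function name `age_and_prune`, the dictionary layout, and the sample values are all hypothetical.

```python
def age_and_prune(weights, elapsed, forgetting_factor, threshold):
    """Decay all document weights and drop documents that became obsolete.

    weights: dict mapping document id -> current weight (hypothetical layout).
    elapsed: time passed since the weights were last updated.
    Returns a new dict; the input dict is left unchanged.
    """
    if not 0.0 < forgetting_factor < 1.0:
        raise ValueError("forgetting_factor must lie strictly between 0 and 1")
    if elapsed < 0:
        raise ValueError("elapsed must be non-negative")
    # With exponential decay, one multiplication updates every weight,
    # so no document has to be re-weighted from its acquisition time.
    decay = forgetting_factor ** elapsed
    aged = {doc_id: w * decay for doc_id, w in weights.items()}
    # Delete documents whose weight is smaller than the threshold.
    return {doc_id: w for doc_id, w in aged.items() if w >= threshold}


# Illustrative values only.
example_weights = {"old_story": 0.3, "new_story": 1.0}
example_result = age_and_prune(
    example_weights, elapsed=2, forgetting_factor=0.5, threshold=0.1
)
```

Because every weight shrinks by the same factor, an update touches each stored weight once, which is consistent with the low update cost claimed on the slide.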

13 Novelty-based document clustering (2)
- Periodic clustering is performed on a time-series of documents
[Figure: clusters along a time axis, labeled "Yeltsin’s Death", "New President Sarkozy", "Blair to Resign", and "Other articles". Callout: "Yeltsin’s Death" and other documents are obsolete!]

14 Document similarity (1)
- Assumption: each delivered document gradually loses its value as time passes
- $dw_i$: the weight of a document $d_i$ at time $\tau$

  $$dw_i = \lambda^{\tau - T_i} \qquad (0 < \lambda < 1)$$

  - $T_i$: acquisition time of document $d_i$
  - $\tau$: current time
  - $\lambda$: forgetting factor; determines the forgetting speed
- The weight of a document decreases exponentially as time passes
[Figure: plot of $dw_i$ against time $t$; the weight is 1 at $t = T_i$ and decays toward the current time $\tau$]
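
A short sketch may make the role of the forgetting factor concrete. It assumes the weight has the form λ^(τ − T_i), with λ the forgetting factor, τ the current time, and T_i the acquisition time; the function names and the half-life helper are illustrative additions, not part of the slides.

```python
import math


def document_weight(acquired_at, now, forgetting_factor):
    """Weight of a document acquired at time `acquired_at`, evaluated at `now`."""
    if not 0.0 < forgetting_factor < 1.0:
        raise ValueError("forgetting_factor must lie strictly between 0 and 1")
    if now < acquired_at:
        raise ValueError("a document cannot be weighted before it is acquired")
    return forgetting_factor ** (now - acquired_at)


def half_life(forgetting_factor):
    """Time after which a weight has dropped to half of its initial value.

    Solves forgetting_factor ** t == 0.5 for t.
    """
    if not 0.0 < forgetting_factor < 1.0:
        raise ValueError("forgetting_factor must lie strictly between 0 and 1")
    return math.log(0.5) / math.log(forgetting_factor)


# Illustrative values only.
weight_when_new = document_weight(acquired_at=10, now=10, forgetting_factor=0.5)
weight_two_steps_later = document_weight(acquired_at=10, now=12, forgetting_factor=0.5)
```

Thinking in terms of half-life is one way to choose the forgetting factor: a value close to 1 gives a long half-life and slow forgetting, a small value makes documents obsolete quickly.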

15 Document similarity (2)
- Similarity score of documents d_i and d_j
  - Based on the novelty of the documents and the word occurrence patterns in them
  - Extension of the tf-idf method
- New documents have a high impact on the clustering result
- Document clustering: k-means method
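
The slide does not reproduce the similarity formula, so the following is only one plausible reading: the cosine similarity of two term vectors, scaled by the novelty weights of both documents. The function names and the sparse-dictionary representation are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse term vectors given as dicts term -> value."""
    dot = sum(u[term] * v[term] for term in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def novelty_similarity(vec_i, vec_j, weight_i, weight_j):
    """Hypothetical novelty-aware similarity: both document weights scale the score."""
    return weight_i * weight_j * cosine(vec_i, vec_j)


# Illustrative values: the same pair of texts, once fresh and once partly aged.
fresh_pair = novelty_similarity({"election": 1.0}, {"election": 1.0}, 1.0, 1.0)
aged_pair = novelty_similarity({"election": 1.0}, {"election": 1.0}, 1.0, 0.5)
unrelated_pair = novelty_similarity({"election": 1.0}, {"storm": 1.0}, 1.0, 1.0)
```

Under this reading, two old documents score low even when their text is identical, which is one way new documents could come to dominate the clustering result.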


17 T-Scroll: Idea
- Periodic clustering results are displayed like a scroll
- Links represent related cluster pairs
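
How related cluster pairs are scored is not stated in the transcript. As a hedged sketch, the code below links clusters from two consecutive clustering results when the Jaccard overlap of their document sets exceeds a threshold. The Jaccard measure, the function names, and the data layout are assumptions for illustration and may differ from what T-Scroll actually uses.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections of document ids."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_links(previous, current, threshold):
    """Return (previous_id, current_id, score) for every related cluster pair.

    previous, current: dicts mapping cluster id -> iterable of document ids,
    taken from two consecutive clustering results.
    """
    links = []
    for prev_id, prev_docs in previous.items():
        for cur_id, cur_docs in current.items():
            score = jaccard(prev_docs, cur_docs)
            if score > threshold:
                links.append((prev_id, cur_id, score))
    return links


# Illustrative values only.
previous_clusters = {"p1": [1, 2, 3], "p2": [7, 8]}
current_clusters = {"c1": [2, 3, 4], "c2": [9]}
example_links = find_links(previous_clusters, current_clusters, threshold=0.3)
```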


19 System functionalities (1)
- Cluster labels: selected based on a formula that uses Pr(d_i), the document weight, and tf_ij, the term frequency count
- Cluster sizes: the ellipse size roughly corresponds to the number of documents
- Links: shown between clusters when their score is greater than the threshold
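
The label formula itself is missing from the transcript. One reading consistent with the two named quantities is to score each term by its term frequency summed over the cluster's documents, weighted by each document's weight, and to take the top-scoring terms as labels. The sketch below follows that assumption; `select_labels` and its input layout are hypothetical.

```python
from collections import defaultdict


def select_labels(documents, top_k=3):
    """Pick label terms for one cluster.

    documents: list of (document_weight, term_counts) pairs, where term_counts
    maps a term to its frequency in that document.
    Assumed score for a term: sum over documents of weight * term frequency.
    """
    scores = defaultdict(float)
    for weight, term_counts in documents:
        for term, count in term_counts.items():
            scores[term] += weight * count
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_k]]


# Illustrative values: a fresh document and a half-aged one.
example_cluster = [
    (1.0, {"election": 2, "vote": 1}),
    (0.5, {"election": 1, "storm": 4}),
]
example_labels = select_labels(example_cluster, top_k=2)
```

Here "storm" outranks "vote" even though it appears only in the older document, because its raw frequency is high enough to survive the weighting.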

20 System functionalities (2)
- Cluster quality: visualized by the color of the cluster border line, ranging from red (good) to purple (bad)
- A high score is achieved if (1) the cluster is large and (2) the documents in the cluster are similar to each other
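
The mapping from quality score to border color is not given, so this is a sketch under stated assumptions: the score is normalized to the range 0 to 1, and the color is interpolated linearly in RGB between purple and red. The endpoint colors and the function name `quality_color` are illustrative choices.

```python
def quality_color(score):
    """Map a normalized quality score to a hex border color.

    0.0 maps to purple (bad) and 1.0 maps to red (good); values outside
    the range are clamped. The RGB endpoints are assumptions.
    """
    s = min(1.0, max(0.0, score))
    purple = (128, 0, 128)
    red = (255, 0, 0)
    r, g, b = (round(p + (q - p) * s) for p, q in zip(purple, red))
    return f"#{r:02x}{g:02x}{b:02x}"


# Illustrative values only.
worst_color = quality_color(0.0)
best_color = quality_color(1.0)
```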

21 System functionalities (3)
- Drill-down/roll-up: the user can interactively specify the interval between two consecutive clusterings (e.g., one day, one week)
- Displaying the keyword list: the user can browse the keyword list for a specified cluster
- Access to original documents
- Keyword-based emphasis: clusters that contain a user-specified keyword are emphasized
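
Two of these functions are easy to illustrate. The sketch below assumes that roll-up simply shows every n-th stored clustering result, which may not match how T-Scroll computes coarser views, and that keyword-based emphasis is a membership test on each cluster's keyword list. All names and data layouts are hypothetical.

```python
def roll_up(snapshots, base_hours, target_hours):
    """Thin a list of clustering results to a coarser display interval.

    snapshots: clustering results ordered by time, one every `base_hours`.
    target_hours: requested interval; must be a multiple of `base_hours`.
    """
    if base_hours <= 0 or target_hours < base_hours:
        raise ValueError("target_hours must be at least base_hours (> 0)")
    if target_hours % base_hours != 0:
        raise ValueError("target_hours must be a multiple of base_hours")
    step = target_hours // base_hours
    return snapshots[::step]


def clusters_to_emphasize(cluster_keywords, keyword):
    """Return ids of clusters whose keyword list contains `keyword`."""
    return [cid for cid, words in cluster_keywords.items() if keyword in words]


# Illustrative values: eight six-hour snapshots viewed at a one-day interval.
six_hour_snapshots = ["s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7"]
daily_view = roll_up(six_hour_snapshots, base_hours=6, target_hours=24)
emphasized = clusters_to_emphasize(
    {"c1": {"election", "vote"}, "c2": {"storm"}}, "storm"
)
```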

22 Demo

23 System implementation
- T-Scroll module
  - Written in Perl: generates an SVG file
  - The browser displays the generated SVG file
  - The SVG file includes scripts (JavaScript)
    - Used for interactive manipulation
- Clustering module
  - Written in Ruby
  - Novelty-based incremental document clustering
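
To show the kind of output the T-Scroll module produces, here is a small sketch that writes clusters as SVG ellipses and links as lines. It is in Python for illustration only (the slides state that the real module is written in Perl), it leaves out the embedded JavaScript, and the function name `render_svg`, the coordinates, and the styling are all assumptions.

```python
from xml.sax.saxutils import escape, quoteattr


def render_svg(clusters, links, width=400, height=200):
    """Render clusters as ellipses and links as lines in one SVG document.

    clusters: dict mapping cluster id -> (cx, cy, rx, ry, label).
    links: list of (from_id, to_id) pairs between related clusters.
    """
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
    ]
    # Draw links first so that the ellipses are painted on top of them.
    for src, dst in links:
        x1, y1 = clusters[src][:2]
        x2, y2 = clusters[dst][:2]
        parts.append(
            f'<line x1="{x1}" y1="{y1}" x2="{x2}" y2="{y2}" stroke="gray"/>'
        )
    for cid, (cx, cy, rx, ry, label) in clusters.items():
        parts.append(
            f'<ellipse id={quoteattr(cid)} cx="{cx}" cy="{cy}" '
            f'rx="{rx}" ry="{ry}" fill="white" stroke="red"/>'
        )
        parts.append(
            f'<text x="{cx}" y="{cy}" text-anchor="middle">{escape(label)}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


# Illustrative values: two clusters from consecutive time steps, one link.
example_svg = render_svg(
    {"c1": (100, 100, 60, 30, "election & vote"), "c2": (300, 100, 40, 20, "storm")},
    [("c1", "c2")],
)
```

Labels are escaped before being written, since keywords taken from news text can contain characters such as `&` that would otherwise make the SVG invalid.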

24 System architecture
[Figure: architecture diagram. Components: RSS Feed Module, Clustering Module, T-Scroll (T-Scroll Main Module, SVG Output Module, SVG Control Module), SVG file (includes JavaScript), browser with plug-in, user. Language annotations: Perl, JavaScript. Flow labels: news articles (input), clustering result (output/input), outputs, command inputs, cluster display, interactive manipulation.]


26 Evaluation
- 10 users
- Data set
  - Japanese news articles collected from news web sites from Sept. 2006 to Feb. 2007
  - 100 articles per day
  - Clustering was performed at six-hour intervals
- Evaluation criteria
  - Overall impressions
  - Evaluation of each function
  - Observability of topics
  - Comparison with ThemeRiver

27 Overall impression
- Users give scores from 0 to 5

28 Evaluation of each function

29 Observability of topics (1)
- Can users observe the major topics in Nov. 2006?
  - We specified five major topics; each user scored how clearly he or she could observe each topic

30 Observability of topics (2)
- 10 users (different from the former experiments)
- Users reported the topics they observed, with scores, without being given any prior information
- Topics 1 to 5 are the major topics used in the previous experiment
- Topic 2 (big hurricane) was regarded as a normal weather topic

31 Comparison with ThemeRiver (1)
- A ThemeRiver-like display was created manually for news articles in Dec. 2006
- 11 users (different from the previous experiments)
- Questions to users
  - Overall impressions
  - Observability of topics


33 Comparison with ThemeRiver (2)
- Overall impression

  Category                        No. of replies
  T-Scroll is better              2
  T-Scroll is slightly better     3
  Almost the same                 3
  ThemeRiver is slightly better   3
  ThemeRiver is better            0

34 Comparison with ThemeRiver (2)
- Can users observe the five major topics that we selected?

  Category     No. of replies
  Good         0
  Possible     3
  No good      4
  Impossible   4

35 Summary of experiments
- Overall impressions
  - Good, but usability improvements are required
  - Some users commented on the response speed
- System functionalities
  - Several features (quality info, article lists, etc.) are useful in practice
  - Appropriate labels are necessary: label selection should be improved
- Comparison with ThemeRiver
  - ThemeRiver has visual impact, but its display tends to become complicated when there are many topics


37 Conclusions and future work
- Development and evaluation of the T-Scroll system
  - Based on a novelty-based incremental clustering method
  - Scroll-like display for showing changing trends
  - Several features for interactive analysis
- Evaluation
  - Overall impression
  - Observability of topics
  - Comparison with ThemeRiver
- Future work
  - More sophisticated keyword (label) selection
  - Improvement of interactive response speed

