1 T-Scroll: Visualizing Trends in a Time-Series of Documents for Interactive User Exploration Yoshiharu Ishikawa and Mikine Hasegawa Nagoya University,

1 T-Scroll: Visualizing Trends in a Time-Series of Documents for Interactive User Exploration Yoshiharu Ishikawa and Mikine Hasegawa Nagoya University, Japan

2 Outline
• Background and objective
• Related work
• Novelty-based document clustering
• Overview of T-Scroll system
• Evaluation
• Conclusions and future work

3 Background
• Time-series of documents
  - Examples: news articles delivered on the Internet, online academic journals
  - Delivered continually, every day
• Problems
  - A large number of documents: appropriate summarization is required
  - Topics change over time: topic detection/tracking and trend extraction are useful

4 Objectives
• Development and evaluation of T-Scroll (Trend/Topic-Scroll)
  - A user interface for visualizing the transition of topics extracted from a time-series of documents
• System features
  - Built on a document clustering system that outputs new clustering results periodically
  - Clusters are displayed along the time axis like a scroll
  - Links between related clusters represent topic transitions
  - Several features support interactive exploratory analysis

7 Visualization of a time-series of documents
• A few systems visualize trends in a time-series of documents
• ThemeRiver (Havre et al., IEEE Trans. VCG, 2002) [4]
  - Visualizes topic streams like a river
  - Focuses on providing visual impact
  - No features for analysis and browsing
• TimeMine (Swan and Allan, SIGIR ’00) [5]
  - Extracts topics from a time-series of documents
  - Displays timelines representing the topics on the screen

8 ThemeRiver
[Figure: ThemeRiver analysis of the articles related to Cuba, 1960-1961]

9 TimeMine
• Swan & Allan (University of Massachusetts)

10 Analysis of time-dependent clusters
• Mei & Zhai (KDD ’05) [6]
  - Statistical approach for discovering major topics from a time-series of documents
  - Probabilistic modeling
• MONIC (Spiliopoulou et al., KDD ’06) [7]
  - Detects various types of patterns in cluster transitions
  - Examples: splitting/merging of clusters, changes in cluster size
  - Based on the analysis of historical snapshots of clusters

12 Novelty-based document clustering (1)
• Developed by our group (ECDL ’01 [8], WWW Journal 2007 [10], etc.)
• Clusters documents incrementally based on their similarity and novelty
• Features
  - The similarity measure takes novelty into account: recent documents get high weights, old documents low weights
  - Document weights decay as time passes, based on the concept of obsolescence (aging)
  - Old documents whose weights fall below a threshold are deleted
  - Incremental processing keeps the update cost low
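The decay-and-delete behavior can be sketched as follows. This is a minimal illustration, not the authors' implementation (which was written in Ruby): it assumes the exponential forgetting weight λ^(τ - T_i) given on slide 14, and the class name `DocumentStore` and the default values of `lam` and `threshold` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentStore:
    """Hypothetical store that forgets documents as their weights decay."""

    lam: float = 0.9        # forgetting factor (assumed value), 0 < lam < 1
    threshold: float = 0.1  # assumed deletion threshold on document weight
    docs: dict = field(default_factory=dict)  # doc_id -> acquisition time T_i

    def weight(self, doc_id, now):
        # Weight is 1 at acquisition and decays as lam ** (now - T_i).
        return self.lam ** (now - self.docs[doc_id])

    def add(self, doc_id, acquired_at):
        self.docs[doc_id] = acquired_at

    def prune(self, now):
        # Delete only documents whose weight fell below the threshold;
        # the remaining documents are kept unchanged, nothing is rebuilt.
        obsolete = [d for d in self.docs if self.weight(d, now) < self.threshold]
        for d in obsolete:
            del self.docs[d]
        return obsolete


if __name__ == "__main__":
    store = DocumentStore(lam=0.5, threshold=0.2)
    store.add("yeltsin-death", acquired_at=0)
    store.add("sarkozy-elected", acquired_at=3)
    print(store.prune(now=3))  # ['yeltsin-death'] since 0.5 ** 3 = 0.125 < 0.2
```

Keeping the acquisition time rather than a stored weight means the current weight can be recomputed for any τ without touching every document at each time step.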

13 Novelty-based document clustering (2)
[Figure: clusters along the time axis, e.g., "New President Sarkozy", "Yeltsin's Death", "Blair to Resign", and other articles; "Yeltsin's Death" and other older documents have become obsolete]
• Clustering is performed periodically on the time-series of documents

14 Document similarity (1)
• Assumption: each delivered document gradually loses its value as time passes
• The weight of document d_i at the current time τ:
  $dw_i|_{\tau} = \lambda^{\tau - T_i}$
  - T_i: acquisition time of document d_i (the weight is 1 at τ = T_i)
  - λ (0 < λ < 1): forgetting factor, which determines the forgetting speed
• The weight of a document decreases exponentially as time passes
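A direct transcription of the weight formula dw_i = λ^(τ - T_i), plus two helpers relating λ to a half-life (the elapsed time h after which a weight halves, from λ^h = 1/2). The half-life view is our own illustration for choosing λ, not something stated on the slides, and the time unit (e.g., days) is an assumption.

```python
import math


def document_weight(tau, t_i, lam):
    """Weight dw_i of a document acquired at t_i, evaluated at current time tau."""
    if not 0 < lam < 1:
        raise ValueError("forgetting factor must satisfy 0 < lam < 1")
    if tau < t_i:
        raise ValueError("current time precedes the acquisition time")
    return lam ** (tau - t_i)


def half_life(lam):
    """Elapsed time h with lam ** h == 1/2 (illustrative helper, not from the slides)."""
    return math.log(0.5) / math.log(lam)


def lam_for_half_life(h):
    """Forgetting factor whose weights halve every h time units (illustrative)."""
    return 0.5 ** (1.0 / h)
```

For example, with days as the unit, `lam_for_half_life(7)` gives a λ under which a two-week-old article has weight of roughly 0.25.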

15 Document similarity (2)
• Similarity score of documents d_i and d_j
  - Based on the novelty of the documents and the word occurrence patterns in them
  - An extension of the tf-idf method
  - New documents have a high impact on the clustering result
• Document clustering: k-means method
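The slide does not reproduce the exact similarity formula, so the sketch below is a stand-in rather than the authors' measure: plain tf-idf vectors compared by cosine similarity, with the result multiplied by both documents' forgetting weights so that newer documents contribute more. The function names and the multiplication by the weights are assumptions chosen only to show how novelty can enter a tf-idf similarity.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: tf * idf} dict per document."""
    n = len(docs)
    df = Counter(term for tokens in docs for term in set(tokens))
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def novelty_similarity(v_i, v_j, w_i, w_j):
    """Cosine similarity of two tf-idf vectors, scaled by both document weights."""
    dot = sum(x * v_j.get(t, 0.0) for t, x in v_i.items())
    norm_i = math.sqrt(sum(x * x for x in v_i.values()))
    norm_j = math.sqrt(sum(x * x for x in v_j.values()))
    if norm_i == 0 or norm_j == 0:
        return 0.0
    return w_i * w_j * dot / (norm_i * norm_j)
```

With this form, halving one document's weight halves its similarity to every other document, so old documents gradually stop pulling clusters toward themselves.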

17 T-Scroll: Idea
• Results of periodic clustering are displayed like a scroll
• Links represent related cluster pairs

19 System functionalities (1)
• Cluster labels: selected by a scoring formula based on Pr(d_i), the document weight, and tf_ij, the term frequency count
• Cluster sizes: ellipse size roughly corresponds to the number of documents
• Links: a link is shown between two clusters when their score exceeds a threshold
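Because the label formula is not reproduced here, the following sketch assumes a simple weighted score, score(term j) = Σ over documents d_i in the cluster of Pr(d_i) · tf_ij, and keeps the top-k terms. The link rule (Jaccard overlap of two clusters' label sets compared against a threshold) is likewise a hypothetical choice; the slide only says that links appear when a score exceeds a threshold.

```python
from collections import defaultdict


def cluster_labels(cluster, weights, k=3):
    """Pick k label terms for one cluster (hypothetical scoring rule).

    cluster: {doc_id: {term: tf}}; weights: {doc_id: Pr(d_i)}.
    score(term) = sum of Pr(d_i) * tf_ij over the cluster's documents.
    """
    scores = defaultdict(float)
    for doc_id, tf in cluster.items():
        for term, count in tf.items():
            scores[term] += weights[doc_id] * count
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def link_pairs(prev_labels, next_labels, threshold=0.3):
    """Link clusters of consecutive steps whose label sets overlap enough (assumed rule)."""
    pairs = []
    for a, labels_a in prev_labels.items():
        for b, labels_b in next_labels.items():
            union = set(labels_a) | set(labels_b)
            if not union:
                continue
            jaccard = len(set(labels_a) & set(labels_b)) / len(union)
            if jaccard > threshold:
                pairs.append((a, b))
    return pairs
```

Weighting term counts by Pr(d_i) lets recent articles dominate a cluster's label, in line with the novelty emphasis of the clustering method.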

20 System functionalities (2)
• Cluster quality: visualized by the color of the cluster border line, ranging from red (good) to purple (bad)
  - A high score is achieved when (1) the cluster is large and (2) the documents in the cluster are similar to each other
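One hypothetical way to combine the two conditions on the slide (large size, similar documents) into a single quality value and map it onto the red-to-purple border color. The product form, the saturation constant `size_ref`, and the exact RGB endpoints (#ff0000 for red, #800080 for purple) are assumptions, not values from T-Scroll.

```python
def cluster_quality(pairwise_sims, size, size_ref=20):
    """Hypothetical quality in [0, 1] for one cluster.

    pairwise_sims: similarities (each in [0, 1]) between the cluster's documents.
    Cohesion (mean similarity) is multiplied by a size factor that saturates at size_ref.
    """
    if size < 2 or not pairwise_sims:
        return 0.0
    cohesion = sum(pairwise_sims) / len(pairwise_sims)
    return cohesion * min(1.0, size / size_ref)


def border_color(quality):
    """Interpolate from purple (#800080, quality 0) to red (#ff0000, quality 1)."""
    q = min(max(quality, 0.0), 1.0)
    red = round(128 + 127 * q)
    blue = round(128 * (1 - q))
    return f"#{red:02x}00{blue:02x}"
```

A product means a cluster scores high only if it is both cohesive and reasonably large; a tiny but tight cluster or a large but loose one both drift toward purple.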

21 System functionalities (3)
• Drill-down/roll-up: the user can interactively specify the interval between two consecutive clusterings (e.g., one day, one week)
• Keyword list display: the user can browse the keyword list of a specified cluster
• Access to the original documents
• Keyword-based emphasis: clusters that contain a user-specified keyword are emphasized
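Drill-down/roll-up amounts to regrouping documents by a different clustering interval before each clustering run. The sketch below, with hypothetical function names, buckets timestamped documents into fixed-length windows (the evaluation used six-hour intervals) and selects the clusters to emphasize for a keyword; it assumes each cluster's keyword list is already available.

```python
from datetime import datetime, timedelta


def bucket_by_interval(docs, start, interval):
    """Group (doc_id, timestamp) pairs into consecutive windows of length `interval`.

    Each window would feed one clustering run, i.e. one column of the scroll.
    Returns {window_start: [doc_id, ...]} in chronological order.
    """
    buckets = {}
    for doc_id, ts in sorted(docs, key=lambda pair: pair[1]):
        index = (ts - start) // interval  # timedelta // timedelta -> int
        buckets.setdefault(start + index * interval, []).append(doc_id)
    return buckets


def emphasized(cluster_keywords, keyword):
    """IDs of clusters whose keyword list contains the user-specified keyword."""
    return [cid for cid, words in cluster_keywords.items() if keyword in words]


if __name__ == "__main__":
    start = datetime(2006, 11, 1)
    docs = [("a", datetime(2006, 11, 1, 3)), ("b", datetime(2006, 11, 1, 20)),
            ("c", datetime(2006, 11, 2, 1))]
    print(len(bucket_by_interval(docs, start, timedelta(hours=6))))  # 3 windows
    print(len(bucket_by_interval(docs, start, timedelta(days=1))))   # 2 windows
```

Rolling up from six hours to one day merges windows, so fewer but larger clusterings appear along the scroll.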

22 Demo

23 System implementation
• T-Scroll module
  - Written in Perl; generates an SVG file
  - A web browser displays the generated SVG file
  - The SVG file includes JavaScript used for interactive manipulation
• Clustering module
  - Written in Ruby
  - Performs novelty-based incremental document clustering

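24 System architecture
[Figure: system architecture. Components: RSS Feed Module, Clustering Module, T-Scroll Main Module (Perl), SVG Output Module (Perl), SVG Control Module (JavaScript, embedded in the output SVG file), and a browser with a plug-in that displays T-Scroll. Data flow: news articles → Clustering Module → clustering result → T-Scroll Main Module → SVG file → browser (cluster display); the user issues command inputs and manipulates the display interactively.]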
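To make the SVG output step concrete, here is a minimal sketch in Python (the actual module was written in Perl, and its output also embeds JavaScript for interaction, which is omitted here). It assumes each clustering step is one column on the horizontal time axis, draws each cluster as a labeled ellipse whose width grows with its size, and draws a line for each related cluster pair. The function name, layout constants, and size-to-width rule are all hypothetical.

```python
from xml.sax.saxutils import escape

_ATTR = {'"': "&quot;"}  # also escape double quotes inside attribute values


def render_scroll(columns, links, col_width=160, row_height=90):
    """Return an SVG string: one column per clustering step, one ellipse per cluster.

    columns: list of clustering steps, each a list of (cluster_id, label, size).
    links: (cluster_id, cluster_id) pairs of related clusters.
    """
    positions, shapes = {}, []
    for x_index, clusters in enumerate(columns):
        for y_index, (cid, label, size) in enumerate(clusters):
            cx = col_width * x_index + col_width // 2
            cy = row_height * y_index + row_height // 2
            positions[cid] = (cx, cy)
            rx = min(col_width // 2 - 5, 20 + 3 * size)  # wider ellipse for larger clusters
            ry = row_height // 3
            shapes.append(
                f'<ellipse id="{escape(cid, _ATTR)}" cx="{cx}" cy="{cy}" rx="{rx}" '
                f'ry="{ry}" fill="white" stroke="red"/>'
                f'<text x="{cx}" y="{cy}" text-anchor="middle">{escape(label)}</text>'
            )
    # Links are emitted first so the ellipses are painted on top of them.
    lines = [
        f'<line x1="{positions[a][0]}" y1="{positions[a][1]}" '
        f'x2="{positions[b][0]}" y2="{positions[b][1]}" stroke="gray"/>'
        for a, b in links
    ]
    width = col_width * len(columns)
    height = row_height * max((len(c) for c in columns), default=1)
    return (f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
            + "".join(lines) + "".join(shapes) + "</svg>")
```

Giving each ellipse an `id` is what would let embedded scripts find and restyle a cluster later, for example to emphasize clusters matching a keyword.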

26 Evaluation
• 10 users
• Data set
  - Japanese news articles collected from news websites from Sept. to Feb. (… articles per day)
  - Clustering was performed at six-hour intervals
• Evaluation criteria
  - Overall impressions
  - Evaluation of each function
  - Observability of topics
  - Comparison with ThemeRiver

27 Overall impression
• Users gave scores from 0 to 5

28 Evaluation of each function

29 Observability of topics (1)
• Can users observe the major topics of Nov. 2006?
  - We selected five major topics; each user scored how clearly he or she could observe each topic

30 Observability of topics (2)
• 10 users (different from those in the previous experiment)
• Without being given any information in advance, users reported the topics they observed, along with their scores
• Topics 1 to 5 are the major topics used in the previous experiment
• Topic 2 (a big hurricane) was regarded as an ordinary weather topic

31 Comparison with ThemeRiver (1)
• A ThemeRiver-like display was created manually for the news articles of December
• 11 users (different from those in the previous experiments)
• Questions to users
  - Overall impressions
  - Observability of topics

33 Comparison with ThemeRiver (2)
• Overall impression

  Category                         No. of replies
  T-Scroll is better               2
  T-Scroll is slightly better      3
  Almost the same                  3
  ThemeRiver is slightly better    3
  ThemeRiver is better             0

34 Comparison with ThemeRiver (3)
• Can users observe the five major topics that we selected?

  Category      No. of replies
  Good          0
  Possible      3
  No good       4
  Impossible    4

35 Summary of experiments
• Overall impressions
  - Generally good, but usability needs improvement
  - Some users commented on the response speed
• System functionalities
  - Several features (quality information, article lists, etc.) are useful in practice
  - Appropriate labels are necessary; label selection should be improved
• Comparison with ThemeRiver
  - ThemeRiver has strong visual impact, but its display tends to become complicated when there are many topics

37 Conclusions and future work
• Development and evaluation of the T-Scroll system
  - Based on a novelty-based incremental clustering method
  - Scroll-like display for showing changing trends
  - Several features for interactive analysis
• Evaluation
  - Overall impressions
  - Observability of topics
  - Comparison with ThemeRiver
• Future work
  - More sophisticated keyword (label) selection
  - Faster interactive response