Published by Amie Anderson. Modified over 9 years ago.
1
Intelligent Database Systems Lab, 國立雲林科技大學 (National Yunlin University of Science and Technology)
TIARA: A Visual Exploratory Text Analytic System
Presenter: Wei-Hao Huang
Authors: Furu Wei, Shixia Liu, Yangqiu Song, Shimei Pan, Michelle X. Zhou, Weihong Qian, Lei Shi, Li Tan, Qiang Zhang
SIGKDD 2010
2
Outline
Motivation
Objectives
Methodology
Experiments
Conclusions
Comments
3
Motivation
Going through a large collection of text to locate needed information, or simply to make a decision, is very costly and time-consuming.
Although a number of text analysis technologies exist, their results are often abstract and complex and may not be consumable by users.
4
Objectives
To present an exploratory visual analytic system called TIARA (Text Insight via Automated Responsive Analytics).
To combine text analytics and interactive visualization to help users explore and analyze large collections of text.
[Figure: Documents, TIARA System]
5
Methodology
TIARA
Topic Analysis
Topic Ranking
Keyword-based Topic Summarization
Time-sensitive Keyword Extraction
6
TIARA
7
TIARA System architecture
[Figure: system architecture; labels include Database and File system]
8
Topic Analysis
Topics are extracted with unsupervised learning methods.
Notation: the number of documents, the words of each document, the vocabulary and its size, K (the number of topics), the document-topic distribution matrix, and the topic-word distribution matrix.

      N1    N2
K1    0     1
K2    1     1

      K1    K2
V1    0.3   0.7
V2    0.8   0.1
Term frequencies in each cluster
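As a rough illustration of how the two matrices fit together, the sketch below assumes an LDA-style topic model (the slide only says "unsupervised learning methods"). Under that assumption, the probability of a word in a document is the topic-word probabilities weighted by that document's topic mixture. The names `theta`, `phi`, and `word_distribution`, the toy vocabulary, and all numbers are hypothetical and are not taken from the slides.

```python
# Toy example: 2 documents, 2 topics, 3 vocabulary words (all values made up).
vocab = ["budget", "meeting", "patient"]

# theta[d][k]: share of topic k in document d (each row sums to 1).
theta = [
    [0.9, 0.1],
    [0.2, 0.8],
]

# phi[k][v]: probability of word v under topic k (each row sums to 1).
phi = [
    [0.5, 0.4, 0.1],
    [0.1, 0.1, 0.8],
]


def word_distribution(doc_topics, topic_words):
    """Return p(word | document) = sum over k of p(topic k | document) * p(word | topic k)."""
    n_words = len(topic_words[0])
    return [
        sum(doc_topics[k] * topic_words[k][v] for k in range(len(doc_topics)))
        for v in range(n_words)
    ]


for d, doc_topics in enumerate(theta):
    probs = word_distribution(doc_topics, phi)
    best = max(range(len(vocab)), key=lambda v: probs[v])
    print(f"document {d}: most likely word = {vocab[best]}")
```

Because each row of both matrices sums to 1, the resulting word distribution for each document also sums to 1.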
9
Topic Ranking
Topic rank is measured by a combination of both topic content coverage and topic variance.
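The slide does not give the ranking formula. As a minimal sketch under assumptions, the code below scores each topic by the product of its length-weighted mean weight across documents (coverage) and the length-weighted standard deviation of that weight (variance), each raised to a tunable exponent. The function `rank_topics`, the exponents `lam_cov` and `lam_var`, and the toy matrix are hypothetical; the combination actually used in TIARA may differ.

```python
import math


def rank_topics(theta, doc_lengths, lam_cov=1.0, lam_var=1.0):
    """Rank topics by (weighted mean ** lam_cov) * (weighted std ** lam_var).

    theta[d][k] is the weight of topic k in document d;
    doc_lengths[d] is the number of words in document d.
    Returns (topic indices sorted from highest to lowest score, scores).
    """
    total = sum(doc_lengths)
    n_topics = len(theta[0])
    scores = []
    for k in range(n_topics):
        # Coverage: how much of the corpus the topic accounts for.
        mean = sum(n * row[k] for n, row in zip(doc_lengths, theta)) / total
        # Variance: how unevenly the topic is spread over documents.
        var = sum(n * (row[k] - mean) ** 2 for n, row in zip(doc_lengths, theta)) / total
        scores.append((mean ** lam_cov) * (math.sqrt(var) ** lam_var))
    order = sorted(range(n_topics), key=lambda k: scores[k], reverse=True)
    return order, scores


# Topic 0 has the same weight in every document, so its variance is zero.
theta = [
    [0.2, 0.7, 0.1],
    [0.2, 0.1, 0.7],
    [0.2, 0.6, 0.2],
]
order, scores = rank_topics(theta, doc_lengths=[100, 100, 100])
print(order)
```

With this scoring, a topic that appears equally in every document (such as a background topic) falls to the bottom of the ranking, while topics that are both prominent and concentrated in particular documents rise to the top.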
10
Keyword-based Topic Summarization
11
Time-sensitive Keyword Extraction
12
Time-sensitive Keyword Extraction (continued)
13
Experiments
Time-sensitive keyword extraction procedure
Completeness
Distinctiveness
Response time
Data sets:
A personal email collection with 8,326 email messages.
An emergency room data set containing 23,501 patient records.
14
Completeness
Defined as whether we can recover the original keywords of a topic by combining the keywords associated with each time segment.
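The slide gives only the verbal definition. One way to turn it into a number, offered here as an assumption rather than the authors' exact measure, is the fraction of a topic's original keywords that appear in at least one time segment. The function name `completeness` and the sample keywords are hypothetical.

```python
def completeness(topic_keywords, segment_keywords):
    """Fraction of a topic's keywords recovered from the union of its time segments."""
    topic = set(topic_keywords)
    if not topic:
        return 1.0  # nothing to recover
    recovered = set().union(*segment_keywords)
    return len(topic & recovered) / len(topic)


topic_keywords = ["budget", "travel", "invoice", "deadline"]
segments = [
    ["budget", "travel"],   # keywords shown for the first time segment
    ["invoice", "budget"],  # keywords shown for the second time segment
]
print(completeness(topic_keywords, segments))
```

In this toy case three of the four topic keywords show up in some segment, so "deadline" is the part of the topic that the per-segment keywords fail to recover.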
15
Distinctiveness
Defined as whether we can distinguish one topic segment from another based on their associated keywords to avoid redundancy.
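No formula appears on the slide, so the following is only a sketch under an assumption: distinctiveness is taken as one minus the average pairwise Jaccard overlap between the keyword sets of a topic's time segments. The helper names `jaccard` and `distinctiveness` and the sample keywords are hypothetical, and the measure used in the TIARA evaluation may be defined differently.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap |A & B| / |A | B| of two keyword collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # two empty segments share nothing
    return len(a & b) / len(a | b)


def distinctiveness(segment_keywords):
    """1 minus the mean pairwise Jaccard overlap between segments."""
    pairs = list(combinations(segment_keywords, 2))
    if not pairs:
        return 1.0  # a single segment cannot be redundant with another
    mean_overlap = sum(jaccard(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_overlap


segments = [
    ["budget", "travel"],
    ["invoice", "budget"],
    ["deadline", "review"],
]
print(distinctiveness(segments))
```

A score near 1 means the segments are described by largely different keywords; a score near 0 means the same keywords are repeated across segments.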
16
Completeness and Distinctiveness Results
17
Response Time
18
Conclusions
TIARA tightly integrates text analytics with interactive visualization to support effective exploratory text analysis.
Future work:
Add sentence-based summaries
Support other languages
Improve performance
19
Comments
Advantages: users can explore and analyze large text collections with interactive visualization.
Applications: text mining.