Published by Brook Casey. Modified over 8 years ago.
Slide 1: Discovering Interesting Usage Patterns in Text Collections: Integrating Text Mining with Visualization
Intelligent Database Systems Lab, National Yunlin University of Science and Technology (國立雲林科技大學)
Presenter: Wei-Hao Huang
Authors: Anthony Don, Elena Zheleva, Machon Gregory, Sureyya Tarkan, Loretta Auvil, Tanya Clement, Ben Shneiderman, Catherine Plaisant
CIKM 2007
Slide 2: Outline
Intelligent Database Systems Lab, N.Y.U.S.T. I.M.
Motivation
Objectives
Methodology
Experiments
Conclusions
Comments
Slide 3: Motivation
Critical interpretation of literary works is difficult.
Computers are rarely used to support researchers' interpretation and the development of new hypotheses.
Text mining algorithms typically return a large number of patterns, which are difficult to interpret out of context.
Slide 4: Objectives
To combine text mining with visualization so that the results are easier to interpret for humanities scholars, journalists, intelligence analysts, and other researchers, in order to support the analysis of text collections.
Slide 5: Methodology
FeatureLens
Frequent expressions
Frequent words
Frequent closed itemsets of n-grams
Slide 6: FeatureLens
Slide 7: Frequent expressions
The term qualifies a word or a longer expression.
N-gram
Support of an expression
Example: “This is a book.”
2-grams: {“This is”, “is a”, “a book”}
3-grams: {“This is a”, “is a book”}
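The n-gram example on this slide can be reproduced with a few lines of Python. This is a minimal sketch, not the authors' implementation: the function names `ngrams` and `support` are hypothetical, and the slide does not define support, so the sketch assumes it is the fraction of text units (for example, paragraphs) that contain the expression.

```python
import re


def ngrams(text, n):
    """Return the word n-grams of `text` in order of appearance."""
    # Tokenize on word characters, which drops punctuation such as the final period.
    words = re.findall(r"\w+", text)
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def support(expression, units):
    """Fraction of text units that contain `expression` as a word sequence.

    Assumption: support is measured per text unit (e.g. per paragraph);
    the slide itself does not give the definition.
    """
    n = len(expression.split())
    hits = sum(1 for unit in units if expression in ngrams(unit, n))
    return hits / len(units)


print(ngrams("This is a book.", 2))  # ['This is', 'is a', 'a book']
print(ngrams("This is a book.", 3))  # ['This is a', 'is a book']
```

With two toy units, `support("is a", ["This is a book.", "That was a pen."])` evaluates to 0.5, because only the first unit contains the 2-gram.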
Slide 8: Frequent words
D2K/T2K provides the means to perform frequent-word analysis with stemming.
Slide 9: Frequent closed itemsets of n-grams
I = { “I will improve”, “will improve medical”, “will improve security”, “will improve education”, “improve medical aid”, “improve security in”, “improve education in”, “medical aid in”, “aid in our”, “security in our”, “education in our”, “in our country” }
X1 is a frequent closed itemset, but X2 and X3 are not.
“improve our health care system”
“improve our health our citizens”
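To make the notion of a closed itemset concrete: an itemset is closed when no proper superset occurs in exactly the same text units, and it is frequent when it occurs in at least a minimum number of them. The sketch below is a brute-force illustration, not the mining algorithm used in the paper. The function name, the three toy paragraphs (built from the 3-grams in I), and the threshold of 2 are all assumptions made for the example. It relies on the fact that the closed itemsets are exactly the non-empty intersections of transactions.

```python
def closed_frequent_itemsets(transactions, min_support):
    """Map each frequent closed itemset to its support count.

    Brute force: closed itemsets are the non-empty intersections of
    transactions, so intersect repeatedly until nothing new appears.
    Suitable only for toy inputs.
    """
    tx = [frozenset(t) for t in transactions]
    closed = set(tx)
    frontier = set(tx)
    while frontier:
        new = set()
        for a in frontier:
            for b in tx:
                c = a & b
                if c and c not in closed:
                    new.add(c)
        closed |= new
        frontier = new
    result = {}
    for itemset in closed:
        # Support count: number of transactions that contain the itemset.
        count = sum(1 for t in tx if itemset <= t)
        if count >= min_support:
            result[itemset] = count
    return result


# Hypothetical paragraphs, each represented as its set of 3-grams.
paragraphs = [
    {"I will improve", "will improve medical", "improve medical aid",
     "medical aid in", "aid in our", "in our country"},
    {"I will improve", "will improve security", "improve security in",
     "security in our", "in our country"},
    {"I will improve", "will improve education", "improve education in",
     "education in our", "in our country"},
]

for itemset, count in closed_frequent_itemsets(paragraphs, min_support=2).items():
    print(sorted(itemset), count)  # ['I will improve', 'in our country'] 3
```

In this toy data, {"I will improve"} on its own also occurs in all three paragraphs, but it is not closed: the larger set {"I will improve", "in our country"} has the same support, so only the larger set is reported.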
Slide 10: Experiments
With two different types of text:
The State of the Union Addresses
The Making of Americans
Slide 11: The State of the Union Addresses
1. How many times did “terrorist” appear in 2002? The president mentions “the American people” and “terrorist” in the same speeches. Did the two terms ever appear in the same paragraph?
2. What was the longest pattern? In which year and paragraphs did it occur? What is its meaning?
Slide 12: The Making of Americans
Slide 13: Conclusions
These text mining concepts can help the user analyze the text and create insights and new hypotheses.
FeatureLens helps to discover and present interesting insights about the text.
Slide 14: Comments
Advantages
─ Text mining with visualization
Applications
─ Text mining