Authors: Jochen Dörre, Peter Gerstl, Roland Seiffert
Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, California, August 15-18, 1999
Presented by Xxxxxx
Outline
  Motivation
  Methodology
  Feature Extraction
  Clustering and Categorizing
  Data Mining vs. Text Mining
  Conclusion
Motivation
  Problem: Most of the data in a company is unstructured or semi-structured
  Examples:
    Letters
    Phone transcripts
    Contracts
Definition and Application
  Text mining: The discovery by computer of new, previously unknown information, by automatically extracting information from different written resources.
  Applications:
    Summarizing documents
    Discovering and monitoring relations among people
    Customer profile analysis
    Trend analysis
Methodology
  Aspect 1: Knowledge Discovery
  Aspect 2: Information Distillation
  Approaches:
    Extraction
    Analysis
Feature Extraction
  Recognize and classify significant vocabulary items from the text
  Categories of vocabulary:
    Proper names
    Multiword terms
    Abbreviations
    Relations
    Other useful things
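To make the idea of recognizing and classifying vocabulary items concrete, here is a minimal sketch covering three of the categories above. It relies on simple regular-expression and frequency heuristics chosen for illustration only; the function name `extract_features`, the stopword list, and the `min_count` parameter are assumptions of this sketch and do not describe how IBM's Intelligent Miner for Text works.

```python
import re
from collections import Counter

# Tiny stopword list, assumed for this sketch only.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "for", "is", "by", "at"}

def extract_features(text, min_count=2):
    """Return candidate vocabulary items grouped by a rough category."""
    # Abbreviations: standalone tokens of 2 to 6 capital letters, e.g. "IBM".
    abbreviations = set(re.findall(r"\b[A-Z]{2,6}\b", text))

    # Proper names: two or more consecutive capitalized words, e.g. "Peter Gerstl".
    proper_names = set(re.findall(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b", text))

    # Multiword terms: word pairs without stopwords that occur at least
    # min_count times in the lowercased text.
    words = re.findall(r"[a-z]+", text.lower())
    bigrams = Counter(
        (w1, w2)
        for w1, w2 in zip(words, words[1:])
        if w1 not in STOPWORDS and w2 not in STOPWORDS
    )
    multiword_terms = {" ".join(pair) for pair, n in bigrams.items() if n >= min_count}

    return {
        "abbreviations": abbreviations,
        "proper_names": proper_names,
        "multiword_terms": multiword_terms,
    }

sample = (
    "Peter Gerstl works at IBM. Text mining finds facts, "
    "and text mining needs feature extraction."
)
features = extract_features(sample)
print(features)
```

On real documents these heuristics would produce many false hits (for example, capitalized words at the start of a sentence), which is why practical systems add dictionaries and linguistic analysis on top.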
Clustering Model
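As a rough illustration of what document clustering involves (grouping documents by similarity without predefined categories), the sketch below uses a simple single-pass "leader" clustering over word-count vectors with cosine similarity. This technique, the function names, and the `threshold` value are assumptions picked for brevity; they are not taken from the paper or from the model shown on this slide.

```python
import math
from collections import Counter

def vectorize(doc):
    """Represent a document as a bag of lowercase word counts."""
    return Counter(doc.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def cluster(docs, threshold=0.3):
    """Assign each document to the first cluster whose leader is similar enough.

    Each cluster is a list of document indices; its first member is the leader.
    """
    vectors = [vectorize(d) for d in docs]
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            # No existing leader is similar enough: start a new cluster.
            clusters.append([i])
    return clusters

docs = [
    "contract renewal terms and contract price",
    "phone call about billing complaint",
    "contract price negotiation",
    "billing complaint by phone",
]
clusters = cluster(docs)
print(clusters)
```

The result depends on document order and on the threshold, which is the usual trade-off of single-pass methods compared with hierarchical approaches.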
Categorization Model
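Categorization differs from clustering in that the categories are fixed in advance and learned from labeled example documents. The sketch below illustrates that idea with a deliberately simple profile-matching scheme: each category gets a word-frequency profile, and a new document goes to the category whose profile it overlaps most. The scheme, the function names `train` and `categorize`, and the sample data are assumptions for illustration and are not the model shown on this slide.

```python
from collections import Counter

def train(labeled_docs):
    """Build one word-count profile per category from (label, text) pairs."""
    profiles = {}
    for label, text in labeled_docs:
        profiles.setdefault(label, Counter()).update(text.lower().split())
    return profiles

def categorize(text, profiles):
    """Return the category whose profile best matches the document's words."""
    tokens = text.lower().split()

    def score(label):
        profile = profiles[label]
        total = sum(profile.values())
        # Share of the profile's word mass hit by the document's tokens.
        return sum(profile[t] for t in tokens) / total if total else 0.0

    # On a tie (including all-zero scores) the first category wins.
    return max(profiles, key=score)

training = [
    ("contracts", "contract renewal terms"),
    ("contracts", "contract price negotiation"),
    ("complaints", "billing complaint by phone"),
    ("complaints", "customer complaint about late delivery"),
]
profiles = train(training)
print(categorize("complaint about a wrong billing amount", profiles))
print(categorize("new contract price", profiles))
```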
Data Mining vs. Text Mining
         | Data Mining | Text Mining
  Goal   | Discover hidden models | Discover hidden facts
  Method | Tries to generalize all of the data into a single model | Tries to understand the details and cross-reference individual instances
  Fields | Marketing, medicine, health care | Biosciences, customer profile analysis
Conclusion
  Introduction to text mining
  Differences between data mining and text mining
  Overview of IBM's Intelligent Miner for Text
  The tools and methods used in the past