Text Data Mining
LBSC 878
Douglas W. Oard
April 23, 2001
Outline
  Knowledge Discovery in Databases
  Knowledge Discovery in Text
  Scoping the Problem
What can we find in databases?
  Data
  Information
  Knowledge
  Wisdom
Knowledge Discovery in Databases
  Input: Data
  Stages: Select, Preprocess, Warehouse, Transform, Data Mining, Presentation
  Tasks:
    Convert schema
    Model noise
    Remove outliers
    Handle missing data
    Convert feature space
    Reduce dimensionality
    Synthesize features
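Two of the tasks named on this slide, handling missing data and removing outliers, can be sketched in a few lines. This is only an illustration under stated assumptions: the slide does not say how either task is done, so median imputation and a median-absolute-deviation cutoff are choices made here, and the function names and the sample values are hypothetical.

```python
from statistics import median

def fill_missing(values):
    """Replace None entries with the median of the observed values (an assumed strategy)."""
    observed = [v for v in values if v is not None]
    m = median(observed)
    return [m if v is None else v for v in values]

def remove_outliers(values, k=3.0):
    """Drop values more than k median absolute deviations from the median (assumed rule)."""
    m = median(values)
    mad = median(abs(v - m) for v in values)
    if mad == 0:
        # All values sit at the median; nothing can be called an outlier.
        return list(values)
    return [v for v in values if abs(v - m) <= k * mad]

# Hypothetical raw column with one missing entry and one extreme value.
raw = [10, 12, None, 11, 13, 250, 12]
filled = fill_missing(raw)
clean = remove_outliers(filled)
print(clean)
```

Running the missing-data step first matters here, because the outlier rule cannot compare `None` with a number.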
The Data Mining Process
  Choose a model
    – Classification, clustering, dependency modeling, sequence analysis, ...
  Specify what it means for the model to fit data
    – Minimum squared error, closest cubic spline, ...
  Find the model parameters that best fit the data
    – Exhaustive search, heuristic search, ...
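The three steps on this slide can be made concrete with a toy example. The sketch below assumes a straight-line model y = a*x + b (the slide names no particular model), uses minimum squared error as the fit criterion, and finds the parameters by exhaustive search over a small grid. The data points, the grid, and the function names are all hypothetical.

```python
from itertools import product

def squared_error(params, data):
    """Fit criterion: sum of squared residuals for the line y = a*x + b."""
    a, b = params
    return sum((y - (a * x + b)) ** 2 for x, y in data)

def exhaustive_search(data, candidates):
    """Try every (a, b) pair from the candidate grid and keep the best-fitting one."""
    return min(product(candidates, candidates),
               key=lambda p: squared_error(p, data))

# Hypothetical observations that lie exactly on y = 2x + 1.
data = [(0, 1), (1, 3), (2, 5), (3, 7)]
# Candidate values -5.0, -4.5, ..., 5.0 for both the slope and the intercept.
grid = [i * 0.5 for i in range(-10, 11)]

best = exhaustive_search(data, grid)
print(best, squared_error(best, data))
```

Exhaustive search is only practical because the grid is tiny (21 x 21 candidates); a heuristic search would examine a subset of the candidates instead.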
Knowledge Discovery in Text
  Input: Documents
  Stages: Information Retrieval, Metadata Assignment, Warehouse, Transform, Data Mining, Presentation
  Tasks:
    Named entities
    Anaphora resolution
    Temporal expressions
    Slot filling
Text Metadata Mining
  Bibliometrics
    – Impact assessment, co-citation analysis, ...
  Web analysis
    – Clickstreams, link analysis, ...
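Co-citation analysis, mentioned under bibliometrics, counts how often two works appear together in the same reference list. A minimal sketch follows, assuming the citation data is available as a mapping from each citing paper to the works it cites. The paper identifiers and the function name are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(citations):
    """Count, for each pair of cited works, how many papers cite both."""
    counts = Counter()
    for cited in citations.values():
        # Sort and de-duplicate so each pair is counted once per citing paper
        # and always appears in the same order.
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts

# Hypothetical citing papers P1..P3 and the works A, B, C they cite.
citations = {
    "P1": ["A", "B", "C"],
    "P2": ["A", "B"],
    "P3": ["B", "C"],
}

counts = cocitation_counts(citations)
for pair, n in counts.most_common():
    print(pair, n)
```

Pairs with high counts are candidates for being topically related, which is the signal co-citation analysis relies on.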
Text-derived Data Mining
  Theme relationship analysis
    – Proximity-based phrase clustering
  Literature-based discovery
    – Based on associating phrases and index terms
  New event detection
    – Cluster then identify outliers
  Multidocument summarization
    – Perspective analysis, temporal evolution, …
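The "cluster then identify outliers" idea behind new event detection can be illustrated with a single-pass clustering sketch. The slide gives no algorithm, so everything specific here is an assumption: documents are represented as raw term counts, similarity is cosine similarity, a document joins the most similar existing cluster when the similarity reaches a threshold, and any document that fits no cluster is flagged as a possible new event. The threshold value, the sample headlines, and the function names are hypothetical.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0

def detect_new_events(docs, threshold=0.3):
    """Return indices of documents that do not fit any existing cluster."""
    clusters = []   # one summed term-count vector per cluster
    flagged = []
    for i, text in enumerate(docs):
        vec = vectorize(text)
        sims = [cosine(vec, c) for c in clusters]
        if sims and max(sims) >= threshold:
            # Close enough to a known story: merge into the best cluster.
            clusters[sims.index(max(sims))].update(vec)
        else:
            # Outlier with respect to every cluster: treat as a new event.
            clusters.append(Counter(vec))
            flagged.append(i)
    return flagged

# Hypothetical headline stream.
docs = [
    "earthquake strikes coastal city",
    "coastal city earthquake damage reported",
    "election results announced tonight",
    "earthquake rescue teams reach coastal city",
]
print(detect_new_events(docs))
```

The first document is always flagged because no clusters exist yet. In this sketch the threshold controls the trade-off between splitting one story into several events and merging distinct events into one.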