Download presentation
Presentation is loading. Please wait.
Published byAmya Cockey Modified over 9 years ago
1
1 Copyright Jiawei Han; modified by Charles Ling for CS411a/538a Data Mining and Data Warehousing Introduction Data warehousing and OLAP for data mining Data preprocessing Primitives for data mining Concept description Mining association rules in large databases Classification and prediction Cluster analysis Mining complex types of data Applications and trends in data mining
2
2 Copyright Jiawei Han; modified by Charles Ling for CS411a/538a Data Mining and Warehousing: Session 4 Primitives for Data Mining
3
3 Copyright Jiawei Han; modified by Charles Ling for CS411a/538a Session 4: Primitives for Data Mining Introduction What defines a data mining task? A data mining query language (DMQL)?? A GUI (graphical user interface) based on a data mining query language Summary
4
4 Copyright Jiawei Han; modified by Charles Ling for CS411a/538a Motivation Data mining: an interactive process user directs the mining to be performed Users could use a set of primitives to communicate with the data mining system. By incorporating these primitives in a data mining query language User’s interaction with the system becomes more flexible A foundation for the design of graphical user interface Standardization of data mining industry and practice
5
5 Copyright Jiawei Han; modified by Charles Ling for CS411a/538a What Defines a Data Mining Task ? Task-relevant data Type of knowledge to be mined Background knowledge Pattern interestingness measurements Visualization of discovered patterns
6
6 Copyright Jiawei Han; modified by Charles Ling for CS411a/538a Task-Relevant Data Database or data warehouse name Database tables or data warehouse cubes Condition for data selection Relevant attributes or dimensions Data grouping criteria
7
7 Copyright Jiawei Han; modified by Charles Ling for CS411a/538a Types of knowledge to be mined Characterization & discrimination Association Classification/prediction Clustering Outlier analysis and so on
8
8 Copyright Jiawei Han; modified by Charles Ling for CS411a/538a Background knowledge Concept hierarchies schema hierarchy –eg. street < city < province_or_state < country set-grouping hierarchy –eg. {20-39} = young, {40-59} = middle_aged operation-derived hierarchy –email address, login-name < department < university < country rule-based hierarchy –low_profit (X) <= price(X, P1) and cost (X, P2) and (P1 - P2) < $50 User’s existing knowledge of the data. E.g. structural zero
9
9 Copyright Jiawei Han; modified by Charles Ling for CS411a/538a Pattern interestingness measurements Simplicity eg. rule length Certainty eg. confidence, P(A|B) = n(A and B)/ n (B) Utility potential usefulness, eg. support Novelty not previously known, surprising
10
10 Copyright Jiawei Han; modified by Charles Ling for CS411a/538a Visualization of Discovered Patterns Different background/purpose may require different form of representation E.g., rules, tables, crosstabs, pie/bar chart etc. Concept hierarchies is also important discovered knowledge might be more understandable when represented at high concept level. Interactive drill up/down, pivoting, slicing and dicing provide different perspective to data. Different knowledge required different representation.
11
11 Copyright Jiawei Han; modified by Charles Ling for CS411a/538a Summary: Five primitives for specifying a data mining task task-relevant data database/date warehouse, relation/cube, selection criteria, relevant dimension, data grouping kind of knowledge to be mined characterization, discrimination, association... background knowledge concept hierarchies,.. interestingness measures simplicity, certainty, utility, novelty knowledge presentation and visualization techniques to be used for displaying the discovered patterns rules, table, reports, chart, graph, decision trees, cubes... drill-down, roll-up,....
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.