1 Copyright Jiawei Han; modified by Charles Ling for CS411a/538a Data Mining and Data Warehousing  Introduction  Data warehousing and OLAP for data mining.

1 Copyright Jiawei Han; modified by Charles Ling for CS411a/538a Data Mining and Data Warehousing  Introduction  Data warehousing and OLAP for data mining  Data preprocessing  Primitives for data mining  Concept description  Mining association rules in large databases  Classification and prediction  Cluster analysis  Mining complex types of data  Applications and trends in data mining

2 Copyright Jiawei Han; modified by Charles Ling for CS411a/538a Data Mining and Warehousing: Session 4 Primitives for Data Mining

3 Copyright Jiawei Han; modified by Charles Ling for CS411a/538a Session 4: Primitives for Data Mining  Introduction  What defines a data mining task?  A data mining query language (DMQL)??  A GUI (graphical user interface) based on a data mining query language  Summary

4 Copyright Jiawei Han; modified by Charles Ling for CS411a/538a Motivation  Data mining: an interactive process  user directs the mining to be performed  Users could use a set of primitives to communicate with the data mining system.  By incorporating these primitives in a data mining query language  User’s interaction with the system becomes more flexible  A foundation for the design of graphical user interface  Standardization of data mining industry and practice

5 Copyright Jiawei Han; modified by Charles Ling for CS411a/538a What Defines a Data Mining Task ?  Task-relevant data  Type of knowledge to be mined  Background knowledge  Pattern interestingness measurements  Visualization of discovered patterns

6 Copyright Jiawei Han; modified by Charles Ling for CS411a/538a Task-Relevant Data  Database or data warehouse name  Database tables or data warehouse cubes  Condition for data selection  Relevant attributes or dimensions  Data grouping criteria

7 Copyright Jiawei Han; modified by Charles Ling for CS411a/538a Types of knowledge to be mined  Characterization & discrimination  Association  Classification/prediction  Clustering  Outlier analysis  and so on

8 Copyright Jiawei Han; modified by Charles Ling for CS411a/538a Background knowledge  Concept hierarchies  schema hierarchy –eg. street < city < province_or_state < country  set-grouping hierarchy –eg. {20-39} = young, {40-59} = middle_aged  operation-derived hierarchy –email address, login-name < department < university < country  rule-based hierarchy –low_profit (X) <= price(X, P1) and cost (X, P2) and (P1 - P2) < $50  User’s existing knowledge of the data.  E.g. structural zero

9 Copyright Jiawei Han; modified by Charles Ling for CS411a/538a Pattern interestingness measurements  Simplicity eg. rule length  Certainty eg. confidence, P(A|B) = n(A and B)/ n (B)  Utility potential usefulness, eg. support  Novelty not previously known, surprising

10 Copyright Jiawei Han; modified by Charles Ling for CS411a/538a Visualization of Discovered Patterns  Different background/purpose may require different form of representation  E.g., rules, tables, crosstabs, pie/bar chart etc.  Concept hierarchies is also important  discovered knowledge might be more understandable when represented at high concept level.  Interactive drill up/down, pivoting, slicing and dicing provide different perspective to data.  Different knowledge required different representation.

11 Copyright Jiawei Han; modified by Charles Ling for CS411a/538a Summary: Five primitives for specifying a data mining task  task-relevant data  database/date warehouse, relation/cube, selection criteria, relevant dimension, data grouping  kind of knowledge to be mined  characterization, discrimination, association...  background knowledge  concept hierarchies,..  interestingness measures  simplicity, certainty, utility, novelty  knowledge presentation and visualization techniques to be used for displaying the discovered patterns  rules, table, reports, chart, graph, decision trees, cubes...  drill-down, roll-up,....

1 Copyright Jiawei Han; modified by Charles Ling for CS411a/538a Data Mining and Data Warehousing  Introduction  Data warehousing and OLAP for data mining.

Similar presentations

Presentation on theme: "1 Copyright Jiawei Han; modified by Charles Ling for CS411a/538a Data Mining and Data Warehousing  Introduction  Data warehousing and OLAP for data mining."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

1 Copyright Jiawei Han; modified by Charles Ling for CS411a/538a Data Mining and Data Warehousing  Introduction  Data warehousing and OLAP for data mining.

Similar presentations

Presentation on theme: "1 Copyright Jiawei Han; modified by Charles Ling for CS411a/538a Data Mining and Data Warehousing  Introduction  Data warehousing and OLAP for data mining."— Presentation transcript:

Similar presentations

About project

Feedback