6/25/2015 Acc 522 Fall 2001 (Jagdish S. Gangolly) 1 Data Mining I Jagdish Gangolly State University of New York at Albany.

Presentation transcript:

Data Mining
What is data mining?
Data mining primitives
  – Task-relevant data
  – Kinds of knowledge to be mined
  – Background knowledge
  – Interestingness measures
  – Visualisation of discovered patterns
Query language

Data Mining
Concept Description (Descriptive Data Mining)
  – Data generalisation
      Data cube (OLAP) approach (offline pre-computation)
      Attribute-oriented induction approach (online aggregation)
Presentation of generalisation
Descriptive Statistical Measures and Displays

What is Data mining?
Discovery of knowledge from databases
  – A set of data mining primitives to facilitate such discovery (what data, what kinds of knowledge, measures to be evaluated, how the knowledge is to be visualised)
  – A query language for the user to interactively visualise the knowledge mined

Data mining primitives I
Task-relevant data: attributes relevant for the study of the problem at hand
Kinds of knowledge to be mined: characterisation, discrimination, association, classification, clustering, evolution, …
Background knowledge: knowledge about the domain of the problem (concept hierarchies, beliefs about the relationships, expected patterns of data, …)

Data mining primitives II
Interestingness measures: support measures (prevalence of the rule pattern) and confidence measures (strength of the implication of the rule)
Visualisation of discovered patterns: rules, tables, charts, graphs, decision trees, cubes, …

Task-relevant Data
Steps:
  – Derivation of the initial relation through database queries (data retrieval operations), i.e. obtaining a minable view
  – Data cleaning & transformation of the initial relation to facilitate mining
  – Data mining
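
The three steps on this slide can be sketched end to end on a toy table. This is only an illustration: the `sales` table, its columns, the age-banding rule, and the counting step that stands in for "mining" are all assumptions, not part of the course material.

```python
import sqlite3

# Hypothetical toy table; the schema and values are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, age INTEGER, income REAL, item TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("ann", 34, 52000.0, "laptop"),
        ("bob", None, 41000.0, "printer"),  # missing age
        ("cy", 29, 48000.0, "laptop"),
        ("dee", 61, 75000.0, "camera"),
    ],
)

# Step 1: derive the initial relation (the minable view) with a database query.
initial_relation = conn.execute("SELECT age, income, item FROM sales").fetchall()
conn.close()


def age_band(age):
    """Transform a raw age into a decade band such as '30..39'."""
    low = (age // 10) * 10
    return f"{low}..{low + 9}"


# Step 2: clean (drop tuples with missing values) and transform (band the ages).
cleaned = [
    (age_band(age), income, item)
    for age, income, item in initial_relation
    if age is not None and income is not None
]

# Step 3: a stand-in for mining: count purchases per (age band, item).
counts = {}
for band, _income, item in cleaned:
    counts[(band, item)] = counts.get((band, item), 0) + 1
```

Dropping incomplete tuples is the simplest cleaning policy; filling in missing values would be an equally valid choice for step 2.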

Kinds of knowledge to be mined
Kinds of knowledge & templates (meta-patterns, meta-rules, meta-queries)
  – Association
      An example: age(X:customer, W) ∧ income(X, Y) ⇒ buys(X, Z)
  – Classification
  – Discrimination
  – Clustering
  – Evolution analysis
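
A meta-rule such as the association template on this slide restricts which rules are of interest. The sketch below shows one possible way to filter candidate rules against that template. The rule representation, the `TEMPLATE` dictionary, and the candidate rules are hypothetical and chosen only for illustration.

```python
# A rule is (antecedent, consequent); each predicate is (attribute, value).
# TEMPLATE encodes age(X, W) AND income(X, Y) => buys(X, Z): the attribute
# names are fixed, while the values W, Y, Z are left free.
TEMPLATE = {"antecedent": ("age", "income"), "consequent": "buys"}


def matches_template(rule, template=TEMPLATE):
    """Return True if the rule uses exactly the predicates named in the template."""
    antecedent, consequent = rule
    attributes = tuple(attribute for attribute, _value in antecedent)
    return (
        attributes == template["antecedent"]
        and consequent[0] == template["consequent"]
    )


# Hypothetical candidate rules.
candidate_rules = [
    ((("age", "30..39"), ("income", "40K..49K")), ("buys", "laptop")),
    ((("age", "30..39"),), ("buys", "laptop")),               # missing income
    ((("age", "20..29"), ("income", "20K..29K")), ("owns", "car")),  # wrong consequent
]

kept = [rule for rule in candidate_rules if matches_template(rule)]
```

Only the first candidate instantiates the template, so it is the only rule kept.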

Background knowledge
Knowledge from the problem domain
  – usually in the form of concept hierarchies (rolling up or drilling down)
      schema hierarchies (lattices)
      set-grouping hierarchies (successive sub-grouping of attribute values)
      rule-based hierarchies
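
Rolling up and drilling down along a concept hierarchy can be sketched with a simple child-to-parent mapping. The location hierarchy below (city, then state or province, then country) and the function names are assumptions made for illustration.

```python
# Hypothetical location hierarchy: city -> state/province -> country.
PARENT = {
    "Albany": "New York",
    "Buffalo": "New York",
    "Toronto": "Ontario",
    "New York": "USA",
    "Ontario": "Canada",
}


def roll_up(value, levels=1):
    """Climb `levels` steps up the hierarchy, stopping at the top if it runs out."""
    for _ in range(levels):
        if value not in PARENT:
            break
        value = PARENT[value]
    return value


def drill_down(value):
    """Return the immediate children of a concept, sorted for stable output."""
    return sorted(child for child, parent in PARENT.items() if parent == value)
```

For example, `roll_up("Albany", 2)` climbs from the city to the country, while `drill_down("New York")` lists the cities grouped under that state.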

Interestingness measures I
Simplicity: the more complex the structure, the more difficult it is to interpret, and so the less interesting it is likely to be (rule length, …)
Certainty: validity, trustworthiness
    confidence(A ⇒ B) = (# tuples containing both A and B) / (# tuples containing A)
  Sometimes called "certainty factor"

Interestingness measures II
Utility: support is the percentage of task-relevant data tuples for which the pattern is true
    support(A ⇒ B) = (# tuples containing both A and B) / (total # tuples)
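
The support ratio on this slide and the confidence ratio on the previous one can be computed directly by counting tuples. The sketch below does so on a handful of made-up transactions. The data and the function names are illustrative assumptions, and returning 0.0 when no tuple contains A is a choice made here, not something the slides specify.

```python
# Hypothetical transactions; each tuple is modelled as the set of items it contains.
transactions = [
    {"computer", "software"},
    {"computer"},
    {"computer", "software", "printer"},
    {"printer"},
    {"software"},
]


def support(a, b, data):
    """Fraction of all tuples that contain both itemsets a and b."""
    both = sum(1 for t in data if a <= t and b <= t)
    return both / len(data)


def confidence(a, b, data):
    """Fraction of the tuples containing a that also contain b."""
    has_a = sum(1 for t in data if a <= t)
    both = sum(1 for t in data if a <= t and b <= t)
    return both / has_a if has_a else 0.0  # assumption: 0.0 when a never occurs


rule_support = support({"computer"}, {"software"}, transactions)        # 2 of 5 tuples
rule_confidence = confidence({"computer"}, {"software"}, transactions)  # 2 of 3 tuples
```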

Visualisation of discovered patterns
Hierarchies
Tables
Pie/bar charts
Dot/box plots
……

Descriptive Data Mining (Concept Description & Characterisation)
Concept description: description of data generalised at multiple levels of abstraction
Concept characterisation: concise and succinct summarisation of a given collection of data
Concept comparison: discrimination

Data Generalisation
Abstraction of task-relevant high conceptual level data from a database containing relatively low conceptual level data
  – Data cube (OLAP) approach (offline pre-computation) (Figs 2.1 & 2.2, pages 46 & 47)
  – Attribute-oriented induction approach (online aggregation)
Presentation of generalisation (Tables 5.3 & 5.4 on p. 191, and Figs 5.2, 5.3, & 5.4 on pages 192 & 193)
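
A minimal sketch of the attribute-oriented induction idea: drop attributes that cannot be generalised, replace low-level values with higher-level concepts, and merge identical generalised tuples while accumulating a count. The student records, the two concept hierarchies, and the age thresholds are hypothetical.

```python
from collections import Counter

# Hypothetical concept hierarchy for the city attribute.
CITY_TO_COUNTRY = {"Albany": "USA", "Buffalo": "USA", "Toronto": "Canada"}


def age_to_band(age):
    """Hypothetical hierarchy for age: raw number -> coarse band."""
    if age < 30:
        return "young"
    if age < 60:
        return "middle_aged"
    return "senior"


# Low conceptual level data: (name, city, age).
students = [
    ("ann", "Albany", 22),
    ("bob", "Buffalo", 25),
    ("cy", "Toronto", 31),
    ("dee", "Albany", 24),
]


def generalise(rows):
    """Generalise each tuple and merge duplicates, keeping a count per merged tuple."""
    generalised = Counter()
    for _name, city, age in rows:  # attribute removal: name has no hierarchy
        generalised[(CITY_TO_COUNTRY[city], age_to_band(age))] += 1
    return generalised


summary = generalise(students)
```

Here the four original tuples collapse into two generalised tuples, which is the kind of result that would then be presented as a generalised relation or chart.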

Descriptive Statistical Measures and Displays I
Measures of central tendency
  – Mean, weighted mean (weights signifying importance or occurrence frequency)
  – Median
  – Mode
Measures of dispersion
  – Quartiles, outliers, boxplots
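
These measures can be computed with Python's standard `statistics` module. The sample values below are made up for illustration, and the 1.5 × IQR rule for flagging outliers is a common rule of thumb assumed here, since the slide does not state which rule is meant.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
values = [30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110]


def weighted_mean(xs, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for x, w in zip(xs, weights)) / sum(weights)


# Measures of central tendency.
mean = statistics.mean(values)
median = statistics.median(values)
modes = statistics.multimode(values)  # every value tied for most frequent

# Measures of dispersion: quartiles (default "exclusive" method) and the IQR.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Assumption: a value is an outlier if it lies more than 1.5 * IQR beyond a quartile.
outliers = [x for x in values if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

# The five-number summary is what a boxplot draws.
five_number_summary = (min(values), q1, median, q3, max(values))
```

Quartile values depend on the interpolation method, so other tools may report slightly different `q1` and `q3` for the same data.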

Descriptive Statistical Measures and Displays II
Displays
  – Histograms (Fig 5.6, page 214)
  – Barcharts
  – Quantile plot (Fig 5.7, page 215)
  – Quantile-Quantile plot (Fig 5.8, page 216)
  – Scatter plot (Fig 5.9, page 216)
  – Loess curve (Fig 5.10, page 217)
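
As a sketch of what a quantile plot and a quantile-quantile plot actually display, the functions below compute the points to be plotted without drawing anything. The plotting position f_i = (i - 0.5) / n is one common convention and is assumed here. The q-q helper assumes both samples have the same size, and the function names are hypothetical.

```python
def quantile_plot_points(data):
    """Pair each sorted value x_i with its fraction f_i = (i - 0.5) / n."""
    ordered = sorted(data)
    n = len(ordered)
    return [((i - 0.5) / n, x) for i, x in enumerate(ordered, start=1)]


def qq_plot_points(sample_a, sample_b):
    """Pair the sorted values of two equal-sized samples, quantile by quantile."""
    if len(sample_a) != len(sample_b):
        raise ValueError("this sketch assumes samples of equal size")
    return list(zip(sorted(sample_a), sorted(sample_b)))


points = quantile_plot_points([40, 10, 30, 20])
pairs = qq_plot_points([3, 1, 2], [30, 10, 20])
```

Plotting `points` with f on the horizontal axis gives the quantile plot; plotting `pairs` against the line y = x shows whether one distribution is shifted relative to the other.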