Lecture 5: Automatic cluster detection. Lecture 6: Artificial neural networks. Lecture 7: Evaluation of discovered knowledge. Brief introduction to lectures.

1 Brief introduction to lectures
Lecture 5: Automatic cluster detection
Lecture 6: Artificial neural networks
Lecture 7: Evaluation of discovered knowledge
Transparencies prepared by Ho Tu Bao [JAIST]

2 Lecture 5: Automatic Cluster Detection
One of the most widely used KDD classification techniques for unsupervised data.
Content of the lecture:
1. Introduction
2. Partitioning Clustering
3. Hierarchical Clustering
4. Software and case-studies
Prerequisite: Nothing special

3 Partitioning Clustering
Each cluster must contain at least one object.
Each object must belong to exactly one group.
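The two conditions above can be checked mechanically. The following is a minimal sketch, assuming objects are hashable labels and a clustering is given as a list of lists; the function name `is_valid_partition` is hypothetical and not part of the lecture material.

```python
def is_valid_partition(objects, clusters):
    """Return True if `clusters` satisfies both partitioning conditions."""
    # Condition 1: each cluster must contain at least one object.
    if any(len(cluster) == 0 for cluster in clusters):
        return False
    # Condition 2: each object must belong to exactly one cluster.
    counts = {obj: 0 for obj in objects}
    for cluster in clusters:
        for obj in cluster:
            if obj not in counts:
                return False  # the cluster mentions an object outside the data set
            counts[obj] += 1
    return all(n == 1 for n in counts.values())


objects = ["x1", "x2", "x3"]
print(is_valid_partition(objects, [["x1", "x2"], ["x3"]]))        # valid
print(is_valid_partition(objects, [["x1"], ["x1", "x2", "x3"]]))  # x1 appears twice
```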

4 Partitioning Clustering
What is a “good” partitioning clustering?
Key ideas: Objects within each group are similar, and objects in different groups are dissimilar. Minimize the within-group distance and maximize the between-group distance.
Notice: There are many ways to define the “within-group distance” (for example, the average distance to the group’s center, or the average distance between all pairs of objects in the group) and to define the “between-group distance”. In general it is computationally infeasible to find the optimal clustering.
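To make the two quantities concrete, here is a small sketch under stated assumptions: objects are numeric tuples, distance is Euclidean, the within-group distance is the average distance of each object to its own group's center (one of the definitions mentioned above), and the between-group distance is taken as the average distance between group centers. That last definition is only one possible choice, and all function names are hypothetical.

```python
import math
from itertools import combinations


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def within_group_distance(clusters):
    """Average distance from each object to the center of its own group."""
    total, count = 0.0, 0
    for cluster in clusters:
        center = centroid(cluster)
        for point in cluster:
            total += math.dist(point, center)
            count += 1
    return total / count


def between_group_distance(clusters):
    """Average distance between the centers of all pairs of groups
    (an assumed definition; requires at least two groups)."""
    centers = [centroid(cluster) for cluster in clusters]
    pairs = list(combinations(centers, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


# Two ways of splitting the same four points into two groups.
tight = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
loose = [[(0, 0), (10, 0)], [(0, 2), (10, 2)]]
for name, clustering in (("tight", tight), ("loose", loose)):
    print(name, within_group_distance(clustering), between_group_distance(clustering))
```

Under these definitions the first split has the smaller within-group distance and the larger between-group distance, so it is the better partitioning of the two.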

5 Hierarchical Clustering
A hierarchical clustering is a sequence of partitions in which each partition is nested into the next partition in the sequence.
Partition Q is nested into partition P if every component of Q is a subset of a component of P.
(This definition is for bottom-up hierarchical clustering. In the case of top-down hierarchical clustering, “next” becomes “previous”.)
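The nesting definition translates directly into a subset test. The sketch below assumes each partition is a list of Python sets; `is_nested` and `is_hierarchy` are hypothetical names used only for illustration.

```python
def is_nested(q, p):
    """True if every component of partition q is a subset of some component of p."""
    p_sets = [set(component) for component in p]
    return all(any(set(qc) <= pc for pc in p_sets) for qc in q)


def is_hierarchy(partitions):
    """True if each partition is nested into the next one (bottom-up order)."""
    return all(is_nested(a, b) for a, b in zip(partitions, partitions[1:]))


bottom_up = [
    [{1}, {2}, {3}],  # every object in its own group
    [{1, 2}, {3}],    # groups {1} and {2} merged
    [{1, 2, 3}],      # a single group containing everything
]
print(is_hierarchy(bottom_up))        # bottom-up order satisfies the definition
print(is_hierarchy(bottom_up[::-1]))  # the reversed (top-down) order does not
```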

6 Bottom-up Hierarchical Clustering
[Figure: bottom-up hierarchical clustering of objects x1, x2, x3, x4, x5, x6]
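A bottom-up procedure can be sketched as follows: start with every object in its own cluster and repeatedly merge the two closest clusters, recording each partition along the way. The slide does not say how the distance between clusters is measured; this sketch assumes single linkage (the smallest distance between any two members), and the function names are hypothetical.

```python
def single_linkage(a, b, dist):
    """Distance between clusters a and b: smallest member-to-member distance (assumed)."""
    return min(dist(x, y) for x in a for y in b)


def bottom_up(objects, dist):
    """Return the sequence of partitions produced by repeated merging."""
    clusters = [frozenset([obj]) for obj in objects]
    sequence = [list(clusters)]
    while len(clusters) > 1:
        # Find the indices of the two closest clusters.
        index_pairs = (
            (i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))
        )
        i, j = min(
            index_pairs,
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]], dist),
        )
        merged = clusters[i] | clusters[j]
        # Replace the two merged clusters by their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        sequence.append(list(clusters))
    return sequence


points = [1.0, 2.0, 6.0, 7.5, 20.0]
for partition in bottom_up(points, dist=lambda x, y: abs(x - y)):
    print([sorted(cluster) for cluster in partition])
```

Each partition in the returned sequence is nested into the next one, so the output is a hierarchical clustering in the sense defined on the previous slide.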

7 Top-Down Hierarchical Clustering
[Figure: top-down hierarchical clustering of objects x1, x2, x3, x4, x5, x6]

8 OSHAM: Hybrid Model
[Figure labels: Wisconsin Breast Cancer Data, Attributes, Brief Description of Concepts, Concept Hierarchy, Multiple Inheritance Concepts, Discovered Concepts]

9 Brief introduction to lectures
Lecture 1: Overview of KDD
Lecture 2: Preparing data
Lecture 3: Decision tree induction
Lecture 4: Mining association rules
Lecture 5: Automatic cluster detection
Lecture 6: Artificial neural networks
Lecture 7: Evaluation of discovered knowledge

10 Lecture 6: Neural networks
One of the most widely used KDD classification techniques.
Content of the lecture:
1. Neural network representation
2. Feed-forward neural networks
3. Using back-propagation algorithm
4. Case-studies
Prerequisite: Nothing special

12 Lecture 7: Evaluation of discovered knowledge
Techniques for estimating the error of models discovered by KDD methods.
Content of the lecture:
1. Cross validation
2. Bootstrapping
3. Case-studies
Prerequisite: Nothing special

13 Out-of-sample testing
[Figure: Historical Data (warehouse) → Sampling method → Sample data → Sampling method → Training data (2/3) and Testing data (1/3); Training data → Induction method → Model; Model and Testing data → Error estimation → error]
The quality of the test-sample estimate depends on the number of test cases and on the validity of the independence assumption.
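The out-of-sample scheme can be sketched in a few lines. The example assumes the data is a list of `(input, label)` pairs, uses a random 2/3 to 1/3 split as on the slide, and stands in a trivial majority-class learner for the induction method; `holdout_split`, `majority_class_learner`, and `error_rate` are hypothetical names.

```python
import random
from collections import Counter


def holdout_split(data, train_fraction=2 / 3, seed=0):
    """Randomly split data into training and testing parts (2/3 and 1/3 by default)."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def majority_class_learner(train):
    """Placeholder induction method: always predict the most common training label."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda x: majority


def error_rate(model, test):
    """Fraction of test cases the model labels incorrectly."""
    wrong = sum(1 for x, label in test if model(x) != label)
    return wrong / len(test)


data = [(i, "yes" if i % 3 else "no") for i in range(30)]
train, test = holdout_split(data)
model = majority_class_learner(train)  # induced from the training data only
print(len(train), len(test), error_rate(model, test))
```

The model never sees the testing data during induction, which is what makes the resulting error an out-of-sample estimate.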

14 Cross Validation
[Figure: Historical Data (warehouse) → Sampling method → Sample data → Sampling method → Sample 1, ..., Sample n (mutually exclusive, equal size); iterate: Induction method → Model → Error estimation → Run’s error → Error estimation]
10-fold cross validation appears adequate (n = 10).

15 Evaluation: k-fold cross validation (k=3)
Given: a data set and a method to be evaluated.
1. Randomly split the data set into 3 subsets of equal size.
2. Run the method on each pair of subsets as training data to find knowledge.
3. Test on the remaining subset as testing data to evaluate the accuracy.
4. Average the accuracies as the final evaluation.
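The procedure on this slide can be sketched as follows, assuming the data is a list of `(input, label)` pairs and the method to be evaluated is a function that takes training data and returns a predictor. The majority-class learner is only a placeholder for that method, and all names are hypothetical.

```python
import random
from collections import Counter


def majority_class_learner(train):
    """Placeholder method: always predict the most common training label."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda x: majority


def k_fold_accuracy(data, learn, k=3, seed=0):
    """Average accuracy of `learn` over k train/test rounds."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    # k mutually exclusive subsets whose sizes differ by at most one.
    folds = [shuffled[i::k] for i in range(k)]
    accuracies = []
    for i in range(k):
        test = folds[i]
        # Train on the other k-1 subsets.
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        model = learn(train)
        correct = sum(1 for x, label in test if model(x) == label)
        accuracies.append(correct / len(test))
    # The final evaluation is the average over the k rounds.
    return sum(accuracies) / k


data = [(i, "a") for i in range(6)] + [(i, "b") for i in range(6, 9)]
print(k_fold_accuracy(data, majority_class_learner, k=3))
```

Setting `k=10` gives the 10-fold variant mentioned on the previous slide; every example is used for testing exactly once and for training in the other rounds.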

16 Outline of the presentation
Objectives, Prerequisite and Content
Brief Introduction to Lectures
Discussion and Conclusion
This presentation summarizes the content and organization of the lectures in the module “Knowledge Discovery and Data Mining”.