Christoph F. Eick Questions Review October 12, 2010

1. How does decision tree post-pruning work? What is the purpose of applying post-pruning in decision tree learning?
2. What are the characteristics of representative-based/prototype-based clustering algorithms; what do they all have in common?
3. K-means is one of the most popular clustering algorithms. Give reasons why K-means is so popular!
4. Which of the following cluster shapes is K-means capable of discovering? a) triangles b) clusters inside clusters c) the letter 'T' d) any polygon of 5 points e) the letter 'I'
5. Assume we apply K-medoids for k=3 to a dataset consisting of 5 objects numbered 1,…,5 with the following distance matrix [distance matrix not preserved in the transcript]. The current set of representatives is {1,3,4}; indicate all computations K-medoids (PAM) performs in its next iteration.
6. What are the characteristics of a border point in DBSCAN?
7. If you increase the MinPts parameter of DBSCAN, how will this affect the clustering results?
8. DBSCAN supports the notion of outliers. Why is this desirable?
9. DBSCAN has a complexity of O(n²), which can be reduced to O(n·log n) by using spatial index structures. Explain!
10. How is region discovery in spatial datasets different from traditional clustering?
11. What are the unique characteristics of hierarchical clustering?

Christoph F. Eick Some Answers Review October 12, 2010

1. How does decision tree post-pruning work? What is the purpose of applying post-pruning in decision tree learning?
No answer to the first question! The purpose: to obtain a low generalization error, i.e., to find the amount of model complexity that leads to a low generalization error.

2. What are the characteristics of representative-based/prototype-based clustering algorithms; what do they all have in common?
a) They form clusters by assigning each object in the dataset to the closest prototype/representative (using 1-NN queries).
b) They are iterative algorithms that change the current partitioning until a predefined termination condition is met.
[c) Cluster shapes are limited to convex polygons.]

3. K-means is one of the most popular clustering algorithms. Give reasons why K-means is so popular!
K-means is popular because it is relatively efficient (runtime complexity is basically O(n) and storage complexity is O(n)) and easy to use. It uses an implicit fitness function (the SSE) and terminates at a local optimum of this fitness function. Its properties are well understood. (A minimal sketch of the algorithm appears below.)

4. Which of the following cluster shapes is K-means capable of discovering? a) triangles b) clusters inside clusters c) the letter 'T' d) any polygon of 5 points e) the letter 'I'
Only a and e!!

5. Assume we apply K-medoids for k=3 to a dataset consisting of 5 objects numbered 1,…,5 with the given distance matrix; the current set of representatives is {1,3,4}. Indicate all computations K-medoids (PAM) performs in its next iteration. [Distance matrix and worked computations not preserved in the transcript; a sketch of the per-iteration computations appears below.]
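To make answers 2 and 3 concrete, here is a minimal K-means sketch (the function names and NumPy style are ours, not from the slides): the assignment step is exactly the 1-NN query to the closest prototype from answer 2, the per-iteration work is linear in n for fixed k, and the SSE is the implicit fitness function at whose local optimum the algorithm terminates.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Minimal K-means: O(n*k) work per iteration, SSE as the implicit objective."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # initial prototypes
    for _ in range(max_iter):
        # Assignment step: each object goes to its closest prototype (1-NN query).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each prototype moves to the mean of its assigned objects
        # (an empty cluster keeps its old prototype).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):  # local optimum of the SSE reached
            break
        centroids = new_centroids
    sse = ((X - centroids[labels]) ** 2).sum()  # sum of squared errors
    return labels, centroids, sse
```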
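The distance matrix for question 5 did not survive the transcript, so the following sketch uses a hypothetical symmetric 5x5 matrix purely to show which computations PAM performs in one iteration: it evaluates every swap of a current medoid with a non-medoid (3 medoids x 2 non-medoids = 6 candidates here), computes the total cost of each candidate medoid set, and commits the best swap only if it lowers the current cost.

```python
import numpy as np

# Hypothetical 5x5 distance matrix for objects 1..5 (the original matrix was
# lost in the transcript): symmetric, zero diagonal.
D = np.array([
    [0, 2, 6, 7, 3],
    [2, 0, 5, 8, 4],
    [6, 5, 0, 2, 9],
    [7, 8, 2, 0, 5],
    [3, 4, 9, 5, 0],
], dtype=float)

def cost(medoids):
    """Total cost: each object is charged its distance to the nearest medoid."""
    return D[:, medoids].min(axis=1).sum()

current = [0, 2, 3]  # objects {1, 3, 4} in 0-based indexing
print("current cost:", cost(current))

# One PAM iteration: evaluate every (medoid, non-medoid) swap ...
best = (None, cost(current))
for m in current:
    for o in set(range(len(D))) - set(current):
        candidate = [o if x == m else x for x in current]
        c = cost(candidate)
        print(f"swap {m+1}<->{o+1}: medoids {[i+1 for i in candidate]}, cost {c}")
        if c < best[1]:
            best = (candidate, c)
# ... and keep the best swap only if it improves the total cost.
print("best after one iteration:", best)
```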

Christoph F. Eick Answers Review October 12, 2010 (cont.)

6. What are the characteristics of a border point in DBSCAN?
It is not a core point, but it is within the radius ε of one or more core points. (A sketch that classifies points as core, border, or noise appears below.)

7. If you increase the MinPts parameter of DBSCAN, how will this affect the clustering results?
There will be more outliers! It is hard to say whether the number of clusters will increase or decrease, because two effects interact: some clusters die (→ fewer clusters), while some other, bigger clusters are split into multiple smaller clusters (→ more clusters).

8. DBSCAN supports the notion of outliers. Why is this desirable?
a) More descriptive and compact clusters. b) No need to remove outliers prior to clustering.

9. DBSCAN has a complexity of O(n²), which can be reduced to O(n·log n) by using spatial index structures. Explain!
For each point in the dataset we have to decide whether it is a core point, which takes O(n) without supportive data structures; because there are n points in the dataset, we obtain O(n²). For each core point c we also have to compute all the points that are density-reachable from c, but this is O(n) or less. With a spatial index structure, each neighborhood query drops to roughly O(log n), which yields O(n·log n) overall.

10. How is region discovery in spatial datasets different from traditional clustering?
a) It supports plug-in fitness functions (a sketch of such an interface appears below). b) It finds clusters in the subspace of the spatial attributes, not in the complete attribute space!
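A small sketch tying answers 6 and 9 together (function and parameter names are ours): it classifies every point as core, border, or noise, and uses SciPy's k-d tree so that each ε-range query costs roughly O(log n) instead of the O(n) of a linear scan, which is exactly the O(n²) → O(n·log n) reduction from answer 9.

```python
import numpy as np
from scipy.spatial import cKDTree

def classify_points(X, eps, min_pts):
    """Label each point core/border/noise (the first phase of DBSCAN).

    The k-d tree answers each eps-range query in roughly O(log n) rather than
    the O(n) of a linear scan, so this phase is about O(n log n), not O(n^2).
    """
    tree = cKDTree(X)
    # Neighborhoods within radius eps (each point counts itself as a neighbor).
    neighborhoods = tree.query_ball_point(X, r=eps)
    core = [len(nb) >= min_pts for nb in neighborhoods]
    labels = []
    for i, nb in enumerate(neighborhoods):
        if core[i]:
            labels.append("core")
        elif any(core[j] for j in nb):
            # Not dense enough itself, but within eps of at least one core point.
            labels.append("border")
        else:
            labels.append("noise")  # reported by DBSCAN as an outlier
    return labels

# Usage sketch: labels = classify_points(np.random.rand(100, 2), eps=0.1, min_pts=5)
```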
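For answer 10a, here is a hypothetical sketch of what a plug-in fitness interface could look like (our illustration, not the actual interface of any of Eick's region-discovery frameworks): the search procedure takes the fitness function as a parameter, so a domain expert can swap in their own measure of interestingness without touching the search itself.

```python
from functools import partial
from typing import Callable, Dict, List, Sequence

Region = List[int]                       # indices of the objects in one region
Fitness = Callable[[Sequence[Region]], float]

def purity_fitness(regions: Sequence[Region], labels: Sequence[str]) -> float:
    """Example plug-in: reward regions dominated by a single class label."""
    total = 0.0
    for r in regions:
        counts: Dict[str, int] = {}
        for i in r:
            counts[labels[i]] = counts.get(labels[i], 0) + 1
        total += max(counts.values()) / len(r)   # purity of this region
    return total

def discover_regions(candidates: Sequence[Sequence[Region]],
                     fitness: Fitness) -> Sequence[Region]:
    """The search hard-wires no objective; it maximizes whatever is plugged in."""
    return max(candidates, key=fitness)

# Usage sketch: bind the domain data, then hand the fitness to the search.
# fitness = partial(purity_fitness, labels=class_labels)
# best = discover_regions(candidate_partitionings, fitness)
```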