TOP DM 10 Algorithms: C4.5 Research Issues

Presentation transcript:

TOP DM 10 Algorithms C4.5 Research Issue: Stable trees. It is well known that the error rate of a tree on the cases from which it was constructed (the resubstitution error rate) is much lower than the error rate on unseen cases (the predictive error rate). For example, on a well-known letter recognition dataset with 20,000 cases, the resubstitution error rate for C4.5 is 4%, but the error rate from a leave-one-out (20,000-fold) cross-validation is 11.7%. As this demonstrates, leaving out a single case from 20,000 often affects the tree that is constructed! Suppose now that we could develop a non-trivial tree-construction algorithm that was hardly ever affected by omitting a single case. For such stable trees, the resubstitution error rate should approximate the leave-one-out cross-validated error rate, suggesting that the tree is of the "right" size.
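To make the resubstitution vs. leave-one-out comparison concrete, here is a minimal sketch assuming Python with scikit-learn. The digits dataset and scikit-learn's CART-style DecisionTreeClassifier are stand-ins for the letter-recognition data and C4.5 mentioned on the slide, which are not reproduced here.

```python
# Sketch: comparing the resubstitution error with the leave-one-out
# cross-validated error of a decision tree. scikit-learn's CART-style tree
# is used here as a stand-in for C4.5.
from sklearn.datasets import load_digits
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)   # illustrative dataset, not the 20,000-case letter data

# Resubstitution error: train and evaluate on the same cases.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
resub_error = 1.0 - tree.score(X, y)

# Leave-one-out error: each case is predicted by a tree built without it
# (one fit per case, so this is expensive on large datasets).
loo_scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=LeaveOneOut())
loo_error = 1.0 - loo_scores.mean()

print(f"resubstitution error: {resub_error:.3f}")
print(f"leave-one-out error:  {loo_error:.3f}")
```

For an unpruned tree the resubstitution error is close to zero while the leave-one-out error is noticeably higher, which is exactly the instability the slide describes.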

TOP DM 10 Algorithms K-Means Research Issues
Local minima can be countered by running the algorithm multiple times with different seeds (see the sketch below).
Limitation: the hard assignment of points to clusters will falter if the spherical clusters are not well separated (see the next slide!).
Improvements: Fuzzy K-means / EM
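A minimal sketch of the multiple-seeds idea, assuming Python with scikit-learn; the synthetic data and the number of restarts are illustrative choices, not from the slide.

```python
# Sketch: countering local minima by restarting k-means with different seeds
# and keeping the solution with the lowest within-cluster sum of squares
# (inertia).
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

best = None
for seed in range(10):                                      # 10 different random initializations
    km = KMeans(n_clusters=4, n_init=1, random_state=seed).fit(X)
    if best is None or km.inertia_ < best.inertia_:
        best = km

print("best inertia:", best.inertia_)
# Equivalent shortcut: KMeans(n_clusters=4, n_init=10) performs the restarts internally.
```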

TOP DM 10 Algorithms K-Means Research Issues So, k-means will falter whenever the data is not well described by reasonably separated spherical balls, for example, if there are non-convex clusters in the data. This problem may be alleviated by rescaling the data to "whiten" it before clustering, or by using a distance measure that is more appropriate for the dataset. For example, information-theoretic clustering uses the KL-divergence to measure the distance between two data points. K-means can also be paired with another algorithm to describe non-convex clusters: one first clusters the data into a large number of groups using k-means, and these groups are then agglomerated into larger clusters using single-link hierarchical clustering, which can detect complex shapes (see the sketch below). This approach also makes the solution less sensitive to initialization, and since the hierarchical method provides results at multiple resolutions, one does not need to pre-specify k either.
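One way to realize this two-stage idea is sketched below, assuming Python with scikit-learn. The moons dataset, the number of intermediate groups, and the final cluster count are illustrative; merging the k-means centroids with single linkage is one simple variant of the agglomeration step described above.

```python
# Sketch: two-stage clustering for non-convex shapes. First over-cluster with
# k-means, then merge the resulting centroids with single-link agglomerative
# clustering.
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=1000, noise=0.05, random_state=0)

# Stage 1: many small, roughly spherical groups.
km = KMeans(n_clusters=50, n_init=10, random_state=0).fit(X)

# Stage 2: single-link merging of the group centroids can follow elongated,
# non-convex shapes that k-means alone cannot describe.
agg = AgglomerativeClustering(n_clusters=2, linkage="single").fit(km.cluster_centers_)

# Map each point to the cluster of its k-means centroid.
labels = agg.labels_[km.labels_]
print(np.bincount(labels))
```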

Top Ten DM Algorithms Continued
SVM: answer Q1-Q3 on page 10 of the article as homework.
Apriori
EM (a kind of generalization of K-means): uses a mixture of Gaussian distributions instead of centroids as cluster models. Basic loop (see the sketch below):
DO
  Create clusters (E-step)
  Update model parameters (M-step)
UNTIL there is no change
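A minimal sketch of that E-step/M-step loop for a two-component 1-D Gaussian mixture, assuming Python with NumPy; the synthetic data, starting values, and convergence tolerance are illustrative.

```python
# Sketch: the basic EM loop for a two-component 1-D Gaussian mixture,
# mirroring the "E-step / M-step until no change" loop on the slide.
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(5, 1, 300)])

mu = np.array([0.5, 4.0])          # component means
sigma = np.array([1.0, 1.0])       # component standard deviations
weights = np.array([0.5, 0.5])     # mixing weights

for _ in range(200):
    # E-step: soft-assign each point to the components (responsibilities).
    dens = weights * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    resp = dens / dens.sum(axis=1, keepdims=True)

    # M-step: update the model parameters from the soft assignments.
    nk = resp.sum(axis=0)
    new_mu = (resp * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (x[:, None] - new_mu) ** 2).sum(axis=0) / nk)
    weights = nk / len(x)

    if np.allclose(new_mu, mu, atol=1e-6):   # "until there is no change"
        mu = new_mu
        break
    mu = new_mu

print("means:", mu, "weights:", weights)
```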

Top Ten Continued
PageRank (determines the importance of web pages based on link structure): solves a large system of score equations. PageRank is a probability distribution used to represent the likelihood that a person randomly clicking on links will arrive at any particular page; a random walk over the link graph determines page importance (a power-iteration sketch follows below).
More information:
http://www.prchecker.info/check_page_rank.php
http://en.wikipedia.org/wiki/PageRank
http://infolab.stanford.edu/~backrub/google.html (original PageRank paper)
AdaBoost (ensemble approach)
k-NN (k-nearest neighbor)
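The following sketch, assuming Python with NumPy, illustrates the random-walk view of PageRank by power iteration. The toy four-page link graph, damping factor, and iteration count are illustrative, not taken from the original paper.

```python
# Sketch: PageRank by power iteration on a tiny link graph, illustrating the
# random-surfer interpretation.
import numpy as np

# links[i] = pages that page i links to (a toy 4-page web).
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
n = 4
d = 0.85                                   # damping factor

rank = np.full(n, 1.0 / n)                 # start from a uniform distribution
for _ in range(100):
    new_rank = np.full(n, (1 - d) / n)     # teleportation term
    for page, outlinks in links.items():
        for target in outlinks:
            new_rank[target] += d * rank[page] / len(outlinks)
    converged = np.allclose(new_rank, rank, atol=1e-10)
    rank = new_rank
    if converged:
        break

print(rank)                                # probability that a random surfer is on each page
```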

Top Ten Continued
Naïve Bayes (covered in the Machine Learning class)
CART: a recursive partitioning procedure that uses the Gini index. Similar to C4.5, but uses other techniques to obtain trees. There has been some newer work on forests of such trees recently (see the sketch below).
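A minimal sketch, assuming Python with scikit-learn, of a CART-style tree grown with the Gini criterion and a random forest built from such trees; the iris dataset and parameter values are illustrative.

```python
# Sketch: a CART-style tree (Gini criterion) and a random forest of such
# trees, evaluated with 5-fold cross-validation.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

cart = DecisionTreeClassifier(criterion="gini", random_state=0)    # recursive partitioning with Gini
forest = RandomForestClassifier(n_estimators=100, random_state=0)  # many CART-style trees combined

print("CART accuracy:  ", cross_val_score(cart, X, y, cv=5).mean())
print("Forest accuracy:", cross_val_score(forest, X, y, cv=5).mean())
```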