Dimensionality Reduction


Dimensionality Reduction
Some material from: Data Mining: Practical Machine Learning Tools and Techniques (Chapter 7)
Villanova University Machine Learning Project

Clustering
We know that clustering is a way to understand or examine our data, where we do the following:
- Collect examples
- Compute similarity among examples according to some metric
- Group examples together such that examples within a cluster are similar and examples in different clusters are different
- Summarize each cluster, and sometimes assign new instances to the most similar cluster

Some Typical Uses of Clustering
A technique demanded by many real-world tasks:
- Bank/Internet security: fraud/spam pattern discovery
- Biology: taxonomy of living things, such as kingdom, phylum, class, order, family, genus, and species
- City planning: identifying groups of houses according to house type, value, and geographical location
- Climate change: understanding the earth's climate; finding patterns in atmospheric and ocean data
- Finance: stock clustering analysis to uncover the correlation underlying shares
- Image compression/segmentation: grouping coherent pixels
- Information retrieval/organization: Google search, topic-based news
- Land use: identification of areas of similar land use in an earth observation database
- Marketing: helping marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs
- Social network mining: automatic discovery of special interest groups

Dimensionality Reduction
Clustering may also be used to reduce the number of attributes in a data set: dimensionality reduction. Why? A large number of attributes is typical, for instance, in:
- text mining
- image processing
- biology
For instance, the current UCI repository (http://archive.ics.uci.edu/ml/datasets.html) has 51 data sets with more than 100 attributes, all of which have more than 1000 instances.

Clustering can be carried out as a precursor to running, for instance, a kNN classifier. Reducing dimensionality can:
- improve performance in terms of speed and memory
- improve performance in terms of accuracy, especially for an algorithm such as Naive Bayes or kNN, which weights all variables equally

Why Would This Help Accuracy?
- Attributes are not related to the class: adding shirt color to weather.arff
- Attributes are highly correlated, not contributing independent information: temperature in Celsius and temperature in Fahrenheit
- Attribute/class relations are non-linear, or are best described as relations between two variables: don't play if it's cold AND rainy
- Attributes are sparse, with only a few values for any instance: any text sample!

Attribute Selection
The obvious way of reducing the number of attributes is to throw some out.
- We looked at a simple method for this in the exercises for section 17.2
- Weka supports more sophisticated methods in the Select Attributes tab, described in section 7.1
- Often useful, and often used (a code sketch follows)
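
For scripting rather than clicking through the Explorer, here is a minimal sketch using the Weka Java API with the Select Attributes tab's default evaluator and search (CfsSubsetEval with BestFirst); the file name is a placeholder.

```java
import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.BestFirst;
import weka.attributeSelection.CfsSubsetEval;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class SelectAttributesDemo {
    public static void main(String[] args) throws Exception {
        // Load a data set; the path is a placeholder.
        Instances data = DataSource.read("weather.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Same defaults as the Explorer's Select Attributes tab.
        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(new CfsSubsetEval());
        selector.setSearch(new BestFirst());
        selector.SelectAttributes(data);

        System.out.println(selector.toResultsString());
        // Keep only the selected attributes (plus the class).
        Instances reduced = selector.reduceDimensionality(data);
        System.out.println(reduced.numAttributes() + " attributes remain");
    }
}
```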

Clusters as Attributes
Cluster tools such as k-means group data by similarity, based on the attributes. The cluster models are basically weighted combinations of attributes. We can therefore treat cluster membership itself as an attribute; it captures variation in our other attributes.

Cluster Membership in Weka
In Weka we can use cluster membership as an attribute:
- In the Preprocess tab there is an unsupervised attribute filter called AddCluster
- Choose a cluster tool in the GenericObjectEditor window
- Apply it, and Cluster will appear as another attribute
- Examining the data (through the Edit... button) shows that cluster membership has been added to each instance
Note: look at your data first. Many of the attributes are not in fact numeric, although loading from a CSV file defaults them that way.
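
The same steps can be scripted against the Weka Java API. A minimal sketch, assuming a placeholder file name and cluster count; the class is left unset, and the last attribute (the eventual class) is excluded from clustering via the filter's ignore list.

```java
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.AddCluster;

public class AddClusterDemo {
    public static void main(String[] args) throws Exception {
        // Load the data; deliberately leave the class attribute unset.
        Instances data = DataSource.read("diabetes.csv");

        // k-means with a placeholder cluster count.
        SimpleKMeans kMeans = new SimpleKMeans();
        kMeans.setNumClusters(5);

        AddCluster addCluster = new AddCluster();
        addCluster.setClusterer(kMeans);
        // Exclude the last attribute (the class-to-be) from clustering.
        addCluster.setIgnoredAttributeIndices("last");
        addCluster.setInputFormat(data);

        Instances withCluster = Filter.useFilter(data, addCluster);
        // The new nominal "cluster" attribute is appended as the last attribute.
        System.out.println(withCluster.attribute(withCluster.numAttributes() - 1));
    }
}
```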

Cluster Membership in Weka
Now that you have cluster as an attribute, run classifiers as usual:
- Be sure to set the class. Weka defaults to the last attribute, and created attributes such as cluster will be last!
- You can consider removing attributes as usual.
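
Continuing the sketch above: once the cluster attribute has been appended, set the class explicitly and evaluate. A minimal sketch, assuming the class was originally the last attribute (so it is now second-to-last).

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;

public class EvaluateWithCluster {
    /** Cross-validates Naive Bayes on data with the cluster attribute appended. */
    public static double accuracy(Instances withCluster) throws Exception {
        // The appended cluster attribute is last, so the original class
        // (previously last) is now second-to-last: set it explicitly.
        withCluster.setClassIndex(withCluster.numAttributes() - 2);

        Evaluation eval = new Evaluation(withCluster);
        eval.crossValidateModel(new NaiveBayes(), withCluster, 10, new Random(1));
        return eval.pctCorrect(); // percentage of correctly classified instances
    }
}
```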

Dimensionality Reduction Example
Diabetes file from UCI: 101766 instances, 50 attributes.
- Load it (as a CSV file)
- Run NaiveBayes (once!). Accuracy: 56%
- Add the k-means cluster filter and remove the other inputs (a sketch of the removal step follows)
- Run NB again (reset the class!). Accuracy: 54%
So we have replaced 49 attributes with 1, with only a slight loss in accuracy. We can explore adding some of the others back in to see whether accuracy can be improved.
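
For the "remove the other inputs" step, a hedged sketch using Weka's Remove filter to keep only the cluster attribute and the class; the index arithmetic assumes the class was originally last and the cluster attribute was appended after it.

```java
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class KeepClusterOnly {
    /** Keeps only the class and the appended cluster attribute. */
    public static Instances keepClusterAndClass(Instances withCluster) throws Exception {
        Remove remove = new Remove();
        // Select the last two attributes (0-based indices): the original
        // class and the appended cluster attribute...
        remove.setAttributeIndicesArray(new int[] {
            withCluster.numAttributes() - 2, withCluster.numAttributes() - 1 });
        // ...and invert the selection so those two are kept, not removed.
        remove.setInvertSelection(true);
        remove.setInputFormat(withCluster);
        return Filter.useFilter(withCluster, remove);
    }
}
```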

Attribute Selection
Note that the most effective attribute selection often comes from knowledge of the domain. For instance, the first attribute in the diabetes data file is Encounter ID; the second is Patient ID. These can probably be discarded immediately.

Attribute or Feature Extraction
Suppose we want to use whatever information we can get from each attribute. Can we map the values to a smaller, equivalent set of attributes? Sounds familiar?
- In regression classifiers we have predicted the class based on a weighted combination of attributes
- In SVMs we have used a kernel to map our inputs to non-linear forms

Unsupervised Dimension Reduction
Regression and SVMs are both supervised. Can we apply similar concepts to map or project our data without a class to define the output? Clustering looks at how close instances are, based on the distance between attributes. We can use a comparable metric for our unsupervised reduction: the criterion is a reduction in the predicted error for our existing points.

Consider These Data Points
Clearly we can represent these points with complete accuracy using two attributes.

But...
With a single value, the position along the green line, we can capture most of the variation in the values.

Projecting!
So we can reduce our attributes from two to one by a transformation that captures most of the variation. The total of the yellow lines is the error; we choose the green line to minimize it.

Unsupervised Dimension Reduction
A common use of unsupervised learning is to remap our inputs into a smaller number of variables. The most common method is principal component analysis (PCA):
- a common statistical technique
- closely related to (and sometimes loosely called) factor analysis
The goal of PCA is to project a large number of attributes or dimensions into a smaller space.

Principal Component Analysis
A method for identifying the important "directions" in the data; the data can then be rotated into a (reduced) coordinate system given by those directions.
Algorithm:
- Find the direction (axis) of greatest variance
- Find the direction of greatest variance that is perpendicular to the previous direction, and repeat
Implementation: find the eigenvectors of the covariance matrix by diagonalization. The eigenvectors (sorted by eigenvalue) are the directions. Note that "perpendicular" gets a little strange when we are talking about more than three dimensions.
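
In symbols, using the standard formulation (not taken from the slides themselves): let X be the centered n-by-d data matrix. Then:

```latex
% Covariance matrix and its eigen-decomposition:
\[
  \Sigma = \frac{1}{n-1} X^{\top} X, \qquad
  \Sigma v_i = \lambda_i v_i, \qquad
  \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d .
\]
% Projecting onto the top k eigenvectors gives the reduced data:
\[
  Z = X V_k, \qquad V_k = [\, v_1 \; v_2 \; \cdots \; v_k \,],
\]
% and the proportion of variance covered by the first k components is
\[
  \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{d} \lambda_i}.
\]
```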

Example: 10-Dimensional Data
- We can transform the data into the space given by the components
- Data is normally standardized for PCA
- This could also be applied recursively in a tree learner
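
Standardization itself is a one-line filter in Weka; a minimal sketch follows. (Weka's PrincipalComponents evaluator appears to standardize by default unless told only to center the data; check its centerData option, so this explicit step may be redundant.)

```java
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Standardize;

public class StandardizeDemo {
    /** Standardizes all numeric attributes to zero mean and unit variance. */
    public static Instances standardize(Instances data) throws Exception {
        Standardize standardize = new Standardize();
        standardize.setInputFormat(data);
        return Filter.useFilter(data, standardize);
    }
}
```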

Unsupervised Attribute Selection
The Select Attributes tab in Weka lets you apply principal component analysis automatically:
- Choose PrincipalComponents as the Attribute Evaluator
- The Search Method must be Ranker; let it choose automatically
- The amount of variance to cover defaults to 95%
- The class can be set to No class; otherwise, whatever attribute is considered the class will be omitted
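
The same PCA run can be scripted; a minimal sketch, with the file name as a placeholder and the 95% variance default made explicit.

```java
import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.PrincipalComponents;
import weka.attributeSelection.Ranker;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class PcaDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder file; no class attribute is set ("No class").
        Instances data = DataSource.read("diabetes.csv");

        PrincipalComponents pca = new PrincipalComponents();
        pca.setVarianceCovered(0.95);   // the 95% default, made explicit

        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(pca);
        selector.setSearch(new Ranker()); // PCA requires the Ranker search
        selector.SelectAttributes(data);

        // Programmatic equivalent of "Save transformed data..." in the Explorer:
        Instances transformed = selector.reduceDimensionality(data);
        System.out.println(transformed.numAttributes() + " components retained");
    }
}
```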

Results:
- Right-click on the result list to get "Save transformed data...". This can now be fed to another algorithm.
- The output window shows the results, including the proportion of variance accounted for and the transformations.
- It can be very slow: the diabetes example on my (old) laptop did not finish overnight.

Discussion
- Reduces classifier time, though the preprocessing itself takes time
- In theory it should not change accuracy for methods which differentially weight attributes, such as J48 or regression methods
- In practice...