Dimension Reduction and Feature Selection
Craig A. Struble, Ph.D.
Department of Mathematics, Statistics, and Computer Science, Marquette University

Overview
- Dimension Reduction
  - Correlation
  - Principal Component Analysis
  - Singular Value Decomposition
- Feature Selection
  - Information Content
  - …

Dimension Reduction
- The number of attributes causes the complexity of learning, clustering, etc. to grow exponentially: the "curse of dimensionality".
- We need methods to reduce the number of attributes.
- Dimension reduction reduces the number of attributes without (directly) considering the relevance of each attribute: attributes are not really removed, but combined/recast into new ones.

Correlation
- A causal, complementary, parallel, or reciprocal relationship.
- The simultaneous change in value of two numerically valued random variables.
- So, if one attribute's value changes in a predictable way whenever another one changes, why keep them both?

Correlation Analysis
- Pearson's correlation coefficient r_{A,B} (defined below).
- A positive value means both attributes increase together; a negative value means one increases as the other decreases.
- If r_{A,B} has a large magnitude, A and B are strongly correlated and one of the two attributes can be removed.
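The coefficient itself appears to have been an image on the slide and is missing from the transcript; the standard Pearson definition for attributes A and B over n instances is

  r_{A,B} = Σ_{i=1..n} (a_i − mean(A)) (b_i − mean(B)) / ((n − 1) σ_A σ_B)

where σ_A and σ_B are the standard deviations of A and B, and r_{A,B} always lies in [−1, 1].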

Correlation Analysis: Example
- X (years of experience) plotted against Y (salary in $1000s) shows a strong relationship.
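A small R illustration of the idea (the experience/salary numbers here are made up, since the slide's table did not survive the transcript):

> x <- c(1, 3, 4, 6, 8, 10, 12, 15)        # years of experience (illustrative)
> y <- c(30, 38, 45, 54, 62, 73, 80, 95)   # salary in $1000s (illustrative)
> cor(x, y)                                # Pearson's r; a value near 1 or -1
>                                          # suggests one attribute is redundant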

Principal Component Analysis
- Also known as the Karhunen-Loeve (K-L) method.
- Combines the "essence" of the attributes to create a (hopefully) smaller set of variables that describe the data.
- An instance with k attributes is a point in k-dimensional space.
- Find c k-dimensional orthogonal vectors that best represent the data, with c <= k.
- These vectors are combinations of the original attributes.

Principal Component Analysis
- Normalize the data.
- Compute c orthonormal vectors, the principal components.
- Sort them in order of decreasing "significance", measured in terms of the data variance they capture.
- The data dimension can then be reduced by keeping only the most significant principal components (see the R sketch below).
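A minimal PCA sketch in R following the steps above; `dat` is a stand-in numeric data set, not anything from the slides:

> dat <- as.data.frame(matrix(rnorm(100 * 5), ncol = 5))  # stand-in data, 100 x 5
> pca <- prcomp(dat, center = TRUE, scale. = TRUE)        # normalize, then compute PCs
> summary(pca)            # proportion of variance captured by each component
> reduced <- pca$x[, 1:2] # keep only the most significant components (here, 2)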

Singular Value Decomposition
- One way to carry out PCA.
- Let A be an m by n matrix. Then A can be written as the product A = U Σ V^T, where U is an m by n matrix, V is an n by n matrix, and Σ is an n by n diagonal matrix with singular values σ_1 >= σ_2 >= … >= σ_n >= 0. Furthermore, U has orthonormal columns and V is an orthogonal matrix.

Singular Value Decomposition

Singular Value Decomposition (in R)
> x <- t(array(1:12, dim = c(3, 4)))   # a 4 x 3 example matrix
> str(s <- svd(x))                     # compute the SVD; s$u, s$d (singular values), s$v
> a <- diag(s$d)                       # the singular values as a diagonal matrix
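As a quick check (not on the original slide), the three factors returned by svd() multiply back to x:

> s$u %*% diag(s$d) %*% t(s$v)               # reconstructs x
> all.equal(x, s$u %*% diag(s$d) %*% t(s$v)) # TRUE, up to floating-point error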

Singular Value Decomposition
- The amount of variance captured by each singular value, and the entropy of the data set, can be computed directly from the singular values (the slide's formulas are reconstructed below).
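The formulas did not survive the transcript; the usual definitions in SVD-based analysis, and presumably what the slide showed, are f_i = σ_i^2 / Σ_j σ_j^2 for the fraction of variance captured by the i-th singular value, and E = −(1 / log n) Σ_i f_i log f_i for the normalized entropy, which ranges from 0 (one dominant component) to 1 (all components equally important). Continuing the R session above:

> f <- s$d^2 / sum(s$d^2)                  # variance fraction per singular value
> E <- -sum(f * log(f)) / log(length(f))   # normalized entropy in [0, 1];
>                                          # assumes all f > 0 (drop exact zeros first)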

Feature Selection
- Select the most "relevant" subset of attributes.
- Wrapper approach: features are selected as part of the mining algorithm.
- Filter approach: features are selected before the mining algorithm is run.
- The wrapper approach is generally more accurate but also more computationally expensive.

Feature Selection
- Feature selection is really a search problem: we want the subset of features that gives the most accurate model.
- For attributes {a, b, c}, the search space is the lattice of subsets: {a,b,c}; {a,b}, {a,c}, {b,c}; {a}, {b}, {c}; and the empty set. In general there are 2^k subsets of k attributes.

Feature Selection
- Any search heuristic will work: branch and bound, "best-first" or A*, genetic algorithms, etc.
- The bigger problem is estimating the relevance of attributes without building the classifier.

Feature Selection
- Using entropy: calculate the information gain of each attribute and select the l attributes with the highest information gain (an R sketch follows).
- This removes attributes whose value is the same for all data instances, since such attributes have zero information gain.
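A filter-style sketch in R of ranking attributes by information gain; `train` and its `class` column are illustrative names, and both the attributes and the class are assumed to be categorical:

entropy <- function(y) {
  p <- table(y) / length(y)
  -sum(ifelse(p > 0, p * log2(p), 0))
}
info_gain <- function(x, y) {
  # H(class) minus the expected class entropy within each value of attribute x
  cond <- sum(sapply(split(y, x), function(g) (length(g) / length(y)) * entropy(g)))
  entropy(y) - cond
}
gains <- sapply(setdiff(names(train), "class"),
                function(a) info_gain(train[[a]], train$class))
sort(gains, decreasing = TRUE)   # keep the l highest-gain attributes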

Feature Selection
- Stepwise forward selection: start with an empty attribute set, add the "best" attribute, then repeatedly add the "best" of the remaining attributes; take the top l (a greedy sketch follows).
- Stepwise backward selection: start with the entire attribute set and repeatedly remove the "worst" attribute until only l are left.
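A greedy forward-selection sketch in R; score() stands for any evaluation of an attribute subset (e.g. cross-validated accuracy of a model built on it) and is a placeholder, not something defined on the slides:

forward_select <- function(attrs, l, score) {
  selected <- character(0)
  while (length(selected) < l) {
    remaining <- setdiff(attrs, selected)
    # score each candidate attribute when added to the current subset
    cand <- sapply(remaining, function(a) score(c(selected, a)))
    selected <- c(selected, remaining[which.max(cand)])
  }
  selected
}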

Feature Selection
- Other methods:
  - Sample the data, then build models on subsets of the data and attributes to estimate accuracy.
  - Select the attributes with the most (or least) variance.
  - Select the attributes most highly correlated with the goal attribute.
- What does feature selection provide?
  - Reduced data size.
  - An analysis of the "most important" pieces of information to collect.