COMP527: Data Mining
Attribute Selection
M. Sulaiman Khan (mskhan@liv.ac.uk)
Dept. of Computer Science, University of Liverpool
March 4, 2009

COMP527: Data Mining Course Outline: Introduction to the Course; Introduction to Data Mining; Introduction to Text Mining; General Data Mining Issues; Data Warehousing; Classification: Challenges, Basics; Classification: Rules; Classification: Trees; Classification: Trees 2; Classification: Bayes; Classification: Neural Networks; Classification: SVM; Classification: Evaluation; Classification: Evaluation 2; Regression, Prediction; Input Preprocessing; Attribute Selection (this lecture); Association Rule Mining; ARM: A Priori and Data Structures; ARM: Improvements; ARM: Advanced Techniques; Clustering: Challenges, Basics; Clustering: Improvements; Clustering: Advanced Algorithms; Hybrid Approaches; Graph Mining, Web Mining; Text Mining: Challenges, Basics; Text Mining: Text-as-Data; Text Mining: Text-as-Language; Revision for Exam.

Today's Topics
Sampling
Dimensionality Reduction
Genetic Algorithms
Compression
Principal Component Analysis

Instance Selection
Before getting to the data mining itself, we may want to remove instances or select only a portion of the complete data set to work with. Why? Perhaps our algorithms don't scale well to the amount of data we have.
Partitioning: Split the database into sections and work with each in turn. Often not appropriate unless the algorithm is designed to do it.
Sampling: Select a random subset of the data, which is hopefully representative, and work with that.

Sampling
Simple Random Sample without Replacement: Draw n instances, with the same probability of drawing each instance. Once an instance is drawn, it is removed from the pool.
Simple Random Sample with Replacement: As above, but the instance is returned to the pool so that it can be drawn again.
Advantages: Cost is proportional to the number of items in the sample, not the entire data set. However, it's random and simple: we might randomly get a very non-representative sample.
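As a minimal illustration (not from the slides), both schemes can be sketched in a few lines of Python; the toy data and function names are just placeholders:

```python
import random

def sample_without_replacement(instances, n, seed=None):
    # Each instance can be drawn at most once; equal probability per draw.
    rng = random.Random(seed)
    return rng.sample(instances, n)

def sample_with_replacement(instances, n, seed=None):
    # The drawn instance stays in the pool, so it may appear more than once.
    rng = random.Random(seed)
    return [rng.choice(instances) for _ in range(n)]

# Toy usage: draw 3 of 10 instances either way.
data = [{"id": i, "x": i * 0.5} for i in range(10)]
print(sample_without_replacement(data, 3, seed=1))
print(sample_with_replacement(data, 3, seed=1))
```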

Sampling
Sample from Clusters: Cluster the data first into k clusters, then draw a random sample from each cluster, with or without replacement. If the clustering algorithm performs well, the sample is much more likely to be representative. However, clustering is expensive, possibly more expensive than just using the entire data set.
Stratified Sample: Group the instances according to some attribute value (e.g. class) and draw a random sample from each layer; this can be thought of as a naive form of clustering.
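A stratified sample is easy to sketch: group by class, then sample the same fraction from each group. The helper below is illustrative only; the `class_of` accessor and the 90/10 toy data are assumptions, not from the slides:

```python
import random
from collections import defaultdict

def stratified_sample(instances, class_of, frac, seed=None):
    # Group instances by their class label (the 'layers'), then draw the
    # same fraction from every layer, without replacement within a layer.
    rng = random.Random(seed)
    layers = defaultdict(list)
    for inst in instances:
        layers[class_of(inst)].append(inst)
    sample = []
    for members in layers.values():
        k = max(1, round(frac * len(members)))
        sample.extend(rng.sample(members, k))
    return sample

# Toy usage: 90 'yes' / 10 'no' instances, keep roughly 10% of each class.
data = [("yes", i) for i in range(90)] + [("no", i) for i in range(10)]
subset = stratified_sample(data, class_of=lambda inst: inst[0], frac=0.1, seed=1)
```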

Outliers / Noise
It may be tempting to remove all of the noisy data so that the system can learn a cleaner model for classification. However, if the data to be classified is also noisy, a classifier trained on pure data will perform worse than one trained with noise similar to that in the test set. Successful classifiers have some tolerance for noise in the training set, whereas if the noise is removed, the model might overfit.

Dimensionality Reduction
Results can often be improved by reducing the number of attributes (dimensions) in an instance. In particular, removing redundant, noisy, or otherwise unhelpful attributes can improve both speed and accuracy. But we need to know which attributes to keep. Two approaches:
Filter: Remove attributes first by looking only at the data.
Wrapper: A learning algorithm is used to determine importance, e.g. the accuracy of the resulting classifier determines the 'goodness' of each attribute.

Dimensionality Reduction
Either way, we need to search through the combinations of attributes to find the best set. We can't examine all of the combinations, so we need a strategy:
Stepwise Forward Selection: Repeatedly find the best remaining attribute and add it.
Stepwise Backward Elimination: Repeatedly find the worst attribute and remove it.
Genetic Algorithms: Use a 'survival of the fittest' approach along with random cross-breeding.
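A rough sketch of stepwise forward selection as a wrapper; `evaluate(subset)` is a placeholder for whatever accuracy estimate is used (e.g. cross-validation of a classifier trained on that subset), and attributes are assumed to be named columns. Backward elimination is the mirror image, starting from all attributes and removing the worst:

```python
def forward_selection(attributes, evaluate):
    # Greedy wrapper search: add the attribute that most improves accuracy,
    # stop when no single addition improves on the current best score.
    selected = []
    best_score = evaluate(selected)
    remaining = list(attributes)
    while remaining:
        score, attr = max((evaluate(selected + [a]), a) for a in remaining)
        if score <= best_score:
            break                      # no improvement: stop at the (local) maximum
        selected.append(attr)
        remaining.remove(attr)
        best_score = score
    return selected, best_score
```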

Dimensionality Reduction
The problem is now how to evaluate the usefulness of each attribute, and when to stop adding or removing attributes. Some possibilities:
Entropy or a similar function (Filter).
Evaluate a classifier built using the current set plus/minus each attribute in turn, and add/remove the best/worst (Wrapper). Very computationally expensive, but can be combined with sampling.
Stopping is a hill-climbing exercise again: stop at a (local) maximum point, e.g. where adding/removing attributes no longer improves performance.
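For the filter option, an entropy-based score such as information gain can be computed directly from the data, with no classifier involved. A sketch for nominal attributes; the dict-per-instance representation is an assumption for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a list of class labels.
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(instances, labels, attr):
    # Reduction in class entropy obtained by splitting on a nominal attribute.
    groups = {}
    for inst, label in zip(instances, labels):
        groups.setdefault(inst[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def top_k_attributes(instances, labels, attrs, k):
    # Filter: rank attributes by gain and keep the k best.
    return sorted(attrs, key=lambda a: information_gain(instances, labels, a),
                  reverse=True)[:k]
```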

Genetic Algorithms
Main idea: Simulate the evolutionary process in nature, whereby the fit survive and inter-breed with others, with some random mutation. Select the fittest individuals to reproduce, let them bear offspring, and iterate. E.g. the cycle: Initial Population -> Evaluate -> Select -> Cross-over -> Mutate -> (repeat).

Genetic Algorithms
How does this help? We want to find the best set of attributes by examining sub-optimal sets:
1. Start with a set of random attribute subsets.
2. Evaluate each by looking at classifier accuracy.
3. Select the best sets.
4. Take attributes from those sets and cross them over with attributes from the other fittest sets.
5. Allow for some random mutations.
6. Go to 2, until the termination conditions are met.
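In code, steps 1-6 might look roughly like the sketch below, where a subset is a bit-string over the attributes and `fitness(mask)` is assumed to return classifier accuracy using only the attributes switched on in the mask (all parameter values are arbitrary):

```python
import random

def genetic_attribute_selection(n_attrs, fitness, pop_size=20, generations=30,
                                mutation_rate=0.02, seed=None):
    rng = random.Random(seed)
    # 1. Start with a set of random attribute subsets (bit-strings).
    pop = [[rng.randint(0, 1) for _ in range(n_attrs)] for _ in range(pop_size)]
    for _ in range(generations):                          # 6. iterate
        scored = sorted(pop, key=fitness, reverse=True)   # 2. evaluate by accuracy
        parents = scored[:pop_size // 2]                  # 3. select the best sets
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_attrs)               # 4. one-point cross-over
            child = a[:cut] + b[cut:]
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]                    # 5. random mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)
```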

Relief
Select a random instance from the data. Locate its nearest neighbour from its own class and from the opposite class (for a two-class dataset). Compare each attribute of the instance to these neighbours and update an overall relevance score for the attribute, based on its similarity to the neighbour of the same class and its dissimilarity to the neighbour of the other class.
Repeat a given number of times, rank the attributes by their relevance score, and cut at a given threshold. ReliefF is a variant designed to work with more than two classes.
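A sketch of the basic two-class Relief procedure; it assumes numeric attributes already scaled to [0, 1] so that per-attribute differences are comparable (that scaling step is an assumption, not something the slide spells out):

```python
import random

def relief(X, y, n_iterations, seed=None):
    # X: list of numeric attribute vectors, y: class labels (two classes).
    rng = random.Random(seed)
    n_attrs = len(X[0])
    weights = [0.0] * n_attrs

    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    for _ in range(n_iterations):
        i = rng.randrange(len(X))
        same = [j for j in range(len(X)) if j != i and y[j] == y[i]]
        other = [j for j in range(len(X)) if y[j] != y[i]]
        hit = min(same, key=lambda j: dist(X[i], X[j]))    # nearest neighbour, same class
        miss = min(other, key=lambda j: dist(X[i], X[j]))  # nearest neighbour, other class
        for a in range(n_attrs):
            # Reward attributes that agree with the hit and differ from the miss.
            weights[a] += abs(X[i][a] - X[miss][a]) - abs(X[i][a] - X[hit][a])
    # Rank attributes by weight and cut at a chosen threshold.
    return weights
```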

Other Ideas
Learn a decision tree, then use only the attributes that appear in the tree after pruning. This has no effect when building another tree, but the selected attributes can be given to a different algorithm.
Use the minimal subset of attributes that allows unique identification of each instance. (Not always possible, and can easily overfit.)
Cluster attributes as if they were instances and remove outliers (useful when there are a LOT of attributes), i.e. clustering vertically rather than horizontally.
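The decision-tree idea is straightforward to try with an off-the-shelf library; the snippet below uses scikit-learn purely as an illustration (the library choice, and the depth limit used as a crude stand-in for pruning, are assumptions rather than part of the slide):

```python
from sklearn.tree import DecisionTreeClassifier

def attributes_used_by_tree(X, y, max_depth=4):
    # Fit a small (implicitly pruned) tree, then keep only the attributes
    # it actually splits on; these can be handed to a different learner.
    tree = DecisionTreeClassifier(max_depth=max_depth).fit(X, y)
    return [i for i, imp in enumerate(tree.feature_importances_) if imp > 0]
```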

Other Searching Methods
Race Search: Rather than checking the accuracy many times for many different sets of attributes, we can run a race in which the attributes that lag behind are dropped.
Schemata Search: A series of races to determine whether each attribute should be dropped. Alternatively, generate an initial ordering (e.g. by entropy) and then race using these initial weights.

Compression
We could use a lossy compression technique to remove attributes that are not considered important enough to keep. Consider a 100% quality JPEG and an 85% quality JPEG: they look very similar, but some unnecessary information has been lost. That is the sort of thing we want to do with attributes.
Techniques:
DWT: Discrete Wavelet Transform
DFT: Discrete Fourier Transform
PCA: Principal Component Analysis

Discrete Wavelet Transform
(Still not going over the exact details of the signal processing.)
The DWT transforms a vector of attribute values into a different vector of 'wavelet coefficients' of the same length as the original. This transformed vector can be truncated at a certain threshold: any values below the threshold are set to 0. The remaining data is then an approximation of the original, in the transformed space. We can reverse the transformation to return to the original attributes, minus the information lost in the truncation.
There are many different DWTs, grouped into families (e.g. Haar and Daubechies).
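The Haar transform is the simplest member of the family: repeated pairwise averaging and differencing. A sketch, assuming the vector length is a power of two; the thresholding helper is the lossy truncation step described above:

```python
def haar_dwt(values):
    # One full Haar decomposition: the smooth (average) part is refined
    # level by level, and the detail (difference) coefficients are kept.
    coeffs = []
    current = list(values)
    while len(current) > 1:
        averages = [(a + b) / 2 for a, b in zip(current[0::2], current[1::2])]
        details = [(a - b) / 2 for a, b in zip(current[0::2], current[1::2])]
        coeffs = details + coeffs
        current = averages
    return current + coeffs          # same length as the input

def truncate(coeffs, threshold):
    # The lossy step: coefficients below the threshold are set to 0.
    return [c if abs(c) >= threshold else 0.0 for c in coeffs]
```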

Discrete Fourier Transform
Another signal processing technique, this time using sines and cosines. Fourier's theory is that any signal can be generated by adding up the correct sine waves. The Fourier Transform is an equation for calculating the frequency, amplitude and phase of each sine wave needed. The Discrete Fourier Transform is the same thing using sums instead of integrals.
However, DWTs are more efficient: the DFT needs more coefficients for the same quality of approximation (and hence retains more attributes).
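With NumPy the same truncation idea looks like this for the DFT; the signal values and the threshold of 1.0 are arbitrary examples:

```python
import numpy as np

signal = np.array([2.0, 2.5, 3.0, 2.8, 2.6, 3.1, 3.4, 3.2])   # one attribute vector
coeffs = np.fft.rfft(signal)                  # Fourier coefficients (complex)
coeffs[np.abs(coeffs) < 1.0] = 0              # drop coefficients below a threshold
approx = np.fft.irfft(coeffs, n=len(signal))  # reconstruct an approximation
```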

Principal Component Analysis
Not a signal processing technique! Idea: Just because the dataset is recorded on particular dimensional axes doesn't mean those are the best axes to use. Find the best axes, in order, and drop the least important ones.
[Figures: the dataset plotted on its original axes vs. the same dataset on the revised axes]

Principal Component Analysis
Place the first axis in the direction of the greatest variance. Then continue to place axes in decreasing order of variance, such that each is orthogonal to all of the previous axes. Not too complicated for a computer program to do (but too complicated to explain here, especially in N dimensions at once!). The variance captured by each axis can be graphed...
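A compact sketch of PCA via the covariance matrix; placing axes in order of variance corresponds to sorting the eigenvectors by their eigenvalues (NumPy is assumed, and the function name is illustrative):

```python
import numpy as np

def pca(X, n_components):
    X = np.asarray(X, dtype=float)
    centred = X - X.mean(axis=0)                  # centre the data first
    cov = np.cov(centred, rowvar=False)           # covariance between attributes
    eigvals, eigvecs = np.linalg.eigh(cov)        # orthogonal axes + their variances
    order = np.argsort(eigvals)[::-1]             # greatest variance first
    axes = eigvecs[:, order[:n_components]]
    explained = eigvals[order] / eigvals.sum()    # fraction of variance per axis
    return centred @ axes, explained              # projected data, variance profile
```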

Principal Component Analysis
[Figure: variance per principal component; the first 3 axes make up 84% of the variance!]

Further Reading
Benchmarking Attribute Selection Techniques: http://citeseer.ist.psu.edu/382752.html
Witten 5.7-5.10, 7.3
Han Chapter 2