Data Mining: Concepts and Techniques — Chapter 3 — Cont.


Data Mining: Concepts and Techniques — Chapter 3 — Cont.
More on Feature Selection: Chi-squared Test and Principal Component Analysis

Attribute Selection
[Table: the weather data set with attributes ID, Outlook, Temperature, Humidity, Windy, Play.]
Question: Are attributes A1 and A2 independent?
- If A1 and A2 are strongly dependent on each other, we can remove either one of them (it is redundant).
- If A1 is independent of the class attribute A2, we can remove A1 from the training data (it carries no information about the class).

Deciding to Remove Attributes in Feature Selection
Test each candidate attribute A1 against the class attribute A2 with the chi-squared statistic:
- Dependent (large chi-squared): A1 is informative about the class, so keep it.
- Independent (small chi-squared): A1 tells us nothing about the class, so it can be removed.
(A small scoring sketch with scikit-learn follows.)
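The same scoring can be done programmatically; a minimal sketch with scikit-learn's chi2 scorer (not part of the original slides; the one-hot encoding and the 0.05 threshold are illustrative assumptions):

    # Hypothetical sketch: score nominal attributes against the class with chi-squared.
    # Assumes the nominal attributes have been one-hot encoded into non-negative counts.
    import numpy as np
    from sklearn.feature_selection import chi2

    X = np.array([[1, 0, 1, 0],    # toy encoded records, not the real weather table
                  [1, 0, 0, 1],
                  [0, 1, 0, 1]])
    y = np.array([0, 1, 1])        # class attribute (e.g. play = no/yes)

    scores, p_values = chi2(X, y)
    keep = [j for j, p in enumerate(p_values) if p < 0.05]   # keep class-dependent attributes
    print(scores, p_values, keep)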

Chi-Squared Test (cont.)
Question: Are attributes A1 and A2 independent?
- These features are nominal (discrete) valued.
- Null hypothesis: the attributes are independent.
Example attributes: Outlook (Sunny, Cloudy) and Temperature (High, Low).

The Weather example: Observed Counts

                    Temperature
    Outlook         High    Low     Outlook subtotal
    Sunny             2      0            2
    Cloudy            0      1            1
    Temperature
    subtotal          2      1            3  (total count in table)

The Weather example: Expected Counts
If the attributes were independent, each expected cell count would be (row total * column total / grand total):

                    Temperature
    Outlook         High              Low               Outlook subtotal
    Sunny           2*2/3 = 1.33      2*1/3 = 0.67            2
    Cloudy          1*2/3 = 0.67      1*1/3 = 0.33            1
    Temperature
    subtotal        2                 1                       3  (total count in table)

How different are the observed counts from the expected counts?
- If the chi-squared value is very large, then A1 and A2 are not independent, that is, they are dependent.
- Degrees of freedom: for an n-by-m table, df = (n-1)*(m-1). In our example, df = (2-1)*(2-1) = 1.
- Chi-squared = ? (the formula and a worked check follow)
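The statistic itself is the standard chi-squared sum over all cells (not spelled out on the slide):

    chi-squared = sum over cells of (Observed - Expected)^2 / Expected

A worked check, assuming the observed counts reconstructed above (Sunny-High = 2, Cloudy-Low = 1, other cells 0):

    (2 - 1.33)^2/1.33 + (0 - 0.67)^2/0.67 + (0 - 0.67)^2/0.67 + (1 - 0.33)^2/0.33
      = 0.33 + 0.67 + 0.67 + 1.33
      = 3.0

With 1 degree of freedom this is below the 0.05 critical value of 3.84 (see the next slides), so this tiny sample alone does not let us reject independence.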

Chi-Squared Table: what does it mean?
- If the calculated value is much greater than the tabulated critical value, you have reason to reject the independence assumption.
- When your calculated chi-squared value exceeds the value in the 0.05 column of the table (3.84 for 1 degree of freedom), you can conclude at the 95% confidence level that the attributes are dependent; i.e., there is only a 5% probability that a chi-squared value this large would occur by chance if the attributes were truly independent.

Example Revisited (http://helios.bto.ed.ac.uk/bto/statistics/tress9.html)
The count table does not have to be two-dimensional (a contingency table); a single row of category counts works too.
Suppose the ratio of male to female students in the Science Faculty is exactly 1:1, but in the Honours class over the past ten years there have been 80 females and 40 males.
Question: Is this a significant departure from the 1:1 expectation?

    Observed (Honours):   Male 40    Female 80    Total 120

Expected (http://helios.bto.ed.ac.uk/bto/statistics/tress9.html)
Note: the expected counts are filled in directly from the 1:1 expectation rather than calculated from marginal totals.

    Expected (Honours):   Male 60    Female 60    Total 120

Chi-Squared Calculation

                             Female    Male     Total
    Observed numbers (O)       80       40       120
    Expected numbers (E)       60       60       120
    O - E                      20      -20
    (O - E)^2                 400      400
    (O - E)^2 / E             6.67     6.67     Sum = 13.34 = X^2
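As a quick check, the same calculation in Python with SciPy (a sketch, not part of the original slides):

    # Chi-squared goodness-of-fit test for the Honours example.
    from scipy.stats import chi2, chisquare

    observed = [80, 40]            # female, male
    expected = [60, 60]            # from the 1:1 expectation

    stat, p_value = chisquare(f_obs=observed, f_exp=expected)
    critical = chi2.ppf(0.95, df=1)    # 0.05 critical value for 1 degree of freedom

    print(stat)        # about 13.33
    print(p_value)     # well below 0.05
    print(critical)    # about 3.84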

Chi-Squared Test (cont.)
Then check the chi-squared table for significance:
http://helios.bto.ed.ac.uk/bto/statistics/table2.html#Chi%20squared%20test
- Compare our X^2 value with the chi-squared value in a table with n-1 degrees of freedom, where n is the number of categories (2 in our case: males and females), so we have one degree of freedom.
- From the chi-squared table, the critical value for p = 0.05 is 3.84.
- Since 13.34 > 3.84, we reject the expectation that the male:female ratio in the Honours class is 1:1.

Chi-Squared Test in Weka: weather.nominal.arff

Chi-Squared Test in Weka

Chi-Squared Test in Weka

Example of Decision Tree Induction
Initial attribute set: {A1, A2, A3, A4, A5, A6}
[Figure: induced decision tree that tests A4 at the root, then A1 and A6, with leaves labeled Class 1 and Class 2.]
Reduced attribute set (the attributes actually used by the tree): {A1, A4, A6}

Principal Component Analysis
- Given N data vectors in k dimensions, find c <= k orthogonal vectors that best represent the data.
- The original data set is reduced to N data vectors on c principal components (reduced dimensions).
- Each data vector Xj is a linear combination of the c principal component vectors Y1, Y2, ..., Yc:
      Xj = m + w1*Y1 + w2*Y2 + ... + wc*Yc,   j = 1, 2, ..., N
  where m is the mean vector of the data set, the wi are the component weights, and the Yi are the eigenvectors of the covariance matrix.
- Works for numeric data only.
- Used when the number of dimensions is large.

Principal Component Analysis
See online tutorials such as http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
[Figure: data points scattered in the X1-X2 plane, with the first eigenvector Y1 and the second eigenvector Y2 drawn through the data; Y2 is ignorable.]
Key observation: Y1 points in the direction of largest variance.

Principal Component Analysis: one attribute first
Temperature values: 42, 40, 24, 30, 15, 18, 35
Question: how much spread is there in the data along this axis (distance to the mean)?
Variance = (standard deviation)^2
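For concreteness, the sample variance of a single attribute can be computed as follows (a small sketch, not part of the original slides, using the temperature values listed above):

    import numpy as np

    temperature = np.array([42, 40, 24, 30, 15, 18, 35])

    mean = temperature.mean()
    variance = temperature.var(ddof=1)    # sample variance: sum((x - mean)^2) / (n - 1)
    std_dev = temperature.std(ddof=1)     # standard deviation = sqrt(variance)

    print(mean, variance, std_dev)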

Now consider two dimensions
X = Temperature, Y = Humidity
[Table: paired (Temperature, Humidity) values.]
Covariance measures how X and Y vary together:
- cov(X,Y) = 0: no linear relationship (uncorrelated)
- cov(X,Y) > 0: X and Y tend to move in the same direction
- cov(X,Y) < 0: X and Y tend to move in opposite directions
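The covariance of two attributes can be computed directly (a sketch with illustrative values, not the slide's data):

    import numpy as np

    temperature = np.array([40.0, 30.0, 15.0, 25.0, 35.0])   # illustrative values
    humidity    = np.array([90.0, 75.0, 70.0, 72.0, 88.0])

    # cov(X, Y) = sum((x - mean_x) * (y - mean_y)) / (n - 1)
    cov_matrix = np.cov(temperature, humidity)
    print(cov_matrix[0, 1])   # covariance between temperature and humidity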

More than two attributes: covariance matrix
The covariance matrix contains the covariance values between all pairs of dimensions (attributes).
Example for three attributes (x, y, z):

    C = | cov(x,x)  cov(x,y)  cov(x,z) |
        | cov(y,x)  cov(y,y)  cov(y,z) |
        | cov(z,x)  cov(z,y)  cov(z,z) |

Background: eigenvalues and eigenvectors
An eigenvector e of C with eigenvalue λ satisfies: C e = λ e
How to calculate e and λ:
- Compute det(C - λI); this yields a polynomial in λ of degree n.
- The roots of det(C - λI) = 0 are the eigenvalues λ.
- See any linear algebra text, such as Elementary Linear Algebra by Howard Anton (John Wiley & Sons), or use a math package such as MATLAB.
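In practice a numerical library does this for you, for example (a sketch, not part of the original slides):

    import numpy as np

    C = np.array([[2.0, 1.0],
                  [1.0, 2.0]])   # a small symmetric, covariance-like matrix

    eigenvalues, eigenvectors = np.linalg.eig(C)   # columns of `eigenvectors` are the e_i
    print(eigenvalues)    # 3 and 1 (in some order)
    print(eigenvectors)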

Steps of PCA
- Let m be the mean vector (the mean of all rows).
- Adjust the original data by the mean: X' = X - m.
- Compute the covariance matrix C of the adjusted data X'.
- Find the eigenvectors and eigenvalues of C: an eigenvector of C is a (column) vector e such that Ce = λe, where λ is the corresponding eigenvalue; equivalently, (C - λI)e = 0.
- Most data mining packages do this for you.

Steps of PCA (cont.)
- Calculate the eigenvalues λ and eigenvectors e of the covariance matrix.
- Eigenvalue λj corresponds to the variance along component j, so sort the components by λj in decreasing order.
- Take the first n eigenvectors ei, where n is the number of top eigenvalues to keep; these are the directions with the largest variances. (A minimal end-to-end sketch follows.)
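Putting the steps together in NumPy (a minimal sketch, not part of the original slides; it uses only the fully legible rows of the example on the next slide):

    import numpy as np

    def pca(X, n_components):
        # Project the rows of X onto the top n_components principal directions.
        mean = X.mean(axis=0)                     # mean vector
        X_adj = X - mean                          # mean-adjust the data
        C = np.cov(X_adj, rowvar=False)           # covariance matrix
        eigvals, eigvecs = np.linalg.eigh(C)      # eigen-decomposition (C is symmetric)
        order = np.argsort(eigvals)[::-1]         # sort components by decreasing variance
        top = eigvecs[:, order[:n_components]]    # keep the top eigenvectors
        return X_adj @ top                        # transformed (reduced) data

    X = np.array([[19.0, 63.0], [39.0, 74.0], [30.0, 87.0], [15.0, 35.0]])
    print(pca(X, n_components=1))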

An Example
Mean1 = 24.1,  Mean2 = 53.8

    X1    X2     X1'      X2'
    19    63     -5.1     9.25
    39    74     14.9     20.25
    30    87      5.9     33.25
     ?    23       ?     -30.75
    15    35     -9.1    -18.75
     ?    43       ?     -10.75
     ?    32       ?     -21.75
     ?    73       ?      19.25

Covariance Matrix

    C = |  75   106 |
        | 106   482 |

Using MATLAB, we find the eigenvectors and eigenvalues:
    e1 = (-0.98, -0.21),  λ1 = 51.8
    e2 = ( 0.21, -0.98),  λ2 = 560.2
Thus the second eigenvector (the one with the larger eigenvalue) is more important.

If we keep only one dimension: e2
We keep the direction of e2 = (0.21, -0.98) and project each mean-adjusted record onto it, obtaining the final one-dimensional data:
    yi: -10.14, -16.72, -31.35, 31.374, 16.464, 8.624, 19.404, -17.63

Using Matlab to figure it out

PCA in Weka

Weather Data from UCI Dataset (comes with the Weka package)

PCA in Weka (I)

Summary of PCA
- PCA is used for reducing the number of numerical attributes.
- The key is the data transformation:
  - Adjust the data by the mean.
  - Find the eigenvectors of the covariance matrix.
  - Transform the data onto the top eigenvectors.
- Note: PCA produces only linear combinations (weighted sums) of the original attributes.
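In practice an off-the-shelf implementation such as scikit-learn's PCA wraps these steps (a usage sketch, not part of the original slides; the data rows are illustrative):

    import numpy as np
    from sklearn.decomposition import PCA

    X = np.array([[19.0, 63.0], [39.0, 74.0], [30.0, 87.0], [15.0, 35.0]])   # numeric attributes only

    pca = PCA(n_components=1)           # keep the single direction of largest variance
    X_reduced = pca.fit_transform(X)    # centers the data and projects it internally

    print(pca.explained_variance_ratio_)   # fraction of variance captured
    print(X_reduced)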

Missing and Inconsistent Values
- Linear regression: data are modeled to fit a straight line; the least-squares method fits Y = a + bX.
- Multiple regression: Y = b0 + b1*X1 + b2*X2.
- Many nonlinear functions can be transformed into the above form.
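A least-squares fit of this kind can be used to estimate a missing value from a correlated attribute (a sketch with illustrative data, not part of the original slides):

    import numpy as np

    age    = np.array([20.0, 25.0, 30.0, 35.0, 40.0])
    height = np.array([21.0, 26.5, 30.8, 36.2, 41.1])   # roughly linear in age

    b, a = np.polyfit(age, height, deg=1)   # least-squares fit of height = a + b * age
    estimated = a + b * 28.0                # fill in the missing height for a record with age 28
    print(a, b, estimated)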

Regression
[Figure: scatter plot of Height versus Age with a fitted regression line y = x + 1; the observed value Y1 and the fitted value Y1' are marked at X1.]

Clustering for Outlier Detection
- Outliers can be incorrect data.
- Clusters capture the majority behavior; points that fall far from every cluster are candidate outliers.

Data Reduction with Sampling
- Sampling allows a mining algorithm to run with complexity that is potentially sub-linear in the size of the data.
- Choose a representative subset of the data.
- Simple random sampling may perform very poorly in the presence of skewed (uneven) classes.
- Adaptive sampling methods help here. Stratified sampling approximates the percentage of each class (or subpopulation of interest) in the overall database; it is used with skewed data. (See the sketch below.)
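A stratified sample can be drawn per class, for instance with pandas (a sketch; the data frame and column names are illustrative):

    import pandas as pd

    df = pd.DataFrame({
        "outlook": ["sunny", "sunny", "rainy", "overcast", "rainy", "sunny"],
        "play":    ["no",    "yes",   "yes",   "yes",      "no",    "yes"],
    })

    # Draw 50% of the records from each class so the class proportions are preserved.
    stratified = df.groupby("play", group_keys=False).sample(frac=0.5, random_state=0)
    print(stratified)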

Sampling
[Figure: raw data reduced by SRSWOR (simple random sampling without replacement) and by SRSWR (simple random sampling with replacement).]

Sampling Example
[Figure: raw data versus a cluster/stratified sample of it.]

Summary
Data preparation is a big issue for data mining. It includes:
- Data warehousing
- Data reduction and feature selection
- Discretization
- Missing values
- Incorrect values
- Sampling