1
Data Science
Dimensionality Reduction
WFH: Section 7.3
Rodney Nielsen
Many of these slides were adapted from: I. H. Witten, E. Frank and M. A. Hall
2
Rodney Nielsen, Human Intelligence & Language Technologies Lab
Projections
Simple transformations can often make a large difference in performance
Example transformations (not necessarily for performance improvement):
● Difference of two date attributes
● Ratio of two numeric (ratio-scale) attributes
● Concatenating the values of nominal attributes
● Encoding cluster membership
● Adding noise to data
● Removing data randomly or selectively
● Obfuscating the data
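As a concrete illustration (not from the original slides), here is a minimal pandas sketch of a few of these transformations on a toy table; all column names and values are made up.

```python
import numpy as np
import pandas as pd

# Toy dataset with made-up attribute names (hypothetical example).
df = pd.DataFrame({
    "order_date": pd.to_datetime(["2024-01-05", "2024-02-10", "2024-03-01"]),
    "ship_date":  pd.to_datetime(["2024-01-09", "2024-02-12", "2024-03-08"]),
    "price":      [20.0, 35.0, 12.5],
    "weight_kg":  [2.0, 7.0, 0.5],
    "color":      ["red", "red", "blue"],
    "size":       ["S", "M", "S"],
})

# Difference of two date attributes (shipping delay in days).
df["ship_delay_days"] = (df["ship_date"] - df["order_date"]).dt.days

# Ratio of two numeric (ratio-scale) attributes.
df["price_per_kg"] = df["price"] / df["weight_kg"]

# Concatenating the values of nominal attributes.
df["color_size"] = df["color"] + "_" + df["size"]

# Adding noise to data (e.g., for obfuscation or robustness checks).
rng = np.random.default_rng(0)
df["price_noisy"] = df["price"] + rng.normal(scale=0.1, size=len(df))

print(df)
```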
3
Rodney Nielsen, Human Intelligence & Language Technologies Lab
Principal Component Analysis
Method for identifying the important “directions” in the data
Can rotate data into (reduced) coordinate system that is given by those directions
Algorithm:
1. Find direction (axis) of greatest variance
2. Find direction of greatest variance that is perpendicular to previous direction, and repeat
Implementation: find eigenvectors of covariance matrix by diagonalization
● Eigenvectors (sorted by eigenvalues) are the directions
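A minimal NumPy sketch of this procedure, not WEKA's implementation; the function name and toy data are illustrative only.

```python
import numpy as np

def pca_directions(X, k):
    """Minimal PCA via eigenvectors of the covariance matrix.

    X: (n_samples, n_attributes) array; k: number of components to keep.
    Returns (components, projected_data).
    """
    # Standardize the data (PCA is normally applied to standardized attributes).
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)

    # Diagonalize the covariance matrix: eigenvectors are the directions,
    # eigenvalues are the variances along those directions.
    cov = np.cov(Xs, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: symmetric matrix

    # Sort directions by decreasing eigenvalue (greatest variance first).
    order = np.argsort(eigvals)[::-1]
    components = eigvecs[:, order[:k]]

    # Rotate the data into the (reduced) coordinate system.
    return components, Xs @ components

# Toy usage on random 10-dimensional data.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 10))
components, Z = pca_directions(X, k=3)
print(Z.shape)   # (100, 3)
```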
4
Rodney Nielsen, Human Intelligence & Language Technologies Lab
Example: 10-Dimensional Data
Can transform data into space given by components
Data is normally standardized for PCA
Could also apply this recursively in tree learner
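As a rough sketch of this workflow on synthetic 10-dimensional data, using scikit-learn (an assumption; the slides do not prescribe a tool), standardizing first and then inspecting how much variance each component explains:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Hypothetical 10-dimensional data (200 instances) with correlated attributes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))

# Standardize first, then transform into the space given by the components.
X_std = StandardScaler().fit_transform(X)
pca = PCA()
Z = pca.fit_transform(X_std)

# Proportion of variance explained by each component, typically used to
# decide how many components to keep.
print(pca.explained_variance_ratio_.round(3))
```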
5
Co-occurrence matrix

        apple  blood  cells  ibm  data  box  tissue  graphics  memory  organ  plasma
pc          2      0      0    1     3    1       0         0       0      0       0
body        0      3      0    0     0    0       2         0       0      2       1
disk        1      0      0    2     0    3       0         1       2      0       0
petri       0      2      1    0     0    0       2         0       1      0       1
lab         0      0      3    0     2    0       2         0       2      1       3
sales       0      0      0    2     3    0       0         1       2      0       0
linux       2      0      0    1     3    2       0         1       1      0       0
debt        0      0      0    2     3    4       0         2       0      0       0
6
Rodney Nielsen, Human Intelligence & Language Technologies Lab
Singular Value Decomposition
A = U D V′
7
U: matrix of left singular vectors
8
D = diag(9.19, 6.36, 3.99, 3.25, 2.52, 2.30, 1.26, 0.66, 0.00)
9
V: matrix of right singular vectors
10
Co-occurrence matrix after SVD

        apple  blood  cells  ibm   data  tissue  graphics  memory  organ  plasma
pc        .73    .00    .11  1.3    2.0     .01       .86     .77    .00     .09
body      .00    1.2    1.3  .00    .33     1.6       .00     .85    .84     1.5
disk      .76    .00    .01  1.3    2.1     .00       .91     .72    .00
germ      .00    1.1    1.2  .00    .49     1.5       .00     .86    .77     1.4
lab       .21    1.7    2.0  .35    1.7     2.5       .18     1.7    1.2     2.3
sales     .73    .15    .39  1.3    2.2     .35       .85     .98    .17     .41
linux     .96    .00    .16  1.7    2.7     .03       1.1     1.0    .00     .13
debt      1.2    .00         2.1    3.2     .00       1.5     1.1    .00
11
Rodney Nielsen, Human Intelligence & Language Technologies Lab
Effect of SVD
SVD reduces a matrix to a given number of dimensions
This may convert a word-level space into a semantic or conceptual space
If “dog”, “collie”, and “wolf” are dimensions/columns in the word co-occurrence matrix, after SVD they may be represented by a single dimension that represents “canines”
The dimensions are the principal components that may (hopefully) represent the meaning of concepts
SVD has the effect of smoothing a very sparse matrix, so that there are very few 0-valued cells
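A minimal NumPy sketch of this smoothing effect, applied to the co-occurrence matrix from the earlier slide; the choice of rank k = 2 is an assumption for illustration, so the numbers will not exactly reproduce the table above.

```python
import numpy as np

# The word co-occurrence matrix from the earlier slide
# (rows: pc, body, disk, petri, lab, sales, linux, debt).
A = np.array([
    [2, 0, 0, 1, 3, 1, 0, 0, 0, 0, 0],   # pc
    [0, 3, 0, 0, 0, 0, 2, 0, 0, 2, 1],   # body
    [1, 0, 0, 2, 0, 3, 0, 1, 2, 0, 0],   # disk
    [0, 2, 1, 0, 0, 0, 2, 0, 1, 0, 1],   # petri
    [0, 0, 3, 0, 2, 0, 2, 0, 2, 1, 3],   # lab
    [0, 0, 0, 2, 3, 0, 0, 1, 2, 0, 0],   # sales
    [2, 0, 0, 1, 3, 2, 0, 1, 1, 0, 0],   # linux
    [0, 0, 0, 2, 3, 4, 0, 2, 0, 0, 0],   # debt
], dtype=float)

# Full SVD: A = U D V'
U, d, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k largest singular values (k = 2 is an illustrative choice;
# the slides do not state which rank was used) and reconstruct.
k = 2
A_k = U[:, :k] @ np.diag(d[:k]) @ Vt[:k, :]

# The reconstruction is dense: very few cells are (near) zero,
# i.e. the sparse matrix has been smoothed.
print(np.round(A_k, 2))
```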
12
Rodney Nielsen, Human Intelligence & Language Technologies Lab
Context Representation
Represent each instance of the target word to be clustered by averaging the word vectors associated with its context
This creates a “second order” representation of the context
The context is represented not only by the words that occur in it, but also by the words that occur with those words elsewhere in the training corpus
13
Second Order Context Representation
Context 1: “I got a new disk today!”
Context 2: “What do you think of linux?”
These two contexts share no words in common, yet they are similar!
disk and linux both occur with “Apple”, “IBM”, “data”, “graphics”, and “memory”
The two contexts are similar because they share many second-order co-occurrences

        apple  blood  cells  ibm   data  tissue  graphics  memory  organ  plasma
disk      .76    .00    .01  1.3    2.1     .00       .91     .72    .00
linux     .96    .00    .16  1.7    2.7     .03       1.1     1.0    .00     .13
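A minimal sketch of this second-order averaging, using the two smoothed rows from the table above (the missing plasma value for disk is filled with 0.0 purely as an assumption) and cosine similarity to compare the two contexts:

```python
import numpy as np

# Smoothed word vectors (rows of the post-SVD matrix above). In practice every
# context word would have a vector; only two are shown here.
word_vectors = {
    # Last value (plasma) is missing on the slide; 0.0 is an assumption.
    "disk":  np.array([0.76, 0.00, 0.01, 1.3, 2.1, 0.00, 0.91, 0.72, 0.00, 0.00]),
    "linux": np.array([0.96, 0.00, 0.16, 1.7, 2.7, 0.03, 1.10, 1.00, 0.00, 0.13]),
}

def second_order_context(words, vectors):
    """Average the vectors of the context words we have vectors for."""
    known = [vectors[w] for w in words if w in vectors]
    return np.mean(known, axis=0) if known else None

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

ctx1 = "i got a new disk today".split()
ctx2 = "what do you think of linux".split()

v1 = second_order_context(ctx1, word_vectors)
v2 = second_order_context(ctx2, word_vectors)
print(round(cosine(v1, v2), 3))   # high similarity despite no shared words
```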
14
Rodney Nielsen, Human Intelligence & Language Technologies Lab
Random Projections
PCA is nice but expensive: cubic in the number of attributes
Alternative: use random directions (projections) instead of principal components
Surprising: random projections preserve distance relationships quite well (on average)
Can use them to apply kD-trees to high-dimensional data
Can improve stability by using an ensemble of models based on different projections
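A minimal NumPy/SciPy sketch of the distance-preservation claim, using a random Gaussian projection; the dimensions and sample sizes are arbitrary:

```python
import numpy as np
from scipy.spatial.distance import pdist

# High-dimensional data (200 instances, 1000 attributes).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1000))

# Random projection: a random Gaussian matrix instead of principal components.
k = 50
R = rng.normal(size=(1000, k)) / np.sqrt(k)
Z = X @ R

# Pairwise distances are preserved quite well on average.
d_orig = pdist(X)
d_proj = pdist(Z)
print(np.corrcoef(d_orig, d_proj)[0, 1])   # close to 1
```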
15
Rodney Nielsen, Human Intelligence & Language Technologies Lab
Partial Least-Squares Regression
PCA is often a pre-processing step before applying a learning algorithm
When linear regression is applied, the resulting model is known as principal components regression
Output can be re-expressed in terms of the original attributes
Partial least-squares differs from PCA in that it takes the class attribute into account
Finds directions that have high variance and are strongly correlated with the class
16
Rodney Nielsen, Human Intelligence & Language Technologies Lab
Algorithm
1. Start with standardized input attributes
2. Attribute coefficients of the first PLS direction:
● Compute the dot product between each attribute vector and the class vector in turn
3. Coefficients for the next PLS direction:
● Original attribute values are first replaced by the difference (residual) between the attribute's value and the prediction from a simple univariate regression that uses the previous PLS direction as a predictor of that attribute
● Compute the dot product between each attribute's residual vector and the class vector in turn
4. Repeat from 3
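A rough NumPy sketch of these steps (not WEKA's PLS filter; normalization details are omitted and all names are illustrative):

```python
import numpy as np

def pls_directions(X, y, n_dirs):
    """Sketch of the PLS direction-finding steps described above.

    X: (n, m) standardized inputs; y: (n,) class/target values.
    Returns one score vector per PLS direction.
    """
    X = np.asarray(X, dtype=float).copy()
    y = np.asarray(y, dtype=float)
    directions = []
    for _ in range(n_dirs):
        # Attribute coefficients: dot product of each attribute vector
        # with the class vector.
        w = X.T @ y
        t = X @ w                      # scores along this PLS direction
        directions.append(t)
        # Replace each attribute by its residual from a simple univariate
        # regression that uses the current direction as predictor.
        for j in range(X.shape[1]):
            b = (t @ X[:, j]) / (t @ t)
            X[:, j] = X[:, j] - b * t
    return directions

# Toy usage with standardized inputs (illustrative data).
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=50)
scores = pls_directions(X, y, n_dirs=2)
print([s.shape for s in scores])
```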
17
Rodney Nielsen, Human Intelligence & Language Technologies Lab
Text to Attribute Vectors
Many data mining applications involve textual data (e.g. string attributes in ARFF)
Standard transformation: convert string into bag of words by tokenization
Attribute values are binary, word frequencies (f_ij), log(1 + f_ij), or TF×IDF: f_ij × log(number of documents / number of documents that include word i)
● Only retain alphabetic sequences?
● What should be used as delimiters?
● Should words be converted to lowercase?
● Should stopwords be ignored?
● Should hapax legomena be excluded?
● Or even just keep the k most frequent words?
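A minimal pure-Python sketch of the bag-of-words transformation with the three weighting schemes; the tokenization choices here (lowercasing, alphabetic-only tokens) are just one possible answer to the questions above:

```python
import math
import re
from collections import Counter

docs = [
    "The quick brown fox jumps over the lazy dog",
    "Never jump over the lazy dog quickly",
]

# Tokenization: lowercase, keep only alphabetic sequences (one possible choice).
tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
counts = [Counter(tokens) for tokens in tokenized]
vocab = sorted(set(w for c in counts for w in c))

n_docs = len(docs)
df = {w: sum(1 for c in counts if w in c) for w in vocab}   # document frequency

for i, c in enumerate(counts):
    vector = {}
    for w in vocab:
        f = c[w]                               # raw word frequency f_ij
        log_f = math.log(1 + f)                # dampened frequency
        tfidf = f * math.log(n_docs / df[w])   # TF x IDF weighting
        vector[w] = (f, round(log_f, 2), round(tfidf, 2))
    print(i, vector)
```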
18
Rodney Nielsen, Human Intelligence & Language Technologies Lab
Time Series
In time series data, each instance represents a different time step
Some simple transformations:
● Shift values from the past/future
● Compute difference (delta) between instances (i.e. a “derivative”)
In some datasets, samples are not regular but time is given by a timestamp attribute
● Need to normalize by step size when transforming
Transformations need to be adapted if attributes represent different time steps
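A minimal pandas sketch of these transformations on a toy series with an irregular timestamp attribute; the data and column names are made up:

```python
import pandas as pd

# Toy series with an explicit timestamp attribute (irregular sampling).
ts = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-01-01", "2024-01-02",
                                 "2024-01-04", "2024-01-07"]),
    "value":     [10.0, 12.0, 11.0, 17.0],
})

# Shift values from the past (lag-1 attribute).
ts["value_lag1"] = ts["value"].shift(1)

# Difference (delta) between consecutive instances.
ts["delta"] = ts["value"].diff()

# With irregular sampling, normalize the delta by the step size
# to get a rate of change per day (the "derivative").
step_days = ts["timestamp"].diff().dt.days
ts["delta_per_day"] = ts["delta"] / step_days

print(ts)
```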