Algebraic Techniques for Analysis of Large Discrete-Valued Datasets
Mehmet Koyutürk and Ananth Grama, Dept. of Computer Sciences, Purdue University

Similar presentations

Eigen Decomposition and Singular Value Decomposition
CMU SCS : Multimedia Databases and Data Mining Lecture #19: SVD - part II (case studies) C. Faloutsos.
PARTITIONAL CLUSTERING
Query Optimization of Frequent Itemset Mining on Multiple Databases Mining on Multiple Databases David Fuhry Department of Computer Science Kent State.
1 Latent Semantic Mapping: Dimensionality Reduction via Globally Optimal Continuous Parameter Modeling Jerome R. Bellegarda.
Dimensionality reduction. Outline From distances to points : – MultiDimensional Scaling (MDS) Dimensionality Reductions or data projections Random projections.
Dimensionality Reduction PCA -- SVD
PCA + SVD.
15-826: Multimedia Databases and Data Mining
Data Mining Techniques: Clustering
Comparison of information retrieval techniques: Latent semantic indexing (LSI) and Concept indexing (CI) Jasminka Dobša Faculty of organization and informatics,
What is missing? Reasons that ideal effectiveness hard to achieve: 1. Users’ inability to describe queries precisely. 2. Document representation loses.
Lecture 21: Spectral Clustering
DIMENSIONALITY REDUCTION BY RANDOM PROJECTION AND LATENT SEMANTIC INDEXING Jessica Lin and Dimitrios Gunopulos Ângelo Cardoso IST/UTL December
Principal Component Analysis
3D Geometry for Computer Graphics
Latent Semantic Indexing via a Semi-discrete Matrix Decomposition.
1 Latent Semantic Indexing Jieping Ye Department of Computer Science & Engineering Arizona State University
1 Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 4 March 30, 2005
Prénom Nom Document Analysis: Data Analysis and Clustering Prof. Rolf Ingold, University of Fribourg Master course, spring semester 2008.
Information Retrieval in Text Part III Reference: Michael W. Berry and Murray Browne. Understanding Search Engines: Mathematical Modeling and Text Retrieval.
Singular Value Decomposition in Text Mining Ram Akella University of California Berkeley Silicon Valley Center/SC Lecture 4b February 9, 2011.
TFIDF-space  An obvious way to combine TF-IDF: the coordinate of document in axis is given by  General form of consists of three parts: Local weight.
IR Models: Latent Semantic Analysis. IR Model Taxonomy Non-Overlapping Lists Proximal Nodes Structured Models U s e r T a s k Set Theoretic Fuzzy Extended.
The Terms that You Have to Know! Basis, Linear independent, Orthogonal Column space, Row space, Rank Linear combination Linear transformation Inner product.
Clustering In Large Graphs And Matrices Petros Drineas, Alan Frieze, Ravi Kannan, Santosh Vempala, V. Vinay Presented by Eric Anderson.
Dimension reduction : PCA and Clustering Christopher Workman Center for Biological Sequence Analysis DTU.
1 Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 6 May 7, 2006
10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos.
Ranking by Odds Ratio A Probability Model Approach let be a Boolean random variable: document d is relevant to query q otherwise Consider document d as.
Kathryn Linehan Advisor: Dr. Dianne O’Leary
Multimedia Databases LSI and SVD. Text - Detailed outline text problem full text scanning inversion signature files clustering information filtering and.
DATA MINING LECTURE 7 Dimensionality Reduction PCA – SVD
Dimensionality Reduction. Multimedia DBs Many multimedia applications require efficient indexing in high-dimensions (time-series, images and videos, etc)
CS 277: Data Mining Recommender Systems
Machine Learning CS 165B Spring Course outline Introduction (Ch. 1) Concept learning (Ch. 2) Decision trees (Ch. 3) Ensemble learning Neural Networks.
Chapter 2 Dimensionality Reduction. Linear Methods
CSE554AlignmentSlide 1 CSE 554 Lecture 8: Alignment Fall 2014.
CS246 Topic-Based Models. Motivation  Q: For query “car”, will a document with the word “automobile” be returned as a result under the TF-IDF vector.
Next. A Big Thanks Again Prof. Jason Bohland Quantitative Neuroscience Laboratory Boston University.
1 Information Retrieval through Various Approximate Matrix Decompositions Kathryn Linehan Advisor: Dr. Dianne O’Leary.
Non Negative Matrix Factorization
SVD: Singular Value Decomposition
SINGULAR VALUE DECOMPOSITION (SVD)
The Effect of Dimensionality Reduction in Recommendation Systems
Technical Report of Web Mining Group Presented by: Mohsen Kamyar Ferdowsi University of Mashhad, WTLab.
Mingyang Zhu, Huaijiang Sun, Zhigang Deng Quaternion Space Sparse Decomposition for Motion Compression and Retrieval SCA 2012.
A Clustering Method Based on Nonnegative Matrix Factorization for Text Mining Farial Shahnaz.
V. Clustering 인공지능 연구실 이승희 Text: Text mining Page:82-93.
1 CS 430: Information Discovery Lecture 11 Latent Semantic Indexing.
Web Search and Text Mining Lecture 5. Outline Review of VSM More on LSI through SVD Term relatedness Probabilistic LSI.
CMU SCS : Multimedia Databases and Data Mining Lecture #18: SVD - part I (definitions) C. Faloutsos.
DATA MINING LECTURE 8 Sequence Segmentation Dimensionality Reduction.
Matrix Factorization & Singular Value Decomposition Bamshad Mobasher DePaul University Bamshad Mobasher DePaul University.
Algebraic Techniques for Analysis of Large Discrete-Valued Datasets 
Oracle Advanced Analytics
Auburn University
Document Clustering Based on Non-negative Matrix Factorization
School of Computer Science & Engineering
Auburn University COMP7330/7336 Advanced Parallel and Distributed Computing Mapping Techniques Dr. Xiao Qin Auburn University.
Lecture: Face Recognition and Feature Reduction
Waikato Environment for Knowledge Analysis
LSI, SVD and Data Management
Parallelism in High-Performance Computing Applications
Algebraic Techniques for Analysis of Large Discrete-Valued Datasets
Design of Hierarchical Classifiers for Efficient and Accurate Pattern Classification M N S S K Pavan Kumar Advisor : Dr. C. V. Jawahar.
Parallelization of Sparse Coding & Dictionary Learning
Restructuring Sparse High Dimensional Data for Effective Retrieval
Latent Semantic Analysis
Presentation transcript:

Algebraic Techniques for Analysis of Large Discrete-Valued Datasets
Mehmet Koyutürk and Ananth Grama, Dept. of Computer Sciences, Purdue University
*This work was supported in part by National Science Foundation grants EIA, ACI, and ACI.

Motivation
- Handling large discrete-valued datasets
- Extracting relations between data items
- Summarizing data in an error-bounded fashion
- Clustering of data items
- Finding concise, interpretable representations for clustered data
Applications
- Association rule mining
- Classification
- Data partitioning & clustering
- Data compression

Algebraic Model
- Sparse matrix representation: each column corresponds to an item, each row to an instance
- Document-term matrix (information retrieval): columns are terms, rows are documents
- Buyer-item matrix (data mining): columns are items, rows are transactions
- Rows contain the patterns of interest!
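A minimal sketch of this representation, assuming transactions are given as lists of item indices; `build_matrix` and its parameters are illustrative names, not from the original slides:

```python
import numpy as np
from scipy.sparse import csr_matrix

def build_matrix(transactions, n_items):
    """Sparse binary matrix: one row per instance, one column per item."""
    rows = [i for i, t in enumerate(transactions) for _ in t]
    cols = [j for t in transactions for j in t]
    data = np.ones(len(cols), dtype=np.int8)
    return csr_matrix((data, (rows, cols)), shape=(len(transactions), n_items))
```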

Basic Idea
- Not all such matrices are rank 1, i.e., they cannot be represented accurately by a single outer product
- We must find the best outer product: concise and error-bounded
- x: presence vector (which rows contain the pattern)
- y: pattern vector (which columns make up the pattern)

An Example
Consider the universe of items {bread, butter, milk, eggs, cereal} and the grocery lists:
- {butter, milk, cereal}
- {milk, cereal}
- {eggs, cereal}
- {bread, milk, cereal}
These lists can be represented by a binary matrix, one row per list and one column per item, as reconstructed in the sketch below.
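The matrix itself appears only as an image on the original slide, but it is fully determined by the lists above; a sketch that reconstructs it:

```python
import numpy as np

items = ["bread", "butter", "milk", "eggs", "cereal"]
lists = [{"butter", "milk", "cereal"},
         {"milk", "cereal"},
         {"eggs", "cereal"},
         {"bread", "milk", "cereal"}]

# One row per grocery list, one column per item.
A = np.array([[1 if item in basket else 0 for item in items]
              for basket in lists])
# A = [[0 1 1 0 1]
#      [0 0 1 0 1]
#      [0 0 0 1 1]
#      [1 0 1 0 1]]
```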

An Example (contd.)
The best rank-1 approximation can be interpreted as follows:
- The item set {milk, cereal} is characteristic of three of the buyers
- This is the most dominant pattern in the data
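Continuing the sketch above, this interpretation can be checked directly; the specific x and y follow from the slide's reading of the pattern ({milk, cereal}, present in the first, second, and fourth lists):

```python
x = np.array([1, 1, 0, 1])     # presence vector: which buyers follow the pattern
y = np.array([0, 0, 1, 0, 1])  # pattern vector: {milk, cereal}
E = A - np.outer(x, y)         # error matrix of the rank-1 approximation
print(np.count_nonzero(E))     # 4: butter in list 1, eggs and cereal in list 3, bread in list 4
```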

Rank-1 Approximation
Problem: given a discrete matrix $A_{m \times n}$, find discrete vectors $x_{m \times 1}$ and $y_{n \times 1}$ that minimize $\|A - xy^T\|_F^2$, the number of nonzeros in the error matrix. This problem is NP-hard!
Assuming a continuous space of vectors and using basic algebraic transformations, the minimization reduces to: maximize $(x^T A y)^2 / (\|x\|^2 \|y\|^2)$.
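For 0/1 data the squared Frobenius norm is exactly the count of nonzeros in the error matrix; a small sketch making both forms of the objective explicit (function names are illustrative):

```python
def error_count(A, x, y):
    """||A - x y^T||_F^2; for 0/1 data this equals nnz(A - x y^T)."""
    E = A - np.outer(x, y)
    return int(np.sum(E * E))

def continuous_score(A, x, y):
    """(x^T A y)^2 / (||x||^2 ||y||^2): the equivalent maximization objective."""
    return (x @ A @ y) ** 2 / ((x @ x) * (y @ y))
```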

Background
Singular Value Decomposition (SVD) [Berry et al., 1995]
- Decompose the matrix as $A = U \Sigma V^T$, where U and V are orthogonal and $\Sigma$ contains the singular values
- Decomposition based on underlying patterns: Latent Semantic Indexing (LSI)
Semi-Discrete Decomposition (SDD) [Kolda & O'Leary, 2000]
- Restricts the entries of U and V to {-1, 0, 1}
- Can perform as well as SVD in LSI using less than one-tenth the storage [Kolda & O'Leary, 1999]

Background (contd.)
Centroid Decomposition [Chu & Funderlic, 2002]
- Decomposition based on spatial clusters; the centroid corresponds to the collective trend of a cluster
- Data are characterized by a correlation matrix
- Centroid method: a linear-time heuristic to discover clusters
- Two drawbacks for discrete-attribute data: the method is continuous in nature, and computing the correlation matrix requires quadratic time

Background (contd.)
Principal Direction Divisive Partitioning (PDDP) [Boley, 1998]
- Recursively splits the matrix based on the principal direction of the vectors (rows)
- Does not force orthogonality
- Takes advantage of sparsity
- Assumes a continuous space

Alternating Iterative Heuristic
In the continuous domain, the problem is: minimize $F(d, x, y) = \|A - d\,xy^T\|_F^2$, where
$F(d, x, y) = \|A\|_F^2 - 2d\,x^T A y + d^2 \|x\|^2 \|y\|^2$.  (1)
Setting $\partial F / \partial d = 0$ gives the minimum of this function at $d^* = x^T A y / (\|x\|^2 \|y\|^2)$ (for positive definite matrix A).
Substituting $d^*$ in (1), we get the equivalent problem: maximize $(x^T A y)^2 / (\|x\|^2 \|y\|^2)$.
This is the optimization metric used in SDD's alternating iterative heuristic.

Alternating Iterative Heuristic
Approximate the binary optimization metric by that of the continuous problem:
- Set $s = Ay / \|y\|^2$ and maximize $(x^T s)^2 / \|x\|^2$
- This can be done by sorting s in descending order and assigning 1's to the components of x in a greedy fashion
- Optimistic; works well on very sparse data
Example (the matrix A and the initial vectors appear only as images on the original slide):
$y_0 \to s_{x_0} = Ay = [1\ 1\ 0]^T \to x_0 = [1\ 1\ 0]^T \to s_{y_0} = A^T x \to y_1 \to s_{x_1} = Ay = [2\ 2\ 0]^T \to x_1 = [1\ 1\ 0]^T$
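A sketch of the heuristic under the assumptions of the earlier snippets: `discrete_solve` realizes the sort-and-grow step by scanning every prefix of the sorted s and keeping the best, and `rank_one_approx` alternates between x and y until the pattern vector stops changing. Function names, the iteration cap, and the convergence test are illustrative choices, not details from the original slides.

```python
def discrete_solve(s):
    """Maximize (x^T s)^2 / ||x||^2 over binary x: sort s in descending
    order and grow the support of x greedily, keeping the best prefix."""
    order = np.argsort(-s)
    best_val, best_k, total = -np.inf, 1, 0.0
    for k, i in enumerate(order, start=1):
        total += s[i]
        if total * total / k > best_val:
            best_val, best_k = total * total / k, k
    x = np.zeros(len(s), dtype=int)
    x[order[:best_k]] = 1
    return x

def rank_one_approx(A, y0, max_iter=20):
    """Alternate: fix y and solve for x, then fix x and solve for y."""
    y = y0
    for _ in range(max_iter):
        x = discrete_solve(A @ y / max(y @ y, 1e-12))
        y_new = discrete_solve(A.T @ x / (x @ x))
        if np.array_equal(y_new, y):   # pattern vector stabilized
            break
        y = y_new
    return x, y
```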

Initialization of the Pattern Vector
Crucial for finding an appropriate local optimum; must be performed in at most O(nz(A)) time. Some possible schemes:
- Center: initialize y as the centroid of the rows; obviously cannot discover a cluster.
- Separator: bipartition the rows on one dimension and set the center of one group as the initial pattern vector.
- Greedy graph growing: bipartition the rows by starting from one row and growing a cluster centered on that row in a greedy manner; set the center of that cluster as the initial pattern vector.
- Neighborhood: randomly select a row, identify the set of all rows that share a column with it, and set the center of this set as the initial pattern vector. Aims at discovering smaller clusters; the most successful scheme (a sketch follows below).
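A sketch of the neighborhood scheme, assuming the dense-matrix conventions of the earlier snippets; `neighborhood_init` is an illustrative name, and the centroid is returned unrounded so it can be fed directly to `rank_one_approx` as y0:

```python
def neighborhood_init(A, rng=None):
    """Pick a random row, collect all rows sharing at least one column
    with it, and return the centroid of that set as the initial pattern."""
    if rng is None:
        rng = np.random.default_rng()
    i = rng.integers(A.shape[0])
    shares = (A @ A[i]) > 0    # rows with a common nonzero column (includes row i)
    return A[shares].mean(axis=0)
```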

Recursive Algorithm
At any step, given the rank-one approximation $A \approx xy^T$, split A into $A_1$ and $A_0$ based on the rows:
- if $x_i = 1$, row i goes into $A_1$
- if $x_i = 0$, row i goes into $A_0$
Stop when the Hamming radius of $A_1$ is less than some threshold and all rows of A are present in $A_1$. If the Hamming radius of $A_1$ is greater than the threshold, partition based on Hamming distances to the pattern vector and recurse (a sketch follows below).
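A sketch of the recursion, reusing `rank_one_approx` and `neighborhood_init` from the snippets above; the tie-breaking when every row lands in A1 but the cluster is still loose is an assumption made here to guarantee termination, not a detail from the slides:

```python
def recursive_decompose(A, eps, patterns=None):
    """Recursively split A on the presence vector x; record a pattern
    vector y whenever its cluster has Hamming radius at most eps."""
    if patterns is None:
        patterns = []
    if A.shape[0] == 0:
        return patterns
    x, y = rank_one_approx(A, neighborhood_init(A))
    A1, A0 = A[x == 1], A[x == 0]
    radius = np.abs(A1 - y).sum(axis=1).max()   # max Hamming distance to y
    if A0.shape[0] == 0:                        # every row follows this pattern
        if radius <= eps:
            patterns.append(y)                  # cohesive cluster: keep the pattern
            return patterns
        d = np.abs(A - y).sum(axis=1)           # too loose: split on distance to y
        near = d <= d.min()
        if near.all():                          # indivisible; accept as-is
            patterns.append(y)
            return patterns
        A1, A0 = A[near], A[~near]
    recursive_decompose(A1, eps, patterns)
    return recursive_decompose(A0, eps, patterns)
```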

Recursive Algorithm
Example (the matrices are shown only as images on the original slide): set $\epsilon = 1$. The rank-1 approximation gives $x = [1\ 1\ 1]^T$ with Hamming radius $2 > \epsilon$, so A is split and the recursion continues.

Effectiveness of Analysis
Input: 4 uniform patterns intersecting pairwise, 1 pattern on each row. Overlapping patterns of this nature are particularly challenging for many related techniques.
(Figures: detected patterns; input permuted to demonstrate the strength of the detected patterns.)

Effectiveness of Analysis
Input: 10 Gaussian patterns, 1 pattern on each row.
(Figures: detected patterns; permuted input.)

Effectiveness of Analysis
Input: 20 Gaussian patterns, 2 patterns on each row.
(Figures: detected patterns; permuted input.)

Application to Data Mining
Used for preprocessing data to reduce the number of transactions for association rule mining:
- Construct the matrix A: rows correspond to transactions, columns correspond to items.
- Decompose A into $XY^T$.
- Y is the compressed transaction set; each transaction is weighted by the number of rows containing the pattern (the number of nonzeros in the corresponding row of X). A sketch follows below.
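A sketch of this preprocessing step, assuming `recursive_decompose` above stands in for computing X and Y; here each pattern's weight is recomputed as the number of rows that contain it, which mirrors (but is not taken from) the slide's weighting by nonzero counts:

```python
def compress_transactions(A, eps):
    """Return (pattern, weight) pairs: each discovered pattern becomes one
    compressed transaction, weighted by how many original rows contain it."""
    compressed = []
    for y in recursive_decompose(A, eps):
        weight = int(((A @ y) == y.sum()).sum())  # rows where every pattern item is present
        compressed.append((y, weight))
    return compressed
```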

Application to Data Mining (contd.)
- Transaction sets generated by the IBM Quest data generator.
- Tested on 10K to 1M transactions containing 20 (L), 100 (M), and 500 (H) patterns.
- The A-priori algorithm was run on both the original and the compressed transaction sets.
Results:
- Speed-ups on the order of hundreds.
- Almost 100% precision and recall rates.

Preprocessing Results
(Table: for datasets M10K, L100K, M100K, H100K, and M1M, the number of transactions, items, patterns, and singular vectors, and the preprocessing time in seconds; the numeric values appear only on the original slide.)

Precision & Recall on M100K

Speed-up on M100K

Run-time Scalability
- The rank-1 approximation requires O(nz(A)) time.
- The total run-time at each level of the recursion tree cannot exceed this, since the total number of nonzeros at each level is at most nz(A).
- The overall run-time is therefore O(k · nz(A)), where k is the number of discovered patterns.
(Figures: run-time vs. number of columns, number of rows, and number of nonzeros, on data with 2 Gaussian patterns on each row.)

Conclusions and Ongoing Work
- Scalable to extremely high dimensions.
- Takes advantage of sparsity.
- Clustering is based on dominant patterns rather than pairwise distances.
- Effective in discovering dominant patterns.
- Hierarchical in nature, allowing multi-resolution analysis.
Current work: parallel implementation.

References
[Berry et al., 1995] M. W. Berry, S. T. Dumais, and G. W. O'Brien, Using linear algebra for intelligent information retrieval, SIAM Review, 37(4), 1995.
[Boley, 1998] D. Boley, Principal direction divisive partitioning (PDDP), Data Mining and Knowledge Discovery, 2(4), 1998.
[Chu & Funderlic, 2002] M. T. Chu and R. E. Funderlic, The centroid decomposition: relationships between discrete variational decompositions and SVDs, SIAM J. Matrix Anal. Appl., 23(4), 2002.
[Kolda & O'Leary, 1999] T. G. Kolda and D. O'Leary, Latent semantic indexing via a semi-discrete matrix decomposition, in The Mathematics of Information Coding, Extraction and Distribution, G. Cybenko et al., eds., vol. 107 of IMA Volumes in Mathematics and Its Applications, Springer-Verlag, 1999.
[Kolda & O'Leary, 2000] T. G. Kolda and D. O'Leary, Computation and uses of the semidiscrete matrix decomposition, ACM Trans. on Math. Software, 26(3), 2000.