Independent Components in Text

Independent Components in Text Paper by Thomas Kolenda, Lars Kai Hansen and Sigurdur Sigurdsson Presented by Yuan Zhijian

Vector Space Representations Indexing: forming a term set of all words occurring in the database. -- Form the term set -- Represent each document as a vector over the term set -- Assemble the term-document matrix
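A minimal sketch of this indexing step, assuming Python and scikit-learn's CountVectorizer; the toy corpus and all variable names are illustrative, not from the paper:

```python
# Sketch of indexing: build the term set and a term-document matrix.
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "independent component analysis of text",
    "latent semantic indexing of documents",
    "text classification with latent components",
]

vectorizer = CountVectorizer()
doc_term = vectorizer.fit_transform(docs)   # rows: documents, columns: terms
term_doc = doc_term.T.toarray()             # term-document matrix
terms = vectorizer.get_feature_names_out()  # the term set
print(terms)
print(term_doc)
```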

Vector Space Representations Weighting: determine the values of the weights. Similarity measure: based on the inner product of weight vectors, or other metrics.
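As a sketch, one common choice of weighting is tf-idf, with similarity taken as the inner product of unit-normalised weight vectors (cosine similarity); the corpus is again illustrative:

```python
# Sketch: tf-idf weighting plus cosine similarity (inner product of
# unit-normalised weight vectors).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "independent component analysis of text",
    "latent semantic indexing of documents",
]

X = TfidfVectorizer().fit_transform(docs)   # weighted document vectors
print(cosine_similarity(X))                 # pairwise similarity matrix
```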

LSI-PCA Model The main objective is to uncover hidden linear relations between term histograms by rotating the vector-space basis. The representation is simplified by keeping only the k largest singular values.
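A minimal numpy sketch of this truncation: compute the SVD of the term-document matrix and keep the k largest singular values (the random matrix and the choice of k are stand-ins):

```python
# Sketch of LSI: truncated SVD of the term-document matrix.
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 30))                   # stand-in term-document matrix
k = 5

U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]    # rank-k approximation of X
docs_lsi = np.diag(s[:k]) @ Vt[:k]          # documents in the k-dim LSI space
```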

ICA—Noisy Separation Model: X = AS + U Assumptions: -- i.i.d. sources -- i.i.d. Gaussian noise with variance σ² -- Heavy-tailed (super-Gaussian) source distribution, e.g. p(s) ∝ 1/cosh(s)
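A sketch of data drawn from this generative model, assuming illustrative sizes and a Laplacian draw as one convenient heavy-tailed source choice:

```python
# Sketch of the noisy ICA generative model X = A S + U: heavy-tailed
# i.i.d. sources, i.i.d. Gaussian noise with variance sigma2.
import numpy as np

rng = np.random.default_rng(0)
n_sources, n_obs, n_samples = 3, 5, 1000
sigma2 = 0.01                               # illustrative noise variance

S = rng.laplace(size=(n_sources, n_samples))            # super-Gaussian sources
A = rng.normal(size=(n_obs, n_sources))                 # mixing matrix
U = rng.normal(scale=np.sqrt(sigma2), size=(n_obs, n_samples))  # noise
X = A @ S + U                                           # observed mixtures
```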

ICA—Noisy Separation (cont.) Known mixing parameters, e.g. A and σ² -- Bayes' formula: P(S|X) ∝ P(X|S)P(S) -- Maximising it w.r.t. S gives the MAP source estimate -- For low noise levels the solution approaches the pseudo-inverse projection S = (A'A)^(-1) A'X
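A sketch of the low-noise limit: with A known, the MAP source estimate reduces to projecting the observations through the pseudo-inverse of A (all inputs here are stand-ins):

```python
# Sketch: low-noise MAP source estimate S_hat = (A'A)^{-1} A'X, i.e. the
# pseudo-inverse of the known mixing matrix applied to the observations.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))                 # known mixing matrix (illustrative)
X = rng.normal(size=(5, 1000))              # observed mixtures (illustrative)
S_hat = np.linalg.pinv(A) @ X               # estimated sources
```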

ICA (cont.) Applications: -- Text representations in the LSI space -- Document classification -- Key words: back projection of documents to the original term histogram space
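One assumed way to realise the keyword step: back-project each component direction through the truncated SVD basis into term space and read off the highest-weight terms. U_k, A and terms are stand-ins for the outputs of the earlier steps:

```python
# Sketch of keyword extraction by back projection from the k-dim LSI space
# to the original term histogram space.
import numpy as np

rng = np.random.default_rng(0)
n_terms, k, n_comp = 200, 5, 3
U_k = np.linalg.qr(rng.normal(size=(n_terms, k)))[0]   # stand-in LSI term basis
A = rng.normal(size=(k, n_comp))                       # stand-in ICA mixing matrix
terms = np.array([f"term{i}" for i in range(n_terms)])

back = U_k @ A                                         # components in term space
for j in range(n_comp):
    top = np.argsort(-np.abs(back[:, j]))[:5]          # highest-weight terms
    print(f"component {j}: {terms[top]}")
```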

ICA (cont.) Generalisation error -- Principal tool for model selection Bias-variance dilemma: -- Too few components lead to high bias and high error -- Too many components lead to overfitting
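A sketch of selecting the number of components by estimated generalisation error, swapping in scikit-learn's FastICA and held-out reconstruction error as a stand-in for the paper's test-set criterion:

```python
# Sketch of model selection: fit ICA with a varying number of components
# and score reconstruction error on held-out data.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
X = rng.laplace(size=(500, 20)) @ rng.normal(size=(20, 20))  # stand-in data
X_train, X_test = X[:400], X[400:]

for k in range(1, 8):
    ica = FastICA(n_components=k, random_state=0).fit(X_train)
    X_rec = ica.inverse_transform(ica.transform(X_test))
    err = np.mean((X_test - X_rec) ** 2)                 # held-out error
    print(f"k={k}: held-out reconstruction error {err:.4f}")
```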

Examples MED data set -- 124 abstracts, 5 groups, 1159 terms Results: -- ICA is successful in recognizing and "explaining" the group structure.

Examples CRAN data set -- 5 classes, 138 documents, 1115 terms Results: -- ICA identified some group structure, but not as convincingly as in the MED data set.

Conclusion ICA works well for uncovering latent group structure in text. The independence of the sources may or may not be well aligned with a manual labeling.