Compression-based Unsupervised Clustering of Spectral Signatures
D. Cerra, J. Bieniarz, J. Avbelj, P. Reinartz, and R. Mueller
WHISPERS, Lisbon, 8 June 2011

Slide 1: Compression-based Unsupervised Clustering of Spectral Signatures. D. Cerra, J. Bieniarz, J. Avbelj, P. Reinartz, and R. Mueller. WHISPERS, Lisbon, 8 June 2011.

Slide 2: Contents
- Introduction
- Compression-based Similarity Measures
  - How to quantify information?
  - Normalized Compression Distance
- CBSM as Spectral Distances
  - Traditional spectral distances
  - NCD as a spectral distance

Slide 3: Contents. Next section: Introduction.

Slide 4: Introduction
Many applications in hyperspectral remote sensing rely on quantifying the similarity between two pixels, each represented by a spectrum:
- Classification / segmentation
- Target detection
- Spectral unmixing
Spectral distances are mostly based on vector processing. Is there any different (and effective) similarity measure out there?
[Figure: example spectra pairs labeled "Similar!" and "Not similar!"]

Slide 5: Contents. Next section: Compression-based Similarity Measures (how to quantify information?; Normalized Compression Distance).

Slide 6: How to quantify information? Two approaches: probabilistic (classic) vs. algorithmic.

Probabilistic (classic): information as uncertainty (Shannon entropy)
- Related to a random variable X with probability mass function p(x)
- Measures the average uncertainty in X
- Measures the average number of bits required to describe X
- Computable

Algorithmic: information as complexity (Kolmogorov complexity)
- Related to a single object (a string x)
- Length of the shortest program q, among the programs Q_x that output the string x
- Measures how difficult it is to describe x from scratch
- Uncomputable
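The contrast can be made concrete in a few lines of Python (a sketch; the strings are made up). Two strings with identical character statistics, and hence identical first-order Shannon entropy, can have very different compressed sizes, because a compressor is sensitive to structure that a single-symbol entropy estimate ignores:

```python
import math
import random
import zlib
from collections import Counter

def shannon_entropy(s: str) -> float:
    """First-order entropy: average bits per symbol of the empirical
    character distribution of s."""
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())

regular = "ab" * 500                 # perfectly periodic string
chars = list(regular)
random.shuffle(chars)
shuffled = "".join(chars)            # same symbol counts, structure destroyed

for name, s in [("regular", regular), ("shuffled", shuffled)]:
    print(f"{name}: entropy = {shannon_entropy(s):.2f} bits/symbol, "
          f"compressed = {len(zlib.compress(s.encode()))} bytes")
```

Both strings report 1.00 bits/symbol, yet the periodic one compresses to a handful of bytes while the shuffled one does not.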

Slide 7: Mutual information in Shannon/Kolmogorov.

(Statistic) mutual information, probabilistic (classic):
- Measures, in bits, the amount of information a random variable X has about another variable Y
- The joint entropy H(X,Y) is the entropy of the pair (X,Y) with joint distribution p(x,y)
- Symmetric, non-negative
- If I(X;Y) = 0, then H(X,Y) = H(X) + H(Y): X and Y are statistically independent

Algorithmic mutual information:
- Amount of computational resources shared by the shortest programs which output the strings x and y
- The joint Kolmogorov complexity K(x,y) is the length of the shortest program which outputs x followed by y
- Symmetric, non-negative
- If the algorithmic mutual information I(x : y) = 0, then K(x,y) = K(x) + K(y): x and y are algorithmically independent
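As a sanity check on the probabilistic column, the sketch below (with invented joint distributions) computes I(X;Y) = H(X) + H(Y) - H(X,Y) for an independent and a dependent pair, recovering I(X;Y) = 0 exactly when H(X,Y) = H(X) + H(Y):

```python
import numpy as np

def H(p: np.ndarray) -> float:
    """Shannon entropy in bits; 0 * log 0 is taken as 0."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

independent = np.array([[0.25, 0.25],
                        [0.25, 0.25]])   # p(x,y) = p(x) p(y)
dependent = np.array([[0.5, 0.0],
                      [0.0, 0.5]])       # X fully determines Y

for name, p_xy in [("independent", independent), ("dependent", dependent)]:
    hx, hy = H(p_xy.sum(axis=1)), H(p_xy.sum(axis=0))
    hxy = H(p_xy.ravel())
    print(f"{name}: H(X,Y) = {hxy:.2f}, I(X;Y) = {hx + hy - hxy:.2f}")
```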

Slide 8: Normalized Information Distance (NID), Li and Vitányi:

NID(x, y) = max{ K(x|y), K(y|x) } / max{ K(x), K(y) }

The normalized length of the shortest program that computes x knowing y, as well as computing y knowing x.
- A similarity metric
- NID(x,y) = 0 iff x = y
- NID(x,y) = 1: maximum distance between x and y
- The NID minimizes (up to an additive error term, it is smaller than) every normalized admissible distance

Slide 9: Compression: approximating the Kolmogorov complexity.
Big problem: the Kolmogorov complexity K(x) is uncomputable!
What if we use the approximation C(x), the size of the file obtained by compressing x with a standard lossless compressor (such as gzip)? K(x) represents a lower bound on what an off-the-shelf compressor can achieve when compressing x.
[Figure: two images of identical original size (65 KB); image A compresses to 47 KB, image B to 2 KB]
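The A/B example translates directly to byte strings (a sketch; the inputs are synthetic stand-ins for the two images): same original size, wildly different compressed sizes, hence wildly different complexity estimates.

```python
import os
import zlib

size = 65_000                      # ~65 KB, as in the slide's example
redundant = b"\x00" * size         # analogue of image B: highly regular
noisy = os.urandom(size)           # analogue of image A: little structure

print("redundant:", len(zlib.compress(redundant, 9)), "bytes")  # a few dozen
print("noisy    :", len(zlib.compress(noisy, 9)), "bytes")      # ~65 KB, no gain
```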

Slide 10: Normalized Compression Distance (NCD).
Approximate the NID by replacing complexities with compression factors:

NCD(x, y) = ( C(xy) - min{ C(x), C(y) } ) / max{ C(x), C(y) }

If two objects compress better together than separately, they share common patterns and are similar!
Advantages:
- Basically parameter-free (data-driven)
- Applicable with any off-the-shelf compressor to diverse data types
[Diagram: x and y fed to a coder, yielding C(x), C(y), and C(xy), combined into the NCD]
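A minimal implementation of the formula above, with zlib standing in for the compressor C; any off-the-shelf lossless compressor (bz2, lzma, ...) could be swapped in, and the example strings are invented:

```python
import zlib

def C(data: bytes) -> int:
    """Compressed size of data in bytes, our stand-in for complexity."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = C(x), C(y)
    return (C(x + y) - min(cx, cy)) / max(cx, cy)

# Objects sharing patterns compress better together: low NCD.
a = b"the quick brown fox jumps over the lazy dog " * 20
b = b"the quick brown fox leaps over the lazy cat " * 20
c = bytes(range(256)) * 4
print(ncd(a, b))   # close to 0: similar
print(ncd(a, c))   # close to 1: dissimilar
```

In practice the values can slightly exceed 1, since real compressors only approximate the Kolmogorov complexity.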

Slide 11: Evolution of CBSM.
- 1993, Ziv & Merhav: first use of relative entropy to classify texts
- 2000, Frank et al., Khmelev: first compression-based experiments on text categorization
- 2001, Benedetto et al.: intuitively defined compression-based relative entropy; caused a rise of interest in compression-based methods
- 2002, Watanabe et al.: Pattern Representation based on Data Compression (PRDC); first to classify general data, with a first step of conversion into strings
- 2004, NCD: solid theoretical foundations (algorithmic information theory)
Many things came next:
- Chen-Li metric for DNA classification (Chen & Li, 2005)
- Compression-based Dissimilarity Measure (Keogh et al., 2006)
- Cosine similarity (Sculley & Brodley, 2006)
- Dictionary distance (Macedonas et al., 2008)
- Fast Compression Distance (Cerra & Datcu, 2010)

Slide 12: Compression-based similarity measures: applications.
Clustering and classification of:
- Simple texts
- Dictionaries from different languages
- Music
- DNA genomes
- Volcanology
- Chain letters
- Authorship attribution
- Images
- ...

Slide 13: How to visualize a distance matrix?
An unsupervised clustering of a distance matrix related to a dataset can be carried out with a dendrogram (binary tree):
- A dendrogram represents a distance matrix in two dimensions
- It recursively splits the dataset into two groups containing similar objects
- The most similar objects appear as siblings
[Figure: example distance matrix over objects a-f and the corresponding dendrogram]
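This step is easy to reproduce with SciPy (a sketch; the 4x4 distance matrix is invented and could just as well hold NCD values):

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform
import matplotlib.pyplot as plt

labels = ["a", "b", "c", "d"]
D = np.array([[0.0, 0.1, 0.8, 0.9],
              [0.1, 0.0, 0.7, 0.8],
              [0.8, 0.7, 0.0, 0.2],
              [0.9, 0.8, 0.2, 0.0]])   # symmetric, zero diagonal

Z = linkage(squareform(D), method="average")  # linkage expects condensed form
dendrogram(Z, labels=labels)                  # most similar objects = siblings
plt.show()
```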

Slide 14: An all-purpose method: application to DNA genomes.
[Figure: dendrogram of genomes clustered by NCD, separating rodents from primates]

Slide 15: Volcanology: separating explosions (Ex) from landslides (Ls) at Stromboli volcano.
[Figure: clustering separating explosion signals from landslide signals]

Slide 16: Optical images: hierarchical clustering of 60 SPOT 5 subsets, spatial resolution 5 m.

Slide 17: SAR scene: hierarchical clustering of 32 TerraSAR-X subsets acquired over Paris, spatial resolution 1.8 m.
[Figure: dendrogram with one false alarm marked]

Slide 18: Contents. Next section: CBSM as Spectral Distances (traditional spectral distances; NCD as a spectral distance).

Slide 19: Rocks categorization: 41 spectra from the ASTER 2.0 spectral library, spanning three classes (mafic, felsic, shale). Spectra belonging to different rocks may present a similar behaviour or overlap.
[Figure: example spectra for the three rock classes]

Slide 20: Some well-known spectral distances:
- Euclidean distance
- Spectral angle
- Spectral correlation
- Spectral information divergence
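For reference, the four distances above can be sketched as follows for two spectra x and y with positive band values (standard definitions; the variable names and toy spectra are ours, and conventions for the correlation-based distance vary):

```python
import numpy as np

def euclidean(x, y):
    return float(np.linalg.norm(x - y))

def spectral_angle(x, y):
    """SAM: angle between the two spectra viewed as vectors."""
    cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def spectral_correlation(x, y):
    """One common convention: 1 minus the Pearson correlation."""
    return float(1.0 - np.corrcoef(x, y)[0, 1])

def sid(x, y):
    """Spectral Information Divergence: symmetrized KL divergence of the
    spectra normalized to probability distributions."""
    p, q = x / x.sum(), y / y.sum()
    return float((p * np.log(p / q)).sum() + (q * np.log(q / p)).sum())

x = np.array([0.12, 0.35, 0.50, 0.40])   # toy 4-band spectra
y = np.array([0.10, 0.30, 0.55, 0.45])
for d in (euclidean, spectral_angle, spectral_correlation, sid):
    print(d.__name__, d(x, y))
```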

Slide 21: Results. Evaluation of the dendrogram through visual inspection:
- Is it possible to cut the dendrogram so as to separate the classes?
- How many objects would be misplaced given the best cuts?
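One way to make that count reproducible rather than visual (a sketch; the majority-cluster rule and all names here are our choice, not the authors' protocol) is to cut the linkage tree into as many flat clusters as there are classes and count the objects falling outside their class's majority cluster:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def misplaced(Z, truth, n_classes):
    """Count objects outside the majority cluster of their class when the
    tree Z is cut into n_classes flat clusters."""
    truth = np.asarray(truth)
    assign = fcluster(Z, t=n_classes, criterion="maxclust")
    errors = 0
    for cls in np.unique(truth):
        members = assign[truth == cls]
        errors += len(members) - np.bincount(members).max()
    return errors

# Hypothetical usage with a precomputed distance matrix D and class labels:
# Z = linkage(squareform(D), method="average")
# print(misplaced(Z, truth=labels, n_classes=3))
```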

Slide 22: Conclusions.
The NCD can be employed as a spectral distance, and may provide surprising results. Why?
- The NCD is resistant to noise: differences between minerals of the same class may be regarded as noise
- The NCD implicitly focuses on the relevant information within the data: we conjecture that the analysis benefits from considering the general behaviour of the spectra
Drawbacks:
- Computationally intensive (spectra have to be analyzed sequentially)
- Dependent to some extent on the compressor used: in every case, the compressor that best fits the data at hand should be used, as it best approximates the Kolmogorov complexity

Slide 23: Compression