Towards Compression-based Information Retrieval
Daniele Cerra
Symbiose Seminar, Rennes, 18.11.2010



Slide 2: Scope and Main Contributions
- Theory: Shannon information theory, algorithmic information theory, algorithmic KL-divergence
- Methods: data compression, pattern matching, dictionary-based similarity, grammar-based similarity
- Applications: content-based image retrieval; classification, clustering, detection; complexity estimation for annotated datasets

Slide 3: Outline
- The core: compression-based similarity measures (CBSM)
- Theory: a “Complex” Web
- Speeding up CBSM
- Applications and experiments
- Conclusions and perspectives

Slide 4: Outline (as on slide 3)

Slide 5: Compression-based Similarity Measures
- Most well-known: the Normalized Compression Distance (NCD), a general distance between any two strings x and y, and a similarity metric under some assumptions:
  NCD(x,y) = (C(xy) - min{C(x), C(y)}) / max{C(x), C(y)}
  where C(x) and C(y) are the compressed sizes of x and y, and C(xy) is the compressed size of their concatenation
- Basically parameter-free
- Applicable with any off-the-shelf compressor (such as Gzip)
- If two objects compress better together than separately, they share common patterns and are similar
Li, M. et al., “The similarity metric”, IEEE Trans. Inf. Theory, vol. 50, no. 12, 2004.
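As a concrete illustration, here is a minimal sketch of the NCD (not the author's code: it assumes zlib as the off-the-shelf compressor and invented test strings):

```python
import zlib

def C(data: bytes) -> int:
    """Compressed size in bytes: the stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance (Li et al., 2004)."""
    cx, cy = C(x), C(y)
    return (C(x + y) - min(cx, cy)) / max(cx, cy)

s1 = b"the quick brown fox jumps over the lazy dog" * 50
s2 = b"the quick brown fox jumped over the lazy dogs" * 50
print(ncd(s1, s2))                   # near 0: the strings share most patterns
print(ncd(s1, bytes(range(256)) * 8))  # near 1: little common structure
```

When two inputs compress better together than separately, the numerator shrinks, which is exactly the intuition stated above.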

Slide 6: Applications of CBSM
Clustering and classification of:
- Texts
- Music
- DNA genomes
- Chain letters
- Images
- Time series
- ...
Examples pictured: DNA genomes (rodents vs. primates), seismic signals (landslides vs. explosions), satellite images.

Slide 7: Assessment and Discussion of NCD Results
- Results obtained by NCD often outperform state-of-the-art methods
  - Comparisons with 51 other distances*
- But NCD-like measures have always been applied to restricted datasets
  - Size < 100 objects in the main papers on the topic
  - All real information retrieval systems use at least thousands of objects
  - More thorough experiments are required
- NCD is too slow to be applied to a large dataset
  - About 1 second (on a 2.65 GHz machine) to process 10 strings of 10 KB each and output 5 distances
  - Since NCD is data-driven, the full data has to be processed again and again to compute each distance from a given object
  - The price to pay for a parameter-free approach: a compact representation of the data in an explicit parameter space is not available
* Keogh, E., Lonardi, S. & Ratanamahatana, C., “Towards Parameter-free Data Mining”, SIGKDD 2004.

Slide 8: Outline (as on slide 3)

Slide 9: A “Complex” Web
- How to quantify information?
- Diagram: classic information theory (entropy H(X), Shannon mutual information) and algorithmic information theory (Kolmogorov complexity K(x), algorithmic mutual information, NID), with data compression (NCD) as the computable bridge between them.

Slide 10: Shannon Entropy
- The information content of the output of a random variable X:
  H(X) = - sum over x of p(x) log2 p(x)
- Example: entropy of the outcomes of the toss of a biased/unbiased coin
  - H(X) is maximal when the coin is not biased: every toss carries a full bit of information!
- Note: H(X) can be (much) greater than 1 if X can take more than two values!
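A minimal sketch of the coin example (standard-library Python; the function name is ours):

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy in bits of a coin that lands heads with probability p."""
    if p in (0.0, 1.0):
        return 0.0  # a deterministic coin carries no information
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(binary_entropy(0.5))  # 1.0 bit: fair coin, maximal uncertainty
print(binary_entropy(0.9))  # ~0.47 bits: a biased coin is more predictable
```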

Slide 11: Entropy & Compression: the Shannon-Fano Code
- X = {A, B, C, D, E}: with a fixed-length code we should use 3 bits per symbol to encode the outcomes of X
- Probabilities: P(A) = 2/5, P(B) = 1/5, P(C) = 1/5, P(D) = 1/10, P(E) = 1/10
- A Shannon-Fano code assigns shorter codewords to more probable symbols, e.g.:

Symbol | P(x) | Code
A      | 2/5  | 00
B      | 1/5  | 01
C      | 1/5  | 10
D      | 1/10 | 110
E      | 1/10 | 111

- Average: 0.4*2 + 0.2*2 + 0.2*2 + 0.1*3 + 0.1*3 = 2.2 bits per symbol instead of 3: compression is achieved!
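The arithmetic can be checked directly (assuming the code assignment in the table above, which is one standard Shannon-Fano assignment; the original slide's codes were not preserved):

```python
import math

probs = {"A": 0.4, "B": 0.2, "C": 0.2, "D": 0.1, "E": 0.1}
code  = {"A": "00", "B": "01", "C": "10", "D": "110", "E": "111"}

avg_bits = sum(p * len(code[s]) for s, p in probs.items())
entropy  = -sum(p * math.log2(p) for p in probs.values())
print(avg_bits)  # 2.2 bits/symbol, down from 3 for a fixed-length code
print(entropy)   # ~2.12 bits: the lower bound no lossless code can beat
```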

Slide 12: A Lacuna of the Probabilistic Approach
- H(X) is related to the probability density function of a data source X
- It cannot tell the amount of information contained in an isolated object!
  - s = {I_carry_important_information}
  - What is the information content of s when its source S is unknown?
- Algorithmic Information Theory comes to the rescue...

Slide 13: Algorithmic Information Theory: Kickoff
Three parallel definitions:
- R. J. Solomonoff, “A formal theory of inductive inference”, 1964
- A. N. Kolmogorov, “Three approaches to the quantitative definition of information”, 1965
- G. J. Chaitin, “On the length of programs for computing finite binary sequences”, 1966

Slide 14: Kolmogorov Complexity
- Also known as Kolmogorov-Chaitin complexity or algorithmic complexity
- K(x) = min{|q| : U(q) = x}: the length of the shortest program q that outputs the string x and halts on a universal Turing machine U
- K(x) is a measure of the computational resources needed to specify the object x
- K(x) is uncomputable

Slide 15: Kolmogorov Complexity
- A string with low complexity: 001001001001001001001001001001001001001
  - Short description: 13 x (write “001”)
- A string with high complexity: one with no regularities, e.g. a random sequence of digits; its shortest description is to write it out literally

Slide 16: Two Approaches to Information Content
Probabilistic (classic): information as uncertainty (Shannon entropy)
- Related to a discrete random variable X on a finite alphabet A with a probability mass function p(x)
- A measure of the average uncertainty in X
- Measures the average number of bits required to describe X
- Computable if p(x) is known
Algorithmic: information as complexity (Kolmogorov complexity)
- Related to a single object x
- Length of the shortest program among the Qx programs which output the finite binary string x and halt on a universal Turing machine
- Measures how difficult it is to describe x from scratch
- Uncomputable

Slide 17: A “Complex” Web
- How to measure the information shared between two objects?
- (Same diagram as slide 9, now focusing on Shannon mutual information and algorithmic mutual information.)

Slide 18: Shannon/Kolmogorov Parallelisms: Mutual Information
(Statistic) mutual information:
- Measures, in bits, the amount of information a random variable X has about another variable Y: I(X;Y) = H(X) + H(Y) - H(X,Y)
- The joint entropy H(X,Y) is the entropy of the pair (X,Y) with joint distribution p(x,y)
- Symmetric, non-negative
- If I(X;Y) = 0, then H(X,Y) = H(X) + H(Y): X and Y are statistically independent
Algorithmic mutual information:
- Amount of computational resources shared by the shortest programs which output the strings x and y: I(x:y) = K(x) + K(y) - K(x,y)
- The joint Kolmogorov complexity K(x,y) is the length of the shortest program which outputs x followed by y
- Symmetric, non-negative (up to logarithmic terms)
- If I(x:y) = 0, then K(x,y) = K(x) + K(y): x and y are algorithmically independent

Slide 19: A “Complex” Web
- How to derive a computable similarity measure?
- (Same diagram as slide 9, now focusing on the path from the uncomputable NID to the computable NCD via data compression.)

Slide 20: Compression: Approximating Kolmogorov Complexity
- K(x) represents a lower bound for what an off-the-shelf compressor can achieve
- Vitányi et al. suggest the approximation K(x) ≈ C(x), where C(x) is the size of the file obtained by compressing x with a standard lossless compressor (such as Gzip)
- Example: image A, original size 65 KB, compressed size 47 KB; image B, original size 65 KB, compressed size 2 KB. B is far more regular, hence far less complex, than A.
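A quick illustration of the approximation (a sketch: zlib stands in for the generic lossless compressor, and the byte strings are invented for the example):

```python
import os
import zlib

repetitive = b"001" * 20000      # highly regular string: low complexity
random_ish = os.urandom(60000)   # random bytes: no structure to exploit

print(len(zlib.compress(repetitive, 9)))  # tiny: C(x) is far below |x|
print(len(zlib.compress(random_ish, 9)))  # close to 60000: near-incompressible
```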

Slide 21: Back to NCD
Normalized Information Distance (NID), algorithmic:
  NID(x,y) = max{K(x|y), K(y|x)} / max{K(x), K(y)}
- Normalized length of the shortest program that computes x knowing y, as well as computing y knowing x
- Derived from algorithmic mutual information
- A similarity metric which minorizes every admissible metric: universal, but uncomputable
Normalized Compression Distance (NCD), computable:
  NCD(x,y) = (C(xy) - min{C(x), C(y)}) / max{C(x), C(y)}
- The size K(x) of the shortest program which outputs x is approximated by the size C(x) of the compressed version of x
- A normalized measure of the elements that a compressor may use twice when compressing two objects x and y

Slide 22: A “Complex” Web
- What if we add some more?
- (Same diagram as slide 9, now adding Occam’s razor.)

Slide 23: Occam’s Razor
- William of Occam, 14th century: “All other things being equal, the simplest solution is the best!”
- Maybe today William of Occam would say: “If two explanations exist for a given phenomenon, pick the one with the smallest Kolmogorov complexity!”

Slide 24: An Example of Occam’s Razor
- The Copernican model (Ma) is simpler than the Ptolemaic one (Mb)
- To explain the motion of Mercury relative to Venus, Mb introduces epicycles in the orbits of the planets
- Ma accounts for this motion with no further assumptions
- The model Ma was eventually acknowledged as correct
- Note that we can assume K(Ma) < K(Mb)

Slide 25: Outline (as on slide 3)

Slide 26: Evolution of CBSM
- 1993, Ziv & Merhav: first use of relative entropy to classify texts
- 2000, Frank et al.; Khmelev: first compression-based experiments on text categorization
- 2001, Benedetto et al.: intuitively defined compression-based relative entropy; caused a rise of interest in compression-based methods
- 2002, Watanabe et al.: Pattern Representation based on Data Compression (PRDC); dictionary-based; first to classify general data with a preliminary conversion into strings; independent from information-theoretic concepts
- 2004, NCD: solid theoretical foundations (algorithmic information theory)
- Other similarity measures: Keogh et al. (Compression-based Dissimilarity Measure), Chen & Li (Chen-Li Metric for DNA classification), Sculley & Brodley (cosine similarity); these differ from NCD only by their normalization factors (Sculley & Brodley, 2006)
- 2008, Macedonas et al.: independent definition of a dictionary distance

Slide 27: Pattern Representation based on Data Compression (PRDC) (Watanabe et al., 2002)
- PRDC measures the distance of an object x from an object y, once both are encoded into strings, as the length of the string x coded with the dictionary D(y) extracted from y, normalized by the length of the string representing x:
  PRDC(x,y) = |x coded with D(y)| / |x|
- The PRDC equation is not normalized according to the complexity of x, and it skips the joint compression step
- Normalizing it (dividing instead by the length of x coded with the dictionary extracted from x itself, and coding with the dictionary D(xy) extracted from x and y jointly) yields measures almost identical to NCD (small average difference over 400 measurements)
- PRDC can thus be added to the list of measures which differ from NCD only in the normalization factor (Sculley & Brodley, 2006)
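The coding step PRDC relies on can be sketched as a greedy longest-match parse against a pattern dictionary; this is an illustration of the idea, not Watanabe's implementation:

```python
def code_length(x: str, dictionary: set) -> int:
    """Number of codes emitted when x is parsed greedily against a
    dictionary of patterns (longest match wins, unknown symbols pass raw)."""
    i, codes = 0, 0
    while i < len(x):
        j = i + 1
        while j <= len(x) and x[i:j] in dictionary:
            j += 1             # extend the match while it stays in the dictionary
        i = max(j - 1, i + 1)  # consume the longest match (at least one symbol)
        codes += 1
    return codes

# PRDC-style distance of x from y, following the slide's description:
# prdc_xy = code_length(x, dictionary_extracted_from_y) / len(x)
```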

Slide 28: Grammar-based Approximation
- A dictionary extracted from a string x in PRDC may be regarded as a model for x
- To better approximate K(x), consider the smallest context-free grammar (CFG) generating x: the grammar’s set of rules can be regarded as the smallest dictionary and generative model for x
- Sample CFG G(z) for the string z = {aaabaaacaaadaaaf}: e.g. S -> AbAcAdAf, A -> aaa
- The complexity estimate combines the size of x (of length N) as represented by its smallest context-free grammar G(x) with the number of rules contained in the grammar
  - A two-part complexity representation: model + data given the model (MDL-like)
  - Complexity overestimations are intuitively accounted for and decreased by the second term
- (Table: average intraclass and interclass NCD distances over 40,000 measurements, standard compressor vs. grammar approximation.)

Slide 29: Comparisons
- Figure: clustering comparison obtained with the normalized PRDC

Slide 30: Comparisons: NCD vs. Grammars
- Figure: clustering dendrograms of DNA genomes (rodents, primates, seals), with annotations “Bears far apart”, “Rodents scattered”, “Seals divided”
- Note: using a compressor specific to DNA genomes improves NCD results considerably

Slide 31: Drawbacks of the Introduced CBSM
- (Table: accuracy and speed of NCD, algorithmic Kullback-Leibler, PRDC, normalized PRDC, and grammar-based measures, each rated as comparable to, worse than, or better than NCD.)
- How to combine accuracy and speed?
- Solution: extract dictionaries from the data, possibly offline, and compare only those

Slide 32: Outline (as on slide 3)

Slide 33: Image Preprocessing: 1D Encoding
- Conversion to the Hue Saturation Value (HSV) color space, followed by scalar quantization:
  - 4 bits for hue (the human eye is more sensitive to changes in hue)
  - 2 bits for saturation
  - 2 bits for value
- The image is then read along a scanning direction into a string
- What about the loss of textural information?
  - Horizontal textural information is already implicit in the dictionaries
  - Basic vertical interactions are stored for each pixel: smooth / rough, 1 bit of information
  - Other solutions (e.g. Peano scanning) gave worse performance
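A possible rendering of the quantization step (a sketch under the 4/2/2-bit scheme above; the packing into a single byte is our choice, and the extra vertical smooth/rough bit is omitted):

```python
import colorsys

def quantize_hsv(r: int, g: int, b: int) -> int:
    """Map an RGB pixel (components 0-255) to an 8-bit symbol:
    4 bits hue, 2 bits saturation, 2 bits value."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    hq = min(int(h * 16), 15)  # 4 bits: hue gets the finest quantization
    sq = min(int(s * 4), 3)    # 2 bits
    vq = min(int(v * 4), 3)    # 2 bits
    return (hq << 4) | (sq << 2) | vq

# Scanning the image row by row then yields the 1D string of symbols.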

Slide 34: Dictionary-based Distance: Dictionary Extraction
- LZW: a dictionary-based universal compression algorithm
  - An improvement by Welch (1984) over the LZ78 compressor (Lempel & Ziv, 1978)
  - Searches for matches between the text to be compressed and a set of previously found strings contained in a dictionary
  - When a match is found, a substring is substituted by a code representing a pattern in the dictionary (e.g. ”TOBEORNOTTOBEORTOBEORNOT!”)
- Convert each image to a string and extract meaningful patterns into dictionaries using an LZW-like compressor
  - Unlike LZW: loose (or no) constraints on dictionary size, flexible alphabet size
  - Sort the dictionary entries to enable binary searches
  - Store only the dictionary
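A minimal sketch of LZW-style dictionary extraction as described above (returning only the multi-character patterns, as a set, mirrors the "store only the dictionary" step; the details are ours):

```python
def lzw_dictionary(s: str) -> set:
    """Collect the patterns an LZW pass over s would add to its dictionary
    (single-character entries excluded)."""
    alphabet = set(s)   # initial dictionary: the symbols of s
    patterns = set()
    w = ""
    for ch in s:
        wc = w + ch
        if wc in alphabet or wc in patterns:
            w = wc            # extend the current match
        else:
            patterns.add(wc)  # new pattern found: store it
            w = ch            # restart the match from the current symbol
    return patterns

print(sorted(lzw_dictionary("TOBEORNOTTOBEORTOBEORNOT!")))
```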

Slide 35: Fast Compression Distance
- Given two dictionaries, compute a distance between them from the difference between shared and not shared patterns:
  FCD(x,y) = (|D(x)| - |D(x) ∩ D(y)|) / |D(x)|
  In pseudo-SQL: |D(x)| = Count(select * from D(x)); |D(x) ∩ D(y)| = Count(select * from Inner_Join(D(x), D(y)))
- The joint compression step of NCD is replaced by an inner join of two sets
- NCD acts like a black box; the FCD opens it by making the dictionaries explicit
- Dictionaries have to be extracted only once (possibly offline)
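Under this formulation the distance reduces to set operations. A sketch, assuming the FCD takes the form suggested by the two counts above:

```python
def fcd(dx: set, dy: set) -> float:
    """Fast Compression Distance between two pattern dictionaries:
    the fraction of x's patterns that y's dictionary does not share
    (0 when all patterns are shared, 1 when none are)."""
    shared = len(dx & dy)  # the "inner join" of the two sets
    return (len(dx) - shared) / len(dx)
```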

Slide 36: How Fast is FCD with Respect to NCD?
- The joint compression step of an LZW-based NCD must process all n elements (symbols) of x, while the FCD join only traverses the patterns of x, which are far fewer
- Further advantages:
  - If a pattern gives a mismatch in the search, all extensions of that pattern can be ignored (ensured by LZW’s prefix-closure property)
  - The shortest patterns can be ignored (regarded as noise)
  - To reduce storage space, all redundant patterns which are prefixes of others can be ignored, with no losses (also ensured by LZW’s prefix-closure property)
- Overall, complexity decreases by approximately one order of magnitude

Slide 37: Datasets
- Lola: 164 video frames from the movie “Run, Lola, Run”
- Corel: 1500 digital photos and hand-drawn images
- Liber: 90 books by known Italian authors
- Nister-Stewenius: 10,200 photographs of objects pictured from 4 different points of view
- Fawns & Meadows: 144 infrared images of meadows, some of which contain fawns

Slide 38: Authorship Attribution

Author       | Texts | Successes
Dante        |   8   |   8
D’Annunzio   |   4   |   4
Deledda      |  15   |  15
Fogazzaro    |   5   |   5
Guicciardini |   6   |   6
Machiavelli  |  12   |  10
Manzoni      |   4   |   4
Pirandello   |  11   |  11
Salgari      |  11   |  11
Svevo        |   5   |   5
Verga        |   9   |   9
TOTAL        |  90   |  88

- Figures: classification accuracy compared with 6 other compression-based methods; running times for the top 3 methods (FCD among them)

Slide 39: Example of NCD’s Failure: Wild Animals Detection
- Dataset: 144 infrared images, 160x120 pixels (103 meadows, the rest containing fawns); compressor used with NCD: LZW
- FCD: all 103 meadows correctly classified (confusion row: 0 | 103) and only 3 fawns missed, for an accuracy of about 98%; running time 58 sec
- NCD: running time 14 min, with markedly lower accuracy (the NCD confusion-matrix values were not preserved)
- The 3 missed detections by FCD are shown
- The limited buffer size in the compressor and the total loss of vertical texture cause NCD’s performance to decrease!

Slide 40: Content-based Image Retrieval System
- Classical approach (Smeulders, 2000): many steps and parameters to set
- Proposed approach: compare each object in the set to a query image Q and rank the results on the basis of their distance from Q
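The proposed system then amounts to a single ranking loop over precomputed dictionaries (a sketch; `database`, mapping image identifiers to pattern sets extracted offline, is a hypothetical structure for this example):

```python
def fcd(dx: set, dy: set) -> float:
    # as in the earlier sketch: fraction of the query's patterns absent from y
    return (len(dx) - len(dx & dy)) / len(dx)

def retrieve(query_patterns: set, database: dict, k: int = 10):
    """Rank all stored images by their FCD from the query; return the top k."""
    ranked = sorted(database.items(),
                    key=lambda item: fcd(query_patterns, item[1]))
    return ranked[:k]
```

No training, feature extraction, or parameter tuning is involved: only the dictionaries.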

Slide 41: Applications: COREL Dataset
- 1500 images, 15 classes, 100 images per class: Africans, Beach, Architecture, Buses, Dinosaurs, Elephants, Flowers, Horses, Mountains, Food, Caves, Postcards, Sunsets, Tigers, Women

Slide 42: Precision (P) vs. Recall (R) Evaluation
- P = TP / (TP + FP) and R = TP / (TP + FN), with TP true positives, FP false positives, FN false negatives
- Compared against Minimum Distortion Information Retrieval (Jeong & Gray, 2004) and a Jointly Trained Codebook (Daptardar & Storer, 2008)
- Running time: 18 min (images resampled to 64x64) on 2 processors (2 GHz) + 2 GB RAM

Slide 43: Confusion Matrix
- Classification according to the minimum average distance from a class

Slide 44: False Alarms (?)
- The 10 misclassified images, shown against typical images belonging to the class “Africans”
- Mostly misclassified as “Tigers”: no human presence in these images
- One “extreme” image misclassified as “Food”

Slide 45: Estimation of the “Complexity” of a Dataset

Dataset         | Classes | Objects | Content diversity
Fawns & Meadows |    2    |   144   | Average
Lola            |   19    |   164   | Average
Corel           |   15    |  1500   | High
Liber           |   11    |    90   | Low

Dataset         | MAP score
Liber           | 0.81
Lola            | 0.661
Corel           | 0.433
Fawns & Meadows | (value not preserved)

- The complexity rank (Reduced / Average / High) follows content diversity: retrieval quality (MAP) drops as the diversity of the dataset grows

Slide 46: A Larger Dataset and a Comparison with State-of-the-art Methods: Nister-Stewenius
- 10,200 images: 2550 objects photographed under 4 points of view
- The score (from 1 to 4) represents the number of meaningful objects retrieved in the top 4
- The SIFT-based NS1 and NS2 use different training sets and parameter settings, and yield different results
- FCD is independent from parameters
- Only 1000 images could be processed with NCD
- Query time: 8 seconds

Slide 47: SAR Scene Hierarchical Clustering
- 32 TerraSAR-X subsets acquired over Paris
- Figure: the resulting dendrogram, with one false alarm marked

Slide 48: Outline
- The core: compression-based similarity measures (CBSM)
- Theoretical foundations
- Contributions: theory
- Contributions: applications and experiments
- Conclusions

Slide 49: What Compression-based Techniques Can (and Cannot) Do
- The FCD allows validating compression-based techniques at realistic scales
  - Largest dataset tested with NCD: 100 objects
  - Largest dataset tested with FCD: 10,200 objects (the Nister-Stewenius dataset)
- CBSM do NOT outperform every existing technique
  - Results obtained so far on small datasets could be misleading
  - On the larger datasets analyzed, results are often inferior to the state of the art
  - Open question: could they be somehow improved?
- BUT they yield results comparable to other existing techniques, while:
  - Drastically reducing (ideally skipping) the definition of parameters
  - Skipping feature extraction
  - Skipping clustering (in the case of classification)
  - Assuming no a priori knowledge of the data
  - Being data-driven and applicable to any kind of data

Competence Centre on Information Extraction and Image Understanding for Earth Observation 50 Compression