Robust Kernel Density Estimation by Scaling and Projection in Hilbert Space Presented by: Nacer Khalil

Table of contents
1. Introduction
   1. Definition of robustness
   2. Robust kernel density estimation
2. Nonparametric contamination models
3. Scaled projection kernel density estimator
4. Experiment and conclusion
5. Case study

Introduction: What is robustness?
Robustness is the ability of a computer system to cope with errors during execution (Wikipedia). Robustness encompasses:
– Reliability
– Availability
– Resilience

Introduction: What is robustness?
In machine learning, robustness is:
– the extent to which test error is consistent with training error
– the extent to which the algorithm's performance resists noise

Introduction: Robust kernel density estimation
Density estimation is either:
– parametric, or
– nonparametric
Nonparametric density estimation:
– works under more general assumptions, but is not robust
– is difficult to make robust

Introduction: Problem statement
How can nonparametric kernel density estimation be made robust? We consider the situation where most observations come from a target density f_tar but some are drawn from a contaminating density f_con.
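The slide's formulas were images. Written out, the ε-contamination model the talk works with (the standard mixture form, consistent with the β = 1/(1−ε) scaling used later) is:

```latex
f_{\mathrm{obs}} = (1-\varepsilon)\, f_{\mathrm{tar}} + \varepsilon\, f_{\mathrm{con}},
\qquad \varepsilon \in [0, 1).
```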

Introduction: Contribution
The authors introduce a new formalism to describe transformations that "decontaminate" f_obs. The decontamination process is:
– Scaling: multiply the KDE by a real number
– Shifting: find the closest pdf to the scaled KDE in the L^2 norm

2. Nonparametric contamination models: Problem setting
We know:
– samples drawn from the observed density f_obs
– the contamination proportion ε
We do not know f_tar or f_con.

Nonparametric contamination models: New formalism
Let D be the set of all pdfs on R^d. A contamination model is any subset V of D × D, i.e. a set of pairs (f_tar, f_con). Let R_ε : D → D be a family of transformations on D indexed by ε in [0, 1). We say R_ε decontaminates V if for all (f_tar, f_con) in V and all ε in [0, 1) the condition below holds.
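The decontamination condition itself was an image on the slide; reconstructed from the surrounding definitions (the observed density being the ε-mixture above), it reads:

```latex
R_{\varepsilon}\big((1-\varepsilon)\, f_{\mathrm{tar}} + \varepsilon\, f_{\mathrm{con}}\big)
  = f_{\mathrm{tar}}
\quad \text{for all } (f_{\mathrm{tar}}, f_{\mathrm{con}}) \in V,\ \varepsilon \in [0,1).
```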

Nonparametric contamination models: Proposed contamination method (figure)

Decontamination procedure
To recover f_tar, we scale f_obs by β = 1/(1−ε), as derived below.
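Why this particular β: one line of algebra on the mixture (my derivation, not verbatim from the slide) shows the scaled density dominates the target pointwise, so only excess mass remains to be removed by the shifting step:

```latex
\beta f_{\mathrm{obs}}
= \tfrac{1}{1-\varepsilon}\big((1-\varepsilon) f_{\mathrm{tar}} + \varepsilon f_{\mathrm{con}}\big)
= f_{\mathrm{tar}} + \tfrac{\varepsilon}{1-\varepsilon}\, f_{\mathrm{con}}
\;\ge\; f_{\mathrm{tar}}.
```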

Decontamination procedure (figure)

Other possible decontamination models
– Use anomaly detection and construct the KDE from non-anomalous samples only (a sketch follows this list).
– Level-set method: for a probability measure μ, find the set S with the smallest Lebesgue measure such that μ(S) > t (a threshold). Samples outside S are declared anomalous.
– Find connected components of the estimated density and declare samples in spurious components anomalous.
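A minimal sketch of the first alternative, a "reject-then-refit" KDE (RejKDE-style baseline): fit a pilot KDE, discard the fraction ε of samples with lowest density (an empirical level-set rule), then refit on the survivors. The function name `rejection_kde` and the toy data are mine, and ε is assumed known as in the paper's setting:

```python
import numpy as np
from scipy.stats import gaussian_kde

def rejection_kde(X, eps):
    """X: (d, n) array of samples; eps: assumed contamination fraction."""
    pilot = gaussian_kde(X)                  # pilot density estimate
    scores = pilot(X)                        # density at each sample
    t = np.quantile(scores, eps)             # empirical level-set threshold
    keep = scores > t                        # samples inside the level set
    return gaussian_kde(X[:, keep])          # refit on non-anomalous samples

# Usage on toy contaminated 1-D data:
rng = np.random.default_rng(0)
target = rng.normal(0.0, 1.0, size=400)
contam = rng.uniform(-8.0, 8.0, size=100)    # eps = 0.2 contamination
X = np.concatenate([target, contam])[None, :]
f_hat = rejection_kde(X, eps=0.2)
print(f_hat(np.array([[0.0]])))              # density estimate near the mode
```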

Other possible decontamination models (figure)

Scaled projection kernel density estimator
Let us consider approximating R^a_ε in the finite-sample situation. Let f in L^2(R^d) be a pdf and X_1, …, X_n be samples from f. Let k_σ(x, x′) be a smoothing kernel with bandwidth σ. The classic kernel density estimator is
f̂_n(x) = (1/n) Σ_{i=1}^n k_σ(x, X_i).
Since we do not know ε, we scale by some β > 1.
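For concreteness, a minimal NumPy sketch of the classic estimator above with a Gaussian kernel (the helper names `gaussian_kernel` and `kde` are mine; the bandwidth σ is assumed given, this is not a bandwidth-selection rule):

```python
import numpy as np

def gaussian_kernel(x, xi, sigma):
    """Gaussian smoothing kernel k_sigma(x, x') in d dimensions."""
    d = x.shape[-1]
    sq = np.sum((x - xi) ** 2, axis=-1)
    return np.exp(-sq / (2 * sigma ** 2)) / ((2 * np.pi * sigma ** 2) ** (d / 2))

def kde(x, X, sigma):
    """Classic KDE f_n(x) = (1/n) sum_i k_sigma(x, X_i); X has shape (n, d)."""
    return np.mean([gaussian_kernel(x, xi, sigma) for xi in X], axis=0)

# Example: evaluate on a 1-D grid
X = np.random.default_rng(0).normal(size=(200, 1))
grid = np.linspace(-4, 4, 101)[:, None]
f = kde(grid, X, sigma=0.3)
```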

From the classic kernel estimator to the SPKDE
The SPKDE is defined as the L^2 projection of the scaled estimator β f̂_n onto the densities representable by the kernel basis:
f̂_SPKDE = argmin_{g in D_n} ||g − β f̂_n||_{L^2},
where D_n = { Σ_i a_i k_σ(·, X_i) : a_i ≥ 0, Σ_i a_i = 1 }. The SPKDE can therefore be represented as a reweighted KDE, Σ_i a_i k_σ(·, X_i).

The minimization is a quadratic program
Over the weight vector a = [a_1, …, a_n], the problem is
minimize a^T G a − 2 b^T a subject to a_i ≥ 0, Σ_i a_i = 1,
where
– G is the Gram matrix of k_σ(·, X_1), …, k_σ(·, X_n)
– b = (β/n) G 1
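A minimal sketch of solving this quadratic program with SciPy (SLSQP handles the simplex constraint; a dedicated QP solver would be more robust in practice). For Gaussian kernels, the L^2 Gram matrix has a closed form, since the L^2 inner product of two Gaussian kernels with bandwidth σ is a Gaussian kernel with bandwidth σ√2; the function name `spkde_weights` is mine:

```python
import numpy as np
from scipy.optimize import minimize

def spkde_weights(X, sigma, beta):
    """Solve min_a a^T G a - 2 b^T a  s.t.  a >= 0, sum(a) = 1.
    X: (n, d) samples; sigma: kernel bandwidth; beta: scaling factor.
    For Gaussian kernels, G_ij = k_{sigma*sqrt(2)}(X_i, X_j)."""
    n, d = X.shape
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    s2 = 2 * sigma ** 2                                 # variance of k * k
    G = np.exp(-sq / (2 * s2)) / ((2 * np.pi * s2) ** (d / 2))
    b = (beta / n) * G.sum(axis=1)                      # b = (beta/n) G 1

    obj = lambda a: a @ G @ a - 2 * b @ a
    grad = lambda a: 2 * (G @ a - b)
    cons = {"type": "eq", "fun": lambda a: a.sum() - 1.0}
    res = minimize(obj, np.full(n, 1.0 / n), jac=grad,
                   bounds=[(0.0, None)] * n, constraints=cons,
                   method="SLSQP")
    return res.x                                        # SPKDE weights a_i

# The SPKDE is then f(x) = sum_i a_i * k_sigma(x, X_i), e.g. via the
# `gaussian_kernel` helper sketched earlier.
```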

Infinite-sample SPKDE decontamination (figure)

Experiment and conclusion: Datasets
To show the SPKDE's properties:
– synthetic data
– an idealized experiment where the contamination is uniform
– sample size of 500 and ε = 0.2, hence β = 1.25
For the remaining experiments:
– 12 classification datasets
– ε = 0, 0.05, 0.1, 0.2, 0.25, 0.3
– each test is performed on 15 permutations of the dataset

Experiment and conclusion: Performance criteria
The Kullback-Leibler (KL) divergence is investigated. Given a performance metric and a contamination amount, mean performance is compared using the Wilcoxon signed rank test.

Kullback-Leibler (KL) divergence
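The formula on this slide was an image; for reference, the standard definition used to score an estimate f̂ against the target (not copied from the slide) is:

```latex
D_{\mathrm{KL}}\big(f_{\mathrm{tar}} \,\big\|\, \hat f\big)
  = \int f_{\mathrm{tar}}(x)\, \log \frac{f_{\mathrm{tar}}(x)}{\hat f(x)}\, dx .
```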

Experiment and conclusion: Methods (figure)

Experiment and conclusion: Results
– SPKDE is effective at compensating for contamination under the D_KL metric.
– SPKDE outperforms RKDE.
– RejKDE is significantly worse than SPKDE.
– SPKDE also outperforms the plain KDE even when no contamination takes place.

Mini case study
– Apply the SPKDE to the Old Faithful dataset.
– Generate contaminating samples from a Gaussian.
– Mix the clean and contaminating samples and apply the SPKDE (a sketch follows).
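A hypothetical end-to-end version of this case study, reusing the `gaussian_kernel` and `spkde_weights` helpers sketched earlier. The stand-in data and contamination parameters are illustrative assumptions, not the presenter's values:

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for Old Faithful eruption durations (the real dataset isn't bundled here)
clean = np.concatenate([rng.normal(2.0, 0.3, 100), rng.normal(4.3, 0.4, 140)])
eps = 0.2
n_con = int(eps / (1 - eps) * len(clean))     # 60 contaminating points
contam = rng.normal(3.2, 2.5, n_con)          # Gaussian contamination
X = np.concatenate([clean, contam])[:, None]  # shape (300, 1)

sigma, beta = 0.25, 1.0 / (1.0 - eps)         # beta = 1.25
a = spkde_weights(X, sigma, beta)             # weights from the QP sketch above

# Evaluate the reweighted (SPKDE) density on a grid
grid = np.linspace(0, 7, 200)[:, None]
f_spkde = np.array([a @ gaussian_kernel(g, X, sigma) for g in grid])
```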

Figure: Old Faithful density, clean data

Figure: Old Faithful density, contaminated data (ε = …)

SPKDE scaling
We multiply the density by β = 1/(1−ε).

Shifting
– Search for the constant (uniform) level that matches the highest number of points in the distribution.
– Slice that level off the distribution:
  – start with a threshold and count how many points are close to it
  – increase the threshold until the number of points decreases
  – slice those points from the distribution
A sketch of the underlying projection follows.
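In the infinite-sample picture, this shifting step amounts to the L^2 projection of the scaled density onto the set of pdfs: subtract a constant c and clip at zero, with c chosen so the result integrates to one. A minimal 1-D sketch on a grid (my rendering of that projection via bisection, not the presenter's code):

```python
import numpy as np

def shift_and_clip(g, dx, tol=1e-8):
    """Project a nonnegative grid function g (with integral > 1) onto pdfs:
    find c >= 0 such that max(g - c, 0) integrates to 1, by bisection."""
    lo, hi = 0.0, g.max()
    while hi - lo > tol:
        c = 0.5 * (lo + hi)
        mass = np.sum(np.maximum(g - c, 0.0)) * dx
        if mass > 1.0:
            lo = c          # removed too little mass; raise the cut
        else:
            hi = c
    return np.maximum(g - c, 0.0)

# Example: scale a mixture density by beta, then shift and clip back to a pdf
x = np.linspace(-10, 10, 2001); dx = x[1] - x[0]
f_obs = 0.8 * np.exp(-x**2 / 2) / np.sqrt(2 * np.pi) + 0.2 / 20.0  # 20% uniform
f_hat = shift_and_clip(1.25 * f_obs, dx)
print(np.sum(f_hat) * dx)   # ≈ 1.0
```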

Shifting (figure)

Conclusion
The paper presents a way to construct a robust kernel density estimate. It makes a number of assumptions:
– the contamination rate is known
– the contamination is uniform (the shape of the target distribution does not change)
In other words, extra information about the contamination is needed: its distribution and its rate.