Bayesian Nonparametric Classification and Applications


Bayesian Nonparametric Classification and Applications
Zhu Han, Department of Electrical and Computer Engineering, University of Houston. Thanks to Nam Nguyen, Guanbo Zheng, and Dr. Rong Zheng.

Content
Introduction: problem statement; basic concepts: Dirichlet distribution/process.
Bayesian nonparametric classification: generative model; inference model; performance bounds; simulation results.
Applications in wireless security with device fingerprinting: device fingerprinting; masquerade and Sybil attack detection; primary user emulation attack.
Overview of the Wireless Amigo Lab.

Introduction: problem statement
Model selection: How many clusters are there? What hidden process created the observations? What are the latent parameters of that process? These questions can be answered with nonparametric Bayesian inference.
Nonparametric: the number of clusters (or classes) can grow as more data is observed, and need not be known a priori.
Bayesian inference: use Bayes' rule to infer the latent variables.

Example of Bayesian inference used for parameter update
A Beta distribution is chosen as the prior, e.g. a = 2, b = 2 (head and tail are equally probable).
A Binomial distribution is the conjugate likelihood: one trial (N = 1) whose result is one head (m = 1).
This leads to the posterior: the parameters are updated given the observations, and the inferred probability of heads becomes higher.
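The coin example above fits in a few lines. This is a minimal sketch of the standard Beta-Binomial conjugate update; the function name is illustrative:

```python
# Beta-Binomial conjugacy: a Beta(a, b) prior updated by Binomial observations
# stays Beta, with the counts simply added to the prior parameters.
def beta_binomial_update(a, b, heads, tails):
    """Posterior Beta parameters after observing `heads` and `tails`."""
    return a + heads, b + tails

# Prior Beta(2, 2); one trial, one head (N = 1, m = 1) as on the slide.
a_post, b_post = beta_binomial_update(2, 2, heads=1, tails=0)

# Posterior is Beta(3, 2); its mean rises from 0.5 to 0.6,
# i.e. the probability of heads is now judged higher.
posterior_mean = a_post / (a_post + b_post)
```

The same pattern (prior parameters plus sufficient statistics of the data) is what makes conjugate pairs convenient on the next slide.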

Key idea
We want to sample the posterior distribution P(μ | observations) to obtain values of the parameters μ (μ can be any parameter). To do this, we need to choose conjugate prior and likelihood distributions: Posterior ∝ Likelihood × Prior. If we are able to choose conjugate distributions, we can update the posterior in closed form, as in the example above.

Dirichlet distribution
An extension of the Beta distribution to multiple dimensions: π ∼ Dir(α1, …, αK), where K is the number of clusters, πi is the weight of cluster i (each marginal distribution is Beta), and αi is the prior parameter.
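A Dirichlet draw can be sketched without any special libraries by normalizing Gamma draws; this is the standard construction, with illustrative parameter values:

```python
import random

def sample_dirichlet(alphas, rng=random):
    """Draw weights pi ~ Dir(alphas) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

random.seed(0)
# Symmetric prior over K = 6 clusters; the weights are nonnegative and sum to 1.
pi = sample_dirichlet([1.0] * 6)
```

Each marginal pi[i] is Beta distributed, which is the sense in which Dir generalizes the Beta example above.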

Dirichlet process
A random distribution G on Θ is Dirichlet process distributed with base distribution H and concentration parameter α, written G ∼ DP(α, H), if (G(A1), …, G(AK)) ∼ Dir(αH(A1), …, αH(AK)) for every finite measurable partition A1, …, AK of Θ. H(·) is the mean of the DP; α is the strength of the prior.

Bayesian nonparametric update
Given t observations x1, …, xt, the posterior on Θ is again a Dirichlet process:
G | x1, …, xt ∼ DP(α + t, (αH + Σi δxi) / (α + t)).
For a small number of observations t, the prior dominates; as t increases, the prior has less and less impact. α controls the balance between the impact of the prior and the trials. This update can be used to learn and combine any distributions.
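The prior-versus-data balance in the posterior base measure (αH + Σ δxi)/(α + t) can be illustrated directly: a draw comes from H with probability α/(α + t) and from a past observation otherwise. A minimal sketch with made-up values:

```python
import random

def dp_posterior_draw(alpha, base_sample, observations, rng=random):
    """One draw from the DP posterior base measure
    (alpha*H + sum of point masses at the observations) / (alpha + t)."""
    t = len(observations)
    if rng.random() < alpha / (alpha + t):
        return base_sample()          # prior H: dominates when t is small
    return rng.choice(observations)   # empirical part: dominates as t grows

random.seed(0)
# alpha = 1 and t = 100 observations, so about 100/101 of draws are empirical.
draws = [dp_posterior_draw(1.0, lambda: random.gauss(0.0, 1.0), [5.0] * 100)
         for _ in range(1000)]
frac_empirical = sum(d == 5.0 for d in draws) / len(draws)
```

With only a handful of observations the same code would instead return mostly fresh draws from H, which is exactly the balance the slide describes.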

Applications
Distribution estimation.
Cognitive radio spectrum bidding: estimate the aggregated effects from all other CR users.
Primary user spectrum map: different CR users see the spectrum differently; how to combine the others' sensing (as a prior) with one's own sensing.
Clustering for wireless security: infinite Gaussian mixture model.


Generative model vs. inference algorithm
In the generative model, we start with the parameters and end up creating observations. In the inference algorithm, on the other hand, we start with observations and end up inferring the parameters.

Generative model: a general idea
For a fair six-sided die, sampling Dir(1/6, 1/6, 1/6, 1/6, 1/6, 1/6) yields the weights π1, …, π6, i.e. the probabilities of each face. Question: if we have a die with an infinite number of faces, how do we deal with that? Dir(·) does not support the infinite case.

Generative model: stick-breaking process
Generates an infinite number of weights which sum to 1. Start with a stick of length 1. For each k: sample a breaking point πk' ∼ Beta(1, α), then calculate the weight πk = πk' ∏j<k (1 − πj'). The first weight is π1', the second is π2'(1 − π1'), the third is π3'(1 − π2')(1 − π1'), and so on.
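The recursion above can be sketched as a truncated sampler (the infinite sequence is cut off after a fixed number of weights, which captures essentially all the mass for moderate α):

```python
import random

def stick_breaking(alpha, num_weights, rng=random):
    """Truncated stick-breaking: pi_k = pi'_k * prod_{j<k} (1 - pi'_j),
    with breaking points pi'_k ~ Beta(1, alpha)."""
    remaining, weights = 1.0, []
    for _ in range(num_weights):
        frac = rng.betavariate(1.0, alpha)   # sample a breaking point
        weights.append(remaining * frac)     # weight = piece broken off
        remaining *= 1.0 - frac              # stick left to break
    return weights

random.seed(1)
pi = stick_breaking(alpha=2.0, num_weights=50)
# The full infinite sequence sums to 1; a 50-term truncation leaves
# only a vanishing remainder, so sum(pi) is just below 1.
```

Larger α breaks off smaller pieces on average, spreading mass over more clusters, which is why α acts as a concentration parameter.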

Generative model
The weights π1, π2, …, π∞ over an infinite number of faces/classes are drawn from Stick(α). Indicators are created according to a multinomial distribution over these weights (e.g. z1, z2, … = 1; z20, z21, … = 2). Given its indicator, each observation in x1:N follows a class-conditional distribution such as a Gaussian with parameters (µk, Σk).

Generative model: a graphical representation
[Plate diagram: α → π → zi; H → θi = (µi, Σi) for i = 1, …, ∞; zi and θ together generate xi for i = 1, …, N.]

Inference model: nonparametric Bayesian classification — Gibbs sampler approach
Purpose: find the posterior of the multivariate distribution P(Z | X), i.e. given the observations X, the probability that each belongs to cluster Z — in other words, which cluster each sample belongs to. Computing this directly is painful because of the integrations it requires. Instead, it is much easier to work with a univariate distribution: for each observation, find the marginal distribution of its indicator zi given all the other indicators. Given that distribution, the Gibbs sampler draws a value for one variable at a time, conditioned on all the others; the process is repeated, and it is proven to converge after a few iterations.

Nonparametric Bayesian classification inference
Goal: P(zi = k | z−i, X), where z−i is the set of all labels except the current (ith) one.
Prior (Chinese Restaurant Process): the probability assigned to a represented class k is proportional to n−i,k, the number of observations in class k excluding the current one; the probability assigned to an unrepresented class is proportional to α.
Likelihood: e.g. given as a Gaussian.
Posterior ∝ prior × likelihood.
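The CRP prior can be written out directly. A minimal sketch, with the key 'new' standing for the unrepresented class:

```python
def crp_prior(other_labels, alpha):
    """Chinese Restaurant Process prior for the current indicator z_i given all
    other labels: n_k / (n - 1 + alpha) for a represented class k,
    and alpha / (n - 1 + alpha) for a new (unrepresented) class."""
    n = len(other_labels) + 1            # total count, including observation i
    counts = {}
    for z in other_labels:
        counts[z] = counts.get(z, 0) + 1
    probs = {k: c / (n - 1 + alpha) for k, c in counts.items()}
    probs['new'] = alpha / (n - 1 + alpha)
    return probs

# Three other points: two in class 0, one in class 1; alpha = 1.
p = crp_prior([0, 0, 1], alpha=1.0)
# p == {0: 0.5, 1: 0.25, 'new': 0.25}
```

Multiplying these prior weights by the class-conditional likelihoods gives the posterior sampling probabilities used by the Gibbs sampler below.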

Inference model: posterior distributions
Given the prior and the likelihood, we arrive at the posterior:
(1) the probability of assigning the observation to a represented cluster;
(2) the probability of assigning it to an unrepresented cluster, where the predictive density is a Student-t distribution.

Inference model: Gibbs sampler
1. Start with a random indicator for each observation.
2. Remove the current (ith) observation from its cluster.
3. Update the indicator zi according to (1) and (2), given all the other indicators.
4. Repeat until convergence, then stop.
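The loop above can be sketched as a collapsed Gibbs sampler. For brevity, this sketch uses a 1-D Gaussian likelihood with known variance and a Gaussian prior on cluster means (so the predictive is Gaussian rather than the slides' Student-t); all parameter values and the toy data are illustrative:

```python
import math, random

def gibbs_dpmm_1d(x, alpha=1.0, sigma=1.0, mu0=0.0, tau0=10.0,
                  iters=50, rng=random):
    """Collapsed Gibbs sampling for a 1-D DP Gaussian mixture with known
    variance sigma^2 and a Normal(mu0, tau0^2) prior on cluster means."""
    z = list(range(len(x)))                  # start: every point its own cluster
    for _ in range(iters):
        for i in range(len(x)):
            z[i] = None                      # remove point i from its cluster
            members = {}
            for j, zj in enumerate(z):
                if zj is not None:
                    members.setdefault(zj, []).append(x[j])
            weights, options = [], []
            for k, pts in members.items():   # represented clusters: n_k * predictive
                n_k = len(pts)
                prec = 1.0 / tau0**2 + n_k / sigma**2
                m = (mu0 / tau0**2 + sum(pts) / sigma**2) / prec
                s2 = 1.0 / prec + sigma**2
                lik = math.exp(-(x[i] - m)**2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)
                weights.append(n_k * lik)
                options.append(k)
            s2 = tau0**2 + sigma**2          # unrepresented cluster: alpha * prior predictive
            lik = math.exp(-(x[i] - mu0)**2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)
            weights.append(alpha * lik)
            options.append(max(members, default=-1) + 1)
            z[i] = rng.choices(options, weights=weights)[0]
    return z

random.seed(0)
data = [0.0, 0.1, -0.1, 10.0, 10.1, 9.9]     # two well-separated groups
labels = gibbs_dpmm_1d(data, alpha=0.5, iters=30)
num_clusters = len(set(labels))              # the sampler infers K from the data
```

The number of clusters is never fixed in advance: the α-weighted "new cluster" option lets K grow or shrink as the sampler runs, which is the nonparametric part.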

Performance bounds for clustering
Consider the scenario of two clusters in a D-dimensional space: what are the bounds on the probability of assigning a point to its correct cluster? A Bhattacharyya bound and a Hoeffding bound on the hit rate of assigning a point to its original cluster are derived.
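The slides' bound expressions are in images not reproduced here; as an illustrative sketch, the standard Bhattacharyya distance between two Gaussians, and the classical equal-prior bound on the Bayes error that follows from it, can be computed as below (the 2-D example values are made up):

```python
import numpy as np

def bhattacharyya_distance(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two multivariate Gaussians:
    B = (1/8)(mu2-mu1)^T S^-1 (mu2-mu1) + (1/2) ln(det S / sqrt(det S1 det S2)),
    where S = (S1 + S2) / 2."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    cov1, cov2 = np.asarray(cov1, float), np.asarray(cov2, float)
    s = (cov1 + cov2) / 2.0
    diff = mu2 - mu1
    term_mean = diff @ np.linalg.solve(s, diff) / 8.0
    term_cov = 0.5 * np.log(np.linalg.det(s) /
                            np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return term_mean + term_cov

# Equal priors: the classical result bounds the Bayes error above by
# 0.5 * exp(-B), so the hit rate is at least 1 - 0.5 * exp(-B).
b = bhattacharyya_distance([0, 0], np.eye(2), [3, 0], np.eye(2))
hit_rate_bound = 1.0 - 0.5 * np.exp(-b)
```

The farther apart (or more differently shaped) the two clusters are, the larger B and the tighter the guarantee on correct assignment.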

Kullback-Leibler divergence
We use the Kullback-Leibler divergence (KLD) to measure the difference between two multivariate distributions.
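For multivariate Gaussians the KLD has a closed form, sketched below (the example means and covariances are made up):

```python
import numpy as np

def kl_gaussians(mu0, cov0, mu1, cov1):
    """KL(N0 || N1) for D-dimensional Gaussians:
    0.5 * [tr(S1^-1 S0) + (m1-m0)^T S1^-1 (m1-m0) - D + ln(det S1 / det S0)]."""
    mu0, mu1 = np.asarray(mu0, float), np.asarray(mu1, float)
    cov0, cov1 = np.asarray(cov0, float), np.asarray(cov1, float)
    d = mu0.shape[0]
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - d
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

# Identical distributions give 0; separated means raise the divergence.
# Two unit-covariance 2-D Gaussians whose means differ by 3 give 0.5 * 9 = 4.5.
kld = kl_gaussians([0, 0], np.eye(2), [3, 0], np.eye(2))
```

Note that the KLD is asymmetric: KL(N0 || N1) and KL(N1 || N0) generally differ once the covariances differ.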

Simulation results
Original data vs. clustered result, KLD = 4.5. Intuition for why it works so well: the method does not learn a boundary or threshold; it clusters so that each cluster looks more like the assumed distribution (Gaussian).

A baseline nonparametric classifier: the mean-shift clustering method
Used as a baseline nonparametric algorithm against which to compare the nonparametric Bayesian classification method. Determine the modes of the observations: based on a kernel of diameter h, count the number of points inside a circle around each observation to estimate the density; take the derivative of the density and move toward the zero gradient (a mode); repeat for all observations. Observations which move toward the same mode are in the same cluster.
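The procedure above can be sketched in one dimension with a Gaussian kernel (the flat kernel of diameter h described on the slide works the same way; the toy data are made up):

```python
import math

def mean_shift_1d(points, bandwidth=1.0, iters=100, tol=1e-5):
    """Mean-shift with a Gaussian kernel: move each point toward the
    kernel-weighted mean of its neighborhood until it reaches a mode."""
    modes = []
    for p in points:
        m = p
        for _ in range(iters):
            weights = [math.exp(-((m - q) / bandwidth) ** 2 / 2) for q in points]
            new_m = sum(w * q for w, q in zip(weights, points)) / sum(weights)
            if abs(new_m - m) < tol:
                break
            m = new_m
        modes.append(round(m, 2))   # points sharing a mode share a cluster
    return modes

data = [0.0, 0.2, -0.2, 10.0, 10.2, 9.8]
modes = mean_shift_1d(data)
clusters = len(set(modes))          # two well-separated groups -> two modes
```

Unlike the Bayesian method, mean-shift's cluster count is driven entirely by the bandwidth h: too small and every point is its own mode, too large and everything merges.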

Two clusters: detection of the correct number of clusters
[Figure: number of clusters detected by NBC.]

Two clusters: correct clustering performance
[Figure: clustering performance of NBC.]


Device fingerprint: literature review
[S. Bratus, 2008]: uses information about the chipset, firmware, or driver of an 802.11 wireless device. Active detection methods require extra message exchanges; furthermore, responses that are firmware-, driver-, or OS-dependent can be spoofed as well.
[J. Hall, 2005]: uses the transient signals at the start of transmissions. The basic drawback is that the transient signal is difficult to capture, since it lasts only on the order of several hundred nanoseconds.
[V. Brik, 2008]: the authors proposed a Passive RAdio-metric Device Identification System (PARADIS) with a detection accuracy of over 99%. The results show convincingly that radiometrics can be used effectively to differentiate wireless devices.
Nonetheless, all the above methods require a training phase to collect and extract fingerprints of legitimate devices. Our method, in contrast, is an unsupervised approach: users do not need to register first.

Device fingerprinting
Extracted features (clustering is performed over these feature dimensions):
1. The carrier frequency difference: defined as the difference between the carrier frequency of the ideal signal and that of the transmitted signal.
2. The phase shift difference: defined as the phase shift from one symbol to the next.

Device fingerprinting (continued)
3. The second-order cyclostationary feature: utilize the cyclic autocorrelation function (CAF), evaluated at the cyclic frequency α determined by the OFDM parameters.
4. The received signal amplitude (location dependent): with d the distance from the transmitting device to the sensing device and |h| the fading component, the amplitude feature Ap is defined accordingly.
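A discrete-time estimate of the CAF can be sketched as below. The slides evaluate it at OFDM-specific cyclic frequencies; here a made-up real tone serves as the test signal (a tone at frequency f shows a cyclic feature at α = 2f):

```python
import cmath, math

def cyclic_autocorrelation(x, alpha, tau):
    """Estimate of the cyclic autocorrelation function
    R_x^alpha(tau) = (1/N) * sum_n x[n] * conj(x[n+tau]) * exp(-j*2*pi*alpha*n)."""
    n_valid = len(x) - tau
    acc = sum(x[n] * x[n + tau].conjugate() * cmath.exp(-2j * cmath.pi * alpha * n)
              for n in range(n_valid))
    return acc / n_valid

# Real tone at f = 0.1 cycles/sample over 1000 samples (a whole number of periods).
tone = [complex(math.cos(2 * math.pi * 0.1 * n), 0.0) for n in range(1000)]
feature = abs(cyclic_autocorrelation(tone, alpha=0.2, tau=0))      # ~0.25 at alpha = 2f
off_feature = abs(cyclic_autocorrelation(tone, alpha=0.35, tau=0)) # ~0 elsewhere
```

The fingerprint uses the CAF magnitude at the relevant cyclic frequencies, which depends on device-specific transmitter imperfections rather than on the channel.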

Applications: masquerade and Sybil attack detection
Key: use device-dependent radiometrics as fingerprints. Approach: the selected features are channel invariant; the detection method is passive; the clustering approach is unsupervised.
Masquerade attack: an extra physical device (Device 3) transmits under a MAC address already in use (Device 1: 00-B0-D0-86-BB-F7, Device 2: 00-0C-F1-56-98-AD).
Sybil attack: one physical device (Device 1) claims multiple MAC addresses (00-B0-D0-86-BB-F7 and 00-0C-F1-56-98-AD), while Device 2 uses 00-A0-C9-14-C8-29.

Masquerade and Sybil attack detection: algorithm
1. Data collection → device feature space.
2. Unsupervised clustering → K clusters, against M observed MAC addresses.
3. If K = M, keep monitoring. If K ≠ M: K > M indicates a masquerade attack; K < M indicates a Sybil attack.
4. Determine the number of attackers and update the blacklist with the offending MAC addresses.
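The decision rule in step 3 is simple to state in code. A minimal sketch, reusing the MAC addresses from the previous slide (the function and verdict strings are illustrative):

```python
def detect_attack(cluster_labels, mac_addresses):
    """Compare the number of fingerprint clusters K with the number of
    claimed MAC addresses M: K > M -> masquerade, K < M -> Sybil."""
    k = len(set(cluster_labels))
    m = len(set(mac_addresses))
    if k > m:
        return 'masquerade'
    if k < m:
        return 'sybil'
    return 'no attack detected'

# One physical device (one fingerprint cluster) claiming three MACs:
# K = 1 < M = 3, so a Sybil attack is flagged.
verdict = detect_attack([0, 0, 0],
                        ['00-B0-D0-86-BB-F7', '00-0C-F1-56-98-AD',
                         '00-A0-C9-14-C8-29'])
```

The nonparametric classifier matters here precisely because K is not known in advance: the cluster count itself is the detection statistic.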

Masquerade and Sybil attack detection: simulation results for N devices
The number of devices is varied from 1 to 6; the diameter of each corresponding cluster is about 1; the feature space is chosen to be 40×40. As the number of devices N increases, performance decreases, since the clusters are more likely to overlap.

Masquerade and Sybil attack detection: preliminary USRP2 experiment
We collected fingerprints from several WiFi devices with the USRP2 and ran the algorithm on them.
[Figure: clustering result on USRP2-collected fingerprints.]

Applications: PUE attack detection
In cognitive radio, a malicious node can pretend to be a primary user (PU) to keep the network resources (bandwidth) for its own use. How do we detect this? We use the same approach: collect device-dependent fingerprints and classify them. We limit our study to an OFDM system using QPSK modulation.

PUE attack detection
[Figures: DECLOAK algorithm and ROC curve.]

Conclusions
Dirichlet distribution/process: can learn and combine any distribution with any prior distribution. Applications: spectrum auctions, CPC combining heterogeneous information; other applications?
Infinite Gaussian mixture model: clustering with an unknown number of clusters. Applications: device fingerprinting and wireless security. Beyond Gaussian?
We are looking for possible future collaborations.

References
Yee Whye Teh video lectures: Dirichlet Processes: Tutorial and Practical Course; Hierarchical Dirichlet Processes.
Zhu Han, Rong Zheng, and H. Vincent Poor, "Repeated Auctions with Bayesian Nonparametric Learning for Spectrum Access in Cognitive Radio Networks," IEEE Transactions on Wireless Communications, vol. 10, no. 3, pp. 890-900, March 2011.
Walid Saad, Zhu Han, H. Vincent Poor, Tamer Basar, and Ju Bin Song, "A Cooperative Bayesian Nonparametric Framework for Primary User Activity Monitoring in Cognitive Radio Networks," accepted, IEEE Journal on Selected Areas in Communications, special issue on cooperative networking.
Nam Tuan Nguyen, Guanbo Zheng, Zhu Han, and Rong Zheng, "Device Fingerprinting to Enhance Wireless Security using Nonparametric Bayesian Method," INFOCOM 2011.
Nam Tuan Nguyen, Rong Zheng, and Zhu Han, "On Identifying Primary User Emulation Attacks in Cognitive Radio Systems Using Nonparametric Bayesian Classification," IEEE Transactions on Signal Processing (in revision).

References (continued)
[Bishop, 2006]: Pattern Recognition and Machine Learning.
[S. Bratus, 2008]: Active Behavioral Fingerprinting of Wireless Devices.
[J. Hall, 2005]: Radio Frequency Fingerprinting for Intrusion Detection in Wireless Networks.
[V. Brik, 2008]: Wireless Device Identification with Radiometric Signatures.

Overview of the Wireless Amigo Lab
Lab overview: 7 Ph.D. students and 2 joint postdocs (with Rice and Princeton), supported by 5 concurrent NSF grants, 1 DoD grant, and 1 Qatar grant.
Current concentrations: game-theoretic approaches for wireless networking; compressive sensing and its applications; smart grid communication; Bayesian nonparametric learning; security (trust management, belief networks, gossip-based Kalman filtering, physical-layer security); quickest detection; cognitive radio routing/security; sniffing (femtocell and cloud computing); USRP2 implementation testbed.

Questions? Thank you very much