Distributed Multi-Target Classification in Wireless Sensor Networks - A Maximum Likelihood Approach
Vinod Kumar Ramachandran
ID:
Wireless Sensor Networks - Introduction
A network with thousands of tiny sensors deployed in a region of interest to perform a specific task, for example, to detect and classify vehicles moving in a given region.
Applications: (1) battlefield surveillance, (2) disaster relief, (3) environmental monitoring.
Key design issues:
- Low battery power of sensors
- Efficient collaborative signal processing (CSP) algorithms must be developed
- Design of proper communication schemes to transmit information to a manager node
Philosophy: each sensor has limited sensing and processing capabilities, but thousands of such sensors can coordinate to achieve a very complex task. The main advantage of sensor networks is the ability to accomplish a very complex task with tiny, inexpensive sensors.
In detection and classification of targets, the factors affecting individual decisions are:
- Measurement noise
- Statistical variability of target signals
Multi-Target Classification
Objective: to detect and classify the targets present in a region of interest.
Suppose M distinct targets are possible. Then this becomes an N-ary hypothesis testing problem, where N = 2^M. For example, when M = 2, four hypotheses are possible:
H_0: No target present
H_1: Target 1 alone is present
H_2: Target 2 alone is present
H_3: Both target 1 and target 2 are present
Signal Model
- Signals from all sources are assumed Gaussian.
- The region of interest can be divided into spatial coherence regions (SCRs): signals obtained from distinct SCRs are independent, while signals from sensors within a single SCR are highly correlated.
- Correlated signals within individual SCRs can overcome measurement noise; independent signals across SCRs can overcome the statistical variability of target signals.
Problem Formulation
For M distinct targets, N = 2^M hypotheses are possible. Let p_m denote the probability of the m-th target being present, and let b_m \in \{0, 1\} denote the presence (b_m = 1) or absence (b_m = 0) of the m-th target. The prior probabilities under the different hypotheses are given by

    P(H_j) = \prod_{m=1}^{M} p_m^{b_m(j)} (1 - p_m)^{1 - b_m(j)},

where (b_1(j), \dots, b_M(j)) is the binary representation of the integer j. Signals from different targets are assumed Gaussian, with the m-th target having a mean of zero and covariance matrix \Sigma_m. Assume there are G SCRs (independent measurements). The N-ary hypothesis testing problem is

    H_j: z_k = \sum_{m=1}^{M} b_m(j) s_{m,k} + w_k,   k = 1, \dots, G,

where measurement noise w_k ~ N(0, \sigma^2 I) is added to the signal vector.
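As a concrete illustration, here is a minimal Python sketch of the prior computation above. The function name hypothesis_priors and the bit ordering used for b_m(j) are my own choices for illustration, not from the paper.

```python
import numpy as np

def hypothesis_priors(p):
    """Prior P(H_j) for each of the N = 2**M hypotheses.

    p: sequence of per-target presence probabilities p_m.
    Hypothesis j is identified with the binary representation of j,
    bit m indicating presence (1) or absence (0) of target m.
    """
    M = len(p)
    priors = np.empty(2 ** M)
    for j in range(2 ** M):
        b = [(j >> m) & 1 for m in range(M)]  # b_m(j)
        priors[j] = np.prod([p[m] if b[m] else 1 - p[m] for m in range(M)])
    return priors

# Example: M = 2 targets, each present with probability 0.5
print(hypothesis_priors([0.5, 0.5]))  # [0.25 0.25 0.25 0.25]
```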
Problem Formulation (contd…)
Under H_j, the probability density function at the k-th SCR is

    p(z_k | H_j) = N(z_k; 0, \Sigma_j + \sigma^2 I),   \Sigma_j = \sum_{m=1}^{M} b_m(j) \Sigma_m,

and the G measurements are i.i.d.
[Figure: Basic architecture for multi-target classification]
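The density under each hypothesis follows directly from the model: the covariance is the sum of the covariances of the targets present plus the noise term. A sketch, assuming zero-mean Gaussian targets with covariances sigmas[m] and noise variance noise_var (illustrative names):

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_pdf_under_hypotheses(z_k, sigmas, noise_var):
    """log p(z_k | H_j) for all j, given per-target covariances.

    z_k: feature vector observed at the k-th SCR.
    sigmas: list of M covariance matrices, one per target.
    """
    M = len(sigmas)
    d = len(z_k)
    logps = []
    for j in range(2 ** M):
        cov = noise_var * np.eye(d)
        for m in range(M):
            if (j >> m) & 1:  # target m present under H_j
                cov = cov + sigmas[m]
        logps.append(multivariate_normal.logpdf(z_k, mean=np.zeros(d), cov=cov))
    return np.array(logps)
```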
Classification Strategies
Centralized fusion: data from all SCRs are directly combined in an optimal fashion.
- Advantage: serves as a benchmark, as it is optimal.
- Disadvantage: heavy communication burden, because all data have to be transmitted to the manager node.
Decision fusion: individual SCRs make decisions, and the decisions themselves are combined in an optimal manner at the manager node.
- Advantage: very low communication burden, because only decisions are transmitted.
- Disadvantage: may not be as reliable as centralized fusion.
Optimal centralized classifier: decides the hypothesis according to

    \hat{H} = \arg\max_j P(H_j) \prod_{k=1}^{G} p(z_k | H_j).

Since the measurements are i.i.d. across SCRs, this can also be written as

    \hat{H} = \arg\max_j [ \log P(H_j) + \sum_{k=1}^{G} \log p(z_k | H_j) ].
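Combining the two sketches above, a minimal version of the centralized MAP decision could look as follows (the helper names are hypothetical, carried over from the earlier sketches):

```python
import numpy as np
# hypothesis_priors and log_pdf_under_hypotheses as in the sketches above

def centralized_classifier(Z, p, sigmas, noise_var):
    """Optimal centralized MAP decision from all G SCR measurements.

    Z: array of shape (G, d), one feature vector per SCR.
    Returns the index j of the decided hypothesis H_j.
    """
    log_post = np.log(hypothesis_priors(p))
    for z_k in Z:
        log_post = log_post + log_pdf_under_hypotheses(z_k, sigmas, noise_var)
    return int(np.argmax(log_post))
```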
Performance of the Optimal Classifier
Note that each SCR has to transmit N likelihoods to the manager node. The probability of error of the optimal centralized classifier can be bounded as [1]

    P_e \le \sum_{j \ne l} P(H_j) e^{-G C_{jl}},

where C_{jl} denotes the Chernoff distance between p(z | H_j) and p(z | H_l). Thus the probability of error goes to zero as the number of independent measurements G goes to infinity, provided the Kullback-Leibler distances between all pairs of pdfs are strictly positive.

[1] J. Kotecha, V. Ramachandran, A. Sayeed, "Distributed Multitarget Classification in Wireless Sensor Networks," IEEE Journal on Selected Areas in Communications, December 2003.
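The positivity condition can be checked numerically. Below is a small illustration, not code from [1], of the KL distance between two zero-mean Gaussians, which is the form the pairwise pdfs take under this signal model:

```python
import numpy as np

def kl_zero_mean_gaussians(cov0, cov1):
    """KL distance D( N(0, cov0) || N(0, cov1) )."""
    d = cov0.shape[0]
    inv1 = np.linalg.inv(cov1)
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(inv1 @ cov0) - d + logdet1 - logdet0)

# Perfect classification in the limit of large G requires this quantity
# to be strictly positive for every pair of hypothesis covariances.
```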
Local Classifiers
Local decisions are made at the individual SCRs, based on the single measurement in that SCR. The local decisions are communicated to the manager node, which makes the final optimal decision. These classifiers are also called distributed classifiers. Since only the decisions are transmitted, the communication burden on the network is greatly reduced.
Optimal local classifier: the optimal local classifier in the k-th SCR makes the decision

    \hat{H}_k = \arg\max_j P(H_j) p(z_k | H_j).

The pmfs of the local decisions under the different hypotheses are characterized by

    q_{ij} = P(\hat{H}_k = H_i | H_j),   i, j = 0, \dots, N - 1.

The optimal local classifier needs to calculate N likelihoods for each decision, and N grows exponentially with the number of targets M.
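The induced pmf q_ij is generally not available in closed form; one practical route is Monte Carlo estimation. A rough sketch under the same assumptions, reusing the hypothetical helpers defined earlier:

```python
import numpy as np
# hypothesis_priors and log_pdf_under_hypotheses as in the sketches above

def decision_pmf(p, sigmas, noise_var, n_trials=10000, rng=None):
    """Monte Carlo estimate of q[i, j] = P(local decision = H_i | H_j)."""
    rng = np.random.default_rng() if rng is None else rng
    M = len(sigmas)
    N = 2 ** M
    d = sigmas[0].shape[0]
    log_priors = np.log(hypothesis_priors(p))
    q = np.zeros((N, N))
    for j in range(N):
        # Covariance of z_k under H_j: noise plus the targets present.
        cov = noise_var * np.eye(d)
        for m in range(M):
            if (j >> m) & 1:
                cov = cov + sigmas[m]
        for _ in range(n_trials):
            z = rng.multivariate_normal(np.zeros(d), cov)
            i = np.argmax(log_priors + log_pdf_under_hypotheses(z, sigmas, noise_var))
            q[i, j] += 1.0 / n_trials
    return q
```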
Sub-optimal Local Classifiers
To circumvent the exponential complexity, the sub-optimal local classifiers conduct M tests, one per target, for the presence or absence of that target. Partition the set of hypotheses into two sets, \mathcal{H}_m^1 and \mathcal{H}_m^0, where \mathcal{H}_m^1 contains those hypotheses in which the m-th target is present and \mathcal{H}_m^0 contains those in which it is absent. Define

    p(z_k | \mathcal{H}_m^i) = \frac{\sum_{H_j \in \mathcal{H}_m^i} P(H_j) p(z_k | H_j)}{\sum_{H_j \in \mathcal{H}_m^i} P(H_j)},   i = 0, 1.

Then the feature vector is distributed as a weighted sum of Gaussians (a Gaussian mixture) under each set; a sketch of the partition follows below. It has been shown in [1] that the sub-optimal classifiers give perfect classification in the limit of large G, the number of independent measurements.
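A short sketch of the partition itself, using the same bit convention as before (the function name is illustrative):

```python
def partition_hypotheses(M, m):
    """Split the indices j = 0..2**M - 1 into the sets where target m
    is present (H1_set) and absent (H0_set)."""
    H1_set = [j for j in range(2 ** M) if (j >> m) & 1]
    H0_set = [j for j in range(2 ** M) if not (j >> m) & 1]
    return H1_set, H0_set

# Example: M = 2, m = 0  ->  present: [1, 3], absent: [0, 2]
```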
Sub-optimal Local Classifiers (contd…)
Mixture Gaussian Classifier (MGC)
For m = 1, \dots, M, let \delta_m(k) denote the value of the m-th binary hypothesis test between \mathcal{H}_m^1 and \mathcal{H}_m^0 in the k-th SCR:

    \delta_m(k) = 1 if P(\mathcal{H}_m^1) p(z_k | \mathcal{H}_m^1) > P(\mathcal{H}_m^0) p(z_k | \mathcal{H}_m^0), and \delta_m(k) = 0 otherwise.

For the simple case of M = 2, the test for the first target is \delta_1(k) = 1 if

    P(H_1) p(z_k | H_1) + P(H_3) p(z_k | H_3) > P(H_0) p(z_k | H_0) + P(H_2) p(z_k | H_2).

It can be observed that the above test is a weighted sum of two tests.
Single Gaussian Classifier (SGC)
This classifier approximates the mixture pdfs by single Gaussians,

    p(z_k | \mathcal{H}_m^i) \approx N(z_k; 0, \Sigma_m^i),   i = 0, 1,

and the value of the m-th test in the k-th SCR is one if the single-Gaussian likelihood under \mathcal{H}_m^1, weighted by its prior, exceeds that under \mathcal{H}_m^0.
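Putting the pieces together, the MGC test for target m at a single SCR might be sketched as follows (again with the hypothetical helpers from the earlier sketches):

```python
import numpy as np
# hypothesis_priors, log_pdf_under_hypotheses, partition_hypotheses as above

def mgc_test(z_k, m, p, sigmas, noise_var):
    """Mixture Gaussian Classifier: binary test for target m at one SCR.

    Returns 1 if the weighted evidence for 'target m present' exceeds
    that for 'target m absent', and 0 otherwise.
    """
    priors = hypothesis_priors(p)
    likes = np.exp(log_pdf_under_hypotheses(z_k, sigmas, noise_var))
    H1_set, H0_set = partition_hypotheses(len(sigmas), m)
    present = sum(priors[j] * likes[j] for j in H1_set)
    absent = sum(priors[j] * likes[j] for j in H0_set)
    return 1 if present > absent else 0
```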
Fusion of Local Decisions at the Manager Node
Ideal communication links: the classifier at the final manager node makes the decision

    \hat{H} = \arg\max_j P(H_j) \prod_{k=1}^{G} P(\hat{H}_k | H_j).

The above expression applies to all three local classifiers; the only difference is that the different local classifiers induce different pmfs P(\hat{H}_k | H_j).
Noisy communication links: in noisy channels, each SCR sends an amplified version of its local hard decision, and the received decisions are corrupted by additive white Gaussian noise (AWGN). The manager node makes the final decision as

    \hat{H} = \arg\max_j P(H_j) \prod_{k=1}^{G} p(y_k | H_j),   p(y_k | H_j) = \sum_i P(\hat{H}_k = H_i | H_j) p(y_k | \hat{H}_k = H_i),

where y_k is the noisy received version of the k-th local decision.
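For ideal links, fusion reduces to table lookups in the decision pmf matrix q estimated earlier. A sketch, not the authors' implementation:

```python
import numpy as np

def fuse_local_decisions(decisions, q, priors):
    """MAP fusion of G local hard decisions over ideal links.

    decisions: list of local decision indices, one per SCR.
    q: N x N matrix with q[i, j] = P(local decision = H_i | H_j).
    priors: length-N vector of P(H_j).
    """
    log_post = np.log(priors)
    for i in decisions:
        log_post = log_post + np.log(q[i, :])
    return int(np.argmax(log_post))
```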
Simulations
Data source: DARPA SensIT program
Simulation tool: MATLAB
Results for two targets (M = 2), probability of error versus G (independent measurements):
[Figure: measurement SNR = -4 dB; communication SNR = 0 dB, 10 dB]
[Figure: measurement SNR = 10 dB; communication SNR = 0 dB, 10 dB]
[Figure: Chernoff bounds for M = 2 targets with ideal communication channel]
Results for M = 3 targets, probability of error versus G (independent measurements):
[Figure: measurement SNR = -4 dB; communication SNR = 0 dB, 10 dB]
[Figure: measurement SNR = 10 dB; communication SNR = 0 dB, 10 dB]
Observations
- In all cases, the error probability of the optimal centralized classifier serves as the lower bound; it is the best classifier possible.
- The key observation is that, for all the classifiers, the probability of error decays exponentially with the number of independent measurements G (note that the plots are on a log scale). This indicates that reliable classification can be attained by combining a relatively moderate number of much less reliable independent local decisions.
- The performance improves with increasing measurement SNR for all classifiers.
- For any distributed classifier, the probability of error is higher with noisy communication channels than with ideal communication links.
- The performance of the mixture Gaussian classifier (MGC) is close to that of the optimal distributed classifier in all cases, making it an attractive choice in practice.
- The performance of the single Gaussian classifier (SGC) is worse than that of the MGC, and the difference is large at higher measurement and communication SNRs.
- In the simulations performed, the KL distances for all pairwise hypotheses were found to be positive, hence perfect classification is possible in the limit of large G.
- The Chernoff bounds match the error exponent fairly well but exhibit an offset.
- The performance of all the classifiers is worse for the M = 3 (eight hypotheses) case than for the M = 2 (four hypotheses) case.
- The difference between the MGC and the optimal distributed classifier seems to increase for the three-target case.
Conclusions
The performance of the optimal centralized and distributed classifiers, which have exponential complexity, has been compared with that of the sub-optimal distributed classifiers with linear complexity.
Several issues warrant further investigation:
- The signal model does not assume path loss in the sensing measurements. When path loss is considered, the G measurements will be independent but not identically distributed, and the path loss will limit the number of useful independent measurements.
- The relationship between the number of targets and the dimensionality of the feature vector: increasing the feature vector dimension may improve performance for a larger number of targets.
- The sub-optimal CSP algorithms can be combined with other sub-optimal approaches, such as tree-based classifiers, to develop classifiers with even lower complexity.