A Low-Complexity Universal Architecture for Distributed Rate-Constrained Nonparametric Statistical Learning in Sensor Networks
Avon Loy Fernandes, Maxim Raginsky & Todd P. Coleman

Introduction
1. We consider the problem of fitting a function of sensor location to a series of noisy observations.
2. Characteristics of the problem:
Regression: The ultimate goal is for the function to be a good predictor, with small mean squared error, of the response of a sensor placed at random in the field.
Nonparametric: We only assume that the function lies in some class of sufficiently smooth functions.
Universal: The algorithm works for any joint distribution of sensor locations and measurements, as long as the regression function is sufficiently smooth. No a priori assumptions are made about the noise distribution: it may be unbounded and need not be additive.
3. This setup corresponds to real-world scenarios in which a large number of cheap sensors are deployed in a field where little can be assumed about the ambient noise. In many such scenarios, the underlying object of interest is a smooth function (e.g., a temperature or pressure gradient).

Learning & Rate Constraints
The sensors communicate wirelessly with the fusion center and must therefore digitize their observations. The fusion center sees only a quantized version of the sensor observations, but it has complete knowledge of the sensor locations; each sensor knows its own location.
In the proposed algorithm, each sensor quantizes its observation and passes one message, related to a universal sequential probability estimator, to one neighbor. Each sensor then compresses its quantized observation according to its probability estimate and sends the resulting bits to the fusion center. The fusion center runs the matching decompression algorithm, decodes the quantizer indices losslessly, and then obtains the quantized representation of the sensor outputs.
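The encoding chain above (dithered uniform quantization, a sequential probability estimate shared along the network, lossless decoding) can be illustrated with a short Python sketch. It is not the authors' implementation, and several details are assumptions: the dither is taken to be uniform on $[-\epsilon/2, \epsilon/2]$, the quantizer index is round((Y + U)/ε), the fusion center reconstructs εM − U, and the K-T estimator simply keeps separate counts for each location cell, which is a simplification of the conditional K-T estimate described in the Proposed Approach. Treating the running counts as the message one sensor passes to the next is likewise one possible reading of the message-passing scheme. The names dithered_quantize, reconstruct, ContextKT, ideal_code_length, demo_field and simulate are hypothetical. Instead of running an arithmetic coder, the sketch reports the ideal code length −log2 P, which an arithmetic coder driven by the same model approaches to within about 2 bits in total.

```python
import math
import random
from collections import defaultdict


def dithered_quantize(y, u, eps):
    """Index of the uniform quantizer cell containing y + u (cells centered at k * eps; assumed)."""
    return round((y + u) / eps)


def reconstruct(m, u, eps):
    """Subtractive-dither reconstruction at the fusion center (assumed form: eps * M - U)."""
    return eps * m - u


class ContextKT:
    """Krichevsky-Trofimov sequential estimator with one set of counts per context.

    Assumptions: the alphabet of quantizer indices is finite and known to both
    ends, and the context is side information shared with the fusion center
    (here, the number of the location cell).
    """

    def __init__(self, alphabet):
        self.alphabet_size = len(alphabet)
        self.counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
        self.totals = defaultdict(int)  # context -> number of symbols seen so far

    def prob(self, symbol, context):
        # KT rule: (n_symbol + 1/2) / (n + |alphabet| / 2)
        n_sym = self.counts[context][symbol]
        return (n_sym + 0.5) / (self.totals[context] + self.alphabet_size / 2)

    def update(self, symbol, context):
        self.counts[context][symbol] += 1
        self.totals[context] += 1


def ideal_code_length(indices, contexts, alphabet):
    """Bits an ideal arithmetic coder driven by the sequential KT model would spend."""
    kt = ContextKT(alphabet)
    bits = 0.0
    for m, c in zip(indices, contexts):
        bits -= math.log2(kt.prob(m, c))  # cost of coding m before the counts are updated
        kt.update(m, c)  # assumed: these running counts are what gets passed to the next sensor
    return bits


def demo_field(x1, x2):
    """Hypothetical smooth field used only for this demo."""
    return math.sin(2 * math.pi * x1) * math.cos(math.pi * x2)


def simulate(n_sensors=2000, eps=0.5, sigma=0.3, grid=4, seed=0):
    """Return (average bits per sensor, reconstruction errors) for one random deployment."""
    rng = random.Random(seed)
    indices, contexts, errors = [], [], []
    for _ in range(n_sensors):
        x1, x2 = rng.random(), rng.random()  # location uniform on the unit square
        y = demo_field(x1, x2) + rng.gauss(0.0, sigma)  # additive Gaussian noise model
        u = rng.uniform(-eps / 2, eps / 2)  # dither, known at the fusion center
        m = dithered_quantize(y, u, eps)
        # Context: number of the grid cell containing the sensor location.
        cell = min(int(x1 * grid), grid - 1) * grid + min(int(x2 * grid), grid - 1)
        indices.append(m)
        contexts.append(cell)
        errors.append(reconstruct(m, u, eps) - y)
    # Demo only: the index alphabet is read off the data instead of being fixed in advance.
    alphabet = range(min(indices), max(indices) + 1)
    return ideal_code_length(indices, contexts, alphabet) / n_sensors, errors
```

Calling simulate() returns the average number of bits per sensor and the reconstruction errors. With subtractive dither these errors are uniform on $[-\epsilon/2, \epsilon/2]$ whatever the value of Y, which is what lets the fusion center treat quantization as extra zero-mean noise of variance $\epsilon^2/12$.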
A statistical learning algorithm is then used to approximate the function from these quantized observations. The encoder performs only low-complexity operations: each sensor quantizes its observation with a uniform scalar quantizer, and the quantizer indices are losslessly compressed with arithmetic or Huffman codes whose probability model is obtained by the simple message-passing scheme. This approach links principles of data compression and information theory with principles of nonparametric estimation and statistical learning theory in a novel way.

Proposed Approach
1. Field Model: $N$ sensors are placed uniformly at random on the unit square $[0,1]^2$. The output of the $i$-th sensor has the form
$$Y_i = f(X_i) + Z_i, \qquad i = 1, \dots, N,$$
where $f : [0,1]^2 \to \mathbb{R}$ is some smooth function, $X_1, \dots, X_N$ are the sensor locations, and $Z_1, \dots, Z_N$ are i.i.d. Gaussian random variables, $Z_i \sim \mathcal{N}(0, \sigma^2)$. This model is standard, but the scheme applies to an arbitrary relation between $X$ and $Y$.
2. Quantization: Given a quantizer step size $\epsilon > 0$, the encoder mapping is defined as follows. Let $U_1, \dots, U_N$ be i.i.d. random variables drawn from a uniform distribution on the interval. This is the dither signal, which is known at the fusion center; dithering is necessary to make the estimator robust. The $i$-th sensor computes its quantizer index $M_i$. The unit square $[0,1]^2$ is divided into $n$ ordered squares, where $n$ is chosen adaptively depending on the quantizer outputs $M_i$, and $L_i \in \{1, \dots, n\}$ is the number of the square containing $X_i$. Similarly, the dither interval is divided into $n$ subintervals (also chosen adaptively), and $K_i \in \{1, \dots, n\}$ is the number of the subinterval containing $U_i$. The $i$-th sensor knows $(M_i, K_i, L_i)$; the fusion center knows all $(K_i, L_i)$.
3. Krichevsky-Trofimov Estimator: Given the index $M_i = m_i$, each sensor calculates a conditional probability whose numerator and denominator both contain the K-T estimate. The K-T estimator induces a lossless code whose per-sensor redundancy converges to zero as $N \to \infty$.
4. Decoder: Upon decoding $M$, the fusion center computes the quantized representation of the sensor outputs.
5. Estimation of the function: An estimator based on Fourier coefficients is used to estimate $f$. The cutoffs $J_{1,n}$ and $J_{2,n}$ can either be chosen adaptively or chosen to incorporate prior knowledge of the smoothness of $f$.

Theoretical Guarantees
1. Bit Rate: For a given quantizer step size, the average number of bits per sensor is bounded from above by the conditional rate-distortion function of $Y$ given $X$ evaluated at $\epsilon$, plus a constant number of bits (Ziv's bound with side information). In practice, the achieved rate is slightly higher than Ziv's bound.
2. Estimator Performance: With Ziv's entropy-coded scalar quantization with dither, the proposed estimator is an unbiased, efficient estimator of $\theta$.
3. Convergence Rates: The algorithm attains minimax convergence rates for the MSE when the regression function belongs to a Lipschitz, Sobolev, or similar smoothness class. Scalar quantization of the sensor observations affects the multiplicative constants, but not the convergence rates.

Results

Conclusions
The algorithm learned the function well (i.e., with low MSE).
Empirically, the algorithm performs very close to Ziv's bound with side information.
The MSE curves reproduce the linear relation between MSE and $\epsilon$.
The algorithm is attractive at low $\epsilon$ values because a very large number of sensors can communicate their observations at very low rates and still achieve minimax-rate convergence of the MSE.
The approach can also be generalized to multiplicative noise and Poisson noise.

References
[1] Maxim Raginsky, “Learning From Compressed Observations,” Proceedings of the 2007 IEEE Information Theory Workshop, Lake Tahoe, CA.
[2] Ye Wang and Prakash Ishwar, “On Non-Parametric Field Estimation using Randomly Deployed, Noisy, Binary Sensors,” Proceedings of ISIT 2007, Nice, France.
[3] Jacob Ziv, “On Universal Quantization,” IEEE Transactions on Information Theory, vol. IT-31, no. 3, May 1985.
[4] Vladimir N. Vapnik, “An Overview of Statistical Learning Theory,” IEEE Transactions on Neural Networks, vol. 10, no. 5, Sept. 1999.
[5] Ram Zamir and Meir Feder, “On Universal Quantization by Randomized Uniform/Lattice Quantizers,” IEEE Transactions on Information Theory, vol. 38, no. 2, pt. I, Mar. 1992.
[6] Slobodan N. Simić, “A Learning-Theory Approach to Sensor Networks,” IEEE Pervasive Computing, vol. 2, no. 4, Oct.–Dec. 2003.

Illinois Center for Wireless Systems