Collecting Correlated Information from Wireless Sensor Networks Presented by Bo Han Joint work with Aravind Srinivasan and Amol Deshpande
Outline l Introduction of wireless sensor networks l Communication architecture l Design factors of sensor networks l Applications of sensor networks l Correlated information collection l What we have done so far l Conclusion
Introduction l A sensor network is composed of a large number of sensor nodes, which are densely deployed either inside the phenomenon or very close to it. l Random deployment: the positions of sensor nodes need not be engineered or predetermined. l Self-organizing capabilities. l Cooperative effort of sensor nodes.
Sensor Networks vs Ad-Hoc Networks l The number of nodes in a sensor network can be several orders of magnitude higher than the nodes in an Ad-Hoc network. l Sensor nodes are densely deployed. l Sensor nodes are prone to failures. l The topology of a sensor network changes very frequently.
Sensor Networks vs Ad-Hoc Networks l Sensor nodes mainly use broadcast, whereas most ad hoc networks are based on point-to-point communication. l Sensor nodes are limited in power, computational capacity, and memory. l Sensor nodes may not have a global ID.
Communication Architecture l The sensor nodes are usually scattered in a sensor field. l Each of these scattered sensor nodes has the capabilities to collect data and route data back to the sink. l Data are routed back to the sink by a multi-hop infrastructureless architecture. l The sink may communicate with the task manager node via Internet or satellite.
Example of Sensor Networks
Data Delivery Models l Continuous: sensors communicate their data continuously at a prespecified rate. l Event driven: sensors report information only when the event of interest occurs. l Observer initiated (request-reply): sensors report their results only in response to an explicit request from the observer. l Hybrid: all three approaches coexist.
Protocol Stack
Five Layers l The physical layer addresses the needs of simple but robust modulation, transmission, and receiving techniques. l In the data link layer, the MAC protocol must be power-aware and able to minimize collisions with neighbors' broadcasts. l The network layer takes care of routing the data supplied by the transport layer. l The transport layer helps to maintain the flow of data if the sensor network application requires it. l Different types of application software can be built and used on the application layer.
Three Planes l The power management plane manages how a sensor node uses its power. l The mobility management plane detects and registers the movement of sensor nodes, so a route back to the user is always maintained, and the sensor nodes can keep track of who their neighbor sensor nodes are. l The task management plane balances and schedules the sensing tasks given to a specific region.
Design Factors l Fault tolerance l Scalability l Production costs l Hardware constraints l Transmission media l Power consumption l Sensor network topology l Environment
Fault Tolerance l Fault tolerance is the ability to sustain sensor network functionalities without any interruption due to sensor node failures. l The fault tolerance level depends on the application of the sensor networks.
Scalability l Depending on the application, the number of sensor nodes may reach an extreme value of millions. New schemes must be able to work with this number of nodes. l The density gives the number of nodes within the transmission radius of each node in a region. l New schemes must also utilize the high density of the sensor networks.
Production Costs l The cost of a single node is very important to justify the overall cost of the network, since sensor networks consist of a large number of sensor nodes. l The cost of a sensor node should be less than a dollar.
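As a worked form of the density statement, the survey by Akyildiz et al. (listed in the references) expresses the density as below, where N is the number of sensor nodes scattered over a region of area A and R is the radio transmission range. The numeric values in the example are illustrative assumptions, not figures from the slides.

```latex
\[
  \mu(R) = \frac{N \pi R^{2}}{A}
\]
% Illustrative values (assumed): N = 200 nodes, A = 50\,\mathrm{m} \times 50\,\mathrm{m} = 2500\,\mathrm{m}^2, R = 5\,\mathrm{m}
\[
  \mu(5) = \frac{200 \cdot \pi \cdot 5^{2}}{2500} = 2\pi \approx 6.3
  \quad \text{nodes within each node's transmission radius.}
\]
```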
Transmission Media l In a multihop sensor network, communicating nodes are linked by a wireless medium. l To enable global operation, the chosen transmission medium must be available worldwide. l Radio, infrared and optical media.
Power Consumption l Limited power source. l Battery lifetime is limited. l Each sensor node plays a dual role of data originator and data router (data processor). l The malfunctioning of a few nodes costs a lot of energy, because it causes rerouting of packets and significant topological changes.
Sensor Network Topology l Pre-deployment and deployment phase: nodes are either thrown in as a mass or placed one by one. l Post-deployment phase: topology changes are due to changes in sensor nodes' position, reachability, available energy, malfunctioning, and task details. l Re-deployment of additional nodes phase: additional sensor nodes can be redeployed at any time to replace malfunctioning nodes or due to changes in task dynamics.
Environment l Sensor nodes are densely deployed either very close or directly inside the phenomenon to be observed. l They usually work unattended in remote geographic areas. l They may be working in the interior of large machinery, at the bottom of an ocean, in a biologically or chemically contaminated field, in a battlefield beyond the enemy lines, and in a home or large building.
Applications of Sensor Networks l Military: Battlefield surveillance, Nuclear, biological and chemical attack detection and reconnaissance. l Environment: Forest fire detection, Flood detection. l Health: Telemonitoring of human physiological data, tracking and monitoring patients and doctors inside a hospital. l Home application: Home automation and smart environment.
Sensor Devices and Applications l Berkeley Motes l iBadge - UCLA l MIT d'Arbeloff Lab – The ring sensor l Nose-on-a-chip l Zilog's eZ80 l iButton
Berkeley Motes l Small (under 1" square) microcontroller. l It consists of: –Microprocessor –A set of sensors for temperature, light, acceleration and motion –A low power radio for communicating with other motes l C compiler included.
Berkeley Motes
iBadge - UCLA l Investigates the behavior of children/patients. l Features: –Speech recording / replaying –Position detection –Direction detection / estimation (compass) –Weather data: temperature, humidity, pressure and light
iBadge - UCLA
MIT d'Arbeloff Lab – The Ring Sensor l An ambulatory, telemetric, continuous health monitoring device developed by d'Arbeloff Laboratory for Information Systems and Technology at MIT. l Monitors the physiological status of the wearer and transmits the information to the medical professional over the Internet. l Clinical trials have been done in conjunction with Massachusetts General Hospital's Emergency Room, and researchers are now working on commercialization of the ring-sized device.
Nose-on-a-chip l Nose-on-a-chip is a MEMS-based sensor, developed at Oak Ridge National Laboratory. l Can detect 400 species of gases and transmit a signal indicating the level to a central control station. l Consists of an array of tiny sensors on one integrated circuit and electronics on another. l The chip can be customized to detect virtually any chemical or biological species.
Zilog's eZ80 l Provides a way to Internet-enable process control and monitoring applications. l Temperature sensor, water leak detector and many more applications. l Enables users to access Webserver data and files from anywhere in the world.
iButton l A 16mm computer chip armored in a stainless steel can. l Up-to-date information can travel with a person or object.
Correlated Information Gathering l Problem Definition l Correlation Modeling l Distributed Source Coding l Asymmetric Communication Channels l Our Approach
Fundamental Problem l Collecting information from distributed sources. –Objective: correlations reduce bits that must be sent. l Correlation examples in sensor networks: –Weather in geographic region. –Similar views of same image. l Focus: information theory –Number of bits sent. –Ignore network topology.
Modeling Correlation l k sensor nodes each have an n-bit string. l Input drawn from distribution D. –Sample specifies all kn bits. –Captures correlations and a priori knowledge. l Objective: –Inform server of all k strings. –Ideally: nodes send H(D) bits. –H(D): Binary entropy of D. [Figure: sensor nodes holding strings x1, x2, …, xk]
Binary entropy of D l Optimal code to describe sample from D: –Expected number of bits required ≈ H(D). Ranges from 0 to kn. –Easy if entire sample known by single node. Idea: shorter codewords for more likely samples. l Challenge of this problem: input distributed.
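To make the range "0 to kn" concrete, here is a minimal sketch that computes H(D) for two toy distributions. The choice of k = 2 nodes with n = 2 bits each, and both distributions, are illustrative assumptions rather than anything taken from the slides.

```python
import math
from itertools import product

def entropy(dist):
    """Shannon entropy in bits of a distribution given as {sample: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Assumed toy setting: k = 2 nodes, n = 2 bits each, so one sample has kn = 4 bits.
strings = ["".join(bits) for bits in product("01", repeat=2)]

# Independent, uniform inputs: all 16 joint samples are equally likely.
independent = {(a, b): 1 / 16 for a in strings for b in strings}

# Perfectly correlated inputs: both nodes always hold the same string.
correlated = {(a, a): 1 / 4 for a in strings}

h_independent = entropy(independent)  # equals kn: no savings possible
h_correlated = entropy(correlated)    # only n bits of real information
```

With independent uniform inputs the entropy equals kn = 4 bits, while full correlation halves it to 2 bits; the difficulty named on the slide is reaching that lower figure when no single node sees the whole sample.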
(Distributed) Source Coding l Source coding is the process of encoding information using fewer bits than an unencoded representation, also called data compression. l DSC refers to the compression of the outputs of two or more physically separated sources. The sources do not communicate with each other. These sources send their compressed outputs to a central point for joint decoding. DSC is part of network information theory. l DSC has recently become a very active research area – more than 30 years after Slepian and Wolf laid its theoretical foundation.
Distributed Source Coding l [Slepian-Wolf, 1973]: –Simultaneously encode r independent samples. –As r → ∞: bits sent by nodes → rH(D); probability of error → 0. l Drawback: relies on large r. –Recent research: try to remove this.
Slepian-Wolf Theorem
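The content of this slide did not survive beyond its title. For reference, the standard statement of the theorem for two correlated sources X and Y, encoded separately at rates R_X and R_Y and decoded jointly, is the following rate region (the symbols R_X and R_Y are notation introduced here):

```latex
\begin{align*}
  R_X       &\ge H(X \mid Y), \\
  R_Y       &\ge H(Y \mid X), \\
  R_X + R_Y &\ge H(X, Y).
\end{align*}
```

The sum-rate bound is the same joint entropy a single encoder seeing both sources would need, which is why the ideal target on the earlier slides is H(D) bits in total.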
Self-Coding & Foreign Coding
Network Coding l Network coding is a field of information theory and coding theory. l A method of attaining maximum information flow in a network. l The core notion of network coding is to allow mixing of data at intermediate network nodes. l A receiver sees these data packets and deduces from them the messages that were originally intended for that data sink.
Butterfly Network
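The figure for this slide is not available, so the following is a minimal sketch of the standard butterfly example, assuming unit-capacity links and a source that wants to deliver two bits a and b to both sinks. The function and variable names are hypothetical.

```python
def butterfly(a: int, b: int):
    """Simulate one use of the butterfly network with unit-capacity links.

    The source wants to deliver both bits a and b to sinks T1 and T2.
    """
    # The source sends a down the left branch and b down the right branch.
    left, right = a, b
    # The bottleneck node receives both bits but may forward only one symbol,
    # so it forwards their XOR instead of picking one of them.
    mixed = left ^ right
    # T1 hears a on its side link plus the mixed bit; T2 hears b plus the mixed bit.
    t1 = (left, left ^ mixed)    # a, and a ^ (a ^ b) = b
    t2 = (right ^ mixed, right)  # b ^ (a ^ b) = a, and b
    return t1, t2

# Check all four input combinations.
results = {(a, b): butterfly(a, b) for a in (0, 1) for b in (0, 1)}
```

Plain routing would force the bottleneck link to carry either a or b, leaving one sink short of a bit; mixing lets both sinks recover both bits in a single use of the network.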
New approach l Allow interactive communication! –Nodes receive "feedback" from server. –Server at least as powerful as nodes. l Power utilization: –Central issue for sensor networks. –Node sending is power intensive. –Node receiving requires less power. [Figure: nodes x1, x2, …, xk and the server]
New approach l Communication model: –Synchronous rounds: Nodes send bits to server. Server sends bits back to nodes. Nothing directly between nodes. –Asymmetric Communication Channels l Objectives: –Minimize bits sent by nodes. Ideally O(H(D)+k). –Minimize bits sent by server. –Minimize rounds of communication. [Figure: nodes x1, x2, …, xk and the server]
Who knows what? l Nodes: only know own string. –Can also assume they know distribution D. l Server: knows distribution D. –Typical in work on DSC. –Some applications: D must be learned by server. Most such cases: D varies with time. Crucial to have r as small as possible. [Figure: nodes X1, …, X8 and the server, annotated with D]
Fingerprint Function l A class of functions that map an n-bit string to a single bit such that if f is chosen uniformly at random from this class, then for any y₁ ≠ y₂, Pr[f(y₁) = f(y₂)] ≤ p, for some p > ½. l An n³ × n matrix with each item chosen uniformly at random from {0, 1}.
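One concrete class with this kind of collision bound, assumed here rather than taken from the slides, is the parity of a random subset of the input bits: the inner product of the input with a random 0/1 row vector, modulo 2. Each row of a random 0/1 matrix then defines one fingerprint function, and two distinct strings collide with probability exactly 1/2. The sketch below estimates that collision rate empirically; all names are hypothetical.

```python
import random

def random_fingerprint(n, rng):
    """Pick a fingerprint function: parity of a uniformly random subset of n bit positions."""
    mask = [rng.randrange(2) for _ in range(n)]

    def f(y):
        # y is a sequence of n bits; the fingerprint is <mask, y> mod 2.
        return sum(m & b for m, b in zip(mask, y)) % 2

    return f

def collision_rate(y1, y2, trials=20000, seed=0):
    """Estimate Pr[f(y1) == f(y2)] over randomly chosen fingerprint functions."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        f = random_fingerprint(len(y1), rng)
        hits += f(y1) == f(y2)
    return hits / trials

# Two distinct 8-bit strings (arbitrary example values).
rate = collision_rate([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 0, 1, 1, 0])
same = collision_rate([1, 0, 1, 1], [1, 0, 1, 1])  # identical strings always agree
```

Because a single bit rules out a wrong candidate about half the time, a handful of independent fingerprint bits is enough to discard most of the strings the server is still considering.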
Correlation l Correlation: Multivariate Normal Distribution. l Put sensor nodes uniformly at random in a unit square. l Covariance matrix. –Diagonal items are all 1 –Cov(x, y) = d(x, y) r l Discretization of this continuous distribution. l Joint entropy estimation H(D).
Sample Set Generation l Determine the size of sample set. l Compute the Cholesky decomposition (matrix square root) of Σ, that is, find the unique lower triangular matrix A such that AAᵀ = Σ. l Let Z = (z₁, …, zₙ)ᵀ be a vector whose components are n independent standard normal variates (which can be generated, for example, by using the Box-Muller transform). l Let X be μ + AZ (here, μ = 0).
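The sampling steps can be sketched in plain Python as follows. The 3 × 3 covariance matrix is a hypothetical example with unit variances, not the distance-based matrix from the slides, and the discretization step is left out.

```python
import math
import random

def cholesky(sigma):
    """Return lower-triangular A with A * A^T = sigma (sigma symmetric positive definite)."""
    n = len(sigma)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(a[i][k] * a[j][k] for k in range(j))
            if i == j:
                a[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                a[i][j] = (sigma[i][j] - s) / a[j][j]
    return a

def sample(a, rng):
    """Draw X = A Z, where Z holds independent standard normal variates (mu = 0)."""
    z = [rng.gauss(0.0, 1.0) for _ in a]
    return [sum(a[i][k] * z[k] for k in range(i + 1)) for i in range(len(a))]

# Hypothetical covariance matrix for three nodes (assumed values).
sigma = [[1.0, 0.8, 0.5],
         [0.8, 1.0, 0.6],
         [0.5, 0.6, 1.0]]
A = cholesky(sigma)
rng = random.Random(1)
samples = [sample(A, rng) for _ in range(20000)]

# Empirical covariance between nodes 0 and 1 (the mean is 0 by construction).
cov01 = sum(x[0] * x[1] for x in samples) / len(samples)
```

Here `random.gauss` stands in for the Box-Muller transform mentioned on the slide; the empirical covariance of the generated readings should land close to the corresponding entry of Σ.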
The Algorithm l The algorithm can figure out the sensor node string with high probability in log 2 (H(D)) phases. l Each phase has two rounds. l First round: sensor nodes send fingerprint bits to the server. l Second round: server sends feedback to the sensor nodes.
Phase i – First Round l The node sends some number of fingerprint bits to the server, each specified by a randomly chosen fingerprint function. l When each fingerprint bit is processed, the inputs in the sample set that do not agree with that bit are discarded from consideration. l Before processing each fingerprint bit sent by the node, the server checks whether it is an unbalanced bit, that is, whether some string x agrees with more than half of the elements of the sample set. l Such a string is called a heavy string.
Phase i – Second Round l The server sends the heavy string to the node. l The node responds with a single indicator bit. l The server keeps track of the total number of balanced bits that it has received. l The server continues this process until it has received some predetermined number of balanced bits or the sample set is empty.
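The two rounds can be illustrated with a heavily simplified single-node sketch. It is not the authors' exact procedure: it assumes parity fingerprints, a shared random seed standing in for randomness known to both sides, a fixed number of fingerprint bits per phase, and it omits the balanced-bit counting. All names are hypothetical.

```python
import random
from collections import Counter

def parity_fingerprint(mask, y):
    """Fingerprint bit: inner product of mask and y modulo 2."""
    return sum(m & b for m, b in zip(mask, y)) % 2

def heavy_string(samples):
    """Return the string that makes up more than half of the sample set, or None."""
    if not samples:
        return None
    s, count = Counter(samples).most_common(1)[0]
    return s if 2 * count > len(samples) else None

def collect(node_string, samples, bits_per_phase=2, max_phases=32, seed=0):
    """Server tries to learn node_string; returns (string or None, bits sent by the node)."""
    rng = random.Random(seed)  # assumed shared randomness between node and server
    n = len(node_string)
    samples = list(samples)
    node_bits = 0
    for _ in range(max_phases):
        # Round 1: the node sends fingerprint bits; the server filters its sample set.
        for _ in range(bits_per_phase):
            mask = [rng.randrange(2) for _ in range(n)]
            bit = parity_fingerprint(mask, node_string)  # computed on the node
            node_bits += 1
            samples = [s for s in samples if parity_fingerprint(mask, s) == bit]
        # Round 2: the server proposes the heavy string; the node answers with one bit.
        guess = heavy_string(samples)
        if guess is not None:
            node_bits += 1  # the indicator bit
            if guess == node_string:
                return guess, node_bits
            samples = [s for s in samples if s != guess]
        if not samples:
            return None, node_bits
    return None, node_bits

# Hypothetical sample set for one node with 4-bit strings.
x = (1, 0, 1, 1)
sample_set = [(1, 0, 1, 1)] * 6 + [(1, 0, 1, 0)] * 3 + [(0, 0, 1, 1)] * 2 + [(0, 1, 0, 0)]
learned, cost = collect(x, sample_set)
missing, _ = collect((1, 1, 1, 1), sample_set)  # string absent from the sample set
```

The node's own string always passes its own fingerprints, so it is never filtered out; wrong candidates drop away at roughly half per bit until the true string becomes heavy. If the true string is missing from the sample set, the sketch returns None instead of a wrong answer.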
Conclusions l Lots of interesting problems in wireless sensor networks. l New technique for collecting distributed and correlated information (ongoing project). l Allows for arbitrary distributions and correlations. l Single sample sufficient. l Open problems: –Lower bound on rounds. –Incorporating network topology.
References l Top conferences: MobiCom, MobiHoc, SenSys, IPSN, INFOCOM, SECON, SODA, STOC, FOCS. l I.F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, Wireless Sensor Networks: a Survey, Computer Networks 38 (2002) 393-422. l Z. Xiong, A. Liveris, and S. Cheng, Distributed Source Coding for Sensor Networks, IEEE Signal Processing Magazine 21 (2004) 80-94. l A. Ramamoorthy, K. Jain, P. Chou, and M. Effros, Separating Distributed Source Coding from Network Coding, IEEE/ACM Transactions on Networking 14 (2006) 2785-2795. l J. Liu, M. Adler, D. Towsley, and C. Zhang, On Optimal Communication Cost for Gathering Correlated Data through Wireless Sensor Networks, In Proceedings of MobiCom. l T. Batu, S. Dasgupta, R. Kumar, and R. Rubinfeld, The Complexity of Approximating Entropy, In Proceedings of STOC. l M. Adler, Collecting Correlated Information from a Sensor Network, In Proceedings of SODA 2005.
Questions?