Distributed Indexing and Querying in Sensor Networks using Statistical Models Arnab Bhattacharya Indian Institute of Technology (IIT), Kanpur

Jul 17, 2008, CS, ULB

Wireless sensor networks
– A “sensor” is a tiny, cheap communicating device with limited memory, communication bandwidth and battery life
  – Communication is precious
– Provides monitoring of physical phenomena
– Wireless sensor network (WSN): a collection of such sensors
  – Enables spatio-temporal monitoring of events
  – Inter-communication among neighboring sensors
  – Base station as a centralized point of entry

Semantic modeling
– Uses of WSNs
  – How many rooms are occupied?
  – Is there a fire in any room?
  – What is the pattern of birds’ movements?
– Low-level individual sensor readings do not provide semantics
– Content summarization by modeling
– Which models to use?
– Where and when to model?

Outline
– Semantic modeling
  – Which models to use?
  – Where and when to build the models?
– MIST: An index structure
– Query algorithms
– Experiments
– Conclusions

How to model?
– Zebranet
– Track movement of zebras by velocity sensors
– Three discrete states:
  – Grazing (G)
  – Walking (W)
  – Fast-moving (F)
– Zebras’ behavior is described by a state sequence
  – G W W W W F F G G, G G F F F W W W

Statistical models
– Markov Chain (MC)
  – Provides inference about behavior in general
  – τ: transition probabilities
  – π: start state probabilities
– Hidden Markov Model (HMM)
  – Tries to infer the causes of such behavior
  – ξ: emission probabilities
– Use of either model depends on the context
– Figures: Zebra Mobility: HMM; Zebra Mobility: MC
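
As an illustration of the Markov chain parameters π and τ, the sketch below estimates them from the two zebra state sequences shown on the earlier slide. It assumes plain maximum-likelihood counting, which the slides do not specify; the function name `train_markov_chain` and the uniform fallback for unseen states are hypothetical choices.

```python
from collections import Counter

STATES = ["G", "W", "F"]  # grazing, walking, fast-moving

def train_markov_chain(sequences, states=STATES):
    """Estimate start (pi) and transition (tau) probabilities by counting."""
    start = Counter(seq[0] for seq in sequences)
    trans = Counter()
    for seq in sequences:
        trans.update(zip(seq, seq[1:]))  # consecutive state pairs
    pi = {s: start[s] / len(sequences) for s in states}
    tau = {}
    for a in states:
        total = sum(trans[(a, b)] for b in states)
        # Assumption: a state with no observed outgoing transition gets a uniform row.
        tau[a] = {b: (trans[(a, b)] / total if total else 1 / len(states))
                  for b in states}
    return pi, tau

# The two state sequences from the "How to model?" slide.
pi, tau = train_markov_chain(["GWWWWFFGG", "GGFFFWWW"])
```

Each row of `tau` sums to 1, so the result is a valid model in the sense used on the later comparison slide.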

When and where: Queries
– Identify interesting behaviors in the network
  – Example: Identify all zebras (sensors) that observed the behavior pattern FFFF with likelihood > 0.8
    – May denote possible predator attack
– Sequence queries
  – Range query: Return sensors that observed a particular behavior with likelihood > threshold
  – Top-1 query: Which sensor is most likely to observe a given behavior?
– Model queries
  – 1-NN query: Which sensor is most similar to a given pattern (model)?

Centralized solution
– Each sensor
  – Builds a model
  – Transmits the model to the base station (BS)
– Queries come to BS
– BS answers them
  – No query communication
– Each update in a sensor is transmitted
  – Huge update costs

Slack-based centralized solution
– To save update costs, introduce slack locally at each sensor
– No update if new parameter is within slack of old parameter
  – Update costs reduced
– BS knows slack
  – Finds range for likelihood from each sensor
  – If a query cannot be answered by cached models, it is transmitted to the sensor
  – Query communication costs are introduced

Outline
– Semantic modeling
– MIST: An index structure
  – Correlation among models
  – Composition of models
  – Hierarchical aggregation of index
  – Dynamic maintenance
– Query algorithms
– Experiments
– Conclusions

MIST (Model-based Index Structure)
– Overlay a tree on the network
– Each sensor trains a model (MC/HMM) using observed sequences
– Aggregation of child models into parent using correlation among models
– Two types of composite models
– Bottom-up aggregation of index models
– Updates in models handled by slack

Correlation among models
– Models λ_1, ..., λ_m are (1−ε)-correlated if for all corresponding parameters σ_1, ..., σ_m: (equation not preserved in the transcript)
– ε → 0: High correlation
  – Models are similar
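
The defining inequality is missing from this transcript. The sketch below assumes a ratio-based reading, namely that the smallest and largest value of each corresponding parameter satisfy min/max ≥ 1−ε. That reading is an assumption, chosen because it agrees with the (1−δ) factors quoted on the later min-max bounds slide. Models are represented as flat parameter lists, and `correlation_epsilon` is a hypothetical helper name.

```python
def correlation_epsilon(models):
    """Smallest eps such that min/max >= 1 - eps holds at every parameter position.

    Assumed definition: the transcript does not preserve the slide's formula.
    """
    eps = 0.0
    for params in zip(*models):  # one tuple per corresponding parameter
        lo, hi = min(params), max(params)
        if hi > 0:
            eps = max(eps, 1 - lo / hi)
    return eps

# Two illustrative two-parameter models (made-up numbers).
eps = correlation_epsilon([[0.5, 0.5], [0.4, 0.6]])
```

Under this reading, identical models give ε = 0, matching the slide's remark that ε → 0 means high correlation.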

Average index model
– λ_avg maintains
  – Average of all corresponding parameters: σ_avg = (σ_1 + ... + σ_m) / m
  – ε′: Correlation parameter between λ_avg and any λ_i
  – β_max, β_min: maximum and minimum of all parameters from constituent models

Min-max index models
– λ_min and λ_max maintain
  – Minimum and maximum of all corresponding parameters: σ_min = min(σ_1, ..., σ_m), σ_max = max(σ_1, ..., σ_m)
  – No extra parameter

Comparison
– Statistical properties
  – Average: Valid statistical models
    – Transition and start state probabilities add up to 1
  – Min-max: Pseudo-models
    – Probabilities, in general, do not add up to 1
– Parameters
  – Average: 3 extra parameters
    – Total n+3 parameters
  – Min-max: no extra parameter
    – Total 2n parameters
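
The difference between the two composite models can be seen on a small example. The sketch below composes three child distributions (made-up numbers) into an average model and a min-max pair; the function names are hypothetical, and the extra parameters ε′, β_max and β_min of the average model are left out.

```python
def average_model(models):
    """Parameter-wise arithmetic mean of the child models."""
    return [sum(p) / len(p) for p in zip(*models)]

def min_max_model(models):
    """Parameter-wise minimum and maximum of the child models."""
    columns = list(zip(*models))
    return [min(c) for c in columns], [max(c) for c in columns]

# Three child models, each a probability distribution over three states.
children = [[0.5, 0.3, 0.2], [0.6, 0.3, 0.1], [0.4, 0.4, 0.2]]
avg = average_model(children)
lo, hi = min_max_model(children)
```

Here `avg` still sums to 1, while `lo` sums to less than 1 and `hi` to more than 1, which is why the min-max pair is called a pseudo-model.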

Hierarchical index
– Average model
  – Correlation parameter ε′
    – Correlation gets reduced
  – β_max (β_min)
    – Maximum (minimum) of the β_max (β_min) values of the children
  – Bounds become larger
– Min- (max-) model
  – Aggregation of min- (max-) model parameters
  – Min (max) becomes smaller (larger)

Dynamic maintenance
– Observations and therefore models change
– Slack parameter δ
– Models re-built with period d
– Last model update time u
– No update if λ(t+d) is within (1−δ) correlation with λ(u)
– Correlation parameter ε_slack maintained in the parent as (equation not preserved in the transcript)
– Hierarchical index construction assumes ε_slack
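
The update rule can be sketched as follows. This is a minimal illustration, assuming that "(1−δ) correlation" means the ratio of the smaller to the larger value of every parameter is at least 1−δ; that reading is an assumption, since the slide's formula is not preserved. `SensorNode` and `within_slack` are hypothetical names, and the parent-side ε_slack bookkeeping is omitted.

```python
def within_slack(old, new, delta):
    """True if every parameter pair has min/max >= 1 - delta (assumed definition)."""
    for a, b in zip(old, new):
        lo, hi = min(a, b), max(a, b)
        if hi > 0 and lo / hi < 1 - delta:
            return False
    return True

class SensorNode:
    def __init__(self, params, delta):
        self.last_sent = list(params)  # model at the last update time u
        self.delta = delta
        self.messages = 0              # number of updates sent to the parent

    def rebuild(self, new_params):
        """Called once per period d with the freshly trained model."""
        if not within_slack(self.last_sent, new_params, self.delta):
            self.last_sent = list(new_params)
            self.messages += 1

node = SensorNode([0.5, 0.5], delta=0.1)
node.rebuild([0.52, 0.48])  # small drift: stays within slack, nothing sent
node.rebuild([0.7, 0.3])    # large change: update is sent
```

A larger δ suppresses more updates at the price of looser bounds at the parent.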

Outline
– Semantic modeling
– MIST: An index structure
– Query algorithms
  – Sequence queries
  – Model queries
– Experiments
– Conclusions

Queries
– Sequence queries
  – Query sequence of symbols: q = q_1 q_2 ... q_k
  – Range query: Return sensors that have observed q with a probability > χ
  – Top-1 query: Given q, return the sensor that has the highest probability of observing q
– Model queries
  – Query model: Q = {π, τ}
  – 1-NN query: Return the sensor model that is most similar to Q

Range query
– Probability of observing q from λ is P(q|λ) (equation not preserved in the transcript)
  – q is of length k
  – σ_i is the i-th parameter in P(q|λ)
  – For MC λ = {π, τ}, P(q|λ) = π(q_1) · τ(q_1, q_2) · ... · τ(q_{k−1}, q_k)
  – For HMM, P(q|λ) is calculated as a sum along all possible state paths, each having 2k terms
– Idea is to bound every parameter σ_i separately
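
For a Markov chain, the probability of a sequence is the start probability of its first symbol times the transition probabilities along the sequence. The sketch below evaluates this for the pattern FFFF from the earlier query example; the numbers in `pi` and `tau` are illustrative values, not taken from the talk.

```python
def mc_likelihood(q, pi, tau):
    """P(q | lambda) for a Markov chain lambda = {pi, tau}."""
    p = pi[q[0]]
    for a, b in zip(q, q[1:]):
        p *= tau[a][b]  # one transition parameter per consecutive pair
    return p

# Illustrative model over the states G, W, F.
pi = {"G": 0.5, "W": 0.3, "F": 0.2}
tau = {
    "G": {"G": 0.6, "W": 0.3, "F": 0.1},
    "W": {"G": 0.2, "W": 0.6, "F": 0.2},
    "F": {"G": 0.05, "W": 0.05, "F": 0.9},
}
p_ffff = mc_likelihood("FFFF", pi, tau)  # 0.2 * 0.9 * 0.9 * 0.9
```

A sequence of length k uses exactly k parameters here (one start probability and k−1 transitions), which is what makes per-parameter bounding straightforward for MCs.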

Bounds
– Average model
  – Use of δ and ε_slack to correct for changes after the last update
  – Therefore, bounds for P(q|λ) are (equation not preserved in the transcript)
– Min-max model (equation not preserved in the transcript)
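
The bound formulas themselves are missing from the transcript. As a hedged illustration for the min-max case with Markov chains, the sketch below bounds each parameter separately by σ_min·(1−δ) and σ_max/(1−δ), the per-parameter limits quoted on the later min-max model slide, and multiplies them along the query sequence. Clamping upper limits at 1 and the three-way `range_decision` are assumptions of this sketch, not statements from the talk.

```python
def likelihood_bounds(q, pi_min, tau_min, pi_max, tau_max, delta=0.0):
    """Lower and upper bound on P(q | lambda) for any MC covered by the min-max pair."""
    lo = pi_min[q[0]] * (1 - delta)
    hi = min(1.0, pi_max[q[0]] / (1 - delta))  # a probability never exceeds 1
    for a, b in zip(q, q[1:]):
        lo *= tau_min[a][b] * (1 - delta)
        hi *= min(1.0, tau_max[a][b] / (1 - delta))
    return lo, hi

def range_decision(lo, hi, chi):
    """Decide what to do with a subtree for the query P(q | lambda) > chi."""
    if hi <= chi:
        return "prune"        # no sensor below can exceed the threshold
    if lo > chi:
        return "accept_all"   # every sensor below exceeds the threshold
    return "descend"          # threshold falls inside the bounds

# Illustrative min-max parameters restricted to state F.
pi_min, pi_max = {"F": 0.2}, {"F": 0.4}
tau_min, tau_max = {"F": {"F": 0.8}}, {"F": {"F": 0.9}}
lo, hi = likelihood_bounds("FF", pi_min, tau_min, pi_max, tau_max)
```

The query is forwarded into a subtree only in the "descend" case, which is where the communication savings come from.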

Top-1 query
– For any internal node
  – Each subtree has a lower bound and an upper bound on the probability of observing q
  – Prune a subtree if its upper bound is lower than the lower bound of some other subtree
    – Guarantees that best answer is not in this subtree
– Requires comparison of bounds across subtrees
– Pruning depends on dissimilarity of subtree models
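
A subtree can be discarded safely once its upper bound falls below the lower bound of another subtree, because the other subtree is then guaranteed to hold a better sensor. A minimal sketch of that comparison at one internal node follows; `prune_for_top1` and the bound values are hypothetical.

```python
def prune_for_top1(bounds):
    """Return the subtrees that may still contain the top-1 sensor.

    bounds maps a subtree name to (lower, upper) bounds on P(q | lambda).
    """
    best_lower = max(lo for lo, _ in bounds.values())
    # Keep a subtree only if its upper bound can still reach the best lower bound.
    return [name for name, (_, hi) in bounds.items() if hi >= best_lower]

survivors = prune_for_top1({
    "a": (0.1, 0.3),  # upper bound 0.3 < 0.4, so it is pruned
    "b": (0.4, 0.7),
    "c": (0.2, 0.5),
})
```

If the subtree models are similar, their bound intervals overlap and little can be pruned, which is the dependence on dissimilarity noted on the slide.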

Model (1-NN) query
– Requires notion of distance between models
– Euclidean distance or L_2 norm
  – Corresponding parameters are considered as dimensions
– Straightforward for MCs
– For HMMs, state correspondence needs to be established
  – Domain knowledge
  – Matching
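
For Markov chains, treating corresponding parameters as dimensions amounts to flattening π and τ into one vector in a fixed state order. The sketch below does this and answers a 1-NN query by brute force over two sensor models, as a reference point for the indexed versions. All names and numbers are illustrative.

```python
import math

STATES = ["G", "W", "F"]

def flatten(pi, tau, states=STATES):
    """Lay out start and transition probabilities in a fixed order."""
    return [pi[s] for s in states] + [tau[a][b] for a in states for b in states]

def model_distance(m1, m2):
    """Euclidean (L2) distance between two flattened models."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(m1, m2)))

def nearest_model(query, models):
    """Brute-force 1-NN: name of the model closest to the query vector."""
    return min(models, key=lambda name: model_distance(query, models[name]))

uniform_row = {"G": 1 / 3, "W": 1 / 3, "F": 1 / 3}
calm = flatten({"G": 0.8, "W": 0.1, "F": 0.1}, {s: uniform_row for s in STATES})
alert = flatten({"G": 0.1, "W": 0.1, "F": 0.8}, {s: uniform_row for s in STATES})
query = flatten({"G": 0.7, "W": 0.2, "F": 0.1}, {s: uniform_row for s in STATES})
closest = nearest_model(query, {"sensor_1": calm, "sensor_2": alert})
```

An MC over s states yields s + s² dimensions, so the dimensionality of this space grows quickly with the number of states.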

Average models
– M-tree like mechanism
  – 1-nearest-neighbor (1-NN) query
– “Model distance” space is a metric space
– Topology is the overlaid communication tree
– Average model maintains radius as largest possible distance to any model in the subtree
– For each parameter (equation not preserved in the transcript)
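
With a radius stored at each average model, the triangle inequality gives a lower bound on the distance from the query to anything in the subtree. The sketch below shows that generic M-tree style test; how the radius is derived from the per-parameter quantities is not preserved in the transcript, so it is simply taken as an input here, and the function names are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two flattened models."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def subtree_lower_bound(query, avg_model, radius):
    """No model in the subtree can be closer to the query than this."""
    return max(0.0, dist(query, avg_model) - radius)

def can_prune(query, avg_model, radius, best_so_far):
    """Skip the subtree if it cannot beat the best distance found so far."""
    return subtree_lower_bound(query, avg_model, radius) >= best_so_far

bound = subtree_lower_bound([0.0, 0.0], [3.0, 4.0], radius=2.0)  # 5 - 2
```

Only subtrees that fail the `can_prune` test need to receive the query.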

Min-max models
– R-tree like mechanism
  – 1-nearest-neighbor (1-NN) query
– “Model parameter” space is a vector space
– Topology is the overlaid communication tree
– For each parameter σ_i, there is a lower bound (σ_i^min · (1−δ)) and an upper bound (σ_i^max / (1−δ))
– The min-max models thus form a bounding rectangle
  – Similar to MBRs
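
The bounding rectangle supports the usual R-tree style minimum-distance test. The sketch below builds the rectangle from the per-parameter bounds stated on the slide and computes the smallest possible distance from a query model to any point inside it. Clamping the upper side at 1 is an assumption of this sketch (parameters are probabilities), and the function names are hypothetical.

```python
import math

def bounding_box(sigma_min, sigma_max, delta):
    """Per-parameter interval [sigma_min * (1 - delta), sigma_max / (1 - delta)]."""
    lo = [s * (1 - delta) for s in sigma_min]
    hi = [min(1.0, s / (1 - delta)) for s in sigma_max]  # assumed clamp at 1
    return lo, hi

def min_dist(query, lo, hi):
    """Smallest Euclidean distance from the query to the rectangle."""
    total = 0.0
    for x, l, h in zip(query, lo, hi):
        d = l - x if x < l else (x - h if x > h else 0.0)
        total += d * d
    return math.sqrt(total)

lo, hi = bounding_box([0.2, 0.4], [0.5, 0.6], delta=0.0)
inside = min_dist([0.3, 0.5], lo, hi)   # query lies inside the rectangle
outside = min_dist([0.1, 0.9], lo, hi)  # 0.1 below on one axis, 0.3 above on the other
```

A subtree whose minimum distance already exceeds the best distance found so far cannot contain the nearest neighbor.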

“Curse of dimensionality”
– Dimensionality = number of model parameters
– No “curse” for sequence queries
  – Each index model computes two bounds of P(q|λ)
  – Pruning depends on whether χ (threshold) falls within these bounds
  – Bounds are real numbers between 0 and 1
  – Single dimensional space: the probability line
– “Curse” exists for model queries
  – R-tree, M-tree like pruning on parameter space

Outline
– Semantic modeling
– MIST: An index structure
– Query algorithms
– Experiments
  – Experimental setup
  – Effects of different parameters
  – Fault-tolerance
– Conclusions

Optimal slack
– Large slack minimizes updates but querying cost goes up
– Reverse for small slack
– Optimal slack can be chosen by analyzing expected total costs
– Non-linear optimization
  – Difficult for local nodes
  – Almost impossible over the entire network
  – Changes in the models require re-computation
– Experimental method

Fault-tolerance
– Periodic heartbeat messages from child to parent
  – Extra messages
– When parent fails or child-parent link fails
  – Child finds another parent
  – Sends model parameters
  – Model, correlation, etc. are calculated afresh in parent
– When node or link comes up
  – Child switches to original parent
  – Old parent notified
  – Parents update their models, correlation, etc.

Experimental setup
– Two datasets
  – Real dataset
    – Laboratory sensors
    – Temperature readings
    – Readings every 30s for 10 days
    – 4 rooms, each having 4 sensors
    – States: C (cold, 27°C)
  – Synthetic dataset
    – Network size varied from 16 to 512
    – State size varied from 3 to 11
    – Correlation parameter ε varied from to 0.5
    – Both MCs and HMMs
– Metric to measure
  – Communication cost in bytes

Compared techniques
– Centralized with no slack
  – Node transmits all updates to BS
  – Zero querying cost
– Centralized with slack
  – Node maintains slack
  – Query sent to sensor nodes if cached models at BS cannot answer
– MIST schemes
  – Average/min-max models
  – With/without slack

Effect of query rate
– Slack-based schemes win at small query rates
– Centralized scheme with no slack is the best at very high query rates

Update costs
– No-slack schemes have almost double costs
– MIST’s slack schemes are better since updates are pruned at every level in the hierarchy

Query costs
– Costs increase with decreasing correlation (1−ε)
– At high correlation (low ε), no-slack schemes (including centralized) perform better

Optimal slack
– Minimum exists for MIST’s schemes
– Centralized: Due to low query rate, update costs dominated over querying costs

Network size
– No-slack schemes are better
– Querying cost increases due to higher bounds and longer path lengths to leaf nodes

Number of states: update costs
– Update costs increase with number of states
– MIST schemes are scalable due to hierarchical pruning

Number of states: query costs
– Querying cost decreases
  – Each model parameter σ decreases
  – Probability of observing q, i.e., P(q|λ) decreases
  – Therefore, bounds decrease

Number of states: total costs
– For sequence queries, no “curse of dimensionality”

Number of states: model query
– For model queries, “curse of dimensionality” sets in
  – Scalable up to reasonable state sizes

Fault-tolerance experiments
– Costs increase moderately due to parent switching
– Scalable with probability of failure

Outline
– Semantic modeling
– MIST: An index structure
– Query algorithms
– Experiments
– Conclusions
  – Future work

Conclusions
– A hierarchical in-network index structure for sensor networks using statistical models
– Hierarchical model aggregation schemes
  – Average model
  – Min-max models
– Queries
  – Sequence queries
  – Model query
– Experiments
  – Better than centralized schemes in terms of update, querying and total communication costs
  – Scales well with network size and number of states

Future work
– How to overlay the tree?
  – Similar models should be in the same subtree
– “Quality” of tree
  – Distributed solutions
  – What happens when models are updated?
– Fault-tolerance
  – How to find the best parent during faults?
  – Whether to switch back or stay after recovery
  – How to replicate information in siblings?
– Deployment