Zhao Cao*, Charles Sutton +, Yanlei Diao*, Prashant Shenoy* * University of Massachusetts, Amherst + University of Edinburgh Distributed Inference and Query Processing for RFID Tracking and Monitoring

2 Applications of RFID Technology
(figure: example RFID deployments and RFID readers)

3 RFID Deployment on a Global Scale
(figure: an item carrying tag id EF.0A, with manufacturer X Ltd. and expiration date Oct 2011, is read by a sequence of readers across the global supply chain: reader 3478 at 06:10:00, then readers 5140, 6647, and 7990 at later times, and reader 5140 again at 12:40:00)

4 Tracking and Monitoring Queries
- Path Queries: list the path taken by an item through the supply chain; report if a pallet has deviated from its intended path. (require object locations and history)
- Containment Queries: alert if a flammable item is not packed in a fireproof case; verify that food containing peanuts is never exposed to other food cases. (require containment among items, cases, and pallets)
- Hybrid Queries: for any frozen food placed outside a cooling box, alert if it has been exposed to room temperature for 6 hours. (require location, containment, and sensor data)

5 Challenges in RFID Data Stream Processing
Q1: For any frozen food placed outside a cooling box, raise an alert if it has been exposed to room temperature for 6 hours.
Inputs: an RFID stream (time, tag_id, reader_id) and a sensor stream (time, location, temperature).
1. RFID data streams are not queriable: they carry no location or containment information.
2. RFID data is incomplete and noisy: readings can be missing or overlapped across readers.
3. Inference and query processing must scale to numerous sites and millions of objects.
(figure: readings at locations D, E, F illustrating missing and overlapped reads)

6 A Scalable, Distributed RFID Stream Processing System
Raw RFID stream (time, tag_id, reader_id)
→ Distributed location & containment inference
→ Queriable RFID stream (time, tag_id, location, container)
→ Distributed query processing
→ Monitoring results (time, tag_id, query result)
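To fix ideas, the three stream schemas can be written down directly; a minimal sketch in Python, with type names that are illustrative rather than from the paper:

```python
from typing import NamedTuple, Optional

class RawReading(NamedTuple):        # raw RFID stream
    time: int
    tag_id: str
    reader_id: str

class ObjectEvent(NamedTuple):       # queriable stream produced by inference
    time: int
    tag_id: str
    location: str
    container: Optional[str]         # None if the object is not contained

class MonitoringResult(NamedTuple):  # output of distributed query processing
    time: int
    tag_id: str
    query_result: str
```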

7 I. Location and Containment Inference – Intuition
- Containment inference: use co-location history. An item repeatedly read at the same reader locations as a case (e.g., A at t=1; B, C at t=2; D, E, C at t=3; F, E, D at t=4) is likely contained in that case.
- Location inference: smooth over containment. If item 5 is contained in case 2 and case 2 is in location C at t=3, then item 5 is in location C at t=3; likewise for item 6.
- Containment changes: detect via change point detection, e.g., the containment between case 1 and item 4 has changed.
- Each task feeds the other, so the procedure is iterative.
(figure: items and cases moving across reader locations over epochs t=1..4)
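As a minimal sketch of the co-location intuition, the snippet below scores each candidate case by how often it is read at the same location as an item and returns the best-scoring case; this count-based scoring is only a stand-in for the probabilistic inference the slides describe, and all names are hypothetical.

```python
from collections import Counter

def likely_container(item_reads, case_reads):
    """item_reads: {epoch: location} for one item.
    case_reads: {case_id: {epoch: location}} for all candidate cases.
    Return the case most often co-located with the item, or None."""
    scores = Counter()
    for case_id, reads in case_reads.items():
        scores[case_id] = sum(1 for t, loc in item_reads.items()
                              if reads.get(t) == loc)
    return scores.most_common(1)[0][0] if scores else None
```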

8 (1) Our Probabilistic Graphical Model
- Hidden variables: the true object and container locations.
- Evidence variables: the RFID readings.
- Containment (0 or 1): edges between hidden variables, linking each object to its container.
- Independence assumptions: independence among containers; independence over time.
- RFID sensor model: parameterized by the read rate and the overlap rate.
- Joint probability: factorizes over epochs and containers (see the sketch below).
(figure: the graphical model, with hidden location nodes over time T and reading nodes R)
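The joint-probability formula itself did not survive transcription; under the independence assumptions above, a model of this family would factorize roughly as follows, with hidden container locations \(L_{c,t}\), readings \(R_{o,t}\), and a containment assignment \(C\) (a hedged reconstruction, not necessarily the paper's exact formula):

\[
p(R, L \mid C) \;=\; \prod_{t=1}^{T} \prod_{c} \Big[\, p(L_{c,t}) \prod_{o \,:\, C(o)=c} p(R_{o,t} \mid L_{c,t}) \Big],
\]

where \(p(R_{o,t} \mid L_{c,t})\) is the RFID sensor model parameterized by the read rate and the overlap rate.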

9 (2) Location and Containment Inference using EM
An iterative algorithm in the EM framework:
- E-step: given the current guess about the containment relations C, compute the (customized) posterior of each container's location.
- M-step: treat the expected log likelihood as a function of the containment C and choose the best containment relation for each object.
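A toy, self-contained rendering of one EM iteration, assuming a uniform location prior and a symmetric read/miss sensor model; the paper's customized posterior computation is not reproduced here.

```python
import math

LOCS = ["A", "B", "C"]
READ_RATE, P_MISS = 0.8, 0.2                  # illustrative sensor parameters

def emission(read, true_loc):
    """Toy sensor model: probability of one reading given the true location."""
    if read is None:                          # missed reading
        return P_MISS
    if read == true_loc:
        return READ_RATE
    return (1 - READ_RATE) / (len(LOCS) - 1)  # overlapped / cross read

def e_step(containment, reads, cases, T):
    """Posterior over each case's location at each epoch, given containment."""
    post = {}
    for c in cases:
        members = [c] + [o for o, cc in containment.items() if cc == c]
        post[c] = []
        for t in range(T):
            w = [math.prod(emission(reads[m][t], loc) for m in members)
                 for loc in LOCS]
            post[c].append([x / sum(w) for x in w])
    return post

def m_step(items, reads, post, cases, T):
    """Reassign each item to the case whose locations best explain its reads."""
    def score(o, c):
        return sum(post[c][t][i] * math.log(emission(reads[o][t], loc))
                   for t in range(T) for i, loc in enumerate(LOCS))
    return {o: max(cases, key=lambda c: score(o, c)) for o in items}

# One iteration on a tiny example: two items, one case, three epochs.
reads = {"case1": ["A", "B", None], "item1": ["A", None, "C"],
         "item2": [None, "B", "C"]}
containment = {"item1": "case1", "item2": "case1"}
post = e_step(containment, reads, ["case1"], T=3)
containment = m_step(["item1", "item2"], reads, post, ["case1"], T=3)
```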

10 (3) Change Point Detection – Intuition
A statistical approach based on hypothesis testing:
- Null hypothesis: no containment change in [0, T].
- Alternative hypothesis: a containment change at some time t', 0 ≤ t' ≤ T.
- If the test statistic Δ exceeds a threshold δ, declare a change; otherwise, no change.
- δ is obtained offline by sampling hypothetical observation sequences from the model with stable containment (e.g., using the maximum-likelihood parameters); see the calibration sketch below.
(figure: the items-and-cases timeline from slide 7, epochs t=1..4)
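The threshold δ can be calibrated as a high quantile of the test statistic over traces sampled from the stable-containment model. A minimal sketch, with the trace sampler and the statistic passed in as callables since both are model-specific (the toy usage below is purely illustrative):

```python
import random

def calibrate_threshold(sample_trace, statistic, n_samples=1000,
                        false_alarm=0.01):
    """Return delta as the (1 - false_alarm) quantile of `statistic` over
    traces drawn from the stable-containment (null) model."""
    stats = sorted(statistic(sample_trace()) for _ in range(n_samples))
    return stats[min(int((1 - false_alarm) * n_samples), n_samples - 1)]

# Toy usage: random traces and a stand-in statistic.
delta = calibrate_threshold(
    sample_trace=lambda: [random.random() for _ in range(50)],
    statistic=max)
```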

11 (4) Implementation and Optimizations
- Inference runs every few seconds; both the E-step and the M-step have high complexity, O(TCOR²).
- Optimizations reduce this to O(TC + TO) and further to O(C + O):
  - Location restriction: each object is read in only a few locations.
  - Containment restriction: a container holds only a small set of objects.
  - Candidate pruning: for each object, consider only containers observed frequently in the first few epochs and in several recent epochs (sketched below).
  - History truncation: further eliminates the factor of T.
  - Memoization: reuse values from the previous EM iteration.
- Change point detection runs at the end of each inference pass; it sums up quantities memoized during inference, adding little extra overhead.
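One plausible reading of the candidate-pruning rule, with illustrative thresholds: keep the containers co-located with the object often during the first few epochs (stable containment), plus those seen in the most recent epochs (possible containment change).

```python
from collections import Counter

def candidate_containers(colocations, t_now, warmup=5, recent=5, min_count=3):
    """colocations: list of (epoch, case_id) pairs for one object.
    Prune to cases frequent early in the history or seen very recently."""
    early = Counter(c for t, c in colocations if t < warmup)
    late = {c for t, c in colocations if t > t_now - recent}
    return {c for c, n in early.items() if n >= min_count} | late
```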

12 II. Distributed Processing with State Migration
Query: raise an alert if a frozen product has been placed outside a cooling box for 6 hours.

  SELECT tag_id, A[].temp
  FROM ( SELECT RSTREAM(R.tag_id, R.loc, T.temp)
         FROM   RFIDStream [NOW] AS R,
                TempStream [PARTITION BY sensor_id ROW 1] AS T
         WHERE  (R.container != 'cooling box' OR R.container = NULL)
                AND R.loc = T.loc AND T.temp > 10°C
       ) AS S
  [ PATTERN SEQ(A+)
    WHERE A[i].tag_id = A[1].tag_id
          AND A[A.len].time > A[1].time + 6 hrs ]

The inner block (local processing) runs at each site; the pattern over the resulting global stream S (global processing) detects the 6-hour exposure. Each site runs inference over its raw RFID readings (tag, reader, time) and sensor readings to produce object events (tag, loc, cont, …); as an object moves across sites (Site 1, Site 2, Site 3), its inference and query state migrates with it.
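As a toy rendering of what the pattern clause computes over the queriable stream, here is a single-process Python sketch; it ignores the local/global split and the precise pattern-language semantics, and only the cooling-box and 10 °C predicates come from the query above.

```python
from typing import Iterator, NamedTuple

class Event(NamedTuple):
    time: int          # seconds
    tag_id: str
    container: str
    temp: float

def alerts(stream: Iterator[Event], horizon: int = 6 * 3600):
    """SEQ(A+): a run of qualifying events per tag spanning > `horizon`."""
    start = {}                                   # tag_id -> start of current run
    for e in stream:
        if e.container != "cooling box" and e.temp > 10:
            t0 = start.setdefault(e.tag_id, e.time)
            if e.time > t0 + horizon:
                yield (e.time, e.tag_id)         # exposed for more than 6 hours
        else:
            start.pop(e.tag_id, None)            # run broken: reset this tag
```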

13 Minimize Inference State – History Truncation
- Periodically find a critical region, CR, over the history: the span that contributes most of the co-location strength used in the M-step of inference.
- Later inference runs on CR plus the recent history H' only.
- When an object leaves a site, compress CR to a single weight (its co-location strength) to minimize the migrated state, as sketched below.
(figure: an object's timeline through the entry door (t=0~90), a belt (t=100~105), and shelves A/B (t=120~200), with read and not-read regions)
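A sketch of the compression performed when an object leaves a site, assuming the critical region reduces to one co-location weight per candidate container; the exact weighting in the paper's M-step is not shown here.

```python
from typing import Dict, List, NamedTuple

class MigratedState(NamedTuple):
    tag_id: str
    colocation_strength: Dict[str, float]   # one weight per candidate case

def compress_cr(tag_id: str,
                cr_epochs: List[Dict[str, float]]) -> MigratedState:
    """cr_epochs: per-epoch co-location evidence, {case_id: weight} per epoch.
    Collapse the whole critical region into a single weight per case."""
    strength: Dict[str, float] = {}
    for epoch in cr_epochs:
        for case_id, w in epoch.items():
            strength[case_id] = strength.get(case_id, 0.0) + w
    return MigratedState(tag_id, strength)
```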

14 Minimize Query Processing State via Sharing
- Global query processing keeps a query state for each object; as an object leaves a site, its query state is transferred to the next site.
- The query state for each object includes, e.g.: the current automaton state, values needed for future predicate evaluation during automaton execution, and values that the query returns.
- Volume: kilobytes or more per object per query.
(the pattern query from slide 12 is shown again: SELECT tag_id, A[].temp FROM (…) AS S [PATTERN SEQ(A+) WHERE A[i].tag_id = A[1].tag_id AND A[A.len].time > A[1].time + 6 hrs])

15 Minimize Query Processing State via Sharing
- Global query processing keeps a query state per object per query; transferring each state individually is costly.
- Sharing query states based on stable containment: at the exit, the objects in a container share the same location and container (though possibly different histories), so their query states are similar.
- Centroid-based method (sketched below): find the most representative query state, then compress the other, similar query states by storing only their differences from it.
(figure: query states before and after compression)
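A minimal sketch of the centroid-based method, assuming query states are flat dictionaries; the centroid is the state minimizing the total size of the diffs, and every other state stores only its differences from the centroid. All names are illustrative.

```python
def diff(state, centroid):
    """Entries where `state` departs from `centroid` (None marks a deletion)."""
    d = {k: v for k, v in state.items() if centroid.get(k) != v}
    d.update({k: None for k in centroid if k not in state})
    return d

def share_states(states):
    """states: {tag_id: query_state}. Returns (centroid, per-tag diffs)."""
    centroid = min(states.values(),
                   key=lambda cand: sum(len(diff(s, cand))
                                        for s in states.values()))
    return centroid, {tag: diff(s, centroid) for tag, s in states.items()}

def restore(centroid, d):
    """Reconstruct a full query state from the centroid and its diff."""
    merged = {**centroid, **d}
    return {k: v for k, v in merged.items() if v is not None}
```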

16 Implementation and Evaluation
- Implemented inference, distributed inference, and distributed query processing.
- Instrumented an RFID lab in a warehouse setting.
- Developed a simulator for a network of warehouses:
  - Number of warehouses (N): 1-10
  - Frequency of pallet injection: 1 every 60 seconds
  - Cases per pallet: 5; items per case: 20
  - Main read rate of readers (RR): [0.6, 1], default 0.8
  - Overlap rate for shelf readers (OR): [0.2, 0.8], default 0.5
  - Non-shelf reader frequency: 1 every second; shelf reader frequency: 1 every 10 seconds
  - Frequency of anomalies (FA): 1 every 10 to 120 seconds

17 Single Site, Stable Containment
- Compared three methods: history truncation (CR), simple windowing (W), and naïve (all history).
- Metrics: accuracy of location and containment inference; time cost of inference.
- Results: all three methods offer high accuracy for location; simple windowing has poor accuracy for containment inference; using all history hurts performance; history truncation (CR) is best in both accuracy and performance, and is insensitive to trace length.

18 Evaluation of a Lab RFID Deployment
Trace settings:
- T1: RR=0.85, OR=0.25; T2: RR=0.85, OR=0.5; T3: RR=0.7, OR=0.25; T4: RR=0.7, OR=0.5
- T5 to T8 extend T1 to T4 with 3 items moved across cases and 1 item removed.
Baseline: an improved SMURF (window-based temporal smoothing), extended with containment inference and change detection.
Results:
- Our algorithm: (1) location error rates are low; (2) containment error rates are low under stable containment; (3) containment changes cause more errors, especially given more noise (lower read rates or higher overlap rates).
- SMURF: many more errors; simple temporal smoothing misses these opportunities.

19 Results for Distributed Inference w. State Migration
- Experiment setting: 10 warehouses, each with up to 150,000 items, totaling 1.5 million items.
- Compared algorithms: State Migration (CR), No State Migration (None), and Centralized.
- Communication cost in bytes (Centralized / None / CR):
  RR=0.6: 125,895,… / … / …,890
  RR=0.7: 145,858,… / … / …,790
  RR=0.8: 166,746,… / … / …,890
  RR=0.9: 187,589,… / … / …,890
- The naïve method with no state transfer has a high error rate.
- The centralized method transfers a huge amount of data; our method (CR) is close to the centralized method in accuracy while achieving an ~830x reduction in communication cost.

20 Results for Distributed Query Processing
- The overall accuracy (F-measure) of query results is high (>89%).
- Query state sharing yields up to a 10x reduction in query state size.
- The accuracy and the state reduction ratio of Q1 are lower than those of Q2, because Q1 combines location and containment while Q2 uses only inferred location.
Queries:
- Q1: report frozen food that has been placed outside a cooling box for 3 hours.
- Q2: report frozen food that has been exposed to temperature over 10 degrees for 10 hours.

                                 RR=0.6   RR=0.7   RR=0.8   RR=0.9
Q1  F-measure (%)                  …        …        …        …
    State w/o sharing (bytes)    65,500   66,000   67,037   67,000
    State w/ sharing (bytes)      6,986    5,737    5,589    5,156
Q2  F-measure (%)                  …        …        …        …
    State w/o sharing (bytes)    80,248   85,510   87,029   87,000
    State w/ sharing (bytes)      7,296    6,108    5,341    5,273

21 Summary and Future Work
Summary:
- Novel inference techniques that provide accurate estimates of object locations and containment relationships in noisy, dynamic environments.
- Distributed inference and query processing techniques that minimize the computation state transferred between sites.
- Experimental results demonstrating the accuracy, efficiency, and scalability of our techniques, and their superiority over existing methods.
Future work:
- Exploit local tag memory for distributed inference, e.g., utilizing aggregate tag memory and supporting fault tolerance.
- Extend the work to probabilistic query processing.
- Explore smoothing over object (entity) relations in other data cleaning problems.
