Adaptive Cleaning for RFID Data Streams Shawn Jeffery Minos Garofalakis Michael Franklin UC Berkeley Intel Research Berkeley UC Berkeley Presented by:


Adaptive Cleaning for RFID Data Streams
Shawn Jeffery (UC Berkeley), Minos Garofalakis (Intel Research Berkeley), Michael Franklin (UC Berkeley)
Presented by: Hamid Haidarian Shahri

Where Are We? Look at the Signs!

Looking at Signs – Before Jumping In
- S. Chaudhuri, U. Dayal, "An Overview of Data Warehousing and OLAP Technology," SIGMOD Record
  - 800+ citations
- DW and information integration
- "Data cleaning" term publicized
  - Identified its importance in integration
- Extensive research followed

VLDB 2001 Session R12: DATA QUALITY & CLEANING
- Declarative data cleaning: language, model, and algorithms. Helena Galhardas (INRIA Rocquencourt), Daniela Florescu (Propel), Dennis Shasha (NYU), Eric Simon, and Cristian-Augustin Saita (INRIA Rocquencourt)
- Potter's wheel: an interactive data cleaning system. Vijayshankar Raman and Joseph M. Hellerstein (University of California at Berkeley)
- Update propagation strategies for improving the quality of data on the Web. Alexandros Labrinidis and Nick Roussopoulos (University of Maryland)

Data Cleaning Previous Work
- Hamid Haidarian Shahri, S.H. Shahri, "Eliminating Duplicates in Information Integration: An Adaptive, Extensible Framework," IEEE Intelligent Systems, Vol. 21, No. 5, 2006.

Putting Things into Context
- Data cleaning required after integration
  - No unified standard across sources
  - NOW: sensor/hardware errors inevitable; research opportunity
- Data modeling (Amol Deshpande)
  - An important use case is cleaning

VLDB 2006 – Three weeks ago
Research Session 5: Sensor Data (dedicated to cleaning!)
- Title: Adaptive Cleaning for RFID Data Streams
  - Authors: Shawn R. Jeffery, Minos Garofalakis, Michael J. Franklin
- Title: A Deferred Cleansing Method for RFID Data Analytics
  - Authors: Jun Rao, Sangeeta Doraiswamy, Hetal Thakkar, Latha S. Colby
- Title: Online Outlier Detection in Sensor Data Using Non-Parametric Models
  - Authors: Sharmila Subramaniam, Themis Palpanas, Dimitris Papadopoulos, Vana Kalogeraki, Dimitrios Gunopulos

RFID: Radio Frequency IDentification

RFID data is dirty
A simple experiment:
- 2 RFID-enabled shelves
- 10 static tags
- 5 mobile tags

RFID Data Cleaning
[Figure: raw readings over time pass through a smoothing filter to give the smoothed output]
- RFID data has many dropped readings
- Typically, use a smoothing filter to interpolate:
    SELECT distinct tag_id FROM RFID_stream [RANGE '5 sec'] GROUP BY tag_id
- But how to set the size of the window?
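The fixed-window filter on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `smooth_fixed_window` is hypothetical, the window is counted in epochs rather than seconds, and each epoch's raw reading is modeled as a set of tag ids.

```python
def smooth_fixed_window(readings, window):
    """Report a tag as present if it was read in any of the last `window` epochs.

    readings: list of sets, one set of tag ids per epoch (assumed input shape).
    window:   window size in epochs (the slide's query uses seconds instead).
    """
    output = []
    for t in range(len(readings)):
        start = max(0, t - window + 1)
        present = set()
        for epoch in readings[start:t + 1]:
            present |= epoch  # union of everything read inside the window
        output.append(present)
    return output


# One tag that is read intermittently and then leaves the read range.
raw = [{"fido"}, set(), {"fido"}, set(), set(), set()]
small = smooth_fixed_window(raw, 1)  # follows the raw data, dropped readings included
large = smooth_fixed_window(raw, 3)  # fills the gaps, but reports the tag after it has left
```

With a window of one epoch the output simply repeats the raw readings; with three epochs the dropped reading is filled in, at the cost of reporting the tag for two extra epochs after its last real reading.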

Window Size for RFID Smoothing
[Figure: timelines for reality, raw readings, small-window output, and large-window output, while Fido is moving and while Fido is resting]
- Need to balance completeness vs. capturing tag movement

Truly Declarative Smoothing
- Problem: window size is non-declarative
  - Application wants a clean stream of data
  - Window size is how to get it
- Solution: adapt the window size in response to the data

Itinerary
- Introduction: RFID data cleaning
- A statistical sampling perspective
- SMURF
  - Per-tag cleaning
  - Multi-tag cleaning
- Ongoing work
- Conclusions

A Statistical Sampling Perspective
- Key insight: RFID data can be viewed as a random sample of the tags present
- Map RFID smoothing to a sampling experiment

RFID's Gory Details
[Figure: an antenna and reader interrogate Tags 1 to 4 over a sequence of read cycles (epochs) labeled E0 to E9; for Alien readers, each epoch yields a tag list with columns Epoch, TagID, and ReadRate]

RFID Smoothing to Sampling

    RFID                  Sampling
    Read cycle (epoch)    Sample trial
    Reading               Single sample
    Smoothing window      Repeated trials
    Read rate             Probability of inclusion (p_i)

- Now use sampling theory to drive adaptation!

SMURF: Statistical Smoothing for Unreliable RFID Data
- Adapts window based on statistical properties
- Mechanisms for per-tag and multi-tag cleaning

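Per-Tag Smoothing: Model and Background
- Use a binomial sampling model
[Figure: read rate p_i of tag i (between 0 and 1) plotted over time (epochs); the smoothing window spans w_i epochs, treated as w_i Bernoulli trials with average read rate p_i^avg; S_i marks the readings of tag i inside the window]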
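As a rough illustration of the quantities in this model, the sketch below computes the window size w_i, the number of readings |S_i|, and the average read rate p_i^avg for one tag. The helper name `window_statistics` and the input shape (one entry per epoch, holding the reader-reported read rate or `None` when the tag was missed) are assumptions made for this example, not something the slides specify.

```python
def window_statistics(window):
    """Summarize one tag's smoothing window under the binomial model.

    window: list with one entry per epoch; each entry is the reported read
            rate (a float in (0, 1]) if the tag was read, or None if it was
            not read in that epoch (assumed representation).
    Returns (w, s, p_avg): window size, number of epochs with a reading,
    and the average read rate over the epochs that had a reading.
    """
    observed = [rate for rate in window if rate is not None]
    w = len(window)
    s = len(observed)
    p_avg = sum(observed) / s if s else 0.0
    return w, s, p_avg


# Four epochs: the tag answered in two of them.
w, s, p_avg = window_statistics([0.75, None, 0.5, None])
```

Under the binomial model, the expected number of readings in the window is then w * p_avg, which is the baseline the later slides compare the observed count against.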

Per-Tag Smoothing: Completeness
- If the tag is there, read it with high probability
  - Want a large window
[Figure: read rate p_i (between 0 and 1) over time (epochs); a reading with a low p_i leads SMURF to expand the window]

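Per-Tag Smoothing: Completeness
- Desired window size for tag i: w_i >= (1 / p_i^avg) * ln(1 / δ)
  - 1 / p_i^avg: expected epochs needed to read the tag
  - ln(1 / δ): factor that ensures the tag is read with probability 1 - δ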
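The completeness requirement can be worked through numerically. Under the binomial model, a tag with average read rate p is missed in all w epochs with probability (1 - p)^w, which is at most exp(-p * w); requiring that to be at most δ gives w >= ln(1 / δ) / p. The sketch below is a minimal version of that calculation; the function names and the rounding up to a whole number of epochs are choices made for this example.

```python
import math


def required_window(p_avg, delta):
    """Smallest whole number of epochs satisfying w >= ln(1/delta) / p_avg."""
    if not 0.0 < p_avg <= 1.0:
        raise ValueError("p_avg must be in (0, 1]")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    return max(1, math.ceil(math.log(1.0 / delta) / p_avg))


def miss_probability(p_avg, w):
    """Probability that a present tag is not read in any of w epochs."""
    return (1.0 - p_avg) ** w


w_good = required_window(0.5, 0.05)  # reliable tag: short window
w_poor = required_window(0.1, 0.01)  # unreliable tag, stricter delta: long window
```

A tag read half the time needs only a handful of epochs, while a tag read one time in ten needs a window several times longer, which is why a single fixed window cannot suit both.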

Per-Tag Smoothing: Transitions
- Detect transitions as statistically significant changes in the data
- Flag a transition and shrink the window
[Figure: read rate p_i (between 0 and 1) over time (epochs); a statistically significant difference marks the point by which the tag has likely left]

Per-Tag Smoothing: Transitions
- Is the difference between the number of observed readings and the number of expected readings "statistically significant"?
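One way to make the significance test concrete is to compare the observed number of readings with the binomial expectation w * p_avg and flag a transition when the gap exceeds a few standard deviations. The sketch below does this; the two-standard-deviation threshold and the name `is_transition` are assumptions for illustration, since the slide text does not state the exact test.

```python
import math


def is_transition(observed, w, p_avg, num_sigmas=2.0):
    """Flag a likely transition for one tag.

    observed:   number of epochs in the window in which the tag was read
    w:          window size in epochs
    p_avg:      average read rate of the tag
    num_sigmas: significance threshold in standard deviations (assumed value)
    """
    expected = w * p_avg
    std_dev = math.sqrt(w * p_avg * (1.0 - p_avg))  # binomial standard deviation
    return abs(observed - expected) > num_sigmas * std_dev


# Window of 20 epochs, tag normally read 80% of the time (16 expected readings).
still_here = is_transition(observed=15, w=20, p_avg=0.8)
likely_left = is_transition(observed=5, w=20, p_avg=0.8)
```

Being one reading short of the expectation is ordinary sampling noise, whereas seeing only 5 of 16 expected readings is far outside it, so the window would be shrunk in the second case.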

SMURF in Action
[Figure: SMURF output while Fido is moving and while Fido is resting]
- Experiments with real and simulated data show similar results

Multi-tag Cleaning
- Some applications only need aggregates
  - E.g., count of items on each shelf
  - Don't need to track each tag!
- Use statistical mechanisms for both:
  - Aggregate computation
  - Window adaptation

Aggregate Computation
- π-estimators (Horvitz-Thompson)
  - Count: sum of 1 / π_i over the tags i seen in the window
  - P[tag i seen in a window of size w]: π_i = 1 - (1 - p_i^avg)^w
- Use small windows to capture movement
- Use the estimator to compensate for lost readings
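A Horvitz-Thompson count estimate weights each observed tag by the inverse of its probability of being seen at all. The sketch below assumes the binomial model from the per-tag slides, so that a tag with average read rate p is seen in a window of w epochs with probability 1 - (1 - p)^w; the function names and the dictionary input are hypothetical choices for this example.

```python
def inclusion_probability(p_avg, w):
    """Probability that a present tag is read at least once in w epochs."""
    return 1.0 - (1.0 - p_avg) ** w


def estimate_count(read_rates, w):
    """Horvitz-Thompson style estimate of how many tags are present.

    read_rates: dict mapping each tag seen in the window to its average read
                rate, assumed to be in (0, 1] (a tag that was seen has a
                nonzero rate).
    w:          window size in epochs
    """
    total = 0.0
    for tag, p_avg in read_rates.items():
        if not 0.0 < p_avg <= 1.0:
            raise ValueError(f"read rate for {tag!r} must be in (0, 1]")
        total += 1.0 / inclusion_probability(p_avg, w)
    return total


seen = {"tag_a": 0.5, "tag_b": 0.5}
one_epoch = estimate_count(seen, 1)   # each seen tag stands for two present tags
two_epochs = estimate_count(seen, 2)  # longer window, smaller correction
```

With a one-epoch window and a 50% read rate, two observed tags are scaled up to an estimate of four; widening the window to two epochs shrinks the correction because fewer tags are expected to be missed entirely.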

Window Adaptation
- Upper bound on the window, similar to per-tag
- "Transition" based on variance within subwindows
[Figure: count over time (epochs), comparing the estimates N_w and N_w']

Multi-tag Scenario

Ongoing Work: Spatial Smoothing
- With multiple readers, more complicated
- Reinforcement
  - A? B? A ∪ B? A ∩ B?
- Arbitration
  - A? C?
- All are addressed by the statistical framework!
[Figure: two rooms, two readers per room (readers A, B, C, D)]

Beyond RFID
- π-estimator for other aggregates
- Other sensor data
  - Use SMURF for sensor networks
- Other streaming data
  - Use SMURF in general streaming systems (e.g., TelegraphCQ)
  - Remove RANGE clause from CQL

Related Work
- Commercial RFID middleware
  - Smoothing filters: need to set smoothing window
- RFID-related work
  - Rao et al., StreamClean: complementary
  - Intel Seattle, HiFi, ESP: static window size
- BBQ, MauveDB
  - Heavyweight, model-based
  - SMURF is non-parametric, sampling-based
- Statistical filters (digital signal processing & DB)
  - Non-linear digital filters inspired SMURF design

Conclusions
- Current smoothing filters are not adequate
  - Not declarative!
- SMURF: declarative smoothing filter
  - Uses statistical sampling to adapt window size

Thanks! Questions?