Efficiently Collecting Histograms Over RFID Tags Lei Xie, Hao Han, Qun Li, Jie Wu, and Sanglu Lu State Key Laboratory for Novel Software Technology, Nanjing.

Presentation transcript:

Efficiently Collecting Histograms Over RFID Tags
Lei Xie, Hao Han, Qun Li, Jie Wu, and Sanglu Lu
State Key Laboratory for Novel Software Technology, Nanjing University, China
Department of Computer Science, College of William and Mary, USA
Department of Computer and Information Sciences, Temple University, USA
Presenter: Dr. Lei Xie, Associate Professor, Nanjing University, China

Outline

Applications with Passive RFID
– Only some useful statistical information needs to be collected: the overall tag size, the popular categories, and the histogram.

Histogram Collection
– Basic Histogram Collection
– Long Tail

Advanced Histogram Collection
– Iceberg Query over Histogram
– Top-k Query over Histogram

Application scenarios
– Approximately how many books does each category contain on a large bookshelf?
[Figure label: Standard deviation]

Application scenarios
– Approximately which categories on the supermarket shelves have a quantity above 15?
– Find popular merchandise and control stock, e.g., automatically signal when more products need to be put on the shelf.

Histogram Collection
– In most applications, tags frequently move into and out of the effective scanning area.
– Traditional tag identification is not suitable for histogram collection: its scanning time is proportional to the number of tags, on the order of several minutes.

Histogram Collection
– To capture the distribution statistics in time, it is essential to sacrifice some accuracy and obtain the distribution within several seconds.
– Propose an estimation scheme that quickly counts the tag size of each category while meeting the accuracy requirement.
– We aim to propose a series of novel solutions with the following properties: time-efficient, simple on the tag side of the protocol, and compliant with the EPCglobal C1G2 standards.

Scenario
– A large number of tags (about 5000) in a large number of categories (about 100) lie in the effective scanning area.
– The slotted ALOHA-based anti-collision scheme is used.
– The set of category IDs present cannot be predicted in advance.
Objective
– Collect the histogram over RFID tags according to some categorized metric, e.g., the type of merchandise.
– Achieve time efficiency while satisfying the accuracy/population constraints.

Objective
– The RFID system needs to collect the histogram for all categories in a time-efficient way.
Accuracy constraint
– Suppose the estimated tag size for category $C_i$ ($1 \le i \le m$) is $\hat{n}_i$; then the accuracy constraint is $\Pr[|\hat{n}_i - n_i| \le \epsilon \cdot n_i] \ge 1 - \beta$.
– Given the exact tag size $n_i$ for a specified category, the estimated tag size $\hat{n}_i$ should be in a confidence interval of width $2\epsilon \cdot n_i$, i.e., $\hat{n}_i / n_i \in [1 - \epsilon, 1 + \epsilon]$, with a probability greater than $1 - \beta$.
– For example, if $\epsilon = 0.1$ and $\beta = 0.05$, then for a category with tag size $n_i = 100$, the estimated tag size should be within the range [90, 110] with a probability greater than 95%.
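
The accuracy constraint on this slide can be checked numerically. The sketch below is only an illustration: the names `eps` (relative error bound) and `beta` (allowed failure probability) are assumed symbol names chosen to match the slide's example values 0.1 and 0.05, and the helper functions are hypothetical, not part of the presented protocol.

```python
def accuracy_interval(n, eps):
    """Return the interval [n - eps*n, n + eps*n] around the exact tag size n."""
    margin = eps * n
    return (n - margin, n + margin)


def satisfies_accuracy(n_hat, n, eps):
    """True if the estimate n_hat lies inside the accuracy interval for n."""
    lo, hi = accuracy_interval(n, eps)
    return lo <= n_hat <= hi


def empirical_coverage(estimates, n, eps):
    """Fraction of estimates inside the interval; compare it with 1 - beta."""
    hits = sum(1 for n_hat in estimates if satisfies_accuracy(n_hat, n, eps))
    return hits / len(estimates)
```

With `n = 100` and `eps = 0.1`, `accuracy_interval` gives the range [90, 110] from the slide; an estimator meets the constraint for `beta = 0.05` when its coverage over many runs exceeds 0.95.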

Two straightforward solutions
– Basic Tag Identification
– Separate Counting
Neither solution is time-efficient.
– Basic tag identification: uniquely identifying each tag in the massive set is not scalable.
– Separate counting: the fixed initial frame size for each category and the inter-cycle overhead among query cycles make the scanning time rather large. The reader must scan every category with at least one query cycle, even categories that an iceberg query or top-k query does not need to address.

Ensemble sampling-based solution – Issue a query cycle by selecting a certain number of categories. – Obtain the empty/singleton/collision slots. – Estimate the overall number of tags according to the observed number of empty/singleton/collision slots. – Estimate the number of tags for each of the categories according to the sampling in the singleton slots.
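
The four steps above can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the authors' exact estimator: the overall tag count is estimated from the number of empty slots (the well-known zero estimator for framed slotted ALOHA), each category's share is taken as its fraction of the singleton slots, and the reader is assumed to learn the category of a tag that replies alone in a slot. All function names are hypothetical.

```python
import random
from collections import Counter
from math import log


def run_frame(tags, frame_size, rng):
    """Simulate one query cycle: every tag picks one slot uniformly at random.

    tags is a list of category labels, one entry per tag.
    """
    slots = [[] for _ in range(frame_size)]
    for category in tags:
        slots[rng.randrange(frame_size)].append(category)
    return slots


def estimate_from_frame(slots):
    """Estimate the overall tag count and the per-category tag counts."""
    f = len(slots)
    empty = sum(1 for s in slots if not s)
    singles = [s[0] for s in slots if len(s) == 1]
    if empty == 0 or not singles:
        raise ValueError("frame too small for this estimator")
    # Zero estimator: E[empty slots] = f * (1 - 1/f)^n, solved for n.
    n_hat = log(empty / f) / log(1 - 1 / f)
    # Split the overall estimate by each category's share of singleton slots.
    share = Counter(singles)
    per_category = {c: n_hat * k / len(singles) for c, k in share.items()}
    return n_hat, per_category
```

A single frame gives a noisy estimate; averaging over several query cycles is what tightens it.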

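[Figure: ensemble sampling example. The RFID reader selects categories C1, C3, and C4; the resulting frame contains empty (0), singleton (1), and collision (X) slots, and the singleton slots are sampled by category. Estimated overall tag size: 14~17 tags. Estimated tag size for each category: C1: 3~5 tags; C3: 4~6 tags; C4: 5~7 tags.]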

Ensemble sampling-based solution
Ensemble sampling is preferable to separate counting in terms of reading performance.
– More tags are involved in one query cycle, so more slots amortize the cost of the inter-cycle overhead, the Select command, and the fixed initial frame size.
– The overall scanning time can be greatly reduced.
– For the iceberg query and the top-k query, ensemble sampling allows further optimization to filter out unqualified categories and estimate the threshold.

Ensemble sampling-based solution Accuracy Analysis – The variance of the estimator – Reduce the variance through repeated tests
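
The effect of repeated tests can be illustrated with a short calculation. The sketch assumes the per-cycle estimates are independent, unbiased, and approximately normal, so that averaging l of them divides the variance by l; `sigma` (the per-cycle standard deviation), `eps`, and `beta` are assumed inputs, and `cycles_needed` is a hypothetical helper rather than a formula taken from the slides.

```python
from math import ceil
from statistics import NormalDist


def cycles_needed(sigma, n, eps, beta):
    """Smallest number of cycles l such that z * sigma / sqrt(l) <= eps * n.

    z is the two-sided normal quantile for confidence 1 - beta.
    """
    z = NormalDist().inv_cdf(1 - beta / 2)
    return max(1, ceil((z * sigma / (eps * n)) ** 2))
```

For instance, a category of 100 tags whose single-cycle estimate has a standard deviation of 20 needs many more cycles than one whose standard deviation is already below the target.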

Property of the Ensemble Sampling
– During ensemble sampling, the major categories occupy most of the singleton slots, so the minor categories cannot obtain enough samples in the singleton slots for an accurate estimate of their tag size.
– To achieve the accuracy requirement for all categories, the scanning time depends on the category with the smallest tag size, because the other categories must stay in the query cycle until this category achieves the accuracy requirement.
– Solution: break the overall set of tags into several groups for separate ensemble sampling.

Ensemble sampling-based solution
The optimal granularity for the group size in ensemble sampling
– Each cycle of ensemble sampling should be applied over an appropriate group: the variance of the tag sizes among the involved categories cannot be too large, so that all categories in the same group achieve the accuracy requirement with very close finishing times.
– Solution: dynamic programming.

Dynamic Programming-based Solution (Example)
A set of tags with 10 categories, ranked in non-increasing order of the estimated tag size:
Category: C1, C2, C3, C4, C5, C6, C7, C8, C9, C10
Rough tag size: 100, 80, 75, 41, 35, 30, 20, 15, 12, 8
Dynamic programming: we define T(i, j) as the minimum scanning time over the categories from Ci to Cj among the various grouping strategies. T(i, j) is obtained by enumerating each possible combination of T(i, k) and T(k+1, j), where i ≤ k < j, and then taking the minimum of T(i, k) + T(k+1, j).
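
The recurrence can be sketched in a few lines. The cost model below is purely hypothetical and only serves to make the example runnable: a group is assumed to need enough cycles for its smallest category to collect `SAMPLES_NEEDED` samples, and each cycle is assumed to cost a fixed overhead plus one slot per tag. The sketch also compares each split against scanning categories i..j as one group, which the slide's wording leaves implicit.

```python
from functools import lru_cache
from math import ceil

SAMPLES_NEEDED = 200   # hypothetical: samples the smallest category must collect
CYCLE_OVERHEAD = 100   # hypothetical: fixed per-cycle cost, in slot units


def group_time(sizes):
    """Hypothetical scanning time for one ensemble-sampling group."""
    cycles = ceil(SAMPLES_NEEDED / min(sizes))
    return cycles * (CYCLE_OVERHEAD + sum(sizes))


def best_grouping(sizes):
    """Return (minimum time, groups) for categories sorted by tag size."""
    sizes = tuple(sizes)

    @lru_cache(maxsize=None)
    def T(i, j):
        # Option 1: scan categories i..j together as a single group.
        best = (group_time(sizes[i:j + 1]), (sizes[i:j + 1],))
        # Option 2: split at k and combine T(i, k) with T(k+1, j).
        for k in range(i, j):
            left_time, left_groups = T(i, k)
            right_time, right_groups = T(k + 1, j)
            if left_time + right_time < best[0]:
                best = (left_time + right_time, left_groups + right_groups)
        return best

    return T(0, len(sizes) - 1)


if __name__ == "__main__":
    rough_sizes = [100, 80, 75, 41, 35, 30, 20, 15, 12, 8]
    total, groups = best_grouping(rough_sizes)
    print(total, groups)
```

Because the categories are sorted by size, only contiguous groups are considered, which keeps the number of subproblems quadratic in the number of categories.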

Dynamic Programming-based Solution
– Run an initial query round to roughly estimate the tag sizes.
– Use dynamic programming to break the tags into smaller groups for ensemble sampling.
– Apply ensemble sampling over each group for multiple cycles until the accuracy requirement is met.

Objective
– The RFID system needs to collect the histogram for those categories with tag size over a specified threshold in a time-efficient way.
Accuracy constraint
Population constraint
Express the population constraint in an equivalent form

Key issues in the Iceberg Query Problem
– Quickly determine the qualified and unqualified categories according to the threshold.
Ensemble sampling-based solution
– As the number of repeated tests increases, the averaged variance for each category decreases, so the confidence interval for each category shrinks.
– After a certain number of query cycles, every category can be determined as qualified or unqualified for the population constraint.

Important clues to optimize the solution
– Unqualified categories: when the estimated value is far below the threshold, the variance tolerated by the population constraint is much larger than that required by the accuracy constraint. Such categories can be quickly identified as unqualified and wiped out immediately from ensemble sampling.
– Long tail: categories that each occupy a rather small percentage but together occupy a substantial proportion. Rely on the following theorem to quickly wipe out the categories in the long tail.

Ensemble sampling-based solution for Iceberg Query
Each category is in one of three states:
– Qualified categories
– Unqualified categories
– Undetermined categories
Steps: apply ensemble sampling; wipe out unqualified categories quickly; wipe out unqualified categories in the "long tail".
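
The three-way classification can be sketched as a confidence-interval test against the threshold. This is an illustrative assumption rather than the rule from the slides: the averaged estimate is treated as approximately normal with standard deviation `sigma`, and a category is decided only when its one-sided bound at confidence 1 - `beta` clears the threshold. The function name is hypothetical.

```python
from statistics import NormalDist


def classify(n_hat, sigma, threshold, beta):
    """Classify one category against the iceberg threshold.

    n_hat is the averaged estimate, sigma its standard deviation,
    beta the allowed error probability.
    """
    z = NormalDist().inv_cdf(1 - beta)
    if n_hat - z * sigma >= threshold:
        return "qualified"      # lower bound already above the threshold
    if n_hat + z * sigma < threshold:
        return "unqualified"    # upper bound still below the threshold
    return "undetermined"       # keep it in the next ensemble-sampling cycle
```

As more query cycles are averaged, `sigma` shrinks and undetermined categories move into one of the other two states; categories far from the threshold are decided after very few cycles.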

Objective
– The RFID system needs to collect the histogram for those categories in the top-k list in a time-efficient way.
Accuracy constraint
Population constraint
Neither Basic Tag Identification nor Separate Counting is suitable.

Important clues to optimize the solution
– As the exact value of the tag size $n_i$ is unknown, in order to define $\Pr[C_i \in \text{top-}k \text{ list}]$, it is essential to determine a threshold $t$ so that $\Pr[C_i \in \text{top-}k \text{ list}] = \Pr[n_i \ge t]$.
– Ideally, $t$ should be the tag size of the $k$th largest category; however, it is difficult to compute an exact value of $t$ in the estimation scheme.
– If the threshold $t$ can be accurately estimated, then the top-$k$ query problem is reduced to the iceberg query problem.
The key problem is to quickly determine the value of the threshold $t$ while satisfying the constraint.

Ensemble sampling-based solution for Top-k Query
– Use ensemble sampling to estimate the threshold t, so that the estimate rapidly converges to t.
– Apply the Iceberg Query method.
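
The reduction from a top-k query to an iceberg query can be sketched as follows. This is a simplified illustration, not the presented convergence procedure: the threshold is taken directly as the k-th largest current estimate, and the function names and the `estimates` dictionary are hypothetical.

```python
def estimate_threshold(estimates, k):
    """Return the k-th largest estimated tag size as the threshold t."""
    ranked = sorted(estimates.values(), reverse=True)
    if not 1 <= k <= len(ranked):
        raise ValueError("k must be between 1 and the number of categories")
    return ranked[k - 1]


def top_k_candidates(estimates, k):
    """Return (t, categories whose estimate is at least t)."""
    t = estimate_threshold(estimates, k)
    return t, [c for c, n_hat in estimates.items() if n_hat >= t]
```

In practice the estimates carry variance, so the threshold would be re-estimated after each round of ensemble sampling and the categories near it re-checked with the iceberg query test.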

The Performance in Basic Histogram Collection
– BI: Basic Identification
– SC: Separate Counting
– ES: Ensemble sampling with one group
– ES-g: Ensemble sampling with optimized grouping
Experiments: varying the number of categories and the tag size; varying the accuracy requirement.

The Performance in Basic Histogram Collection Varying the inter-cycle overhead. Varying the overall number of categories.

The Performance in Advanced Histogram Collection Varying the value of the threshold ratio in the iceberg query. Varying the standard deviation of the tag size across categories.

The Performance in Advanced Histogram Collection Varying the values of k in top-k query. Evaluate the convergence for estimating the threshold in top-k query.

We propose a series of protocols to tackle the problem of efficient histogram collection:
– Basic histogram collection
– Iceberg Query
– Top-k Query
To the best of our knowledge, we are the first to consider collecting histograms over RFID tags, a fundamental premise for queries and analysis in RFID applications.
We propose a novel ensemble sampling-based method to simultaneously estimate the tag size for a number of categories.
Our solutions are completely compatible with current industry standards and do not require any modification to tags.

Thanks for your attention! Q & A