Published by Peter Sherman. Modified over 9 years ago.
1
PIRS: Query Verification on Data Streams
Ke Yi, Hong Kong University of Science and Technology
Feifei Li, Florida State University
Marios Hadjieleftheriou, AT&T Labs
George Kollios, Boston University
Divesh Srivastava, AT&T Labs
Work done while the 1st and 2nd authors were working at AT&T Labs.
2
Publishing Data and Outsourcing Query Service
[Figure: an IP traffic stream (0 1 1 0 0 1 …) from the network is fed to Gigascope, an analysis tool, which returns statistics and results.]
3
Revisiting the CISCO – AT&T Example
[Figure: the network's IP traffic stream (0 1 1 0 0 1 …) is fed to Gigascope, which returns statistics.]
Lawyers: sign the trust agreement.
Computer scientists: could we help?
4
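Concrete Example
Continuous Query: SELECT SUM(packet_size) FROM IP_trace GROUP BY srcIP, destIP
IP Stream: p_1, p_2, p_3, ..., p_m, where each p_i is a tuple (srcIP, destIP, packet_size)
Answer (running sum per group, reported over time):
Time | Group 1 | Group 2 | Group 3 | ... | Group n
5    | 10KB    | 2KB     | 150KB   | ... | 5KB
10   | 11KB    | 130KB   | 1MB     | ... | 20KB
13   | ...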
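A minimal Python sketch of what the query above computes exactly. The packet tuples and the name `group_by_sum` are hypothetical and only illustrate the semantics; this is the linear-memory exact computation, not the synopsis.

```python
from collections import defaultdict

def group_by_sum(stream):
    """Return SUM(packet_size) per (srcIP, destIP) group."""
    totals = defaultdict(int)
    for src_ip, dest_ip, packet_size in stream:
        totals[(src_ip, dest_ip)] += packet_size
    return dict(totals)

# Hypothetical packets: (srcIP, destIP, packet_size in bytes).
packets = [
    ("10.0.0.1", "10.0.0.2", 1500),
    ("10.0.0.1", "10.0.0.3", 40),
    ("10.0.0.1", "10.0.0.2", 500),
]
answer = group_by_sum(packets)
print(answer[("10.0.0.1", "10.0.0.2")])  # 2000
```

The dictionary grows with the number of distinct groups, which is the cost a client would like to avoid.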
5
Continuous Query Verification (CQV) on Data Streams
Both the client and the server monitor the same stream from the source of streams.
1. The client registers a query: SELECT SUM(packet_size) FROM IP_Trace GROUP BY src_ip, dest_ip
2. The server reports the answer upon request.
The server maintains the exact answer (Group 1, Group 2, Group 3, …).
The client maintains a synopsis X.
6
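The Model for the Stream
The stream S is a sequence of tuples of the form agg_attribute | group_id, for example 9|1 at T=1, 7|i at T=2, 1|1 at T=3, …
[Figure: the vector V^T = (V_1, V_2, V_3, …, V_i, …, V_n) starts at all zeros and is updated by each tuple: V_1 becomes 9, then V_i becomes 7, then V_1 becomes 10.]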
7
Continuous Query Verification: CQV
[Figure: as tuples 9|1, 7|i, 1|1, … arrive at T=1, T=2, T=3, the vector V^T is updated and the client updates its synopsis X^T. A reported vector that differs from V raises an alarm; a reported vector equal to V raises no alarm.]
8
PIRS: Polynomial Identity Random Synopsis
Choose a prime p.
Choose a random number a.
Raise an alarm if the client's synopsis X(V) and the synopsis X(W) of the server's answer W are not equal; otherwise, no alarm.
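A minimal sketch, assuming the synopsis has the product form X(V) = (a − 1)^{v_1} (a − 2)^{v_2} ⋯ (a − n)^{v_n} mod p, which is consistent with the identity X(v1+v2)=X(v1)*X(v2) and the "p > m/δ choices for a" argument used elsewhere in the talk. The function names, the toy vectors, and the choice p = 1009 are illustrative assumptions.

```python
import secrets

def pirs(v, a, p):
    """Assumed synopsis: product over groups i = 1..n of (a - i)^v[i] mod p."""
    x = 1
    for i, v_i in enumerate(v, start=1):
        x = x * pow((a - i) % p, v_i, p) % p
    return x

def raises_alarm(x_client, w, a, p):
    """Alarm iff the synopsis of the server's claimed vector w differs."""
    return pirs(w, a, p) != x_client

p = 1009                  # toy prime; it should exceed max(m/delta, n)
a = secrets.randbelow(p)  # secret random seed, known only to the client
v = [3, 0, 2, 1]          # true per-group aggregates (entries sum to m = 6)
w = [3, 1, 1, 1]          # a wrong claim with the same total

x_v = pirs(v, a, p)
print(raises_alarm(x_v, v, a, p))  # False: a correct answer never alarms

# A wrong answer escapes only for seeds that are roots of X(V) - X(W).
# For this toy pair the roots are a = 1, 3, 4.
missed = sum(pirs(v, s, p) == pirs(w, s, p) for s in range(p))
print(missed)  # 3
```

Only 3 of the 1009 possible seeds let this wrong answer through, which is within the bound of m = 6 bad seeds.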
9
Incremental Update to PIRS
[Figure: stream S with tuple 9|1 at T=1 (update to v_1), tuple 7|i at T=2 (update to v_i), then tuple 1|1 (update to v_1).]
An update to group i with value u can be done in O(log u) time (exponentiation by squaring).
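The update can be sketched as a single modular multiplication, again assuming the product form X(V) = ∏ (a − i)^{v_i} mod p. The class name `PIRS`, its methods, and the fixed seed are hypothetical; Python's three-argument `pow` performs the modular exponentiation by repeated squaring.

```python
import secrets

class PIRS:
    """Streaming synopsis, assuming X(V) = prod_i (a - i)^{v_i} mod p."""

    def __init__(self, p, a=None):
        self.p = p
        # The seed must stay hidden from the server; random unless given.
        self.a = secrets.randbelow(p) if a is None else a
        self.x = 1  # synopsis of the all-zero vector

    def update(self, u, group_id):
        """Process tuple u|group_id, i.e. v[group_id] += u."""
        factor = pow((self.a - group_id) % self.p, u, self.p)
        self.x = self.x * factor % self.p

    def digest(self, w):
        """Synopsis of a claimed answer vector w (groups numbered from 1)."""
        x = 1
        for i, w_i in enumerate(w, start=1):
            x = x * pow((self.a - i) % self.p, w_i, self.p) % self.p
        return x

    def alarm(self, w):
        return self.digest(w) != self.x

# Stream from the slide: 9|1, 7|i, 1|1 (taking i = 3 for concreteness).
syn = PIRS(p=1009, a=500)  # fixed seed only to make the demo repeatable
for u, g in [(9, 1), (7, 3), (1, 1)]:
    syn.update(u, g)

print(syn.alarm([10, 0, 7]))  # False: matches the true vector
print(syn.alarm([10, 2, 5]))  # True for this seed
```

The client stores only `p`, `a`, and `x`, regardless of how many groups the stream touches.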
10
It Solves the CQV Problem!
Theorem: Given any W ≠ V, PIRS raises an alarm with probability at least 1−δ.
By the fundamental theorem of algebra, a polynomial with 1 as the leading coefficient is completely determined by its zeroes.
Hence X(V)=X(W) happens at no more than m values of x.
Since we have p > m/δ choices for a, the probability that X(V)=X(W) is at most δ.
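The argument can be written out as follows. This is a sketch that assumes the product form of the synopsis, a claimed answer W ≠ V, n < p, and m as an upper bound on the sum of the entries of V and of W; these symbols are assumptions wherever the slide leaves them implicit.

```latex
\begin{align*}
f_V(x) &= \prod_{i=1}^{n} (x-i)^{v_i}, \qquad
f_W(x) = \prod_{i=1}^{n} (x-i)^{w_i}, \qquad
X(V) = f_V(a) \bmod p, \\
V \neq W &\;\Longrightarrow\; f_V \not\equiv f_W \ \text{over } \mathbb{Z}_p
  \quad \text{(both monic, with different multisets of zeroes)}, \\
\deg(f_V - f_W) \le m &\;\Longrightarrow\;
  \#\{\, x \in \mathbb{Z}_p : f_V(x) = f_W(x) \,\} \le m, \\
\Pr_{a}\bigl[\, X(V) = X(W) \,\bigr] &\le \frac{m}{p} < \delta
  \quad \text{because } p > m/\delta .
\end{align*}
```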
11
Optimality of PIRS
Theorem: PIRS occupies O(log(m/δ) + log n) bits of space (at most 3 words, i.e., p, a, X(V)), and spends O(1) time to process a tuple for a count query, or O(log u) time to process a tuple for a sum query.
Theorem: Any synopsis for solving the CQV problem with error probability at most δ has to keep Ω(log(min{n,m}/δ)) bits.
12
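Multiple Queries
[Figure: instead of separate synopses X_1 and X_2 for queries Q_1 and Q_2, a single synopsis X covers both by combining the vectors V_{1..n1} and V_{1..n2} into V_{1..(n1+n2)}; a tuple such as 9|1,8 updates v_1 and v_8.]
Theorem: Our synopses use constant space for multiple queries.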
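A sketch of one way to share a single synopsis across two queries, assuming the product form X(V) = ∏ (a − i)^{v_i} mod p and assuming group g of the second query is stored at position n1 + g of the combined vector. The helper names, the offsets, and the group counts n1 = 7 and n2 = 5 are hypothetical.

```python
def pirs(v, a, p):
    """Assumed synopsis: product over i = 1..n of (a - i)^v[i] mod p."""
    x = 1
    for i, v_i in enumerate(v, start=1):
        x = x * pow((a - i) % p, v_i, p) % p
    return x

def update_shared(x, a, p, u, groups, offsets):
    """Apply one tuple to every query; groups[q] is its group id in query q."""
    for g, off in zip(groups, offsets):
        x = x * pow((a - (off + g)) % p, u, p) % p
    return x

p, a = 1009, 500   # toy prime and fixed seed, for repeatability only
n1, n2 = 7, 5      # hypothetical group counts of Q1 and Q2
offsets = [0, n1]  # Q2's groups start after Q1's groups

# A tuple with value 9 that falls in group 1 of Q1 and group 1 of Q2,
# i.e. positions 1 and 8 of the combined vector.
x = update_shared(1, a, p, 9, groups=[1, 1], offsets=offsets)

combined = [0] * (n1 + n2)
combined[0] = 9    # v_1
combined[7] = 9    # v_8
print(x == pirs(combined, a, p))  # True
```

The prime must exceed the combined number of groups n1 + n2 so that all positions stay distinct modulo p.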
13
Handle the Load Shedding
Semantic load shedding: drop tuples from certain groups. A small number of groups have errors.
Random load shedding: all groups have a small amount of error.
14
CQV with Semantic Load Shedding
Randomly drop certain tuples according to their groups (stream: 9|1, 7|i, 2|j, 1|1, 4|k, …, 5|1).
The server claims that at most γ groups have errors.
Goal: detect whether more than γ groups have errors.
We have designed synopses that use O(γ log(1/δ) log n) bits of space and achieve error probability at most δ.
15
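PIRS^γ: An Exact Solution
[Figure: each layer has k buckets, each holding a PIRS synopsis; a group is hashed to one bucket, e.g. v_8 goes to bucket b(8)=2. A layer raises an alarm if at least … buckets raise alarms. There are log 1/δ layers, and the synopsis raises an alarm if at least one layer raises an alarm.]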
16
PIRS^γ: An Exact Solution
Theorem: PIRS^γ requires O(γ² log(1/δ) log n) bits, spends O(log(1/δ)) time to process a tuple, and solves CQV with semantic load shedding.
17
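Intuition on Approximation
[Figure: probability to raise an alarm plotted against the number of errors. The ideal synopsis switches from no alarm to alarm exactly at γ; the approximation makes the transition between γ⁻ and γ⁺.]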
18
PIRS^±γ: An Approximate Solution
Theorem: PIRS^±γ requires O(γ log(1/δ) log n) bits and spends O(γ log(1/δ)) time to process a tuple.
19
CQV with Random Load Shedding
Tuples are dropped at random, so all groups have small errors.
Goal: detect whether any group has an error greater than a claimed threshold.
Theorem: Any synopsis that solves this problem with error probability at most δ requires Ω(n) bits (by reduction to the problem of estimating the infinite frequency moment, i.e., the number of occurrences of the most frequent item).
20
Sliding Window and Other Queries
It is easy to extend PIRS to work with the sliding-window model since it is decomposable, i.e., X(v1+v2)=X(v1)*X(v2).
Other queries can be transformed into group-by aggregation queries.
Details are in the paper.
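The decomposability identity can be checked numerically. The sketch below assumes the product form X(V) = ∏ (a − i)^{v_i} mod p; the vectors, prime, and seed are toy values chosen for illustration.

```python
def pirs(v, a, p):
    """Assumed synopsis: product over i = 1..n of (a - i)^v[i] mod p."""
    x = 1
    for i, v_i in enumerate(v, start=1):
        x = x * pow((a - i) % p, v_i, p) % p
    return x

p, a = 1009, 500   # toy prime and fixed seed, for repeatability only
v1 = [9, 0, 7]     # aggregates from one part of the stream
v2 = [1, 4, 0]     # aggregates from another part
total = [x + y for x, y in zip(v1, v2)]

lhs = pirs(total, a, p)
rhs = pirs(v1, a, p) * pirs(v2, a, p) % p
print(lhs == rhs)  # True: X(v1 + v2) = X(v1) * X(v2) mod p
```

Because exponents add when the vectors add, synopses of separate stream segments can be combined by multiplication without revisiting the tuples.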
21
Some Experiments
We use real streams:
World Cup Data (WC)
IP traces from the AT&T network (IP)
We perform the following queries:
WC: aggregate on response size and group by client id/object id (50M groups)
IP: aggregate on packet size and group by source IP/destination IP (7M groups)
Hardware for the client: Linux machine with a 2.8GHz Intel Pentium 4 CPU and 512 MB memory
22
Detection Accuracy
Over 100,000 random attacks, PIRS identifies all of them.
23
Memory Usage of Exact
PIRS uses only a constant 3 words (27 bytes) at all times.
Exact's memory usage is linear and expensive.
24
Update Time (per tuple) of Exact
1. Exact is fast when memory usage is small.
2. It becomes extremely slow due to cache misses and memory swap operations.
25
Running Time Analysis
Average update time:
      | WC      | IPs
Count | 0.98 μs (single value given)
Sum   | 8.01 μs | 6.69 μs
IPs exhibits a smaller update cost for the sum query because its average value of u is smaller than that of WC.
26
Multiple Queries: Exact Memory Usage
PIRS always uses only a constant 3 words (27 bytes).
Exact's memory usage is linear w.r.t. the number of queries and increases over time.
27
Multiple Queries: Exact Update Time Per Tuple
28
Multiple Queries: PIRS Update Time Per Tuple
29
The Library
Download PIRS and other synopses at: http://www.cs.fsu.edu/~lifeifei/pirs/
30
Conclusion
A space- and update-efficient synopsis for verifying continuous group-by aggregation queries on streaming data.
It could be generalized to handle selection queries and sliding-window semantics.
How about more complicated queries?
31
Thanks!
Questions?
32
Problem and Goals
Assumption: the client and the DSMS observe the same stream.
Problem: the client needs to verify the results.
Goals:
Be memory and update efficient
Tolerate a limited number of errors
Tolerate small errors
Support multiple queries
33
Related Techniques to PIRS
Incremental cryptography: block operations (insert, delete); cannot support arithmetic operations.
Program verification: the server may pass the program execution but simply return random outputs.
Fingerprinting technique: PIRS is a fingerprinting technique.
34
CQV with Semantic Load Shedding
35
PIRS^±γ: An Approximate Solution
Theorem: PIRS^±γ
1. raises no alarm with probability at least 1−δ on any …
2. raises an alarm with probability at least 1−δ on any …
for any c > −ln ln 2 ≈ 0.367.
The proof uses the intuition of the coupon collector problem and the Chernoff bound.
36
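PIRS^±γ: An Approximate Solution
[Figure: each layer has k buckets, each holding a PIRS synopsis; group v_i is hashed to a bucket, e.g. b_i = 2. A layer raises an alarm if all k buckets raise alarms. There are log 1/δ layers, and the synopsis raises an alarm if a majority of the layers raise alarms.]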
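A rough sketch of the layered structure described above: k buckets per layer, a layer alarms only if all of its buckets alarm, and the synopsis alarms on a majority of layers. It assumes each bucket is a product-form PIRS with its own seed. The class names, the stored random table standing in for the hash b, and all parameter values are hypothetical; how k and the number of layers should be set from γ and δ is not taken from the slides.

```python
import random

class BucketLayer:
    """One layer: k independent PIRS buckets plus a group-to-bucket map."""

    def __init__(self, n, k, p, rng):
        self.p = p
        # Hypothetical stand-in for the hash b: a stored random table.
        # A real synopsis would use a compact hash function instead.
        self.bucket = [rng.randrange(k) for _ in range(n + 1)]
        self.seed = [rng.randrange(p) for _ in range(k)]
        self.x = [1] * k

    def _mul(self, xs, i, u):
        j = self.bucket[i]
        xs[j] = xs[j] * pow((self.seed[j] - i) % self.p, u, self.p) % self.p

    def update(self, u, i):
        self._mul(self.x, i, u)

    def alarm(self, w):
        claimed = [1] * len(self.x)
        for i, w_i in enumerate(w, start=1):
            self._mul(claimed, i, w_i)
        # The layer alarms only if every bucket disagrees.
        return all(c != x for c, x in zip(claimed, self.x))

class ApproxPIRS:
    def __init__(self, n, k, layers, p, rng=None):
        rng = rng or random.SystemRandom()
        self.layers = [BucketLayer(n, k, p, rng) for _ in range(layers)]

    def update(self, u, i):
        for layer in self.layers:
            layer.update(u, i)

    def alarm(self, w):
        votes = sum(layer.alarm(w) for layer in self.layers)
        return votes > len(self.layers) / 2  # majority of layers

true_v = [4, 0, 2, 7, 1, 3]
syn = ApproxPIRS(n=len(true_v), k=3, layers=5, p=2**61 - 1)
for i, v_i in enumerate(true_v, start=1):
    syn.update(v_i, i)

print(syn.alarm(true_v))              # False: no bucket differs
print(syn.alarm([4, 0, 2, 6, 1, 3]))  # False: one bad group hits one bucket
```

A single erroneous group can disturb only the one bucket it hashes to, so with k = 3 no layer can have all buckets in alarm; many erroneous groups are likely to hit every bucket, which is the coupon-collector intuition.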
37
Information Disclosure on Multiple Attacks
[Figure: the client computes the PIRS synopsis X(V) with a random seed r drawn from R; the server learns nothing about r.]
Insight: the server could potentially rule out a δ portion of the seeds after each failed attack it is notified of.
38
Information Disclosure on Multiple Attacks
Theorem: For a total of k attacks made by Bob against PIRS, the probability that none of them succeeds is at least 1−kδ.
39
Proof of the Optimality
40
Proof of the Optimality