Published by Martin Fletcher. Modified over 9 years ago.
1
10/5/2015
Geometric Approach
Geometric interpretation:
Each node holds a statistics vector.
Coloring the vector space: grey where function > threshold, white where function <= threshold.
Goal: determine the color of the global data vector (the average).
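A minimal Python sketch of the coloring rule above. The function `f` (here simply the sum of the coordinates), the sample vectors, and the threshold are made-up illustrations, not values from the slides.

```python
def global_average(local_vectors):
    """Component-wise average of the nodes' local statistics vectors."""
    n = len(local_vectors)
    return [sum(component) / n for component in zip(*local_vectors)]


def color(f, vector, threshold):
    """Grey where f > threshold, white where f <= threshold."""
    return "grey" if f(vector) > threshold else "white"


# Hypothetical example: three nodes, f is the sum of the coordinates.
local_vectors = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]
average = global_average(local_vectors)
print(average, color(sum, average, threshold=6.0))
```

Computing `average` directly requires every node to ship its vector to one place, which is exactly the communication the geometric approach tries to avoid.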
2
Bounding the Convex Hull
Observation: the average is in the convex hull.
If the convex hull is monochromatic, then so is the average.
But the convex hull may become large.
3
Drift Vectors
Periodically calculate an estimate vector: the current global average.
Each node maintains a drift vector: the change in the local statistics vector since the last time the estimate vector was calculated.
The global average statistics vector is also the average of the drift vectors.
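The averaging property can be checked numerically. The sketch below assumes that each drift vector is the estimate vector plus the node's local change since the last synchronization (this offset by the estimate is an assumption made here so that the averages line up); all names and numbers are hypothetical.

```python
def mean(vectors):
    """Component-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def drift_vectors(estimate, at_sync, current):
    """Assumed definition: drift_i = estimate + (current_i - at_sync_i)."""
    drifts = []
    for old, new in zip(at_sync, current):
        drifts.append([e + (n - o) for e, o, n in zip(estimate, old, new)])
    return drifts


# Local statistics vectors at the last synchronization and now (made up).
at_sync = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]
current = [[2.0, 2.0], [3.0, 1.0], [1.0, 6.0]]

estimate = mean(at_sync)  # estimate vector from the last synchronization
drifts = drift_vectors(estimate, at_sync, current)

# The average of the drift vectors matches the current global average.
print(mean(drifts), mean(current))
```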
4
The Bounding Theorem [SIGMOD’06]
A reference point is known to all nodes.
Each vertex constructs a sphere.
Theorem: the convex hull is bounded by the union of the spheres.
Local constraints!
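A small numerical illustration of the theorem, not a proof. It assumes that each sphere has the segment from the reference point to a node's drift vector as its diameter, and it samples random points of the convex hull to confirm that each one falls inside at least one sphere. The reference point and drift vectors are made up.

```python
import math
import random


def ball(reference, drift):
    """Sphere whose diameter is the segment from reference to drift."""
    center = [(a + b) / 2 for a, b in zip(reference, drift)]
    radius = math.dist(reference, drift) / 2
    return center, radius


def in_some_ball(point, balls, tol=1e-9):
    """True if the point lies inside at least one of the spheres."""
    return any(math.dist(point, c) <= r + tol for c, r in balls)


def random_convex_combination(points, rng):
    """Random point in the convex hull of the given points."""
    weights = [rng.random() for _ in points]
    total = sum(weights)
    dim = len(points[0])
    return [sum(w * p[k] for w, p in zip(weights, points)) / total
            for k in range(dim)]


reference = [2.0, 2.0]
drifts = [[3.0, 2.0], [2.0, 3.0], [1.0, 4.0]]
balls = [ball(reference, u) for u in drifts]

rng = random.Random(0)
samples = [random_convex_combination([reference] + drifts, rng)
           for _ in range(1000)]
covered = all(in_some_ball(p, balls) for p in samples)
print(covered)
```

Each node can build its own sphere from the shared reference point and its own drift vector alone, which is what makes the constraint local.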
5
Basic Algorithm
An initial estimate vector is calculated.
Nodes check the color of their drift spheres.
The drift vector defines the diameter of the drift sphere.
If any sphere is not monochromatic, the node triggers a re-calculation of the estimate vector.
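The local check can be sketched as follows. To keep the monochromaticity test simple, this example assumes a linear monitored function f(x) = w·x, whose minimum and maximum over a sphere have a closed form; other functions need their own test. The sphere is assumed to have the segment from the estimate vector to the drift vector as its diameter, and all names and values are hypothetical.

```python
import math


def linear_range_on_ball(w, center, radius):
    """Min and max of f(x) = w.x over a ball: w.c -/+ radius * ||w||."""
    mid = sum(wi * ci for wi, ci in zip(w, center))
    spread = radius * math.sqrt(sum(wi * wi for wi in w))
    return mid - spread, mid + spread


def sphere_is_monochromatic(w, threshold, estimate, drift):
    """True if the whole drift sphere is white (<= T) or grey (> T)."""
    center = [(e + u) / 2 for e, u in zip(estimate, drift)]
    radius = math.dist(estimate, drift) / 2
    f_min, f_max = linear_range_on_ball(w, center, radius)
    return f_max <= threshold or f_min > threshold


def nodes_needing_sync(w, threshold, estimate, drifts):
    """Indices of nodes that would trigger re-calculation of the estimate."""
    return [i for i, u in enumerate(drifts)
            if not sphere_is_monochromatic(w, threshold, estimate, u)]


w = [1.0, 1.0]
estimate = [2.0, 2.0]
drifts = [[3.0, 2.0], [2.0, 3.0], [1.0, 4.0]]

print(nodes_needing_sync(w, 6.5, estimate, drifts))  # no node triggers
print(nodes_needing_sync(w, 6.0, estimate, drifts))  # the third node triggers
```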
6
Reuters Corpus (RCV1-v2)
800,000+ news stories
Aug 20, 1996 to Aug 19, 1997
Corporate/Industrial tagging
n = 10 nodes, random data distribution
7
Trade-off: Accuracy vs. Performance
Inefficiency arises when the value of the function on the average is close to the threshold.
Performance can be enhanced at the cost of a less accurate result:
Set an error margin around the threshold value.
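One possible reading of the error margin, sketched in Python: the threshold used in the local checks is shifted away from the side the estimate currently lies on, so a violation is only raised once the function could have crossed the threshold by more than the margin. This interpretation, the function names, and the numbers are assumptions for illustration.

```python
def is_monochromatic(f_min, f_max, threshold):
    """A region is monochromatic if f stays on one side of the threshold."""
    return f_max <= threshold or f_min > threshold


def effective_threshold(f_at_estimate, threshold, margin):
    """Shift the threshold away from the estimate's current side."""
    if f_at_estimate <= threshold:
        return threshold + margin
    return threshold - margin


# Hypothetical drift sphere on which f ranges over [2.9, 6.1].
f_min, f_max = 2.9, 6.1
threshold = 6.0
relaxed = effective_threshold(f_at_estimate=4.0, threshold=threshold, margin=0.5)

print(is_monochromatic(f_min, f_max, threshold))  # exact check: violation
print(is_monochromatic(f_min, f_max, relaxed))    # relaxed check: no violation
```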
8
Performance Analysis
9
Performance Analysis (cont.)
10
Balancing
Calculating the average globally is costly.
It is often possible to average only some of the data vectors.
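A rough sketch of how partial averaging might work, under assumptions not spelled out on the slide: when one node reports a violation, a coordinator adds the drift values of other nodes one at a time until the average of the collected group passes the monochromaticity check, and only falls back to all nodes if that never happens. Scalars stand in for vectors, and the check, the node order, and the numbers are hypothetical.

```python
def balance(estimate, drifts, violator, is_monochromatic):
    """Grow a group around the violating node until its average is safe.

    Returns the group of node indices and their balanced (average) drift.
    If every node ends up in the group, this amounts to a full recalculation.
    """
    group = [violator]
    others = [i for i in range(len(drifts)) if i != violator]
    while True:
        balanced = sum(drifts[i] for i in group) / len(group)
        if is_monochromatic(estimate, balanced) or not others:
            return group, balanced
        group.append(others.pop(0))


THRESHOLD = 6.0


def interval_check(estimate, value):
    """1-D stand-in for the sphere test: both endpoints on the same side."""
    low, high = min(estimate, value), max(estimate, value)
    return high <= THRESHOLD or low > THRESHOLD


group, balanced = balance(4.0, [7.0, 3.0, 4.0, 5.0], 0, interval_check)
print(group, balanced)  # only two of the four nodes had to be contacted
```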
11
SRDC 2013
Shape Sensitivity [PODS’08]
Fitting the cover to the data
Fitting the cover to the threshold surface
Specific function classes
12
Fitting Cover to Data (using the covariance matrix)
13
Fitting Cover to Threshold Surface: Reference Vector Selection
14
Distance Fields
Skeleton, Medial Axis
15
Results: Shape Sensitivity
16
Prediction-Based Geometric Monitoring [SIGMOD’12]
[Figure: estimate e with drift vectors ΔV1 to ΔV5, and predicted estimate ep with drift vectors ΔVp1 to ΔVp5, shown against v(t), f(v(t)) and the threshold T]
Stricter local constraints if local predictions remain accurate.
Keeping up with v(t) movement.
17
Local Constraints
Let the nodes communicate only when “something happens”.
“Tell me only if your measurement is larger than 50!”
“Send me your current measurements!”
Safe Zones!
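The “larger than 50” rule amounts to a local filter at each node. A minimal sketch, with made-up readings:

```python
def readings_to_report(readings, limit=50):
    """Return (time step, value) pairs a node would send to the coordinator."""
    return [(t, x) for t, x in enumerate(readings) if x > limit]


# Hypothetical stream of local measurements at one node.
readings = [43, 47, 51, 49, 58, 50]
reports = readings_to_report(readings)
print(reports)                                   # [(2, 51), (4, 58)]
print(len(reports), "of", len(readings), "readings sent")
```

Only two of the six readings leave the node; the rest are filtered locally.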
18
Local Distributions
584510 664420 435015 784317 853021 704711 762512 65585 564715 753416
Reasonable to assume future data will behave similarly.
These Safe Zones save more communication!
19
Optimal Safe Zones
1. Legal / Safe
2. Large: minimize communication
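The two requirements can be illustrated with a deliberately simplified setting that is assumed here, not taken from the slides: one-dimensional measurements, interval-shaped Safe Zones, and a global condition “average <= threshold”. Under these assumptions the zones are legal exactly when the average of their upper ends stays at or below the threshold, and “large” is approximated by the fraction of past readings each zone covers.

```python
def is_legal(zones, threshold):
    """Safe if even the worst case (every node at its upper end) is allowed."""
    worst_average = sum(high for _, high in zones) / len(zones)
    return worst_average <= threshold


def coverage(zones, histories):
    """Fraction of past readings that fall inside their node's Safe Zone."""
    inside = total = 0
    for (low, high), history in zip(zones, histories):
        inside += sum(1 for x in history if low <= x <= high)
        total += len(history)
    return inside / total


# Hypothetical past readings for two nodes.
histories = [[45, 55, 65], [30, 35, 45]]

print(is_legal([(0, 60), (0, 40)], threshold=50))   # True: worst average is 50
print(is_legal([(0, 70), (0, 40)], threshold=50))   # False: worst average is 55
print(coverage([(0, 60), (0, 40)], histories))      # 4 of 6 readings covered
```

Choosing zones then becomes a search for the legal assignment with the highest coverage, since every reading outside a zone costs a message.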
20
Example: Air Quality Monitoring
What are the optimal Safe Zones?
21
The Optimization Problem
Is this convex? Is this linear? How many constraints are there?
Bad news: this problem is NP-hard.
22
The Optimization Problem
Step 3: Use non-convex optimization toolboxes (e.g. Matlab’s “fmincon”). These toolboxes use sophisticated gradient-based algorithms and return close-to-optimal results.
23
Data Set
What the data looks like
24
Ratio Queries
Example of triangular Safe Zones
25
Improvement over the convex-hull cover method
Why do we improve so much?
Up to 200 nodes were involved in the experiment. The average improvement was by a factor of 17.5.
5,000 hours
26
Higher Dimensions
27
Chi-Square Monitoring (5D)
Examples of axis-aligned boxes as Safe Zones
28
Improvement over GM
The improvement over the Geometric Method (GM) becomes more substantial in higher dimensions.
1,000 hours, 90 nodes
29
Safe Zones: Example
30
Biclique: Non-Convex Safe Zones
Safe Zone algorithm (for 2 nodes): take the data points, build a bipartite graph (how?), and find the maximal biclique. These are your Safe Zones!
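A brute-force sketch of the two-node idea. The slide leaves the graph construction open, so this example assumes one plausible answer: connect a point of node A to a point of node B whenever the pair is jointly safe (here: their average does not exceed the threshold). It also assumes that “maximal” means the largest number of edges, |S|·|T|. The data and the exhaustive search are for illustration only and do not scale.

```python
from itertools import combinations


def safe_pair(a, b, threshold):
    """Assumed edge rule: the pair's average stays at or below the threshold."""
    return (a + b) / 2 <= threshold


def best_biclique(points_a, points_b, threshold):
    """Exhaustively find subsets S of A and T of B with every (s, t) safe,
    maximizing len(S) * len(T). Exponential in len(points_a)."""
    best = ((), ())
    for k in range(1, len(points_a) + 1):
        for subset in combinations(points_a, k):
            partners = tuple(
                b for b in points_b
                if all(safe_pair(a, b, threshold) for a in subset)
            )
            if len(subset) * len(partners) > len(best[0]) * len(best[1]):
                best = (subset, partners)
    return best


# Hypothetical 1-D readings observed at the two nodes.
zone_a, zone_b = best_biclique([10, 20, 70, 80], [15, 25, 30, 90], threshold=50)
print(zone_a, zone_b)  # (10, 20, 70) (15, 25, 30)
```

Every point in `zone_a` is compatible with every point in `zone_b`, so as long as each node's data stays inside its own set, neither node needs to communicate.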
31
Conclusions
Local filtering for large-scale distributed data systems.
Saving in communication is unlimited.
Bounded only by the aggregate over system lifetime.
Saving bandwidth, central resources, power.
Not necessary to sacrifice precision and latency.
Less communication, more privacy.