Signal Processing and Networking for Big Data Applications
Lecture 10: Sublinear Algorithms
Zhu Han, University of Houston
Thanks to Professor Dan Wang for his slides
Outline
Motivations
Inequalities and classifications
Examples
Applications
Motivation for Sublinear-Time Algorithms
Massive datasets: world-wide web, online social networks, genome project, sales logs, census data, high-resolution images, scientific measurements
Long access time: communication bottleneck (slow connection), implicit data (an experiment per data point)
What Can We Hope For?
What can an algorithm compute if it reads only a sublinear portion of the data, or runs in sublinear time?
Some problems have exact deterministic solutions, but for most interesting problems algorithms must be approximate and randomized
Quality of approximation
Resources: number of queries, running time
Types of Approximation
Classical approximation: need to compute a value; the output should be close to the desired value. This is a classical notion; a familiar example is approximating the average or median by sampling.
Property testing: need to answer YES or NO. Intuition: only require correct answers on two sets of instances that are very different from each other.
Why is it useful?
Algorithms for big data used by big companies (ultra-fast randomized algorithms for approximate decision making)
Networking applications (counting and detecting patterns in small space)
Distributed computations (small sketches to reduce communication overhead)
Aggregate Knowledge: a startup doing streaming algorithms, acquired for $150M
Today: applications to soccer
Puzzles
5 1 8 11 9 7 6 3 4 2
Which number was missing?
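The classic one-pass answer (a sketch, assuming the puzzle is: the numbers 1..n arrive once each with exactly one missing, and we may use only O(1) extra space) compares the running sum against n(n+1)/2:

```python
def find_missing(stream, n):
    """Find the one missing number from a stream containing each of
    1..n except one, in a single pass with O(1) extra space."""
    expected = n * (n + 1) // 2   # sum of 1..n
    return expected - sum(stream)

# The slide's sequence, with n = 11:
print(find_missing([5, 1, 8, 11, 9, 7, 6, 3, 4, 2], 11))  # -> 10
```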
Puzzle #1
Puzzle #2 (Google interview question)
Answers to the puzzles
Each sample is kept with uniform probability, even when its index i exceeds the sample size s
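The "uniform probability even when i > s" remark matches reservoir sampling; here is a minimal sketch (the function name and interface are illustrative, not from the slides):

```python
import random

def reservoir_sample(stream, s):
    """Keep a uniform random sample of size s from a stream of unknown
    length, using O(s) memory: item i (1-indexed) replaces a random
    reservoir slot with probability s/i, which leaves every item
    equally likely to be in the final sample."""
    reservoir = []
    for i, item in enumerate(stream, start=1):
        if i <= s:
            reservoir.append(item)      # fill the reservoir first
        else:
            j = random.randrange(i)     # uniform in [0, i)
            if j < s:                   # happens with probability s/i
                reservoir[j] = item
    return reservoir
```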
Outline
Motivations
Inequalities and classifications
Examples
Applications
Inequalities
Markov's inequality
Chebyshev's inequality
Chernoff bound
Markov’s Inequality
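The slide content is an image; for reference, Markov's inequality states that for a nonnegative random variable X and any a > 0, P(X >= a) <= E[X]/a. A quick numerical sanity check (illustrative, not from the slides), using Exponential(1) samples so that E[X] = 1:

```python
import random

# Markov's inequality: for nonnegative X and a > 0, P(X >= a) <= E[X]/a.
# Empirical check with X ~ Exponential(1), so E[X] = 1.
random.seed(1)
xs = [random.expovariate(1.0) for _ in range(100_000)]
mean = sum(xs) / len(xs)
for a in (2.0, 5.0, 10.0):
    tail = sum(x >= a for x in xs) / len(xs)   # empirical P(X >= a)
    print(f"a={a}: P(X>=a) ~= {tail:.5f}, bound E[X]/a ~= {mean/a:.5f}")
```

For the exponential the true tail e^(-a) is far below the Markov bound 1/a, which shows how loose Markov can be when only the mean is known.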
Markov Inequality: Example
Markov Inequality: Example
Markov Inequality: Example
Markov + Union Bound: Example
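As a hedged illustration of how the union bound combines with Markov's inequality (not necessarily the slide's specific example): for k nonnegative variables, each with mean mu, the probability that any of them reaches a is at most k*mu/a:

```python
import random

# Union bound: P(A1 or ... or Ak) <= sum_i P(Ai).  Combined with Markov,
# the chance that ANY of k nonnegative variables with mean mu exceeds a
# is at most k * mu / a.
random.seed(3)
k, a, trials = 5, 20.0, 50_000
bad = 0
for _ in range(trials):
    xs = [random.expovariate(1.0) for _ in range(k)]  # each has mean 1
    bad += any(x >= a for x in xs)
rate = bad / trials
bound = k * 1.0 / a   # k * E[X] / a = 0.25
print(f"empirical rate {rate:.5f} <= bound {bound}")
```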
Chernoff bound
Chernoff bound (corollary)
Chernoff: Example
Chernoff: Example
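A minimal numerical illustration of the Chernoff-Hoeffding bound in its common additive form (an assumption about which variant the slides use): for n i.i.d. indicator variables with bias p, P(|mean - p| >= eps) <= 2*exp(-2*n*eps^2):

```python
import math
import random

# Chernoff-Hoeffding, additive form, for n i.i.d. coin flips of bias p:
#   P(|empirical mean - p| >= eps) <= 2 * exp(-2 * n * eps**2)
def chernoff_failure_bound(n, eps):
    return 2 * math.exp(-2 * n * eps * eps)

random.seed(2)
n, p, eps, trials = 1000, 0.3, 0.05, 2000
bad = 0
for _ in range(trials):
    mean = sum(random.random() < p for _ in range(n)) / n
    bad += abs(mean - p) >= eps
print(f"empirical failure rate {bad/trials:.4f} "
      f"<= bound {chernoff_failure_bound(n, eps):.4f}")
```

Note the exponential decay in n: doubling the sample size squares the failure probability, which is what makes small samples so powerful.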
Sublinear Algorithms Classification
Outline
Motivations
Inequalities and classifications
Examples
Applications
A Housewife Example
Assume there is a group of people who can be classified into different categories; one category is housewives. We want to know the percentage of housewives in the group, but the group is too large to examine every person. A simple approach is to sample a subset of the people and count how many of them are housewives. The question then arises: how many samples are enough?
A Housewife Example
The required number of samples is not a function of the data size!
A Housewife Example
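The "not a function of data size" point can be sketched with the Hoeffding-based sample size (an illustrative calculation; the slides' exact constants may differ): roughly ln(2/δ)/(2ε²) samples suffice, regardless of the population size:

```python
import math
import random

def samples_needed(eps, delta):
    """Hoeffding-based sample size: the sampled fraction is within eps
    of the true fraction with probability >= 1 - delta.  Depends only
    on eps and delta, NOT on the population size."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))

def estimate_fraction(population, is_target, eps=0.05, delta=0.01):
    """Estimate the fraction of the population satisfying is_target
    by sampling with replacement."""
    n = samples_needed(eps, delta)
    picks = [random.choice(population) for _ in range(n)]
    return sum(is_target(x) for x in picks) / n

# The same sample size works for 10**5 or 10**9 people:
print(samples_needed(0.05, 0.01))  # -> 1060

random.seed(4)
population = [1] * 300 + [0] * 700   # toy data: 30% are "housewives"
print(estimate_fraction(population, lambda x: x == 1))
```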
A Two-Cat Problem: Deterministic Algorithm
A Two Cat Problem
A Two-Cat Problem
Drop at floors 1, 3, 6, 10, 15, 21, 28, ...
The total number of drops is on the order of the square root of n, and the gap between two consecutive first-cat drops is also on the order of the square root of n
When you have two pieces of a resource, split the work evenly between them
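The even-split strategy can be sketched as follows (an illustrative implementation; `find_breaking_floor` and its interface are not from the slides): the first cat jumps in steps of about √n, the second cat scans the last gap linearly, giving about 2√n drops in the worst case:

```python
import math

def find_breaking_floor(n, breaks):
    """Search floors 1..n for the lowest floor where a cat 'breaks',
    using only two cats.  Phase 1 jumps ~sqrt(n) floors at a time with
    cat 1; phase 2 scans the last gap linearly with cat 2.  Each phase
    costs about sqrt(n) drops: with two copies of a resource, split
    the work evenly.  Returns (breaking floor or None, drops used)."""
    step = max(1, math.isqrt(n))
    drops = 0
    lo = 0
    f = step
    # Phase 1: coarse jumps with the first cat.
    while f <= n:
        drops += 1
        if breaks(f):
            break
        lo = f
        f += step
    hi = min(f, n)
    # Phase 2: linear scan of floors (lo, hi] with the second cat.
    answer = None
    for g in range(lo + 1, hi + 1):
        drops += 1
        if breaks(g):
            answer = g
            break
    return answer, drops

print(find_breaking_floor(100, lambda f: f >= 57))  # -> (57, 13)
```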
Outline
Motivations
Inequalities and classifications
Examples
Applications
Pricing and Sublinear Algorithms: Motivation
Overall picture:
Pricing and Sublinear Algorithms
Objectives:
Design a differentiating user-services model for computing profit gain based on different types of users
Keep the services model efficient in a big data context, with performance guarantees
Underlying philosophy: classify users first, then use the corresponding typical user behavior, instead of the actual user usage, as the approximation and estimation
Advantages:
Able to perform prediction
Fast computation
Saves storage capacity
Pricing and Sublinear Algorithms: Pricing Model
Differentiating user service model, simplified to two types of users in total (L = 2):
user type indicator; load-profiling expectation of the m-th type of user
N: number of users; L: number of user types
total bill gain; bill charge for a typical m-th type user
Pricing and Sublinear Algorithms: Pricing Model
Model the expense and the total net profit gain:
Xij: i-th user's energy usage at time instant j
ap: cost coefficient for buying energy at peak hours
ao: cost coefficient for buying energy at off-peak hours
Pricing and Sublinear Algorithms
Classify users to compute α and β
Algorithm quality:
Pricing and Sublinear Algorithms
Sublinear percentage calculation: "no need to examine every user in the computation"
The sample size is not a function of N; the complexity is O(1) in N
Pricing and Sublinear Algorithms
Sublinear classification/distribution comparison: "no need to examine every data point in the comparison"
An existing sublinear algorithm for the L2-distance test:
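One way such an L2 test can work (a sketch of the collision-based idea, not necessarily the exact algorithm the slides use) is to estimate ||p - q||² = ||p||² + ||q||² - 2⟨p, q⟩ from self- and cross-collision rates of the samples, without ever reading the full distributions:

```python
from itertools import combinations

def collision_rate(samples):
    """Fraction of unordered sample pairs that collide: an unbiased
    estimate of sum_i p_i^2, the self-collision probability."""
    pairs = hits = 0
    for a, b in combinations(samples, 2):
        pairs += 1
        hits += (a == b)
    return hits / pairs

def l2_distance_sq_estimate(samples_p, samples_q):
    """Estimate ||p - q||_2^2 = ||p||^2 + ||q||^2 - 2<p, q> from
    samples alone: self-collisions estimate the squared norms and
    cross-collisions estimate the inner product."""
    self_p = collision_rate(samples_p)
    self_q = collision_rate(samples_q)
    cross = sum(a == b for a in samples_p for b in samples_q)
    cross /= len(samples_p) * len(samples_q)
    return self_p + self_q - 2 * cross
```

Comparing the estimate against a threshold between ε²/2 and ε² then gives the accept/reject decision.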
Pricing and Sublinear Algorithms
Drawback: the confidence remains undetermined when the true L2-distance between the two tested distributions lies in the interval [ε²/2, ε²]
Proposed solution: run the existing algorithm twice
Pricing and Sublinear Algorithms
1. Run the traditional sublinear sampling and collect the labeled results as set {S1}
2. Run the traditional sublinear sampling with twice the error bound and collect the labeled results as set {S2}
3. Keep the users labeled 1 in {S1} and reject all those labeled 2
4. Keep the users labeled 2 in {S2} and reject all those labeled 1
5. Combine the retained labels into {S3}: if the same user is labeled 1 in {S1} and 2 in {S2}, decide the label uniformly at random
6. Output {S3} as the final classification result
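Steps 3-6 above, combining the two labeled sets, can be sketched as follows (the dictionary interface is an assumption for illustration):

```python
import random

def combine_labels(s1, s2):
    """Combine two sublinear classification passes: s1 and s2 map
    user -> label in {1, 2}, where s2 was run with twice the error
    bound.  Trust label-1 decisions from s1 and label-2 decisions
    from s2; break disagreements uniformly at random."""
    keep1 = {u for u, lab in s1.items() if lab == 1}  # step 3
    keep2 = {u for u, lab in s2.items() if lab == 2}  # step 4
    s3 = {}
    for u in keep1 | keep2:                           # step 5
        if u in keep1 and u in keep2:
            s3[u] = random.choice((1, 2))  # conflicting labels
        elif u in keep1:
            s3[u] = 1
        else:
            s3[u] = 2
    return s3                                         # step 6
```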
Pricing and Sublinear Algorithms
Overall algorithm flow:
Call AlgoPercent() to sample a small portion of the users for classification
Call AlgoDist() to sample a small portion of each user's distribution data points
Pricing and Sublinear Algorithms: Numerical Results
Bounded error vs. different parameterizations:
Estimation errors vs. the number of sub-sampled data points from the entire distribution
Performance on estimating α
Pricing and Sublinear Algorithms: Numerical Results
Profit gains vs. other pricing plans; reduced computation burdens:
Net profits from different pricing strategies
Reduced data amount vs. the overall confidence parameter
Pricing and Sublinear Algorithms: Numerical Results
Reduced computation burdens vs. varying parameters and error and confidence settings:
Reduced data amount vs. the overall error-bound parameter
Summary
Sublinear algorithms are much more efficient than linear algorithms for massive data sets
A good sampling strategy is needed
Many applications in graph theory
References
Slides from Dr. Ronitt Rubinfeld's website
Slides from Dr. Dana Ron's website
D. Wang, Y. Long, and F. Ergun, "A Layered Architecture for Delay Sensitive Sensor Networks"
D. Wang and Z. Han, Sublinear Algorithms for Big Data Applications, Springer, 2015
Thanks