Monitoring Properties of Large, Distributed, Dynamic Graphs (IPDPS 2017)
Gal Yehuda, Daniel Keren

Graphs of interest used to be static and centralized. For some time now, dynamic graphs (edges arrive and are deleted) have been a focus of research, but it is almost always assumed that the graph is centralized. However, many graphs which are prevalent in real-life applications are both dynamic and distributed (YouTube, Google, the Internet…).

We assume that: vertices are fixed and known to all servers; edges can appear and disappear, and each server knows only about the edges it inserted or deleted; edges are not duplicated between servers.

It follows that, at any time step, the global adjacency matrix is the sum of the local ones (i.e., of those held at the distinct servers). We are interested in computing, approximating, or bounding some global function over the graph, but without continuously centralizing it. This is an instance of the distributed monitoring problem (cf. G. Cormode, 2013). Solution: define conditions which can be checked locally and efficiently, and which imply the global condition. These conditions should be resilient, so as to raise a minimal number of "false alarms". For some problems, e.g. distinct count, this is quite difficult.
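A minimal sketch of this setting (not from the slides; numpy assumed, all names illustrative): one global graph whose edges are partitioned among servers, so the global adjacency matrix is the element-wise sum of the local ones.

```python
import numpy as np

n, k = 6, 3                                   # 6 vertices, 3 servers
rng = np.random.default_rng(0)

# One global graph; each edge is assigned to exactly one server
# (edges are never duplicated between servers).
edges = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < 0.4]
local_adj = [np.zeros((n, n), dtype=int) for _ in range(k)]
for i, j in edges:
    s = rng.integers(k)                       # the server that inserted this edge
    local_adj[s][i, j] = local_adj[s][j, i] = 1

# At any time step, the global adjacency matrix is the sum of the local ones.
A_global = sum(local_adj)
print(A_global)
```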

Taking a step back… a simpler example: detecting spam in a distributed e-mail server. Reminder: given a feature–category contingency table, its mutual information is defined as

$$I(X;Y) = \sum_{x,y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}.$$

                     spam    not spam
  e-mail w/ term     0.2     0.7
  e-mail w/o term    0.01    0.09

If $I \ge T$: SPAM; if $I < T$: NOT SPAM.
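A short sketch (not from the slides; numpy assumed) computing the mutual information of the contingency table above:

```python
import numpy as np

# Joint distribution p(term, category) taken from the table above.
#                  spam   not spam
# e-mail w/ term   0.20   0.70
# e-mail w/o term  0.01   0.09
p = np.array([[0.20, 0.70],
              [0.01, 0.09]])

p_term = p.sum(axis=1, keepdims=True)     # marginal of the term feature
p_cat  = p.sum(axis=0, keepdims=True)     # marginal of the category

# I(X;Y) = sum_{x,y} p(x,y) * log( p(x,y) / (p(x) p(y)) )
mi = float(np.sum(p * np.log(p / (p_term * p_cat))))
print(f"mutual information = {mi:.4f} nats")
```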

We need to infer global threshold crossing from the local threshold crossings… and that is NOT always easy. For general functions, what does the average of the values tell us about the value at the average?

The general architecture we consider: as is common in distributed monitoring, we forgo exact computation of the monitored function $f$, and settle for threshold queries (is $f$ above or below a given threshold $T$?).

How, then, to define "good" LOCAL conditions? GM – the Geometric Method; work since 2006. Finding optimal local conditions is NP-complete, even for one-dimensional data (under appropriate definitions of optimality). SIGMOD 2016, VLDB 2013+2015, KDD 2015, TODS 2014, ICDE 2014… Good results, general solution: improved the state of the art for previously studied functions, and handled functions which previous methods could not. However, there is a major difficulty for some problems: a geometric constraint needs to be checked at each server, which can be exceedingly expensive. For monitoring cosine similarity, the running time of a single check, using state-of-the-art software, was in the minutes.

An alternative solution (Lazerson/Keren/Schuster, KDD 2016): convex functions. Reminder: a function is convex iff the value at the average is at most the average of the values:

$$f\!\left(\frac{x_1 + \dots + x_k}{k}\right) \;\le\; \frac{f(x_1) + \dots + f(x_k)}{k}.$$

So convex functions are beautifully suited to monitoring: if the threshold condition holds locally at every node, it automatically holds globally!
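A minimal illustration of why convexity helps (not from the slides; numpy assumed, the squared norm stands in for the real monitored function):

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    return float(x @ x)            # a convex function (squared norm), stand-in for the real f

T = 5.0                            # the threshold we monitor against
local_data = [rng.normal(size=4) for _ in range(10)]   # one data vector per node

# Each node checks its OWN data against the threshold -- no communication.
all_local_ok = all(f(x) <= T for x in local_data)

# Jensen: f(average of the x_i) <= average of the f(x_i) <= T,
# so local success implies the global condition.
global_ok = f(np.mean(local_data, axis=0)) <= T
assert (not all_local_ok) or global_ok
print(all_local_ok, global_ok)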

Another reminder – a function is convex iff it lies above all its tangent planes (this will soon prove useful).
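In symbols (the standard first-order characterization, for differentiable $f$; not spelled out on the slide):

$$f(y) \;\ge\; f(x) + \nabla f(x)^{\top}(y - x) \qquad \text{for all } x, y.$$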

What about non-convex functions? Solution – tightly bound the monitored function by a larger convex function (a CB, for convex/concave bound), and monitor the CB instead. Our previous work used approximations with convex sets; see also Lipman, Yagev, Poranne, Jacobs, Basri: Feature Matching with Bounded Distortion. Different initial points require different bounding functions; WLOG, assume that the data at all nodes starts at a fixed point and drifts from it.

In a nutshell: the system problem (minimize communication) becomes a mathematical problem (find a good convex bound).

Some simple examples… If $f$ is convex, the optimal (upper) bound is of course $f$ itself. If $f$ is concave, the optimal upper bound at a point is the tangent plane there. If $f$ is neither – for example $f(x,y) = xy$ – the optimal bound is… ??? You're probably guessing… and you're correct, to some extent.

Ideally, an optimal (upper) convex bound $h$ for $f$ at the point $x_0$ should satisfy: (1) $h(x_0) = f(x_0)$; (2) $h \ge f$ everywhere; and (3) if $g$ is any convex bound for $f$ satisfying the above two requirements, it must hold that $h \le g$. It turns out that such a bound is impossible to achieve (except for the trivial cases, in which $f$ is either convex or concave). Theorem: for $f(x,y) = xy$, the family of convex quadratics of the form $h_a(x,y) = \tfrac{1}{2}\bigl(a^2 x^2 + y^2/a^2\bigr)$, $a > 0$ (each tight along a line through the origin), are all minimal upper bounds for $f$; however, no pair of them can be compared – i.e., given two of them, $h_a$ and $h_b$ with $a \ne b$, neither $h_a \le h_b$ nor $h_b \le h_a$ holds.

Two members of the family of bounding functions (green) for $xy$ (blue), and the two bounds superimposed – neither is larger than the other.
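A quick numeric check (not from the slides; numpy assumed) using the reconstructed form $h_a(x,y)=\tfrac12(a^2x^2+y^2/a^2)$: two members of the family both upper-bound $xy$, yet neither dominates the other.

```python
import numpy as np

def h(a, x, y):
    # Reconstructed form of the bounding family for f(x, y) = x*y:
    # convex, >= x*y everywhere, and tight along the line y = a^2 * x.
    return 0.5 * (a**2 * x**2 + y**2 / a**2)

rng = np.random.default_rng(2)
x, y = rng.normal(size=(2, 1000))

# Both members are valid upper bounds of x*y ...
assert np.all(h(1.0, x, y) >= x * y - 1e-12)
assert np.all(h(2.0, x, y) >= x * y - 1e-12)

# ... yet neither one dominates the other (they are incomparable).
print(np.any(h(1.0, x, y) < h(2.0, x, y)), np.any(h(2.0, x, y) < h(1.0, x, y)))
```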

This is bad news… since every function which is neither convex nor concave contains a "copy" of $xy$ (a saddle)! One can argue that $h_1(x,y) = \frac{x^2 + y^2}{2}$ is optimal in some sense, as it is the "slowest changing" member of the family of bounds. But… MORE bad news! How can we find even a "reasonable" convex bound? A natural guess: expand $f$ to a second-order Taylor series, truncate the terms of order $\ge 3$, and remove the "negative part" of the Hessian. BUT – the result is not guaranteed to be a bound!
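A tiny counterexample (mine, not from the slides; numpy assumed) for the "Taylor + drop the negative part of the Hessian" guess: for $f(x) = x^3$ at $0$, both the gradient and the Hessian vanish, so the recipe produces the constant $0$, which is not an upper bound for $x > 0$.

```python
import numpy as np

f = lambda x: x**3

# Second-order Taylor expansion of f at 0 is identically 0 (f(0) = f'(0) = f''(0) = 0),
# and the "positive part" of the zero Hessian is still zero, so the recipe
# produces the candidate bound g(x) = 0.
g = lambda x: np.zeros_like(x)

xs = np.linspace(-1.0, 1.0, 201)
print(np.all(g(xs) >= f(xs)))    # False: g fails to upper-bound f for x > 0
```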

So… "There is a town in North Ontario…" This seems like a rather difficult problem. Recall that the minimum of convex functions is not convex – so we cannot simply take the pointwise minimum of the family of bounds.

Still, we hacked some solutions. This is how the bounds look for the Pearson Correlation Coefficient (KDD 2016): a concave lower bound and a convex upper bound enclosing the PCC. Also handled: PCA (effective) dimension, inner product, cosine similarity.

For the distributed, dynamic graphs, we looked at two popular functions: the eigenvalue gap and the number of triangles. Both can be expressed as homogeneous functions (of degrees 1 and 3, respectively) of the adjacency matrix's eigenvalues, so we can reduce to the case in which the global matrix is the average of the local ones. It remains to find good convex bounds. Remark: since all local matrices are initially set to the global average, monitoring can commence even if some of the local graphs do NOT satisfy the monitored condition, e.g. have a gap smaller than the threshold.
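For example, for the triangle count, where $f(A) = \sum_i \lambda_i(A)^3$ is homogeneous of degree 3, a threshold $T$ on the global sum of the $k$ local matrices is equivalent to a threshold on their average:

$$f\Big(\sum_{i=1}^{k} A_i\Big) = k^{3}\, f\Big(\frac{1}{k}\sum_{i=1}^{k} A_i\Big), \qquad\text{hence}\qquad f\Big(\sum_{i} A_i\Big) \le T \;\Longleftrightarrow\; f\Big(\frac{1}{k}\sum_{i} A_i\Big) \le \frac{T}{k^{3}}.$$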

Why is this necessary? Because, in general, it is impossible to infer the global values from the local ones. (Figures: Erdős–Rényi and scale-free examples.)

Concave (lower) bound for the eigenvalue gap. Assume a matrix $A_0$ is known at time 0. We seek a "good" (tight) concave lower bound for the gap $\lambda_1(A) - \lambda_2(A)$. Note that $\lambda_1$ is convex; alas, the gap itself is neither convex nor concave. Use a variational (min–max) characterization of $\lambda_2$: $\lambda_2(A) = \min_{v}\,\max_{\|x\|=1,\, x \perp v} x^\top A x$. Now define $\tilde{\lambda}_2(A) = \max_{\|x\|=1,\, x \perp v_0} x^\top A x$, where $v_0$ is the leading eigenvector at time 0; then $\tilde{\lambda}_2$ is convex and $\tilde{\lambda}_2 \ge \lambda_2$. Finally, define the bound as $T(A) - \tilde{\lambda}_2(A)$, where $T$ is the tangent plane to $\lambda_1$ at time 0 (so $T \le \lambda_1$, and the bound is concave and everywhere below the true gap).
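A sketch of this construction (not from the slides; numpy assumed, names are mine; it assumes the leading eigenvalue of $A_0$ is simple). The tangent plane of $\lambda_1$ at $A_0$ evaluates to $v_0^\top A v_0$, and $\tilde\lambda_2$ is computed by restricting $A$ to the orthogonal complement of $v_0$.

```python
import numpy as np

rng = np.random.default_rng(3)

def random_adjacency(n, p):
    U = np.triu((rng.random((n, n)) < p).astype(float), 1)
    return U + U.T

n = 50
A0 = random_adjacency(n, 0.2)                 # the matrix known at time 0
v0 = np.linalg.eigh(A0)[1][:, -1]             # leading eigenvector at time 0

# Orthonormal basis W of the subspace orthogonal to v0.
Q, _ = np.linalg.qr(np.column_stack([v0, rng.normal(size=(n, n - 1))]))
W = Q[:, 1:]

def gap_lower_bound(A):
    tangent = v0 @ A @ v0                              # tangent plane of lambda_1 at A0
    lam2_tilde = np.linalg.eigvalsh(W.T @ A @ W)[-1]   # max of x'Ax over unit x orthogonal to v0
    return tangent - lam2_tilde                        # concave in A, and <= lambda_1 - lambda_2

A = np.clip(A0 + random_adjacency(n, 0.02), 0, 1)      # edges inserted since time 0
w = np.linalg.eigvalsh(A)
print(gap_lower_bound(A), "<=", w[-1] - w[-2])
```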

Two (rather technical) theorems: the bound is optimal to second order, and it can be computed very fast by a modification of the power method. It works better for denser graphs (under changes to the same percentage of edges). (Figure: ratio of the bound to the actual value.) It works well for real graphs as well (YouTube and Flickr).

Number of triangles = $\mathrm{tr}(A^3)/6$ = (sum of cubes of the eigenvalues)/6. Let's start by bounding a cubic in one variable: the optimal convex upper bound of $x^3$ around 0 is simply $\max(x,0)^3$. Use the following beautiful theorem to extend this to the sum of cubes of the eigenvalues (Davis, 1957): any symmetric convex function of the eigenvalues of a symmetric matrix is a convex function of that matrix. To handle general matrices (i.e., bounds not around the zero matrix but around a general one, $A_0$), use an expansion around $A_0$.
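A sketch tying the pieces together (not from the slides; numpy assumed, and $\max(x,0)^3$ is my reconstruction of the one-variable bound). The factor $1/6$ converts $\mathrm{tr}(A^3)$ into a triangle count; the per-eigenvalue bound is convex in $A$ by Davis's theorem.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)
n = 30
U = np.triu((rng.random((n, n)) < 0.2).astype(float), 1)
A = U + U.T                                   # adjacency matrix of a random graph

lam = np.linalg.eigvalsh(A)

# Number of triangles = trace(A^3) / 6 = (sum of cubes of the eigenvalues) / 6.
tri_eig = np.sum(lam**3) / 6
tri_direct = sum(1 for i, j, k in combinations(range(n), 3)
                 if A[i, j] and A[j, k] and A[i, k])
print(round(tri_eig), tri_direct)

# Convex upper bound around the zero matrix: replace x^3 by max(x, 0)^3 per eigenvalue;
# by Davis's theorem this symmetric convex function of the eigenvalues is convex in A.
upper = np.sum(np.maximum(lam, 0.0)**3)
print(np.sum(lam**3), "<=", upper)
```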

Future work: improve these results, especially for the number of triangles (the complexity is high – many eigenvalues need to be computed); use sketches and other tools of the trade; further investigate how to proceed when a local violation occurs – can we do something more efficient than centralizing the data?; drop the sum-of-local-models assumption, e.g. study decision trees in which the nodes are "horizontally partitioned" between the servers. Thank you! Questions?