Theoretical Computer Science in a Nutshell David Pritchard CCC Second Stage 2008.

Outline Theoretical Computer Science  What’s the deal with research?  Models, Techniques and Algorithms Distributed Computing Model  Motivation & Definition A Randomized, Distributed Algorithm  (get friendly with your Cycle Space)

Theoretical Computer Science (TCS) in a Nutshell You may already know about algorithms and data structures (in the “RAM model”)  (BFS, DFS, Dijkstra, Floyd-Warshall, Euclidean, quicksort, binary search, flows…) This is only the tip of the iceberg in TCS TCS’s flavour: mathy (cool ideas and proofs) but applicable to real problems

One-slide TCS Taxonomy Algorithms/data structures in many models  Sequential (RAM, FSA, Turing machines)  Parallel (dual-core, parallel RAM [PRAM])  Distributed (cluster/distributed computing) Complexity: P, NP, coNP, PH, PP, #P, … Approximation and randomized algorithms Cryptography, quantum, geometry…

Why study TCS? Immediately applicable bits (Google maps, credit cards, operations research) Determine fundamental limitations on our power of computation (halting problem) My view: combines most interesting parts of mathematics and computer programming  lots of room for creativity  natural field to study if you like contest problems (but in research, you don’t know if there’s a nice answer)

Part 2: Distributed Computing Model

Distributed Computing Model Graph (V, E) = network of “computers”  Nodes store data and perform computations  Edges relay messages between nodes

Distributed Computing Model Goal: want the graph to compute properties of its initially unknown shape  e.g. shortest path from 5F1 to 308  e.g. max flow (bandwidth) from 5F1 to 308 Motivating situations: internet, ad-hoc wireless networks, cellular telephone networks, sensor nets, social networks [Figure: example network with nodes labelled 5F1 and 308]

Formal Definition of Model Unique ID (1, 2, 3…) for each node. #1 is leader Initially nodes only know their own ID and the ID of each of their neighbours In each round every node can send an O(log |V|)-bit message to each neighbour  Messages are received next round Node has ∞ storage & power between rounds Need to design a local program, a copy of which will run at each node, to achieve goal

Formal Definition of Complexity The time complexity of an algorithm is the number of rounds that elapse before termination The message complexity is the total number of messages that are sent We don’t care about time/space requirements at individual nodes

Distributed Computing 101 What if IDs are “ugly”? e.g. in sensor network, (3D8,1FE…) instead of (1,2…)  or if graph is not connected  Need a leader election algorithm How can we communicate to all nodes? How can we count # nodes? How can we adapt to edge/node failures?

Basic Problem: Spanning Tree Required: mark a subset of edges so that there is exactly one path from each node to the leader  “Mark:” each node keeps a list of which of its adjacent edges are in the tree Each non-leader must know its parent  Again, each node stores parent and child IDs

Solution: (Breadth-First) Spanning Tree Algorithm
1. Initialize only the leader to be in the tree
2. In each round, at node v:
    if this is the first round v is in the tree, send a msg to each neighbour asking it to join the tree
    else if v is not in the tree and v got a msg from u, add v and uv to the tree & set u to be the parent of v (if several neighbours sent a msg, pick any one as u)
3. (Stop when all nodes are in the tree)
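The rounds above can be simulated sequentially. This is a minimal sketch, not the talk's own code: the function name `bfs_spanning_tree`, the adjacency-dict input and the smallest-ID tie-break are all assumptions made for illustration.

```python
def bfs_spanning_tree(adj, leader):
    """Simulate the round-based tree construction.

    adj: dict mapping node ID -> list of neighbour IDs (undirected graph).
    Returns (parent, rounds); parent[v] is v's tree parent (None for the leader).
    """
    parent = {leader: None}
    joined_last_round = [leader]   # nodes that entered the tree most recently
    rounds = 0
    while joined_last_round:
        # Every newly joined node sends a "join" message to each neighbour.
        inbox = {}
        for u in joined_last_round:
            for v in adj[u]:
                inbox.setdefault(v, []).append(u)
        rounds += 1
        # Messages arrive next round; a node outside the tree adopts one sender.
        joined_last_round = []
        for v, senders in inbox.items():
            if v not in parent:
                parent[v] = min(senders)   # tie-break by smallest ID (assumption)
                joined_last_round.append(v)
    return parent, rounds


# Hypothetical 5-node network used only for illustration.
adj = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}
parent, rounds = bfs_spanning_tree(adj, leader=1)
```

The round counter includes the final round in which the deepest nodes send messages that nobody accepts, so it is the tree height plus one.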

Illustration of Spanning Tree Construction [Figure: step-by-step construction. Legend: computer; leader; edge w/ msg sent; tree edge (head=parent)] Done! Now we can use the tree to broadcast a msg from the leader to all nodes, or to take a census

Distributed Census Algorithm Each leaf node reports “1” to parent For each nonleaf, sum reports from children, add 1, and send to parent [Figure: example tree with each node labelled by the count it reports]
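A sequential sketch of the census, assuming the tree is given as a `parent` map (node to parent, `None` at the leader). The function name `census` and the input format are illustrative choices, not part of the talk.

```python
def census(parent):
    """Count nodes by summing reports up a rooted tree.

    parent: dict node -> parent node (None for the leader).
    Returns (total, report), where report[v] is the value v sends upward.
    """
    children = {v: [] for v in parent}
    root = None
    for v, p in parent.items():
        if p is None:
            root = v
        else:
            children[p].append(v)
    # Preorder walk from the root; its reverse visits children before parents.
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children[v])
    report = {}
    for v in reversed(order):
        # A leaf has no children, so it reports exactly 1.
        report[v] = 1 + sum(report[c] for c in children[v])
    return report[root], report


# Hypothetical tree: 1 is the leader, 2 and 3 are its children, and so on.
total, report = census({1: None, 2: 1, 3: 1, 4: 2, 5: 4})
```

In the distributed setting each node would wait for all of its children before sending, which is why the running time is proportional to the height of the tree.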

Time Analysis Construction of T, broadcast, and census, take time proportional to height(T)  Also proportional to diameter Diam of network := max distance between any 2 nodes Compare: sequential model always has time complexity >= |E| due to reading input  Diam can be much smaller than |V|, |E|

Part 3: Randomized (Distributed) Algorithm for Cut Edges/Pairs

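Types of Cuts in Graphs A cut is a part of a connected graph that, when deleted, makes it disconnected Cut edge (“bridge”): a single edge whose deletion disconnects the graph Cut pair: two edges, neither a cut edge on its own, whose deletion together disconnects the graph Motivation to find these: want to attack or reinforce a network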

Part 3 Summary I’ll show you a simple new approach that lets you find cut edges and cut pairs Yields O(E)-time RAM algorithm  Older algorithms match this, but are complex Yields O(Diam)-time distributed alg’s  Beats previous best. In publication. Tools: randomization, cycle space

Circulations circulation ~ flow without source or sink modulo k, there are finitely many circulations on a graph For talk: focus on useful case k=2  useful because we don’t have to worry about orienting undirected edges  we will call these “binary circulations”

The Cycle Space An even graph has even degree at each vertex For a graph (V, E), the cycle space is “all subsets F of E such that (V, F) is an even graph” If F is in the cycle space, we call F a binary circulation The cycle space is a vector space (algebra 101) [Figure: an example F marked in red]

Examples of Binary Circulations For this graph,  some binary circulations: Φ_1 shown in green, Φ_2 shown in red, Φ_3 shown in orange  another one is the empty graph (Φ ≡ 0)

Get To Know Your Cycle Space Lemma 1: If F1 and F2 are binary circulations, so is F1 xor F2 Lemma 2: If e is a cut edge and F is a binary circulation, then e is not in F Lemma 3: If {e, f} is a cut pair, and F is a binary circulation, then either (1) both e and f are in F or (2) neither e nor f is in F
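Lemma 1 is easy to check on a small case. The sketch below represents an edge set as a Python set of `frozenset` pairs, so that xor is just set symmetric difference; the helper names `edges` and `is_binary_circulation` and the two triangles are made up for illustration, and self-loops are assumed absent.

```python
from collections import Counter


def edges(*pairs):
    """Build an edge set; frozenset makes (u, v) and (v, u) the same edge."""
    return {frozenset(p) for p in pairs}


def is_binary_circulation(edge_set):
    """True if every vertex touches an even number of edges in edge_set."""
    degree = Counter()
    for u, v in edge_set:
        degree[u] += 1
        degree[v] += 1
    return all(d % 2 == 0 for d in degree.values())


F1 = edges((1, 2), (2, 3), (1, 3))   # triangle 1-2-3
F2 = edges((2, 3), (3, 4), (2, 4))   # triangle 2-3-4
print(is_binary_circulation(F1 ^ F2))  # True: the 4-cycle 1-2-4-3
```

The shared edge {2, 3} cancels, and every vertex keeps even degree because the parities of the two summands add modulo 2.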

b-bit circulations To denote many binary circulations (Φ_1, …, Φ_b) at once:  b-bit circulation: function Φ: E → {0,1}^b where the ith bit of Φ(e) is Φ_i(e)  e.g. for edges e* and f* & Φ_1, Φ_2, Φ_3 as before, Φ(e*)=001, Φ(f*)=111 [Figure: the example graph with edges e* and f* marked]

Constructing Binary Circulations For spanning tree T, E\T = “non-tree edges” Claim: for any T and subset S of E\T, a unique subset S’ of T exists so that S ∪ S’ is a circulation  S ∪ S’ is the “unique completion” of the “partial circulation” S Corollary: given b-bit values on each non-tree edge, there exists a unique assignment of values to tree edges that makes a b-bit circulation  Next: proof/implementation of claim [Figure: a partial circulation S and its completion S’]

Binary Circulation Construction (Completion) Fixed: which edges of E\T to include Idea: for uv in T, v a leaf:  conservation at v determines if uv should be included  repeat! Takes h(T) distributed rounds  at the end each v knows its incident Φ values [Figure: tree T & edges of E\T to include or exclude; at each leaf v with parent u, the edge uv is marked “Must include” or “Must exclude”]
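The leaf-peeling idea can be sketched sequentially as follows, assuming the tree is given as a `parent` map and the chosen non-tree edges as a list of pairs. The name `complete_circulation` and the input format are assumptions for illustration; a distributed version would instead have each node wait for its children.

```python
def complete_circulation(parent, chosen):
    """Return the tree edges S' that complete the partial circulation `chosen`.

    parent: dict node -> tree parent (None at the root).
    chosen: iterable of non-tree edges (u, v) that must be included.
    Each returned edge is a pair (v, parent[v]).
    """
    # parity[v] = number of included edges at v so far, modulo 2.
    parity = {v: 0 for v in parent}
    for u, v in chosen:
        parity[u] ^= 1
        parity[v] ^= 1

    def depth(v):
        d = 0
        while parent[v] is not None:
            v = parent[v]
            d += 1
        return d

    included = set()
    # Deepest nodes first, so every child is settled before its parent.
    for v in sorted(parent, key=depth, reverse=True):
        p = parent[v]
        if p is not None and parity[v] == 1:
            included.add((v, p))   # conservation at v forces edge vp in
            parity[p] ^= 1
    return included


# Hypothetical tree with one chosen non-tree edge (3, 4).
tree_parent = {1: None, 2: 1, 3: 1, 4: 2, 5: 4}
completion = complete_circulation(tree_parent, [(3, 4)])
```

Here the completion is the tree path between 3 and 4, which closes the chosen edge into the cycle 1-2-4-3. The root needs no special handling because the total degree of any edge set is even.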

Random Binary Circulations Where randomness comes into play:  include each non-tree edge w/ indep. prob. ½  then, compute completion Fact: Pr[we obtain Φ*] = 2^(-(E-V+1)), for any Φ*, since there are E-V+1 non-tree edges and each circulation is the completion of exactly one choice  So all binary circulations are equally likely Distributed implementation easy
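The count behind this fact, namely that a connected graph has exactly 2^(E-V+1) binary circulations, can be checked by brute force on small graphs. This is an illustrative sketch only; the function name `count_binary_circulations` is made up and the enumeration is exponential in E.

```python
from itertools import combinations


def count_binary_circulations(vertices, edge_list):
    """Count edge subsets in which every vertex has even degree."""
    count = 0
    for k in range(len(edge_list) + 1):
        for subset in combinations(edge_list, k):
            degree = {v: 0 for v in vertices}
            for u, v in subset:
                degree[u] += 1
                degree[v] += 1
            if all(d % 2 == 0 for d in degree.values()):
                count += 1
    return count


# Complete graph K4: V = 4, E = 6, so we expect 2^(6-4+1) = 8.
k4_edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print(count_binary_circulations([1, 2, 3, 4], k4_edges))  # 8
```

The eight subsets for K4 are the empty set, the four triangles and the three 4-cycles.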

Application 1: Cut Edges Folklore: for circ. Φ & cut edge e, Φ(e)=0. Conversely, with a little work we can show:  For random binary (resp. b-bit) Φ, if e is not a cut edge, Pr[Φ(e)=0] = ½ (resp. (½)^b) [Figure: a cut (S, V\S) with edge e in δ(S)]

Application 1: Cut Edges Distributed algorithm:  Get random b-bit circulation Φ, b = 3lg(V)  Output that each e is a cut edge if Φ(e)=0 Analysis:  For cut edge e, Φ(e)=0  For non-cut edge e, Pr[Φ(e)=0] = 2^(-b) = V^(-3)  Union bound over the at most V^2 edges: correct with prob. at least 1-1/V  O(D) distributed time, using BFS tree
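Putting the pieces together, a sequential sketch of the cut-edge test might look like this. It assumes a simple connected graph given as an adjacency dict, stores each b-bit label as a Python int, and combines labels with XOR; the names `random_circulation` and `cut_edges` and the `seed` parameter are illustrative and not from the talk.

```python
import math
import random
from collections import deque


def random_circulation(adj, root, b, rng):
    """Return {edge: b-bit int} for a random b-bit circulation.

    adj: node -> list of neighbours. Edges are keyed as frozenset({u, v}).
    """
    # BFS tree from the root; `order` lists nodes by nondecreasing depth.
    parent, order, queue = {root: None}, [], deque([root])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    tree = {frozenset((v, p)) for v, p in parent.items() if p is not None}
    phi, excess = {}, {v: 0 for v in adj}
    # Non-tree edges get independent uniform b-bit labels.
    for u in adj:
        for v in adj[u]:
            e = frozenset((u, v))
            if e not in tree and e not in phi:
                phi[e] = rng.getrandbits(b)
                excess[u] ^= phi[e]
                excess[v] ^= phi[e]
    # Completion: deepest nodes first; each tree edge cancels the XOR at its child.
    for v in reversed(order):
        p = parent[v]
        if p is not None:
            phi[frozenset((v, p))] = excess[v]
            excess[p] ^= excess[v]
    return phi


def cut_edges(adj, root, b=None, seed=None):
    """Monte Carlo guess at the cut edges: those whose label is all zeros."""
    if b is None:
        b = 3 * max(1, math.ceil(math.log2(len(adj))))
    phi = random_circulation(adj, root, b, random.Random(seed))
    return {e for e, label in phi.items() if label == 0}


# Hypothetical network: a 4-cycle 1-2-4-3 with a pendant edge 4-5.
adj = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}
guess = cut_edges(adj, root=1)
```

True cut edges are always reported; a non-cut edge is reported by mistake only when its random label happens to be zero, which is the 2^(-b) event from the analysis.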

Application 2: Cut Pairs WLOG G has no cut edges With a little work we can show:  For random b-bit circulation Φ, Pr[Φ(e)=Φ(f)] is 1 if {e,f} is a cut pair, 2^(-b) otherwise [Figure: a cut (S, V\S) with edges e and f in δ(S)]

Application 2: Cut Pairs Sketch of algorithm:  Generate a 5lg(V)-bit random circulation Φ  Sort all edges using Φ(e) as key for e  Output “cut pairs are {{e,f} | Φ(e)=Φ(f)}” Each pair is correct with probability at least 1-V^(-5) Thus probability of failure < E^2 · V^(-5) < 1/V
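The sort-and-group step can be sketched as below, assuming the labels Φ(e) are already available as a dict from edge to int. The function name `cut_classes` is illustrative, and the labels in the example are hand-picked stand-ins for a random circulation, not real output.

```python
from itertools import groupby


def cut_classes(phi):
    """Group edges that share a label; each group of size >= 2 is reported.

    phi: dict edge -> int label (e.g. from a random b-bit circulation).
    Every pair of edges inside one returned group is a candidate cut pair.
    """
    by_label = sorted(phi.items(), key=lambda item: item[1])
    classes = []
    for _, group in groupby(by_label, key=lambda item: item[1]):
        group_edges = [e for e, _ in group]
        if len(group_edges) >= 2:
            classes.append(group_edges)
    return classes


# Two triangles sharing vertex 3. In any circulation all edges of one
# triangle carry the same label; 6 and 9 are hand-picked example labels.
phi = {(1, 2): 6, (2, 3): 6, (1, 3): 6, (3, 4): 9, (4, 5): 9, (3, 5): 9}
classes = cut_classes(phi)
```

Python's built-in sort is used here for brevity; to reach linear sequential time one would swap in a radix sort on the labels.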

Cut Pairs: Details Cut pairs can be described more compactly by cut classes  Idea: if {e,f} and {f,g} are cut pairs, so is {e,g} To get linear-time sequential algorithm use linear-time sort e.g. radix sort Major distributed hurdle: not easy to find all edge pairs {e,f} with Φ(e)=Φ(f)!

In Closing Notice that we’re gambling  Gives wrong answer with probability ~1/|V|  Always fast, usually correct: Monte Carlo We can convert it to one which checks output for correctness and starts over in the event of an error  Always correct, usually fast: Las Vegas
