Random Sampling Algorithms with Applications. Kyomin Jung, KAIST. ERC Workshop, Aug 25, 2010.


Contents
- Randomized algorithms & random sampling
- Applications
- Markov chains & stationary distributions
- The Markov chain Monte Carlo (MCMC) method
- Google's PageRank

Randomized Algorithm A randomized algorithm is an algorithm that is allowed to access a source of independent, unbiased random bits, and may then use these random bits to influence its computation. Ex) computer games, randomized quicksort. (Slide diagram: the algorithm takes an input together with random bits and produces an output.)
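
As an illustration, here is a minimal randomized quicksort sketch in Python (my example, not from the slides); the only difference from deterministic quicksort is that the pivot is chosen using random bits.

```python
import random

def randomized_quicksort(a):
    """Sort a list. The pivot index is chosen uniformly at random,
    so no single fixed input can force the worst case every time."""
    if len(a) <= 1:
        return a
    pivot = a[random.randrange(len(a))]   # the "random bits" step
    less    = [x for x in a if x < pivot]
    equal   = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return randomized_quicksort(less) + equal + randomized_quicksort(greater)

print(randomized_quicksort([3, 1, 4, 1, 5, 9, 2, 6]))
```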

Why can randomness be helpful? A simple example: suppose we want to check whether an integer set A of size n contains an even number. Even when A contains n/2 even numbers, a deterministic algorithm may have to check n/2 + 1 elements in the worst case. A randomized algorithm instead checks elements chosen at random. This smooths the "worst-case input distribution" into the "randomness of the algorithm": when half of A is even, each random probe finds an even number with probability 1/2, so after k probes the failure probability is only 2^-k.
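
A sketch of this idea (my illustration, not from the slides): probe random elements; if half of A is even, a handful of probes succeeds with overwhelming probability.

```python
import random

def probably_has_even(A, k=20):
    """Probe k random elements of A. If half of A is even, each probe
    misses with probability 1/2, so the chance of a false negative
    after k probes is at most 2**-k."""
    for _ in range(k):
        if random.choice(A) % 2 == 0:
            return True
    return False   # may be a false negative, with probability <= 2**-k

A = list(range(1, 101))        # contains 50 even numbers
print(probably_has_even(A))    # True with overwhelming probability
```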

Random Sampling What is random sampling? Given a probability distribution pi, pick a point according to pi. E.g., the Monte Carlo method for integration: choose numbers uniformly at random from the integration domain, and estimate the integral by the average value of f at those points (times the volume of the domain).
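
For instance, here is a minimal Monte Carlo estimate of the integral of f over [a, b] (an illustrative sketch, not from the slides):

```python
import random

def mc_integrate(f, a, b, n=100_000):
    """Estimate the integral of f over [a, b] as
    (b - a) * (average of f at n uniform random points)."""
    total = sum(f(random.uniform(a, b)) for _ in range(n))
    return (b - a) * total / n

# Example: the integral of x^2 over [0, 1] is 1/3.
print(mc_integrate(lambda x: x * x, 0.0, 1.0))
```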

How to use Random Sampling? Volume computation in Euclidean space. It can also be used to approximately count discrete objects. Ex) the number of matchings in a graph.
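
As a concrete instance of volume computation (my sketch, not from the slides): estimate the volume of the unit ball in n dimensions by sampling uniformly from the enclosing cube and counting the fraction of hits.

```python
import random

def ball_volume(n_dim, samples=200_000):
    """Estimate the volume of the unit ball in n_dim dimensions:
    sample points uniformly from the cube [-1, 1]^n_dim and multiply
    the fraction of hits by the cube's volume, 2**n_dim."""
    hits = 0
    for _ in range(samples):
        p = [random.uniform(-1.0, 1.0) for _ in range(n_dim)]
        if sum(x * x for x in p) <= 1.0:
            hits += 1
    return (2 ** n_dim) * hits / samples

print(ball_volume(3))   # true value is 4*pi/3, about 4.19
```

In high dimension the hit fraction vanishes exponentially, which is one reason smarter samplers such as hit-and-run (later slide) are needed.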

Application: Counting How many ways can we tile a given region with dominoes?

Application: Counting Partition the tilings into two types (e.g., by how a fixed cell is covered), so that N = N_1 + N_2. Sample tilings uniformly at random and let P_1 be the proportion of samples of type 1; then N* = N_1*/P_1 is an estimate of N. Recurse in the same way on N_1 = N_11 + N_12, giving N* = N_1*/P_1 = N_11*/(P_1 P_11) = ..., until a subfamily is reached whose count is known exactly.
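
A generic one-level sketch of this estimator (my illustration; the sampler and classifier here are stand-ins, and the slide applies the idea recursively):

```python
import random

def estimate_total(sample_uniform, is_type1, n_type1, samples=20_000):
    """Estimate N = N_1 + N_2 from a uniform sampler and a known N_1:
    since P_1 is approximately N_1 / N, we have N ~ N_1 / P_1.
    (The recursion on N_1 = N_11 + N_12 works the same way.)"""
    hits = sum(is_type1(sample_uniform()) for _ in range(samples))
    return n_type1 / (hits / samples)

# Toy demo: S = {0,...,999}; "type 1" = multiples of 3 (exactly 334).
S = list(range(1000))
est = estimate_total(lambda: random.choice(S), lambda x: x % 3 == 0, 334)
print(est)   # ~ 1000
```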

How to Sample? Ex: Hit and Run The hit-and-run algorithm is used to sample from a convex set in n-dimensional Euclidean space: from the current point, pick a uniformly random direction, and move to a uniformly random point on the chord of the set along that direction. It converges in time polynomial in the dimension n (O*(n^3) steps from a warm start).
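
A minimal hit-and-run sketch (my illustration, assuming only a membership oracle for the convex set; the chord endpoints are located by bisection):

```python
import random, math

def hit_and_run_step(x, in_set, hi=10.0, iters=40):
    """One hit-and-run step: pick a uniform random direction d,
    find the chord {x + t*d} inside the set by bisection,
    and return a uniform random point on that chord."""
    d = [random.gauss(0.0, 1.0) for _ in x]     # uniform random direction
    norm = math.sqrt(sum(c * c for c in d))
    d = [c / norm for c in d]

    def boundary(sign):
        # Largest t >= 0 with x + sign*t*d still inside the set.
        a, b = 0.0, hi
        for _ in range(iters):
            m = (a + b) / 2
            if in_set([xi + sign * m * di for xi, di in zip(x, d)]):
                a = m
            else:
                b = m
        return a

    t = random.uniform(-boundary(-1), boundary(+1))
    return [xi + t * di for xi, di in zip(x, d)]

# Example: sample (approximately) uniformly from the unit ball.
in_ball = lambda p: sum(c * c for c in p) <= 1.0
x = [0.0, 0.0, 0.0]
for _ in range(1000):
    x = hit_and_run_step(x, in_ball)
print(x)
```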

How to Sample? : Markov Chain (MC) "States" can be labeled 0, 1, 2, 3, ... At every time slot a "jump" decision is made randomly, based on the current state. For example, a two-state chain: from state 0, jump to 1 with probability p and stay with probability 1-p; from state 1, jump to 0 with probability q and stay with probability 1-q.
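
A quick simulation of that two-state chain (an illustrative sketch; p and q are arbitrary): the empirical state frequencies settle near q/(p+q) and p/(p+q), which is its stationary distribution.

```python
import random

def simulate_two_state(p, q, steps=100_000):
    """Two-state Markov chain: 0 -> 1 w.p. p, 1 -> 0 w.p. q.
    Returns the empirical fraction of time spent in each state."""
    state, visits = 0, [0, 0]
    for _ in range(steps):
        visits[state] += 1
        if state == 0:
            state = 1 if random.random() < p else 0
        else:
            state = 0 if random.random() < q else 1
    return [v / steps for v in visits]

print(simulate_two_state(0.3, 0.1))   # ~ [q/(p+q), p/(p+q)] = [0.25, 0.75]
```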

Ex of MC: 1-D Random Walk Time is slotted. The walker flips a coin every time slot to decide which way to go: X(t+1) = X(t) + 1 with probability p, and X(t) - 1 with probability 1-p.

Markov Property The "future" is independent of the "past" and depends only on the "present"; in other words, the chain is memoryless. This makes Markov chains useful for modeling and analyzing real systems.

Stationary Distribution Define the row vector pi(t) whose i-th entry is P[X(t) = i]. Then pi(t+1) = pi(t) P, where P is the transition matrix. Stationary distribution: pi = lim_{t -> infinity} pi(t), if the limit exists. If pi exists, it satisfies pi = pi P.
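
One way to see this numerically (my sketch): iterate pi(t+1) = pi(t) P until the vector stops changing.

```python
def stationary(P, iters=1000):
    """Power iteration: repeatedly apply pi <- pi * P
    (row vector times transition matrix) until convergence."""
    n = len(P)
    pi = [1.0 / n] * n   # start from the uniform distribution
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

# The two-state chain from the earlier slide, with p = 0.3 and q = 0.1.
P = [[0.7, 0.3],
     [0.1, 0.9]]
print(stationary(P))   # ~ [0.25, 0.75]
```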

Conditions for pi to Exist (I) The Markov chain must be irreducible: every state must be reachable from every other state. Counter-example: a chain with a state that, once entered, is never left (a self-loop with p = 1).

Conditions for pi to Exist (II) The Markov chain must be aperiodic. An MC is aperiodic if all of its states are aperiodic. Counter-example: a chain that deterministically alternates between two states has period 2, so the distribution at time t never converges.

Special case It is known that a Markov chain has stationary distribution pi if the detailed balance condition holds: pi(x) P(x, y) = pi(y) P(y, x) for all states x, y.

Monte Carlo principle Consider a card game: what's the chance of winning with a properly shuffled deck? This is hard to compute analytically. Insight: why not just play a few games and see empirically how many times we win? (On the slide, one of four simulated games is a win, so the estimated chance of winning is 1 in 4.) More generally, we can approximate a probability density function using samples drawn from that density.
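
A sketch of this estimate-by-playing idea (my illustration; the win condition "an ace among the top five cards" is made up for the example):

```python
import random

def win(deck):
    """Hypothetical win condition: an ace is among the top five cards."""
    return any(card % 13 == 0 for card in deck[:5])

def chance_of_winning(games=100_000):
    deck = list(range(52))        # 52 cards; 0, 13, 26, 39 are the aces
    wins = 0
    for _ in range(games):
        random.shuffle(deck)      # a properly shuffled deck
        wins += win(deck)
    return wins / games

print(chance_of_winning())        # ~ 0.34 (exact: 1 - C(48,5)/C(52,5))
```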

Markov chain Monte Carlo (MCMC) Recall again the set X and the distribution p(x) we wish to sample from. Suppose that it is hard to sample from p(x) directly, but that it is possible to "walk around" in X using only local state transitions. Insight: we can use a "random walk" to help us draw random samples from p(x).

Markov chain Monte Carlo (MCMC) In order for a Markov chain to be useful for sampling from p(x), we require that for any starting state x^(1), the distribution of the chain's state x^(t) converge to p(x) as t grows. Equivalently, the stationary distribution of the Markov chain must be p(x). Then we can start in an arbitrary state, use the Markov chain to do a random walk for a while, and stop and output the current state x^(t). The resulting state will be (approximately) sampled from p(x)!
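
The standard way to build such a chain is the Metropolis algorithm (not spelled out on these slides): propose a local move and accept it with probability min(1, p(x')/p(x)), which enforces detailed balance with respect to p. A minimal 1-D sketch:

```python
import random, math

def metropolis(log_p, x0, proposal_scale=1.0, steps=50_000):
    """Metropolis sampler for a 1-D density p (given as log p, up to a
    constant): random-walk proposals accepted w.p. min(1, p(x')/p(x))."""
    x, samples = x0, []
    for _ in range(steps):
        x_new = x + random.gauss(0.0, proposal_scale)    # local move
        if math.log(random.random()) < log_p(x_new) - log_p(x):
            x = x_new                                    # accept; else stay
        samples.append(x)
    return samples

# Example: sample from a standard normal, log p(x) = -x^2/2 + const.
s = metropolis(lambda x: -0.5 * x * x, x0=0.0)
print(sum(s) / len(s))                  # mean ~ 0
print(sum(v * v for v in s) / len(s))   # variance ~ 1
```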

Random Walk on Undirected Graphs At each node, choose a neighbor uniformly at random and jump to it.

Random Walk on Undirected Graph G = (V, E) The state space is Omega = V. The walk is irreducible if and only if G is connected, and aperiodic if and only if G is not bipartite.

The Stationary Distribution Claim: If G is connected and not bipartite, then the probability distribution induced by the random walk on it converges to pi(x) = d(x) / Sum_x d(x), where d(x) is the degree of x and Sum_x d(x) = 2|E|. Proof: the detailed balance condition holds, since for any edge (x, y) we have pi(x) * (1/d(x)) = 1/2|E| = pi(y) * (1/d(y)).
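
A quick empirical check of the claim (my sketch) on a small graph: the visit frequencies of a long random walk approach degree/2|E|.

```python
import random

def walk_frequencies(adj, steps=200_000):
    """Simulate a random walk on an undirected graph (adjacency lists)
    and return the fraction of time spent at each node."""
    node = 0
    visits = {v: 0 for v in adj}
    for _ in range(steps):
        visits[node] += 1
        node = random.choice(adj[node])
    return {v: c / steps for v, c in visits.items()}

# A small connected, non-bipartite graph (it contains a triangle).
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
print(walk_frequencies(adj))
# Expected pi(x) = d(x)/2|E| = degree/8: {0: 0.25, 1: 0.375, 2: 0.25, 3: 0.125}
```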

PageRank: Random Walk Over The Web If a user starts at a random web page and surfs by clicking links and occasionally entering a random new URL, what is the probability that they will arrive at a given page? The PageRank of a page captures this notion: more "popular" or "worthwhile" pages get a higher rank. This defines a random walk on the Web graph (a directed graph).

PageRank: Example (figure: an example web graph)

PageRank: Formula Given page A, and pages T_1 through T_n linking to A, the PageRank of A is defined as: PR(A) = (1-d) + d * (PR(T_1)/C(T_1) + ... + PR(T_n)/C(T_n)), where C(P) is the out-degree of page P and d is the "random URL" factor (about 0.85). This is the stationary distribution of the Markov chain for the random walk.

Example: pages T1 (PR = 0.5), T2 (PR = 0.3), and T3 (PR = 0.1) link to page A. Then PR(A) = (1-d) + d * (PR(T1)/C(T1) + PR(T2)/C(T2) + PR(T3)/C(T3)) = 0.15 + 0.85 * (0.5/C(T1) + 0.3/4 + 0.1/5).

PageRank: Intuition & Computation Each page distributes its PR_i to all pages it links to; linkees add up their awarded rank fragments to find their PR_{i+1}. Here d is the "random jump" factor. PageRank can be calculated iteratively, with PR_{i+1} computed from PR_i: PR_{i+1}(A) = (1-d) + d * (PR_i(T_1)/C(T_1) + ... + PR_i(T_n)/C(T_n)). A sketch of this iteration appears below.
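
A minimal sketch of the iteration (my illustration; it uses the slide's un-normalized formula and assumes every page has at least one outgoing link):

```python
def pagerank(links, d=0.85, iters=50):
    """Iterative PageRank: links[p] lists the pages p links to.
    Repeats PR_{i+1}(A) = (1-d) + d * sum over in-links T of PR_i(T)/C(T).
    Assumes every page has at least one outgoing link."""
    pr = {p: 1.0 for p in links}
    for _ in range(iters):
        new = {p: 1.0 - d for p in links}
        for p, outs in links.items():
            share = d * pr[p] / len(outs)   # p's rank split over its C(p) links
            for q in outs:
                new[q] += share
        pr = new
    return pr

# Tiny hypothetical example graph: A <-> B, and C links to A.
links = {'A': ['B'], 'B': ['A', 'C'], 'C': ['A']}
print(pagerank(links))
```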