Markov chains – Part 2 Prof. Noah Snavely CS1114
Administrivia  Guest lecture on Thursday, Prof. Charles Van Loan  Assignments: –A5P2 due on Friday by 5pm –A6 will be out on Friday  Quiz next Thursday, 4/23 2

Administrivia  Final projects –Due on Friday, May 8 (tentative) –You can work in groups of up to 3 –Please form groups and send me a proposal for your final project by Wednesday Not graded, but required 3

Least squares
 Model: y = mx + b
 Objective function: minimize Σᵢ (yᵢ – (m xᵢ + b))²
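The slide's formulas can be turned into a few lines of code. The course uses MATLAB; this Python sketch (the function name `fit_line` is mine) computes the closed-form least-squares solution for the line model above:

```python
def fit_line(xs, ys):
    """Fit y = m*x + b by least squares, minimizing the sum of
    squared residuals sum((y_i - (m*x_i + b))**2)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope
    b = (sy - m * sx) / n                          # intercept
    return m, b
```

On points that lie exactly on a line, the fit recovers that line; with noisy points it returns the best line in the squared-error sense.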

Sorting and convex hull
 Suppose we had an O(n) algorithm for sorting
 What does this tell us about the problem of computing a convex hull?

Incremental convex hull  Starting from a triangle, we add n points, one at a time  Given a convex hull with k points assume: –It takes k operations to test if a new point is inside the convex hull –It takes k operations to add a new point to the convex hull –Our convex hull never shrinks (in terms of number of points) 6

Incremental convex hull  How long does it take if: –All of the points are inside the original triangle? Answer: … (n times) … + 3 = O(n) –None of the points are inside the prev. conv. hull? Answer: … (n times) … + 2n = O(n 2 ) –The first 10% of points are added, last 90% aren’t Answer: … (0.1n times) … + 0.2n + 0.1n + … (0.9n times) … + 0.1n = O(n 2 ) –n – sqrt(n) are not added, then sqrt(n) are Answer: … (n – sqrt(n) times) … … (sqrt(n) times) … + 2sqrt(n) = O(n) 7

Computing nearest neighbors
 Suppose we have two sets of d-dimensional vectors, A and B: A has m vectors, B has n vectors
 For each vector in A, we want to find the closest vector in B
 How do we do this quickly?
 The straightforward way (nested for loops) is horribly slow
 You learned a better way in section, and you'll learn another one tomorrow
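For reference, the "horribly slow" straightforward approach looks like this. An unofficial Python sketch (the course uses MATLAB; `nearest_neighbors` is a name I've made up), costing O(m·n·d) distance work:

```python
import math

def nearest_neighbors(A, B):
    """For each vector in A, return the index of the closest vector in B.
    Brute force: compare every pair of vectors."""
    result = []
    for a in A:
        # Scan all of B for the point at minimum Euclidean distance.
        best_j = min(range(len(B)), key=lambda j: math.dist(a, B[j]))
        result.append(best_j)
    return result
```

The faster alternatives hinted at on the slide avoid comparing every pair.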

Markov chains  Example: Springtime in Ithaca  We can represent this as a kind of graph  (N = Nice, S = Snowy, R = Rainy) 9 N N R R S S NRS N R S Transition probabilities

Markov chains  What’s the weather in 20 days?  Almost completely independent of the weather today  The row [ ] is called the stationary distribution of the Markov chain  Over a long time, we expect 20% of days to be nice, 44% to be rainy, and 36% to be snowy 10

Stationary distributions  Constant columns are mapped to themselves:  So why do we converge to these particular columns? 11

Stationary distributions  The vector [ ]’ is unchanged by multiplication by P :  The vector [ ]’ is unchanged by multiplication by P T :  Where have we seen this before? 12

Stationary distributions  [ ]’ is called an eigenvector of P –a vector x such that Px = x –(in linear algebra, you’ll learn that the exact definition is a vector x such that Px = λx)  [ ]’ is an eigenvector of P’ –P’x = x  If we look at a long sequence, the proportion of days we expect to be nice/rainy/snowy  You won’t have to remember this (at least for this class) 13

Markov Chain Example: Text
"A dog is a man's best friend. It's a dog eat dog world out there."
[Table: transition probabilities between the words a, dog, is, man's, best, friend, it's, eat, world, out, there — e.g. Pr(dog | a) = 2/3, Pr(man's | a) = 1/3]
(Slide credit: Steve Seitz)
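The transition table on this slide is built by counting which word follows which. A minimal Python sketch (the course uses MATLAB; here punctuation and capitalization are ignored, and `prob` is a name I've made up):

```python
from collections import Counter, defaultdict

text = "a dog is a man's best friend it's a dog eat dog world out there"
words = text.split()

# Count bigram transitions: how often each word is followed by each other word.
counts = defaultdict(Counter)
for w, nxt in zip(words, words[1:]):
    counts[w][nxt] += 1

def prob(w, nxt):
    """Estimated Pr(nxt | w): fraction of occurrences of w followed by nxt."""
    total = sum(counts[w].values())
    return counts[w][nxt] / total if total else 0.0
```

For example, "a" occurs three times and is followed by "dog" twice, so Pr(dog | a) = 2/3, matching the slide's table.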

Text synthesis  “It’s a dog is a man’s best friend. It’s a dog eat dog eat dog is a dog”  This is called a random walk  Questions about random walks: –What’s the probability of being at the word dog after generating n words? –How long does it take (on average) to generate all words at least once?  These have important applications 15

Google's PageRank
Page, Lawrence; Brin, Sergey; Motwani, Rajeev; and Winograd, Terry (1999). The PageRank citation ranking: Bringing order to the Web.
See also: J. Kleinberg. Authoritative sources in a hyperlinked environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1998.

Google's PageRank
[Figure: graph of the Internet — pages A–J with links between them]

Google's PageRank
[Figure: the same graph of pages A–J]
Start at a random page, take a random walk. Where do we end up?

Google's PageRank
[Figure: the same graph of pages A–J]
Add 15% probability of moving to a random page. Now where do we end up?

Google's PageRank
[Figure: the same graph of pages A–J]
PageRank(P) = probability that a long random walk ends at node P
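The random walk with a 15% chance of teleporting can be computed without simulating: repeatedly push the probability distribution through the link structure until it settles. An unofficial Python sketch (the tiny three-page graph below is hypothetical, not the slide's A–J figure, and `pagerank` is my own name):

```python
def pagerank(links, d=0.85, iters=100):
    """Power-iteration PageRank on a dict node -> list of outgoing links.
    With probability 1-d the walker jumps to a uniformly random page."""
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - d) / n for v in nodes}   # teleport share
        for v in nodes:
            out = links[v]
            if out:
                share = d * rank[v] / len(out)  # split rank among out-links
                for w in out:
                    new[w] += share
            else:                               # dangling page: spread everywhere
                for w in nodes:
                    new[w] += d * rank[v] / n
        rank = new
    return rank

# Hypothetical graph: A and B both link to C; C links back to A.
r = pagerank({"A": ["C"], "B": ["C"], "C": ["A"]})
```

C, which is linked to by two pages, ends up with the highest rank; B, linked to by nobody, gets only the teleport share.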

(The ranks are an eigenvector of the transition matrix)

Back to text  We can use Markov chains to generate new text  Can they also help us recognize text? –In particular, the author? –Who wrote this paragraph? “Suppose that communal kitchen years to come perhaps. All trotting down with porringers and tommycans to be filled. Devour contents in the street. John Howard Parnell example the provost of Trinity every mother's son don't talk of your provosts and provost of Trinity women and children cabmen priests parsons fieldmarshals archbishops.” 22

Author recognition  We can use Markov chains to generate new text  Can they also help us recognize text? –How about this one? „Diess Alles schaute Zarathustra mit grosser Verwunderung; dann prüfte er jeden Einzelnen seiner Gäste mit leutseliger Neugierde, las ihre Seelen ab und wunderte sich von Neuem. Inzwischen hatten sich die Versammelten von ihren Sitzen erhoben und warteten mit Ehrfurcht, dass Zarathustra reden werde.“ 23

The Federalist Papers  85 articles addressed to New York State, arguing for ratification of the Constitution ( )  Written by “Publius” (?)  Really written by three different authors: –John Jay, James Madison, Andrew Hamilton  Who wrote which one? –73 have known authors, 12 are in dispute 24

Can we help?  Suppose we want to know who wrote a given paragraph/page/article of text  Idea: for each suspected author: –Download all of their works –Compute the transition matrix –Find out which author’s transition matrix is the best fit to the paragraph 25

Author recognition  Simple problem: Given two Markov chains, say Austen (A) and Dickens (D), and a string s (with n words), how do we decide whether A or D wrote s?  Idea: For both A and D, compute the probability that a random walk of length n generates s 26

Probability of a sequence
 What is the probability of a given n-length sequence s?
   s = s1 s2 s3 … sn
 Probability of generating s = the product of transition probabilities:
   Pr(s) = Pr(s1) · Pr(s2 | s1) · Pr(s3 | s2) · … · Pr(sn | sn-1)
 where Pr(s1) is the probability that a sequence starts with s1, and the Pr(si | si-1) are the transition probabilities
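In practice this product is computed as a sum of logs, so it doesn't underflow for long sequences, and unseen transitions are smoothed so they don't zero out the whole product. An unofficial Python sketch (the course uses MATLAB; `train`, `log_likelihood`, and the add-one smoothing are my own illustrative choices, not the slide's exact method):

```python
import math
from collections import Counter, defaultdict

def train(words):
    """Estimate start and transition counts from a list of words."""
    trans = defaultdict(Counter)
    for w, nxt in zip(words, words[1:]):
        trans[w][nxt] += 1
    start = Counter([words[0]])
    return start, trans

def log_likelihood(seq, start, trans, vocab_size):
    """log Pr(s1) + sum_i log Pr(s_i | s_{i-1}), with add-one smoothing so an
    unseen transition contributes a small probability instead of zero."""
    def p(counter, w):
        return (counter[w] + 1) / (sum(counter.values()) + vocab_size)
    ll = math.log(p(start, seq[0]))
    for w, nxt in zip(seq, seq[1:]):
        ll += math.log(p(trans[w], nxt))
    return ll
```

A sequence whose transitions actually occur in the training text scores higher than one stitched from transitions the author never used, which is exactly the comparison the next slide makes.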

Likelihood  Compute this probability for A and D 28 Jane Austen wrote s Charles Dickens wrote s ??? “likelihood” of A “likelihood” of D

[Table: the word-transition probabilities from the "dog is a man's best friend" example — e.g. Pr(dog | a) = 2/3, Pr(man's | a) = 1/3]
 Pr("is a dog is a dog") = ?
 Pr("is dog man's best friend") = ?

Next time  Section: more information on how to actually compute this  Guest lecture by Charles Van Loan 30