More Algorithms for Trees and Graphs Eric Roberts CS 106B March 11, 2013.

Slides:



Advertisements
Similar presentations
Introduction to Computer Science 2 Lecture 7: Extended binary trees
Advertisements

Michael Alves, Patrick Dugan, Robert Daniels, Carlos Vicuna
CSE 5243 (AU 14) Graph Basics and a Gentle Introduction to PageRank 1.
1 The PageRank Citation Ranking: Bring Order to the web Lawrence Page, Sergey Brin, Rajeev Motwani and Terry Winograd Presented by Fei Li.
The math behind PageRank A detailed analysis of the mathematical aspects of PageRank Computational Mathematics class presentation Ravi S Sinha LIT lab,
CSE 522 – Algorithmic and Economic Aspects of the Internet Instructors: Nicole Immorlica Mohammad Mahdian.
The PageRank Citation Ranking “Bringing Order to the Web”
Zdravko Markov and Daniel T. Larose, Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage, Wiley, Slides for Chapter 1:
Course Review COMP171 Spring Hashing / Slide 2 Elementary Data Structures * Linked lists n Types: singular, doubly, circular n Operations: insert,
Heaps & Priority Queues Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College Park.
Link Analysis, PageRank and Search Engines on the Web
Presented By: Wang Hao March 8 th, 2011 The PageRank Citation Ranking: Bringing Order to the Web Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd.
Source: Muangsin / Weiss1 Priority Queue (Heap) A kind of queue Dequeue gets element with the highest priority Priority is based on a comparable value.
Journal Status* Using the PageRank Algorithm to Rank Journals * J. Bollen, M. Rodriguez, H. Van de Sompel Scientometrics, Volume 69, n3, pp , 2006.
Important Problem Types and Fundamental Data Structures
Google and the Page Rank Algorithm Székely Endre
0 Course Outline n Introduction and Algorithm Analysis (Ch. 2) n Hash Tables: dictionary data structure (Ch. 5) n Heaps: priority queue data structures.
Motivation When searching for information on the WWW, user perform a query to a search engine. The engine return, as the query’s result, a list of Web.
Google’s PageRank: The Math Behind the Search Engine Author:Rebecca S. Wills, 2006 Instructor: Dr. Yuan Presenter: Wayne.
Graph Algorithms Ch. 5 Lin and Dyer. Graphs Are everywhere Manifest in the flow of s Connections on social network Bus or flight routes Social graphs:
Heapsort Based off slides by: David Matuszek
1 HEAPS & PRIORITY QUEUES Array and Tree implementations.
Priority Queues and Heaps Bryce Boe 2013/11/20 CS24, Fall 2013.
Google’s Billion Dollar Eigenvector Gerald Kruse, PhD. John ‘54 and Irene ‘58 Dale Professor of MA, CS and I T Interim Assistant Provost Juniata.
Data Structures Arrays both single and multiple dimensions Stacks Queues Trees Linked Lists.
CHAPTER 71 TREE. Binary Tree A binary tree T is a finite set of one or more nodes such that: (a) T is empty or (b) There is a specially designated node.
Compiled by: Dr. Mohammad Alhawarat BST, Priority Queue, Heaps - Heapsort CHAPTER 07.
Advanced Algorithms Analysis and Design Lecture 8 (Continue Lecture 7…..) Elementry Data Structures By Engr Huma Ayub Vine.
Chapter 9 – Graphs A graph G=(V,E) – vertices and edges
1 Applications of Relative Importance  Why is relative importance interesting? Web Social Networks Citation Graphs Biological Data  Graphs become too.
CSCA48 Course Summary.
WAES 3308 Numerical Methods for AI
Random Walks and Semi-Supervised Learning Longin Jan Latecki Based on : Xiaojin Zhu. Semi-Supervised Learning with Graphs. PhD thesis. CMU-LTI ,
1 University of Qom Information Retrieval Course Web Search (Link Analysis) Based on:
ADT Table and Heap Ellen Walker CPSC 201 Data Structures Hiram College.
CS315 – Link Analysis Three generations of Search Engines Anchor text Link analysis for ranking Pagerank HITS.
Spring 2010CS 2251 Trees Chapter 6. Spring 2010CS 2252 Chapter Objectives Learn to use a tree to represent a hierarchical organization of information.
The PageRank Citation Ranking: Bringing Order to the Web Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd Presented by Anca Leuca, Antonis Makropoulos.
Sorting. Pseudocode of Insertion Sort Insertion Sort To sort array A[0..n-1], sort A[0..n-2] recursively and then insert A[n-1] in its proper place among.
Keyword Search in Databases using PageRank By Michael Sirivianos April 11, 2003.
Lecture #10 PageRank CS492 Special Topics in Computer Science: Distributed Algorithms and Systems.
How works M. Ram Murty, FRSC Queen’s Research Chair Queen’s University or How linear algebra powers the search engine.
Final Exam Review CS Total Points – 60 Points Writing Programs – 50 Points Tracing Algorithms, determining results, and drawing pictures – 50.
Chapter 16 – Data Structures and Recursion. Data Structures u Built-in –Array –struct u User developed –linked list –stack –queue –tree Lesson 16.1.
1 1 COMP5331: Knowledge Discovery and Data Mining Acknowledgement: Slides modified based on the slides provided by Lawrence Page, Sergey Brin, Rajeev Motwani.
CPSC 252 Binary Heaps Page 1 Binary Heaps A complete binary tree is a binary tree that satisfies the following properties: - every level, except possibly.
Queues, Stacks and Heaps. Queue List structure using the FIFO process Nodes are removed form the front and added to the back ABDC FrontBack.
Heaps & Priority Queues
Lecture 8 : Priority Queue Bong-Soo Sohn Assistant Professor School of Computer Science and Engineering Chung-Ang University.
AVL Trees and Heaps. AVL Trees So far balancing the tree was done globally Basically every node was involved in the balance operation Tree balancing can.
1 CS 430: Information Discovery Lecture 5 Ranking.
1Computer Sciences. 2 HEAP SORT TUTORIAL 4 Objective O(n lg n) worst case like merge sort. Sorts in place like insertion sort. A heap can be stored as.
Ljiljana Rajačić. Page Rank Web as a directed graph  Nodes: Web pages  Edges: Hyperlinks 2 / 25 Ljiljana Rajačić.
CS 440 Database Management Systems Web Data Management 1.
A Sublinear Time Algorithm for PageRank Computations CHRISTIA N BORGS MICHAEL BRAUTBA R JENNIFER CHAYES SHANG- HUA TENG.
Nov 2, 2001CSE 373, Autumn Hash Table example marking deleted items + choice of table size.
1 CS 430 / INFO 430: Information Retrieval Lecture 20 Web Search 2.
Programming Abstractions
Search Engines and Link Analysis on the Web
Link-Based Ranking Seminar Social Media Mining University UC3M
PageRank and Markov Chains
Source: Muangsin / Weiss
Map interface Empty() - return true if the map is empty; else return false Size() - return the number of elements in the map Find(key) - if there is an.
original list {67, 33,49, 21, 25, 94} pass { } {67 94}
Graph Algorithms Ch. 5 Lin and Dyer.
CS 440 Database Management Systems
PageRank PAGE RANK (determines the importance of webpages based on link structure) Solves a complex system of score equations PageRank is a probability.
Important Problem Types and Fundamental Data Structures
Graph Algorithms Ch. 5 Lin and Dyer.
Heaps Chapter 6 Section 6.9.
Presentation transcript:

More Algorithms for Trees and Graphs Eric Roberts CS 106B March 11, 2013

Outline for Today The plan for today is to walk through some of my favorite tree and graph algorithms, partly to demystify the many real-world applications that make use of those algorithms and partly to emphasize the elegance and power of algorithmic thinking. These algorithms include: –Google’s Page Rank algorithm for searching the web –The Directed Acyclic Word Graph format used in Lexicon –The heap data structure used to create efficient priority queues

Google Larry Page and Sergey Brin The big innovation of the late 1990s is the development of search engines, which began with Alta Vista at DEC’s Western Research Lab and reaching its modern pinnacle with Google, which was founded by Stanford graduate students Larry Page and Sergey Brin in 1998.

Page Rank The heart of the Google search engine is the page rank algorithm, which was described in a 1999 by Larry Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank Citation Ranking: Bringing Order to the Web January 29, 1998 Abstract The importance of a Webpage is an inherently subjective matter, which depends on the reader’s interests, knowledge and attitudes. But there is still much that can be said objectively about the relative importance of Web pages. This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them. We compare PageRank to an idealized random Websurfer. We show how to efficiently compute PageRank for large numbers of pages. And,we show how to apply PageRank to search and to user navigation.

Page Rank Algorithm The page rank algorithm gives each page a rating of its importance, which is a recursively defined measure whereby a page becomes important if other important pages link to it. One way to think about page rank is to imagine a random surfer on the web, following links from page to page. The page rank of any page is roughly the probability that the random surfer will land on a particular page. Since more links go to the important pages, the surfer is more likely to end up there.

Markov Processes Google’s random surfer is an example of a Markov process, in which a system moves from state to state, based on probability information that shows the likelihood of moving from each state to every other possible state. A simple example of a Markov process is illustrated by this table, which shows the likelihood of a particular weather pattern for tomorrow given the weather for today. What, then, is the likely weather two days from now, given that you know what the weather looks like today? What if you then repeat the process for ten days? If today is That far out, it doesn’t matter what today’s weather is Tomorrow will beThe day after tomorrow will beTen days from now will be

Google’s Page Rank Algorithm

The Page Rank Algorithm AB D E C Start with a set of pages.1.

The Page Rank Algorithm AB D E C Crawl the web to determine the link structure.2.

The Page Rank Algorithm AB D E C Assign each page an initial rank of 1 / N

The Page Rank Algorithm Successively update the rank of each page by adding up the weight of every page that links to it divided by the number of links emanating from the referring page. 4. D E C 0.2 In the current example, page E has two incoming links, one from page C and one from page D. Page C contributes 1/3 of its current page rank to page E because E is one of three links from page C. Similarly, page D offers 1/2 of its rank to E. The new page rank for E is PR( E ) = PR( C )PR( D ) =  0.17

The Page Rank Algorithm If a page (such as E in the current example) has no outward links, redistribute its rank equally among the other pages in the graph. 5. D E C 0.2 In this graph, 1/4 of E ’s page rank is distributed to pages A, B, C, and D. The idea behind this model is that users will keep searching if they reach a dead end.

The Page Rank Algorithm AB D E C Apply this redistribution to every page in the graph

The Page Rank Algorithm AB D E C Repeat this process until the page ranks stabilize

The Page Rank Algorithm AB D E C In practice, the Page Rank algorithm adds a damping factor at each stage to model the fact that users stop searching

Page Rank Relies on Mathematics

Directed Acyclic Word Graphs The Lexicon class provides a highly time- and space-efficient representation of a word list. The data representation used in the Lexicon class is called a Directed Acyclic Word Graph or DAWG, which was first described in a 1988 paper by Appel and Jacobson. The DAWG structure is based on a much older data structure called a trie, developed by Ed Fredkin in (The trie data structure is described in the text on page 490.)

The Trie Data Structure The trie representation of a word list uses a tree in which each arc is labeled with a letter of the alphabet. In a trie, the words themselves are represented implicitly as paths to a node.

Going to the DAWGs The new insight in the DAWG is that you can combine nodes that represent common endings. As the authors note, “this minimization produces an amazing savings in space; the number of nodes is reduced from 117,150 to 19,853.”

The Heap Algorithm If you implement them in the obvious way using either arrays or linked lists, priority queues require O(N) time. Because priority queues are essential to both Dijkstra’s and Kruskal’s algorithms, making them efficient improves performance. The standard algorithm for implementing priority queues uses a data structure called a heap, which makes it possible to implement priority queue operations in O(log N) time. A heap is a binary tree with three additional properties: –The tree is complete, which means that it is not only completely balanced but that each level of the tree is filled as far to the left as possible. –The root node of the tree has higher priority than the root of either of its subtrees. –Every subtree is also a heap.

Exercise: Tracing the Heap Algorithm Insert in order:

Exercise: Tracing the Heap Algorithm Insert in order:

Exercise: Tracing the Heap Algorithm Insert in order:

Exercise: Tracing the Heap Algorithm Insert in order:

Exercise: Tracing the Heap Algorithm Insert in order:

Exercise: Tracing the Heap Algorithm Insert in order:

Exercise: Tracing the Heap Algorithm Insert in order:

Exercise: Tracing the Heap Algorithm Insert in order:

Exercise: Tracing the Heap Algorithm Insert in order:

Exercise: Tracing the Heap Algorithm Insert in order: Dequeue the top priority element11

Exercise: Tracing the Heap Algorithm Insert in order: Dequeue the top priority element11

The End

Exercise: Tracing the Heap Algorithm Insert in order: Dequeue 11