Published by George Robbins. Modified over 9 years ago.
1
PageRank
2
A Search Engine
3
“obama”
4
PageRank Model: Final Version
The Web: a directed graph
– Vertices (pages)
– Edges (links)
[Figure: example directed graph with vertices a, b, c, d, e, f]
5
Input Structure
41.5 million edges
5.4 million nodes
Columns: document-with-link | document-linked
6
Step 1. Dictionary Encode Links
Strings are difficult to fit in memory
Encode strings as OIDs (object IDs = integers)
Input line: http://es.dbpedia.org/resource/Ciencia_ficción http://es.dbpedia.org/resource/Robot
Output line: 12039 52673
Dictionary:
12039 http://es.dbpedia.org/resource/Ciencia_ficción
…
52673 http://es.dbpedia.org/resource/Robot
…
OIDCompress -i [folder]/page_links_es.tsv.gz -igz -o [folder]/page_links_es.oid.gz -ogz -d [folder]/page_links_es.dict.gz -dgz
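The dictionary-encoding idea in Step 1 can be sketched in a few lines of Java. This is only an illustration and not the OIDCompress tool itself: the class name `DictionaryEncoder` and its methods are hypothetical, the input is assumed to be tab-separated (as the `.tsv` extension suggests), and OIDs are assigned sequentially from 0, which will not reproduce the specific OIDs shown on the slide.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; NOT the course's OIDCompress tool.
class DictionaryEncoder {
    // Maps each distinct string to the integer OID it was given on first sight.
    private final Map<String, Integer> oids = new HashMap<>();
    // Reverse mapping: position i holds the string whose OID is i.
    private final List<String> strings = new ArrayList<>();

    // Returns the existing OID for s, or assigns the next unused integer.
    int encode(String s) {
        Integer oid = oids.get(s);
        if (oid == null) {
            oid = strings.size();
            oids.put(s, oid);
            strings.add(s);
        }
        return oid;
    }

    // Looks up the original string for a previously assigned OID.
    String decode(int oid) {
        return strings.get(oid);
    }

    // Encodes one "source<TAB>target" line as "oid<TAB>oid" (tab separator is an assumption).
    String encodeLine(String line) {
        String[] parts = line.split("\t");
        return encode(parts[0]) + "\t" + encode(parts[1]);
    }

    public static void main(String[] args) {
        DictionaryEncoder enc = new DictionaryEncoder();
        String line = "http://es.dbpedia.org/resource/Ciencia_ficción"
                + "\t" + "http://es.dbpedia.org/resource/Robot";
        System.out.println(enc.encodeLine(line)); // the encoded edge
        System.out.println(enc.decode(1));        // the string behind OID 1
    }
}
```

Once every URL has been replaced by an integer, each edge costs two integers instead of two long strings, and the dictionary only has to be consulted again when the final ranks are decoded.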
7
Step 2. Write PageRank Algorithm
PageRankGraph.rankGraph(int[][] graph)
int[] out = graph[i];
– out contains the nodes linked from node i
– it might be empty or null if node i doesn’t link to anything!
Two rank vectors: rank[graph.length], nextRank[graph.length]
Initial rank values set to 1d / graph.length
Run ITERS iterations
– compute the edge-invariant rank once per iteration (shown in red and blue on the slide); this requires keeping track of the sum of the ranks of nodes with no outlinks from the previous round
– for each node (shown in orange on the slide), split its rank[] by the number of outlinks it has, and add the result to the nextRank[] of each node it links to
– the sum of the ranks after each round should be very close to 1
Test on -i data/test-graph.txt -o data/test-data.txt
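One possible shape for the loop described in Step 2 is sketched below. It is not the reference lab solution: the class name `PageRankSketch`, the `double[]` return type, the damping factor `D = 0.85` and `ITERS = 50` are all assumptions, since the slides name `ITERS` but give no values. The "edge-invariant rank" is interpreted here as the share every node receives regardless of its inlinks: the random-jump term plus an even split of the rank held by nodes with no outlinks.

```java
import java.util.Arrays;

// Illustrative sketch only; names and constants are assumptions, not the lab's code.
class PageRankSketch {
    static final int ITERS = 50;   // assumed iteration count
    static final double D = 0.85;  // assumed damping factor

    // graph[i] lists the nodes linked from node i; it may be empty or null.
    static double[] rankGraph(int[][] graph) {
        int n = graph.length;
        double[] rank = new double[n];
        double[] nextRank = new double[n];
        Arrays.fill(rank, 1d / n);

        for (int iter = 0; iter < ITERS; iter++) {
            // Sum the rank held by nodes with no outlinks in the previous round.
            double danglingSum = 0d;
            for (int i = 0; i < n; i++) {
                int[] out = graph[i];
                if (out == null || out.length == 0) {
                    danglingSum += rank[i];
                }
            }

            // Edge-invariant share, computed once per iteration and given to every node.
            double base = (1d - D) / n + D * danglingSum / n;
            Arrays.fill(nextRank, base);

            // Split each node's rank evenly across the nodes it links to.
            for (int i = 0; i < n; i++) {
                int[] out = graph[i];
                if (out == null || out.length == 0) {
                    continue;
                }
                double share = D * rank[i] / out.length;
                for (int target : out) {
                    nextRank[target] += share;
                }
            }

            // Swap the two vectors so nextRank becomes the input of the next round.
            double[] tmp = rank;
            rank = nextRank;
            nextRank = tmp;
        }
        return rank;
    }

    public static void main(String[] args) {
        // Tiny test graph: 0 -> 1, 2; 1 -> 2; 2 -> 0; node 3 has no outlinks.
        int[][] graph = { {1, 2}, {2}, {0}, null };
        double[] rank = rankGraph(graph);
        System.out.println(Arrays.toString(rank));
        System.out.println("sum = " + Arrays.stream(rank).sum());
    }
}
```

Because the rank of nodes without outlinks is redistributed rather than dropped, the printed sum should stay very close to 1 after every round, which is the sanity check the slide recommends.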
8
Step 3. Rank the full data
Run ranking
-i [folder]/page_links_es.oid.gz -igz -o [folder]/page_ranks_es.oid.gz -ogz
Sort by rank
-i [folder]/page_ranks_es.oid.gz -igz -o [folder]/page_ranks_es_s.oid.gz -ogz
Decompress the file
-d [folder]/page_links_es.dict.gz -dgz -i [folder]/page_ranks_es_s.oid.gz -igz -n 0 -o [folder]/page_ranks_es_s.tsv.gz -ogz
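The sort step in this pipeline can be pictured in memory as ordering node OIDs by their rank. The sketch below is an assumption-laden illustration rather than the course tool: the class name `RankSorter` is hypothetical, the ranks are assumed to fit in a `double[]` indexed by OID, and the order is assumed to be highest rank first. The full-data files are sorted by a separate command, so this only shows the idea on a small array.

```java
import java.util.Arrays;

// Hypothetical in-memory illustration of "sort by rank"; not the course's sorting tool.
class RankSorter {
    // Returns node ids ordered from highest to lowest rank (descending order is assumed).
    static Integer[] sortByRank(double[] rank) {
        Integer[] ids = new Integer[rank.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        // Compare b against a so that larger ranks come first.
        // Arrays.sort on objects is stable, so ties keep ascending id order.
        Arrays.sort(ids, (a, b) -> Double.compare(rank[b], rank[a]));
        return ids;
    }

    public static void main(String[] args) {
        double[] rank = { 0.2, 0.5, 0.3 };
        for (int id : sortByRank(rank)) {
            // After sorting, each id would be decoded back to its URL via the dictionary.
            System.out.println(id + "\t" + rank[id]);
        }
    }
}
```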
9
Course Marking
45% for Weekly Labs (~3% a lab!)
35% for Final Exam
20% for Small Class Project
10
Class Project
Done in pairs (except Alejandro :P)
Goal: Use what you’ve learned to do something cool (basically)
Expected difficulty: More than a lab’s worth
– But from scratch / without my help!
Marked on: Difficulty, appropriateness, scale, good use of techniques, presentation, coolness
– Ambition is appreciated, even if you don’t succeed: feel free to bite off more than you can chew!
Process:
– Pair up (default random) by Wednesday
– Decide on a topic (by June 9th) or let me assign one
– If you need data or get stuck, I will (try to) help out
Deliverables: 10-minute presentation (June 23rd) & 4-page report