Social Networking Algorithms
Related sections to read in Networked Life: 2.1, 2.3, 3.1, 4.1, 5.1, 6.1-6.2, 8.1, 9.1.

Presentation transcript:


The Network Effect
Metcalfe's law: the value of a telecommunications network is proportional to the square of the number of connected users of the system (n^2)
Facebook friends
Twitter followers
collective opinions on news/products/movies...
videos or products or memes going “viral”
–if you tell two friends, and they each tell 2 friends... it scales up exponentially to thousands of people in just a few steps
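The two growth rates on this slide can be put side by side with a small sketch. The function names are hypothetical, and the constant of proportionality in Metcalfe's law is assumed to be 1 for illustration.

```python
def metcalfe_value(n: int) -> int:
    """Network value under Metcalfe's law, up to a constant factor: n^2."""
    return n * n


def newly_told(steps: int, fanout: int = 2) -> int:
    """People told at a given step when every person tells `fanout` friends."""
    return fanout ** steps


if __name__ == "__main__":
    # Quadratic growth in value as users join the network.
    for n in (10, 100, 1000):
        print(f"users={n:5d}  value ~ {metcalfe_value(n)}")
    # Exponential growth in reach as a message is passed along.
    for steps in (1, 5, 10):
        print(f"step={steps:2d}  newly told={newly_told(steps)}")
```

With a fan-out of 2, the count of people told passes one thousand at step 10.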

Small Worlds phenomenon
social networks are not the same as the physical network (because your friends can be remote)
also a scale-free topology (power law/long-tail distribution)
6 degrees of separation (Milgram’s experiment)
community structure

Exploiting the Network Effect
eBay – price discovery through auctions
Netflix – recommendations based on others’ preferences
Reddit – reputation based on others’ opinions on your posts
Crowd-sourcing
–is there value in the aggregate opinion?
–examples: ratings on Amazon or TripAdvisor or YouTube
–combines multiple experts (as well as non-experts)
–filters out bias of a few extreme opinions (since you don’t know who to trust)

Google Search
PageRank algorithm
crawling (follow hyperlinks embedded in HTML)
>50 billion pages indexed (2012) (not counting intranets) source:
indexing
assessing relevance:
–number of times the keyword is mentioned
–proximity/order
–title/heading, bold/font size
–what makes a page “authoritative”?
users only look at the top 3-10 hits, so what gets ranked at the top is crucial

Inverted Index
Basic document retrieval
–Build an index of all pages that contain each search term
–For multi-word searches, like “functional programming languages”, take the intersection of the sets of documents containing each search term
–Does it matter how many times a page mentions a search term? (does this reflect importance? No)
–what about dealing with spelling errors, stemming, synonyms, semantic relationships?
–more complex Boolean queries (or, not)
How do you do this for 50 billion pages?
–Google distributes computation over a cluster of computers using MapReduce
–programming functions to distribute tasks and assemble results
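The basic retrieval steps on this slide can be sketched in a few lines. This is a minimal single-machine illustration, not how a production search engine works: the page ids, page texts, and function names are made up, and tokenization is assumed to be a plain lowercase whitespace split (no stemming or spelling correction).

```python
from collections import defaultdict


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of page ids whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for page_id, text in pages.items():
        for term in text.lower().split():
            index[term].add(page_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the pages containing every term of the query (Boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical pages, for illustration only.
pages = {
    "p1": "functional programming languages like Haskell",
    "p2": "programming languages for the web",
    "p3": "functional analysis lecture notes",
}
index = build_index(pages)

if __name__ == "__main__":
    print(search(index, "functional programming languages"))
    print(search(index, "programming languages"))
```

Only p1 contains all three terms of the first query, so it is the only hit; the second query matches both p1 and p2. An "or" query would use a set union instead of the intersection.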

Which search hits are most important?
–having many Twitter followers does not make you an expert (popularity ≠ expertise)
–similarly, lots of hyperlinks to a page does not mean it is authoritative
The web-graph: G=(V,E)
–hyperlinks = directed edges
–strongly connected components
–adjacency matrix (sparse)
[Figure: example web pages. Joe Student’s home page (“I am a student at Texas A&M”, “I write code in Java”), Texas A&M, Java (java.sun.com), Bowling League (Members... Joe)]
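One common way to store a sparse directed graph is to keep only the edges that exist, as a mapping from each page to the set of pages it links to. The sketch below uses page names borrowed from the slide's figure, but the specific edges are an assumption made for illustration; the helper name is hypothetical.

```python
# Out-links per page: links[p] is the set of pages that p links to.
# The edges below are assumed for illustration only.
links: dict[str, set[str]] = {
    "joe": {"tamu", "java"},   # Joe's home page mentions Texas A&M and Java
    "bowling": {"joe"},        # the bowling league lists Joe as a member
    "tamu": set(),
    "java": set(),
}


def in_links(out_links: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert the directed edges: for each page, which pages link to it."""
    incoming: dict[str, set[str]] = {page: set() for page in out_links}
    for src, targets in out_links.items():
        for dst in targets:
            incoming.setdefault(dst, set()).add(src)
    return incoming


if __name__ == "__main__":
    out_degree = {page: len(targets) for page, targets in links.items()}
    print(out_degree)
    print(in_links(links))
```

A dense adjacency matrix for n pages needs n^2 entries, almost all zero; this representation stores one entry per hyperlink.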

PageRank
need trust/reputation models?
“importance” of a node x_i is based on:
–importance of the neighbors who link to you (x_j)
–weights 1/d_j distribute a node’s importance over the nodes it links to
–modify the equations to handle unlinked pages
[Figure: nodes labeled x_j and x_i]
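Written out, the two bullets on importance and weights correspond to the following equation, given here as a sketch of the basic form before any modification for unlinked pages. It assumes d_j denotes the number of outgoing links of node j and that the sum runs over all nodes j that link to node i.

```latex
x_i = \sum_{j \,:\, j \to i} \frac{x_j}{d_j}
```

Each node j splits its importance x_j evenly over its d_j outgoing links, and node i collects the shares sent to it.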

system of coupled equations
–iterative solutions
–algorithms that start with random importances and adjust them until all the x_i’s are mutually consistent (convergence)
in matrix form, this becomes an eigenvalue problem (hard to calculate)
–x is a vector of importances
–H is the weighted adjacency matrix
x = Hx
x1=0.128, x2=0.159, x3=0.202, x4=0.150, x5=0.106, x6=0.044, x7=0.060, x8=0.145
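The iterative approach can be sketched as repeated application of x = Hx until the importances stop changing. This is a minimal illustration under stated assumptions, not the production algorithm: the three-page graph and the function name are hypothetical, every page is assumed to have at least one outgoing link (so no correction for unlinked pages is needed), and the starting vector is uniform rather than random so that the run is reproducible.

```python
def pagerank_iterate(links, tol=1e-10, max_iter=1000):
    """Repeat x <- Hx until the total change in importances is below tol.

    links[j] is the set of pages that page j links to. Assumes every page
    has at least one out-link and every link target is a key of `links`.
    """
    pages = list(links)
    x = {p: 1.0 / len(pages) for p in pages}  # uniform start, sums to 1
    for _ in range(max_iter):
        new_x = {p: 0.0 for p in pages}
        for j, targets in links.items():
            share = x[j] / len(targets)  # weight 1/d_j on each out-link
            for i in targets:
                new_x[i] += share
        change = sum(abs(new_x[p] - x[p]) for p in pages)
        x = new_x
        if change < tol:
            break
    return x


# Hypothetical graph: A links to B and C, B links to C, C links to A.
example_links = {"A": {"B", "C"}, "B": {"C"}, "C": {"A"}}
ranks = pagerank_iterate(example_links)

if __name__ == "__main__":
    for page, score in sorted(ranks.items()):
        print(page, round(score, 3))
```

For this graph the consistent solution can be checked by hand: x_A = x_C, x_B = x_A/2, and the values sum to 1, giving x_A = 0.4, x_B = 0.2, x_C = 0.4. The iteration converges to the same values.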