Social Networking Algorithms
Related sections to read in Networked Life: 2.1, 2.3, 3.1, 4.1, 5.1, 6.1-6.2, 8.1, 9.1.

Google Search
–PageRank algorithm
–crawling (following hyperlinks embedded in HTML)
–over 50 billion pages indexed as of 2012, not counting intranets; source:
indexing
assessing relevance:
–number of times a keyword is mentioned
–proximity/order of the keywords
–title/heading, bold/font size
–what makes a page "authoritative"?
users only look at the top 3-10 hits, so what gets ranked at the top is crucial
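
As a toy illustration of the keyword-based relevance signals listed above, the sketch below scores a page by how often the query terms occur, counting title matches more heavily than body matches. It is only a sketch under stated assumptions, not how Google ranks pages: the `relevance` function, the `title_weight` of 3, and the sample pages are all hypothetical, and proximity and "authority" are ignored.

```python
def relevance(query, title, body, title_weight=3):
    """Toy score: occurrences of each query term, with title hits weighted more.
    The weighting scheme and title_weight value are hypothetical."""
    terms = query.lower().split()
    title_words = title.lower().split()
    body_words = body.lower().split()
    return sum(title_weight * title_words.count(t) + body_words.count(t) for t in terms)

# Hypothetical pages: name -> (title, body text)
pages = {
    "slug facts": ("Banana Slug", "the banana slug is a yellow slug"),
    "fruit guide": ("Tropical Fruit", "banana mango papaya banana"),
}
query = "banana slug"
ranked = sorted(pages, key=lambda p: -relevance(query, *pages[p]))
print(ranked)  # ['slug facts', 'fruit guide']
```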

Inverted Index
Document collection (web pages):
doc[0] = "all about the banana slug"
doc[1] = "nutritional content of bananas"
doc[2] = "bananas of the world"
doc[3] = "nutrition for athletes"
Index (term: documents containing it):
all: 0
about: 0
athlete(s): 3
banana(s): 0, 1, 2
content: 1
for: 3
nutrition/al: 1, 3
of: 1, 2
the: 0, 2
slug: 0
world: 2
query: "banana nutrition" → {0, 1, 2} ∩ {1, 3} = {1}
document retrieval
–intersection of search terms
–what about spelling errors, stemming, synonyms, semantic relationships?
–more complex Boolean queries (OR, NOT)
computation is distributed over many computers using MapReduce
–programming functions to distribute tasks and assemble results
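
The index above can be built and queried in a few lines. This minimal sketch uses the slide's four documents; `normalize` is a deliberately crude, hypothetical stand-in for real stemming that merges just enough word forms ("bananas", "nutritional", "athletes") to reproduce the slide's index. An AND query is the intersection of the terms' posting sets (an OR query would be their union).

```python
from collections import defaultdict

docs = [
    "all about the banana slug",
    "nutritional content of bananas",
    "bananas of the world",
    "nutrition for athletes",
]

def normalize(word):
    """Crude, hypothetical stemmer: strips '-al' and '-s' from longer words."""
    word = word.lower()
    if len(word) > 5 and word.endswith("al"):  # nutritional -> nutrition
        word = word[:-2]
    if len(word) > 3 and word.endswith("s"):   # bananas -> banana, athletes -> athlete
        word = word[:-1]
    return word

def build_index(docs):
    """Map each normalized term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in text.split():
            index[normalize(word)].add(doc_id)
    return index

def search(index, query):
    """AND query: intersect the posting sets of all query terms."""
    postings = [index.get(normalize(w), set()) for w in query.split()]
    return set.intersection(*postings) if postings else set()

index = build_index(docs)
print(sorted(index["banana"]))                    # [0, 1, 2]
print(sorted(search(index, "banana nutrition")))  # [1]
```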

the web graph G = (V, E)
–hyperlinks = directed edges
–strongly connected components
–adjacency matrix (sparse)
which pages are important?
–number of connections (degree, centrality)?
–number of in-edges (mentions/references)?
[Figure: an example web graph in which "Joe Student's Home Page" ("I am a student at Texas A&M. I write code in Java.") is connected to pages for Texas A&M, Java (java.sun.com), and a Bowling League members page that mentions Joe]
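
Counting in-edges is the simplest of the importance measures asked about above. This sketch stores the graph sparsely (only the edges that exist) and ranks pages by in-degree; the edge list is hypothetical, loosely modeled on the figure, and the Texas A&M to Java edge is invented for illustration.

```python
from collections import defaultdict

# Hypothetical edge list (page -> page it links to), loosely based on the figure.
edges = [
    ("joe", "tamu"),
    ("joe", "java"),
    ("joe", "bowling"),
    ("bowling", "joe"),
    ("tamu", "java"),
]

out_links = defaultdict(set)  # sparse adjacency: only existing edges are stored
in_degree = defaultdict(int)
for src, dst in edges:
    if dst not in out_links[src]:  # ignore duplicate links
        out_links[src].add(dst)
        in_degree[dst] += 1

# Most-referenced pages first; ties broken alphabetically.
ranking = sorted(in_degree, key=lambda page: (-in_degree[page], page))
print(ranking)  # ['java', 'bowling', 'joe', 'tamu']
```

In-degree treats every link as equally valuable, which is exactly the weakness PageRank addresses by weighting links by the importance of the page they come from.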

PageRank
need trust/reputation models?
the "importance" x_i of a node i is based on:
–the importance x_j of the neighbors j that link to it
–weights 1/d_j, where d_j is the number of links out of node j, which distribute a node's importance over the nodes it links to
–the equations must be modified to handle unlinked pages
$$x_i = \sum_{j \to i} \frac{x_j}{d_j}$$

system of coupled equations
–iterative solutions
–algorithms start with random importances and adjust them until all the x_i's are mutually consistent (convergence)
in matrix form, this becomes an eigenvalue problem (hard to calculate directly)
–x is the vector of importances
–H is the weighted adjacency matrix
$$x = Hx$$
i.e., x is an eigenvector of H with eigenvalue 1
example importances for an eight-node graph: x1 = 0.128, x2 = 0.159, x3 = 0.202, x4 = 0.150, x5 = 0.106, x6 = 0.044, x7 = 0.060, x8 = 0.145
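
The iterative solution described above can be sketched as power iteration. This is a minimal sketch under stated assumptions: it adds a random-jump (damping) term with `damping=0.85`, a commonly used value, and lets pages with no out-links spread their importance evenly over all pages, which is one common way to handle unlinked pages. It starts from equal rather than random importances; with damping below 1 the fixed point is the same either way. The four-page graph is hypothetical.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Iteratively solve x = Hx, with a random-jump (damping) term added.

    links maps each page to the list of pages it links to. Pages with no
    out-links spread their importance evenly over all pages.
    """
    nodes = sorted(set(links) | {dst for outs in links.values() for dst in outs})
    n = len(nodes)
    x = {v: 1.0 / n for v in nodes}  # start from equal importances
    for _ in range(max_iter):
        dangling = sum(x[v] for v in nodes if not links.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src, outs in links.items():
            for dst in outs:
                new[dst] += damping * x[src] / len(outs)  # weight 1/d_j
        if sum(abs(new[v] - x[v]) for v in nodes) < tol:  # converged
            return new
        x = new
    return x

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}  # hypothetical graph
ranks = pagerank(links)
print(sorted(ranks, key=ranks.get, reverse=True))  # ['C', 'A', 'B', 'D']
```

Each iteration keeps the importances summing to 1, so they can be read as the long-run fraction of time a random surfer spends on each page.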

The Network Effect
Metcalfe's law: the value of a telecommunications network is proportional to the square of the number of connected users of the system (n²)
going viral (videos and memes)
–if you tell two friends, and they each tell two friends, and so on, the message reaches thousands of people in just a few steps (exponential growth)
Small Worlds phenomenon
–social networks are not the same as physical networks
–they also have a scale-free topology (power law)
–six degrees of separation (Milgram); community structure
crowd-sourcing: is there value in the aggregate opinion?
–combines multiple experts (as well as boneheads and malefactors)
–filters out the bias of a few extreme opinions (since you don't know whom to trust)
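
The arithmetic behind the two growth claims above can be checked directly. The sketch counts possible pairwise connections, n(n-1)/2, one standard way to motivate the n² scaling in Metcalfe's law, and the number of people reached when each newly told person tells two others. Both function names are hypothetical, and the spreading model assumes nobody hears the news twice.

```python
def possible_connections(n):
    """Distinct pairs among n users, n(n-1)/2, which grows like n**2."""
    return n * (n - 1) // 2

def people_reached(steps, fanout=2):
    """People told after `steps` rounds if everyone newly told passes the news
    to `fanout` new friends (assumes nobody hears it twice)."""
    return sum(fanout ** k for k in range(1, steps + 1))

for n in (10, 100, 1000):
    print(n, possible_connections(n))  # 45, 4950, 499500 pairs
print(people_reached(10))              # 2046 people after 10 rounds
```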

Recommender Systems
Netflix, Pandora
–how can we benefit from the evaluations of others?
long-tail distribution for media
–there are MANY movies, songs, etc.
–most are rarely listened to
–yet each individual has eclectic tastes
–if a person likes X and Y, how can we predict which other item Z they will like?
similarity (collaborative filtering)
–not just the intersection of common features of X and Y
–exploit what other people with similar tastes like
–each user rates only a few items, so the data are sparse
–merge the ratings and extract correlations; latent factors?
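
A minimal sketch of user-based collaborative filtering, one common form of the idea above: find users whose ratings resemble yours (here, cosine similarity over the items both rated) and predict your rating for Z as their similarity-weighted average. The ratings, user names, and the `cosine`/`predict` helpers are hypothetical.

```python
from math import sqrt

# Hypothetical sparse ratings: user -> {item: rating on a 1-5 scale}
ratings = {
    "ann":  {"X": 5, "Y": 4, "Z": 5},
    "bob":  {"X": 4, "Y": 5, "Z": 4},
    "cara": {"X": 1, "Y": 2, "W": 5},
    "you":  {"X": 5, "Y": 5},
}

def cosine(u, v):
    """Cosine similarity computed over the items both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict(user, item):
    """Similarity-weighted average of other users' ratings for `item` (None if nobody rated it)."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None

print(round(predict("you", "Z"), 2))  # 4.5
```

Because raw 1-5 ratings are all positive, plain cosine rates almost everyone as similar (cara, who rated X and Y low, still scores about 0.95 against "you"); subtracting each user's mean rating first (Pearson correlation) is a common refinement.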

Machine Learning
–other people who have watched movies with Ron Perlman tend to also like...
–given a set of ratings by users u of movies i, $\{(u,i)\}$ or $\{r_{ui}\}$, build a predictive model
–accuracy:
–Netflix Prize: around 100 million anonymous ratings released as a training set, from 480k users on 17k movies
–2009: the grand prize of US$1,000,000 was awarded to the BellKor's Pragmatic Chaos team, which beat Netflix's own algorithm for predicting ratings by 10.06%
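
The slide's accuracy formula did not survive; the Netflix Prize scored entries by root-mean-square error (RMSE), so this sketch assumes that metric. It fits a simple baseline model, global mean plus a per-item and a per-user offset (a standard starting point, not the prize-winning method), to hypothetical ratings $r_{ui}$ and measures its RMSE. No regularization or held-out test set is used, for brevity.

```python
from math import sqrt
from statistics import mean

# Hypothetical training ratings r_ui as (user, item, rating) triples.
train = [
    ("u1", "i1", 5), ("u1", "i2", 3),
    ("u2", "i1", 4), ("u2", "i3", 2),
    ("u3", "i2", 4), ("u3", "i3", 1),
]

def fit_baseline(triples):
    """Baseline model r_hat_ui = mu + b_i + b_u (global mean plus item and user
    offsets), fitted by simple averaging with no regularization."""
    mu = mean(r for _, _, r in triples)
    b_i = {item: mean(r - mu for _, i, r in triples if i == item)
           for item in {i for _, i, _ in triples}}
    b_u = {user: mean(r - mu - b_i[i] for u, i, r in triples if u == user)
           for user in {u for u, _, _ in triples}}

    def predict(user, item):
        # Unknown users or items fall back to an offset of 0.
        return mu + b_i.get(item, 0.0) + b_u.get(user, 0.0)

    return predict

def rmse(predict, triples):
    """Root-mean-square error of the predictions against the actual ratings."""
    return sqrt(mean((predict(u, i) - r) ** 2 for u, i, r in triples))

predict = fit_baseline(train)
print(round(rmse(predict, train), 3))
```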

Aggregating Ratings
reviews on Amazon, TripAdvisor, Rotten Tomatoes (movies)...
trust, reputation, shills
–weight each reviewer by consistency?
wisdom of the crowd
–Galton's experiment (1906): guessing the weight of an ox
–subjectivity of hotel recommendations
can you trust the average rating?
–it also depends on the number of reviews and their dispersion (does the number of 1's matter?)
[Table: ratings for items A, B, and C, with the average for each]
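
To see why the count and dispersion of reviews matter alongside the average, this sketch reports each item's number of ratings, mean, and spread, plus a "damped" (Bayesian-style) average that pulls items with few reviews toward a prior. The damped average is one common technique, not something the slide prescribes; the ratings, the prior of 3.0, and the weight of 5 pseudo-ratings are all hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical 1-5 star ratings for three items.
items = {
    "A": [5, 5, 5, 1, 1, 1, 5, 5],  # polarized: many 5's and several 1's
    "B": [4, 3, 4, 3, 4, 3, 4, 3],  # same average as A, but consistent
    "C": [5, 5],                    # perfect average from only two reviews
}

def damped_mean(ratings, prior=3.0, weight=5):
    """Blend the item's own average with a prior value, as if `weight` extra
    ratings equal to `prior` had been observed."""
    return (sum(ratings) + prior * weight) / (len(ratings) + weight)

for name, rs in items.items():
    # name, number of ratings, mean, population standard deviation, damped mean
    print(name, len(rs), round(mean(rs), 2), round(pstdev(rs), 2), round(damped_mean(rs), 2))
```

Items A and B share a 3.5 average but very different spreads, and item C's perfect score is discounted because it rests on only two reviews.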

Auctions
examples:
–eBay
–Google ad space (companies bid on search terms and position on the page)
–broadcasting spectrum (airwaves, FCC)
an efficient, decentralized mechanism for allocating resources among many parties (exploits market forces)
goals:
–maximize value for the auctioneer
–minimize cost for buyers; make bidding simple, not strategic
–fairness, freedom from manipulation
utility functions (values to self-interested agents)

Auctions
types of auction mechanisms:
–public (open-outcry) vs. sealed-bid
–ascending vs. descending
–first-price vs. second-price
Vickrey (second-price, sealed-bid) auction
–no incentive to under- or over-bid
–no winner's remorse
–one can show that bidding your true value is a Nash equilibrium strategy
current research: combinatorial auctions
–bids for multiple items coupled together
–algorithms for winner determination? (NP-hard)
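
A minimal sketch of the Vickrey auction above: the highest sealed bid wins, but the winner pays the second-highest bid. Sweeping one bidder's bid against fixed rival bids shows why there is no incentive to under- or over-bid: no bid does better than bidding one's true value. The bidder names, values, and alphabetical tie-breaking rule are hypothetical choices.

```python
def vickrey(bids):
    """Second-price sealed-bid auction: the highest bidder wins but pays the
    second-highest bid. Ties go to the alphabetically first bidder (an arbitrary choice)."""
    ranked = sorted(bids.items(), key=lambda kv: (-kv[1], kv[0]))
    winner = ranked[0][0]
    price = ranked[1][1] if len(ranked) > 1 else 0
    return winner, price

def utility(bidder, value, bids):
    """Payoff to `bidder`: value minus price if they win, otherwise 0."""
    winner, price = vickrey(bids)
    return value - price if winner == bidder else 0

rivals = {"bob": 70, "cara": 55}  # hypothetical sealed bids from the other bidders
true_value = 80                   # alice's private value for the item
for bid in (60, 75, 80, 90, 120):
    print(bid, utility("alice", true_value, {**rivals, "alice": bid}))
# bidding 60 loses (payoff 0); every bid above 70 wins at price 70 (payoff 10)
```

The price the winner pays does not depend on their own bid, so overbidding can only add wins at a loss (for example, bidding 120 against a rival's 100 pays 100 for an item worth 80), and underbidding can only forfeit profitable wins.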

Electronic Voting
Rank Aggregation
–a social choice mechanism
–unlike the US system, imagine you can vote for N candidates by ranking them in order of preference
–other applications: voting for Olympics venues or baseball all-stars out of a defined list of possibilities
candidate | voter 1 | voter 2 | voter 3 | voter 4
A | 1 | 1 | 1 | 2
B | 3 | 2 | 3 | 1
C | 2 | 3 | 2 | 3

Another example: Meta-search
–merging search-engine results
–Cynthia Dwork (WWW, 2001)
–by merging the top hits from Google, Bing, Yahoo, AltaVista, etc., could you get a better combined list?
–search results are usually sparse: a given page might not be on every list of results
–how should you rank a page that different engines ranked 2nd, 3rd, and 101st?
–what if one of the engines is paid to rank certain sites highly? (web-search "spam")
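
One simple way to merge partial result lists is positional: add up each page's position across engines and sort by the total. The catch raised above is pages missing from some lists; this sketch assumes a missing page sits just below that list's last entry. That convention, the function name, and the engine results are hypothetical, and this is not the more refined method of Dwork's paper.

```python
def merge_partial_lists(result_lists):
    """Merge top-k lists by total position (lower is better). A page missing from
    a list is treated as ranked just below that list's last entry (an assumption)."""
    pages = {p for lst in result_lists for p in lst}

    def total_position(page):
        return sum(lst.index(page) + 1 if page in lst else len(lst) + 1
                   for lst in result_lists)

    return sorted(pages, key=lambda p: (total_position(p), p))

# Hypothetical top-4 results from three engines.
engine1 = ["p1", "p2", "p3", "p4"]
engine2 = ["p2", "p1", "p5", "p3"]
engine3 = ["p2", "p5", "p1", "p6"]
merged = merge_partial_lists([engine1, engine2, engine3])
print(merged)  # ['p2', 'p1', 'p5', 'p3', 'p4', 'p6']
```

A paid-for engine can still drag a page up or down by its full list length, which is one reason rank aggregation research looks at criteria that resist a single outlier list.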

among the many possible orderings (A<B<C, B<A<C, ...), is there a final ranking that is "most similar" to those of most voters (representative)?
the Borda count
–add up the voted ranks as weights (lowest total wins)
–pros: simple, anonymous, neutral, consistent
–cons: can be influenced by extreme votes that drag good candidates down
candidate | voter 1 | voter 2 | voter 3 | voter 4 | Borda count
A | 1 | 1 | 1 | 2 | 5
B | 3 | 2 | 3 | 3 | 11
C | 2 | 3 | 2 | 1 | 8

Condorcet alternative: the candidate that beats all others in pairwise comparisons
–in this example, candidate Q wins based on the Borda count, even though the majority of voters preferred P over Q
candidate | voter 1 | voter 2 | voter 3 | Borda count
P | 1 | 1 | 4 | 6
Q | 2 | 2 | 1 | 5
R | 3 | 3 | 2 | 8
S | 4 | 4 | 3 | 11
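
The Borda totals in the P, Q, R, S table can be reproduced by summing each candidate's ranks; this sketch uses the table's values and the rank-summing convention of these slides. Summing ranks and picking the lowest total orders candidates the same way as the usual points version (n minus rank points, highest wins). The `borda` helper is a hypothetical name.

```python
# Rank each voter gave each candidate (1 = top choice), from the P, Q, R, S table.
ranks = {
    "P": [1, 1, 4],
    "Q": [2, 2, 1],
    "R": [3, 3, 2],
    "S": [4, 4, 3],
}

def borda(ranks):
    """Add up each candidate's ranks; the lowest total is the winner."""
    totals = {c: sum(r) for c, r in ranks.items()}
    order = sorted(totals, key=lambda c: (totals[c], c))
    return totals, order

totals, order = borda(ranks)
print(totals)  # {'P': 6, 'Q': 5, 'R': 8, 'S': 11}
print(order)   # ['Q', 'P', 'R', 'S']

# Head-to-head, P still beats Q on two of the three ballots:
print(sum(p < q for p, q in zip(ranks["P"], ranks["Q"])))  # 2
```

Voter 3's single last-place ranking of P outweighs two first-place votes, which is exactly the "extreme votes drag good candidates down" weakness noted earlier.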

pairwise comparisons:
P vs. Q: 2/3 prefer P
P vs. R: 2/3 prefer P
P vs. S: 2/3 prefer P
Q vs. R: 3/3 prefer Q
Q vs. S: 3/3 prefer Q
R vs. S: 3/3 prefer R
→ P beats every other candidate head-to-head, so P is the Condorcet winner
[Figure: pairwise-preference diagram over P, Q, R, S]

generalization: the Condorcet criterion
–for each pair of candidates A and B, A must be ranked over B if the majority prefer A over B (not always achievable, since pairwise majorities can form cycles)
–Dwork et al. showed there is a polynomial-time algorithm based on computing "locally Kemeny-optimal" rankings
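
The pairwise tallies and the Condorcet winner can be computed from the three ballots in the P, Q, R, S table. For comparison, this sketch also finds an exact Kemeny ranking, the ordering with the fewest total pairwise disagreements with the voters, by brute force over all orderings. That is exponential in the number of candidates and is not Dwork et al.'s polynomial-time local Kemenization; the function names are hypothetical.

```python
from itertools import combinations, permutations

# Ballots from the P, Q, R, S table, most preferred first.
ballots = [
    ["P", "Q", "R", "S"],  # voter 1
    ["P", "Q", "R", "S"],  # voter 2
    ["Q", "R", "S", "P"],  # voter 3
]

def prefer_count(a, b):
    """Number of voters who rank a above b."""
    return sum(ballot.index(a) < ballot.index(b) for ballot in ballots)

def condorcet_winner(candidates):
    """Candidate that beats every other by a strict majority, or None (cycles are possible)."""
    for a in candidates:
        if all(prefer_count(a, b) > len(ballots) / 2 for b in candidates if b != a):
            return a
    return None

def kemeny_ranking(candidates):
    """Exact Kemeny ranking by brute force: minimize total pairwise disagreements.
    Exponential in the number of candidates, so only for tiny examples."""
    def disagreements(order):
        return sum(prefer_count(b, a) for a, b in combinations(order, 2))
    return min(permutations(candidates), key=disagreements)

cands = ["P", "Q", "R", "S"]
print(condorcet_winner(cands))  # P
print(kemeny_ranking(cands))    # ('P', 'Q', 'R', 'S')
```

Here every pairwise majority points the same way (P over Q over R over S), so the Kemeny ranking satisfies the Condorcet criterion exactly and puts the Condorcet winner P first, unlike the Borda count.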

Electronic Voting
complex (weighted) votes of preferences over multiple outcomes
example: voting on the funding of public projects to maximize public welfare
–avoid the "free-rider" syndrome
"VCG" mechanism: penalize a voter whose vote changes the winning outcome by charging a tax based on how much that vote influenced the result over the alternative outcomes
–encourages voters to vote their true beliefs
ballot (100% = 1 vote):
–a% new stadium
–b% new library
–c% fix roads
–d% hire new police
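
A minimal sketch of the VCG (Clarke tax) idea for the project ballot above. It assumes each voter's percentages can be read as their values for the projects, picks the project with the highest total, and charges each voter the loss their ballot imposes on everyone else: how much better off the other voters would have been if the outcome had been chosen without that ballot. Voters who do not change the outcome pay nothing. The ballots are hypothetical, written as whole percentages.

```python
# Hypothetical ballots: each voter splits 100 points (one vote) across projects.
ballots = {
    "v1": {"stadium": 80, "library": 10, "roads": 5, "police": 5},
    "v2": {"stadium": 0, "library": 60, "roads": 40, "police": 0},
    "v3": {"stadium": 0, "library": 45, "roads": 35, "police": 20},
}

def total_value(outcome, exclude=None):
    """Sum of reported values for `outcome`, optionally leaving one voter out."""
    return sum(b[outcome] for voter, b in ballots.items() if voter != exclude)

def best_outcome(exclude=None):
    """Project with the highest total reported value (ties go to the alphabetically first)."""
    outcomes = sorted(next(iter(ballots.values())))
    return max(outcomes, key=lambda o: total_value(o, exclude))

def clarke_taxes():
    """VCG (Clarke) tax: each voter pays the loss their ballot imposes on everyone else."""
    chosen = best_outcome()
    taxes = {}
    for voter in ballots:
        without_voter = best_outcome(exclude=voter)
        taxes[voter] = (total_value(without_voter, exclude=voter)
                        - total_value(chosen, exclude=voter))
    return chosen, taxes

chosen, taxes = clarke_taxes()
print(chosen, taxes)  # library {'v1': 0, 'v2': 25, 'v3': 10}
```

Without v2 or v3 the stadium would have won, so each pays the difference in the others' total value; v1's ballot does not change the outcome, so v1 pays nothing.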

Summary
–The value of networks grows more than linearly (quadratically?) with the number of people participating.
–Algorithms like PageRank can identify "important" nodes in networks by analyzing connectivity (small-worlds topology).
–There is "wisdom" in crowds.
–Algorithms can aggregate preferences, rankings, or ratings over multiple users to allow robust methods for determining combined/community opinion.