EK Ch 17: Power laws and rich-get-richer phenomena (with an application of Web Spam detection: Spam, Damn Spam and Statistics)

Presentation transcript:


Numbers
Your grades so far in this class. The weight of an apple. The temperature in Chicago on July 4th. The height of a Dutch man. The speed of a car on I-90.
Most instances are typical. Seeing a rare number is very surprising. These numbers are well characterized by the average and the standard deviation.

City populations
1. New York 8,310,
Los Angeles 3,834,
Chicago 2,836,
Cambridge, MA 101,
Gainesville, FL 95,
McKinney, TX 54,369
A few cities with high population. Many cities with low population.

City populations

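Power Law: The fraction f(k) of items with popularity k is proportional to k^{-c}:
$f(k) \propto k^{-c}$
Taking logarithms of both sides gives
$\log f(k) = -c \log k + \text{const}$
so a power law appears as a straight line with slope $-c$ on a log-log plot.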
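The straight-line relationship can be checked numerically. The following is a minimal sketch, not part of the original slides: it builds an exact power law with an assumed exponent c = 2 over the hypothetical range k = 1, …, 100 and recovers the slope with an ordinary least-squares fit in log-log space. The function name `loglog_slope` is illustrative only.

```python
import math

def loglog_slope(ks, fs):
    """Least-squares slope of log f(k) plotted against log k."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(f) for f in fs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Assumed example: an exact power law with exponent c = 2 on k = 1..100.
c = 2.0
ks = list(range(1, 101))
weights = [k ** -c for k in ks]
total = sum(weights)
fs = [w / total for w in weights]  # normalise so the fractions sum to 1

slope = loglog_slope(ks, fs)
print(f"fitted slope: {slope:.4f}")  # close to -c
```

On real data the fit is noisier, especially in the tail where few items have very high popularity, so a fitted slope should be read as a rough estimate of -c.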

City populations

Number of Web page in-links (Broder+)

Other examples

Length of the URL’s host

Number of host name resolutions to a single IP

Web page out-degrees

Web page in-degrees

Word count variance

Content evolution

Cluster size

… because they care to know ;-)

Why does data exhibit power laws?
Imitation → Power law

Constructing the web
1. Pages are created in order, named 1, 2, …, N.
2. When created, page j links to a page as follows:
a) With probability p, pick a page i uniformly at random from 1, …, j-1 and link to it.
b) With probability (1-p), pick a page i uniformly at random and link to the page that i links to (imitation).
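The construction above can be simulated in a few lines. This is a sketch under stated assumptions rather than the slides' own code: each page creates exactly one link, and when the copying step picks a page that has no out-link (only page 1), the new page links to that page directly. The function name `build_web` and the parameter values are hypothetical.

```python
import random

def build_web(n, p, seed=0):
    """Simulate the copying model; return in-degrees indexed by page number."""
    rng = random.Random(seed)
    target = {}                 # target[j] = the page that page j links to
    in_degree = [0] * (n + 1)   # index 0 is unused; pages are 1..n
    for j in range(2, n + 1):
        i = rng.randint(1, j - 1)   # uniform choice among earlier pages
        if rng.random() < p or i not in target:
            dest = i                # step (a): link to i itself
        else:
            dest = target[i]        # step (b): imitate i's link
        target[j] = dest
        in_degree[dest] += 1
    return in_degree

in_degree = build_web(10_000, p=0.2)
print("largest in-degree:", max(in_degree))
print("pages with no in-links:", sum(1 for d in in_degree[1:] if d == 0))
```

With a small p most links are copied, so a few early pages tend to collect a large share of the in-links while most pages receive none.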

The rich get richer
2 b) With prob. (1-p), pick page i uniformly at random and link to the page that i links to.
(Figure labels: 1/4, 3/4)

The rich get richer
2 b) With prob. (1-p), pick page i uniformly at random and link to the page that i links to.
Equivalently,
2 b) With prob. (1-p), pick a page with probability proportional to its in-degree and link to it.
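A small exact calculation illustrates why the two formulations agree. The sketch below assumes a hypothetical set of four linking pages, each with exactly one out-link (three point to page 1, one points to page 2), and compares the destination probabilities under copying with those under in-degree-proportional choice. The names `links`, `copy_probabilities` and `in_degree_probabilities` are illustrative.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical out-links: linking page -> the single page it links to.
links = {2: 1, 3: 1, 4: 1, 5: 2}

def copy_probabilities(links):
    """Pick a linking page uniformly at random and copy its destination."""
    share = Fraction(1, len(links))
    probs = {}
    for _source, dest in links.items():
        probs[dest] = probs.get(dest, 0) + share
    return probs

def in_degree_probabilities(links):
    """Pick a destination with probability proportional to its in-degree."""
    in_degree = Counter(links.values())
    total = sum(in_degree.values())
    return {page: Fraction(d, total) for page, d in in_degree.items()}

by_copying = copy_probabilities(links)
by_degree = in_degree_probabilities(links)
print(by_copying)  # page 1 -> 3/4, page 2 -> 1/4
print(by_copying == by_degree)
```

Each in-link to a page is one chance for that page to be chosen by copying, so the probability of being copied grows in step with in-degree.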

Food for thought Why is Harry Potter popular? If we could re-play history, would we still read Harry Potter, or would it be some other book?

Information cascades and the rich
Information cascade = some people get a little bit richer by chance.
Rich-get-richer dynamics = the randomly rich people then get a lot richer very fast.

Music download site – 8 worlds
1. “Let’s go driving,” Barzin
2. “Silence is sexy,” Einstürzende Neubauten
3. “Go it alone,” Noonday Underground
10. “Picadilly Lilly,” Tiger Lillies