Ch 5 + Anatomy of the Long Tail (Goel et al., WSDM 2010) Padmini Srinivasan Computer Science Department Department of Management Sciences




Compression (Ch 5)


Zipf’s law

Broder et al., Graph Structure of the Web
Probability that a page has in-degree k ∝ 1/k^2; the actual exponent is slightly larger than 2.
Note that the exponent is different. Note also the deviation at the low end of the out-degree distribution.
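A minimal sketch of what the power-law form above implies, assuming a truncated distribution over degrees 1..max_degree. The function name `power_law_pmf` and the cutoff of 1000 are illustrative assumptions, not taken from Broder et al.

```python
def power_law_pmf(max_degree, exponent=2.0):
    """Normalized P(k) proportional to 1 / k**exponent for k = 1..max_degree.

    Illustrative only: the truncation at max_degree is an assumption
    made so that the probabilities can be normalized to sum to 1.
    """
    weights = [1.0 / k ** exponent for k in range(1, max_degree + 1)]
    total = sum(weights)
    return [w / total for w in weights]


pmf = power_law_pmf(1000, exponent=2.0)

# With exponent 2, doubling the degree divides the probability by 4.
ratio_1_to_2 = pmf[0] / pmf[1]
ratio_2_to_4 = pmf[1] / pmf[3]
```

Both ratios come out at 4 regardless of the cutoff, which is the scale-free behaviour a straight line on a log-log plot represents.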

Infinite-inventory retailers
Amazon, Netflix, iTunes music store: long tail markets
Items not in brick-and-mortar stores:
– 30% of sales
– 25% for Netflix
Success attributed to long tail markets.
Two different hypotheses:
– The majority prefer popular items and a minority prefer niche items
– Everyone likes some popular and some niche items
The hypotheses have different impacts on inventory control. A retailer keeping only mainstream items would:
– Satisfy most people nearly all the time (first hypothesis)
– Irritate most people at least some of the time (second hypothesis)
Knowing which model better fits and explains behaviour is important.

Infinite-inventory retailers
Two different hypotheses:
– The majority prefer popular items and a minority prefer niche items
– Everyone likes some popular and some niche items
The hypotheses have different impacts on inventory control. A retailer keeping only mainstream items would:
– Satisfy most people nearly all the time (first hypothesis)
– Irritate most people at least some of the time (second hypothesis)
Knowing which model works better is important. Their work supports the second hypothesis.
Availability of tail items may also boost sales of ‘head’ items through one-stop shopping convenience.
Beyond the direct impact on revenue there are second-order gains such as customer satisfaction.

Datasets examined
– Web queries: stemming
– URLs: restricted to domains (click search data)
– Browsing: Nielsen data (domains)
Data trimming done

Long Tail
What is it?
– A relatively small number of items accounts for a large share of consumption: the old 80–20 rule.
– Definition: popularity is the fraction of total consumption fulfilled by an item, e.g., the fraction of checkouts associated with a particular book.
– Popularity of a movie: number of times it was rated / total number of ratings
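The popularity definition above can be sketched in a few lines. This is an illustration on made-up toy events, not the paper's data; the function names and the tie-breaking rule for ranking are assumptions.

```python
from collections import Counter


def popularity(events):
    """Map each item to its fraction of all consumption events.

    events: list of (user, item) pairs, e.g. one pair per rating.
    """
    counts = Counter(item for _, item in events)
    total = sum(counts.values())
    return {item: n / total for item, n in counts.items()}


def rank_items(pop):
    """Items ordered most popular first (ties broken by name, an assumption)."""
    return sorted(pop, key=lambda item: (-pop[item], item))


def cumulative_popularity(pop, k):
    """Share of total consumption covered by the k most popular items."""
    return sum(pop[item] for item in rank_items(pop)[:k])


# Toy data: item A is consumed 3 times, B twice, C once.
events = [("u1", "A"), ("u2", "A"), ("u3", "A"),
          ("u1", "B"), ("u2", "B"), ("u3", "C")]
pop = popularity(events)
ranking = rank_items(pop)
top2_share = cumulative_popularity(pop, 2)
```

Here an inventory holding only the top 2 of 3 items covers 5/6 of consumption, which is the kind of cumulative curve the long-tail graphs plot.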

Two Long-Tail Graphs
Netflix & Yahoo! Music
Typical inventory: 3000 (Netflix), 50,000 (Yahoo! Music)
Web search: the top 10 web sites account for over 15% of page views; the top 10,000 web sites still leave 20% unaccounted for.

More Long Tail Graphs

Eccentric Tastes?
An inventory: the k top-ranked (most popular) items.
Definition: a user is p-percent satisfied if at least p percent of his/her consumption is in the k-ranked set.
Analysis: what percent of users are p-percent satisfied?
– Netflix (k = 3000): only 11% of users are 100% satisfied; 63% are 90% satisfied
– Yahoo! Music (k = 50,000): only 5% of users are 100% satisfied; 32% are 90% satisfied
With brick-and-mortar inventories, almost no users are completely satisfied.
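A minimal sketch of the p-percent satisfaction measure defined above, on hypothetical toy data. The function names are assumptions, and consumption is treated here as a simple list of consumed items per user.

```python
def satisfaction_percent(consumed, inventory):
    """Percent of one user's consumed items that the inventory carries."""
    hits = sum(1 for item in consumed if item in inventory)
    return 100.0 * hits / len(consumed)


def share_of_users_satisfied(users, inventory, p):
    """Fraction of users who are at least p-percent satisfied.

    users: dict mapping user -> list of consumed items.
    Users with no consumption are counted as not satisfied (an assumption).
    """
    satisfied = sum(
        1 for consumed in users.values()
        if consumed and satisfaction_percent(consumed, inventory) >= p
    )
    return satisfied / len(users)


# Toy example: the inventory holds the k = 2 most popular items.
users = {
    "u1": ["A", "B"],            # 100% of consumption in the inventory
    "u2": ["A", "C"],            # 50%
    "u3": ["A", "B", "C", "D"],  # 50%
}
inventory = {"A", "B"}

share_100 = share_of_users_satisfied(users, inventory, 100)
share_50 = share_of_users_satisfied(users, inventory, 50)
```

In this toy case only one user in three is 100% satisfied even though every user consumes the most popular item, which mirrors the pattern reported for Netflix and Yahoo! Music.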

Eccentric Tastes?
Netflix & Yahoo! Music. Upper: 90% satisfaction; lower: 100% satisfaction

Ratings versus Popularity
The more obscure an item, the less it is appreciated. So the more aware users are of an item, the more it is appreciated?
– Studied with movies and music.
– Relationship between popularity (rank) and rating
Is the value of the tail overemphasized because there is disproportionate dissatisfaction or satisfaction with it?
– Tail end less dissatisfaction/satisfaction?

Ratings versus Popularity
Pattern present in the Netflix dataset but not in the music dataset (more obscure songs get even higher ratings).

Ratings versus Popularity
Tail end less dissatisfaction/satisfaction? (users disproportionately dissatisfied with the tail end)
85% of Netflix users and 91% of Yahoo! Music users rated an item not available in physical stores (original: 89% and 95%, respectively).
So the long tail cannot be dismissed: even typical users have a need for tail-end items.
32% of Netflix users and 56% of Yahoo! Music users had at least 10% of their items rated high in the tail.

Null Hypothesis Model
Random model
– Each user decides how many items to consume, consistent with the empirical data: fix the number of users, the number of items, and the number selected/viewed/clicked/rated by each user.
– Item selection by each user is also random, but constrained to follow popularity and to be without replacement.
– What are the limitations of this null model?
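The random model above can be sketched as follows. This is an illustrative reading of the slide, not the authors' implementation: each user keeps an observed item count and draws distinct items one at a time with probability proportional to popularity among the items not yet chosen. The names `null_model_draw` and `simulate_null_users`, the toy numbers, and the fixed seed are assumptions.

```python
import random


def null_model_draw(popularity, n_items, rng):
    """Draw n_items distinct items, weighted by popularity, without replacement."""
    remaining = dict(popularity)
    chosen = []
    for _ in range(min(n_items, len(remaining))):
        items = list(remaining)
        weights = [remaining[item] for item in items]
        # random.choices samples with weights; deleting the pick afterwards
        # makes the overall draw without replacement.
        pick = rng.choices(items, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]
    return chosen


def simulate_null_users(popularity, items_per_user, seed=0):
    """Simulate every user, keeping each user's observed number of items."""
    rng = random.Random(seed)
    return {
        user: null_model_draw(popularity, n, rng)
        for user, n in items_per_user.items()
    }


# Toy inputs: item popularities and how many items each user consumed.
toy_popularity = {"A": 0.5, "B": 0.3, "C": 0.2}
toy_counts = {"u1": 2, "u2": 3}
simulated = simulate_null_users(toy_popularity, toy_counts)
```

The simulated users can then be scored with the same satisfaction measure as the real users, which is how the null-model curves are compared against the empirical ones.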

Null Model
Netflix & Yahoo! Music. Upper: 90% satisfaction; lower: 100% satisfaction
Null model: users are much harder to satisfy. E.g., only 14% of users in the null model are 90% satisfied, compared to 64% (movies) with k = 3000.

Implications?
Though most users consume tail content part of the time:
– A sizeable fraction of users prefer head over tail content to a degree that goes beyond the draw of popularity.
– To compensate, other users draw disproportionately from the tail.

Consumption patterns: Users vs Popularity

Some patterns
By moving from k = 3000 to k = 3500 movies, cumulative popularity increases by 2 percentage points (87% to 89%), while 90% satisfaction increases by more, 7 points (63% to 70%).
Movies that by popularity alone account for only 2% of demand could potentially grow the overall customer base by 7% by attracting newly satisfied users.
Searching: moving from 95% to 96% along the tail increases 90% user satisfaction from 80% to 86%.

Individual eccentricity: median rank of his/her consumed items.
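A small sketch of the eccentricity measure, assuming items are ranked 1 (most popular) upward. The helper name and the toy ranks are illustrative assumptions.

```python
from statistics import median


def eccentricity(consumed, rank):
    """Median popularity rank of the items a user consumed.

    rank: dict mapping item -> rank, with 1 the most popular item.
    A larger value means the user's typical item sits deeper in the tail.
    """
    return median(rank[item] for item in consumed)


# Toy ranks: A is the most popular item, D the least.
rank = {"A": 1, "B": 2, "C": 3, "D": 4}

mainstream_user = eccentricity(["A", "B"], rank)
eccentric_user = eccentricity(["A", "C", "D"], rank)
```

Using the median rather than the mean keeps a single very obscure item from dominating a user's score.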

More on eccentricity
– Are those who are more ‘engaged’ (i.e., consume more) more eccentric? No: the correlation between the two at the individual level is low.
– But there are some observations at the group level.

More on eccentricity ~ web pages
Unique URLs

Theoretical Analysis
Independent model
Sticky model
– Winner take all.
Shared inventory approach

Summary
Nice analysis of the long tail
Different perspectives combined:
– Popularity (cumulative and individual)
– 90%, 100% satisfaction
– Engagement versus ratings
– Use of a null model to make predictions and compare
– Nice graphs
– Long tail helps in capturing user satisfaction and retention