The Ratings Game: Scoring Washington Reds
Christopher Bitter, University of Washington
Introduction
Motivation
- U.S. consumers are "buying based on points" / ratings have a huge impact on wine sales
- Is this a viable strategy? How relevant are ratings?
Data
- 1,293 Washington State red wines rated by Wine Advocate, Wine Enthusiast, and Wine Spectator (3,879 total ratings)
- 2007-2012 vintages; 11 varietals; 8 AVAs; $11 to $150 (median $45); average score: 90.7 points
Questions
- Do the publications agree with one another?
- Are the differences in scoring systematic? In other words, can they be explained by subjective preferences?
Simplicity
- We all know that a single number can't capture the nuances of a wine or the circumstances surrounding its enjoyment
- Can it help us choose higher-quality wines that we will enjoy more?
Prior Work
U.S. Wine Competitions
- Hodgson (2008; 2009); Ashton (2012); Cao (2014); etc.
- Low correlations in scoring across judges – a lack of consensus
- Judges also lack reliability – unable to replicate scores in subsequent tastings of the same wine
Bordeaux en Primeur Tastings
- Moderate degree of consensus (Ashton 2013, etc.)
- Differences are systematic – indicative of subjectivity (Masset et al. 2015; Cardebat & Vivat 2016)
Both are unique settings – not entirely relevant to the typical U.S. wine drinker, so the ability to generalize results is uncertain
Stuen et al. (2015) – study of CA and WA wines
Agreement? Scoring Distributions
- Wine Enthusiast gives the highest scores / Wine Spectator the lowest (bias)
- Wine Spectator uses a narrower scoring range – 98% of its scores fall within a 9-point span (discriminates less)
- Do they use the 100-point scale in a consistent manner?
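A minimal sketch of how these distributional comparisons could be computed, assuming a hypothetical DataFrame `scores` with one row per wine and columns 'advocate', 'enthusiast', and 'spectator' holding the 100-point scores (the column names are illustrative, not from the paper):

```python
# Compare score distributions across the three publications.
import pandas as pd

def distribution_summary(scores: pd.DataFrame) -> pd.DataFrame:
    """Mean, standard deviation, and width of the central 98% of each publication's scores."""
    pubs = ['advocate', 'enthusiast', 'spectator']
    rows = []
    for pub in pubs:
        s = scores[pub].dropna()
        rows.append({
            'publication': pub,
            'mean': s.mean(),
            'std': s.std(),
            # Width of the interval containing 98% of scores (1st to 99th percentile)
            'central_98_range': s.quantile(0.99) - s.quantile(0.01),
        })
    return pd.DataFrame(rows)
```

A higher mean indicates upward bias; a narrower central range indicates less discrimination among wines.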
Agreement? Correlations
- Low-to-moderate degree of consensus regarding wine quality
- Correlations intermediate between the wine competition and Bordeaux settings
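A sketch of the pairwise agreement calculation, under the same assumed `scores` DataFrame as above:

```python
# Pairwise agreement between publications on the wines both have rated.
import pandas as pd
from itertools import combinations

def pairwise_correlations(scores: pd.DataFrame) -> pd.DataFrame:
    """Pearson and Spearman correlations for each pair of publications."""
    pubs = ['advocate', 'enthusiast', 'spectator']
    rows = []
    for a, b in combinations(pubs, 2):
        pair = scores[[a, b]].dropna()  # keep only wines rated by both publications
        rows.append({
            'pair': f'{a} vs {b}',
            'n_wines': len(pair),
            'pearson': pair[a].corr(pair[b], method='pearson'),
            'spearman': pair[a].corr(pair[b], method='spearman'),
        })
    return pd.DataFrame(rows)
```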
Agreement? Variation in Scores
- Mean standard deviation across publications is 1.40 points for the 1,293 wines
- The range (highest minus lowest score) is 4 points or more 40% of the time
- The range is the more intuitive summary, so the focus here is on it
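These per-wine dispersion measures could be computed along the following lines, again assuming the hypothetical `scores` DataFrame introduced above:

```python
# Per-wine disagreement across the three publications.
import pandas as pd

def score_dispersion(scores: pd.DataFrame) -> pd.Series:
    """Mean per-wine standard deviation and share of wines with a range of 4+ points."""
    pubs = ['advocate', 'enthusiast', 'spectator']
    rated = scores[pubs].dropna()                        # wines rated by all three
    per_wine_std = rated.std(axis=1)                     # std. dev. across the three scores
    per_wine_range = rated.max(axis=1) - rated.min(axis=1)
    return pd.Series({
        'mean_std': per_wine_std.mean(),
        'share_range_ge_4': (per_wine_range >= 4).mean(),
    })
```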
Disagreement
Potential causes of disagreement in scoring
- Lack of accuracy / reliability
- Subjective preferences
Testing for subjectivity
- If preferences play a role, scoring differences should be systematically related to wine attributes
- The difference in score between two publications is modelled as a function of price, vintage, varietal, appellation, and winery
- Ordinary least squares estimation (see the sketch below)
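A sketch of one such score-difference regression, assuming a hypothetical DataFrame `wines` with columns 'advocate', 'spectator', 'price', 'vintage', 'varietal', 'ava', and 'winery' (one row per wine); the column names and the log-price specification are illustrative, not the paper's exact model:

```python
# OLS regression of the Advocate-minus-Spectator score gap on wine attributes.
import numpy as np
import statsmodels.formula.api as smf

def score_diff_model(wines):
    """Fit the score-difference regression and return the fitted model."""
    df = wines.copy()
    df['diff'] = df['advocate'] - df['spectator']
    model = smf.ols(
        'diff ~ np.log(price) + C(vintage) + C(varietal) + C(ava) + C(winery)',
        data=df,
    ).fit()
    return model  # model.rsquared gives the share of variation explained

# Example usage: print(score_diff_model(wines).summary())
```

The same specification would be re-estimated for each pair of publications; a statistically meaningful R-squared indicates that the disagreement is systematic rather than random.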
Regression Results
Price and label attributes explain the following shares of the variation in score differences:
- 33% of the difference between Advocate & Enthusiast
- 43% of the difference between Advocate & Spectator
- 21% of the difference between Enthusiast & Spectator
Implications
Consumers
- A single score is not always representative of consensus opinion – this limits its relevance; better to consider multiple scores
- 63% of all wines in the $15-$25 range achieved a maximum score of 90 or above, but only 9% had a "consensus" score of 90
- Subjectivity is not necessarily negative, but it implies that some ratings may be more relevant than others
- Ratings are relevant – but a blunt instrument
Producers
- Good producers should be rewarded in the end, but variability in scoring favors those with better access to the review system
- The probability of getting a 90-point score in the $15-$25 category improves from 28% with 1 rating to 63% with 3
- Opportunity to exploit knowledge of scoring differences and preferences in order to improve ratings and sales? "Superscoring" (see the sketch below)
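One way to illustrate the "superscoring" effect is to compare, within a price band, the share of wines whose best score clears 90 against the share whose lowest score does. This is a hedged sketch using the same hypothetical `scores` DataFrame plus a 'price' column; the talk's 28%/63%/9% figures come from the underlying data, and this only shows the style of calculation, not the paper's exact definition of a consensus score:

```python
# Best-score vs. consensus-score shares in a price band.
import pandas as pd

def superscoring_share(scores: pd.DataFrame,
                       low: float = 15, high: float = 25,
                       threshold: int = 90) -> pd.Series:
    """Share of wines in [low, high] reaching `threshold` on their best vs. their lowest score."""
    pubs = ['advocate', 'enthusiast', 'spectator']
    band = scores[(scores['price'] >= low) & (scores['price'] <= high)].dropna(subset=pubs)
    best = band[pubs].max(axis=1)    # the score a "superscoring" producer would advertise
    lowest = band[pubs].min(axis=1)  # clearing 90 here means all three publications agree
    return pd.Series({
        'share_best_ge_threshold': (best >= threshold).mean(),
        'share_consensus_ge_threshold': (lowest >= threshold).mean(),
    })
```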
The End. Email Bitter@UW.edu for a copy of the paper or more information
Regression Coefficients: Raw Score Models