The impact of piracy (and free) on book sales
A BEA update on the piracy project
This morning …
– Project history
– Why look at this now?
– Approach and publishing partners
– Primary findings to date
– Next steps for this research
Born (Wakefield, Massachusetts); Catholic school education; College, business school; Finally left Massachusetts; Time Inc. weekly magazines; Married, kid
More kids; Hammond; Started to consult; “Faster, better, cheaper”; Got a logo
MIP 2008: Andrew Savikas and Mac Slocum wonder, “Can we measure the impact of piracy on book sales?”
– Brian: “How about using a co-op marketing model to assess whether it helps or hinders paid sales?”
– O’Reilly’s editors: “All of our content is pirated as soon as we publish it!”
– Random House joins the research, contributing several experiments with “free” content distribution
– Intrepid NYU grad student starts tracking O’Reilly’s 2008 front list to find pirated content
– Brian secretly worries that any analysis will not have a “quiet period” against which to measure baseline sales
TOC 2009: Research paper and preliminary results are announced
– Piracy research on O’Reilly titles
– Search for more participants
BEA 2009: Piracy and “free” results are updated
– Still consulting …
Why tell you all of this?
– You need to know me to trust me
– Trust helps you hear what I have to say
– Success (to me) = a fighting chance at recruitment and participation
My point of view
– IP matters
– There are niches, and titles, for which piracy is a direct loss and enforcement makes sense
– There are niches, and titles, for which piracy may help spur paid sales
– This research is structured to find out which is which
“Perhaps on the rare occasion that pursuing the right course demands an act of piracy, piracy itself can be the right course?” Governor Swann, in “Pirates of the Caribbean” (itself heavily pirated)
“Free” is not “new” …
– A long and successful history
– Galleys, ARCs, blads, sample chapters
– Digital sampling on the rise
– … but only a small set of experiments using fully “free” content
Book marketing: growing content discovery and access
[Figure: grid with Discovery and Access axes, each running from low to high; plotted examples: Appearance on Oprah, Co-op Marketing, Corporate Web Site, Museum Stores, Amazon Promotion, Catalog & BEA]
Over time, increase both discovery and access
Why look at this topic now?
– More digital content
– Better ebook readers
– Piracy “threat”
– Ongoing “free” debate
Our research approach
The research is data-driven, open (without compromising publisher data) and structured to share knowledge.
Stages: Document and assess prior work; Address data quality; Analyze and share results; Assess implications; Identify next steps
– Collate experiments
– Segment attributes
– Identify data gaps
– Use a consistent data source (POS feeds)
– Measure pre- and post-release
– Populate a structured matrix
– Look at combined results
– Initial take on the impact of pirated content
– Initial impact of “free”
– Share the analysis
– Invite discussion
– Grow the test sample
A hands-on research project
Wil Johnson (NYU, candidate for M.S. in Publishing, Dec 2009)
– Searched P2P sites for O’Reilly titles every day since fall 2008
– Built his own scraper
– Running it made his (roommate’s) ISP cut off service
– Figured out a way around it!
A value in structured testing …
– A robust set of variables
– Appropriate segmentation
– Captures content characteristics
– Can collate like experiments
– Can develop and test specific hypotheses and track results over time
Why O’Reilly and Random House?
– O’Reilly Media
  – Pioneered discussion of the distribution of free content
  – Active in promoting widespread access to its content
  – Perceived as vulnerable to a piracy threat
– Random House
  – Largest U.S. publisher
  – A wide range of book types reaching a variety of audiences
  – Engaged in a number of experiments with “free”
Primary research findings
– P2P “threat” may be overstated
  – Low incidence
  – Significant lag
  – Technical skills are not commonly held
– The value of “free” is not binary
Proposing a more nuanced model
[Diagram: three categories (“White” market, “Gray” market, “Back channel”) covering, in the order shown: Print sales; DRM-protected digital sales; “Trialware”; Unprotected digital sales; Galleys, ARCs; “Free” promotions; Unauthorized duplication; Pirated content]
Our current question: what impact does “free” have on sales?
What we tested …
– Monitoring P2P (O’Reilly)
  – Sample set for peer-to-peer piracy
  – Monitored three BitTorrent sites; only one (PirateBay) had more than a handful of O’Reilly titles posted
  – Tracked activity of seeds (uploads) and leeches (downloads) for any 2008 O’Reilly front list titles found on these sites
– Testing free (Random House)
  – Free PDF downloads (three for 1 day, one for 3 days, one for three weeks)
  – Free PDF excerpt (3 weeks)
  – Free ebook download (1 day)
  – Free PDF, MP3 and ebook downloads (3 weeks)
What we found initially …
– Monitoring P2P (O’Reilly)
  – 8 O’Reilly front list titles that were posted in 4Q 2008
  – Average post-seed sales were 6.5% higher in the four weeks after seeds were first seen
  – Ranged from 18.2% up to 33.1% down
  – Low seed and leech volume
  – On average, first seeds appeared 20 weeks after publication date
– Testing free (Random House)
  – 8 titles, 12 formats tested in the first half of 2008
  – Sales up 19.1% during promotional period
  – Sales up 6.5% during promotional and post-promotional periods
  – Ranged from 155% up to 74% down
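The post-seed change in the P2P findings above can be illustrated with a short sketch. The slides do not state which baseline the four post-seed weeks were compared against, so this example assumes the four weeks immediately before the first seed was seen. The function name, the `window` parameter, and the sales figures are all hypothetical.

```python
from statistics import mean


def post_seed_change(weekly_units, seed_week, window=4):
    """Percent change in average weekly unit sales once a seed appears.

    weekly_units: unit sales per week, index 0 = first week on sale.
    seed_week: index of the week in which the first seed was seen.
    window: weeks compared on each side. ASSUMPTION: the baseline is the
            `window` weeks immediately before the seed week.
    """
    before = weekly_units[max(0, seed_week - window):seed_week]
    after = weekly_units[seed_week:seed_week + window]
    if not before or not after:
        raise ValueError("need sales data on both sides of the seed week")
    baseline = mean(before)
    if baseline == 0:
        raise ValueError("baseline sales are zero; percent change undefined")
    return (mean(after) - baseline) / baseline * 100.0


# Hypothetical title whose first seed is seen in week 20 (made-up numbers).
sales = [100] * 20 + [108, 104, 106, 102] + [100] * 4
change = post_seed_change(sales, seed_week=20)
print(f"{change:+.1f}%")
```

A per-title figure like this says nothing about cause; it only records how sales moved around the date the seed was first observed.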
Some research surprises…
– Low volume of P2P incidence
– Lag time on P2P seeding
– Number and range of “under the radar” free experiments available for analysis
– Interest among trade publishers
The piracy research continued …
– 13 more 2008 front-list titles
– Average post-seed sales down 4.2% in the four weeks after seeds were first seen
– Ranged from 15.1% up to 48.7% down
– On average, first seeds appeared 19 weeks after publication date
The number of seeds peaks quickly
The number of leeches peaks immediately and quickly declines
Lag time before seeding varies (average = 19 weeks)
Noodling over the data…

Measure                      First sample   P2P to date
Titles                       8              21
Post-seed change in sales    +6.5%          -4.2%
Biggest gain                 +18.1%         +15.1%
Largest loss                 -33.1%         -48.7%
Lag before piracy            20 weeks       19 weeks

The spread in results made us wonder if we had missed something in the bigger sample set.
Where piracy may help …
– Looked at sales patterns of pirated and un-pirated content
– “Normed” the data to a common start point
– Plotted the average sales per week for pirated and un-pirated titles
– Uncovered a visual correlation between piracy onset and unit sales
Note: Because of different pub dates, the average time on sale for pirated content in this sample is shorter (35 weeks) than that for un-pirated content (47 weeks). Comparisons at the end of the on-sale period are not reliable.
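The “norming” step above, which aligns titles with different pub dates to a common start point before averaging, might look roughly like the sketch below. It is an illustration under assumptions, not the project's actual method: the record layout (`pub_date`, `sales`), the function names, and the numbers are hypothetical.

```python
from collections import defaultdict
from datetime import date


def weeks_since_pub(week_start, pub_date):
    """Whole weeks between the publication date and a sales week."""
    return (week_start - pub_date).days // 7


def average_by_week_on_sale(titles):
    """Average unit sales per week on sale, across titles.

    titles: list of dicts with 'pub_date' (date) and 'sales', a list of
            (week_start_date, units) pairs. Field names are hypothetical.
    Returns {week_on_sale: (average_units, number_of_titles_contributing)}.
    """
    buckets = defaultdict(list)
    for title in titles:
        for week_start, units in title["sales"]:
            week = weeks_since_pub(week_start, title["pub_date"])
            if week >= 0:  # skip any pre-publication sales weeks
                buckets[week].append(units)
    return {w: (sum(u) / len(u), len(u)) for w, u in sorted(buckets.items())}


# Two made-up titles published two months apart.
titles = [
    {"pub_date": date(2008, 1, 7),
     "sales": [(date(2008, 1, 7), 100), (date(2008, 1, 14), 80),
               (date(2008, 1, 21), 60)]},
    {"pub_date": date(2008, 3, 3),
     "sales": [(date(2008, 3, 3), 60), (date(2008, 3, 10), 40)]},
]
averaged = average_by_week_on_sale(titles)
print(averaged)
```

Returning the count of contributing titles alongside each average makes it visible when later weeks rest on fewer titles, which is the reason end-of-period comparisons are flagged as unreliable.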
[Chart: average sales by weeks after pub date; marker: average week at which seeded content was first seen]
[Chart: four-week rolling averages; marker: average week at which seeded content was first seen]
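For readers unfamiliar with the smoothing used in this chart, here is a minimal sketch of a four-week rolling average. The slides do not say whether the window was trailing or centered; this example assumes a trailing window, and the function name and weekly figures are hypothetical.

```python
def rolling_average(values, window=4):
    """Trailing rolling mean over `window` points.

    ASSUMPTION: a trailing window. Positions that do not yet have a full
    window of data are returned as None rather than a partial average.
    """
    out = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the week leaving the window
        out.append(running / window if i >= window - 1 else None)
    return out


# Made-up weekly unit sales for one title.
weekly = [120, 100, 80, 100, 60, 40]
smoothed = rolling_average(weekly)
print(smoothed)  # [None, None, None, 100.0, 85.0, 70.0]
```

Smoothing of this kind damps week-to-week noise, at the cost of delaying any visible turn in the curve by up to the window length.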
Data in what was not pirated…

Series       Examples
Head First   HF Servlets and JSP; HF C Sharp; HF PHP and MySQL; HF Ajax
Other        Photoshop Elements 7; Programming .NET 3.5; Real World Haskell; Windows Server 2008; Learning Python

“Head First books break from the linear tradition, instead using a trajectory filled with successes, failures and lessons… The non-linear format makes them tough to reproduce in a digital form -- they're full of illustrations, thought bubbles, photos, quizzes, etc.”
Since TOC, a new research project
– John Hilton, a doctoral candidate at BYU
– Used BookScan data 8 weeks before & after free promotions
– Found that Random House’s five recent promotions coincided with an average 11% lift in sales
– A parallel evaluation of Tor promotions saw sales drop an average of 26%
Three useful cautions
– Correlation isn’t causality
– Larger data sets may uncover a sample skew
– What works today may not work as well at some future date
Next steps
– Additional tests are in the works now
– We continue to monitor P2P activity
– More publishers can help fill in the test matrix
– Now gathering feedback
– Will continue to refine the analysis
For more information
– “Rough Cut” research paper out now
  – Includes this research and future updates
  – Also provides background on free and P2P