Published by Gordon Ellis; modified over 9 years ago
1
Netflix Prize and Heritage Health Prize
Philip Chan
2
Cash Prizes to Stimulate Research
- Ansari X Prize for Private Spaceflight (2004) [$10M]: reach 100 km above Earth twice within 2 weeks
- DARPA Grand Challenge (2005) [$2M]: autonomous vehicle drives 131 miles within 10 hours
- Archon X Prize for Genomics (2006) [$10M]: sequence 100 human genomes in 10 days
3
Cash Prizes to Stimulate Research (continued)
- Netflix Prize (2006) [$1M]: recommend movies with a 10% improvement in prediction accuracy
- Heritage Health Prize (2011) [$3M]: predict days in hospital next year with an error of at most 0.4
4
Netflix Prize: netflixprize.com
5
Netflix Prize Task
- Given customer ratings on some movies, predict customer ratings on other movies
- Example: if John rates “Mission Impossible” a 5, “Over the Hedge” a 3, and “Back to the Future” a 4, how would he rate “Harry Potter”, …?
- Performance: prediction error, measured as root mean squared error (RMSE)
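To make the task concrete, here is a minimal sketch of a simple baseline predictor, not any contestant's method: predict a rating as the global mean plus a per-movie and a per-user offset, then score predictions with RMSE. The toy ratings and the function names (`fit_baseline`, `predict`, `rmse`) are hypothetical.

```python
from collections import defaultdict
from math import sqrt

def fit_baseline(ratings):
    """Fit a global mean, then movie offsets, then user offsets.

    ratings: list of (user, movie, stars) tuples.
    """
    mu = sum(r for _, _, r in ratings) / len(ratings)
    # Movie offset: average deviation of the movie's ratings from the global mean.
    movie_dev = defaultdict(list)
    for _, m, r in ratings:
        movie_dev[m].append(r - mu)
    b_movie = {m: sum(d) / len(d) for m, d in movie_dev.items()}
    # User offset: average residual left after the global mean and movie offset.
    user_dev = defaultdict(list)
    for u, m, r in ratings:
        user_dev[u].append(r - mu - b_movie[m])
    b_user = {u: sum(d) / len(d) for u, d in user_dev.items()}
    return mu, b_user, b_movie

def predict(model, user, movie):
    mu, b_user, b_movie = model
    # Unseen users or movies fall back to a zero offset.
    p = mu + b_user.get(user, 0.0) + b_movie.get(movie, 0.0)
    return min(5.0, max(1.0, p))  # clip to the 1-5 star scale

def rmse(pairs):
    """Root mean squared error over (predicted, actual) pairs."""
    return sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))

train = [
    ("john", "Mission Impossible", 5),
    ("john", "Over the Hedge", 3),
    ("john", "Back to the Future", 4),
    ("mary", "Mission Impossible", 4),
    ("mary", "Harry Potter", 5),
    ("alex", "Harry Potter", 3),
    ("alex", "Over the Hedge", 2),
]
model = fit_baseline(train)
print(round(predict(model, "john", "Harry Potter"), 2))  # 4.33
```

John rates slightly above what the movie offsets predict, so his predicted rating for “Harry Potter” lands above the movie's average of 4.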
6
Cash Award
- Grand Prize: $1M for a 10% improvement over Netflix's own Cinematch system, by 2011 (within 5 years)
- Progress Prize: $50K per year for at least a further 1% improvement
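Both prize thresholds are relative reductions in RMSE. A minimal sketch of that arithmetic, assuming improvement is measured against a baseline RMSE (Cinematch for the Grand Prize); the function names and the RMSE values in the example are made up.

```python
def percent_improvement(baseline_rmse, new_rmse):
    """Relative RMSE reduction over a baseline, in percent."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse

def qualifies(baseline_rmse, new_rmse, target_percent):
    """True if the new RMSE beats the baseline by at least target_percent."""
    return percent_improvement(baseline_rmse, new_rmse) >= target_percent

# Hypothetical scores: 0.95 -> 0.85 is about a 10.5% improvement.
print(round(percent_improvement(0.95, 0.85), 2))  # 10.53
print(qualifies(0.95, 0.85, 10))  # True
```

Because the target is relative, the last few tenths of a percent require RMSE gains of only a few thousandths, which is why progress slowed so much near the end.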
7
Intellectual Property
- Winners grant Netflix a non-exclusive license to the algorithm
- Winners must publicly describe how the algorithm works
8
Participation
- 51K contestants
- 41K teams
- 186 countries
9
Leader Board
- Started on Oct 2, 2006
- Improvement by the top algorithm:
  - after a week: ~0.9%
  - after two weeks: ~4.5%
  - after a month: ~5%
  - after a year: ~8.4%
  - after two years: ~9.4%
  - by July 26, 2009, when the contest closed (less than 3 years): 10%
10
Winner: BellKor’s Pragmatic Chaos
- 7 members, a merger of 3 teams:
  - BellKor: AT&T Labs, USA, and Yahoo! Research, Israel
  - PragmaticTheory: telecommunications industry, Canada
  - BigChaos: started a company, Austria
- An ensemble combining many different algorithms
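Combining algorithms usually means blending their predictions on held-out data. A minimal sketch of the simplest case, a least-squares weight between two predictors; the winning team's actual blend combined far more predictors with more elaborate methods, and the data and function names here are hypothetical.

```python
def fit_blend_weight(pred_a, pred_b, actual):
    """Least-squares weight w for the blend w*a + (1 - w)*b.

    Minimizing sum((b + w*(a - b) - y)^2) over w gives
    w = sum((y - b)*(a - b)) / sum((a - b)^2).
    """
    num = sum((y - b) * (a - b) for a, b, y in zip(pred_a, pred_b, actual))
    den = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    return num / den if den else 0.5  # identical predictors: any w works

def blend(pred_a, pred_b, w):
    return [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]

# Held-out ratings and two predictors' guesses (hypothetical).
a = [4, 3, 5]
b = [2, 3, 3]
y = [3, 3, 4]
w = fit_blend_weight(a, b, y)
print(w, blend(a, b, w))  # 0.5 [3.0, 3.0, 4.0]
```

Here one predictor overshoots and the other undershoots by the same amount, so an equal-weight blend is exact even though neither predictor is.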
11
Runner-up: The Ensemble
- ~30 members, formed by a “last-minute” merger
- After the first team crossed the 10% threshold, other teams had 30 days to beat it
- The Ensemble reached the same accuracy but submitted 20 minutes later!
12
Heritage Health Prize: heritagehealthprize.com
13
Health Care
- 71M individuals are admitted to US hospitals each year
- Unnecessary admissions cost $30B
14
Heritage Provider Network
- Has a network of doctors in California
- Can we identify earlier those most at risk and ensure they get the treatment they need?
- Can we reduce unnecessary hospitalizations?
15
Heritage Health Prize
- Launch video: http://www.youtube.com/watch?v=GuZ8nkpygAs
- Given patient data, predict how many days a patient will spend in a hospital in the next year
- The predictions help develop strategies to prevent emergencies and thereby reduce hospitalizations
16
Grand Prize
- $3M for an error of at most 0.4 (~0.5 day)
- Deadline: Apr 4, 2013 (2 years)
- $500K consolation prize to the top team if no team gets below 0.4 error
17
Milestone Prizes (top 2 performers at each milestone)
- Aug 31, 2011: $30K, $20K
- Feb 13, 2012: $50K, $30K (http://www.youtube.com/watch?v=pkmkNnGyihY)
- Sep 4, 2012: $60K, $40K
18
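Performance of Algorithms
- Prediction error: root mean squared logarithmic error (RMSLE) over n patients
  $$\mathrm{RMSLE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\mathrm{prediction}_i - \mathrm{real}_i\right)^2}$$
  where real_i = log(actual # of days + 1) and prediction_i = log(predicted # of days + 1)
- Prediction error threshold = 0.4 (~0.5 day)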
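The metric is easy to compute directly. A minimal sketch in plain Python; the function name and the example values are illustrative. The first example also shows where the “~0.5 day” reading of the 0.4 threshold comes from: for a patient with 0 actual days, predicting e^0.4 - 1 ≈ 0.49 day already produces an error of 0.4.

```python
from math import exp, log, sqrt

THRESHOLD = 0.4  # Grand Prize error threshold

def rmsle(predicted_days, actual_days):
    """Root mean squared logarithmic error, comparing log(days + 1)."""
    if len(predicted_days) != len(actual_days) or not actual_days:
        raise ValueError("need two non-empty lists of equal length")
    total = 0.0
    for p, a in zip(predicted_days, actual_days):
        total += (log(p + 1) - log(a + 1)) ** 2
    return sqrt(total / len(actual_days))

# A 0-day patient predicted at ~0.49 day gives an error of about 0.4.
print(round(rmsle([exp(0.4) - 1], [0]), 6))
print(rmsle([2, 0, 1], [2, 0, 1]) <= THRESHOLD)  # perfect predictions: True
```

Taking logs means errors are judged relatively: predicting 1 day instead of 0 costs as much as predicting 3 days instead of 1.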
19
Intellectual Property
- Winners grant the Sponsor an exclusive license, while keeping the right to their own use
- Algorithms not previously published
- The data sets may be used for the competition only; other purposes require written consent
20
Data Sets
- Training and validation data sets: for participants to design algorithms
- Feedback data set: for calculating standings on the Leaderboard
- Scoring data set: for determining prize winners
- http://www.heritagehealthprize.com/c/hhp/Data
21
Data (in CSV format)
- Members Data (113K members)
- Claims Data (2.7M claims)
- Drug Count Data (818K prescriptions)
- Lab Count Data (361K labs)
- Outcome Data (76K in Y2, 71K in Y3)
- Target (71K in Y4, for prediction)
- Total ~264 MB (including other files)
22
Members Data
- MemberID
- AgeAtFirstClaim
- Sex
23
Claims Data
- MemberID
- ProviderID
- Vendor ID
- PCP (primary care physician) ID
- Year
- Specialty (of the physician/vendor?)
- PlaceSvc (place of service): office, outpatient hospital, inpatient hospital, …
- PayDelay (delay between service and payment)
24
Claims Data [continued]
- LengthOfStay (in hospital)
- DSFS (days since first claim)
- PrimaryConditionGroup (diagnostic categories)
- CharlsonIndex (effect of diseases on overall illness)
- ProcedureGroup (intervention categories)
- SupLOS (supplement to LengthOfStay): 1 if LengthOfStay is NULL because of de-identification
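Because each member has many claims, a common first step is to aggregate claims into one feature row per member and year. A minimal sketch with the standard `csv` module, assuming a file with the MemberID, ProviderID, Year, and PlaceSvc columns listed above; the sample rows, the “Y1”/“Y2” year labels, the “Inpatient Hospital” label, and the chosen features are hypothetical.

```python
import csv
import io
from collections import defaultdict

def claim_features(csv_file):
    """Aggregate claims into {(MemberID, Year): feature dict}."""
    counts = defaultdict(int)
    providers = defaultdict(set)
    inpatient = defaultdict(int)
    for row in csv.DictReader(csv_file):
        key = (row["MemberID"], row["Year"])
        counts[key] += 1
        if row["ProviderID"]:  # skip blank IDs
            providers[key].add(row["ProviderID"])
        # Hypothetical label for inpatient claims; check the real PlaceSvc values.
        if row["PlaceSvc"].strip().lower() == "inpatient hospital":
            inpatient[key] += 1
    return {
        key: {
            "claims": counts[key],
            "providers": len(providers[key]),
            "inpatient_claims": inpatient[key],
        }
        for key in counts
    }

sample = io.StringIO(
    "MemberID,ProviderID,Year,PlaceSvc\n"
    "101,7,Y1,Office\n"
    "101,8,Y1,Inpatient Hospital\n"
    "101,7,Y2,Office\n"
    "202,9,Y1,Outpatient Hospital\n"
)
features = claim_features(sample)
print(features[("101", "Y1")])  # {'claims': 2, 'providers': 2, 'inpatient_claims': 1}
```

With the real 2.7M-claim file, the same single pass works by passing an open file object instead of the in-memory sample.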
25
Drug Count Data
- MemberID
- Year
- DSFS (days since first service)
- DrugCount (unique prescription drugs)
26
Lab Count Data
- MemberID
- Year
- DSFS (days since first service)
- LabCount (unique lab or pathology tests)
27
Outcome Data
- MemberID
- DaysInHospital_Y2 (from claims in Y1), i.e., predict Y2 based on Y1
- DaysInHospital_Y3 (from claims in Y2)
- ClaimedTruncated: 1 for members whose claims were “truncated”
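This layout means each training example pairs one year's features with the next year's days in hospital, and the Y3 features, which have no label, become the inputs for predicting the Y4 target. A minimal sketch, assuming features are keyed by (MemberID, year) with integer years; the function name and toy values are hypothetical.

```python
def make_training_pairs(features, outcomes):
    """Pair features from year t with days in hospital in year t + 1.

    features: {(member_id, year): feature_vector}, year as an int (1, 2, 3)
    outcomes: {(member_id, year): days_in_hospital}
    Returns a list of (member_id, feature_vector, days) tuples.
    """
    pairs = []
    for (member, year), x in sorted(features.items()):
        y = outcomes.get((member, year + 1))
        if y is not None:  # skip members with no label for the following year
            pairs.append((member, x, y))
    return pairs

features = {("101", 1): [2, 1], ("101", 2): [1, 0], ("202", 1): [1, 0]}
outcomes = {("101", 2): 3, ("202", 2): 0}  # days in hospital in Y2
print(make_training_pairs(features, outcomes))
# [('101', [2, 1], 3), ('202', [1, 0], 0)]
```

Note that a label of 0 days is kept; only missing labels are skipped.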
28
Using Other Data?
- Yes, if it is freely available to anyone (a public source) and its URL is posted to the forum
- Except for demographic, socioeconomic, or clinical information about the members
29
Naive Algorithms
- For predicting the number of days in hospital in the next year
- Posted as “benchmarks” on the Leaderboard
31
Always Predict 15 (max)
- Everyone goes to the hospital for at least 15 days
- RMSLE = 2.628062, 550+% over threshold
33
Always Predict Zero
- No one goes to the hospital
- RMSLE = 0.522226, 31% over threshold
35
Predict Random Values
- Between 0 and 15
- RMSLE = 0.752297, 88% over threshold
37
Always Predict Average
- Average ≈ 0.209179
- RMSLE = 0.486459, 22% over threshold
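The benchmark scores above come from the Leaderboard. The toy sketch below only illustrates how constant predictions are scored and how “% over threshold” is computed; the actual-days values are invented, so its numbers will not match the Leaderboard. It also shows that under RMSLE the best constant c is the one where log(c + 1) equals the mean of log(days + 1), which is not the same as the plain average of days.

```python
from math import exp, log, sqrt

THRESHOLD = 0.4

def rmsle(predicted, actual):
    return sqrt(sum((log(p + 1) - log(a + 1)) ** 2
                    for p, a in zip(predicted, actual)) / len(actual))

def percent_over(error, threshold=THRESHOLD):
    # e.g. 0.522226 -> about 31% over the 0.4 threshold
    return 100.0 * (error / threshold - 1.0)

def best_constant(actual):
    """Constant c minimizing RMSLE: log(c + 1) = mean of log(a + 1)."""
    return exp(sum(log(a + 1) for a in actual) / len(actual)) - 1

actual = [0, 0, 0, 0, 1, 0, 3, 0, 0, 15]  # hypothetical days in hospital
n = len(actual)
for name, c in [("max (15)", 15), ("zero", 0),
                ("mean", sum(actual) / n),
                ("log-space constant", best_constant(actual))]:
    err = rmsle([c] * n, actual)
    print(f"{name:>20}: RMSLE={err:.3f}, {percent_over(err):.0f}% over threshold")
```

Because most members spend 0 days in hospital, the log-space constant sits well below the plain mean, which is pulled up by a few long stays.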
39
Leader Board
- Competition started on Apr 4, 2011 with partial data
- All data were released on June 4, 2011
- Sep 9, 2011: leading RMSLE 0.456384 (~14.1% over threshold)
- Aug 29, 2012: leading RMSLE 0.450426 (~12.6% over threshold)
40
Teams
- Sep 9, 2011: 914 teams, 6021 entries
- Aug 29, 2012: 1292 teams
41
Considerations
- Accurate prediction algorithms
- Efficiency: time and space
42
Teams
- Form your own teams: www.heritagehealthprize.com
- Join my team: CSE 4403 Independent Study or CSE 5801 Independent Research
43
THANK YOU
www.heritagehealthprize.com