Netflix Prize and Heritage Health Prize Philip Chan.


1 Netflix Prize and Heritage Health Prize Philip Chan

2 Cash Prizes to Stimulate Research Ansari X Prize for Private Spaceflight (2004) [$10M] 100 km above earth twice within 2 weeks DARPA Grand Challenge (2005) [$2M] autonomous vehicle: 131 miles in 10 hours Archon X Prize for Genomics (2006) [$10M] map 100 human genomes in 10 days

3 Cash Prizes to Stimulate Research Netflix Prize (2006) [$1M] Recommend movies with 10% improvement Heritage Health Prize (2011) [$3M] Days in hospital next year with 0.4 error

4 Netflix Prize netflixprize.com

5 Netflix Prize Task Given customer ratings on some movies Predict customer ratings on other movies If John rates “Mission Impossible” a 5, “Over the Hedge” a 3, and “Back to the Future” a 4, how would he rate “Harry Potter”, … ? Performance Error rate (accuracy)
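To make the task concrete, here is a minimal sketch of the simplest possible predictor: guess that John rates an unseen movie at the average of his known ratings. This is only an illustrative baseline, not the method used by any contestant; the function name `predict_user_mean` is hypothetical.

```python
from statistics import mean

# John's known ratings, taken from the example on the slide.
known_ratings = {
    "Mission Impossible": 5,
    "Over the Hedge": 3,
    "Back to the Future": 4,
}

def predict_user_mean(ratings):
    """Hypothetical baseline: predict the user's average known rating."""
    return mean(ratings.values())

# Predicted rating for any unseen movie, e.g. "Harry Potter".
predicted = predict_user_mean(known_ratings)
```

A real entry would also use what other customers thought of the target movie; this baseline ignores the movie entirely.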

6 Cash Award Grand Prize $1M 10% improvement by 2011 (in 5 years) Progress Prize $50K per year 1% improvement

7 Intellectual Property Netflix has a non-exclusive license to the algorithm Authors tell the world what the algorithm is

8 Participation 51K contestants 41K teams 186 countries

9 Leader Board Started on Oct 2, 2006 Improvement by the top algorithm after a week: ~0.9% after two weeks: ~4.5% after a month: ~5% after a year: ~8.4% after two years: ~9.4% July 26, 2009 (less than 3 years): 10%

10 Winner BellKor’s Pragmatic Chaos 7 members Merger of 3 teams BellKor: AT&T Labs, USA & Yahoo! Research, Israel PragmaticTheory: telecommunications, Canada BigChaos: started a company, Austria A combination of different algorithms

11 Runner-up The Ensemble ~30 members “last-minute” merger Teams had 30 days to beat the first team that crossed the 10% threshold Same accuracy, but behind by 20 minutes!

12 Heritage Health Prize heritagehealthprize.com

13 Health Care 71M individuals admitted to US hospitals each year Unnecessary admissions cost $30B

14 Heritage Provider Network Has a network of doctors in California Can we identify earlier those most at risk and ensure they get the treatment they need? Can we reduce unnecessary hospitalizations?

15 Heritage Health Prize Launch http://www.youtube.com/watch?v=GuZ8nkpygAs Given patient data Predict how many days a patient will spend in a hospital in the next year The prediction helps develop strategies to reduce emergencies and hence hospitalizations

16 Grand Prize $3M At most 0.4 in error (~0.5 day) By Apr 4, 2013 [2 years] $500K Consolation Prize if no entry gets below 0.4 error

17 Milestone Prizes top 2 performers at each milestone Aug 31, 2011 $30K, $20K Feb 13, 2012 $50K, $30K http://www.youtube.com/watch?v=pkmkNnGyihY Sep 4, 2012 $60K, $40K

18 Performance of Algorithms Prediction Error Rate (RMSLE) $\mathrm{RMSLE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\mathrm{prediction}_i - \mathrm{real}_i)^2}$ where real = log ( actual # of days + 1 ) prediction = log ( predicted # of days + 1 ) Prediction error threshold = 0.4 (~0.5 day)
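The error measure above can be sketched in a few lines of Python. This is an illustrative implementation under the assumption that `log` is the natural logarithm; the function name `rmsle` and the variable `threshold_in_days` are introduced here for illustration.

```python
import math

def rmsle(predicted_days, actual_days):
    """Root mean squared logarithmic error over paired day counts."""
    if len(predicted_days) != len(actual_days) or not actual_days:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [
        # log1p(x) computes log(x + 1), matching the slide's definitions.
        (math.log1p(p) - math.log1p(a)) ** 2
        for p, a in zip(predicted_days, actual_days)
    ]
    return math.sqrt(sum(squared) / len(squared))

# For a patient with 0 actual days, a log-space error of 0.4 means
# predicting exp(0.4) - 1 days, which is roughly half a day.
threshold_in_days = math.expm1(0.4)
```

Because the error is measured in log space, being off by one day matters far more for a patient with 0 days than for a patient with 10 days.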

19 Intellectual Property Exclusive license to Sponsor and participant’s own use Algorithms not previously published Use of data sets is for the competition only written consent for other purposes

20 Data Sets Training and validation data sets For participants to design algorithms Feedback data set For calculating standings on Leaderboard Scoring data set For determining winners for prizes http://www.heritagehealthprize.com/c/hhp/Data

21 Data (in CSV format) Members Data (113K members) Claims Data (2.7M claims) Drug Count Data (818K prescriptions) Lab Count Data (361K labs) Outcome Data (76K in Y2, 71K in Y3) Target (71K in Y4 for prediction) Total ~264 MB (including other files)

22 Members Data MemberID AgeAtFirstClaim Sex

23 Claims Data MemberID ProviderID Vendor ID PCP (Primary care physician) ID Year Specialty (of physician/vendor?) PlaceSvc (place of service) office, outpatient hospital, inpatient hospital, … PayDelay (between service and payment)

24 Claims Data [continued] LengthOfStay (in hospital) DSFS (days since first claim) PrimaryConditionGroup (diagnostic categories) CharlsonIndex (effect of diseases on illness) ProcedureGroup (intervention categories) SupLOS (supplement to LengthOfStay) 1 if LengthOfStay is NULL because of de-identification

25 Drug Count Data MemberID Year DSFS (Days since first service) DrugCount (unique prescription drugs)

26 Lab Count Data MemberID Year DSFS (Days since first service) LabCount (unique lab or pathology tests)

27 Outcome Data MemberID DaysInHospital_Y2 (claims in Y1) i.e., Predict Y2 based on Y1 DaysInHospital_Y3 (claims in Y2) ClaimsTruncated 1 if the member's claims were “truncated”
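The pairing of Y1 claims with Y2 outcomes can be sketched with the standard `csv` module. The rows below are made up for illustration, the `Year` values (`Y1`, `Y2`) are an assumption about the file encoding, and `build_training_rows` is a hypothetical helper that derives one simple feature (number of claims) per member.

```python
import csv
import io
from collections import Counter

# Made-up stand-ins for the Claims and Outcome CSV files.
claims_csv = io.StringIO(
    "MemberID,Year,PlaceSvc\n"
    "101,Y1,Office\n"
    "101,Y1,Inpatient Hospital\n"
    "102,Y1,Office\n"
    "101,Y2,Office\n"
)
outcome_csv = io.StringIO(
    "MemberID,DaysInHospital_Y2\n"
    "101,3\n"
    "102,0\n"
)

def build_training_rows(claims_file, outcome_file,
                        claims_year="Y1", target="DaysInHospital_Y2"):
    """Return (MemberID, claim count in claims_year, target days) tuples."""
    claim_counts = Counter(
        row["MemberID"]
        for row in csv.DictReader(claims_file)
        if row["Year"] == claims_year
    )
    rows = []
    for row in csv.DictReader(outcome_file):
        member = row["MemberID"]
        rows.append((member, claim_counts.get(member, 0), int(row[target])))
    return rows

training_rows = build_training_rows(claims_csv, outcome_csv)
```

The same function could pair Y2 claims with `DaysInHospital_Y3` by changing the two keyword arguments.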

28 Using Other Data? Yes Freely available to anyone (public source) URL needs to be published to the forum Except for demographic, socioeconomic or clinical information about the members

29 Naive Algorithms For predicting the number of Days in Hospital in the next year Posted as “benchmarks” on the Leaderboard

30 Always Predict 15 (max) Everyone goes to the hospital for at least 15 days

31 Always Predict 15 (max) Everyone goes to the hospital for at least 15 days RMSLE = 2.628062 550+% over threshold

32 Always Predict Zero no one goes to the hospital

33 Always Predict Zero no one goes to the hospital RMSLE = 0.522226 31% over threshold

34 Predict Random Values between 0 and 15

35 Predict Random Values between 0 and 15 RMSLE = 0.752297 88% over threshold

36 Always Predict Average Average ~= 0.209179

37 Always Predict Average Average ~= 0.209179 RMSLE = 0.486459 22% over threshold
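The constant-prediction benchmarks above can be compared on a small sample. This sketch uses made-up day counts, so its scores will not match the leaderboard figures. It also uses one known property of RMSLE: the constant that minimizes it is the mean taken in log space, exp(mean(log(actual + 1))) - 1. Whether the posted average was computed that way is not stated in the slides.

```python
import math

def rmsle_constant(constant, actual_days):
    """RMSLE when the same value is predicted for every member."""
    squared = [(math.log1p(constant) - math.log1p(a)) ** 2 for a in actual_days]
    return math.sqrt(sum(squared) / len(squared))

def best_constant(actual_days):
    """Constant that minimizes RMSLE: the mean of log(actual + 1), mapped back."""
    mean_log = sum(math.log1p(a) for a in actual_days) / len(actual_days)
    return math.expm1(mean_log)

# Made-up outcomes: most members spend no days in hospital.
toy_actual = [0, 0, 0, 0, 0, 0, 0, 1, 2, 15]

scores = {
    "always 15": rmsle_constant(15, toy_actual),
    "always 0": rmsle_constant(0, toy_actual),
    "log-mean constant": rmsle_constant(best_constant(toy_actual), toy_actual),
}
```

On data dominated by zeros, predicting zero is already far better than predicting the maximum, and a small positive constant improves on zero.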

38 Leader Board Competition started on Apr 4, 2011 with partial data All data were released on June 4, 2011 Sep 9, 2011

39 Leader Board Competition started on Apr 4, 2011 with partial data All data were released on June 4, 2011 Sep 9, 2011 RMSLE: 0.456384 ~14.1% over threshold Aug 29, 2012 RMSLE: 0.450426 ~12.6% over threshold
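The "% over threshold" figures follow from dividing a score by the 0.4 threshold. A minimal sketch, with `percent_over_threshold` as a hypothetical helper name:

```python
THRESHOLD = 0.4  # Grand Prize error threshold from the slides

def percent_over_threshold(rmsle_score, threshold=THRESHOLD):
    """How far a score sits above the threshold, as a percentage."""
    return (rmsle_score / threshold - 1.0) * 100.0

# Leading scores quoted on the Leader Board slide.
leaderboard = {"Sep 9, 2011": 0.456384, "Aug 29, 2012": 0.450426}
gaps = {
    date: round(percent_over_threshold(score), 1)
    for date, score in leaderboard.items()
}
```

The same calculation reproduces the benchmark figures, for example 0.522226 for "Always Predict Zero" is about 31% over the threshold.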

40 Teams Sep 9, 2011 914 teams 6021 entries Aug 29, 2012 1292 teams

41 Considerations Accurate Prediction algorithms Efficiency time space

42 Teams Form your own teams www.heritagehealthprize.com Join my team CSE 4403 Independent Study CSE 5801 Independent Research

43 THANK YOU www.heritagehealthprize.com

