Rob Lancaster, Orbitz Worldwide Survival Analysis & TTL Optimization.

Presentation transcript:

Rob Lancaster, Orbitz Worldwide Survival Analysis & TTL Optimization

Outline
- The Problem
- Survival Analysis Intro
  - Key Terms
  - Techniques & Models: Kaplan-Meier Estimates, Parametric Models
- Optimizing Cache TTL
  - Methods
  - Results

The Problem: The hotel rate cache and TTL optimization.

The Hotel Rate Cache

Key/Value Store
- Key: Search Criteria
- Value: Hotel Rate Information
- Benefit: Reduced looks & latency
- Cost: Increased re-price errors
- Fields shown: hotel id, check-in, check-out, # people, # rooms, host

The Hotel Rate Cache
- Each cache entry is given a time-to-live (TTL).
- TTLs were set based on intuition ages ago.
- Goal: Optimize TTL to decrease looks while controlling re-price errors.
- How? Ideally, find the greatest TTL value at which the probability of a rate change is below an acceptable threshold.
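The "How?" above can be sketched in a few lines. This is only an illustration, not the method used in the talk: the function name `greatest_acceptable_ttl`, the tabulated curve, and the sample numbers are all hypothetical, and the sketch assumes the probability of a rate change never decreases as a cache entry ages.

```python
def greatest_acceptable_ttl(change_curve, threshold):
    """Return the largest tabulated TTL whose change probability is below threshold.

    change_curve: iterable of (ttl_hours, probability_of_rate_change) pairs.
    Assumes the probability is non-decreasing in ttl_hours (hypothetical input).
    Returns None when even the smallest TTL is too risky.
    """
    best = None
    for ttl_hours, p_change in sorted(change_curve):
        if p_change < threshold:
            best = ttl_hours
        else:
            # Probabilities only grow with age, so no later TTL can qualify.
            break
    return best


# Made-up curve: probability that a cached rate has changed after N hours.
example_curve = [(1, 0.01), (2, 0.03), (4, 0.06), (8, 0.12)]
chosen_ttl = greatest_acceptable_ttl(example_curve, threshold=0.05)
```

With these made-up numbers and a 5% threshold, the chosen TTL is 2 hours, because the 4-hour entry already exceeds the threshold.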

Survival Analysis: A brief? introduction.

What is Survival Analysis?
- Statistical procedures for predicting the time until an event occurs.
- Event: death, relapse, recovery, failure.
- Examples:
  - Heart transplant patients: time until death.
  - Leukemia patients in remission: time until relapse.
  - Prison parolees: time until re-arrest.

Key Terms
- Survival Time: T vs. t (T is the random survival time; t is a specific value of it)
- Failure
- Censoring
- Survival Function

Censoring
- Period of no information.
- Left-censored. Right-censored.
- Causes:
  - Individual is “lost” to follow-up
  - Death from a cause unrelated to the event of interest
  - Study ends
- Models assume either failure or censoring.

Survival Function
- Survival function: S(t)
- Probability of survival beyond time t, i.e. S(t) = P(T > t)
- Properties:
  - Non-increasing
  - S(t) = 1 for t = 0
  - S(t) → 0 as t → ∞

Kaplan-Meier Estimates
Table columns: t_j, m_j, q_j, n_j
- t_j: observation time
- m_j: number of failures
- q_j: number of censored observations
- n_j: number at risk

Kaplan-Meier Estimates (table with columns t_j, m_j, q_j, n_j)
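The slide's formula did not survive in this transcript, so the following is a minimal sketch of the standard Kaplan-Meier product-limit estimate, S(t_j) = S(t_{j-1}) × (1 − m_j / n_j), using the column names defined above. The function name `kaplan_meier` and the toy data are assumptions for illustration. One convention is also assumed: observations censored at t_j still count as at risk at t_j, and the table gets a row for every distinct observed time, not only failure times.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Kaplan-Meier table for right-censored data.

    durations: observed times (time of failure or of censoring).
    events:    True where a failure was observed, False where censored.
    Returns rows of (t_j, n_j, m_j, q_j, S(t_j)).
    """
    failures = Counter(t for t, e in zip(durations, events) if e)
    censored = Counter(t for t, e in zip(durations, events) if not e)
    n_at_risk = len(durations)
    survival = 1.0
    rows = []
    for t in sorted(set(durations)):
        m = failures[t]
        q = censored[t]
        if m > 0:
            # Only failures reduce the survival estimate.
            survival *= 1.0 - m / n_at_risk
        rows.append((t, n_at_risk, m, q, survival))
        # Both failures and censored observations leave the risk set.
        n_at_risk -= m + q
    return rows


# Toy data: five observations, two of them censored.
toy_rows = kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False])
```

Censored observations never lower the estimate directly; they only shrink the risk set n_j used at later failure times.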

Parametric Models
- Accelerated Failure Time
- Assume a distribution.
- Use regression to fit parameters.
- λ is parameterized in terms of predictor variables and regression parameters.
- Table of S(t) by distribution: Exponential, Weibull, Log-logistic
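The S(t) column of the slide's table was not preserved. As a reference sketch, here are the three survival functions in one common textbook parameterization; the talk may have used a different but equivalent one. The function names are hypothetical, and the link between λ and the predictors is shown only for the exponential case, assuming the usual accelerated-failure-time form in which the linear predictor models log survival time.

```python
import math


def exponential_survival(t, lam):
    """S(t) = exp(-lam * t)."""
    return math.exp(-lam * t)


def weibull_survival(t, lam, p):
    """S(t) = exp(-lam * t**p); reduces to the exponential when p == 1."""
    return math.exp(-lam * t ** p)


def log_logistic_survival(t, lam, p):
    """S(t) = 1 / (1 + lam * t**p)."""
    return 1.0 / (1.0 + lam * t ** p)


def exponential_rate_from_predictors(intercept, coefficients, predictors):
    """Assumed AFT link for the exponential model: lam = exp(-(b0 + sum(b_i * x_i))).

    A larger linear predictor stretches survival time, i.e. lowers the rate.
    """
    linear = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    return math.exp(-linear)
```

All three functions satisfy the properties listed earlier: S(0) = 1, and S(t) is non-increasing for positive λ and p.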

Optimizing Cache TTL: Methods and early results.

Data Collection
- Data is collected from service hosts in our hotel stack.
- Includes every live rate search (aka burst) performed by our hotel stack.
- Raw data: ~200 GB compressed, 10^8 records.
- Extraction: <40 GB compressed, 10^9 records.

Data Preparation
- Map/Reduce job
  - Key: unique search criteria (including hotel id)
  - Sorted by date of occurrence
- Most important output:
  - Does rate ever change? (how long)
  - Does status ever change? (how long)
- Results stored in Hive table
  - Predictors: location, lead time, length of stay (los), chain, etc.
  - Survival analysis variables: event, survival time

Data Preparation: Sample
Columns: Key (hotelid:checkin:checkout:ppl:rms) | Timestamp | Status | Rate | Status Change | Hours Until Status Change | Rate Change | Hours Until Rate Change
12345: : :2: | :00 | Available | $100 | TRUE
: : :2: | :00 | Available | $100 | TRUE
: : :2: | :00 | Unavailable | N/A | TRUE | 8 | N/A
12345: : :2: | :00 | Unavailable | N/A | TRUE | 6 | N/A
12345: : :2: | :00 | Unavailable | N/A | TRUE | 5 | N/A
12345: : :2: | :00 | Unavailable | N/A | TRUE | 2 | N/A
12345: : :2: | :00 | Available | $120 | FALSE | N/A | TRUE
: : :2: | :00 | Available | $120 | FALSE | N/A | TRUE
: : :2: | :00 | Available | $150 | FALSE | N/A | FALSE | N/A
12345: : :2: | :00 | Available | $150 | FALSE | N/A | FALSE | N/A
12345: : :2: | :00 | Available | $150 | N/A
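The "does it ever change, and how long until it does" output can be illustrated for a single search key. This is a plain-Python sketch rather than the Map/Reduce job from the talk. It assumes each row is compared with the first later observation that has a different value, which matches the decreasing "Hours Until Status Change" values in the sample. The function name `hours_until_change` and the timestamps are made up.

```python
from datetime import datetime


def hours_until_change(observations):
    """For each observation, report whether the value later changes and after how long.

    observations: list of (timestamp, value) for ONE search key, sorted by timestamp.
    Returns a list of (changed, hours) tuples; hours is None when no change is seen,
    which in survival terms is a right-censored observation.
    """
    results = [None] * len(observations)
    next_change_time = None  # change time that applies to the row being processed
    # Walk backwards so each row can reuse the answer computed for the row after it.
    for i in range(len(observations) - 1, -1, -1):
        ts, value = observations[i]
        if i + 1 < len(observations):
            next_ts, next_value = observations[i + 1]
            if next_value != value:
                next_change_time = next_ts
            # Otherwise the next row has the same value, so its change time applies here too.
        if next_change_time is None:
            results[i] = (False, None)
        else:
            hours = (next_change_time - ts).total_seconds() / 3600.0
            results[i] = (True, hours)
    return results


# Hypothetical status history for one key.
status_history = [
    (datetime(2011, 1, 1, 0), "Unavailable"),
    (datetime(2011, 1, 1, 2), "Unavailable"),
    (datetime(2011, 1, 1, 3), "Available"),
    (datetime(2011, 1, 1, 5), "Available"),
]
status_changes = hours_until_change(status_history)
```

Running the same function on the rate column instead of the status column would give the rate-change fields.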

KM Estimates (plots: Global; By Traffic Volume)

Fitting the Survival Curve
- Assume exponential: S(t) = exp(-λt), so ln S(t) is linear in t.
- Apply simple linear regression.
- Full data R^2: … / … hrs R^2: 0.999
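Under the exponential assumption, ln S(t) = −λt, so an ordinary least-squares line fitted to (t, ln S(t)) has slope −λ. The sketch below fits that line by hand and reports R^2. The function name and the synthetic curve are assumptions; in practice the inputs would be Kaplan-Meier estimates with S(t) > 0.

```python
import math


def fit_exponential_by_regression(times, survival):
    """Least-squares fit of ln S(t) = intercept + slope * t.

    Returns (rate, intercept, r_squared), where rate = -slope estimates lambda.
    Requires at least two distinct times and strictly positive survival values.
    """
    ys = [math.log(s) for s in survival]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    ss_res = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(times, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return -slope, intercept, r_squared


# Synthetic curve generated from a known rate of 0.1 per hour.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
curve = [math.exp(-0.1 * t) for t in hours]
rate, intercept, r_squared = fit_exponential_by_regression(hours, curve)
```

An intercept near zero is a quick sanity check, since S(0) = 1 implies ln S(0) = 0.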

Survival Regression
- Using survreg, we can fit our data to a given distribution.
- This allows us to capture the influence of predictor values on the survival rate.
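`survreg` is assumed here to be the function from R's survival package. The sketch below does not call it. It shows the simplest special case that has a closed form: for an exponential model with right censoring, the maximum-likelihood rate is the number of observed failures divided by the total observed time. With a single categorical predictor, fitting each group separately gives the same rates as an exponential regression on that predictor. The function names, group labels, and data are hypothetical.

```python
from collections import defaultdict


def exponential_rate_mle(durations, events):
    """MLE of the exponential rate with right censoring: failures / total time observed."""
    total_time = sum(durations)
    n_failures = sum(1 for e in events if e)
    return n_failures / total_time


def exponential_rate_by_group(records):
    """Fit one exponential rate per group.

    records: iterable of (group, duration, event) tuples.
    Returns a dict mapping group -> estimated rate.
    """
    grouped = defaultdict(lambda: ([], []))
    for group, duration, event in records:
        grouped[group][0].append(duration)
        grouped[group][1].append(event)
    return {g: exponential_rate_mle(d, e) for g, (d, e) in grouped.items()}


# Made-up example: hours until a rate change, split by traffic volume.
sample_records = [
    ("high", 1.0, True),
    ("high", 3.0, True),
    ("low", 10.0, True),
    ("low", 10.0, False),  # censored: no change seen during observation
]
rates = exponential_rate_by_group(sample_records)
```

Censored rows contribute their observed time to the denominator but no failure to the numerator, which is how they pull the estimated rate down.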

Model Families

Production Testing
- Divided hotels in 8 markets into A & B groups.
- Modified TTL values for unavailable rates for B.
- Prediction:
  - Reduce the number of “looks” to B.
  - Reduce the unavailability percentage for B.
  - No negative impact on bookings or look-to-books for B.

Production Results

Conclusions and Next Steps
- Conclusions:
  - Survival Analysis is well-suited for our problem.
  - Great success in experiments for unavailable rates.
- What’s next?
  - Available rates
  - Introduction of predictor variables
  - On-the-fly TTL calculation
  - Beyond TTL…

Thank you! Questions?