Machine Learning at Orbitz Robert Lancaster and Jonathan Seidman Strata 2011 February 02 | 2011.



Launched: 2001, Chicago, IL

Why Start the Machine Learning Team at Orbitz?
The team was created in 2009 with the goal of applying machine learning techniques to improve the customer experience. For example:
– Hotel sort optimization: how can we improve the ranking of hotel search results so that consumers see hotels that more closely match their preferences?
– Cache optimization: can we intelligently cache hotel rates to optimize the performance of hotel searches?
– Personalization/segmentation: can we show targeted search results to specific consumer segments?

Data Challenges
The team immediately faced challenges getting access to data:
– The required analysis depends on large amounts of data on user interaction with the site.
– This data is available in web analytics logs, but the required fields were not available in our data warehouse because of size considerations.
– Even worse, we had no archive of the data beyond several days.
– Size constraints aside, it takes considerable time and effort to get new data added to the data warehouse.

New Data Infrastructure to Address These Challenges
Hadoop provides a solution to these challenges by:
– Providing long-term storage of the entire raw dataset without placing constraints on how that data is processed.
– Allowing us to immediately take advantage of new web analytics data added to the site.
– Providing a platform for efficient analysis of data, as well as preparation of data for input to external processes for further analysis.
Hive was added to the infrastructure to provide structure over the prepared data, facilitating ad-hoc queries and selection of specific data sets for analysis.
Data stored in Hive not only supports machine learning efforts, but also gives analysts metrics not available through other sources.

New Data Infrastructure – Cont’d
Hadoop and Hive are now being used by the machine learning team to:
– Extract data from logs for hotel sort and cache optimization analyses.
– Distribute complex cross-validation and performance evaluation operations.
– Extract data for clustering.
Hadoop and Hive have also gained rapid adoption in the organization beyond the machine learning team, for example for evaluating page download performance, searching production logs, and keyword analysis.

Use Case – Hotel Cache Optimization
Overview:
– Search methodology: search a subset of the total properties in a location (1 page at a time), getting “just enough” information to present to consumers.
– Caching: reduces impact on suppliers (maintains the “look-to-book” ratio), reduces latency, and increases “coverage.”
– Optimization goal: improve the customer experience (reduce latency, increase coverage) when searching for hotel rates while controlling the impact on suppliers (maintain look-to-book).

Hotel Cache Optimization – Early Attempts
Early approaches were well intended but were not driven by analysis of the available data. For example:
– Theory: a high amount of thrashing leads to eviction of more useful cache entries.
  Attempted solution: increase cache size.
  Result: no increase in measured coverage.
  Problem: no actual analysis of the required cache size.
– Theory: locally managed inventory represents “free” information and can be requested without limit to improve coverage.
  Attempted solution: don’t cache locally managed inventory; increase the amount of local inventory requested with each user search.
  Result: no increase in measured coverage.
  Problem: locally managed inventory doesn’t represent a large percentage of total inventory and is already highly preferenced.
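The missing analysis of required cache size named in the first example is the kind of question a log replay can answer. The following is a minimal sketch, not the team's method: it assumes a hypothetical list of query keys in arrival order and an LRU eviction policy (the slides state neither the real eviction policy nor how coverage was measured), and it uses hit rate as a stand-in for coverage.

```python
from collections import OrderedDict

def lru_hit_rate(queries, capacity):
    """Replay queries through an LRU cache of the given capacity.

    Returns the fraction of queries answered from the cache.
    """
    cache = OrderedDict()
    hits = 0
    for key in queries:
        if key in cache:
            hits += 1
            cache.move_to_end(key)          # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict the least recently used key
    return hits / len(queries) if queries else 0.0

# Hypothetical trace of search keys (illustration only, not Orbitz data).
trace = ["chi", "nyc", "chi", "las", "nyc", "chi", "sfo", "chi"]
for size in (1, 2, 3, 4):
    print(size, lru_hit_rate(trace, size))
```

On this toy trace the hit rate rises from 0.0 to 0.25 to 0.5 as capacity grows from 1 to 3, then stays at 0.5 for capacity 4. Past the point where the working set fits, a larger cache adds nothing, which is consistent with the "no increase in measured coverage" result reported on the slide.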

Hotel Cache Optimization – Data Driven Approaches
– Traffic partitioning: identify the subset of traffic that is most efficient and optimize that subset through prefetching and increased bursting.
– TTL optimization: use historic logs of availability and rate change information to predict the volatility of hotel rates and optimize the cache TTL.
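One simple way to turn historic rate-change logs into a TTL is sketched below. It is an illustration under stated assumptions, not the model used in the talk: rate changes for a hotel are assumed to follow a Poisson process, so a cached rate is still valid after t hours with probability exp(-rate * t). The function names, the 0.9 freshness target, and the TTL bounds are all hypothetical.

```python
import math

def estimate_change_rate(change_times_hours, observed_hours):
    """Estimate rate changes per hour from logged change timestamps.

    Assumes a Poisson process, for which the maximum-likelihood rate is
    simply (number of events) / (length of the observation window).
    """
    return len(change_times_hours) / observed_hours

def ttl_for_freshness(rate_per_hour, target_fresh=0.9, min_ttl=0.25, max_ttl=24.0):
    """Longest TTL (hours) keeping P(entry still valid) >= target_fresh.

    Solves exp(-rate * ttl) = target_fresh for ttl, then clamps the
    result to [min_ttl, max_ttl] (hypothetical operational bounds).
    """
    if rate_per_hour <= 0:
        return max_ttl                      # no observed changes: cache as long as allowed
    ttl = -math.log(target_fresh) / rate_per_hour
    return max(min_ttl, min(max_ttl, ttl))

# Hypothetical hotel: 4 rate changes logged over a 48-hour window.
rate = estimate_change_rate([1.0, 5.0, 30.0, 40.0], 48.0)
print(ttl_for_freshness(rate))
```

Volatile hotels get short TTLs and stable hotels get long ones, which is the trade-off the slide describes: fewer stale rates without raising supplier traffic across the board.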

Hotel Cache Optimization – Traffic Distribution
A small number of queries (3%) make up more than a third of search volume.
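A concentration figure like this can be computed directly from a search log. The sketch below assumes a hypothetical log with one entry per search, where identical entries mean the same query; the function name and sample data are illustrative only.

```python
import math
from collections import Counter

def volume_share_of_top_queries(searches, top_fraction):
    """Share of all searches accounted for by the most frequent queries.

    `searches` has one entry per search; `top_fraction` is the fraction of
    distinct queries to keep (e.g. 0.03 for the top 3%).
    """
    if not searches:
        return 0.0
    counts = Counter(searches)
    ranked = sorted(counts.values(), reverse=True)
    k = max(1, math.ceil(top_fraction * len(ranked)))   # keep at least one query
    return sum(ranked[:k]) / len(searches)

# Hypothetical log: 10 searches over 4 distinct queries.
log = ["a"] * 6 + ["b"] * 2 + ["c", "d"]
print(volume_share_of_top_queries(log, 0.25))  # top 25% of queries = 1 query
```

Here the single most frequent query accounts for 6 of 10 searches, a share of 0.6.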

Optimize Hotel Cache – Traffic Partitioning
– Evaluate possible mechanisms for determining the most frequent queries.
– Favor mechanisms that give a high search/query ratio for the greatest percentage of search volume.
– Test for stability of the mechanism across multiple time periods.

Partition Strategy | Description | Pct Queries | Pct Searches | Searches/Query
Baseline | All traffic | 100.00% | 100.00% | 2.19
Top 50 | Top 50 searched markets | 14.88% | 26.76% | 3.94
Heuristic | Top 50 searched markets, weekend stay within 1 month | 0.87% | 8.52% | 21.4
Enumeration | Queries repeated 5 or more times | 3.45% | 28.80% | 18.29
Prediction | TBD | | |
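The three table metrics are linked: a partition's searches/query equals the baseline ratio times Pct Searches divided by Pct Queries. For the Top 50 row, 2.19 × 26.76 / 14.88 ≈ 3.94. A minimal sketch of computing them for any partition rule follows; it assumes a hypothetical search log with one entry per search, and the helper names are illustrative, not taken from the slides.

```python
from collections import Counter

def partition_stats(searches, in_partition):
    """Return (pct_queries, pct_searches, searches_per_query) for a partition.

    `in_partition(query, count)` decides whether a distinct query belongs
    to the partition; fractions are relative to all traffic.
    """
    counts = Counter(searches)
    selected = {q: n for q, n in counts.items() if in_partition(q, n)}
    selected_searches = sum(selected.values())
    pct_queries = len(selected) / len(counts)
    pct_searches = selected_searches / len(searches)
    per_query = selected_searches / len(selected) if selected else 0.0
    return pct_queries, pct_searches, per_query

def repeated_at_least(threshold):
    """Partition rule in the spirit of the Enumeration row: query seen >= threshold times."""
    return lambda query, count: count >= threshold

# Hypothetical log: 18 searches over 8 distinct queries.
log = ["a"] * 6 + ["b"] * 5 + ["c"] * 2 + ["d", "e", "f", "g", "h"]
print(partition_stats(log, lambda q, n: True))     # baseline: all traffic
print(partition_stats(log, repeated_at_least(5)))  # enumeration-style partition
```

This sketch selects and scores the partition on the same log. Checking stability across time periods, as the slide recommends, would mean choosing the query set from one period and computing the statistics on a later one.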

Conclusions and Lessons Learned
– Start with a manageable problem (ease of measuring success, availability of data, etc.).
– Avoid thinking of the machine learning team as an R&D organization. Instead, foster machine learning approaches throughout the organization:
  – Embed resources on actual feature teams.
  – Machine learning study groups, etc.