Choice Hotels’ journey to better understand its customers through self-service analytics
Narasimhan Sampath & Avinash Ramineni
Strata Hadoop World | New York City | September 29th, 2016
Agenda
  Who is Choice Hotels
  Platform Architecture
  Implementation
  Value Add
Who is Choice Hotels?
Who is Choice Hotels?
Region                      Hotels open   Hotels under development   Rooms open & under dev.
United States & Caribbean   5,276         606                        446,813
Canada                      323           45                         30,135
South America               64            7                          9,737
Asia Pacific                315           25                         23,289
Europe                      402           31                         50,388
Mexico                      28            4                          3,219
Central America             14            0                          1,468
Middle East                 1             2                          564
How About a Technology Company?
Evolution of Guest Experience
Project Goals
  Business Drivers
    Self Service Reporting and Analytics
    Requirements for near real-time analytics
    Simplify Governance, Compliance and Auditing
    Better support for new applications
  Technical Drivers
    Unable to handle volume, velocity, and veracity
    Retire Legacy Systems
    Difficult to find skillset (Informix 4GL)
    Simplify Technology Stack
Key Design Tenets
  Separation of Compute and Storage
    Independently scale compute and storage
  Data Democratization and Governance
  Bring your own Compute (BYOC)
  Lift and Shift between cloud provider(s) and On-premise
  HA / DR
  Open Source Stack
Separation of Compute and Storage
  Scale storage and compute independently (up or down)
  Shifts bottleneck from Disk IO to Network
  Centralized Data Storage
  Write once & read everywhere
  Data Democratization
  Easier Hardware upgrade paths
  Flexible Architecture
  [Diagram labels: Servers, Storage]
BYOC (Bring Your Own Cluster)
  Eliminates the need for very large clusters
  Easier to administer and maintain
  Reduces multi-tenancy issues
  Clusters can be upgraded independently
  Enables on-demand computing
  Lower costs
  [Diagram labels: Marketing Cluster, Centralized Storage, Personalization, Main]
Platform Architecture
Platform Architecture – Data Ingestion Layer
  DB Ingestor
  Stream Ingestor
    Kafka and Spark Streaming
  File Ingestor
    FTP / SFTP / Logs
  Ingestion using Service API
Platform Architecture – Data Processing Layer
  Storage layer carved into logical buckets
    Landing, Raw, Derived and Delivery
  Schema stored with data (no guesswork)
  Platform Jobs for
    Converting text to Parquet
    Saving streaming data
    Parquet Derivatives
    Compaction
    Standardization
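One way to picture the four logical buckets is as prefixes under a single storage root. The sketch below is only an illustration: the root name, the dataset name, and the `dt=YYYY-MM-DD` layout are assumptions, not details taken from the slides.

```python
from datetime import date
from enum import Enum


class Zone(Enum):
    """The four logical buckets named on the slide."""
    LANDING = "landing"
    RAW = "raw"
    DERIVED = "derived"
    DELIVERY = "delivery"


def zone_path(root: str, zone: Zone, dataset: str, day: date) -> str:
    """Build a storage prefix for one dataset and day inside a zone.

    The layout root/zone/dataset/dt=YYYY-MM-DD is an assumed convention.
    """
    return f"{root.rstrip('/')}/{zone.value}/{dataset}/dt={day.isoformat()}"


# Hypothetical root and dataset names, for illustration only.
ROOT = "s3://example-data-lake"
landing = zone_path(ROOT, Zone.LANDING, "reservations", date(2016, 9, 29))
raw = zone_path(ROOT, Zone.RAW, "reservations", date(2016, 9, 29))
print(landing)
print(raw)
```

Keeping the zone in the path means a platform job (for example, text to Parquet) only has to swap the zone component to find its output location.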
Platform Architecture – Data Delivery Layer
  SQL - Spark Thrift Server / Impala
    Tableau, SQL IDE, Applications
  SparkR
  Self Service Derivatives
    Represented Via SQL on Delivery Layer
    Stored in Derived Storage Layer
  Metadata driven Derived Layer Generators
    Long running Spark Job
    Derivative Refresh
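A metadata-driven derivative refresh could look roughly like the sketch below. It only renders SQL strings and decides which derivatives are due; it does not talk to Spark. The `Derivative` fields, the table names, and the use of `INSERT OVERWRITE TABLE` are assumptions made for illustration, not the presenters' implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Derivative:
    """Metadata for one self-service derivative (all fields are hypothetical)."""
    name: str             # table name exposed to SQL clients
    select_sql: str       # the user's SELECT that defines the derivative
    refresh_seconds: int  # how often the derivative should be rebuilt


def refresh_statement(d: Derivative) -> str:
    """Render the SQL that rebuilds one derivative from its metadata."""
    return f"INSERT OVERWRITE TABLE {d.name} {d.select_sql.strip().rstrip(';')}"


def due_for_refresh(derivatives: List[Derivative],
                    last_run: Dict[str, float],
                    now: float) -> List[Derivative]:
    """Return derivatives never refreshed, or refreshed too long ago."""
    due = []
    for d in derivatives:
        previous = last_run.get(d.name)
        if previous is None or now - previous >= d.refresh_seconds:
            due.append(d)
    return due


# Hypothetical catalog entries; timestamps are seconds on an arbitrary clock.
catalog = [
    Derivative("bookings_by_brand",
               "SELECT brand, COUNT(*) AS bookings FROM reservations GROUP BY brand",
               3600),
    Derivative("daily_revenue",
               "SELECT stay_date, SUM(amount) AS revenue FROM reservations GROUP BY stay_date",
               86400),
]
last_run = {"bookings_by_brand": 1000.0, "daily_revenue": 1000.0}

# A long-running job would repeat this check on a timer and submit each statement.
for d in due_for_refresh(catalog, last_run, now=5000.0):
    print(refresh_statement(d))
```

Because the derivative is described entirely by metadata, adding a new one means adding a catalog entry rather than writing a new job.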
Implementation
  CDH Cloud readiness
    Cloudera Director Limitations
    Multi-Availability zone, regions
  Spark Thrift Server
    Support
    Performance Tuning
    Concurrency, partition strategy
    Cache Tables
  Security
    Sentry Integration
    Kerberos Ticket Renewal
    Navigator Integration
Implementation
  Rapidly Changing Technology
    Feature addition
    Documentation
    Bugs
    Jar hell
  Compression Codec for Parquet
  S3 Eventual Consistency
  Small files
    Performance Issues
    Compaction
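The small-files problem is usually handled by rewriting many small files into fewer large ones. As a rough sketch of the planning step only, the function below groups files into batches that each stay at or under a target size, using a first-fit-decreasing heuristic. The heuristic, the file names, and the 128 MB target are assumptions chosen for illustration; the slides do not say how compaction was implemented.

```python
from typing import Dict, List, Tuple


def plan_compaction(file_sizes: Dict[str, int], target_size: int) -> List[List[str]]:
    """Group files so each group's total size is at most target_size.

    Uses first-fit decreasing: take files largest first and place each into
    the first group with room. A file larger than target_size gets a group
    of its own.
    """
    bins: List[Tuple[int, List[str]]] = []
    largest_first = sorted(file_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in largest_first:
        for i, (used, names) in enumerate(bins):
            if used + size <= target_size:
                bins[i] = (used + size, names + [name])
                break
        else:
            # No existing group has room, so start a new one.
            bins.append((size, [name]))
    return [names for _, names in bins]


# Hypothetical part files with sizes in MB; target output size of 128 MB.
sizes_mb = {"part-0": 90, "part-1": 70, "part-2": 50, "part-3": 30, "part-4": 8}
for group in plan_compaction(sizes_mb, target_size=128):
    print(group, sum(sizes_mb[n] for n in group), "MB")
```

Each resulting group would then be read and rewritten as a single file by whatever engine performs the compaction.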
Implementation
  Partition Strategy
    Parquet Files
    Balancing parallelism and throughput
    Table Partitions
  Cluster sizing, optimization and tuning
  Integrating with Corporate infrastructure
    Deployment practices
    Monitoring and Alerting
    Information Security Policies
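Balancing parallelism against throughput often comes down to how many Parquet files a table partition is written as: too few limits parallel reads, too many recreates the small-files problem. The heuristic below is a minimal sketch under assumed numbers (a 128 MB target file size and a cap of 200 files); neither value comes from the slides.

```python
import math

MB = 1024 ** 2


def output_file_count(partition_bytes: int,
                      target_file_bytes: int = 128 * MB,
                      max_files: int = 200) -> int:
    """Pick how many files to write for one table partition.

    Aim for files near target_file_bytes, never fewer than one file and
    never more than max_files. Both defaults are illustrative assumptions.
    """
    wanted = math.ceil(partition_bytes / target_file_bytes)
    return max(1, min(max_files, wanted))


# Hypothetical partition sizes.
for size in (10 * MB, 1000 * MB, 100_000 * MB):
    print(size // MB, "MB ->", output_file_count(size), "file(s)")
```

In a Spark job, a number like this would typically be used as the partition count when repartitioning a dataset before writing it.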
Value Add
Enabling predictive analytics and real-time decisions
  Integrated Scorecards – Daily / Weekly / Monthly Insights
  Near Real Time / Hourly / Daily Insights
  Multivariate Testing, APT (Test vs. Control Analysis), and Text Analytics
  Testing for Both Hotel and Customer / Research For Guest Insights
  Personalized Display Ad Serving
  Real-time Actions (Machine Learning) Across Guest Touch Points
  Hotel Lifecycle Data
  Real-time Alerts for Hotel Related Actions
CLAIRVOYANT
  BACKGROUND
    One of the fastest growing big data companies
    Extensive experience in providing strategic and architectural consulting on Big Data platforms and implementations
    Global delivery experience across multiple locations in US, Asia and Latin America
    100+ big data experts worldwide - US, Latin America and Asia
  AWARDS & RECOGNITION
  CLAIRVOYANTSOFT.COM
We are hiring! http://careers.choicehotels.com/careers.html