9/10/2018
Largest US Healthcare Dataset in Hadoop Enables Patient-level Analytics in Near Real Time
September 28, 2016
Navdeep Alam, Director of Data Warehousing
nalam@us.imshealth.com
Agenda
- Who is IMS Health
- Health care data ecosystem at IMS
- Opportunity and Challenges: Make a Greater Difference in Patient Healthcare
- Solution – Anonymous Patient Longitudinal Analysis
- Lessons Learned
Who is IMS Health?
- Market: Healthcare Information, Technology and Services
- Solutions: Deliver unique insights into diseases, treatments, costs and outcomes
- Experience: Founded 1954, operations in 100+ countries, 15,000 employees, 55+ billion health transactions annually
- # of Customers: 5,000+ clients
Health Care Data Ecosystem IMS Health – Where Does Our Data Come From
Future Data Growth is Exponential
- Social Media, IoT, Genomics
- Billions More Transactions
- Billions of Anonymous Patients
Make a Greater Difference in Patient Healthcare
Precision Medicine, Better Outcomes, Propel Research towards Cures
- Longitudinal Studies
- Find Patterns Across All Patients
- Predict and Influence Outcomes
- Help Reduce Healthcare Costs
- Clinical Trials and Drug Research Improvements
- Improve Provider Care
Challenges
Obstacles to Realizing the Greater Opportunity
- Data Silos
- Reduced Data Currency
- Analytics Away from the Data
- Analytics Too Time Consuming and Expensive
- Cost High on Current Systems
Solution - Patient Longitudinal Records
Organized for Fast Access and Reduced Data Shuffle
[Diagram: traditional warehoused data (Rx, Dx, EMR, new source types) flows through the Big Data Factory into Patient Longitudinal Records, stored as a complex nested data type. Legend: each color = a unique de-identified patient ID; each shape = a type of patient data; filled shapes = data of interest.]
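The idea of a patient longitudinal record can be illustrated with a minimal Python sketch. It groups flat rows from several source types into one nested record per de-identified patient ID. The row layout, field names, and sample values are hypothetical placeholders, not the schema used at IMS Health.

```python
from collections import defaultdict

# Hypothetical flat rows: (patient_id, source_type, event_date, payload).
# All identifiers and values are placeholders for illustration.
events = [
    ("p001", "Rx", "2016-01-05", {"drug": "drug_a"}),
    ("p002", "Dx", "2016-01-07", {"code": "code_x"}),
    ("p001", "Dx", "2015-12-20", {"code": "code_y"}),
    ("p001", "EMR", "2016-02-01", {"note": "visit"}),
]


def build_longitudinal_records(rows):
    """Group flat rows into one nested record per de-identified patient ID."""
    grouped = defaultdict(lambda: defaultdict(list))
    for patient_id, source_type, event_date, payload in rows:
        grouped[patient_id][source_type].append({"date": event_date, **payload})
    # Sort each source's events by ISO date so a patient's history reads in order.
    return {
        pid: {
            src: sorted(evts, key=lambda e: e["date"])
            for src, evts in by_source.items()
        }
        for pid, by_source in grouped.items()
    }


records = build_longitudinal_records(events)
```

Once the data is nested this way, everything known about a patient can be read with a single key lookup instead of a join across source tables.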
Solution - Different Storage Engines
Storage to Match the Access Pattern
[Diagram: Patient Longitudinal Records (complex nested type) are loaded into several storage engines, each matched to an access pattern:
- Solr: aggregates/counts, faceted search, web speed (ms)
- HBase: fast lookup of longitudinal entity (i.e. patient)
- Hive, nested and bucketed: Deep Learning Analytics, longer running queries (min vs. days), JDBC/SQL
- Hive, partitioned: BI/DW workloads, SQL
Other labels in the diagram: Web Applications, HUE, RDBMS, ETL Process, REST.]
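Bucketing by patient ID is one way to keep a patient's data together and reduce shuffle. The sketch below shows the general idea, assuming a hypothetical bucket count and a CRC32 hash chosen only for determinism; it is not the hash function Hive itself uses.

```python
import zlib

NUM_BUCKETS = 8  # hypothetical bucket count, chosen for illustration


def bucket_for(patient_id: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a de-identified patient ID to a stable bucket number."""
    return zlib.crc32(patient_id.encode("utf-8")) % num_buckets


# Rows from different sources for the same patient land in the same bucket,
# so per-patient processing does not need to move data between buckets.
rx_rows = [("p001", "drug_a"), ("p002", "drug_b")]
dx_rows = [("p001", "code_x"), ("p002", "code_y")]

rx_buckets = {pid: bucket_for(pid) for pid, _ in rx_rows}
dx_buckets = {pid: bucket_for(pid) for pid, _ in dx_rows}
```

Because the bucket depends only on the patient ID, any two datasets bucketed the same way can be combined bucket by bucket.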
Hadoop Storage Engines Parquet/Hive vs. HBase vs. Kudu
Evolution of Different Storage Engines
Storage to Match the Access Pattern with Kudu
[Diagram: Patient Longitudinal Records (complex nested type) are loaded into storage engines matched to the access pattern:
- Solr: aggregates/counts, faceted search, web speed (ms)
- Kudu: fast lookup of longitudinal entity (i.e. patient); Deep Learning Analytics, longer running queries (min vs. days), JDBC/SQL; BI/DW workloads, SQL
Other labels in the diagram: Web Applications, HUE, RDBMS, ETL Process, REST.]
Anonymous Patient Longitudinal Analysis Rx (Prescriptions) and Dx (Medical Claims) Longitudinal Analysis
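A longitudinal question over Rx and Dx data might look like "which prescriptions followed a diagnosis within N days?". The following is a minimal sketch of that kind of query over one nested patient record; the record layout, the function name `rx_after_dx`, and the sample values are hypothetical.

```python
from datetime import date


def rx_after_dx(record, max_days):
    """Return (dx_code, drug, gap_days) for each Rx within max_days after a Dx."""
    pairs = []
    for dx in record.get("Dx", []):
        for rx in record.get("Rx", []):
            gap = (rx["date"] - dx["date"]).days
            if 0 <= gap <= max_days:
                pairs.append((dx["code"], rx["drug"], gap))
    return pairs


# Hypothetical record for one de-identified patient.
record = {
    "Dx": [{"date": date(2016, 1, 4), "code": "code_x"}],
    "Rx": [
        {"date": date(2016, 1, 10), "drug": "drug_a"},
        {"date": date(2016, 6, 1), "drug": "drug_b"},
    ],
}

matches = rx_after_dx(record, max_days=30)
```

Because each record already holds a patient's full history, the same function can be applied to every patient independently and in parallel.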
What does this do for us?
Value Proposition
- See Patterns in Data
- Explore the Data Before Analysis
- Variety of Analysis in Parallel
- Time-to-Value Greatly Improved
- Reduced Cost
- Innovation
Lessons Learned
Rethink Everything!
Technology, Cultural, and Process Management Changes
Thank You
Navdeep Alam, Director of Data Warehousing
nalam@us.imshealth.com