September 11, Ian R Brooks Ph.D.

Slides:

Advertisements

Similar presentations

Data in R. General form of data ID numberSexWeightLengthDiseased… 112m … 256f3.61 NA1… 3……………… 4……………… n91m5.1711… NOTE: A DATASET IS NOT A MATRIX!

Advertisements

Management Information Systems, Sixth Edition

BigData Tools Seyyed mohammad Razavi. Outline  Introduction  Hbase  Cassandra  Spark  Acumulo  Blur  MongoDB  Hive  Giraph  Pig.

Observation Pattern Theory Hypothesis What will happen? How can we make it happen? Predictive Analytics Prescriptive Analytics What happened? Why.

United Nations Economic Commission for Europe Statistical Division NTTS 2015 – Satellite Workshop on Big Data March 9, 2015 The Big Data Project – The.

Copyright © 2006 by The McGraw-Hill Companies, Inc. All rights reserved. McGraw-Hill Technology Education Copyright © 2006 by The McGraw-Hill Companies,

“This presentation is for informational purposes only and may not be incorporated into a contract or agreement.”

` tuplejump The data engineering platform. A startup with a vision to simplify data engineering and empower the next generation of data powered miracles!

Distributed Indexing of Web Scale Datasets for the Cloud {ikons, eangelou, Computing Systems Laboratory School of Electrical.

USING MULTIPLE PERSISTENCE LAYERS IN SPARK TO BUILD A SCALABLE PREDICTION ENGINE Richard Williamson

Institute for Personal Robots in Education (IPRE)‏ CSC 170 Computing: Science and Creativity.

INNOV-10 Progress® Event Engine™ Technical Overview Prashant Thumma Principal Software Engineer.

Big Data Analytics Platforms. Our Team NameApplication Viborov MichaelApache Spark Bordeynik YanivApache Storm Abu Jabal FerasHPCC Oun JosephGoogle BigQuery.

Andy Roberts Data Architect

Dato Confidential 1 Danny Bickson Co-Founder. Dato Confidential 2 Successful apps in 2015 must be intelligent Machine learning key to next-gen apps Recommenders.

AZURE MACHINE LEARNING Bringing New Value To Old Data SQL Saturday #

CPSC8985 FA 2015 Team C3 DATA MIGRATION FROM RDBMS TO HADOOP By Naga Sruthi Tiyyagura Monika RallabandiRadhakrishna Nalluri.

A Suite of Products that allow you to Predict Outcomes, Prescribe Actions and Automate Decisions.

Introduction to Mongo DB(NO SQL data Base)

Big Data Analytics and HPC Platforms

Platform as a Service (PaaS)

Connected Infrastructure

Data Platform and Analytics Foundational Training

Big Data is a Big Deal!.

Hadoop and Spark Dynamic Data Models Amila Kottege Software Developer

Platform as a Service (PaaS)

SAS users meeting in Halifax

Big Data Enterprise Patterns

Sushant Ahuja, Cassio Cristovao, Sameep Mohta

Introduction to Spark Streaming for Real Time data analysis

Hadoop and Analytics at CERN IT

Metis Data Science Meetup:

INTRODUCTION TO PIG, HIVE, HBASE and ZOOKEEPER

The importance of being Connected

Sqoop Mr. Sriram

Big-Data Fundamentals

Connected Infrastructure

Building Analytics At Scale With USQL and C#

A Big Data Cheat Sheet: The Big Pharma Edition

Open Source on .NET A real world use case.

ICT Database Lesson 1 What is a Database?.

SNOW ONLINE TRAINING IN HYDERABAD

Azure Machine Learning & ML Studio

Operationalize your data lake Accelerate business insight

Designed for Big Data Visual Analytics, Zoomdata Allows Business Users to Quickly Connect, Stream, and Visualize Data in the Microsoft Azure Platform MICROSOFT.

Big Data - in Performance Engineering

Introduction to PIG, HIVE, HBASE & ZOOKEEPER

Design engineer deliver.

Ch 4. The Evolution of Analytic Scalability

Data science and machine learning at scale, powered by Jupyter

Overview of big data tools

Introduction to SAP HANA

Big Data Young Lee BUS 550.

Databricks: the new kid on the block

Project Goals Collect and permanently store the data flowing around ONAP system into several Big Data storages, each in different category. Also serve.

relational thoughts on NoSql

Cloud Computing for Data Analysis Pig|Hive|Hbase|Zookeeper

Agenda Need of Cloud Computing What is Cloud Computing

ETL Patterns in the Cloud with Azure Data Factory

IBM C IBM Big Data Engineer. You want to train yourself to do better in exam or you want to test your preparation in either situation Dumpspedia’s.

The Student’s Guide to Apache Spark

UCLA Health Data Analytics Strategy

Copyright © JanBask Training. All rights reserved Get Started with Hadoop Hive HiveQL Languages.

UNIT 6 RECENT TRENDS.

Continuous Integration and Delivery (CI/CD) in Azure Data Factory

SQL Server 2019 Bringing Apache Spark to SQL Server

The “perfect” data platform

Pig Hive HBase Zookeeper

Presentation transcript:

September 11, 2018 - Ian R Brooks Ph.D. A New Hope: Improving Patient Screening By Applying Predictive Analytics To Electronic Medical Records

Who Is The Data Jedi? Completed MS in Computer Science in 2008. Software Engineer at Flextronics Medical and Peterbilt Motor Company. Completed Ph.D. in Computer Science in 2015. Involved in Big Data and Hadoop since 2015. Joined Hortonworks in 2016 DS SME Lead 2018 Who Is The Data Jedi?

Why Are We Here Today? Commons Problems for IT Large volume of EMR Data that pushes limits of modern architecture Variety of data sources from traditional sources and IoT data sources make it difficult manage Why Are We Here Today?

Commons Problems For Data Science and Engineering Teams All data of their organization’s isn’t available to them Models are easy to build difficult to share leaving them in isolation Deploying pipelines models into their IT infrastructure is easy Accessing and storing model results Why Else Are We Here?

Electronic Medical Records The dataset used for the demo are from EMR Bots at http://www.emrbots.org/ Synthetic EMR data that can used to without worrying about patient privacy concerns Table 1 – Patient information Table 2 – Patient admission Table 3 – Lab results Table 4 – Patient Diagnosis Electronic Medical Records

End To End Work Flow Load data files & explore dataset Sessionalize patient data Feature engineering Pipeline model building Model deployment End To End Work Flow

Load Files and Explore Dataset Load data files into Apache Spark and build a Spark Dataframe, which will include rows and columns Dataframes can be saved as temporary tables and queried used Spark SQL Load Files and Explore Dataset

Sessionalize Patient Data Find all patients ID with and without target condition. Sessonalize all related patient data to patient IDs Patient information Lab results Diagnosis records Merge the two data sets Sessionalize Patient Data

Feature Engineering String Index the categorical variables Select features to send to model Build feature vector Feature Engineering

Pipeline Model Building Apache Spark provides an API to build pipeline models Pipeline models encapsulate all of the transformation steps that are required to build the model’s features Once the model is exported, predictions can be made on raw inputs Pipeline Model Building

Model Deployment Once model has been built, it can be deployed Model deployment options often provide an RESTful interface The working interface can be embedded into an Apache NiFi flow Model predictions can now be automated into a work flow Model Deployment

Reference Architecture

Divisions Of Roles

Final Thoughts Highly scalable storage platform Flexible storage platform for both structured and unstructured datasets Centralized data repository Tools and libraries to build pipeline models Model deployment into an existing IT infrastructure Final Thoughts

No Questions, Only Answers Thanks for coming Y’all!