Bob Marshall, MD MPH MISM DoD Clinical Informatics Fellowship

Predictive Analytics
Bob Marshall, MD MPH MISM, DoD Clinical Informatics Fellowship

Definition
Branch of data mining concerned with prediction of future probabilities, outcomes, and trends
Central element of predictive analytics is the predictor: a variable that can be measured for an individual or other entity to predict future behavior
Multiple predictors are combined into a predictive model which, when subjected to analysis, can be used to forecast future probabilities with an acceptable level of reliability

Definition (cont.)
In predictive modeling, data is collected, a statistical model is formulated, predictions are made, and the model is validated (or revised) as additional data becomes available
Model may employ a simple linear equation or a complex neural network, mapped out by sophisticated software
Models capture relationships among many factors to allow assessment of the future risk or potential associated with a particular set of conditions

End Effect
Defining functional effect of these technical approaches: predictive analytics provides a predictive score (probability) for each individual in order to determine, inform, or influence organizational processes that apply across large numbers of individuals
Core of predictive analytics relies on capturing relationships between explanatory variables and predicted variables from past occurrences, and exploiting them to predict the unknown outcome

Dependencies
Accuracy and usability of results depend greatly on the level of data analysis and the quality of assumptions
Also very dependent on the accuracy and integrity of the variable/factor data
Inaccurate variable/factor data = inaccurate predictions

Predictive Analytics and Big Data
Predictive analytics is an enabler of big data
Predictive analytics enables organizations to use big data (both stored and real-time) to move from a historical view to a forward-looking perspective of the customer/patient

Predictive Models
Models of the relation between the specific performance of a unit in a sample and one or more known attributes or features of the unit
Objective of the model is to assess the likelihood that a similar unit in a different sample will exhibit the specific performance
Predictive models often perform calculations during live transactions, for example, to evaluate the risk or opportunity of a given customer or transaction, in order to guide a decision

Predictive Models (cont.)
Available sample units with known attributes and known performances are referred to as the “training sample”
Units in other samples, with known attributes but unknown performances, are referred to as “out of [training] sample” units
Out of sample units need not bear any chronological relation to training sample units
The out of sample unit may be from the same time as the training units, from a previous time, or from a future time
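The distinction between training sample and out of sample units can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the one-feature cutoff rule, the function names, and every number are hypothetical and not from the presentation. To measure accuracy, the out of sample units here have known outcomes that are held back; in practice those outcomes are unknown at prediction time.

```python
def fit_threshold(values, labels):
    """Pick the cutoff c for which 'predict 1 when value >= c' is most accurate on the training sample."""
    best_cutoff, best_correct = None, -1
    for c in sorted(set(values)):
        correct = sum((v >= c) == bool(y) for v, y in zip(values, labels))
        if correct > best_correct:
            best_cutoff, best_correct = c, correct
    return best_cutoff


def accuracy(values, labels, cutoff):
    """Fraction of units whose predicted label matches the known label."""
    hits = sum((v >= cutoff) == bool(y) for v, y in zip(values, labels))
    return hits / len(values)


# Hypothetical training sample: known attribute and known performance.
train_values = [1, 2, 3, 4, 6, 7, 8, 9]
train_labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Hypothetical out of sample units: outcomes held back for evaluation only.
new_values = [2, 5, 6.5, 10]
new_labels = [0, 1, 1, 1]

cutoff = fit_threshold(train_values, train_labels)
print("cutoff learned from training sample:", cutoff)
print("out of sample accuracy:", accuracy(new_values, new_labels, cutoff))
```

Accuracy on the training sample is usually optimistic, which is why the model is judged on units it has not seen.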

Examples
Example 1: training sample may consist of literary attributes of writings by Victorian authors, with known attribution, and the out of sample unit may be newly found writing with unknown authorship; a predictive model may aid in attributing a work to a known author
Example 2: analysis of blood splatter in simulated crime scenes, in which the out of sample unit is the actual blood splatter pattern from a crime scene

Descriptive Models
Descriptive models quantify relationships in data in a way that is often used to classify customers or prospects into groups
Unlike predictive models, which focus on predicting a single customer behavior (such as credit risk), descriptive models identify many different relationships between customers or products
Descriptive models can be used, for example, to categorize customers by their product preferences and life stage
Descriptive modeling tools can be used to develop further models that simulate a large number of individualized agents and make predictions

Decision Models
Decision models describe the relationship between all elements of a decision (the known data, including results of predictive models; the decision; and the forecast results of the decision) in order to predict results of decisions involving many variables
Decision models are generally used to develop decision logic or a set of business rules that will produce the desired action for every customer (patient) or circumstance
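As a rough illustration of decision logic layered on top of a predictive score, the sketch below maps a predicted risk to an action. The thresholds, the action labels, and the function name `choose_action` are all hypothetical; real rules would come from clinical and organizational policy, not from this example.

```python
def choose_action(predicted_risk, high=0.30, moderate=0.10):
    """Map a predicted probability to an action using hypothetical cutoffs."""
    if not 0.0 <= predicted_risk <= 1.0:
        raise ValueError("predicted_risk must be a probability between 0 and 1")
    if predicted_risk >= high:
        return "refer to care management"
    if predicted_risk >= moderate:
        return "schedule follow-up"
    return "routine care"


# Hypothetical scores produced by some upstream predictive model.
for patient_id, risk in [("A", 0.45), ("B", 0.15), ("C", 0.02)]:
    print(patient_id, risk, "->", choose_action(risk))
```

The rule set guarantees that every score in range produces exactly one action, which is the property the slide asks of a decision model.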

Clinical Decision Support
Predictive analysis in health care is primarily used to determine which patients are at risk of developing certain conditions, like diabetes, asthma, heart disease, and other lifetime illnesses
Additionally, sophisticated clinical decision support systems incorporate predictive analytics to support medical decision making at the point of care
Working definition proposed by Robert Hayward of the Centre for Health Evidence: “Clinical Decision Support Systems link health observations with health knowledge to influence health choices by clinicians for improved health care.”

Population Health
Can use predictive analytics to forecast disease burden and healthcare resource utilization rates for a given population
Can use zip code analysis for median income, age distribution, gender prevalence, education level, crime statistics, and other readily available historical/demographic factors
Combine these with national statistics on disease burden, accidents/trauma, and healthcare utilization rates to predict future use/needs
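The simplest version of combining local demographics with national rates is an expected-count calculation: multiply each group's population by its rate and sum. The sketch below assumes hypothetical age bands, population counts, and visit rates purely for illustration; none of the figures are real statistics.

```python
def expected_utilization(population_by_group, rate_by_group):
    """Sum of (people in group) x (events per person per year) over all groups."""
    return sum(
        population_by_group[group] * rate_by_group[group]
        for group in population_by_group
    )


# Hypothetical age distribution for one zip code.
population = {"0-17": 2000, "18-64": 6000, "65+": 2000}

# Hypothetical national emergency visit rates per person per year.
visit_rate = {"0-17": 0.3, "18-64": 0.4, "65+": 0.6}

print("expected visits per year:", expected_utilization(population, visit_rate))
```

A fuller model would adjust the rates for income, education, and the other factors listed above instead of using age alone.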

Analytic Techniques
The approaches and techniques used to conduct predictive analytics can broadly be grouped into regression techniques and machine learning techniques
Regression models are the mainstay of predictive analytics; the focus lies on establishing a mathematical equation as a model to represent the interactions between the different variables under consideration
Machine learning was originally employed to develop techniques to enable computers to learn
For very complex underlying relationships with unknown dependencies, machine learning techniques can emulate human cognition and learn from training examples to predict future events

Linear Regression Model
Analyzes relationship between the response or dependent variable and a set of independent or predictor variables
Relationship is expressed as an equation that predicts the response variable as a linear function of the parameters
Parameters are adjusted so that a measure of fit is optimized
Much of the effort in model fitting is focused on minimizing the size of the residual, as well as ensuring that it is randomly distributed with respect to the model predictions
Generally used when the response variable is continuous and has an unbounded range
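For a single predictor, the parameters that minimize the sum of squared residuals have a closed form, sketched below in plain Python. The data (age versus systolic blood pressure) and the function names are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) that minimize the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(x, y, intercept, slope):
    """Observed response minus the response predicted by the fitted line."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical data: age in years and systolic blood pressure in mmHg.
ages = [30, 40, 50, 60, 70]
sbp = [118, 123, 131, 136, 142]

intercept, slope = fit_simple_linear_regression(ages, sbp)
print("intercept:", round(intercept, 2), "slope:", round(slope, 2))
print("residuals:", [round(r, 2) for r in residuals(ages, sbp, intercept, slope)])
```

Inspecting the residuals for patterns against the fitted values is the check the slide refers to; with an intercept in the model they always sum to zero, so the pattern matters more than the total.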

Discrete Choice Models
Response variable may not be continuous but rather discrete
While it is mathematically feasible to apply multivariate regression to discrete ordered dependent variables, some of the assumptions behind the theory of multivariate linear regression no longer hold
There are other techniques, such as discrete choice models, that are better suited for this type of analysis
If the dependent variable is discrete, some of those superior methods are logistic regression, multinomial logit, and probit models
Logistic regression and probit models are used when the dependent variable is binary
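A minimal sketch of logistic regression for a binary dependent variable follows, fitted by plain gradient descent on the average log-loss. The data, learning rate, and number of epochs are hypothetical choices made for this illustration; production work would normally rely on an established statistics library.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by batch gradient descent."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(b0 + b1 * x) - y  # predicted probability minus outcome
            grad0 += error
            grad1 += error * x
        b0 -= learning_rate * grad0 / n
        b1 -= learning_rate * grad1 / n
    return b0, b1


# Hypothetical data: a risk-factor count and whether the event occurred.
risk_factors = [0, 1, 2, 3, 4, 5, 6, 7]
event = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(risk_factors, event)
for x in (0, 3, 7):
    print(x, "risk factors -> predicted probability", round(sigmoid(b0 + b1 * x), 3))
```

Unlike a straight line, the fitted curve can never predict a probability below 0 or above 1, which is the main reason it suits a binary response.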

Time Series Models
Time series models are used for predicting or forecasting the future behavior of variables
These models account for the fact that data points taken over time may have internal structure (such as autocorrelation, trend, or seasonal variation)
Modeling the dynamic path of a variable can improve forecasts, since the predictable component of the series can be projected into the future
Two commonly used forms of these models are autoregressive (AR) models and moving-average (MA) models
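The simplest autoregressive model, AR(1), predicts each value from the one before it. The sketch below estimates the AR(1) coefficient by least squares on the mean-centered series and projects it forward. The monthly visit counts and function names are hypothetical, and the estimator is one simple choice among several.

```python
def fit_ar1(series):
    """Return (mean, phi) for the model x[t] - mean = phi * (x[t-1] - mean) + noise."""
    mean = sum(series) / len(series)
    centered = [value - mean for value in series]
    numerator = sum(centered[t] * centered[t - 1] for t in range(1, len(centered)))
    denominator = sum(centered[t - 1] ** 2 for t in range(1, len(centered)))
    return mean, numerator / denominator


def forecast_ar1(series, mean, phi, steps):
    """Project the series forward; each step shrinks the deviation from the mean by phi."""
    forecasts = []
    deviation = series[-1] - mean
    for _ in range(steps):
        deviation *= phi
        forecasts.append(mean + deviation)
    return forecasts


# Hypothetical monthly clinic visit counts.
visits = [120, 126, 130, 127, 121, 116, 113, 117, 123, 128, 129, 124]

mean, phi = fit_ar1(visits)
print("mean:", round(mean, 2), "phi:", round(phi, 3))
print("next three months:", [round(f, 1) for f in forecast_ar1(visits, mean, phi, 3)])
```

When phi lies between 0 and 1, the forecasts decay toward the long-run mean, which is the "predictable component" being projected into the future.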

Survival/Duration Analysis
Survival analysis is another name for time-to-event analysis
These techniques were primarily developed in the medical and biological sciences
Important concept in survival analysis is the hazard rate, defined as the probability that the event will occur at time t conditional on surviving until time t
Another concept related to the hazard rate is the survival function, which can be defined as the probability of surviving to time t
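In discrete time the two concepts are tied together: the hazard in an interval is the share of those still at risk who have the event, and the survival function is the running product of (1 - hazard). The sketch below assumes hypothetical yearly counts with no censoring, which is a simplification; handling censored observations requires methods such as Kaplan-Meier.

```python
def life_table(at_risk, events):
    """Return per-interval hazards and the cumulative survival probabilities."""
    hazards, survival = [], []
    surviving = 1.0
    for n, d in zip(at_risk, events):
        hazard = d / n                 # P(event in interval | survived to its start)
        surviving *= 1.0 - hazard      # P(survived through this interval)
        hazards.append(hazard)
        survival.append(surviving)
    return hazards, survival


# Hypothetical cohort: number at risk at the start of each year, events during it.
at_risk = [100, 90, 72]
events = [10, 18, 18]

hazards, survival = life_table(at_risk, events)
for year, (h, s) in enumerate(zip(hazards, survival), start=1):
    print("year", year, "hazard", round(h, 2), "survival", round(s, 2))
```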

Machine Learning Techniques
Neural networks
Multilayer perceptron (MLP)
Radial basis functions (RBF)
Support vector machines (SVM)
Naïve Bayes
k-nearest neighbors (KNN)
Geospatial predictive modeling
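Of the techniques listed, k-nearest neighbors is the easiest to show in full: a new unit receives the majority label of the k training units closest to it. The sketch below uses Euclidean distance on two hypothetical, already-scaled features; the data, labels, and function name are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the most common label among the k training points nearest to query."""
    by_distance = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical training sample: two scaled features per unit and a known risk label.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
labels = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(points, labels, (1.1, 1.0)))
print(knn_predict(points, labels, (3.0, 3.0)))
```

Because the method depends on distances, features measured on very different scales should be standardized first; otherwise the largest-valued feature dominates.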

Implementing Predictive Analytics
The first thing you need to get started using predictive analytics is a problem to solve
Second, you’ll need data
You will need someone with data management experience to help cleanse and prep the data for analysis
You need someone who understands both the data and the problem to be solved to prepare the data for predictive modeling
How you define your target is essential to how you can interpret the outcome
After that, the predictive model building begins

Questions