1 Catherine Ordun, MBA, MPH May 10, 2016 Challenges and Considerations of Big Data Analytics Workshop on Big Data and Analytics for Infectious Disease.


1 Catherine Ordun, MBA, MPH. May 10, 2016. Challenges and Considerations of Big Data Analytics. Workshop on Big Data and Analytics for Infectious Disease Research, Operations, and Policy. National Academy of Sciences; Pan American Health Organization. Washington, D.C.

2 Focusing on three challenges:
1. Deciphering a big data architecture.
2. Figuring out where the analytics fit in.
3. Building analytics applications:
– Ex. 1: NLP
– Ex. 2: Geospatial
– Ex. 3: FDA

3 The data you’re analyzing. Where the data lives. How it’s queried. How data communicates. Where reporting and visualization occur.

4 One size does not fit all. It begins with your use case: your question forms the blueprint.

5 How much data do you need to start your project? It may not be “big data.”

6 [Diagram repeating the five layers listed on slide 3.]

7 Consider the expertise and talent of your current staff. Technology choices often come down to preference, and workarounds are often feasible.

8 [Diagram repeating the five layers listed on slide 3.]

9 These technologies play a supporting role in your analytics. Don’t be intimidated!

10 Wrangle, normalize, clean!
– Collect and curate data.
– Extract, transform, load (ETL) data.
– Store the data.
– Explore data!
– Modeling.
– Report, visualize, recommend.
May occur locally or outside the system.
There is no 1-800-My-Statistician! Consider the limitations of your model based on sampling: outliers, small clusters, rare events, missing values, and how to maximize predictive power.
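The wrangle/normalize/clean step, and the warning about outliers and missing values, can be illustrated with a minimal Python sketch. The records and field names are hypothetical, and Tukey's interquartile-range fences are swapped in as one common outlier rule; the talk does not say which rule, if any, it used.

```python
import statistics
from typing import Optional

# Hypothetical raw daily case reports collected from several sources.
RAW = [
    {"date": "2016-05-01", "region": " Maryland ", "cases": "12"},
    {"date": "2016-05-02", "region": "maryland", "cases": ""},     # missing value
    {"date": "2016-05-03", "region": "MARYLAND", "cases": "15"},
    {"date": "2016-05-04", "region": "Maryland", "cases": "240"},  # suspicious spike
    {"date": "2016-05-05", "region": "Maryland", "cases": "11"},
    {"date": "2016-05-06", "region": "Maryland", "cases": "13"},
    {"date": "2016-05-07", "region": "Maryland", "cases": "14"},
    {"date": "2016-05-08", "region": "Maryland", "cases": "10"},
]


def normalize(record: dict) -> Optional[dict]:
    """Trim and title-case text fields; return None when the count is missing."""
    raw_cases = record["cases"].strip()
    if not raw_cases:
        return None
    return {
        "date": record["date"],
        "region": record["region"].strip().title(),
        "cases": int(raw_cases),
    }


def flag_outliers(values: list[int], k: float = 1.5) -> list[bool]:
    """Flag values outside Tukey's fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v < q1 - k * iqr or v > q3 + k * iqr for v in values]


# Drop records with missing counts, then flag (not delete) outliers for review.
clean = [r for r in (normalize(r) for r in RAW) if r is not None]
flags = flag_outliers([r["cases"] for r in clean])
for record, is_outlier in zip(clean, flags):
    record["outlier"] = is_outlier
```

Flagging rather than deleting keeps the decision with the statistician and subject matter experts: a spike may be a data-entry error or the start of an outbreak.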

11 Example 1: Natural language processing on social media data to extract real-time case counts of foodborne illness
NLP tokenization | NLP normalization | Entity extraction | TF-IDF | Grammar rules | Descriptive analytics | Time series (ARIMA)
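The tokenization, normalization, and TF-IDF steps listed above can be sketched without external dependencies. The talk used NLTK; this sketch substitutes the Python standard library, and the example posts and stopword list are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical example posts; real input would come from a social media feed.
POSTS = [
    "Got food poisoning after lunch at the taco place #sick",
    "So sick!! Pretty sure it was food poisoning from the sushi",
    "Great lunch with friends today",
]

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"after", "at", "the", "it", "was", "from", "so", "with", "pretty", "sure", "got"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters, dropping '#' and punctuation."""
    return re.findall(r"[a-z]+", text.lower())


def normalize(tokens: list[str]) -> list[str]:
    """Remove stopwords; a fuller pipeline might also stem or lemmatize."""
    return [t for t in tokens if t not in STOPWORDS]


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight = (term count / doc length) * log(N / document frequency)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [normalize(tokenize(p)) for p in POSTS]
scores = tf_idf(docs)
```

Terms that appear in many posts (such as "food") receive lower weights than terms specific to one post (such as "taco"), which is what makes TF-IDF useful for feature extraction.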

12 NLTK. ANALYTICS: Natural language processing (NLP: TF-IDF, grammar rules, feature extraction) on unstructured data (Twitter) to extract quantitative information and generate case counts.
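Turning unstructured posts into case counts with grammar rules can be sketched as a few pattern checks followed by a count per day. The three rules below (an illness phrase, a first-person subject, no negation) and the sample posts are hypothetical stand-ins; the talk does not specify its actual rules.

```python
import re
from collections import Counter

# Hypothetical (date, text) pairs; a real feed would supply timestamps and text.
TWEETS = [
    ("2016-05-01", "I got food poisoning from that burger"),
    ("2016-05-01", "Article: how to avoid food poisoning this summer"),
    ("2016-05-02", "Pretty sure I have food poisoning ugh"),
    ("2016-05-02", "Thank god I didn't get food poisoning"),
    ("2016-05-02", "my whole family got sick after the buffet"),
]

# Hypothetical grammar rules, applied to lowercased text.
ILLNESS = re.compile(r"\b(food poisoning|got sick|threw up)\b")
FIRST_PERSON = re.compile(r"\b(i|we|my|me)\b")
NEGATION = re.compile(r"\b(didn't|did not|never|avoid)\b")


def is_case_report(text: str) -> bool:
    """A post counts as a case when it names an illness, is first person, and is not negated."""
    t = text.lower()
    return (
        ILLNESS.search(t) is not None
        and FIRST_PERSON.search(t) is not None
        and NEGATION.search(t) is None
    )


# Daily case counts: the quantitative series that later feeds time series models.
daily_counts = Counter(date for date, text in TWEETS if is_case_report(text))
```

Hand-written rules like these are transparent and easy to audit, but they miss paraphrases; in practice they are often combined with TF-IDF features and a trained classifier.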

13 Example 1b: Hypothesis testing and forecasting on multiple data sources for disease awareness
Linear regression | Hypothesis testing | Correlation | LOESS | SIR modeling | Time series

14 Amazon Linux AMI (EC2). ANALYTICS: Frequency distribution plots, map with case counts, hypothesis testing using linear regression, temporal and spatial trend analysis with LOESS smoothing, SIR model.
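The SIR model can be sketched with the standard compartmental equations dS/dt = −βSI/N, dI/dt = βSI/N − γI, dR/dt = γI, integrated by forward Euler steps. The parameter values below (β = 0.3, γ = 0.1, so R0 = β/γ = 3, and a population of 10,000) are hypothetical; a real analysis would fit them to observed case counts.

```python
def simulate_sir(beta: float, gamma: float, s0: float, i0: float, rec0: float,
                 days: int, steps_per_day: int = 10) -> list[tuple[float, float, float]]:
    """Forward-Euler SIR integration; returns (S, I, R) at the end of each day."""
    population = s0 + i0 + rec0
    dt = 1.0 / steps_per_day
    s, i, r = float(s0), float(i0), float(rec0)
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical scenario: 10 initial infections in a population of 10,000.
history = simulate_sir(beta=0.3, gamma=0.1, s0=9990, i0=10, rec0=0, days=160)
peak_day = max(range(len(history)), key=lambda day: history[day][1])
```

Forward Euler keeps the example short; with a small step it is adequate for illustration, while production work would typically use an adaptive ODE solver.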

15 Potentially start an MVP without big data
A minimum viable product (MVP) is the most pared-down version of a product that can still be released:
1. It has enough value that people are willing to use it or buy it initially.
2. It demonstrates enough future benefit to retain early adopters.
3. It provides a feedback loop to guide future development.

16 Example 2: Geospatial analytics for situational awareness in complex emergencies
Spatial clustering | Raster analysis | Density computation | Spatial correlations | Spatial trend analysis | Path analysis
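Spatial clustering and density computation both start from a distance between coordinates. A minimal sketch, assuming points given as (latitude, longitude) in degrees, uses the haversine great-circle distance, a neighbor count as a simple density measure, and single-linkage (distance-threshold) clustering. This is one illustrative technique, not necessarily what the tools in the talk used, and the coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def neighbor_counts(points: list[tuple[float, float]], radius_km: float) -> list[int]:
    """Density proxy: number of other points within radius_km of each point."""
    return [sum(haversine_km(p, q) <= radius_km for q in points) - 1 for p in points]


def radius_clusters(points: list[tuple[float, float]], radius_km: float) -> list[int]:
    """Single-linkage clustering: link points within radius_km of each other;
    each connected component of that graph gets one cluster label."""
    labels = [-1] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            idx = stack.pop()
            for other in range(len(points)):
                if labels[other] == -1 and haversine_km(points[idx], points[other]) <= radius_km:
                    labels[other] = current
                    stack.append(other)
        current += 1
    return labels


# Hypothetical case locations: three near Washington, D.C., two near Baltimore, one near Richmond.
CASES = [
    (38.9072, -77.0369), (38.9101, -77.0400), (38.9000, -77.0300),
    (39.2904, -76.6122), (39.2950, -76.6100),
    (37.5407, -77.4360),
]
labels = radius_clusters(CASES, radius_km=5.0)
density = neighbor_counts(CASES, radius_km=5.0)
```

The pairwise loop is O(n²), which is fine for a prototype; larger datasets usually call for a spatial index or a GIS library.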

17 Diagram labels: kml/xml; events, outbreaks, emergencies; web client and broker; mobile client and broker; MapCart; EpiGeo (an OGC-compliant geospatial standards API); Health Scout; Health Diviner; Health Trends; vectors, diseases, demographics, environment, land use, transportation, medical infrastructure, hosts.
ANALYTICS: Path analysis, spatial and temporal correlations.
Data standards vary across data types and formats, but they need to be considered in the analytics pipeline during data processing and preparation.
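Because data standards vary, a pipeline typically converts each format into a common record before analysis. As one hedged example, the sketch below reads Placemark points from KML (an OGC standard) using Python's standard-library XML parser. The sample document and the output record shape are hypothetical.

```python
import xml.etree.ElementTree as ET

KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Hypothetical KML document with two point features.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Outbreak report A</name>
      <Point><coordinates>-77.0369,38.9072,0</coordinates></Point>
    </Placemark>
    <Placemark>
      <name>Clinic B</name>
      <Point><coordinates>-76.6122,39.2904</coordinates></Point>
    </Placemark>
  </Document>
</kml>"""


def read_placemarks(kml_text: str) -> list[dict]:
    """Return {name, lat, lon} records for every Placemark that has a Point."""
    root = ET.fromstring(kml_text)
    features = []
    for placemark in root.iterfind(".//kml:Placemark", KML_NS):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        coords = placemark.findtext("kml:Point/kml:coordinates", default="", namespaces=KML_NS)
        if not coords.strip():
            continue  # skip non-point geometries in this simple sketch
        # KML orders coordinates as longitude,latitude[,altitude].
        lon, lat = (float(v) for v in coords.strip().split(",")[:2])
        features.append({"name": name, "lat": lat, "lon": lon})
    return features


places = read_placemarks(SAMPLE_KML)
```

Note the longitude-first order in KML; swapping latitude and longitude is a common and silent error when merging geospatial sources.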

18 Example 3: Signal detection of adverse events and medication errors through public mobile app reporting
Pharmacovigilance metrics: GPS (Gamma Poisson Shrinker) | ROR (reporting odds ratio) | PRR (proportional reporting ratio) | IC (information component) | Descriptive analytics | Synthetic data generation | Time series
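Three of the disproportionality metrics named above have simple closed forms on a 2×2 table of report counts for one drug-event pair. A minimal sketch follows. The counts are hypothetical, the IC shown is the unshrunk observed-to-expected log ratio (BCPNN implementations add Bayesian shrinkage), and GPS is omitted because it requires fitting an empirical Bayes prior.

```python
import math
from dataclasses import dataclass


@dataclass
class Contingency:
    """2x2 report counts for one drug-event pair."""
    a: int  # reports with the drug and the event
    b: int  # reports with the drug and other events
    c: int  # reports with other drugs and the event
    d: int  # reports with other drugs and other events


def prr(t: Contingency) -> float:
    """Proportional reporting ratio: event share for the drug vs. all other drugs."""
    return (t.a / (t.a + t.b)) / (t.c / (t.c + t.d))


def ror(t: Contingency) -> float:
    """Reporting odds ratio: (a/b) / (c/d) = ad / bc."""
    return (t.a * t.d) / (t.b * t.c)


def ror_ci95(t: Contingency) -> tuple[float, float]:
    """Approximate 95% confidence interval for the ROR on the log scale."""
    se = math.sqrt(1 / t.a + 1 / t.b + 1 / t.c + 1 / t.d)
    center = math.log(ror(t))
    return math.exp(center - 1.96 * se), math.exp(center + 1.96 * se)


def ic(t: Contingency) -> float:
    """Unshrunk information component: log2(observed / expected count)."""
    n = t.a + t.b + t.c + t.d
    expected = (t.a + t.b) * (t.a + t.c) / n
    return math.log2(t.a / expected)


# Hypothetical counts for illustration only.
table = Contingency(a=20, b=380, c=100, d=9500)
lower, upper = ror_ci95(table)
```

A common screening rule treats a pair as a potential signal when the lower confidence bound of the ROR exceeds 1, but any flagged pair still needs clinical review.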

19 Data sources: adverse event (AE) reports (FAERS, the FDA Adverse Event Reporting System) | Synthetic data | RAPID data | Clinical data (csv, xls)
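Synthetic records let a team prototype the pipeline and share test files without exposing real adverse event reports. A minimal sketch, assuming hypothetical field names and vocabularies and plain uniform random sampling (real generators usually try to preserve the joint distributions of the source data), writes reproducible synthetic reports as CSV.

```python
import csv
import io
import random

# Hypothetical vocabularies; not drawn from FAERS.
DRUGS = ["drug_a", "drug_b", "drug_c"]
EVENTS = ["nausea", "headache", "rash", "dizziness"]


def synthetic_reports(n: int, seed: int = 42) -> list[dict]:
    """Generate n synthetic AE reports; a fixed seed makes the output reproducible."""
    rng = random.Random(seed)
    return [
        {
            "report_id": report_id,
            "drug": rng.choice(DRUGS),
            "event": rng.choice(EVENTS),
            "age": rng.randint(18, 90),
            "sex": rng.choice(["F", "M"]),
        }
        for report_id in range(1, n + 1)
    ]


def to_csv(rows: list[dict]) -> str:
    """Serialize rows to CSV text with a header taken from the first row's keys."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = synthetic_reports(100)
csv_text = to_csv(rows)
```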

20 Security remains critical. Consider options including encryption of data at rest and in transit, and/or GovCloud.

21 Summary of Considerations
– Start with a question. A good use case forms the blueprint; then you can consider the technologies and data issues.
– This is a team sport. Consider your team’s capabilities. Statisticians and subject matter experts (SMEs) are vital!
– Don’t over-engineer. To start, do you need “big data”? What about an MVP?

22 Thanks for your generous time!
Email: ordun_catherine@bah.com
Twitter: @nudro
“Before we demand more of our data, we need to demand more of ourselves.” – Nate Silver
