1 Catherine Ordun, MBA, MPH May 10, 2016 Challenges and Considerations of Big Data Analytics Workshop on Big Data and Analytics for Infectious Disease.

Presentation transcript:

1 Catherine Ordun, MBA, MPH May 10, 2016 Challenges and Considerations of Big Data Analytics Workshop on Big Data and Analytics for Infectious Disease Research, Operations, and Policy National Academy of Sciences Pan American Health Organization Washington, D.C.

2 Focusing on three challenges: 1. Deciphering a big data architecture. 2. Figuring out where the analytics fit in. 3. How to build analytics applications. – Ex. 1 – NLP – Ex. 2 – Geospatial – Ex. 3 – FDA

The data you’re analyzing. Where the data lives. How it’s queried. How data communicates. Where reports and visualization occur.

One size does not fit all. It begins with your use case: your question forms the blueprint.

How much data do you need to start your project? It may not be “big data.”

Consider the expertise and talent of your current staff. Technology choices often come down to preference, and workarounds are often feasible.

These technologies play a supporting role to your analytics! Don’t be intimidated!

10 Wrangle, Normalize, Clean! Collect and curate data. Extract, Transform, Load data. Store the data. Explore data! Modeling. Report, Visualize, Recommend. May occur locally, or outside the system. There is no My-Statistician! Consider the limitations of your model based on sampling: outliers, small clusters, rare events, missing values, and how to maximize predictive power.
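
The wrangle, normalize, and clean step on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (state, onset, cases), the date format, and the sample records are hypothetical and do not come from the presentation.

```python
from datetime import datetime

def clean_record(raw):
    """Normalize one raw case record; return None if the onset date is unusable."""
    state = (raw.get("state") or "").strip().upper()
    onset_text = (raw.get("onset") or "").strip()
    try:
        # Assumption: dates are expected in ISO format (YYYY-MM-DD).
        onset = datetime.strptime(onset_text, "%Y-%m-%d").date()
    except ValueError:
        return None
    cases_text = (raw.get("cases") or "").strip()
    # None marks a missing value so later steps can decide how to handle it.
    cases = int(cases_text) if cases_text.isdigit() else None
    return {"state": state, "onset": onset, "cases": cases}

def clean_all(raw_records):
    """Clean every record, drop exact duplicates, and count rejected rows."""
    seen = set()
    cleaned = []
    rejected = 0
    for raw in raw_records:
        rec = clean_record(raw)
        if rec is None:
            rejected += 1
            continue
        key = (rec["state"], rec["onset"], rec["cases"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned, rejected

# Hypothetical input rows, for illustration only.
RAW = [
    {"state": " md ", "onset": "2016-05-01", "cases": "3"},
    {"state": "MD", "onset": "2016-05-01", "cases": "3"},   # duplicate after cleaning
    {"state": "va", "onset": "05/02/2016", "cases": "2"},   # wrong date format
    {"state": "DC", "onset": "2016-05-03", "cases": ""},    # missing case count
]

if __name__ == "__main__":
    cleaned, rejected = clean_all(RAW)
    print(len(cleaned), "clean records,", rejected, "rejected")
```

Keeping missing counts as None rather than zero leaves the decision about imputation or exclusion to the statistician, which matches the slide's point about model limitations.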

11 Example 1 - Natural language processing on social media data to extract real-time case counts of foodborne illness NLP – tokenization | NLP – normalization | Entity extraction | TF-IDF | Grammar rules | Descriptive analytics | Time series ARIMA

NLTK ANALYTICS: Natural Language Processing (NLP) (TF-IDF, grammar rules, feature extraction) on unstructured data (Twitter) to extract quantitative information, and generate case counts
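
As a rough illustration of the tokenization and TF-IDF steps named above, here is a minimal sketch in plain Python. It does not use NLTK and is not the presenter's pipeline; the sample posts, the keyword list, and the function names are hypothetical, and the keyword match is only a crude stand-in for the grammar rules and entity extraction on the slide.

```python
import math
import re
from collections import Counter

# Hypothetical example posts; not data from the presentation.
POSTS = [
    "Got food poisoning after eating at the diner last night",
    "Three of us got sick after the potluck, food poisoning for sure",
    "Great dinner at the diner last night",
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

def count_mentions(docs, keywords=("sick", "poisoning")):
    """Count documents that mention any illness keyword (a crude case-count proxy)."""
    return sum(1 for d in docs if set(tokenize(d)) & set(keywords))

if __name__ == "__main__":
    for post, w in zip(POSTS, tf_idf(POSTS)):
        top = sorted(w, key=w.get, reverse=True)[:3]
        print(top, "<-", post)
    print("posts mentioning illness:", count_mentions(POSTS))
```

Terms that appear in every post receive a weight of zero, so the ranking favors words that distinguish one post from the rest.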

13 Example 1b - Hypothesis testing and forecasting on multiple data sources for disease awareness Linear regression | Hypothesis testing | Correlation | LOESS | SIR modeling | Time series

14 Amazon Linux AMI (EC2) ANALYTICS: Frequency distribution plots, map with case counts, Hypothesis Testing using Linear Regression, Temporal & Spatial Trend Analysis with Loess Smoothing, SIR model
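
The SIR model mentioned above can be sketched with a simple forward Euler integration. This is a minimal illustration, not the presenter's implementation; the parameter values (beta, gamma, the population split, the step size) are assumptions chosen only to make the example run.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate the SIR equations with forward Euler steps.

    dS/dt = -beta*S*I/N
    dI/dt =  beta*S*I/N - gamma*I
    dR/dt =  gamma*I
    Returns a list of (time, S, I, R) tuples.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((step * dt, s, i, r))
    return history

if __name__ == "__main__":
    # Assumed values: beta=0.3 and gamma=0.1 per day, 1,000 people, 10 infected.
    run = simulate_sir(beta=0.3, gamma=0.1, s0=990, i0=10, r0=0, days=160)
    peak_time, _, peak_infected, _ = max(run, key=lambda row: row[2])
    print("peak of about", round(peak_infected), "infected near day", round(peak_time))
```

A fixed-step Euler scheme is the simplest choice; a production analysis would more likely use an ODE solver and fit beta and gamma to observed case counts.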

15 Potentially start an MVP without big data. A minimum viable product (MVP) is the most pared-down version of a product that can still be released. 1. It has enough value that people are willing to use it or buy it initially. 2. It demonstrates enough future benefit to retain early adopters. 3. It provides a feedback loop to guide future development.

16 Example 2 - Geospatial analytics for situational awareness in complex emergencies Spatial clustering | Raster analysis | Density computation | Spatial correlations | Spatial trend analysis | Path analysis
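
Density computation, one of the techniques listed above, can be illustrated by binning event locations into a square grid and counting events per cell. This is a minimal sketch under assumptions: the coordinates are treated as planar (already projected), and the function names, cell size, and threshold are hypothetical.

```python
import math
from collections import Counter

def grid_density(points, cell_size):
    """Count points per square grid cell; keys are (col, row) cell indices."""
    counts = Counter()
    for x, y in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        counts[cell] += 1
    return counts

def hotspots(counts, min_count):
    """Return the sorted cells whose event count reaches the threshold."""
    return sorted(cell for cell, n in counts.items() if n >= min_count)

if __name__ == "__main__":
    # Hypothetical event locations in projected units (for example, kilometers).
    events = [(0.1, 0.2), (0.4, 0.9), (0.8, 0.3), (5.5, 5.5), (-0.5, 0.5)]
    density = grid_density(events, cell_size=1.0)
    print(dict(density))
    print("hotspot cells:", hotspots(density, min_count=2))
```

Grid counts are the simplest form of density estimate; kernel density or formal spatial clustering methods would replace this step in a fuller analysis.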

17 kml/xml Events Outbreaks Emergencies web client and broker mobile client and broker MapCart EpiGeo An OGC-compliant Geospatial standards API Health Scout Health Diviner Health Trends Vectors Diseases Demographics Environment Land use Transportation Medical Infrastructure Hosts. ANALYTICS: Path analysis, spatial and temporal correlations. Data standards vary across data types and formats, but need to be considered in the analytics pipeline for data processing and preparation.

18 Example 3 - Signal detection of adverse events and medication errors through public mobile app reporting Pharmacovigilance metrics: GPS | ROR | PRR | IC | Descriptive analytics | Synthetic data generation | Time series
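
Two of the metrics listed above, the proportional reporting ratio (PRR) and the reporting odds ratio (ROR), come from a 2x2 table of report counts. The sketch below uses the standard formulas; the counts and function names are hypothetical and are not drawn from FAERS or from the presentation.

```python
import math

def prr(a, b, c, d):
    """Proportional reporting ratio.

    a: reports with the drug and the event
    b: reports with the drug and other events
    c: reports with other drugs and the event
    d: reports with other drugs and other events
    """
    return (a / (a + b)) / (c / (c + d))

def ror(a, b, c, d):
    """Reporting odds ratio: (a/b) / (c/d)."""
    return (a * d) / (b * c)

def ror_confidence_interval(a, b, c, d, z=1.96):
    """Approximate 95% interval for the ROR on the log scale."""
    log_ror = math.log(ror(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_ror - z * se), math.exp(log_ror + z * se)

if __name__ == "__main__":
    # Hypothetical counts for one drug-event pair.
    a, b, c, d = 20, 80, 100, 9800
    low, high = ror_confidence_interval(a, b, c, d)
    print("PRR:", round(prr(a, b, c, d), 2))
    print("ROR:", round(ror(a, b, c, d), 2), "interval:", round(low, 2), "to", round(high, 2))
```

Both functions assume every cell is non-zero; sparse tables need a continuity correction or a shrinkage method such as those behind the GPS and IC metrics on the slide.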

AE reports (FAERS) | Synthetic data | RAPID data | Clinical data | csv | xls

Security remains critical. Consider options including encryption of data at rest, encryption of data in transit, and/or GovCloud.

21 Summary of Considerations. Start with a question: A good use case forms the blueprint. Then you can consider the technologies and data issues. This is a team sport: Consider your team’s capabilities. Statisticians and SMEs are vital! Don’t over-engineer: To start, do you need “big data”? What about an MVP?

22 Thanks for your generous time! “Before we demand more of our data, we need to demand more of ourselves.” – Nate Silver