Back-End Structures and Front End Visualizations DAMA Minnesota Matthew Israelson 19 November, 2014
About Us IHME is an independent global health research center at the University of Washington Vision: Provide high-quality information on population health, its determinants, and the performance of health systems. Mission: Improve the health of the world’s populations by providing the best information on population health. Method: Produce rigorous and comparable measurements.
A Short History Started in 2007 and continuing to grow into 2014 July 2007: Founding of IHME with support from Bill & Melinda Gates Foundation and the state of Washington July 2009: Published the Financing Global Health (FGH) report June 2010: Graduated the first Masters of Public Health March 2011: Launched the Global Health Data Exchange (GHDx) at the Global Health Metrics and Evaluation conference December 2012: The Lancet published The Global Burden of Diseases, Injuries, and Risk Factors Study 2010 (GBD 2010) December 2014: First annual update with GBD 2013
Agenda Back End Structures Front End Visualizations Data Collection Infrastructure Modeling and Analysis Audience Outreach Visualizations
Back-End Structures Data Collection Infrastructure Process Collection Cataloging People Networking Technology Modeling & Analysis GBD Data model Deliverables
Overview: the data cycle Locate data, Acquire data, Catalog data, Extract data, Identify gaps. Locate data: search for new health sources from government and NGO websites, databases, expert advice, and literature. Acquire data: negotiate with providers for access through formal requests, collaboration, DUA / MOU / IRB, and payment. Catalog data: add to the GHDx (assign NID, create citation, add keywords, attach files). Extract data: provide data to our researchers (notify teams, extract data, import to research databases, provide sourcing). Identify gaps: analyze results to identify data gaps (years, causes, countries, etc.).
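The last step of the cycle, identifying gaps by year, cause, and country, can be sketched as a set difference between the combinations an analysis needs and the combinations the catalog already covers. This is a minimal illustration, not IHME's actual tooling; the function name `find_gaps` and the sample records are hypothetical.

```python
from itertools import product


def find_gaps(countries, years, causes, catalog):
    """Return the (country, year, cause) combinations with no cataloged source.

    `catalog` is an iterable of (country, year, cause) tuples describing
    data that has already been extracted. All names here are illustrative.
    """
    needed = set(product(countries, years, causes))
    covered = set(catalog)
    return sorted(needed - covered)


# Hypothetical sample: two countries, two years, one cause.
catalog = [
    ("Kenya", 2010, "malaria"),
    ("Kenya", 2011, "malaria"),
    ("Peru", 2010, "malaria"),
]
gaps = find_gaps(["Kenya", "Peru"], [2010, 2011], ["malaria"], catalog)
print(gaps)  # [('Peru', 2011, 'malaria')]
```

The resulting list could feed the next round of the "Locate data" step, since each missing combination is a candidate for a new source search.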
What we collect Health Surveys Census Records Surveillance Systems Disease Registries Vital Registration Hospital Records Financial Records Literature Estimates
How we collect it Added 15,000 new sources of data since January Not everything is on the Internet 900+ “high-touch” requests Applications Data Use Agreements IRB Approval Restricted Data A project management tool is essential Adopted JIRA in 2013
Sourcing data Global Health Data Exchange (GHDx) Centralized citation database for IHME Ensures same citation for the same data Allows us to source all data points GHDx NIDs All metadata Citations Federated Citations Accessed date Publication status NIDs Research databases NIDs Not publicly accessible Publicly accessible
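The idea that one NID always resolves to one citation can be illustrated with a small registry keyed by NID. This is a sketch under assumptions: the `CitationRegistry` class, its fields, and the sample NID and citation text are hypothetical and do not reflect the actual GHDx schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    nid: int
    text: str
    accessed: str  # date the source was accessed, as an ISO string


class CitationRegistry:
    """Maps each NID to exactly one citation (hypothetical structure)."""

    def __init__(self):
        self._by_nid = {}

    def register(self, nid, text, accessed):
        # Refuse to attach a second, different citation to an existing NID.
        existing = self._by_nid.get(nid)
        if existing is not None and existing.text != text:
            raise ValueError(f"NID {nid} already has a different citation")
        citation = existing or Citation(nid, text, accessed)
        self._by_nid[nid] = citation
        return citation

    def cite(self, nid):
        # Any data point tagged with this NID gets the same citation text.
        return self._by_nid[nid].text


registry = CitationRegistry()
registry.register(1001, "Example Health Survey 2010.", "2014-11-19")

# Two data points in a research database, both sourced from NID 1001.
data_points = [{"value": 0.12, "nid": 1001}, {"value": 0.15, "nid": 1001}]
citations = {registry.cite(p["nid"]) for p in data_points}
print(citations)  # {'Example Health Survey 2010.'}
```

Storing only the NID alongside each data point keeps the citation text in one place, so a correction to the citation reaches every data point that references it.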
Organizing Data LIVE DEMO
The people Board of Directors & Scientific Oversight Group 210 Employees Professors: 20+ Researchers: 90+ IT: 20+ Staff: Affiliate Professors GBD Expert Collaborators
The GBD network GBD enrolled 1,095 collaborators from 107 countries
Networking as an enabler The collaborative network enhances the GBD Assess the validity of country results Identify missing datasets or incorrect interpretations of data Interpret findings and facilitate country policy translation Assist with acquiring new sources of data Publish papers using GBD results The size of the network demands new ways to manage contacts CRM is an immediate priority
The technical infrastructure Capacity of 250 Terabytes Access limited to IHME Limited Use Access Restricted to named researchers Controlled or sensitive datasets Cluster for running Stata and R jobs (Sun Grid Engine) Largest capacity at the University of Washington Capacity to increase 10x for projections
IT requirements for GBD 12+ major databases 8 Servers Cluster (Stata; R) Primary databases for GBD: Cod, Shared, Covariates, GHDx, Epi, Risk, Idie2, GBD results, mortality, Codmod, GBDviz, GBDx2.0 – every day, all day
The Global Burden of Disease (GBD) A systematic, scientific effort to quantify the comparative magnitude of health loss due to diseases, injuries & risk factors. GBD 2010 published by The Lancet GBD 2013 to be published in 2014 Annual updates to follow GBD 2010 / GBD 2013: Diseases and injuries; Sequelae 1,160 / 2,435; Risk factors 67 / 68; Countries; Years
Measuring burden of diseases and injuries
Data Inputs for GBD Population-based Encounter-level Other Vital registration Censuses Surveys Verbal autopsy Disease registries Surveillance systems Hospital records Ambulatory records Primary care records Claims data Literature reviews Sensor data Mortuaries/burial sites Police records
Defining analysis Task of the analysts Research Prep data Write code Review estimates Interpret results Publish
Main components of the data model 1. Covariates 2. Mortality 3. Causes of death 4. Nonfatal health outcomes 5. Risk factors 6. YLLs/YLDs/DALYs
Processes within the data model
Deliverables All-cause mortality rates Deaths by cause ( ) Years of life lost (YLLs) Years lived with disability (YLDs) Disability-adjusted life years (DALYs) 188 Countries 322 Diseases and Injuries 68 Risk Factors Men and Women 20 Age Groups At least 1,000 draw calculations per estimate based on known data points and uncertainty 1.03 billion estimates
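DALYs are the sum of YLLs and YLDs, and each estimate is summarized from its draws. The sketch below shows one common way to turn 1,000 draws into a mean and a 95% uncertainty interval (the 2.5th and 97.5th percentiles). The simulated draws, the distribution parameters, and the `summarize` helper are assumptions for illustration only; they are not GBD values or GBD code.

```python
import random
import statistics


def summarize(draws):
    """Return (mean, lower, upper) using the 2.5th and 97.5th percentiles."""
    # quantiles(n=40) yields 39 cut points at 2.5% steps;
    # the first is the 2.5th percentile and the last is the 97.5th.
    cuts = statistics.quantiles(draws, n=40)
    return statistics.fmean(draws), cuts[0], cuts[-1]


rng = random.Random(0)  # fixed seed so the example is repeatable
n_draws = 1000

# Hypothetical draws for one country, cause, age group, and sex.
yll_draws = [rng.gauss(500.0, 40.0) for _ in range(n_draws)]
yld_draws = [rng.gauss(200.0, 25.0) for _ in range(n_draws)]

# DALYs are computed draw by draw so that uncertainty carries through.
daly_draws = [yll + yld for yll, yld in zip(yll_draws, yld_draws)]

mean, lower, upper = summarize(daly_draws)
print(f"DALYs: {mean:.1f} ({lower:.1f} to {upper:.1f})")
```

Summing draw by draw, rather than adding the two summary means and intervals, is what lets the interval for DALYs reflect the combined uncertainty of both components.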
COOPER LIVE DEMO
Vignette – Using different sources for COD Type, Site years, Countries: Vital registration 2, Verbal autopsy 486 66 Cancer registries 2,715 93 Police reports 1, Surveys/census 1,564 82 Maternal mortality surveillance 838 Deaths in health facilities 219 Burial and mortuary M deaths back to 1980
Vignette – Garbage codes in VR data
Vignette – Garbage code redistribution
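As a rough illustration of what redistribution means, the sketch below reassigns deaths coded to a garbage code across a set of plausible target causes in proportion to the deaths already observed in those targets. Proportional reallocation is an assumption chosen for simplicity; the actual GBD redistribution methods are not described in these slides, and the cause names, counts, and `redistribute` function are hypothetical.

```python
def redistribute(deaths, garbage_code, targets):
    """Move deaths from `garbage_code` to `targets` proportionally.

    `deaths` maps cause -> death count. Returns a new dict without the
    garbage code; the total number of deaths is preserved.
    """
    result = {c: float(n) for c, n in deaths.items() if c != garbage_code}
    pool = deaths.get(garbage_code, 0)
    base = sum(deaths[t] for t in targets)
    if base == 0:
        raise ValueError("target causes have no deaths to weight by")
    for t in targets:
        # Each target receives a share equal to its existing proportion.
        result[t] += pool * deaths[t] / base
    return result


# Hypothetical counts: 100 deaths coded to an ill-defined cause.
deaths = {"ischemic heart disease": 300, "stroke": 100, "ill-defined": 100}
out = redistribute(deaths, "ill-defined", ["ischemic heart disease", "stroke"])
print(out)  # {'ischemic heart disease': 375.0, 'stroke': 125.0}
```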
Front End Visualizations Purpose & Audience Traditional Outreach Audience Underlying principles Publications Media Other approaches Interactive Visualizations Key Uses Development Demonstrations
Audience
Communicating data for impact Audiences and characteristics Casual user Data actor Data analyst Researcher Granularity of data Type of tool or visual
Designing for the right audience Casual User Data Actor Analyst Infographics Illustrative diagrams Narrative visualizations Press releases Reports Briefs Search tools Limited interactive visualizations Query tools Exploratory visualizations API Researcher Query tools Exploratory visualizations Data catalog – repository Methods
IHME outreach Research Articles Policy Reports Brochures Country Profiles Infographics Newsletters Presentations Videos Visualizations
Policy reports, articles & profiles Note …
Infographics Note …
News Articles
Blogging and newsletters
@IHME_UW
Video
Open Source Tools Note …
Key uses for visualizations 1. Review input data & launch models 2. Review results 3. Obtain feedback from collaborators/experts 4. Communicate results 5. Use as presentation/teaching aid 6. Convince data owners to share data Researchers Different Audiences
The development process 1. Contact product owner 2. Identification of relevant audience(s) 3. Business and technical requirements 4. Creation of appropriate design 5. Development (using Agile/Scrum) 6. Testing & initial user feedback 7. Launch under embargo (journalists) 8. Public launch 9. Feedback collection
Visualizations LIVE DEMO GBD Compare compare/ GBD Cause Patterns visualization/gbd-cause-patterns
Visualizations LIVE DEMO US Health Map Tobacco Burden Visualization Millennium Development Goals
Summary Gather and organize the data Utilize that information Inform and empower your audience Contact me: Matthew Israelson