1
Data Science for Affordable Care Act Data
Dr. Brand Niemann, Director and Senior Data Scientist/Data Journalist, Semantic Community, May 5, 2015
2
Overview
Previous Data Science for HHS: HealthData.gov and HealthCare.gov; IDEA Lab and Demand-Driven Open Data
Member Mary Galvin: MarkLogic Work for CMS HealthCare.gov: Health Insurance Marketplace and Data Services Hub
Member Chris Thompson: Freely Available ACA Data Sets: Health and Dental Plan Datasets for Researchers and Issuers
Health Insurance Marketplace Public Use Files (Marketplace PUF)
Data.HealthCare.gov: 147 Total of 9 Types; 73 Datasets
Data Science ACA Data: 2015 Plan Data, Benefits and Cost Sharing PUF
3
Previous Data Science for HHS
Healthcare.gov Data Science and Be Informed Prototype Video (February 2014)
Data Science for EHRs, and A Robust Health Data Infrastructure (August 2014)
Data Science Data Publication for FDA Data (October 2014)
Data Science for HealthData.gov Developers and Data Science for FDA RFI (April 2015)
HHS IDEA Lab: Demand-Driven Open Data (April 2015), David Portnoy, HHS IDEA Lab External Entrepreneur
4
MarkLogic Work for CMS HealthCare.gov: Background
Healthcare.gov is overseen by the Centers for Medicare and Medicaid Services (CMS), a federal agency within the Department of Health and Human Services (HHS). CMS is responsible for implementing healthcare.gov, the key part of implementing the Affordable Care Act (ACA). The President signed the ACA into law on March 23, 2010, with the intent of providing insurance to millions of Americans and, eventually, helping to reduce costs and improve care in a nationwide healthcare system that has serious problems: the U.S. spends over 17% of its GDP on healthcare, yet key metrics indicate that this high rate of expenditure does not translate into healthier outcomes.
5
MarkLogic Work for CMS HealthCare.gov: Complexity
Initially, no one imagined how difficult it would be to implement the ACA. A senior technology expert described Healthcare.gov as “one of the most complex IT projects the federal government has ever undertaken.” Many people didn’t think the project would even get off the ground. In fact, most major government IT projects are not successful. One study of Standish data found that of 3,555 government IT projects from 2003 to 2012 with labor costs of at least $10 million, only 6.4% were successful; 52% were "challenged," meaning they were over budget, behind schedule, or didn't meet user expectations; and the remaining 41.4% were failures, either abandoned or restarted from scratch. Healthcare.gov was not just a website. It was an extremely complicated project involving many players, with a strict deadline set by policy makers. And, unlike sites such as Amazon and Facebook, healthcare.gov had to scale from day 1. There was no slow ramp-up period to work out glitches: the site had to process transactions providing health insurance for millions of Americans right out of the gate. As we go through this case study, you’ll see why the traditional approach to building a large-scale web application was impossible, why healthcare.gov required a NoSQL solution, and why MarkLogic was the perfect fit for making the project successful. Interesting to note: HHS is responsible for about 25% of all federal outlays, and the 2014 budget request for CMS's implementation of the ACA is $1.9B (pg 29). Yet the success or failure of the entire program depends to some extent on the underlying database, which accounts for only about 0.01% of that budget.
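To make the Standish percentages concrete, the short Python sketch below converts them into approximate project counts out of the 3,555 projects and computes what 0.01% of a $1.9B budget would be. It assumes that "that budget" refers to the $1.9B CMS request; the counts are rounded estimates derived from the percentages, not figures reported by the study.

```python
# Approximate project counts implied by the Standish percentages quoted above.
TOTAL_PROJECTS = 3_555
shares = {"successful": 6.4, "challenged": 52.0, "failed": 41.4}  # percent

counts = {label: round(TOTAL_PROJECTS * pct / 100) for label, pct in shares.items()}

# Assumption: the 0.01% database share is measured against the $1.9B CMS request.
BUDGET_DOLLARS = 1_900_000_000
database_share = BUDGET_DOLLARS // 10_000  # 0.01% == 1/10,000, integer arithmetic

for label, n in counts.items():
    print(f"{label:>10}: ~{n:,} projects")
print(f"0.01% of $1.9B = ${database_share:,}")
```

The three shares sum to 99.8%, so the estimated counts sum to 3,549 rather than 3,555; the gap presumably reflects rounding in the published percentages.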
6
MarkLogic Work for CMS HealthCare.gov: Before and After
Before MarkLogic:
Unable to handle complexity
Impossible data model
Development too slow
Limited scalability
Inflexible to change
After MarkLogic:
Built for Today’s Data: a schema-agnostic data model that could handle various data sources and adapt to later changes in policies and regulations
Agile Development: an 18-month timeframe from procurement to launch for what has been called “the most complex government-IT project of all-time”
Secure and Trusted: no need to sacrifice any required enterprise features; the team could rely on a system with government-grade security, ACID transactions, and HA/DR (high availability/disaster recovery)
Successful Deployment: over 8 million people signed up for health insurance in less than 5 months
The Centers for Medicare and Medicaid Services was mandated by the ACA to launch healthcare.gov in October 2013, but it had a problem. With 18 months to go before launch, development was moving too slowly, and the team could not develop a viable data model using Oracle. They had to develop a Health Insurance Marketplace (HIM) and a Data Services Hub (DSH) that would allow millions of Americans to shop for insurance and check their eligibility against dozens of federal and commercial data sources, and that would also give state health exchanges a way to connect.
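The idea of a schema-agnostic data model can be illustrated with a minimal Python sketch. This is not MarkLogic's API: a plain in-memory list of dictionaries stands in for a document database, and the record fields (`state`, `source`, `income`, `tobacco_use`, and so on) are hypothetical. The point is that records from different sources can carry different fields, and a field added after a policy change needs no schema migration.

```python
from typing import Any

# Hypothetical in-memory document store: each record is a plain dict, so
# records from different sources may have different shapes.
_MISSING = object()
store: list[dict[str, Any]] = []


def insert(doc: dict[str, Any]) -> None:
    """Store a document as-is; no schema is declared or enforced."""
    store.append(doc)


def find(**criteria: Any) -> list[dict[str, Any]]:
    """Return documents whose fields equal all criteria; absent fields never match."""
    return [
        doc for doc in store
        if all(doc.get(field, _MISSING) == value for field, value in criteria.items())
    ]


# Records from two hypothetical sources with different fields.
insert({"applicant_id": 1, "state": "TX", "source": "federal", "income": 32000})
insert({"applicant_id": 2, "state": "OR", "source": "state", "household_size": 3})

# A later (hypothetical) policy change adds a new field; existing documents are untouched.
insert({"applicant_id": 3, "state": "TX", "source": "federal", "tobacco_use": False})

print(len(find(state="TX")))    # matches records 1 and 3
print(find(tobacco_use=False))  # only the newer record has this field
```

In a relational design, the new `tobacco_use` field would typically require altering a table first; here the query simply skips documents that lack the field.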
7
MarkLogic Work for CMS HealthCare.gov: Architecture
When the Affordable Care Act passed, it required the Centers for Medicare and Medicaid Services (CMS) to build a health insurance exchange (HIX) that would allow everyone in the US to search for, select, and enroll in an insurance plan. CMS needed to aggregate data from 50 states and many more insurance providers, and present the information in an easy-to-read format for subscribers/patients.
The HIM: The HIM is used to gather and manage the health insurance plan information delivered through healthcare.gov. Twenty-seven states leverage this platform directly, while 16 states and the District of Columbia opted to build their own health insurance exchanges, and seven states chose to operate in hybrid state and federal partnerships.
The DSH: The DSH enables communication between the state exchanges and various federal agencies (e.g., the Department of Homeland Security, the Internal Revenue Service, and the Social Security Administration) to assess individual eligibility and facilitate enrollment. The DSH also provides a front-end application that allows CMS to perform operational reporting. States can plug directly into the DSH, so they don’t have to develop their own systems to perform income verification and eligibility determinations; this reduces costs and improves reliability.
Problems in Oregon: Most states were unable to meet the challenge of building an exchange that would be operational by October 1, 2013, even with large federal subsidies, including a 90/10 match of funds. The irony is that the blue states that tried to build their own exchanges in compliance with the law generally failed to create anything that worked. In fact, Oregon is suing Oracle after spending $200M on a site that never worked.
[Figure: healthcare.gov data model. Components: Health Insurance Payers (1000s); Income and Eligibility Confirmation (Multiple); State Exchanges (50 states, etc.); HIM (Health Insurance Marketplace); DSH (Data Services Hub); Millions of Subscribers (30-50)]
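The DSH's role as a single hub between the exchanges and federal data sources can be sketched in Python. Everything here is hypothetical: the agency keys, the `Applicant` fields, and the verification rules are placeholders, not the real DSH interfaces. What the sketch shows is the hub-and-spoke design: each exchange makes one call to the hub, which fans out to every registered verifier and returns a combined result, so no exchange has to integrate with each agency separately.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Applicant:
    name: str
    ssn: str
    reported_income: int
    lawfully_present: bool


# A verifier stands in for one federal data source (hypothetical rules).
Verifier = Callable[[Applicant], bool]


def ssa_identity(a: Applicant) -> bool:
    return len(a.ssn) == 9 and a.ssn.isdigit()


def irs_income(a: Applicant) -> bool:
    return a.reported_income >= 0


def dhs_status(a: Applicant) -> bool:
    return a.lawfully_present


class DataServicesHub:
    """Single integration point: exchanges call the hub, not each agency."""

    def __init__(self, verifiers: dict[str, Verifier]) -> None:
        self.verifiers = verifiers

    def verify(self, applicant: Applicant) -> dict[str, bool]:
        # Fan the request out to every registered source and collect results.
        return {source: check(applicant) for source, check in self.verifiers.items()}


hub = DataServicesHub({"SSA": ssa_identity, "IRS": irs_income, "DHS": dhs_status})

# Any exchange, state-based or federal, makes the same single call.
result = hub.verify(Applicant("Pat", "123456789", 28000, True))
print(result, "eligible to proceed:", all(result.values()))
```

With this design, adding a new data source means registering one more verifier with the hub rather than changing every exchange.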
8
MarkLogic Work for CMS HealthCare.gov: Built Out
CMS eventually chose MarkLogic, and with 18 months until the launch deadline, the team began building on MarkLogic, using it for both the HIM and the DSH. (Note: Henry Chao, the Deputy CIO at CMS, championed the NoSQL approach, and MarkLogic in particular.) Because of MarkLogic’s flexible schema, the team had a future-proof database that could adapt to changing policy and to the different data models used by the dozens of entities engaging with the site’s backend. The site was also highly scalable, and the backend database portion of the site never experienced any scaling issues. The HIM had issues at the start, primarily due to the poor decision to use a model-driven architecture. The DSH did not use that approach and never had any issues, even at launch. At the end of the day, MarkLogic performed extremely well under high load, processing thousands of transactions per second and supporting hundreds of thousands of concurrent users. And CMS didn’t require an army of DBAs to manage MarkLogic either. They may have needed that army for the other components, but not for MarkLogic. CMS trusted MarkLogic from the start and continues to trust it in every way. Where the project went wrong at the start has been made clear, and new contractors are in place. The government has increased its investment in MarkLogic, which will continue to serve as the underlying database for various components across the healthcare.gov infrastructure.
9
MarkLogic Work for CMS HealthCare.gov: Numbers verified
8,019,763 people selected Marketplace plans from October 1, 2013, through March 31, 2014 (including additional Special Enrollment Period activity through April 19). Nearly 2.6 million signed up in the State-Based Marketplaces and over 5.4 million in the Federally-facilitated Marketplace. About 3.8 million people, including nearly 1.2 million young adults (ages 18 – 34), enrolled in Health Insurance Marketplace plans in the sixth and final reporting period, which began March 2 and concluded April 19. Those 3.8 million individuals represent nearly 90 percent growth over February’s cumulative enrollment. Of the more than 8 million:
54 percent are female and 46 percent are male;
34 percent are under age 35;
28 percent are between the ages of 18 and 34;
65 percent selected a Silver plan, while 20 percent selected a Bronze plan; and
85 percent selected a plan with financial assistance.
Other numbers verified by MarkLogic team members as of August 2014: 8,000,000+ new beneficiaries; 150,000+ concurrent users; zero data loss.
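The enrollment figures can be cross-checked with a little arithmetic. The Python sketch below assumes that "February’s cumulative enrollment" means everyone who selected a plan before the final reporting period began on March 2, i.e. the 8,019,763 total minus the roughly 3.8 million enrolled in that period; because 3.8 million and the state/federal split are rounded figures, the results are approximate.

```python
TOTAL = 8_019_763         # plan selections, Oct 1, 2013 - Mar 31, 2014 (+ SEP through Apr 19)
FINAL_PERIOD = 3_800_000  # "about 3.8 million" in the final period (rounded)
STATE_BASED = 2_600_000   # "nearly 2.6 million" (rounded)
FEDERAL = 5_400_000       # "over 5.4 million" (rounded)

# Assumption: enrollment before March 2 = total minus the final period.
before_final = TOTAL - FINAL_PERIOD
growth = FINAL_PERIOD / before_final

print(f"Enrolled before March 2: ~{before_final:,}")
print(f"Final-period growth: {growth:.1%}")  # close to the "nearly 90 percent" cited
print(f"State + federal: ~{STATE_BASED + FEDERAL:,} vs {TOTAL:,}")
```

Under this assumption, the growth comes out at roughly 90%, consistent with the cited figure given the rounding of the 3.8 million input.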
10
HealthCare.gov: Health and dental datasets for researchers and issuers
11
Health Insurance Marketplace Public Use Files (Marketplace PUF)
12
Marketplace PUF Download
Before You Download: The Benefits and Cost Sharing PUF and the Rate PUF datasets included in the zip files are extremely large and may be burdensome to download and/or cause computer performance issues. Downloading the files with the Akamai Download Manager application should make downloading the data easier by offering the option to pause and restart the download, minimizing the impact on resource allocation. Be advised that the file size, once downloaded, may still be prohibitive if you are not using a robust data viewing application. Microsoft Excel has limits on the number of records it can display, which these two files exceed.
My Experience: I decided to download these files directly, without the Akamai Download Manager application, because I have a broadband Internet connection and a newer 64-bit PC. I wondered: what if I could download all of these files and use them in Spotfire as Big Data? There were only 7 ZIP files totaling 72 MB, which in turn expanded to 1.5 GB of CSV (not Excel), and that became only a 35 MB Spotfire file!
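Files that exceed Excel's row limit can still be processed row by row without unzipping them to disk. The Python sketch below uses only the standard library (`zipfile`, `io`, `csv`) to stream one CSV member of a downloaded ZIP archive and count its data rows. The archive and member names are hypothetical, and the `utf-8-sig` encoding is an assumption that should be checked against the actual files.

```python
import csv
import io
import zipfile


def count_csv_rows(zip_source, member: str) -> int:
    """Stream a CSV inside a ZIP archive and count its data rows.

    zip_source may be a file path or a binary file-like object. The CSV is
    decompressed and read incrementally rather than loaded into memory whole.
    """
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member) as raw:
            # Decode bytes to text; utf-8-sig also strips a leading BOM if present.
            with io.TextIOWrapper(raw, encoding="utf-8-sig", newline="") as text:
                reader = csv.reader(text)
                next(reader, None)  # skip the header row
                return sum(1 for _ in reader)
```

For example, `count_csv_rows("Rate_PUF.zip", "Rate_PUF.csv")` (hypothetical names) would return the number of rate records without ever writing the extracted CSV to disk.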
13
Marketplace PUF
Benefits and Cost Sharing PUF (BenCS-PUF) – Plan-level data on essential health benefits, coverage limits, and cost sharing.
Rate PUF (Rate-PUF) – Plan-level data on individual rates based on an eligible subscriber’s age, tobacco use, and geographic location.
Plan Attributes PUF (Plan-PUF) – Plan-level data on maximum out-of-pocket payments, deductibles, cost sharing, HSA eligibility, formulary ID, and other plan attributes.
Business Rules PUF (BR-PUF) – Plan-level data on the application of rates, such as allowed relationships (e.g., spouse, dependents) and tobacco use.
Service Area PUF (SA-PUF) – Issuer-level data on the geographic coverage or service area (i.e., where the plan is offered), including state, county, and zip code.
Network PUF (Ntwrk-PUF) – Issuer-level data identifying provider network URLs.
Plan ID Crosswalk PUF (CW-PUF) – Plan-level data mapping plans offered in 2014 to plans offered in 2015.
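Because several PUFs are plan-level, they can be linked on a plan identifier. The Python sketch below joins hypothetical Plan Attributes rows to Rate rows and averages the individual rate for one age by metal level. The column names (`PlanId`, `MetalLevel`, `Age`, `IndividualRate`) and sample values are illustrative assumptions; the actual join key and column names should be taken from the Plan Attributes and Rate data dictionaries.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows, shaped like records read with csv.DictReader.
plan_attributes = [
    {"PlanId": "P1", "MetalLevel": "Silver"},
    {"PlanId": "P2", "MetalLevel": "Bronze"},
]
rates = [
    {"PlanId": "P1", "Age": "40", "IndividualRate": "300.00"},
    {"PlanId": "P1", "Age": "40", "IndividualRate": "310.00"},  # e.g. another rating area
    {"PlanId": "P1", "Age": "21", "IndividualRate": "200.00"},
    {"PlanId": "P2", "Age": "40", "IndividualRate": "250.00"},
    {"PlanId": "P9", "Age": "40", "IndividualRate": "999.00"},  # no matching plan
]


def average_rate_by_metal(plan_rows, rate_rows, age: str) -> dict[str, float]:
    """Inner-join rates to plans on PlanId and average IndividualRate per metal level."""
    metal_by_plan = {row["PlanId"]: row["MetalLevel"] for row in plan_rows}
    grouped: dict[str, list[float]] = defaultdict(list)
    for row in rate_rows:
        metal = metal_by_plan.get(row["PlanId"])
        if metal is not None and row["Age"] == age:
            grouped[metal].append(float(row["IndividualRate"]))
    return {metal: mean(values) for metal, values in grouped.items()}


result = average_rate_by_metal(plan_attributes, rates, age="40")
print(result)
```

Rate rows whose plan has no match are dropped, as in an inner join. With the full Rate PUF, the rate rows would be streamed from the CSV rather than held in a list, while the much smaller plan lookup stays in memory.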
14
Key Data Science Questions
How was the data collected? By the CMS ACA program, with documentation in PDF files.
How was the data stored? In two very large ZIP files and 5 smaller ZIP files of CSV data, which were easily imported into Spotfire, where they are now stored in a much more compressed and reusable form for analytics and visualizations.
What are the data results? The initial analytics and visualizations are shown in the Spotfire Dashboard, with one tab for each of the seven datasets.
Why should we believe the data results? Because the data are produced by CMS, documented, and made publicly available for scrutiny.
15
Data.HealthCare.gov: Datasets
All: 147
Charts: 9
Maps: 1
Calendars: 0
Filtered Views: 3
External Datasets: 0
Files and Documents: 36
Forms: 4
APIs: 30
Datasets: 73
16
Marketplace PUF Information: PDF to MindTouch Knowledge Base
General Information Factsheet
Frequently Asked Questions
Data Disclaimer - User Agreement
Benefits and Cost Sharing Data Dictionary
Rate Data Dictionary
Plan Attributes Data Dictionary
Business Rules Data Dictionary
Service Area Data Dictionary
Network Data Dictionary
Plan ID Crosswalk Data Dictionary
17
General Information Factsheet: MindTouch Table for Spreadsheet
Table 3.1 File Format Descriptions for 2015 Marketplace PUF
18
ACA Data Spreadsheet Knowledge Base
ACAData.xlsx
19
ACA Data Spotfire Knowledge Base: 1
20
ACA Data Spotfire Knowledge Base: 2
21
ACA Data Spotfire Knowledge Base: 3
22
ACA Data Spotfire Knowledge Base: 4
23
ACA Data Spotfire Knowledge Base: 5
24
Conclusions and Recommendations