Published by Austen McCarthy. Modified over 6 years ago.
Trend in the Leading Causes of Death in the USA: A Case Study Using a Data Warehouse and OLAP Cube
Mohammad A. Rob, University of Houston-Clear Lake
Farhana Rob, Kaiser Permanente San Jose
Presentation Outline
- Introduction
- The Raw Data
- Why Data Warehousing?
- Designing the Data Warehouse
- Designing the OLAP Cube
- OLAP Reports
- Conclusion
Introduction
This paper shows how a large amount of unstructured mortality data can be organized into a data warehouse and how key information can then be presented using an OLAP cube. The OLAP reports show the shifting trend in the leading causes of death in the USA. The reports present the top six causes of death by location, time, age group, and race. Knowing these leading causes of death can help the general public take preventive action.
The Raw Data
The Centers for Disease Control and Prevention (CDC) publishes data on various causes of death in the USA.
Source: ty-health-status-indicators-chsi-combat-obesity-heart-disease-and-cancer
The data cover 3,141 US counties across all states, for many years, and for various age groups and races. However, the data are not organized in a way that supports conclusions by state, disease, race, year, or age group.
The Raw Data Source
The Raw Data
The data can be downloaded in CSV format for a selected number of years, and the files can be opened in Microsoft Excel. There are 13 data files with thousands of records and about a hundred attributes. We downloaded data for three years ( ) as a proof of concept.
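Since only a handful of the roughly hundred attributes are needed, a first cleaning pass can keep just those columns and skip unusable rows. The following is a minimal sketch using Python's standard `csv` module. The column names in `KEEP_COLUMNS` and the rule of skipping rows with a blank or non-numeric death count are assumptions for illustration, not the CDC files' actual headers or the authors' cleaning rules.

```python
import csv
import io
from typing import Iterable, TextIO

# Hypothetical column names: the real CDC CSV headers differ and should be
# checked before adapting this sketch.
KEEP_COLUMNS = ["state", "county", "year", "cause", "deaths"]


def load_selected_columns(
    source: TextIO,
    keep: Iterable[str] = KEEP_COLUMNS,
    count_col: str = "deaths",
) -> list[dict[str, str]]:
    """Return CSV rows restricted to `keep`, skipping rows without a usable count."""
    keep = list(keep)
    reader = csv.DictReader(source)
    required = set(keep) | {count_col}
    missing = sorted(required - set(reader.fieldnames or []))
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        # Assumed cleaning rule: drop rows whose count is blank or non-numeric.
        if not (row[count_col] or "").strip().isdigit():
            continue
        rows.append({col: row[col] for col in keep})
    return rows


if __name__ == "__main__":
    # Tiny made-up sample standing in for one downloaded CSV file.
    sample = io.StringIO(
        "state,county,year,cause,deaths,unused\n"
        "Texas,Harris,2012,Cancer,10,x\n"
        "Texas,Bexar,2012,Cancer,,y\n"
    )
    print(load_selected_columns(sample))
```

In this sample, the second row has an empty death count and is dropped, and the `unused` column is discarded from the remaining row.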
Sample Raw Data in Excel
Why Data Warehousing?
A data warehouse allows a very large amount of data to be stored in a particular format so that users can query the data in a variety of ways to obtain business intelligence. Typically, a dimensional model, or star schema, is used to design the data warehouse, which simplifies query processing over large amounts of data. An Online Analytical Processing (OLAP) tool can then be used to present the data; it provides an interactive interface that lets top management create reports on an ad hoc basis.
Why Data Warehousing?
The concept: the data warehouse serves as the back end and the OLAP cube as the front end.
[Figure: OLAP Cube (front end) and Data Warehouse (back end)]
Designing the Data Warehouse
Before designing the data warehouse, the raw data needed to be cleaned and formatted to fit into dimensions and facts. We filtered the raw data and dropped many columns so that we could focus on the important dimensions: cause of death, time, location, race, and age group. The fact is the number of deaths. The dimensions and facts were organized into separate Excel sheets, with a primary key (PK) for each dimension and foreign keys (FKs) in the fact sheet. All Excel data were then transferred into a Microsoft Access database.
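The split into dimension sheets with primary keys and a fact sheet with foreign keys can be sketched in Python. This is a hypothetical illustration, not the authors' Excel/Access workflow: each distinct dimension member receives an integer surrogate key, and each fact row stores those keys plus the death count. The record field names (`cause`, `state`, `county`, `year`, `race`, `age_group`, `deaths`) and the `Dimension`/`build_star` names are assumptions.

```python
from typing import Hashable

# The five dimensions named in the text.
DIMENSIONS = ("cause", "location", "time", "race", "age_group")


class Dimension:
    """Assigns an integer surrogate primary key to each distinct member."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.keys: dict[Hashable, int] = {}

    def key_for(self, member: Hashable) -> int:
        # Reuse the member's key if already seen, else allocate the next one (1, 2, ...).
        if member not in self.keys:
            self.keys[member] = len(self.keys) + 1
        return self.keys[member]

    def rows(self) -> list[tuple[int, Hashable]]:
        """Dimension table rows as (primary key, member) pairs."""
        return [(pk, member) for member, pk in self.keys.items()]


def build_star(
    records: list[dict[str, str]],
) -> tuple[dict[str, Dimension], list[dict[str, int]]]:
    """Split flat records into dimension tables and fact rows holding foreign keys."""
    dims = {name: Dimension(name) for name in DIMENSIONS}
    facts = []
    for r in records:
        facts.append({
            "cause_fk": dims["cause"].key_for(r["cause"]),
            # (state, county) keeps same-named counties in different states distinct.
            "location_fk": dims["location"].key_for((r["state"], r["county"])),
            "time_fk": dims["time"].key_for(r["year"]),
            "race_fk": dims["race"].key_for(r["race"]),
            "age_group_fk": dims["age_group"].key_for(r["age_group"]),
            "deaths": int(r["deaths"]),  # the measure
        })
    return dims, facts
```

Each `Dimension.rows()` result corresponds to one dimension sheet, and the `facts` list corresponds to the fact sheet.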
Designing the Data Warehouse
From the Access database, the data were then transferred to a Microsoft SQL Server data warehouse. Dimensional hierarchies allow browsing summarized data at various levels of detail:
- Location dimension: County -> State
- Time dimension: Month -> Quarter -> Year
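Rolling up along these hierarchies means mapping each fact to a coarser level and re-summing the measure. A minimal Python sketch, assuming hypothetical fact rows that carry `state`, `county`, `year`, `month`, and `deaths` fields; in the project this summarization is done by the OLAP cube, not by hand-written code, and the function names here are illustrative only.

```python
from collections import defaultdict
from typing import Callable, Hashable


def month_to_quarter(month: int) -> int:
    """Map a calendar month (1-12) to its quarter (1-4)."""
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month: {month}")
    return (month - 1) // 3 + 1


def roll_up(facts: list[dict], level_of: Callable[[dict], Hashable]) -> dict:
    """Sum the deaths measure for each member of the coarser level."""
    totals: dict = defaultdict(int)
    for fact in facts:
        totals[level_of(fact)] += fact["deaths"]
    return dict(totals)


# Hierarchy levels expressed as key functions.
def by_state(fact: dict) -> str:  # County -> State
    return fact["state"]


def by_quarter(fact: dict) -> tuple[int, int]:  # Month -> Quarter
    return (fact["year"], month_to_quarter(fact["month"]))


def by_year(fact: dict) -> int:  # Quarter -> Year
    return fact["year"]
```

Keeping the year in the quarter key prevents, for example, Q1 2012 and Q1 2013 from being merged.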
Designing the Data Warehouse
Dimensions and Hierarchies
Designing the Data Warehouse
The fact table contains the foreign keys from the dimension tables and the measures: the number of deaths and any other measurable quantity, such as population in our case.
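A star schema like the one described can be sketched with Python's built-in `sqlite3` module as a stand-in for SQL Server. The table and column names below are hypothetical, only three of the five dimensions are shown for brevity, and the rows are made-up toy values rather than CDC figures. The query shows the typical star-join pattern: join the fact table to its dimensions on the foreign keys, then group and sum the measure.

```python
import sqlite3

# Hypothetical star schema; race and age-group dimensions omitted for brevity.
SCHEMA = """
CREATE TABLE dim_cause    (cause_key INTEGER PRIMARY KEY, cause_name TEXT NOT NULL);
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, county TEXT NOT NULL, state TEXT NOT NULL);
CREATE TABLE dim_time     (time_key INTEGER PRIMARY KEY, year INTEGER NOT NULL);
CREATE TABLE fact_deaths (
    cause_key    INTEGER NOT NULL REFERENCES dim_cause(cause_key),
    location_key INTEGER NOT NULL REFERENCES dim_location(location_key),
    time_key     INTEGER NOT NULL REFERENCES dim_time(time_key),
    deaths       INTEGER NOT NULL,  -- measure
    population   INTEGER            -- second measure, may be unknown
);
"""


def build_demo() -> sqlite3.Connection:
    """Create the schema in memory and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the FK -> PK links
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_cause VALUES (?, ?)",
                     [(1, "Cancer"), (2, "Heart Disease")])
    conn.executemany("INSERT INTO dim_location VALUES (?, ?, ?)",
                     [(1, "Harris", "Texas"), (2, "Bexar", "Texas"), (3, "Fulton", "Georgia")])
    conn.execute("INSERT INTO dim_time VALUES (1, 2012)")
    conn.executemany("INSERT INTO fact_deaths VALUES (?, ?, ?, ?, NULL)",
                     [(1, 1, 1, 10), (1, 2, 1, 5), (2, 3, 1, 7)])
    return conn


def deaths_by_cause_and_state(conn: sqlite3.Connection) -> list[tuple[str, str, int]]:
    """Star join: fact table -> dimensions, grouped at the state level."""
    return conn.execute("""
        SELECT c.cause_name, l.state, SUM(f.deaths)
        FROM fact_deaths AS f
        JOIN dim_cause    AS c ON c.cause_key = f.cause_key
        JOIN dim_location AS l ON l.location_key = f.location_key
        GROUP BY c.cause_name, l.state
        ORDER BY c.cause_name, l.state
    """).fetchall()
```

With the toy rows, the two Texas counties are summed into one state-level cancer total.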
Designing the Data Warehouse
The Dimensional Model (Star Schema) of Our Project
Designing the OLAP Cube
The OLAP cube was created using the Microsoft Visual Studio business intelligence tools in conjunction with SQL Server Analysis Services. It allows data from the data warehouse to be summarized in a variety of ways. The cube supports browsing summarized data through operations such as slicing, dicing, roll-up, drill-down, and pivoting. Data from the cube are exported to a Microsoft Excel PivotTable for analysis and reporting.
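The cube operations named above can be illustrated on a plain list of fact rows. This minimal Python sketch is not how SQL Server Analysis Services implements them; it only shows what each operation means, and the function names and toy facts are hypothetical. Slicing fixes one dimension to a single value, dicing restricts several dimensions to sets of values, roll-up and drill-down re-aggregate at a coarser or finer set of dimensions, and pivoting lays one dimension along rows and another along columns, much like an Excel PivotTable.

```python
from collections import defaultdict

# Made-up fact rows: dimension attributes plus the "deaths" measure.
FACTS = [
    {"cause": "Cancer", "state": "Texas", "county": "Harris", "year": 2012, "deaths": 10},
    {"cause": "Cancer", "state": "Texas", "county": "Bexar", "year": 2013, "deaths": 5},
    {"cause": "Injury", "state": "Georgia", "county": "Fulton", "year": 2012, "deaths": 4},
]


def slice_cube(facts, dim, value):
    """Slice: keep only facts where one dimension equals a single value."""
    return [f for f in facts if f[dim] == value]


def dice(facts, **allowed):
    """Dice: keep facts whose dimensions all fall in the given value sets."""
    return [f for f in facts if all(f[d] in vals for d, vals in allowed.items())]


def aggregate(facts, *dims):
    """Roll-up / drill-down: sum deaths grouped by the chosen dimensions."""
    totals = defaultdict(int)
    for f in facts:
        totals[tuple(f[d] for d in dims)] += f["deaths"]
    return dict(totals)


def pivot(facts, row_dim, col_dim):
    """Pivot: nested {row member: {column member: total deaths}}."""
    table = defaultdict(lambda: defaultdict(int))
    for f in facts:
        table[f[row_dim]][f[col_dim]] += f["deaths"]
    return {row: dict(cols) for row, cols in table.items()}
```

For example, `aggregate(FACTS, "state")` gives state totals, and `aggregate(slice_cube(FACTS, "state", "Texas"), "county")` drills down to Texas counties, mirroring the state and county reports.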
A View of the OLAP Cube
OLAP Reports
Overall Summary Report: Total Number of Deaths by Year. Deaths were somewhat higher in 2014.
OLAP Reports
Number of Deaths by Year for the Top Six Causes: heart disease and cancer are the leading causes, followed by injuries.
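Ranking causes and expressing them as shares of all deaths (the conclusion cites about 28% for heart disease and about 21% for cancer) is simple arithmetic over per-cause totals such as those exported from the cube. Below is a hypothetical helper, `top_causes`, not part of the authors' tooling; the counts used to check it are made up, not study data.

```python
from collections import Counter


def top_causes(deaths_by_cause: dict[str, int], n: int = 6) -> list[tuple[str, int, float]]:
    """Return the n causes with the most deaths as (cause, count, percent of all deaths)."""
    total = sum(deaths_by_cause.values())
    if total == 0:
        return []
    ranked = Counter(deaths_by_cause).most_common(n)
    # The share is taken against all causes, not just the top n.
    return [(cause, count, 100.0 * count / total) for cause, count in ranked]
```

Computing the share against the grand total, rather than against the top-n subtotal, matches how a "percent of total deaths" figure is normally read.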
OLAP Reports
Number of Deaths Due to Cancer for All States over Three Years: Texas leads, followed by Georgia, Virginia…
OLAP Reports
Further Drill-Down on Texas Counties: Number of Deaths Due to Cancer for All Counties in the State of Texas
OLAP Reports
Drill-Down on the Time Dimension: Number of Deaths Due to Injuries by Quarter for the Year 2012
OLAP Reports
Drill-Down on the Age Group Dimension: Number of Deaths Due to Various Causes for Different Age Groups. Heart disease and cancer are the leading causes, followed by injuries.
OLAP Reports
Drill-Down on the Race Dimension: Number of Deaths Due to Various Causes for Different Race Groups. Again, heart disease and cancer are the leading causes of death for each race, followed by injuries.
Conclusion
We have discussed how a data warehouse can be used to store a large amount of data in a suitable format after the data go through a cleaning and formatting process. We have also discussed how an OLAP cube can be created from the data warehouse to display ad hoc reports at various levels of detail. In our example problem, we have shown summarized reports in a variety of ways using various dimensional attributes.
Conclusion
We found that the number of deaths increased each year. Heart disease is the #1 cause of death, accounting for about 28% of total deaths. About 21% of deaths are due to cancer. Among younger age groups (<35 years), the major cause of death is injuries. Among older age groups (>40 years), the major causes of death are heart disease and cancer.
Thank You