Published by Gwendolyn Wells. Modified over 6 years ago.
1
Data Warehousing and Mining Hospitals and Nursing Homes organized by CMS
Tsegazaab T. Weldegebrial Masters in Health Informatics Marshall University - Huntington, WV, USA May 01, 2017
2
Introduction
This data warehousing and mining project analyzes the status of US hospitals and nursing homes organized by the Centers for Medicare & Medicaid Services (CMS). The hospitals are usually funded by Medicare, and the nursing homes are CMS-certified. The public datasets used in this project were obtained from CMS (2014 & 2015) and the US Census Bureau, and were released by AHIMA (American Health Information Management Association) for academic purposes. The data are 100% de-identified.
3
The Datasets
Three datasets are used to develop this data warehouse and mining project. The first dataset focuses on the geographic impact of excessive readmissions and contains the following columns:
Hospital name - The name of the hospital
State - US states in abbreviated form
Region name - South, West, Northeast and Midwest regions (US regions)
Measuring Factors - Heart Failure (HF), Acute Myocardial Infarction (AMI) and Pneumonia (PN)
Number of Discharges - Total number of discharges for each hospital associated with HF, AMI and PN
Number of Readmissions - Total number of readmissions for each hospital associated with HF, AMI and PN
Excess Readmission Ratio - CMS-generated ratio compared to the national average
4
The Datasets …
The second dataset covers the relationship between a quality measure and staffing hours in nursing homes. It focuses on high-risk long-stay residents and has the following columns:
Nursing home id (unique)
Measure code - A code for the performance measure
Measure description - A description of the performance measure
PHRSR - Percent of high-risk long-stay residents
Ownership - The owner type of the nursing home: for profit, non profit or government
In Out Hospital - An identifier of whether the nursing home is inside or outside a hospital
5
The Datasets …
Adjusted total Staffing Hours - The total staffing hours per resident per day = (CNA hours + LPN hours + RN hours) / Total number of residents
CNA = Certified Nurse Assistant
LPN = Licensed Practical Nurse
RN = Registered Nurse
Adjusted RN total Staffing Hours - The total RN staffing hours per resident per day
Each ownership category has subcategories:
For profit - corporation, individual, partnership, LLC, and unknown
Non profit - church related, corporation, others, and unknown
Government - city, county, city/county, and hospital district
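The staffing formula on this slide can be sketched in a few lines of Python. This is only an illustration of the arithmetic as the slide states it; the function name and the sample figures are hypothetical and do not come from the CMS data.

```python
def adjusted_total_staffing_hours(cna_hours, lpn_hours, rn_hours, residents):
    """Total nurse staffing hours per resident per day, as defined on the slide."""
    if residents <= 0:
        raise ValueError("residents must be a positive number")
    return (cna_hours + lpn_hours + rn_hours) / residents


# Hypothetical daily totals for one nursing home (illustrative values only).
hours = adjusted_total_staffing_hours(
    cna_hours=240.0, lpn_hours=80.0, rn_hours=40.0, residents=100
)
print(hours)
```

The RN-only figure follows the same pattern, with RN hours alone in the numerator.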
6
The Datasets …
The third dataset is used with the mining techniques to predict the volume of US hospitals, so that it is easy to identify which hospitals are more crowded and which are not. This dataset contains the following columns:
Hospital id
Hospital Ownership - Uses the same categories as the nursing home owners
Emergency services - Whether the hospital provides emergency services or not
CPCD - Clinical process of care domain score
PECD - Patient experience of care domain score
HSBP - Medicare spending per beneficiary
Readmission - Hospital-wide 30-day readmission rate
7
The Questions
The following are some of the questions to be answered by the project:
Number of discharges for HF, AMI and PN in the four regions
Number of readmissions for HF, AMI and PN in the four regions
Excess readmission ratio for HF, AMI and PN in the four regions
Number of discharges for HF, AMI and PN in for-profit, non-profit and government-owned hospitals
Number of readmissions for HF, AMI and PN in for-profit, non-profit and government-owned hospitals
Excess readmission ratio for HF, AMI and PN in for-profit, non-profit and government-owned hospitals
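Questions of this kind reduce to grouping rows by region (or owner) and measure, then summing. The sketch below shows that grouping in plain Python; the project itself answered these questions through a SQL Server cube, so this is a stand-in technique, and the rows are hypothetical values rather than CMS figures.

```python
from collections import defaultdict

# Hypothetical rows: (region, measure, discharges, readmissions).
rows = [
    ("South", "HF", 400, 90),
    ("South", "HF", 300, 60),
    ("West", "AMI", 200, 30),
    ("South", "PN", 250, 45),
    ("West", "AMI", 100, 20),
]


def totals_by_region_and_measure(rows):
    """Sum discharges and readmissions for each (region, measure) pair."""
    totals = defaultdict(lambda: {"discharges": 0, "readmissions": 0})
    for region, measure, discharges, readmissions in rows:
        totals[(region, measure)]["discharges"] += discharges
        totals[(region, measure)]["readmissions"] += readmissions
    return dict(totals)


totals = totals_by_region_and_measure(rows)
for key in sorted(totals):
    print(key, totals[key])
```

Replacing the region field with an ownership field gives the per-owner versions of the same questions.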
8
The Questions …
Compare the percentage of high-risk long-stay residents between for-profit, non-profit and government-owned nursing homes
Compare the percentage of high-risk long-stay residents between in-hospital and out-of-hospital nursing homes
Compare the rate of adjusted nurse staffing hours per resident per day between for-profit, non-profit and government-owned nursing homes
Compare the rate of adjusted nurse staffing hours per resident per day between in-hospital and out-of-hospital nursing homes
Is there a correlation between the percent of high-risk long-stay residents and adjusted nurse staffing hours per resident per day across for-profit, non-profit and government-owned nursing homes?
Is there a correlation between the percent of high-risk long-stay residents and adjusted nurse staffing hours per resident per day across nursing homes inside and outside hospitals?
What kind of prediction can be made from the mining part of the project?
9
Data Preprocessing
Data Cleaning
Blank cells, extra spaces and erroneous values are cleaned in Excel.
The Excel files are converted to tab-delimited text files so they can be imported into MS SQL Server 2014.
Data Integration
A database is created in MS SQL Server 2014 on a local server (personal computer).
The three flat-file datasets are imported into the database.
Two fact tables and six dimension tables are developed from the three datasets.
Two fact tables are used rather than one because the datasets hold different facts, although they share a common dimension.
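The cleaning step was done by hand in Excel. As a rough scripted equivalent, the sketch below trims extra spaces and writes tab-delimited text using only the Python standard library. The helper names and the sample rows are hypothetical, and detecting erroneous values is out of scope here.

```python
import csv
import io


def clean_cell(value):
    """Trim leading/trailing blanks and collapse repeated whitespace."""
    return " ".join(value.split())


def to_tab_delimited(rows):
    """Return the cleaned rows as tab-delimited text, one row per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow([clean_cell(cell) for cell in row])
    return buffer.getvalue()


# Hypothetical rows with stray spaces, standing in for an exported sheet.
raw_rows = [
    ["Hospital name", " State ", "Region  name"],
    ["  Example   Hospital ", "WV", " South"],
]
text = to_tab_delimited(raw_rows)
print(text)
```

The resulting text can be saved to a file and loaded with whichever flat-file import route the database offers.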
10
Data Normalization
The three datasets are normalized into the following dimension tables and fact tables:
11
The Dimensions
12
The Cube
13
Statistical Summaries
Number of Discharges for HF, AMI and PN among the four regions
14
Statistical Summaries
Number of Readmissions for HF, AMI and PN among the four regions
15
Statistical Summaries
Readmission Excess Ratio for HF, AMI and PN among the four regions
16
Statistical Summaries and Comparisons
Comparison of Excess Readmission Ratio for HF, AMI and PN among the owners
17
Statistical Summaries and Comparisons
Comparison of Discharges, Readmissions and Excess Readmission Ratio for HF, AMI and PN among the four regions
18
Statistical Summaries and Comparisons
Percent of high-risk long-stay residents in nursing homes under for-profit, non-profit and government ownership
19
Correlation Analysis
Correlation of adjusted total staffing hours, RN total staffing hours and percent of high-risk long-stay residents among nursing homes inside and outside hospitals. As the line chart shows, the association is positive in all cases.
20
Correlation Analysis
Correlation between the percent of high-risk long-stay residents and adjusted nurse staffing hours per resident per day among for-profit, non-profit and government-owned nursing homes. As the line chart shows, the association is positive.
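The slides read the association off a line chart. One way to put a number on it is a Pearson correlation coefficient computed separately for each ownership group, sketched below. This is an assumed technique, not necessarily the one used in the project, and the records are hypothetical values chosen only to exercise the code.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


def correlation_by_group(records):
    """Group (group, staffing_hours, phrsr) records and correlate within each group."""
    groups = defaultdict(lambda: ([], []))
    for group, staffing_hours, phrsr in records:
        groups[group][0].append(staffing_hours)
        groups[group][1].append(phrsr)
    return {group: pearson(xs, ys) for group, (xs, ys) in groups.items()}


# Hypothetical records: (ownership, adjusted staffing hours, percent high-risk residents).
records = [
    ("For profit", 3.0, 5.0), ("For profit", 3.5, 6.0), ("For profit", 4.0, 7.0),
    ("Government", 3.0, 8.0), ("Government", 4.0, 6.0), ("Government", 5.0, 7.0),
]
result = correlation_by_group(records)
print(result)
```

A coefficient near +1 indicates a strong positive association, near 0 little linear association, and near -1 a strong negative one. Grouping by the in-hospital/out-of-hospital flag instead of ownership covers the other correlation question.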
21
Prediction of Voluminous US Hospitals
Voluminous US hospitals are defined in this project as hospitals with a high volume of discharges, a high volume of readmissions and a high excess readmission ratio. This definition may not match any national standard for calling a hospital voluminous. A value is considered high when it is greater than the average for that measure. In other words, a voluminous US hospital has discharges above the average number of discharges, readmissions above the average number of readmissions, and an excess readmission ratio above the average excess readmission ratio. The averages were calculated from the three source datasets using the Excel AVERAGE() function:
Average number of discharges = 339
Average number of readmissions = 68
Average readmission excess ratio =
22
Prediction of Voluminous US Hospitals
A new mining table is designed for this task. The mining table has the following columns:
My_key (unique)
Owners = For profit, Non profit and Government owner categories
Region = South, West, Midwest and Northeast regions
State = US states
Emergency = An indicator of whether the hospital provides an emergency service or not
Discharges = Number of discharges per hospital
Readmissions = Number of readmissions per hospital
Excess ratio = Excess readmission ratio per hospital
Volume_of_US_Hospitals = ‘Voluminous’ if discharges>339 and readmission>68 and excess_ratio>1
Volume_of_US_Hospitals = ‘Normal Volume’ if discharges<=339 and readmission<=68 and excess_ratio<=1
Volume_of_US_Hospitals = Unknown otherwise
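The labeling rule for Volume_of_US_Hospitals can be written as a small function. The thresholds 339, 68 and 1 are taken from the rule on this slide; the function and constant names are hypothetical, and the sample inputs are illustrative only.

```python
# Cutoffs as stated in the labeling rule on the slide.
DISCHARGES_CUTOFF = 339
READMISSIONS_CUTOFF = 68
EXCESS_RATIO_CUTOFF = 1.0


def volume_label(discharges, readmissions, excess_ratio):
    """Return the Volume_of_US_Hospitals label for one hospital row."""
    above = (
        discharges > DISCHARGES_CUTOFF
        and readmissions > READMISSIONS_CUTOFF
        and excess_ratio > EXCESS_RATIO_CUTOFF
    )
    at_or_below = (
        discharges <= DISCHARGES_CUTOFF
        and readmissions <= READMISSIONS_CUTOFF
        and excess_ratio <= EXCESS_RATIO_CUTOFF
    )
    if above:
        return "Voluminous"
    if at_or_below:
        return "Normal Volume"
    # Mixed cases (some measures above, some at or below) stay unlabeled.
    return "Unknown"


# Hypothetical hospitals (illustrative values only).
labels = [
    volume_label(500, 100, 1.1),
    volume_label(200, 40, 0.95),
    volume_label(500, 40, 1.1),
]
print(labels)
```

Because all three conditions must agree, any hospital that is above the cutoff on some measures and at or below it on others falls into the Unknown class.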
23
Prediction of Voluminous US Hospitals
The input columns for data mining in this task are the following:
Owners = For profit, Non profit and Government owner categories
Region = South, West, Midwest and Northeast regions
State = US states
Emergency = An indicator of whether the hospital provides an emergency service or not
Discharges = Number of discharges per hospital
Readmissions = Number of readmissions per hospital
Excess ratio = Excess readmission ratio per hospital
The key column is:
My_key (unique)
The predictable column is:
Volume_of_US_Hospitals = ‘Voluminous’ if discharges>339 and readmission>68 and excess_ratio>1, ‘Normal Volume’ if discharges<=339 and readmission<=68 and excess_ratio<=1, or Unknown otherwise
The mining algorithms applied in this project are:
Decision Trees, Clustering and Neural Networks
24
Mining by Decision Trees Algorithm
25
Dependency Network
26
Mining by Clustering Algorithm
27
Mining by Neural Networks Algorithm
28
Probability of the Hospitals in West Virginia to be Voluminous
29
Probability of the Hospitals in West Virginia to be Voluminous
As the predicted result on the previous slide shows, the probability that hospitals in West Virginia fall in the Normal Volume class is about 0.06%, which means they have a probability of 99.94% of being voluminous. This result agrees with the trends and summary results for the number of discharges, number of readmissions and excess readmission ratio in the earlier summary charts, where the South region has the highest values for all the measuring factors considered in this project.
30
End