Published by Shauna Hicks. Modified over 9 years ago.
1
Moving Towards a Data Repository That Facilitates Data Analysis
CHOP, November 18, 2009
2
Relational Database Design
3
Normalization
Normalization: the process of efficiently organizing data in a database to reduce redundancy
Goal: consistency of data
– Store each piece of data once and only once!
– Security
– Disk space
– Speed of queries
– Efficiency of database updates
– Data integrity
A normalized database holds no aggregations and no calculated fields
4
Data Anomalies
5
Unnormalized data set
Patient ID | Name             | Address                | DOB     | Doc         | Appt Date        | Location     | DX
111111     | Cindy Marselis   | 2320 Edge Hill Road    | 1/11/64 | Armstrong   | 9/1/09 11:00 AM  | Alter 2011   | Herniated Disc; Flu
111111     | Cindy Marselis   | 9331 Rising Sun Avenue | 1/11/64 | Morningstar | 9/1/09 11:00 AM  | Alter 2011   | Herniated Disc
111111     | Cindy Marselis   | 2320 Edge Hill Road    | 1/11/64 | Allen       | 11/1/09 10:00 AM | Alter 2012   | Psoriasis
222222     | Kathryn Marselis | 2320 Edge Hill Road    | 11/3/04 | Dershaw     | 8/1/09 11:00 AM  | Speakman 105 | Well baby check
111111     | Cindy Schwartz   | 9331 Rising Sun Avenue | 1/11/64 | Armstrong   | 8/11/09 3:00 PM  | Alter 105    | Psoriasis; Herniated Disc
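One possible way to normalize the flat data set above, sketched with Python's built-in sqlite3 module. The table and column names are hypothetical, only a few of the rows are loaded, dates are rewritten in ISO form assuming the slide uses month/day/year, and the choice of which address is the current one is arbitrary. The point is only that each patient is stored once, so a change of address becomes a single-row update.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative, not from the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (
    patient_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    address    TEXT NOT NULL,
    dob        TEXT NOT NULL
);
CREATE TABLE appointment (
    appt_id    INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
    doctor     TEXT NOT NULL,
    appt_time  TEXT NOT NULL,
    location   TEXT NOT NULL
);
CREATE TABLE diagnosis (
    appt_id INTEGER NOT NULL REFERENCES appointment(appt_id),
    dx      TEXT NOT NULL
);
""")

# Each patient is stored once, so name and address cannot disagree across rows.
conn.executemany(
    "INSERT INTO patient VALUES (?, ?, ?, ?)",
    [
        (111111, "Cindy Marselis", "2320 Edge Hill Road", "1964-01-11"),
        (222222, "Kathryn Marselis", "2320 Edge Hill Road", "2004-11-03"),
    ],
)
conn.executemany(
    "INSERT INTO appointment VALUES (?, ?, ?, ?, ?)",
    [
        (1, 111111, "Armstrong", "2009-09-01 11:00", "Alter 2011"),
        (2, 111111, "Allen", "2009-11-01 10:00", "Alter 2012"),
        (3, 222222, "Dershaw", "2009-08-01 11:00", "Speakman 105"),
    ],
)
# One row per diagnosis instead of several values packed into one DX cell.
conn.executemany(
    "INSERT INTO diagnosis VALUES (?, ?)",
    [(1, "Herniated Disc"), (1, "Flu"), (2, "Psoriasis"), (3, "Well baby check")],
)

# An address change is a single-row update; every appointment sees it via the join.
conn.execute(
    "UPDATE patient SET address = ? WHERE patient_id = ?",
    ("9331 Rising Sun Avenue", 111111),
)
rows = conn.execute("""
    SELECT DISTINCT p.address
    FROM appointment a JOIN patient p ON p.patient_id = a.patient_id
    WHERE p.patient_id = 111111
""").fetchall()
```

After the update, `rows` holds exactly one address for patient 111111, whereas the flat data set carries two different addresses for the same patient ID.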
6
Normalized db - before
7
Normalized db - after
8
Example of Appointment Entity Relationship Diagram
9
Structured, free text, unstructured text
10
Free text
Issues with string searches
– Must match exactly in case, punctuation, spelling, etc.
Use lookup tables where possible
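A small sketch of why exact string searches on free text are brittle and how a lookup table helps. The diagnosis variants, the code numbers, and the helper name `to_code` are all invented for illustration.

```python
# Hypothetical free-text diagnosis entries; the variants are invented for illustration.
entries = ["Herniated Disc", "herniated disc", "Herniated disk", "Psoriasis"]

# A naive exact-match search misses every variant in case or spelling.
exact_hits = [e for e in entries if e == "Herniated Disc"]

# Lookup table: each allowed diagnosis gets a code, and known variants map to it.
# The code numbers are arbitrary.
DX_LOOKUP = {
    "herniated disc": 1,
    "herniated disk": 1,
    "psoriasis": 2,
}

def to_code(text):
    """Return the lookup code for a free-text entry, or None if unrecognized."""
    return DX_LOOKUP.get(text.strip().lower())

coded = [to_code(e) for e in entries]
code_hits = [e for e, c in zip(entries, coded) if c == 1]
```

Here the exact search finds one of the three herniated disc entries, while searching on the code finds all three. In practice the code would be chosen from the lookup table at data entry rather than recovered afterwards.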
11
Unstructured Text
Gartner: white-collar workers spend 30 to 40% of their time managing documents
Merrill Lynch: more than 85% of business information exists as unstructured data
– E-mails, memos, notes from call centers and support operations, news, user groups, chats, reports, letters, surveys, white papers, marketing material, research, presentations, and Web pages
In a relational db, unstructured data is data that can't be stored in rows and columns
– Stored in a BLOB (binary large object)
– E-mail files, word-processing text documents, PowerPoint presentations, JPEG and GIF image files, and MPEG video files
Metadata (data about data) can be stored
http://www.information-management.com/issues/20030201/6287-1.html
12
Approaches to structured and unstructured data
1. Single database: consolidates all structured and unstructured data together
– Expensive to buy and maintain
– Large volumes of data can clog the database, making it slow and inefficient
2. Two databases: one for structured data and one for unstructured data
– Avoids performance issues with structured data
– Significant performance limitations for unstructured data
13
Approaches to structured and unstructured data
3. Unstructured data left on file servers, with a database recording links to the unstructured data files
– Avoids issues with volumes of data
– Fragile, as links break when files and folders are moved around
– Must create a link every time a new document is created
4. Complex and expensive connectors used to tap into all databases and file servers, providing a unified view of the data
– Expensive and complex, requiring purchase and maintenance of multiple databases and file servers, with the added cost of all required connectors
5. Patents currently under development
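The fragility of approach 3 can be illustrated with a short sketch: a database table records file paths, and a file moved without a matching database update leaves a dangling link. The table name `document_link`, the helper `broken_links`, and the file names are hypothetical, and a temporary folder stands in for the file server.

```python
import os
import sqlite3
import tempfile

conn = sqlite3.connect(":memory:")
# Hypothetical link table: one row per document kept on the file server.
conn.execute(
    "CREATE TABLE document_link (doc_id INTEGER PRIMARY KEY, path TEXT NOT NULL)"
)

def broken_links(connection):
    """Return (doc_id, path) for every recorded link whose file no longer exists."""
    rows = connection.execute("SELECT doc_id, path FROM document_link").fetchall()
    return [(doc_id, path) for doc_id, path in rows if not os.path.exists(path)]

with tempfile.TemporaryDirectory() as folder:
    original = os.path.join(folder, "pathology_note.txt")
    with open(original, "w") as f:
        f.write("narrative text")
    # A link must be recorded each time a new document is created.
    conn.execute("INSERT INTO document_link (path) VALUES (?)", (original,))
    before_move = broken_links(conn)

    # Moving the file without updating the database silently breaks the link.
    os.rename(original, os.path.join(folder, "renamed_note.txt"))
    after_move = broken_links(conn)
```

A periodic check like `broken_links` only detects the problem; it does not say where the file went, which is why this approach is described as fragile.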
14
Certification Commission for Health Information Technology (CCHIT) EHR Construct
EMAR: Electronic Medication Administration Record
CPOE: Computerized Physician Order Entry
PFS: Physician Fee Schedule
OC/RR: Physician Order Communication/Results Retrieval
R-ADT: Registration Admission Discharge Transfer
15
Data Warehousing
16
Pressures Driving Need for Business Intelligence and Data Warehousing
External and internal forces require tactical and strategic decisions
Search for competitive advantage
Business environments are dynamic
Decision-making cycle time is reduced
17
Operational vs. Decision Support Data
Operational data
– Relational, normalized database
– Optimized to support transactions
– Real-time updates
DSS data
– Snapshot of operational data
– Summarized
– Large amounts of data
Data analyst viewpoint
– Timespan
– Granularity
– Dimensionality
18
Creating a Data Warehouse
19
Codd’s Key Data Warehouse Rules
Separated from operational environment
Integrated data
Historical data over a long time horizon
Snapshot data captured at a given time
Subject-oriented data
Mainly read-only data with periodic batch updates from the operational source; no online updates
20
Codd’s Key Data Warehouse Rules contd.
Contains different levels of data detail
– Current and old detail
– Lightly and highly summarized
Metadata (data about the data) is a critical component
– Identifies and defines data elements
– Provides the source, transformation, integration, storage, usage, relationships, and history of data elements
21
Decision Support Systems (DSS)
22
DSS Components
23
Decomposition of DSS – Operational Data
o Tumor registry
o A/D/T
o Radiology narrative
o Pathology narrative
o Lab results
o Patient Accounting
o Charge Master
24
Decomposition of DSS – External Data
o Research spider
o Treatment guidelines
o Reimbursement schedules
o NCI/NIH protocols
25
Decomposition of DSS – ETL
Rationalize normal lab values
Transform gender codes and free text
Narrative dumps
Doctor cleansing
o Similar names
o Which practice gets credit?
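As one hedged illustration of the "similar names" problem in doctor cleansing, the sketch below uses fuzzy string matching from Python's standard difflib module to map misspelled doctor names onto a canonical list. Fuzzy matching is a technique swapped in here for illustration, not something the slides prescribe; the canonical list, the helper name `cleanse_doctor`, and the 0.8 cutoff are all assumptions.

```python
import difflib

# Hypothetical canonical list of doctor names (taken from the sample data set).
CANONICAL = ["Armstrong", "Morningstar", "Allen", "Dershaw"]

def cleanse_doctor(raw, cutoff=0.8):
    """Map a raw doctor name to its canonical spelling, or None if nothing is close.

    The 0.8 similarity cutoff is an arbitrary choice for this sketch.
    """
    by_key = {name.lower(): name for name in CANONICAL}
    matches = difflib.get_close_matches(
        raw.strip().lower(), list(by_key), n=1, cutoff=cutoff
    )
    return by_key[matches[0]] if matches else None

cleaned = [cleanse_doctor(name) for name in ["Armstrng", " ALLEN ", "Smith"]]
```

Names that return None would go to an exception report for manual review rather than being guessed at. The second question on the slide, which practice gets credit, is a business rule that no string matching can answer.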
26
ETL – Extraction, Transformation, Load
Extract: data from source systems
Transform: cleanse data for consistency and output exceptions
o Apply business rules
o Select certain columns to load (non-null records)
o Translate coded values (1, M, male = 0)
o Derive new calculated values (sale_amount = qty * unit_price)
o Join data from multiple sources (lookup, merge)
o Aggregate (roll up/summarize data, e.g. average LOS for each doctor by DRG)
o Transpose/pivot (turning columns into rows)
o Validate data
Load: data into repository
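A minimal sketch of three of the transformation steps listed above: translating coded values, sending rows that fail validation to an exception list, and aggregating average LOS for each doctor by DRG. The input rows, field names, and DRG values are invented. Only "1, M, male = 0" comes from the slide; the female mapping is an assumption.

```python
from collections import defaultdict

# Hypothetical extracted rows; field names and values are invented for illustration.
extracted = [
    {"doctor": "Armstrong", "drg": "470", "gender": "M", "los_days": 3},
    {"doctor": "Armstrong", "drg": "470", "gender": "male", "los_days": 5},
    {"doctor": "Allen", "drg": "470", "gender": "F", "los_days": 2},
    {"doctor": "Allen", "drg": "470", "gender": "?", "los_days": 4},
    {"doctor": "Allen", "drg": "291", "gender": "2", "los_days": None},
]

# Translate coded values: 1, M, male = 0 is from the slide; the female codes are assumed.
GENDER_MAP = {"1": 0, "m": 0, "male": 0, "2": 1, "f": 1, "female": 1}

def transform(rows):
    """Cleanse rows for consistency; rows that cannot be cleansed become exceptions."""
    clean, exceptions = [], []
    for row in rows:
        gender = GENDER_MAP.get(str(row["gender"]).strip().lower())
        if gender is None or row["los_days"] is None:
            exceptions.append(row)
            continue
        clean.append({**row, "gender": gender})
    return clean, exceptions

def average_los(rows):
    """Aggregate: average length of stay for each (doctor, DRG) pair."""
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        key = (row["doctor"], row["drg"])
        totals[key][0] += row["los_days"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

clean_rows, exception_rows = transform(extracted)
avg_los = average_los(clean_rows)
```

With this input, three rows load cleanly and two land in `exception_rows`: one with an unrecognized gender code and one with a missing length of stay.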
27
ETL Best Practice
ETL: 60-80% of development effort
Create a multi-departmental team charged with reaching consensus on transformation!
Review exceptions carefully
o Indicator of issues with operational db design
o Indicator of changes needed in transformation
28
Decomposition of DSS – Business Data
Business data – central repository
Includes metadata: source, format, timing of feeds
Characteristic   | Factors
Integrated       | Centralized; holds data retrieved from entire organization
Subject-Oriented | Optimized to give answers to diverse questions; used by all functional areas
Time Variant     | Flow of data through time; projected data
Non-Volatile     | Data never removed; always growing
29
Decomposition of DSS – Business Model Data
Comprehensive Cancer Center definition of a patient
o Must have seen a physician for a suspected or confirmed benign or malignant condition
o What about patients seen for screening mammography?
30
Decomposition of DSS – End user query tool
Web-based or client-server?
OLAP – Online Analytical Processing
o Microsoft
o Business Objects (bought by SAP)
o MicroStrategy
o Cognos (bought by IBM)
o Oracle (includes Hyperion)
31
Decomposition of DSS – End-user tool
o Drill down functionality
o Roll up
o Charts – not data level
o Export features
http://demos.telerik.com/aspnet-ajax/chart/examples/functionality/drilldown/defaultcs.aspx
32
Design – Star Schema
33
Star Schema
Center fact table
o Usually contains numeric information for summary reports
Dimension tables radiate from the fact table
Dimension tables are hierarchical
‘Rollup’ allows comparison of types of hospitals, disease categories, or even patient age bands
Creates a logical data cube, with dimensions identifying a set of numeric measurements within the cube
34
Star Schema
Data-modeling technique
Maps multidimensional decision support data into a relational database
Yields a model for multidimensional data analysis while preserving the relational structure of the operational DB
Four components:
– Facts
– Dimensions
– Attributes
– Attribute hierarchies
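The four components can be sketched with Python's built-in sqlite3 module: a fact table of numeric measures, two dimension tables, their attributes, and an attribute hierarchy (year, quarter, month) used for a rollup. Every table name, column name, and value below is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions: descriptive attributes; year > quarter > month is an attribute hierarchy.
CREATE TABLE dim_time (
    time_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER, month INTEGER
);
CREATE TABLE dim_physician (
    physician_id INTEGER PRIMARY KEY, name TEXT, specialty TEXT
);
-- Fact table: numeric measures keyed by the dimensions.
CREATE TABLE fact_encounter (
    time_id      INTEGER REFERENCES dim_time(time_id),
    physician_id INTEGER REFERENCES dim_physician(physician_id),
    los_days     INTEGER,
    charges      REAL
);
INSERT INTO dim_time VALUES (1, 2009, 3, 8), (2, 2009, 3, 9), (3, 2009, 4, 11);
INSERT INTO dim_physician VALUES
    (1, 'Physician A', 'Orthopedics'), (2, 'Physician B', 'Dermatology');
INSERT INTO fact_encounter VALUES
    (1, 1, 3, 1200.0), (2, 1, 5, 2000.0), (3, 2, 1, 300.0), (2, 2, 2, 500.0);
""")

# Roll up the facts to quarter and specialty by joining out to the dimensions.
rollup = conn.execute("""
    SELECT t.year, t.quarter, p.specialty, SUM(f.charges), AVG(f.los_days)
    FROM fact_encounter f
    JOIN dim_time t ON t.time_id = f.time_id
    JOIN dim_physician p ON p.physician_id = f.physician_id
    GROUP BY t.year, t.quarter, p.specialty
    ORDER BY t.year, t.quarter, p.specialty
""").fetchall()
```

Grouping by `t.month` or `p.name` instead would drill down one level in the same hierarchy without changing the schema.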
35
Simple Star Schema
36
Star Schema
37
Entity Relationship Diagram
38
Analysis
39
Online Analytical Processing (OLAP)
Advanced data analysis environment
Supports decision making, business modeling, and operations research activities
Characteristics of OLAP
– Use multidimensional data analysis techniques
– Provide advanced database support
– Provide easy-to-use end-user interfaces
– Support client/server architecture
40
Healthcare Cube – slice and dice view
Cube dimensions: Diagnosis, Time, Physician
Time hierarchy: Strategic Period, Year, Quarter, Month, Week, Day, Shift, Hour
Provider hierarchy: Clinic, Specialty, Group, Physician
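Slicing and rolling up a cube like this one can be sketched in plain Python. The cells, the encounter counts, and the helper names `slice_cube` and `roll_up` are hypothetical; an OLAP tool would run the same operations against the warehouse instead of an in-memory list.

```python
from collections import defaultdict

# Hypothetical cube cells: diagnosis, physician, year, quarter, month, encounter count.
cells = [
    ("Psoriasis", "Physician A", 2009, 3, 8, 4),
    ("Psoriasis", "Physician B", 2009, 3, 9, 2),
    ("Herniated Disc", "Physician A", 2009, 3, 9, 3),
    ("Herniated Disc", "Physician A", 2009, 4, 11, 5),
]
FIELDS = ("diagnosis", "physician", "year", "quarter", "month", "encounters")
records = [dict(zip(FIELDS, cell)) for cell in cells]

def slice_cube(rows, **fixed):
    """Slice/dice: keep only the cells matching every fixed dimension value."""
    return [r for r in rows if all(r[k] == v for k, v in fixed.items())]

def roll_up(rows, *levels):
    """Roll up: sum the measure over every dimension not named in levels."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[level] for level in levels)] += r["encounters"]
    return dict(totals)

by_quarter = roll_up(records, "year", "quarter")
by_month = roll_up(records, "year", "quarter", "month")  # drill down one level
psoriasis_by_physician = roll_up(
    slice_cube(records, diagnosis="Psoriasis"), "physician"
)
```

Drilling down is the same call with one more level of the time hierarchy, which is why `by_month` only adds `"month"` to the arguments used for `by_quarter`.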
41
Dashboard
42
Scorecard including Key Performance Indicators (KPI)
Risk-adjusted mortality index
Risk-adjusted complications index
Risk-adjusted patient safety index
Severity-adjusted average length of stay
Expense per adjusted discharge, case mix- and wage-adjusted