Moving Towards A Data Repository That Facilitates Data Analysis CHOP November 18, 2009 1.



Relational Database Design 2

Normalization 3
Normalization: the process of efficiently organizing data in a database to reduce redundancy
Goal: consistency of data
– Store data once and only once!
– Security
– Disk space
– Speed of queries
– Efficiency of database updates
– Data integrity
In a normalized database there are no aggregations and no calculated fields

Data Anomalies 4

Unnormalized data set 5
Patient ID | Name | Address | DOB | Doc | Appt Date | Location | DX
 | Cindy Marselis | 2320 Edge Hill Road | 1/11/64 | Armstrong | 9/1/09 11:00 AM | Alter 2011 | Herniated Disc, Flu
 | Cindy Marselis | 9331 Rising Sun Avenue | 1/11/64 | Morningstar | 9/1/09 11:00 AM | Alter 2011 | Herniated Disc
 | Cindy Marselis | 2320 Edge Hill Road | 1/11/64 | Allen | 11/1/09 10:00 AM | Alter 2012 | Psoriasis
 | Kathryn Marselis | 2320 Edge Hill Road | 11/3/04 | Dershaw | 8/1/09 11:00 AM | Speakman 105 | Well baby check
 | Cindy Schwartz | 9331 Rising Sun Avenue | 1/11/64 | Armstrong | 8/11/09 3:00 PM | Alter 105 | Psoriasis, Herniated Disc
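One way to see what normalization buys is to split flat rows like the ones in the unnormalized data set into a patient table and an appointment table, so each patient is stored once. The following is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and patient_id values are assumptions made for illustration (the slide's actual IDs are not available), and only a subset of the columns is used.

```python
import sqlite3

# Flat rows in the style of the unnormalized data set (subset of columns).
# The patient_id values are hypothetical; the slide's IDs are not available.
flat_rows = [
    (1, "Cindy Marselis", "1/11/64", "Armstrong", "9/1/09 11:00 AM", "Alter 2011"),
    (1, "Cindy Marselis", "1/11/64", "Allen", "11/1/09 10:00 AM", "Alter 2012"),
    (2, "Kathryn Marselis", "11/3/04", "Dershaw", "8/1/09 11:00 AM", "Speakman 105"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (
        patient_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        dob TEXT NOT NULL
    );
    CREATE TABLE appointment (
        appointment_id INTEGER PRIMARY KEY AUTOINCREMENT,
        patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
        doctor TEXT NOT NULL,
        appt_date TEXT NOT NULL,
        location TEXT NOT NULL
    );
""")

for pid, name, dob, doctor, appt_date, location in flat_rows:
    # Each patient is stored once; a repeated patient_id is ignored.
    conn.execute("INSERT OR IGNORE INTO patient VALUES (?, ?, ?)", (pid, name, dob))
    conn.execute(
        "INSERT INTO appointment (patient_id, doctor, appt_date, location) "
        "VALUES (?, ?, ?, ?)",
        (pid, doctor, appt_date, location),
    )
conn.commit()

patient_count = conn.execute("SELECT COUNT(*) FROM patient").fetchone()[0]
appointment_count = conn.execute("SELECT COUNT(*) FROM appointment").fetchone()[0]
print(patient_count, appointment_count)
```

With the patient stored once, a change of name or date of birth is a single-row update rather than an edit to every appointment row.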

Normalized db - before 6

Normalized db - after 7

Example of Appointment Entity Relationship Diagram 8

Structured, free text, unstructured text 9

Free text 10
Issues with string searches
– Must match exactly in case, punctuation, spelling, etc.
Use lookup tables where possible
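The exact-match problem and the lookup-table remedy can be sketched in a few lines of Python. Everything here is illustrative: the sample entries, the normalize and standardize helpers, and the contents of DX_LOOKUP are assumptions, not part of the original slides.

```python
import string

def normalize(text: str) -> str:
    """Lower-case, strip punctuation, and collapse whitespace."""
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(no_punct.lower().split())

# Hypothetical lookup table: normalized free-text variants -> one standard term.
DX_LOOKUP = {
    "herniated disc": "Herniated Disc",
    "herniated disk": "Herniated Disc",
    "psoriasis": "Psoriasis",
}

def standardize(free_text: str):
    """Return the standard term, or None when the entry needs manual review."""
    return DX_LOOKUP.get(normalize(free_text))

entries = ["Herniated Disc", "herniated disk.", "PSORIASIS ", "psoriassis"]

# An exact string search finds only the entry typed exactly as searched.
exact_hits = [e for e in entries if e == "Herniated Disc"]

# The lookup table maps known variants to one term and flags the rest.
standardized = [standardize(e) for e in entries]
print(exact_hits)
print(standardized)
```

Entries that come back as None (here the misspelled one) are candidates for manual review or for a new lookup-table row.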

Unstructured Text 11
Gartner: white-collar workers spend 30 to 40% of their time managing documents
Merrill Lynch: more than 85% of business information exists as unstructured data
– e-mails, memos, notes from call centers and support operations, news, user groups, chats, reports, letters, surveys, white papers, marketing material, research, presentations and Web pages
In a relational db, this is data that can't be stored in rows and columns
– Stored in a BLOB (binary large object)
– e-mail files, word-processing text documents, PowerPoint presentations, JPEG and GIF image files, and MPEG video files
Metadata (data about data) can be stored
management.com/issues/ / html

Approaches to structured and unstructured data 12
1. Unique database: consolidates all structured and unstructured data together
– Expensive to buy and maintain
– Large volume of data can clog the database, making it slow and inefficient
2. Use two databases: one for structured data, and one for unstructured data
– Avoids performance issues with structured data
– Significant performance limitations for unstructured data

Approaches to structured and unstructured data 13
3. Unstructured data left on file servers, with a database to record links to the unstructured data files
– Avoids issue with volumes of data
– Fragile, as links are broken when files and folders are moved around
– Must create links every time a new document is created
4. Complex and expensive connectors used to tap into all databases and file servers, providing a unified view of data
– Expensive and complex, requiring purchase and maintenance of multiple databases and file servers with the added cost of all required connectors
5. Patents currently under development

Certification Commission for Health Information Technology (CCHIT) EHR Construct 14
EMAR: Electronic Medication Administration Record
CPOE: Computerized Physician Order Entry
PFS: Physician Fee Schedule
OC/RR: Physician Order Communication/Results Retrieval
R-ADT: Registration Admission Discharge Transfer

Data Warehousing 15

Pressures Driving Need for Business Intelligence and Data Warehousing 16
External and internal forces require tactical and strategic decisions
Search for competitive advantage
Business environments are dynamic
Decision-making cycle time is reduced

Operational vs. Decision Support Data 17
Operational data
– Relational, normalized database
– Optimized to support transactions
– Real time updates
DSS
– Snapshot of operational data
– Summarized
– Large amounts of data
Data analyst viewpoint
– Timespan
– Granularity
– Dimensionality

Creating a Data Warehouse 18

Codd’s Key Data Warehouse Rules 19
Separated from operational environment
Integrated data
Historical data over long time horizon
Snapshot data captured at given time
Subject-oriented data
Mainly read-only data with periodic batch updates from operational source, no online updates

Codd’s Key Data Warehouse Rules contd. 20
Contains different levels of data detail
– Current and old detail
– Lightly and highly summarized
Metadata (data about the data) are critical components
– Identify and define data elements
– Provide the source, transformation, integration, storage, usage, relationships, and history of data elements

Decision Support Systems (DSS) 21

DSS Components 22

Decomposition of DSS – Operational Data 23
o Tumor registry
o A/D/T
o Radiology narrative
o Pathology narrative
o Lab results
o Patient Accounting
o Charge Master

Decomposition of DSS – External Data 24
o Research spider
o Treatment guidelines
o Reimbursement schedules
o NCI/NIH protocols

Decomposition of DSS – ETL 25
Rationalize normal lab values
Transform gender codes and free text
Narrative dumps
Doctor cleansing
o Similar names
o Which practice gets credit?

ETL – Extraction, Transformation, Load 26
Extract: data from source systems
Transform: cleanse data for consistency and output exceptions
o Apply business rules
o Select certain columns to load (not null records)
o Translate coded values (1, M, male = 0)
o Derive new calculated value (sale_amount = qty * unit_price)
o Join data from multiple sources (lookup, merge)
o Aggregate (rollup/summarize data – average LOS for each doctor by DRG)
o Transpose/pivot (turning columns into rows)
o Data validation
Load: data into repository
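A few of the transform steps listed for ETL (translating coded values, routing bad records to an exception list, and aggregating average LOS for each doctor by DRG) can be sketched in plain Python. The records, field names, and the GENDER_CODES mapping are assumptions for illustration only. The slide gives just "1, M, male = 0", so the remaining codes are made up.

```python
from collections import defaultdict

# Hypothetical extracted records; field names and values are made up.
extracted = [
    {"doctor": "Armstrong", "drg": "470", "gender": "M", "los_days": 3},
    {"doctor": "Armstrong", "drg": "470", "gender": "male", "los_days": 5},
    {"doctor": "Allen", "drg": "470", "gender": "F", "los_days": 4},
    {"doctor": "Allen", "drg": "470", "gender": "?", "los_days": 2},
    {"doctor": None, "drg": "470", "gender": "F", "los_days": 6},
]

# Assumed target coding: several source spellings map to one code.
GENDER_CODES = {"1": 0, "m": 0, "male": 0, "2": 1, "f": 1, "female": 1}

def transform(records):
    """Cleanse records; return (clean rows, exceptions for review)."""
    clean, exceptions = [], []
    for rec in records:
        code = GENDER_CODES.get(str(rec["gender"]).strip().lower())
        if rec["doctor"] is None or code is None:
            exceptions.append(rec)  # output exceptions instead of loading them
            continue
        clean.append({**rec, "gender": code})
    return clean, exceptions

def average_los(rows):
    """Aggregate: average length of stay for each (doctor, DRG) pair."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of days, count]
    for row in rows:
        key = (row["doctor"], row["drg"])
        totals[key][0] += row["los_days"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

clean_rows, rejected = transform(extracted)
los_by_doctor_drg = average_los(clean_rows)
print(los_by_doctor_drg)
print(len(rejected))
```

Keeping the rejected records rather than silently dropping them gives the team something concrete to review when deciding whether the source system or the transformation rule needs to change.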

ETL Best Practice 27
ETL: 60-80% of development effort
Create a multi-departmental team charged with reaching consensus on Transformation!
Review exceptions carefully
o Indicator of issues with operational db design
o Indicator of changes needed in transformation

Decomposition of DSS – Business Data 28
Business data – central repository
Includes metadata: source, format, timing of feeds
Characteristic | Factors
Integrated | Centralized; holds data retrieved from entire organization
Subject-Oriented | Optimized to give answers to diverse questions; used by all functional areas
Time Variant | Flow of data through time; projected data
Non-Volatile | Data never removed; always growing

Decomposition of DSS – Business Model Data 29
Comprehensive Cancer Center definition of a patient
o Must have seen a physician for suspected or confirmed benign or malignant condition
o What about patients seen for screening mammography?

Decomposition of DSS – End user query tool 30
Web-based or client-server?
OLAP – Online Analytical Processing
o Microsoft
o Business Objects (bought by SAP)
o MicroStrategy
o Cognos (bought by IBM)
o Oracle (includes Hyperion)

Decomposition of DSS – End-user tool 31
o Drill down functionality
o Roll up
o Charts – not data level
o Export features
ajax/chart/examples/functionality/ drilldown/defaultcs.aspx

Design – Star Schema 32

Star Schema 33
Center fact table
o Usually contains numeric information for summary reports
Dimension tables radiate from the fact table
Dimension tables are hierarchical
o ‘Rollup’ allows comparison of types of hospitals, disease categories, or even patient age bands
Creates a logical data cube
o Dimensions identify a set of numeric measurements within the cube

Star Schema 34
Data-modeling technique
Maps multidimensional decision support data into a relational database
Yields a model for multidimensional data analysis while preserving the relational structure of the operational DB
Four components:
– Facts
– Dimensions
– Attributes
– Attribute hierarchies
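The four star schema components can be made concrete with a small example: a fact table holding numeric measures, dimension tables holding attributes, and a roll-up query along the attribute hierarchies. This is a minimal sketch using Python's sqlite3 module. All table names, column names, and sample values (including the specialties assigned to the physicians) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes, ordered as a hierarchy.
    CREATE TABLE dim_time (
        time_id INTEGER PRIMARY KEY,
        year INTEGER, quarter INTEGER, month INTEGER
    );
    CREATE TABLE dim_physician (
        physician_id INTEGER PRIMARY KEY,
        specialty TEXT, physician TEXT
    );
    -- Fact table: numeric measures plus one foreign key per dimension.
    CREATE TABLE fact_encounter (
        time_id INTEGER REFERENCES dim_time(time_id),
        physician_id INTEGER REFERENCES dim_physician(physician_id),
        los_days INTEGER,
        charges REAL
    );
""")
conn.executemany(
    "INSERT INTO dim_time VALUES (?, ?, ?, ?)",
    [(1, 2009, 3, 8), (2, 2009, 3, 9), (3, 2009, 4, 11)],
)
conn.executemany(
    "INSERT INTO dim_physician VALUES (?, ?, ?)",
    [(1, "Orthopedics", "Armstrong"), (2, "Dermatology", "Allen")],
)
conn.executemany(
    "INSERT INTO fact_encounter VALUES (?, ?, ?, ?)",
    [(1, 1, 3, 1200.0), (2, 1, 5, 2000.0), (3, 2, 1, 150.0), (2, 2, 1, 250.0)],
)

# Roll up the facts along two attribute hierarchies: specialty and quarter.
rollup = conn.execute("""
    SELECT p.specialty, t.year, t.quarter,
           COUNT(*) AS encounters, SUM(f.charges) AS total_charges
    FROM fact_encounter AS f
    JOIN dim_time AS t ON t.time_id = f.time_id
    JOIN dim_physician AS p ON p.physician_id = f.physician_id
    GROUP BY p.specialty, t.year, t.quarter
    ORDER BY p.specialty, t.year, t.quarter
""").fetchall()
print(rollup)
```

Changing the GROUP BY columns (for example, from quarter to month, or from specialty to physician) moves the same facts up or down the hierarchy without touching the fact table.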

Simple Star Schema 35

Star Schema 36

Entity Relationship Diagram 37

Analysis 38

Online Analytical Processing (OLAP) 39
Advanced data analysis environment
Supports decision making, business modeling, and operations research activities
Characteristics of OLAP
– Use multidimensional data analysis techniques
– Provide advanced database support
– Provide easy-to-use end-user interfaces
– Support client/server architecture

Healthcare Cube – slice and dice view 40
Cube dimensions: Diagnosis, Time, Physician
Time hierarchy: Strategic Period, Year, Quarter, Month, Week, Day, Shift, Hour
Provider hierarchy: Clinic, Specialty, Group, Physician
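Slicing and rolling up a cube like this one can be sketched with an ordinary dictionary keyed by dimension values. This is only an illustration of the idea, not how an OLAP product stores data. The cell values, the DIMS tuple, and the slice_cube and roll_up helpers are all assumptions.

```python
from collections import defaultdict

# Hypothetical cube cells: (diagnosis, year, quarter, physician) -> visit count.
cells = {
    ("Psoriasis", 2009, 3, "Armstrong"): 4,
    ("Psoriasis", 2009, 4, "Allen"): 6,
    ("Herniated Disc", 2009, 3, "Armstrong"): 3,
    ("Herniated Disc", 2009, 4, "Armstrong"): 2,
}
DIMS = ("diagnosis", "year", "quarter", "physician")

def slice_cube(cube, **fixed):
    """Slice/dice: keep cells whose coordinates match every fixed value."""
    idx = {DIMS.index(name): value for name, value in fixed.items()}
    return {
        key: value
        for key, value in cube.items()
        if all(key[i] == wanted for i, wanted in idx.items())
    }

def roll_up(cube, *keep):
    """Roll up: sum the measure over every dimension not listed in keep."""
    positions = [DIMS.index(name) for name in keep]
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[tuple(key[i] for i in positions)] += value
    return dict(totals)

by_diagnosis = roll_up(cells, "diagnosis")
q3_only = slice_cube(cells, quarter=3)
armstrong_by_quarter = roll_up(
    slice_cube(cells, physician="Armstrong"), "year", "quarter"
)
print(by_diagnosis)
print(armstrong_by_quarter)
```

Drill-down is the reverse of roll_up: keeping more dimensions (for example, adding "quarter" to a by-year view) exposes finer-grained cells.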

Dashboard 41

Scorecard including Key Performance Indicators (KPI) 42
Risk-adjusted mortality index
Risk-adjusted complications index
Risk-adjusted patient safety index
Severity-adjusted average length of stay
Expense per adjusted discharge, case mix- and wage-adjusted