DATABASE SYSTEMS
DATA WAREHOUSING I
Asma Ahmad
29th April, 2011
THE EVOLUTION OF DATA WAREHOUSING
Since the 1970s, organizations have gained competitive advantage through systems that automate business processes to offer more efficient and cost-effective services to the customer.
This resulted in the accumulation of growing amounts of data in operational databases.
THE EVOLUTION OF DATA WAREHOUSING
Organizations now focus on ways to use operational data to support decision-making, as a means of gaining competitive advantage.
However, operational systems were never designed to support such business activities.
Businesses typically have numerous operational systems with overlapping and sometimes contradictory definitions.
THE EVOLUTION OF DATA WAREHOUSING
Organizations need to turn their archives of data into a source of knowledge, so that a single integrated, consolidated view of the organization's data is presented to the user.
The data warehouse was seen as the solution: a system capable of supporting decision-making that receives data from multiple operational data sources.
WHAT IS A DATA WAREHOUSE?
A complete repository of historical corporate data, extracted from transaction systems, that is available for ad-hoc access by knowledge workers.
WHAT IS A DATA WAREHOUSE?
Transaction system
◦ Management Information System (MIS).
◦ Could be typed sheets (NOT a transaction system).
Ad-hoc access
◦ Does not have a fixed access pattern.
◦ Queries are not known in advance.
◦ Difficult to write SQL in advance.
Knowledge workers
◦ Typically NOT IT literate (executives, analysts, managers).
◦ NOT clerical workers.
◦ Decision makers.
ANOTHER VIEW OF A DWH
DATA WAREHOUSING CONCEPTS
A subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decision-making process (Inmon, 1993).
SUBJECT-ORIENTED DATA
The warehouse is organized around the major subjects of the enterprise (e.g. customers, products, sales) rather than the major application areas (e.g. customer invoicing, stock control, product sales).
This reflects the need to store decision-support data rather than application-oriented data.
INTEGRATED DATA
The data warehouse integrates corporate application-oriented data from different source systems, which often hold inconsistent data.
The integrated data must be made consistent so that users are presented with a unified view of the data.
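As a minimal sketch of what "made consistent" can mean in practice, the Python below maps differently encoded values from two source systems onto a single warehouse convention. The source names (`billing`, `crm`), the `gender` field, and the encodings are hypothetical, chosen only for illustration.

```python
# Hypothetical encodings used by two source systems (illustrative assumptions).
GENDER_MAPS = {
    "billing": {"M": "male", "F": "female"},
    "crm": {"1": "male", "0": "female"},
}

def integrate(source, record):
    """Return a copy of `record` with gender in the warehouse's single convention."""
    mapping = GENDER_MAPS[source]
    unified = dict(record)
    # Normalise the raw value to an upper-case string before looking it up.
    key = str(record["gender"]).strip().upper()
    unified["gender"] = mapping.get(key, "unknown")
    # Keep track of where the row came from.
    unified["source_system"] = source
    return unified

# The same customer, described differently by each source system.
rows = [
    integrate("billing", {"customer_id": 17, "gender": "m"}),
    integrate("crm", {"customer_id": 17, "gender": 1}),
]
```

After integration both rows carry the same `gender` value, so a query against the warehouse does not need to know how each source system encoded it.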
TIME-VARIANT DATA
Data in the warehouse is only accurate and valid at some point in time or over some time interval.
Time-variance is also shown in the extended time that data is held, the implicit or explicit association of time with all data, and the fact that the data represents a series of snapshots.
NON-VOLATILE DATA
Data in the data warehouse is not updated in real time but is refreshed from the operational systems on a regular basis.
New data is always added as a supplement to the database, rather than as a replacement.
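The difference between replacing and supplementing can be sketched with Python's built-in `sqlite3` module. The table names (`oltp_account`, `dwh_account_snapshot`), the columns, and the dates are assumptions made for this illustration; a real warehouse load would be considerably more involved.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Operational table: one row per account, overwritten as the balance changes.
con.execute("CREATE TABLE oltp_account (account_id INTEGER PRIMARY KEY, balance REAL)")
# Warehouse table: one row per account per refresh, never overwritten.
con.execute(
    "CREATE TABLE dwh_account_snapshot "
    "(account_id INTEGER, snapshot_date TEXT, balance REAL)"
)
con.execute("INSERT INTO oltp_account VALUES (1, 100.0)")

def refresh_warehouse(snapshot_date):
    """Periodic load: append the current operational state as a dated snapshot."""
    con.execute(
        "INSERT INTO dwh_account_snapshot "
        "SELECT account_id, ?, balance FROM oltp_account",
        (snapshot_date,),
    )

refresh_warehouse("2011-03-31")
# The operational system replaces the old balance in place.
con.execute("UPDATE oltp_account SET balance = 250.0 WHERE account_id = 1")
refresh_warehouse("2011-04-30")

# The warehouse still holds both snapshots; the operational table holds only the latest.
history = con.execute(
    "SELECT snapshot_date, balance FROM dwh_account_snapshot ORDER BY snapshot_date"
).fetchall()
current = con.execute("SELECT balance FROM oltp_account").fetchall()
```

Here `history` holds one row per refresh, while `current` holds only the latest balance, which is also the series-of-snapshots idea behind time-variance.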
BENEFITS OF DATA WAREHOUSING
Potential high returns on investment
Competitive advantage
Increased productivity of corporate decision-makers
CAUTION!
A Warehouse of Data is NOT a Data Warehouse
CAUTION!
Size is NOT Everything
WHY A DATA WAREHOUSE?
Businesses demand Business Intelligence (BI).
Complex questions answered from integrated data.
"Intelligent Enterprise"
WHY A DATA WAREHOUSE?
Businesses want much more…
Stages of a data warehouse:
◦ What happened?
◦ Why did it happen?
◦ What will happen?
◦ What is happening?
◦ What do you want to happen?
HOW IS IT DIFFERENT?
Combines operational and historical data.
◦ Data is not entered directly into a DWH; OLTP or ERP systems are the source systems.
◦ OLTP systems don't keep history; you can't get a balance statement more than a year old.
◦ A DWH keeps historical data, even of bygone customers. Why?
◦ In the context of a bank, we want to know why the customer left. What were the events that led to his/her leaving?
◦ Why? Customer retention.
HOW MUCH HISTORY?
Depends on:
◦ Industry.
◦ Cost of storing historical data.
◦ Economic value of historical data.
Industries and history
◦ Telecom: calls are far more numerous than bank transactions, so 18 months.
◦ Retailers are interested in analyzing yearly seasonal patterns: 65 weeks.
◦ Insurance companies want to do actuarial analysis, using historical data to predict risk: 7 years.
Hence, NOT a complete repository of data.
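The retention periods quoted above can be turned into a cutoff date, before which data would no longer be kept in the warehouse. The sketch below is only an illustration: the dictionary keys and the function names `subtract_months` and `retention_cutoff` are hypothetical, and treating 7 years as 84 calendar months is an assumption.

```python
import calendar
from datetime import date, timedelta

# Retention periods from the slide, as (unit, amount).
RETENTION = {
    "telecom": ("months", 18),
    "retail": ("weeks", 65),
    "insurance": ("months", 7 * 12),  # assumption: 7 years = 84 calendar months
}

def subtract_months(d, months):
    """Step back a whole number of calendar months, clamping the day if needed."""
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    # Clamp, e.g. 31 March minus one month becomes the last day of February.
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def retention_cutoff(industry, today):
    """Oldest date still inside the retention window for the given industry."""
    unit, amount = RETENTION[industry]
    if unit == "weeks":
        return today - timedelta(weeks=amount)
    return subtract_months(today, amount)

cutoff = retention_cutoff("telecom", date(2011, 4, 29))
```

With the lecture date as `today`, the telecom cutoff falls in late October 2009; rows older than the cutoff are the ones whose storage cost is judged to outweigh their economic value.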
HOW MUCH HISTORY?
Economic value of data vs. storage cost
Is a data warehouse a complete repository of data?
COMPARISON
OLTP SYSTEMS                    | DATA WAREHOUSING SYSTEMS
Holds current data              | Holds historical data
Stores detailed data            | Stores detailed and summarised data
Data is dynamic                 | Data is generally static
Repetitive processing           | Ad hoc, unstructured, heuristic processing
High volume of transactions     | Medium to low volume of transactions
Predictable pattern of usage    | Unpredictable pattern of usage
Transaction-driven              | Analysis-driven
Application-oriented            | Subject-oriented
Supports day-to-day decisions   | Supports strategic decisions
Serves a large number of users  | Serves a low number of managerial users
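To make the contrast in access patterns concrete, the sketch below runs both styles of query using Python's built-in `sqlite3` module. The `sales` table, its columns, and the sample rows are invented for the illustration, and a single table stands in for both kinds of system purely to keep the example short.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE sales "
    "(sale_id INTEGER PRIMARY KEY, product TEXT, sale_year INTEGER, amount REAL)"
)
con.executemany(
    "INSERT INTO sales (product, sale_year, amount) VALUES (?, ?, ?)",
    [("tea", 2009, 10.0), ("tea", 2010, 15.0),
     ("coffee", 2010, 20.0), ("coffee", 2010, 5.0)],
)

# OLTP style: a predictable, repetitive lookup of one detailed row by its key.
one_sale = con.execute(
    "SELECT amount FROM sales WHERE sale_id = ?", (3,)
).fetchone()

# Warehouse style: an ad hoc question that summarises detailed data across history.
yearly = con.execute(
    "SELECT sale_year, product, SUM(amount) FROM sales "
    "GROUP BY sale_year, product ORDER BY sale_year, product"
).fetchall()
```

The first query can be written once and reused by an application. The second is typical of a question an analyst might only think of later, which is why warehouse usage is harder to predict.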
TYPICAL APPLICATIONS
The impact on an organization's core business is to streamline operations and maximize profitability.
◦ Fraud detection.
◦ Profitability analysis.
◦ Direct mail/database marketing.
◦ Credit risk prediction.
◦ Customer retention modeling.
◦ Yield management.
◦ Inventory management.
The ROI on any one of these applications can justify hardware, software and consultancy costs in most organizations.