1
Data warehouse
2
Overview
Traditional database systems are tuned for many small, simple queries. Some new applications use fewer, more time-consuming, complex queries. New architectures have been developed to handle complex “analytic” queries efficiently.
3
What is a Data Warehouse?
A single, complete, and consistent store of data obtained from a variety of different sources, made available to end users in a form they can understand and use in a business context.
4
The Data Warehouse
The most common form of data integration.
Copy sources into a single DB (warehouse) and try to keep it up-to-date.
Usual method: periodic reconstruction of the warehouse, perhaps overnight.
Frequently essential for analytic queries.
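The "periodic reconstruction" idea can be sketched in a few lines. This is a minimal illustration, not a production loader: the `sales` table, the store names, and the helper functions `make_source` and `rebuild_warehouse` are all hypothetical, and in-memory SQLite databases stand in for the real sources.

```python
import sqlite3

def make_source(rows):
    """Create a hypothetical in-memory source database with a sales table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (item TEXT, amount REAL)")
    db.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    db.commit()
    return db

def rebuild_warehouse(warehouse, sources):
    """Periodic reconstruction: discard the old copy and reload every source."""
    warehouse.execute("DROP TABLE IF EXISTS sales")
    warehouse.execute("CREATE TABLE sales (source TEXT, item TEXT, amount REAL)")
    for name, db in sources.items():
        rows = db.execute("SELECT item, amount FROM sales").fetchall()
        # Tag each row with the source it came from (assumed convention).
        warehouse.executemany(
            "INSERT INTO sales VALUES (?, ?, ?)",
            [(name, item, amount) for item, amount in rows],
        )
    warehouse.commit()

# Two assumed store databases with made-up rows.
sources = {
    "store_a": make_source([("milk", 2.5), ("bread", 1.75)]),
    "store_b": make_source([("milk", 2.25)]),
}
warehouse = sqlite3.connect(":memory:")
rebuild_warehouse(warehouse, sources)  # e.g. run once per night
row_count = warehouse.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(row_count)
```

Because the warehouse table is dropped and reloaded each time, running the rebuild twice does not duplicate rows; the cost is that the warehouse is only as fresh as the last run.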
5
Data Warehouse
A data warehouse is a subject-oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organizational decision making.
6
Data Warehouse Architecture
[Figure: architecture diagram. Labels: sources (Relational Databases, Legacy Data, Purchased Data, ERP Systems); Extraction; Cleansing; Optimized Loader; Data Warehouse Engine; Metadata Repository; Analyze; Query.]
7
OLTP
Most database operations involve On-Line Transaction Processing (OLTP). Short, simple, frequent queries and/or modifications, each involving a small number of tuples. Examples: answering queries from a Web interface, sales at cash registers, selling airline tickets.
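As a rough illustration of an OLTP-style operation, the sketch below records one cash-register sale as a single short transaction touching two tuples. The `stock` and `sales` tables and the `record_sale` helper are assumptions made for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER)")
db.execute("CREATE TABLE sales (item TEXT, qty INTEGER)")
db.execute("INSERT INTO stock VALUES ('milk', 10)")
db.commit()

def record_sale(db, item, qty):
    """One short transaction: decrement stock and log the sale."""
    with db:  # commits on success, rolls back if an exception is raised
        db.execute("UPDATE stock SET qty = qty - ? WHERE item = ?", (qty, item))
        db.execute("INSERT INTO sales VALUES (?, ?)", (item, qty))

record_sale(db, "milk", 2)
remaining = db.execute("SELECT qty FROM stock WHERE item = 'milk'").fetchone()[0]
print(remaining)
```

Each call reads and writes only a couple of rows, which is the access pattern OLTP systems are tuned for.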
8
OLAP
Of increasing importance are On-Line Analytic Processing (OLAP) queries. Few but complex queries, which may run for hours. Queries do not depend on having an absolutely up-to-date database.
9
OLAP Examples
Amazon analyzes purchases by its customers to come up with an individual screen with products of likely interest to the customer.
Analysts at Wal-Mart look for items with increasing sales in some region.
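A query of the second kind ("items with increasing sales in some region") might look like the sketch below. The `sales` table, its columns, and the sample rows are invented for illustration; a real analysis would scan far more data and more than two periods.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (region TEXT, item TEXT, month INTEGER, amount REAL)")
db.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("east", "umbrella", 1, 100.0),
    ("east", "umbrella", 2, 150.0),
    ("east", "sunscreen", 1, 80.0),
    ("east", "sunscreen", 2, 60.0),
    ("west", "umbrella", 1, 90.0),
    ("west", "umbrella", 2, 70.0),
])

# Aggregate every sale in the region, then keep items whose total for
# month 2 exceeds their total for month 1.
rising = db.execute("""
    SELECT item,
           SUM(CASE WHEN month = 1 THEN amount ELSE 0 END) AS first_month,
           SUM(CASE WHEN month = 2 THEN amount ELSE 0 END) AS second_month
    FROM sales
    WHERE region = ?
    GROUP BY item
    HAVING SUM(CASE WHEN month = 2 THEN amount ELSE 0 END)
         > SUM(CASE WHEN month = 1 THEN amount ELSE 0 END)
    ORDER BY item
""", ("east",)).fetchall()
print(rising)
```

Unlike an OLTP operation, this query reads every matching row in the region and modifies nothing.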
10
Common Architecture
Databases at store branches handle OLTP.
Local store databases are copied to a central warehouse overnight.
Analysts use the warehouse for OLAP.
11
Approaches to Building Warehouses
ROLAP = “relational OLAP”: Tune a relational DBMS to support star schemas.
MOLAP = “multidimensional OLAP”: Use a specialized DBMS with a model such as the “data cube.”
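To make the two approaches concrete, here is a small sketch under stated assumptions. The first half models a star schema (one fact table joined to two dimension tables) in SQLite; the second half precomputes a tiny "data cube" in a Python dictionary, using `"*"` to mean "all values" along a dimension. All table names, column names, and rows are hypothetical, and a dictionary is only a stand-in for what a specialized MOLAP engine would store.

```python
import sqlite3
from collections import defaultdict

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE dim_item  (item_id  INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales (store_id INTEGER, item_id INTEGER, amount REAL);
    INSERT INTO dim_store VALUES (1, 'Pune'), (2, 'Mumbai');
    INSERT INTO dim_item  VALUES (10, 'food'), (11, 'toys');
    INSERT INTO fact_sales VALUES (1, 10, 5.0), (1, 11, 7.0),
                                  (2, 10, 3.0), (2, 10, 4.0);
""")

# ROLAP style: join the fact table to its dimensions and aggregate.
by_city_category = db.execute("""
    SELECT s.city, i.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_store s ON s.store_id = f.store_id
    JOIN dim_item  i ON i.item_id  = f.item_id
    GROUP BY s.city, i.category
    ORDER BY s.city, i.category
""").fetchall()

# MOLAP style: precompute every roll-up so lookups need no scan.
cube = defaultdict(float)
for city, category, total in by_city_category:
    for key in [(city, category), (city, "*"), ("*", category), ("*", "*")]:
        cube[key] += total

print(by_city_category)
print(cube[("Pune", "*")], cube[("*", "food")], cube[("*", "*")])
```

The trade-off shown here is the usual one: the relational version computes aggregates on demand, while the cube spends storage up front so that any combination of dimensions is a single lookup.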
12
OLTP vs. Data Warehouse
OLTP systems are tuned for known transactions and workloads, whereas the workload of a data warehouse is not known a priori.
Special data organization, access methods, and implementation methods are needed to support data warehouse queries, which are typically multidimensional.
Example: the average amount spent on phone calls between 9 AM and 5 PM in Pune during the month of December.
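The phone-call example constrains three dimensions at once (place, time of day, and month). A minimal sketch of that query follows; the `calls` table, its columns, the timestamp format, and the sample rows are assumptions for illustration only.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE calls (city TEXT, started_at TEXT, amount REAL)")
db.executemany("INSERT INTO calls VALUES (?, ?, ?)", [
    ("Pune",   "2023-12-04 10:15:00", 4.0),
    ("Pune",   "2023-12-11 16:30:00", 6.0),
    ("Pune",   "2023-12-11 20:05:00", 9.0),  # outside 9 AM to 5 PM
    ("Pune",   "2023-11-20 11:00:00", 3.0),  # not December
    ("Mumbai", "2023-12-05 12:00:00", 8.0),  # not Pune
])

# Filter on city, month, and time of day, then average what is left.
avg_amount = db.execute("""
    SELECT AVG(amount)
    FROM calls
    WHERE city = 'Pune'
      AND strftime('%m', started_at) = '12'
      AND time(started_at) >= '09:00:00'
      AND time(started_at) <  '17:00:00'
""").fetchone()[0]
print(avg_amount)
```

On a plain table like this one, the query has to examine every row; that is the kind of access pattern for which a warehouse would use special organization such as a star schema or a cube.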
13
OLTP vs. Data Warehouse
OLTP                      Warehouse (DSS)
Application oriented      Subject oriented
Used to run business      Used to analyze business
Detailed data             Summarized and refined
Current, up to date       Snapshot data
Isolated data             Integrated data
Repetitive access         Ad-hoc access
Clerical user             Knowledge user (manager)
14
OLTP vs. Data Warehouse
OLTP                                     Data Warehouse
Performance sensitive                    Performance relaxed
Few records accessed at a time (tens)    Large volumes accessed at a time (millions)
Read/update access                       Mostly read (batch update)
No data redundancy                       Redundancy present
Database size: MB to 100 GB              Database size: GB to a few terabytes
15
OLTP vs. Data Warehouse
OLTP                                                Data Warehouse
Transaction throughput is the performance metric    Query throughput is the performance metric
Thousands of users                                  Hundreds of users
Managed in entirety                                 Managed by subsets