Published by Hannah Armstrong. Modified over 9 years ago.
1
Data Warehousing
Alex Ostrovsky
CS157B, Spring 2007
2
Introduction
► A data warehouse is the main repository of an organization's corporate data
► Multiple databases may be employed, each for a specific purpose
► Contains raw events and unprocessed data, although separate tables may hold processed information that presents the data in a meaningful form
3
What is it used for?
► Data analysis
► Data mining
► Complex queries with multi-table joins
► Forecasting
► Historical reporting
► OLAP (Online Analytical Processing)
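Several of these uses (multi-table joins, historical reporting, OLAP) come together in a typical warehouse report. The sketch below assumes a small hypothetical star schema (a `fact_sales` table plus `dim_date` and `dim_store` dimension tables, none of which come from the slides) held in an in-memory SQLite database, and computes total sales per year and region.

```python
import sqlite3

# Hypothetical star schema: one fact table and two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date   (date_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
CREATE TABLE dim_store  (store_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE fact_sales (date_id INTEGER, store_id INTEGER, amount REAL);
INSERT INTO dim_date  VALUES (1, 2006, 4), (2, 2007, 1), (3, 2007, 2);
INSERT INTO dim_store VALUES (10, 'West'), (20, 'East');
INSERT INTO fact_sales VALUES
    (1, 10, 100.0), (1, 20, 50.0),
    (2, 10, 120.0), (2, 20, 80.0),
    (3, 10, 90.0);
""")

# Historical report: join the fact table to both dimensions, then aggregate.
REPORT_SQL = """
SELECT d.year, s.region, SUM(f.amount) AS total
FROM fact_sales AS f
JOIN dim_date  AS d ON d.date_id  = f.date_id
JOIN dim_store AS s ON s.store_id = f.store_id
GROUP BY d.year, s.region
ORDER BY d.year, s.region
"""

rows = conn.execute(REPORT_SQL).fetchall()
for year, region, total in rows:
    print(year, region, total)
```

Keeping measures in one fact table and descriptive attributes in dimension tables is one common warehouse layout; the slides do not prescribe a specific schema.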
4
High level view
5
Key Concepts and Features
► Data does not need to be heavily normalized
► Transaction processing is done mostly offline, so processing time is usually not critical, although this depends on the amount of data, the degree of normalization, query complexity, and application requirements
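To illustrate what "not heavily normalized" can mean in practice, here is a minimal sketch with hypothetical `customer` and `city` tables. It copies city attributes into every row of a warehouse table so reports need no join, trading redundant storage for simpler, faster reads.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized operational tables (hypothetical).
CREATE TABLE city     (city_id INTEGER PRIMARY KEY, name TEXT, state TEXT);
CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, city_id INTEGER);
INSERT INTO city     VALUES (1, 'San Jose', 'CA'), (2, 'Austin', 'TX');
INSERT INTO customer VALUES (100, 'Ann', 1), (101, 'Bob', 2), (102, 'Cy', 1);

-- Denormalized warehouse table: city name and state are repeated per customer.
CREATE TABLE wh_customer AS
SELECT c.cust_id, c.name AS customer, ci.name AS city, ci.state
FROM customer AS c JOIN city AS ci ON ci.city_id = c.city_id;
""")

# Reports now read a single table without any join.
rows = conn.execute(
    "SELECT state, COUNT(*) FROM wh_customer GROUP BY state ORDER BY state"
).fetchall()
print(rows)
```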
6
Key Concepts and Features (cont.)
► Unlike data in regular real-time OLTP databases, warehouse data is subject-oriented
► Non-volatile: data is essentially stored forever, without being pruned or deleted
► Heavily integrated: contains data from most of the organization's applications
► Time-variant: most of the data carries a time reference so that reports can be produced over time
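The non-volatile and time-variant properties can be sketched together as an append-only history in which a change adds a new timestamped row instead of overwriting the old one. The `PriceHistory` class and its fields are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class PriceRecord:
    product: str
    price: float
    valid_from: date  # every row carries a time reference (time-variant)


class PriceHistory:
    """Append-only store: a change adds a new row; old rows are never updated or deleted."""

    def __init__(self) -> None:
        self._rows: List[PriceRecord] = []

    def record(self, product: str, price: float, valid_from: date) -> None:
        self._rows.append(PriceRecord(product, price, valid_from))

    def as_of(self, product: str, when: date) -> Optional[float]:
        """Return the price in effect on `when`, or None if no row applies yet."""
        candidates = [r for r in self._rows
                      if r.product == product and r.valid_from <= when]
        if not candidates:
            return None
        return max(candidates, key=lambda r: r.valid_from).price


history = PriceHistory()
history.record("widget", 10.0, date(2006, 1, 1))
history.record("widget", 12.0, date(2007, 1, 1))  # price change: a new row, not an UPDATE
print(history.as_of("widget", date(2006, 6, 1)))  # 10.0
print(history.as_of("widget", date(2007, 3, 1)))  # 12.0
```

Because nothing is deleted, past reports can always be reproduced by querying "as of" the original date.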
7
Types of data warehousing DBs
► Offline operational database: similar to regular data replication; used to minimize the impact of queries on the running primary operational system
► Offline data warehouse: heavily integrated, reporting-oriented warehouse databases that are updated with data from the operational databases at regular time intervals
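A periodic load into an offline data warehouse can be sketched as an incremental batch job. The sketch below assumes hypothetical `orders` and `dw_orders` tables, uses two in-memory SQLite databases to stand in for the operational system and the warehouse, and assumes `order_id` only ever increases so that it can serve as a high-water mark. Real loads usually also transform and clean the data.

```python
import sqlite3


def load_increment(oltp: sqlite3.Connection, dw: sqlite3.Connection) -> int:
    """Copy orders newer than the warehouse's high-water mark; return rows loaded."""
    (last_id,) = dw.execute("SELECT COALESCE(MAX(order_id), 0) FROM dw_orders").fetchone()
    new_rows = oltp.execute(
        "SELECT order_id, amount FROM orders WHERE order_id > ? ORDER BY order_id",
        (last_id,),
    ).fetchall()
    dw.executemany("INSERT INTO dw_orders (order_id, amount) VALUES (?, ?)", new_rows)
    dw.commit()
    return len(new_rows)


oltp = sqlite3.connect(":memory:")  # stands in for the operational database
dw = sqlite3.connect(":memory:")    # stands in for the warehouse
oltp.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL)")
dw.execute("CREATE TABLE dw_orders (order_id INTEGER PRIMARY KEY, amount REAL)")

oltp.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 20.0)])
oltp.commit()
print(load_increment(oltp, dw))  # first scheduled run loads 2 rows

oltp.execute("INSERT INTO orders VALUES (3, 30.0)")
oltp.commit()
print(load_increment(oltp, dw))  # next run loads only the 1 new row
```

In practice the job would be triggered by a scheduler at the chosen interval; between runs the warehouse is intentionally stale.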
8
Types of data warehousing DBs (cont.)
► Real-time data warehouse: data is updated as soon as a transaction happens
► Integrated data warehouse: the database is integrated with the primary operational system for immediate decision making and reporting
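By contrast with a batch load, a real-time warehouse updates its data as each transaction occurs. The toy class below is a hypothetical illustration of that idea only: the reporting aggregate is updated in the same call that records the sale, so no scheduled load is needed.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple


class RealTimeWarehouse:
    """Toy model: each sale updates the reporting aggregate immediately."""

    def __init__(self) -> None:
        self.transactions: List[Tuple[str, float]] = []  # operational record
        self.sales_by_region: DefaultDict[str, float] = defaultdict(float)  # warehouse aggregate

    def record_sale(self, region: str, amount: float) -> None:
        self.transactions.append((region, amount))
        self.sales_by_region[region] += amount  # visible to reports right away


wh = RealTimeWarehouse()
wh.record_sale("West", 100.0)
wh.record_sale("West", 25.0)
wh.record_sale("East", 40.0)
print(wh.sales_by_region["West"])  # 125.0
```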
9
Benefits of Data Warehousing
► No need to stress the operational database with complex queries
► Separation of processing and business logic
► Very flexible: multiple distinct relations can be defined over the same set of data
► Can be customer- or object-specific
► Persistent: once a result is computed from the raw events, it does not need to be recomputed, giving faster response times on subsequent queries
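The "persistent" benefit amounts to storing a computed result so later queries read it instead of re-aggregating raw events. This is a minimal sketch with hypothetical `raw_events` and `customer_totals` tables. It assumes the summary is rebuilt when new events arrive, which this sketch does not do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_events (customer TEXT, amount REAL);
INSERT INTO raw_events VALUES ('ann', 5.0), ('ann', 7.0), ('bob', 3.0);

-- Computed once from the raw events and stored as its own table.
CREATE TABLE customer_totals AS
SELECT customer, SUM(amount) AS total FROM raw_events GROUP BY customer;
""")


def customer_total(name):
    """Read the precomputed total; no aggregation over raw_events happens here."""
    row = conn.execute(
        "SELECT total FROM customer_totals WHERE customer = ?", (name,)
    ).fetchone()
    return row[0] if row else None


print(customer_total("ann"))  # 12.0
```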
10
Dangers of Data Warehousing
► Heavy processing requires physically separate database machines for warehousing and OLTP
► Must be optimized for novice users; complex queries might take a very long time
► Much more complex multidimensional design than regular relational databases
► Errors in computational logic can cause serious financial losses and force costly recalculations
► Data representation
► Data migration is relatively difficult
11
Database Design
► Data warehousing databases mostly use a complex multidimensional design
► Relationships must be meaningful and represent clear patterns and trends in the unprocessed data. The more data and relationships you have, the more dimensions the database will have.
► Information is viewed at one common position along the dimensions, which can be thought of as the intersection of a few planes.
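The "intersection of planes" picture can be made concrete with a tiny cube stored as a dictionary keyed by (product, region, quarter). The dimensions, values, and helper names are hypothetical. Fixing one dimension selects a plane (a slice). Fixing two leaves the cells where those planes intersect. Summing away a dimension is a roll-up.

```python
from typing import Dict, Tuple

Cell = Tuple[str, ...]  # one coordinate per dimension: (product, region, quarter)

cube: Dict[Cell, float] = {
    ("widget", "West", "Q1"): 100.0,
    ("widget", "East", "Q1"): 50.0,
    ("widget", "West", "Q2"): 120.0,
    ("gadget", "West", "Q1"): 30.0,
}


def slice_cube(data: Dict[Cell, float], dim: int, value: str) -> Dict[Cell, float]:
    """Fix one dimension to a value, keeping only the cells in that plane."""
    return {cell: v for cell, v in data.items() if cell[dim] == value}


def roll_up(data: Dict[Cell, float], dim: int) -> Dict[Cell, float]:
    """Remove one dimension by summing over all of its values."""
    out: Dict[Cell, float] = {}
    for cell, v in data.items():
        key = cell[:dim] + cell[dim + 1:]
        out[key] = out.get(key, 0.0) + v
    return out


west_q1 = slice_cube(slice_cube(cube, 1, "West"), 2, "Q1")  # region plane ∩ quarter plane
by_region_quarter = roll_up(cube, 0)                        # sum across products
print(west_q1)
print(by_region_quarter)
```

A production OLAP engine stores and indexes cubes very differently; the dictionary only illustrates how a position along several dimensions identifies a cell.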
12
OLAP Market