Download presentation
Presentation is loading. Please wait.
Published byRoderick Matthews Modified over 9 years ago
1
Data Warehousing & OLAP Nuosang Du Jon B. Arnason CSCI 5707 November 19, 2013
2
1 st affect The on-line analytical processing has some advantages such as flexible analysis function, intuitive data manipulation and the visualization to analysis results which makes easy and efficient to the analysis based on a complex large number of data. Users can make the right judgments quickly. It can be used to confirm the proposed complex hypothesis, the result is in the form of graphics or tables. It doesn’t mark the abnormal information. OLAP should be based on a large amount of historical data on different time points and the complex analysis of multidimensional and integrated information. OLAP requires that the user has subjective information requirements definition, so the system efficiency is better. 2 nd origin Is the concept of on-line analytical processing (OLAP), as the farther of the relational database E.F. Codd proposed in 1993, he also put forward 12 rules of OLAP, which caused great repercussions. Rule 1:OLAP model must provide multidimensional conceptual view Rule 2:Transparency Rule 3:Access ability Rule 4:Stable report ability Rule 5:Client/server architecture Rule 6:Equal dimension Rule 7:Dynamic sparse matrix processing Rule 8:Multi-user support ability Rule 9:Unlimited operating across dimensions Rule 10: Direct manipulation of data
3
Rule 11: Flexible report generation Rule 12:Unlimited dimension and gathering Time_ID Workday Week_No Sales ID Region_ID Time_ID Product_ID Customer_ID Revenue Quantity_Sold Pro_ID PName J Week_No Jjjjj jjjjj Region_ID Reg_Des City_ID J City_ID Cname Pro_ID J Provience City Sales Region Fact Table Day Cus_ID Name CCat_ID J Week CCat_ID CCat_Des J PC_ID PC_Des J PSC_ID PSC_Des PSC_ID PC_ID J Prod_ID B_ID PSC_ID J II_ID Des J Brand Customer Category CustomerProduct Sub_Category Product Category Snow-flake Schema Geography Dimension Customer Dimension Time Dimension Product Dimension
4
3 rd Category Today’s data processing can be divided into two categories: online transaction processing (OLTP), on-line Analytical processing (OLAP). OLTP is traditional relational database application for basic and daily transaction processing, such as bank transactions. OLAP is the main application of data warehouse system which supporting complex analysis operation, focusing on the decision supporting, it provides straightforward query results. The following table lists the comparison between OLTP and OLAP. OLTP OLAP uUUsUSe operators, lower-level managers decision makers, high-lever managers daily operation processing analysis and decision database design application-orientedtopic-oriented current, the latest details, two- dimensional, discrete historic, gathered, multi- dimensional, integrated and unified access read/write dozens of re cords read millions of records work for simple transaction complex query size of database100MB-GB100GB-TB user function database design data access work for size of database
5
Customer_ID Customer Name Customer Category 4th Development background (You can find them on the internet easily.) 5 th Function In general, data warehouse system is a comprehensive enterprise database which can carry on the fast and accurate analysis to a large number of data for making better business decisions. It consists of three parts:
6
1. The data layer: the implementation of enterprise operating data extraction, transformation, cleaning and summary, forming the information, and storing in the database. 2. The application layer: through the on-line analytical processing, and even data mining application processing, realize the analysis of the data. 3. The presentation layer: through the front desk analysis tool, the query statements, statistical analysis, and the conclusion of multi-dimensional online analysis and data mining, it can be shown in front of the user. 6 Concepts OLAP is shown in front of the user its multidimensional view. D (Dimension): is the observation data of a particular perspective, it can be considered as a class attribute, the attribute sets from a D (time Dimension, etc.). D Level (Level): is the observation data of a particular perspective (i.e., a D), can also be different in every detail description field (time D: date, month, quarter). D (Member): a dimension value is the description of the data item in one dimensional position. (“yy/mm/dd” is a description of the position on the time dimension). M (Measure): values of multidimensional array. (Sep 2013, Minneapolis, 5707, database principle). Operation of OLAP multidimensional analysis have Drill (Drill up and Drill down), Slice, Dice and Pivot, etc. Drill: is to change the dimensional hierarchy and the analysis granularity. It includes the drill down and drill up. Drill up is the low level of detail data will be summarized to a high level of summary data, or reducing dimension; and drill down, on the other hand, transforms from the summary data into the detail data to observe or add new dimensions.
7
Slice and dice: Given selected values in a part of D, we are concerned with measurement data in the distribution of the remaining dimensions. If the rest only has two dimensions it is called sliced; if it has three or more, it is called cut. Pivot: The transformation of dimension in the direction, reschedule the placement of D in the tables (for example, the swap of columns and rows). 7 th Architecture The architecture of a data warehouse and OLAP are a complementary relationship. The modern OLAP system is based on data warehousing, in other words, generally drawn for a subset of the detailed data from the data warehouse and after the necessary gathering of the storage in the OLAP memory for the reading of the front-end analysis tools. A typical OLAP system architecture as shown below: OLAP system according to the data storage format can be divided into relational-OLAP(ROLAP), multidimensional OLAP(MOLAP) and hybrid-OLAP(HOLAP) three types. ROLAP MOLAP R: Use existing relational database technology M: Especially designed for the on-line analytical processing R: The response speed is slower than the MOLAP; M: Existing relational database for OLAP already do a lot of optimization, including parallel storage, parallel query, parallel data management, query optimization based on the cost, the bitmap index, resulting an improved performance. R: Data loading speed is fast M: Data loading speed is slow
8
R: Storage cost is small, there is no limit to the dimension M: Need to calculate, may lead to data explosion, dimension is limited; failing to support the dynamic change of dimension R: Use RDBMS stored data, there is no file size limit M: The file size be limited in the operating system platform (only 10~20 g) R: Can be realized through SQL detailed data and summary data storage M: Lacking the standards of data model and data access The realization of the on-line analytical processing has three different methods: 1. ROLAP 2. MOLAP 3. Front-end display, on-line analytical processing (Desktop OLAP) Among these, Desktop OLAP needs to download all the data to the client, and then take report format/data structure reorganization on the client. The user can realize the dynamic analysis in the native. The method is more flexible, but it can support the very limited amount of data which seriously affect the scope and efficiency of use. So it has been eliminated now. You can find some examples below: Hyperion Oracle Cognos MicroStrategy IBM Brio
9
Widget: Silverlight Winforms Based on a CUBE calculation and analysis.
10
Bibliography Mailvaganam, Hari. “Introduction to OLAP”. www.dwreview.com/OLAP/Introduction_OLAP.htmlwww.dwreview.com/OLAP/Introduction_OLAP.html Chaudhuri, Surajit; Dayal, Umeshwar; Narasayya, Vivek. “An Overview of Business Intelligence”. Communications Of The ACM, Vol. 54, No. 8, August 2011.
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.