Published by Juliet Bradford. Modified over 8 years ago.
1
Data Warehousing HOWTO ● What is a data warehouse? ● The organisational imperative ● How to build a data warehouse – Evan Leybourn – Director – Looking Glass Solutions – Not a sales pitch! (Talk to me afterwards for that)
2
Data Warehousing? ● Repository of Organisational Information ● Data from disparate sources is stored for – Reporting – Decision making – Business Intelligence
3
Warehouse Components ● Analysis and Reverse Engineering ● Design of the Consolidation DB and Data Marts ● Extraction and Transformation of the data ● Business level reporting on the data
4
Data ● Historical record of all transactions ● Not a transactional system ● Turns data into information
5
What's Available ● Minimal footprint in the FOSS space ● Commercial offerings: Business Objects and Oracle
6
Process 1: Analysis
7
Data vs Information ● Data: the raw content of the data warehouse ● Information: the (delicious) output produced when an intelligence tool processes that data
8
Organisation Requirements ● Meaningful reporting ● Data Integrity checking ● Data ownership
10
Data Analysis ● Understand your data sources ● Understand the relationships within each database ● Understand the content, both public and private
11
Data Access ● Database schema plus a direct or ODBC connection – Best option ● Reverse engineering from an ODBC connection – Acceptable but time-consuming ● Reverse engineering from a database dump – Sometimes your only choice; can be slow
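Reverse engineering from a live connection mostly means reading the source database's own catalogue: its tables, columns and relationships. A minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for an ODBC connection (against PostgreSQL you would query information_schema instead); reverse_engineer is a hypothetical helper, not part of Golf or of any ODBC API.

```python
import sqlite3


def reverse_engineer(conn: sqlite3.Connection) -> dict:
    """Map each user table to its columns and declared foreign keys.

    Hypothetical helper: sqlite3 stands in for an ODBC source here.
    """
    schema = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [(r[1], r[2]) for r in conn.execute(f'PRAGMA table_info("{table}")')]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        fks = [(r[3], r[2], r[4]) for r in conn.execute(f'PRAGMA foreign_key_list("{table}")')]
        schema[table] = {"columns": columns, "foreign_keys": fks}
    return schema


# Illustrative source database with a single relationship.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id),
                         total REAL);
""")
for table, info in reverse_engineer(conn).items():
    print(table, info)
```

The output is the raw material for a data dictionary: one entry per table, listing its columns and the tables it points at.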
12
Open Warehouse Project ● Currently titled 'Golf' ● Analysis function ● Automated reverse engineering tools ● Inbuilt data dictionary system
13
Process 2: Design
14
Databases ● A data warehouse is made up of numerous databases ● Why PostgreSQL? – Open source – Scales to enormous data sets – Complies with the SQL standard – Supports triggers and functions (PL/Perl, PL/Python) – Excellent indexing
15
Consolidation Database ● A database that contains all data from all sources ● Should contain historical information ● Denormalised schema
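One common way to keep history in a denormalised consolidation database is to append every load as a new row version stamped with its load time, rather than updating rows in place, and to copy related attributes (here the customer name) onto each row. A minimal sketch with Python's sqlite3; the table, columns and sample rows are hypothetical and only illustrate the idea.

```python
import sqlite3
from typing import Iterable, List, Mapping, Tuple

# Hypothetical denormalised consolidation table: customer attributes are copied
# onto every order row, and each load appends a new version instead of updating.
SCHEMA = """
CREATE TABLE consolidated_orders (
    source        TEXT NOT NULL,  -- source system the row came from
    order_id      INTEGER NOT NULL,
    customer_id   INTEGER NOT NULL,
    customer_name TEXT,           -- denormalised: repeated on every order
    total         REAL,
    loaded_at     TEXT NOT NULL   -- when this version was loaded
)
"""


def load(conn: sqlite3.Connection, source: str,
         rows: Iterable[Mapping], loaded_at: str) -> None:
    """Append one source's rows; earlier versions stay in the table as history."""
    conn.executemany(
        "INSERT INTO consolidated_orders VALUES (?, ?, ?, ?, ?, ?)",
        [(source, r["order_id"], r["customer_id"], r["customer_name"], r["total"], loaded_at)
         for r in rows],
    )


def history(conn: sqlite3.Connection, source: str, order_id: int) -> List[Tuple]:
    """Return every stored version of one order, oldest first."""
    return conn.execute(
        "SELECT total, loaded_at FROM consolidated_orders "
        "WHERE source = ? AND order_id = ? ORDER BY loaded_at",
        (source, order_id),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# Two illustrative loads of the same order from a hypothetical "crm" source.
load(conn, "crm", [{"order_id": 1, "customer_id": 7, "customer_name": "Acme", "total": 100.0}], "2024-01-01")
load(conn, "crm", [{"order_id": 1, "customer_id": 7, "customer_name": "Acme", "total": 120.0}], "2024-02-01")
print(history(conn, "crm", 1))  # → [(100.0, '2024-01-01'), (120.0, '2024-02-01')]
```

The cost of this design is storage and duplicated attributes; the benefit is that any past state can be reconstructed with a plain query.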
16
Schema
17
Golf ● Predefined PL/Perl triggers ● Check and insert incremental data (inserts and updates) ● Fill timestamp fields ● Enforce pseudo-foreign keys
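Golf's triggers are PL/Perl functions for PostgreSQL and their code is not shown in the talk. As a rough illustration of the three behaviours listed, this sketch uses SQLite triggers driven from Python's sqlite3 instead; the schema, the trigger names and the rule "skip a row identical to the latest stored version" are assumptions, not Golf's actual logic.

```python
import sqlite3

DDL = """
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);

-- No declared FOREIGN KEY: the link to customers is a pseudo-foreign key
-- enforced by the trigger below.
CREATE TABLE orders (
    order_id    INTEGER NOT NULL,
    customer_id INTEGER,
    total       REAL,
    loaded_at   TEXT
);

CREATE TRIGGER orders_check BEFORE INSERT ON orders
BEGIN
    -- Incremental check: silently skip a row identical to the latest stored version.
    SELECT RAISE(IGNORE)
    WHERE EXISTS (
        SELECT 1 FROM orders o
        WHERE o.rowid = (SELECT MAX(rowid) FROM orders WHERE order_id = NEW.order_id)
          AND o.customer_id IS NEW.customer_id
          AND o.total IS NEW.total
    );
    -- Pseudo-foreign key: reject rows that point at an unknown customer.
    SELECT RAISE(ABORT, 'unknown customer_id')
    WHERE NOT EXISTS (SELECT 1 FROM customers WHERE customer_id = NEW.customer_id);
END;

-- Fill the timestamp field when the loader did not supply one.
CREATE TRIGGER orders_timestamp AFTER INSERT ON orders
WHEN NEW.loaded_at IS NULL
BEGIN
    UPDATE orders SET loaded_at = datetime('now') WHERE rowid = NEW.rowid;
END;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.execute("INSERT INTO customers VALUES (7, 'Acme')")

insert = "INSERT INTO orders (order_id, customer_id, total) VALUES (?, ?, ?)"
conn.execute(insert, (1, 7, 100.0))  # new order: stored
conn.execute(insert, (1, 7, 100.0))  # unchanged: skipped by RAISE(IGNORE)
conn.execute(insert, (1, 7, 120.0))  # changed: stored as a new version
try:
    conn.execute(insert, (2, 99, 50.0))  # customer 99 does not exist
except sqlite3.DatabaseError as exc:
    print("rejected:", exc)

print(conn.execute("SELECT order_id, total, loaded_at FROM orders ORDER BY rowid").fetchall())
```

Keeping the checks inside the database means every loader gets the same behaviour, whichever tool performs the insert.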
18
Data Marts ● Subsets of the consolidation database ● Used for reporting and business intelligence ● Orders of magnitude faster to query ● Same requirements as the consolidation database
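A data mart can be as simple as a pre-aggregated table derived from the consolidation database, so reports scan a small summary instead of every historical row. A minimal sketch with Python's sqlite3; the table and column names are hypothetical, and rebuilding the mart wholesale is just the simplest refresh strategy, not necessarily the one used in the talk.

```python
import sqlite3


def build_sales_mart(conn: sqlite3.Connection) -> None:
    """Rebuild a hypothetical monthly sales mart from the consolidation table."""
    conn.executescript("""
        DROP TABLE IF EXISTS mart_monthly_sales;
        CREATE TABLE mart_monthly_sales AS
        SELECT customer_name,
               substr(order_date, 1, 7) AS month,   -- 'YYYY-MM'
               COUNT(*)                 AS orders,
               SUM(total)               AS revenue
        FROM consolidated_orders
        GROUP BY customer_name, substr(order_date, 1, 7);
        CREATE INDEX mart_monthly_sales_month ON mart_monthly_sales (month);
    """)


conn = sqlite3.connect(":memory:")
# Illustrative consolidation data.
conn.executescript("""
    CREATE TABLE consolidated_orders (order_id INTEGER, customer_name TEXT,
                                      order_date TEXT, total REAL);
    INSERT INTO consolidated_orders VALUES
        (1, 'Acme', '2024-01-05', 100.0),
        (2, 'Acme', '2024-01-20', 50.0),
        (3, 'Bolt', '2024-02-02', 75.0);
""")
build_sales_mart(conn)
print(conn.execute("SELECT * FROM mart_monthly_sales ORDER BY month, customer_name").fetchall())
# → [('Acme', '2024-01', 2, 150.0), ('Bolt', '2024-02', 1, 75.0)]
```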
19
Process 3: Extraction
20
● Extract -> Transform -> Insert
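The three stages map naturally onto a chain of Python generators, so rows flow from source to warehouse one at a time. A minimal sketch, assuming a CSV export as the source and an SQLite table named sales as the destination (both hypothetical).

```python
import csv
import io
import sqlite3
from typing import Iterable, Iterator


def extract(csv_text: str) -> Iterator[dict]:
    """Extract: read rows from a source export (here, CSV text)."""
    yield from csv.DictReader(io.StringIO(csv_text))


def transform(rows: Iterable[dict]) -> Iterator[tuple]:
    """Transform: trim, standardise and type-convert each row."""
    for row in rows:
        yield (int(row["id"]), row["name"].strip().title(), float(row["amount"]))


def insert(conn: sqlite3.Connection, rows: Iterable[tuple]) -> int:
    """Insert: load the transformed rows and return how many were written."""
    cur = conn.executemany("INSERT INTO sales (id, name, amount) VALUES (?, ?, ?)", rows)
    conn.commit()
    return cur.rowcount


# Illustrative source export with inconsistent formatting.
SOURCE = "id,name,amount\n1, alice smith ,10.50\n2,BOB JONES,3\n"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)")
loaded = insert(conn, transform(extract(SOURCE)))
print(loaded, conn.execute("SELECT * FROM sales ORDER BY id").fetchall())
```

Keeping the stages as separate functions lets each be tested on its own and swapped out (a different source, a different target) without touching the others.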
21
Transformation Reasons ● Validity checks ● Standardisation ● Integration
22
Transformation Types ● Join fields ● Split fields (by regex) ● Modify content (by regex) ● Drop field ● Insert arbitrary field ● Drop row
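Each transformation type can be modelled as a small function that takes a row (a dict) and returns the modified row, or None to drop it; a pipeline is then just a list of such functions. A sketch under that assumption; the helper names, field names and sample rows are hypothetical.

```python
import re
from typing import Callable, Iterable, List, Optional

Row = dict
Step = Callable[[Row], Optional[Row]]  # returning None drops the row


def join_fields(fields: List[str], target: str, sep: str = " ") -> Step:
    """Join several fields into one new field."""
    def step(row: Row) -> Row:
        return {**row, target: sep.join(str(row[f]) for f in fields)}
    return step


def split_field(field: str, pattern: str, targets: List[str]) -> Step:
    """Split a field into new fields using the regex's capture groups."""
    def step(row: Row) -> Row:
        match = re.fullmatch(pattern, row.get(field, ""))
        if match is None:
            return row  # leave non-matching rows untouched
        return {**row, **dict(zip(targets, match.groups()))}
    return step


def modify_field(field: str, pattern: str, repl: str) -> Step:
    """Rewrite a field's content with a regex substitution."""
    def step(row: Row) -> Row:
        if field not in row:
            return row
        return {**row, field: re.sub(pattern, repl, row[field])}
    return step


def drop_field(field: str) -> Step:
    def step(row: Row) -> Row:
        return {k: v for k, v in row.items() if k != field}
    return step


def insert_field(field: str, value) -> Step:
    def step(row: Row) -> Row:
        return {**row, field: value}
    return step


def drop_row_if(predicate: Callable[[Row], bool]) -> Step:
    def step(row: Row) -> Optional[Row]:
        return None if predicate(row) else row
    return step


def apply(row: Row, steps: Iterable[Step]) -> Optional[Row]:
    """Run a row through each step in order; stop early if a step drops it."""
    for step in steps:
        row = step(row)
        if row is None:
            return None
    return row


pipeline = [
    join_fields(["first", "last"], "full_name"),
    split_field("phone", r"\((\d+)\)\s*(\d[\d ]*)", ["area_code", "number"]),
    modify_field("number", r"\s+", ""),
    drop_field("phone"),
    insert_field("source", "crm"),
    drop_row_if(lambda r: not r["full_name"].strip()),
]

print(apply({"first": "Jane", "last": "Doe", "phone": "(02) 6123 4567"}, pipeline))
print(apply({"first": "", "last": "", "phone": "n/a"}, pipeline))  # → None
```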
23
Process 4: Reporting
24
● Tabular reporting ● Web Services ● Dashboarding
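Tabular reports and web services can be fed from the same data-mart query; only the rendering differs. A minimal standard-library sketch with hypothetical helper names and sample rows; a production deployment would more likely sit behind a reporting tool or a web framework.

```python
import json
from typing import List, Sequence


def tabular_report(headers: List[str], rows: Sequence[Sequence]) -> str:
    """Render rows as a fixed-width text table with a dashed header rule."""
    table = [headers] + [[str(v) for v in r] for r in rows]
    widths = [max(len(r[i]) for r in table) for i in range(len(headers))]
    lines = ["  ".join(cell.ljust(w) for cell, w in zip(r, widths)).rstrip() for r in table]
    lines.insert(1, "  ".join("-" * w for w in widths))
    return "\n".join(lines)


def json_payload(headers: List[str], rows: Sequence[Sequence]) -> str:
    """Serialise the same rows as JSON, e.g. for a web-service endpoint."""
    return json.dumps([dict(zip(headers, r)) for r in rows])


# Illustrative rows, as they might come back from a data-mart query.
headers = ["month", "customer", "revenue"]
rows = [("2024-01", "Acme", 150.0), ("2024-02", "Bolt", 75.0)]
print(tabular_report(headers, rows))
print(json_payload(headers, rows))
```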
25
Thank You ● Questions?