1
Best Practices in Higher Education Student Data Warehousing Forum
Northwestern University, October 21, 2003
Mary Weisse, Team Leader, MIT Data Warehouse
2
- Warehouse Overview
- Design
- Architecture and implementation
- Integrity checking and controls
3
Warehouse Overview
- Read only
- Integrated reporting
- Institute-wide
- Multiple subject areas
- Varied modes of access
- Hub for data extraction by other systems
4
Warehouse Design
- Transaction vs. reporting design
- Star schema
  - Fact table
  - Dimensions
- No user interface
5
Star Schema Benefits
- Intuitive joins
- Limit on dimensions
- Reuse of dimension tables in multiple star schemas
6
Star Schema Example
[Diagram: a central fact table joined to dimension tables on either side]
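A small star schema can be sketched with Python's built-in sqlite3 module. The table and column names (dim_term, dim_department, fact_enrollment, fact_degrees) and all figures are hypothetical, not MIT's actual design. The sketch shows that each dimension joins directly to a fact table on its key, and that the same dimension tables can serve two different star schemas.

```python
import sqlite3

# Hypothetical star schema: names and numbers are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_term (term_key INTEGER PRIMARY KEY, term_name TEXT);
CREATE TABLE dim_department (dept_key INTEGER PRIMARY KEY, dept_name TEXT);
-- Fact table 1: enrollment counts keyed by term and department.
CREATE TABLE fact_enrollment (
    term_key INTEGER REFERENCES dim_term(term_key),
    dept_key INTEGER REFERENCES dim_department(dept_key),
    students INTEGER
);
-- Fact table 2: a second star schema reusing the same dimension tables.
CREATE TABLE fact_degrees (
    term_key INTEGER REFERENCES dim_term(term_key),
    dept_key INTEGER REFERENCES dim_department(dept_key),
    degrees_awarded INTEGER
);
""")
conn.executemany("INSERT INTO dim_term VALUES (?, ?)",
                 [(1, "Fall 2002"), (2, "Spring 2003")])
conn.executemany("INSERT INTO dim_department VALUES (?, ?)",
                 [(10, "Physics"), (20, "History")])
conn.executemany("INSERT INTO fact_enrollment VALUES (?, ?, ?)",
                 [(1, 10, 120), (1, 20, 80), (2, 10, 110), (2, 20, 95)])
conn.executemany("INSERT INTO fact_degrees VALUES (?, ?, ?)",
                 [(2, 10, 30), (2, 20, 25)])


def enrollment_by_term(conn):
    # One join from the fact table to one dimension table.
    return conn.execute("""
        SELECT t.term_name, SUM(f.students)
        FROM fact_enrollment AS f
        JOIN dim_term AS t ON t.term_key = f.term_key
        GROUP BY t.term_name
        ORDER BY t.term_name
    """).fetchall()


def degrees_by_department(conn):
    # dim_department also serves fact_enrollment; here it describes fact_degrees.
    return conn.execute("""
        SELECT d.dept_name, SUM(f.degrees_awarded)
        FROM fact_degrees AS f
        JOIN dim_department AS d ON d.dept_key = f.dept_key
        GROUP BY d.dept_name
        ORDER BY d.dept_name
    """).fetchall()


print(enrollment_by_term(conn))
print(degrees_by_department(conn))
```

Because every query follows the same fact-to-dimension pattern, users writing their own reports face one predictable join shape rather than a web of transaction tables.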
9
No User Interface
- All security at the database level
- Naming of fields and tables is critical
- No place to code around problems, give messages, etc.
10
Design Assumptions
- Minimal support & operational costs
- Standard (open) interfaces & components
- Scalable / able to evolve over time
- Secure
11
Risks
- Runaway queries
- Poor data quality
- Misunderstanding of the data by users, which may lead to erroneous reporting results
12
Security
- Machine security
- Data encryption
- Oracle roles
- Access control
  - Dynamic views
  - Roles
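The slides do not show how the dynamic views were written. As a hedged sketch of the general idea, the view below returns only the rows a user has been granted, by joining the data to a grant table and to the current user. In Oracle a view could read the session user directly (for example through the USER function); SQLite has no database logins, so a one-row table called session_login stands in for it. That table, like every other name here, is hypothetical.

```python
import sqlite3

# Hypothetical schema. session_login stands in for the database's notion of
# the current user; access_control lists which departments each user may see.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id INTEGER, name TEXT, dept TEXT);
CREATE TABLE access_control (username TEXT, dept TEXT);
CREATE TABLE session_login (username TEXT);
-- The view exposes only rows granted to the user recorded in session_login.
CREATE VIEW my_students AS
    SELECT s.student_id, s.name, s.dept
    FROM student AS s
    JOIN access_control AS a ON a.dept = s.dept
    JOIN session_login AS u ON u.username = a.username;
""")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Ann", "Physics"), (2, "Ben", "History"), (3, "Cy", "Physics")])
conn.executemany("INSERT INTO access_control VALUES (?, ?)",
                 [("alice", "Physics"), ("bob", "History"), ("bob", "Physics")])


def visible_students(conn, username):
    # Record who is "logged in", then query the view exactly as a report would.
    conn.execute("DELETE FROM session_login")
    conn.execute("INSERT INTO session_login VALUES (?)", (username,))
    return conn.execute(
        "SELECT * FROM my_students ORDER BY student_id").fetchall()


print(visible_students(conn, "alice"))
```

Since there is no user interface to enforce access, the filtering has to live in the database object that users query, which is what the view does here.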
13
Roles
[Diagram; surviving label: Web]
14
Architectural Components
- DBMS: store, manage, and control access to the data
- Metadata: data definitions, load control, data conversion rules
- Extract: data taken from source systems
- Transfer: data copied securely to the warehouse server
- Convert: data translated into reporting formats & structures
- Load: data loaded into the database & indexes created
- Transport: data securely transferred from the database to the desktop
- Query tool: retrieve data & create exports
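The Convert component, driven by metadata conversion rules, can be sketched as a table of rules kept as data rather than written into code, so one routine handles every feed. The pipe-delimited record layout, field names, and rules below are assumptions for illustration; date.fromisoformat requires Python 3.7 or later.

```python
from datetime import date

# Hypothetical metadata: output field -> (position in extract record, conversion rule).
METADATA = {
    "student_id": (0, int),
    "term": (1, str.strip),
    "units": (2, float),
    "enroll_date": (3, date.fromisoformat),
}


def convert_record(line, metadata=METADATA, sep="|"):
    """Translate one delimited extract line into reporting format using the metadata."""
    raw = line.rstrip("\n").split(sep)
    if len(raw) != len(metadata):
        raise ValueError(f"expected {len(metadata)} fields, got {len(raw)}")
    return {field: rule(raw[pos]) for field, (pos, rule) in metadata.items()}


print(convert_record("123|FA03 |12|2003-09-03\n"))
```

Adding a field or changing a rule then means editing the metadata, not the conversion code.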
16
Data Load Processing
- Assumption: information is better stale than incorrect
- Grouping data loads
- Error tolerances may vary
- Checking status at each stage
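One way to read "information is better stale than incorrect" together with per-group error tolerances: validate a group of records, and if too many fail, keep the existing data and report the failure instead of loading. The function name, the tolerance parameter, treating an empty feed as a failure, and dropping the few invalid records when a group is within tolerance are all assumptions of this sketch, not details from the slides.

```python
def load_group(name, records, is_valid, tolerance, current_rows):
    """Decide whether one group of records replaces the current table contents.

    tolerance: maximum acceptable fraction of invalid records for this group
    (hypothetical parameter; different groups may use different values).
    Returns (rows_to_keep, loaded, status_message).
    """
    if not records:
        # An empty feed is treated as a failure, not as "delete everything".
        return current_rows, False, f"{name}: no records received, kept existing data"
    good = [r for r in records if is_valid(r)]
    error_rate = (len(records) - len(good)) / len(records)
    if error_rate > tolerance:
        # Better stale than incorrect: leave the previous load in place.
        return (current_rows, False,
                f"{name}: error rate {error_rate:.0%} exceeds {tolerance:.0%}, kept existing data")
    return good, True, f"{name}: loaded {len(good)} of {len(records)} records"


print(load_group("grades", [10, -1, 20, 30], lambda r: r >= 0, 0.10, [1, 2]))
```

The status message is what "checking status at each stage" would feed into: every group reports whether it loaded, even when it did not.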
17
Process Files
[Diagram labels: Cron, Meta Data, Data, External Systems (SAP), Transfers, Encrypted, Decrypted, Converted, Archive]
1. Check file existence
2. Move to secure directory
3. Decrypt
4. Optional pre-conversion processing
5. Convert
6. Remove data
7. Remove indexes
8. Load data
9. Optional post-load processing
10. Restore indexes
11. Optional post-batch processing
  - Compute statistics
  - Calculate & add fields
  - Create aggregate tables
12. Archive files
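The numbered steps behave like a pipeline that should halt as soon as a stage fails, consistent with checking status at each stage. The sketch below is a hypothetical runner: the step actions are stand-ins for the real work of moving, decrypting, converting, and loading files, and optional steps are marked with None. It only stops and reports; it does not roll back steps that already completed.

```python
from typing import Callable, List, Optional, Tuple

# Each step is (name, action). The action returns True on success; None marks
# an optional step that is not configured for this particular load.
Step = Tuple[str, Optional[Callable[[], bool]]]


def run_batch(steps: List[Step]) -> Tuple[List[str], Optional[str]]:
    """Run steps in order and stop at the first failure.

    Returns (names of completed steps, name of the failed step or None).
    """
    completed: List[str] = []
    for name, action in steps:
        if action is None:
            continue  # optional step skipped
        if not action():
            return completed, name  # later steps never run on bad input
        completed.append(name)
    return completed, None


def ok() -> bool:
    return True


steps: List[Step] = [
    ("Check file existence", ok),
    ("Move to secure directory", ok),
    ("Decrypt", lambda: False),  # simulate a decryption failure
    ("Optional pre-conversion processing", None),
    ("Convert", ok),
]
print(run_batch(steps))
```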
18
Extraction
- Minimize impact on production systems
- Minimal data transformation done on the source system
  - Performance
  - Data transformed in only one place
- Incremental control
  - Data extracted by date, starting from the date of the last successful run
- Control files to ensure that extracted data is complete
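Extracting by date from the last successful run means a failed run does not advance the starting point, so the next run picks up whatever was missed. A minimal sketch, assuming each source row carries a modification date and that run history is available as a list of (date, succeeded) pairs; the field names and the control-record layout are hypothetical.

```python
from datetime import date


def last_successful_run(run_log):
    """run_log: list of (run_date, succeeded) pairs (hypothetical format)."""
    dates = [run_date for run_date, succeeded in run_log if succeeded]
    return max(dates) if dates else date.min  # no success yet: extract everything


def incremental_extract(rows, run_log):
    """Select rows changed since the last successful run and build a control record."""
    since = last_successful_run(run_log)
    # ">=" re-reads the boundary day rather than risk missing changes made on it;
    # this assumes the load replaces records by key, so repeats are harmless.
    extracted = [row for row in rows if row["modified"] >= since]
    control = {"since": since.isoformat(), "record_count": len(extracted)}
    return extracted, control
```

The control record travels with the extract so the receiving side can confirm it got everything the source sent.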
19
Integrity Checks
- Correct files on hand before the job runs
- Record & byte counts
- Comparison of control file to data file
- Extract file structure checked against metadata
- DBMS constraints enforced
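The record count, byte count, and structure checks can all run before anything touches the database. A sketch, assuming a pipe-delimited extract and a control file already parsed into a dictionary with hypothetical record_count and byte_count keys; DBMS constraints are left to the database itself.

```python
def check_extract(data: bytes, control: dict, expected_fields: int, sep: bytes = b"|"):
    """Compare an extract file with its control file and the metadata field count.

    Returns a list of problems; an empty list means the file passed.
    """
    problems = []
    lines = data.splitlines()
    if len(lines) != control["record_count"]:
        problems.append(f"record count {len(lines)} != control {control['record_count']}")
    if len(data) != control["byte_count"]:
        problems.append(f"byte count {len(data)} != control {control['byte_count']}")
    for number, line in enumerate(lines, start=1):
        fields = line.count(sep) + 1
        if fields != expected_fields:
            problems.append(f"line {number}: {fields} fields, metadata expects {expected_fields}")
    return problems


print(check_extract(b"1|FA03|12\n2|FA03|9\n", {"record_count": 2, "byte_count": 19}, 3))
```

Returning every problem rather than stopping at the first makes the failure notice more useful to whoever has to fix the feed.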
20
Control of Jobs
- Cron scheduling
- Error checking system
  - What jobs should have run?
  - Did they run successfully?
- Data scanned for discrepancies
- Mail sent to appropriate staff and users
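The error checking system's two questions, which jobs should have run and whether they succeeded, can be answered by comparing the schedule with the run log. A minimal sketch, assuming both are already available as Python collections (collecting them from cron is not covered) and using exit status 0 for success; job names are hypothetical, and mailing the report to staff and users is left out.

```python
def audit_jobs(expected, run_log):
    """expected: set of job names scheduled to run; run_log: dict of job name -> exit status.

    Returns (jobs that never ran, jobs that ran but failed), both sorted.
    """
    ran = set(run_log)
    missing = sorted(expected - ran)
    failed = sorted(job for job in expected & ran if run_log[job] != 0)
    return missing, failed


def build_report(missing, failed):
    """Return the text of a notification, or None when everything succeeded."""
    if not missing and not failed:
        return None
    lines = [f"Did not run: {job}" for job in missing]
    lines += [f"Failed: {job}" for job in failed]
    return "\n".join(lines)


missing, failed = audit_jobs({"load_students", "load_grades", "load_finance"},
                             {"load_students": 0, "load_grades": 1})
print(build_report(missing, failed))
```

Checking against the expected schedule, rather than only scanning logs for errors, is what catches a job that silently never started.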