Best Practices in Higher Education Student Data Warehousing Forum


1 Best Practices in Higher Education Student Data Warehousing Forum
Northwestern University, October 21, 2003
Mary Weisse, Team Leader, MIT Data Warehouse

2 Warehouse Overview Design Architecture and implementation
Integrity checking and controls

3 Warehouse Overview
Read only
Integrated reporting
Institute wide
Multiple subject areas
Varied modes of access
Hub for data extraction by other systems

4 Warehouse Design
Transaction vs reporting design
Star schema
Fact table
Dimensions
No user interface

5 Star Schema Benefits
Intuitive joins
Limit on dimensions
Reuse of dimension tables in multiple star schemas

6-8 Star Schema Example [diagrams; labels: Dimensions, Fact, Dimensions]
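
The diagrams for these slides did not survive in the transcript. As a rough illustration of the star schema idea (one fact table joined to shared dimension tables), here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (fact_enrollment, dim_term, dim_department, units) and the sample rows are hypothetical and are not taken from the MIT warehouse.

```python
import sqlite3

def build_star_schema(conn):
    """Create a tiny star schema: one fact table and two dimension tables.
    All names and rows here are hypothetical, for illustration only."""
    conn.executescript("""
        CREATE TABLE dim_term (term_key INTEGER PRIMARY KEY, term_name TEXT);
        CREATE TABLE dim_department (dept_key INTEGER PRIMARY KEY, dept_name TEXT);
        CREATE TABLE fact_enrollment (
            term_key INTEGER REFERENCES dim_term (term_key),
            dept_key INTEGER REFERENCES dim_department (dept_key),
            units    INTEGER
        );
    """)
    conn.executemany("INSERT INTO dim_term VALUES (?, ?)",
                     [(1, "Fall 2003"), (2, "Spring 2004")])
    conn.executemany("INSERT INTO dim_department VALUES (?, ?)",
                     [(10, "Physics"), (20, "Biology")])
    conn.executemany("INSERT INTO fact_enrollment VALUES (?, ?, ?)",
                     [(1, 10, 12), (1, 10, 9), (1, 20, 12), (2, 20, 6)])
    conn.commit()

def units_by_term_and_department(conn):
    """Each dimension joins to the fact table on a single key."""
    return conn.execute("""
        SELECT t.term_name, d.dept_name, SUM(f.units)
        FROM fact_enrollment AS f
        JOIN dim_term AS t ON t.term_key = f.term_key
        JOIN dim_department AS d ON d.dept_key = f.dept_key
        GROUP BY t.term_name, d.dept_name
        ORDER BY t.term_name, d.dept_name
    """).fetchall()

conn = sqlite3.connect(":memory:")
build_star_schema(conn)
rows = units_by_term_and_department(conn)
```

A second fact table could reference dim_term and dim_department in the same way, which is the kind of dimension reuse slide 5 mentions.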

9 No User Interface
All security at the database level
Naming of fields and tables is critical
No place to code around problems, give messages, etc.

10 Design Assumptions
Minimal support & operational costs
Standard (open) interfaces & components
Scalable, able to evolve over time
Secure

11 Risks
Runaway queries
Poor data quality
Misunderstanding of the data by users may lead to erroneous reporting results

12 Security
Machine security
Data encryption
Oracle roles
Access control
Dynamic views
Roles
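
The slide does not say how the dynamic views are built. One common reading is a view whose rows depend on who is querying it, so that access control lives in the database rather than in an application. The sketch below imitates that idea in SQLite, which has no notion of a logged-in user, by keeping the user name in a hypothetical current_session table; in Oracle the view would more likely reference the built-in USER function. All table names, user names, and rows are assumptions made for illustration.

```python
import sqlite3

def open_demo_warehouse():
    """Build an in-memory database with a per-user filtered view.
    Every name and row here is hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE student_grades (dept TEXT, student TEXT, grade TEXT);
        CREATE TABLE access_control (username TEXT, dept TEXT);
        CREATE TABLE current_session (username TEXT);

        -- The view exposes only rows for departments the session user may see.
        CREATE VIEW my_student_grades AS
            SELECT g.dept, g.student, g.grade
            FROM student_grades AS g
            JOIN access_control AS a ON a.dept = g.dept
            JOIN current_session AS s ON s.username = a.username;

        INSERT INTO student_grades VALUES ('Physics', 'S1', 'A'), ('Biology', 'S2', 'B');
        INSERT INTO access_control VALUES ('pat', 'Physics'), ('lee', 'Biology');
    """)
    return conn

def rows_visible_to(conn, username):
    """Stand-in for logging in as `username`, then querying the view."""
    with conn:
        conn.execute("DELETE FROM current_session")
        conn.execute("INSERT INTO current_session VALUES (?)", (username,))
    return conn.execute(
        "SELECT dept, student, grade FROM my_student_grades ORDER BY student"
    ).fetchall()
```

Users query my_student_grades rather than the base table, so the same SQL returns different rows for different people, and a user with no access_control entry sees nothing.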

13 Roles [diagram; label: Web]

14 Architectural Components
DBMS – Store, manage, and control access to the data
Metadata – Data definitions, load control, data conversion rules
Extract – Data taken from source systems
Transfer – Data copied to the warehouse server securely
Convert – Data translated into reporting format & structures
Load – Data loaded into database & indexes created
Transport – Data is securely transferred from the database to the desktop
Query Tool – Retrieve data & create export


16 Data Load Processing
Assumption: Information is better stale than incorrect
Grouping data loads
Error tolerances may vary
Checking status at each stage
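
A minimal sketch of "checking status at each stage", assuming each stage is a callable that returns True on success. The run_stages helper and the stage names used with it are hypothetical. The first failure stops the run, so later stages (including the load) never execute and the warehouse keeps its older data, in line with the "better stale than incorrect" assumption.

```python
from typing import Callable, List, Optional, Tuple

Stage = Tuple[str, Callable[[], bool]]

def run_stages(stages: List[Stage]) -> Tuple[List[str], Optional[str]]:
    """Run stages in order, checking status after each one.

    Returns (names of completed stages, name of the failed stage or None).
    A stage that raises is treated the same as one that returns False.
    """
    completed: List[str] = []
    for name, stage in stages:
        try:
            ok = stage()
        except Exception:
            ok = False
        if not ok:
            # Stop here: nothing after the failed stage is attempted.
            return completed, name
        completed.append(name)
    return completed, None
```

Per-load error tolerances, which the slide says may vary, could be modeled by letting each stage decide for itself what counts as success.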

17 Process
1. Check file existence
2. Move to secure directory
3. Decrypt
4. Optional pre-conversion processing
5. Convert
6. Remove data
7. Remove indexes
8. Load data
9. Optional post-load processing
10. Restore indexes
11. Optional post-batch processing
    Compute statistics
    Calculate & add fields
    Create aggregate tables
12. Archive files
[Diagram labels: Files, Cron, Meta Data, Data, External Systems (SAP), Transfers, Encrypted, Decrypted, Converted, Archive]
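
Steps 6 to 11 can be sketched against SQLite as a stand-in for the warehouse DBMS. This is an illustration under stated assumptions, not the MIT load job: the enrollment table, its index, and the enrollment_by_term aggregate are hypothetical names. The sketch also relies on SQLite allowing index changes inside a transaction, so a failed load rolls back to the previous data; Oracle commits DDL implicitly, so a real job there would need a different recovery approach.

```python
import sqlite3

def create_warehouse():
    """Hypothetical target table with one index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE enrollment (student TEXT NOT NULL, term TEXT NOT NULL)")
    conn.execute("CREATE INDEX idx_enrollment_term ON enrollment (term)")
    return conn

def reload_enrollment(conn, rows):
    """Full reload of one table, following the slide's step numbers."""
    with conn:  # commits on success, rolls back if any step raises
        conn.execute("DELETE FROM enrollment")                      # 6. remove data
        conn.execute("DROP INDEX IF EXISTS idx_enrollment_term")    # 7. remove indexes
        conn.executemany("INSERT INTO enrollment VALUES (?, ?)", rows)  # 8. load data
        conn.execute("CREATE INDEX idx_enrollment_term ON enrollment (term)")  # 10. restore indexes
        # 11. post-batch processing: rebuild an aggregate table
        conn.execute("DROP TABLE IF EXISTS enrollment_by_term")
        conn.execute(
            "CREATE TABLE enrollment_by_term AS "
            "SELECT term, COUNT(*) AS students FROM enrollment GROUP BY term"
        )
    conn.execute("ANALYZE")  # 11. compute statistics for the query planner
```

Dropping the index before the bulk insert and rebuilding it afterwards avoids maintaining the index row by row during the load.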

18 Extraction
Minimize impact on production systems
Minimal data transformation done on source system
Performance
Data transformed in only one place
Incremental control
Extracted by date, starting from the date of the last successful run
Control files to ensure that extracted data is complete
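
One way to read "incremental control" is that each run extracts rows changed since the last successful run, and the stored date only moves forward once the batch has been handed off successfully. The sketch below assumes that reading. The run_extract function, the changed_on field, the state dictionary, and the deliver callback are all hypothetical names.

```python
from datetime import date

def run_extract(source_rows, state, run_date, deliver):
    """Extract rows changed after the last successful run, up to run_date.

    `deliver(batch, control)` stands in for the transfer step and returns
    True on success. The last-success date advances only in that case, so
    a failed run is retried over the same date range next time.
    """
    since = state["last_success"]
    batch = [row for row in source_rows if since < row["changed_on"] <= run_date]
    # Control record travels with the data so the receiver can check completeness.
    control = {"record_count": len(batch), "from_date": since, "to_date": run_date}
    if deliver(batch, control):
        state["last_success"] = run_date
    return batch, control
```

No transformation happens here beyond selecting rows, which keeps the work on the source system small and leaves conversion to a single later step.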

19 Integrity Checks
Correct files on hand before job runs
Record & byte counts
Comparisons of control file to data file
Extract file structure is checked against metadata
DBMS constraints enforced
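
The record count, byte count, and structure checks might look like the sketch below. It assumes a newline-terminated, pipe-delimited extract and a control record holding record_count and byte_count. That file layout, the check_extract function, and its parameters are assumptions for illustration, not the format used in the talk.

```python
def check_extract(data: bytes, control: dict, expected_fields: int,
                  delimiter: bytes = b"|") -> list:
    """Compare an extract file's contents with its control record and metadata.

    Returns a list of problem descriptions; an empty list means the file
    passed all three checks.
    """
    problems = []
    # Byte count: catches truncated or padded transfers.
    if len(data) != control["byte_count"]:
        problems.append(f"byte count {len(data)} != control {control['byte_count']}")
    # Record count: catches missing or duplicated records.
    records = data.splitlines()
    if len(records) != control["record_count"]:
        problems.append(f"record count {len(records)} != control {control['record_count']}")
    # Structure: every record should have the number of fields the metadata defines.
    for number, record in enumerate(records, start=1):
        fields = len(record.split(delimiter))
        if fields != expected_fields:
            problems.append(f"record {number}: {fields} fields, expected {expected_fields}")
    return problems
```

A load job would run this before touching the database and refuse to proceed if the list is non-empty.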

20 Control of jobs
Cron scheduling
Error checking system
What jobs should have run?
Did they run successfully?
Data scanned for discrepancies
Mail sent to appropriate staff and users
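
The two questions on this slide can be answered by comparing a list of expected jobs with a run log. The sketch below assumes the run log is a mapping from job name to a status string, with "ok" meaning success. The function names, job names, and message text are hypothetical, and the notice is only built here, not mailed.

```python
def review_jobs(expected_jobs, run_log):
    """Answer: what should have run, and did it run successfully?

    Returns (jobs that never ran, jobs that ran but did not report "ok").
    """
    not_run = [job for job in expected_jobs if job not in run_log]
    failed = [job for job in expected_jobs
              if job in run_log and run_log[job] != "ok"]
    return not_run, failed

def build_notice(not_run, failed):
    """Build the text of a notification, or None if there is nothing to report."""
    if not (not_run or failed):
        return None
    lines = ["Warehouse load problems:"]
    lines += [f"  did not run: {job}" for job in not_run]
    lines += [f"  failed: {job}" for job in failed]
    return "\n".join(lines)
```

Checking against an explicit expected list matters because a job that never started leaves no error of its own to find.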

