
Cash Registers & Satellites Briefing to the 2006 NOAATech Conference November 2, 2005 Stan Cutler 301-457-5210 x 163 Mitretek Systems/NESDIS/OSD.



2 Purpose
Improve communication between NOAA’s developers and the wider community of data management professionals
– Introduce the vocabulary
– Identify NOAA applications that can be described using the common vocabulary

3 Agenda
I. Universal Data Management Challenges
II. Notional Data Warehouse Architecture
III. Data Modeling Approaches
– Relational
– Dimensional

4 I. Universal Data Management Challenges

5 Data Mining Example: “Market Basket Analysis”
Decisions:
1) Move the beer display closer to the diaper display
2) On Thursdays, sell beer and diapers at full price
Rationale:
1) When men bought diapers on Thursdays and Saturdays, they also tended to buy beer
2) Men typically did their weekly grocery shopping on Saturdays
3) On Thursdays, they bought only a few items
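A rule such as “diapers → beer” is usually quantified with two measures: support (the share of baskets that contain an itemset) and confidence (how often baskets containing the antecedent also contain the consequent). The following is a minimal Python sketch under stated assumptions: the baskets are made up, and the function names (`support`, `confidence`, `frequent_pairs`) are hypothetical. It is not the retailer’s actual method, and it ignores the day-of-week dimension behind the Thursday decision.

```python
from collections import Counter
from itertools import combinations

# Hypothetical point-of-sale baskets, made up for illustration only.
TRANSACTIONS = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"diapers", "beer", "milk"},
    {"bread", "milk"},
]

def support(itemset, transactions):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) = support(A and C) / support(A).

    Assumes the antecedent occurs at least once (otherwise this divides by zero).
    """
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def frequent_pairs(transactions, min_support):
    """Count every 2-item combination and keep those meeting min_support."""
    counts = Counter()
    for basket in transactions:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

print(confidence({"diapers"}, {"beer"}, TRANSACTIONS))
print(frequent_pairs(TRANSACTIONS, min_support=0.5))
```

With these made-up baskets, 3 of the 4 baskets that contain diapers also contain beer, so the confidence of “diapers → beer” is 0.75. Production mining tools (for example, Apriori-style algorithms) prune candidate itemsets instead of counting every pair.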

6 Many Disciplines Mine Their Data
– Law Enforcement: optimal deployment
– Health Care: coverage risks
– E-Commerce: pop-up/link selection
– Medicine: gene/disease associations
– Etc.
Data Management Goal
Develop systems in which the data and procedures are configured to answer the questions that are important to the enterprise

7 NOAA’s Future
– Integrating Global Environmental Observations and Data Management
– Ensuring Sound, State-of-the-Art Research
– Developing, Valuing, and Sustaining a World-Class Workforce
We are not unique. Any enterprise that collects large amounts of data faces the same kinds of challenges and has the same goals.

8
– Ask the same kinds of questions as others who face similar problems
– Understand the constructs and vocabulary:
  – Architectures
  – Data modeling
We can find valuable expertise outside the NOAA community.

9 II. Notional Data Warehouse Architecture

10 “Hub and Spoke” Architecture
Data sources: external data, internal data
Data Staging Area: ETL, where the data are transformed and “cleansed”
Data Warehouse: application neutral (the hub)
“Data Marts”: application specific, using “OLAP” technologies (the spokes)
“ETL” = Extract, Transform and Load
“OLAP” = Online Analytical Processing
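To make the hub-and-spoke flow concrete, here is a toy Python sketch under stated assumptions: in-memory lists stand in for the sources, the staging area, and the warehouse, and every record field and function name is hypothetical. Real ETL tools add scheduling, auditing, and bulk loading on top of these three steps.

```python
from collections import defaultdict

# Hypothetical source records; all field names are illustrative assumptions.
EXTERNAL = [{"cust": " Alice ", "region": "east"}, {"cust": "Bob", "region": None}]
INTERNAL = [
    {"cust": "alice", "amount": "12.50"},
    {"cust": "BOB", "amount": "7"},
    {"cust": "carol", "amount": "bad"},  # unparseable amount
]

def extract():
    """Extract: copy raw records from each source into a staging area."""
    return {"external": list(EXTERNAL), "internal": list(INTERNAL)}

def transform(staging):
    """Transform & cleanse: normalize keys, fill defaults, drop unparseable rows."""
    regions = {
        r["cust"].strip().lower(): (r["region"] or "unknown")
        for r in staging["external"]
    }
    rows = []
    for r in staging["internal"]:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # cleanse: reject rows whose amount cannot be parsed
        cust = r["cust"].strip().lower()
        rows.append({"cust": cust, "region": regions.get(cust, "unknown"), "amount": amount})
    return rows

def load(warehouse, rows):
    """Load: append cleansed, application-neutral rows to the warehouse (the hub)."""
    warehouse.extend(rows)

def sales_by_region_mart(warehouse):
    """One application-specific data mart (a spoke): sales totals by region."""
    totals = defaultdict(float)
    for r in warehouse:
        totals[r["region"]] += r["amount"]
    return dict(totals)

warehouse = []
load(warehouse, transform(extract()))
print(sales_by_region_mart(warehouse))
```

The warehouse keeps cleansed detail rows that no single application owns, and each mart is derived from the warehouse rather than from the raw sources. Adding another spoke therefore means writing another function like `sales_by_region_mart`, not another extract.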

11 Retail Application: Hub and Spoke Architecture
Data sources: external customer lists, internal sales data
Data Staging Area: transform & cleanse
Data Warehouse: application neutral
OLAP data marts (application specific): Marketing, Floor Management, Human Resources, Real Estate, Accounting

12 Notional NOAA Hub and Spoke Architecture
Data sources: CLASS, ESPC, Data Centers, Other Satellite Archives
Data Staging Area (Rich Inventory?): transform & cleanse
Data Warehouse: application neutral
NOAA applications (data marts using OLAP): Climate Prediction, Weather Forecast, Ecosystems Management, Commerce & Transportation, External Customers

13 III. Data Modeling Approaches

14 “Relational” Vocabulary
“Relational” technologies
– Relational Data Base Management Systems (RDBMS)
  – COTS products (INFORMIX, DB2, ORACLE, MS/SS, etc.)
  – Proprietary data management/manipulation software
– RDBMS extensions (most COTS products are built on an RDBMS)
  – GUIs, CASE tools, COOP, application generators, security, etc.
“Relational” data models: an evolutionary approach to data base design
– Conceptual Entity Relationship Diagrams (ERDs) are used to identify data requirements, relationships, and rules
  – Diagrams
  – Data dictionaries
– Logical ERDs are used to normalize the data (eliminate redundancies)
– Physical models are the table schemas entered into the RDBMS
Online Transaction Processing (OLTP), e.g., CLASS
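The following is a minimal sketch of the path from a logical ERD to a physical schema, using Python’s built-in `sqlite3` module as a stand-in RDBMS. The `customer` and `sale` tables and the sample rows are hypothetical. A flat record that repeats the customer name on every sale is normalized, so each customer is stored once and each sale keeps only a foreign key.

```python
import sqlite3

# Hypothetical flat records: (sale_id, customer_id, customer_name, amount).
# The customer name is repeated on every sale, which is the redundancy to remove.
FLAT = [
    (1, "C1", "Acme", 100.0),
    (2, "C1", "Acme", 250.0),
    (3, "C2", "Globex", 75.0),
]

def build_schema(conn):
    """Physical model: table schemas derived from a normalized logical ERD."""
    conn.executescript("""
        CREATE TABLE customer (
            customer_id TEXT PRIMARY KEY,
            name        TEXT NOT NULL
        );
        CREATE TABLE sale (
            sale_id     INTEGER PRIMARY KEY,
            customer_id TEXT NOT NULL REFERENCES customer(customer_id),
            amount      REAL NOT NULL
        );
    """)

def load_normalized(conn, flat):
    """Store each customer once; each sale keeps only the customer's key."""
    for sale_id, cust_id, name, amount in flat:
        conn.execute("INSERT OR IGNORE INTO customer VALUES (?, ?)", (cust_id, name))
        conn.execute("INSERT INTO sale VALUES (?, ?, ?)", (sale_id, cust_id, amount))
    conn.commit()

def total_by_customer(conn):
    """Reassembling the flat view requires a join across the normalized tables."""
    return conn.execute("""
        SELECT c.name, SUM(s.amount)
        FROM sale s JOIN customer c ON s.customer_id = c.customer_id
        GROUP BY c.name
        ORDER BY c.name
    """).fetchall()

conn = sqlite3.connect(":memory:")
build_schema(conn)
load_normalized(conn, FLAT)
print(total_by_customer(conn))
```

`total_by_customer` shows the other side of normalization: the redundancy is gone, but answering even a simple summary question now requires a join.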

15 Entity Relationship Diagram (ERD)
Diagram elements: entity (entity class), key, attributes, relationship, cardinality (1, many, or 0)
The foundation of all OLTP systems, such as CLASS
Attributes, entities, and relationships are described in the data dictionary

16 Object Models “Inherit” ERD Constructs
Diagram: ERD entities (key, attributes) alongside an object class that has the same key and attributes, plus behavior
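Here is a small Python sketch of the idea that an object class carries over an entity’s key, attributes, and relationships and then adds behavior. The `Satellite` and `Instrument` classes, their fields, and their methods are hypothetical. They were chosen only to echo the NOAA examples later in the briefing.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Instrument:
    """ERD-style entity: a key plus attributes (hypothetical example)."""
    instrument_key: int
    name: str

@dataclass
class Satellite:
    """Object class with the same constructs as an ERD entity..."""
    satellite_key: int  # key, as in the entity
    name: str           # attribute
    # One-to-many relationship, expressed as a collection of related objects.
    instruments: List[Instrument] = field(default_factory=list)

    # ...plus behavior, which an ERD does not model.
    def add_instrument(self, instrument: Instrument) -> None:
        """Enforce a simple rule in code: no duplicate instrument keys."""
        if any(i.instrument_key == instrument.instrument_key for i in self.instruments):
            raise ValueError("duplicate instrument key")
        self.instruments.append(instrument)

    def instrument_names(self) -> List[str]:
        return sorted(i.name for i in self.instruments)

sat = Satellite(1, "SAT-A")
sat.add_instrument(Instrument(10, "Sounder"))
sat.add_instrument(Instrument(11, "Imager"))
print(sat.instrument_names())
```

The rule in `add_instrument` is the kind of constraint an RDBMS would express declaratively, for example as a primary or unique key. In an object model, it lives in the class’s behavior instead.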

17 Pros & Cons of Systems Based on Relational Models
Strengths
– Referential integrity
– Data locking
– Fast look-up and retrieval
– GUIs
Weaknesses
– Entity proliferation
– Users don’t understand them
– Complex code must be written to accumulate multiple instances (hard to use for data mining)
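Referential integrity, the first strength listed, means the RDBMS itself rejects rows that point at nonexistent parent rows. The following is a minimal demonstration with Python’s `sqlite3`, using hypothetical `product` and `sale` tables. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE product (product_key INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE sale (
        sale_id     INTEGER PRIMARY KEY,
        product_key INTEGER NOT NULL REFERENCES product(product_key),
        units       INTEGER NOT NULL
    );
""")
conn.execute("INSERT INTO product VALUES (1, 'Diapers')")
conn.execute("INSERT INTO sale VALUES (1, 1, 3)")  # valid: product 1 exists

def try_insert_orphan(conn):
    """Attempt a sale for a product that does not exist; return True if rejected."""
    try:
        conn.execute("INSERT INTO sale VALUES (2, 99, 1)")  # product 99 is missing
    except sqlite3.IntegrityError:
        return True
    return False

print(try_insert_orphan(conn))
```

Because the database enforces the rule, every application that writes to these tables gets the same guarantee without repeating the check in its own code.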

18 Dimensional Data Models
Fact
– An instance of numeric data
Dimension
– A foreign key
Fact table
– Its key is a concatenation of foreign keys (dimensions)
– An instance can have dozens of foreign keys
– Millions of instances (rows) are often required
Programmers’ revenge on Data Base Administrators
– Breaks many relational “rules”
– Re-invented often

19 A “Dimensional” Data Model for Retailing
Who (buys, sells)
– Customer (age, gender, marital status, occupation, etc.)
– Salesperson (the same attributes as the customer, plus training, etc.)
– Cash register
What (products)
– Brand, color, size, type, etc.
When
– Time of day, day of week, season
Where
– Store (location, size, type), shelf
Why
– Promotions, advertising, discounts, economic trends
How much (was spent)
– Per product, per total sale

20 Classical Star Schema: Point of Sale
FACT table
– Keys: Time_key, Customer_key, Store_key, Clerk_key, Promo_key, Product_key, Register_key
– Facts: Dollars Sold, Units Sold, Dollars Cost
Dimension tables (each joined to the fact table by its key)
– Time Dimension: Time_key, DayofWeek, Fiscal period
– Customer Dimension: Customer_key, CustomerName, Purchase Profile, etc.
– Store Dimension: Store_key, StoreName, Address, FloorType, etc.
– Clerk Dimension: Clerk_key, ClerkName, JobGrade, etc.
– Promo Dimension: Promo_key, PromoName, PriceType, AdType, etc.
– Product Dimension: Product_key, Description, Brand, Sub Category, Category, Dept, Flavor, Package Type
– Register Dimension: Register_key, Location, Type, etc.
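The star schema can be sketched with Python’s `sqlite3` under stated assumptions. This is a reduced version of the slide: only the Time, Product, and Store dimensions and two of the facts are kept, and all rows are made up. The fact table’s primary key is the concatenation of its dimension keys. A typical OLAP query joins the facts to the chosen dimensions and aggregates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: one row per member, a key plus descriptive attributes.
    CREATE TABLE time_dim    (time_key INTEGER PRIMARY KEY, day_of_week TEXT);
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, description TEXT, category TEXT);
    CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, store_name TEXT);
    -- Fact table: its key is the concatenation of the dimension foreign keys.
    CREATE TABLE sales_fact (
        time_key     INTEGER REFERENCES time_dim(time_key),
        product_key  INTEGER REFERENCES product_dim(product_key),
        store_key    INTEGER REFERENCES store_dim(store_key),
        dollars_sold REAL,
        units_sold   INTEGER,
        PRIMARY KEY (time_key, product_key, store_key)
    );
    -- Made-up sample rows.
    INSERT INTO time_dim VALUES (1, 'Thursday'), (2, 'Saturday');
    INSERT INTO product_dim VALUES (1, 'Beer 6-pack', 'Beverages'), (2, 'Diapers', 'Baby');
    INSERT INTO store_dim VALUES (1, 'Store A'), (2, 'Store B');
    INSERT INTO sales_fact VALUES
        (1, 1, 1, 30.0, 5), (1, 2, 1, 20.0, 2),
        (2, 1, 1, 90.0, 15), (2, 2, 2, 60.0, 6);
""")

# OLAP-style query: pick dimensions, join them to the facts, aggregate a measure.
QUERY = """
    SELECT t.day_of_week, p.category, SUM(f.dollars_sold) AS dollars
    FROM sales_fact f
    JOIN time_dim t    ON f.time_key = t.time_key
    JOIN product_dim p ON f.product_key = p.product_key
    GROUP BY t.day_of_week, p.category
    ORDER BY t.day_of_week, p.category
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Every dimension is exactly one join away from the fact table. This is what makes star-schema queries easy to generate from a list of chosen dimensions.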

21 Snowflake Schema: Point of Sale
FACT table
– Keys: Time_key, Customer_key, Store_key, Clerk_key, Promo_key, Product_key, Register_key
– Facts: Dollars Sold, Units Sold, Dollars Cost
Dimension tables
– Time, Customer, Store, Clerk, Promo, and Register dimensions, with the same attributes as in the star schema
– Product Dimension: Product_Type_PK, Product_Type_Desc
– The Product Dimension is “snowflaked” into normalized sub-tables: Sub-Type (Sub-Type_PK, Sub-Type-Desc), Model (Model-Num_PK, Model-Desc), and Brand (Brand-ID_PK, Maker-Desc)
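To show what snowflaking costs at query time, here is a simplified `sqlite3` sketch. It assumes the product dimension is normalized into a product → model → brand chain, with made-up rows. The Sub-Type level and the other dimensions are omitted, and the table and column names are hypothetical simplifications of the slide’s labels. Rolling sales up to the maker takes three joins. A star schema that stored the maker in the product dimension would need only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Snowflaked product dimension: attributes moved into normalized sub-tables.
    CREATE TABLE brand   (brand_id INTEGER PRIMARY KEY, maker_desc TEXT);
    CREATE TABLE model   (model_num INTEGER PRIMARY KEY, model_desc TEXT,
                          brand_id INTEGER REFERENCES brand(brand_id));
    CREATE TABLE product (product_key INTEGER PRIMARY KEY, product_type_desc TEXT,
                          model_num INTEGER REFERENCES model(model_num));
    CREATE TABLE sales_fact (product_key INTEGER REFERENCES product(product_key),
                             dollars_sold REAL);
    -- Made-up sample rows.
    INSERT INTO brand   VALUES (1, 'Maker X'), (2, 'Maker Y');
    INSERT INTO model   VALUES (10, 'Model 10', 1), (20, 'Model 20', 2);
    INSERT INTO product VALUES (100, 'Regular', 10), (101, 'Light', 10), (200, 'Regular', 20);
    INSERT INTO sales_fact VALUES (100, 5.0), (101, 7.0), (200, 4.0), (100, 1.0);
""")

# Rolling up to the maker requires walking the whole chain of sub-tables.
SNOWFLAKE_QUERY = """
    SELECT b.maker_desc, SUM(f.dollars_sold)
    FROM sales_fact f
    JOIN product p ON f.product_key = p.product_key
    JOIN model m   ON p.model_num = m.model_num
    JOIN brand b   ON m.brand_id = b.brand_id
    GROUP BY b.maker_desc
    ORDER BY b.maker_desc
"""
rows = conn.execute(SNOWFLAKE_QUERY).fetchall()
print(rows)
```

The snowflake saves storage by not repeating maker descriptions on every product row. The price is longer join paths, which end users and query generators must get right.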

22 Metadata in Dimensional Modeling
NOAA usage: if it’s not a fact and it’s not a key, it’s metadata.
Conventional dimensional usage: if it’s not a fact and it’s not a key, it’s documentation. BUT if it’s a key, it’s metadata (because it describes the fact).

23 Dimensional Models for NOAA
Which
– Satellite
– Instrument
When
– Orbit, UTC, season, decade, epoch, etc.
Where
– Geospatial coordinates
Who
– User affiliation
– Developer affiliation
FACT: How much?
– Temperature, moisture, radiance, color, etc.

24 A NOAA Star Schema?
FACT TABLE
– Keys: Time_key (fk), Location_key (fk), Altitude_key (fk), Product_key (fk), Satellite_key (fk), Instrument_key (fk)
– Fact: Temperature
Dimension tables
– Time Dimension: Time_key, UTC of observation, UTC of receipt, local time of observation, Orbit_Id, etc.
– Location Dimension: Location_key, geo-coordinates of observation, etc.
– Altitude Dimension: Altitude_key, distance above sea level, etc.
– Product Dimension: Product_key, Product Name, Description, System, Sub System, etc.
– Satellite Dimension: Satellite_key, Name, Position
– Instrument Dimension: Instrument_key, Name, Description
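The same pattern can be applied to the proposed NOAA star schema. Here is a minimal `sqlite3` sketch under stated assumptions: only the Time, Satellite, and Instrument dimensions and the Temperature fact are kept. The satellite and instrument names, orbit numbers, timestamps, and temperature values are all hypothetical placeholders, not real observations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical dimension tables (a subset of the proposed schema).
    CREATE TABLE satellite_dim  (satellite_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE instrument_dim (instrument_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE time_dim       (time_key INTEGER PRIMARY KEY, utc_of_obs TEXT, orbit_id INTEGER);
    -- Fact table: each row is one observed temperature, keyed by its dimensions.
    CREATE TABLE obs_fact (
        time_key       INTEGER REFERENCES time_dim(time_key),
        satellite_key  INTEGER REFERENCES satellite_dim(satellite_key),
        instrument_key INTEGER REFERENCES instrument_dim(instrument_key),
        temperature    REAL,
        PRIMARY KEY (time_key, satellite_key, instrument_key)
    );
    -- Placeholder values, not real observations.
    INSERT INTO satellite_dim  VALUES (1, 'SAT-A'), (2, 'SAT-B');
    INSERT INTO instrument_dim VALUES (1, 'Sounder'), (2, 'Imager');
    INSERT INTO time_dim       VALUES (1, '2000-01-01T00:00Z', 101), (2, '2000-01-01T01:40Z', 102);
    INSERT INTO obs_fact VALUES
        (1, 1, 1, 250.0), (2, 1, 1, 254.0),
        (1, 2, 2, 280.0), (2, 2, 2, 284.0);
""")

# "How much?" sliced by "Which": average temperature per satellite and instrument.
QUERY = """
    SELECT s.name, i.name, AVG(f.temperature)
    FROM obs_fact f
    JOIN satellite_dim s  ON f.satellite_key = s.satellite_key
    JOIN instrument_dim i ON f.instrument_key = i.instrument_key
    GROUP BY s.name, i.name
    ORDER BY s.name
"""
rows = conn.execute(QUERY).fetchall()
print(rows)
```

Adding the Location or Altitude dimension would mean adding one key column to `obs_fact` and one join to the query. The shape of the query stays the same.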

25 Pros & Cons of Systems Based on Dimensional Models
Strengths
– Very few “entity types” needed
– Decision Support Systems (DSS)
  – End users construct complex queries by selecting dimensions from a GUI
  – Statistical analysis of very large data bases
– Artificial Intelligence (AI)
  – Automated scheduling of continuous executions
  – The system identifies (“discovers”) new relationships
  – Discoveries shape successive executions
Weaknesses
– Development cost
– Storage
– Operational cost: requires much “care and feeding”
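The DSS strength, where users build queries by picking dimensions, amounts to a group-by over whichever dimension attributes are selected. The following is a minimal Python sketch. It assumes the facts have already been joined to their dimensions. The rows, the `DIMENSIONS` whitelist, and `slice_and_average` are hypothetical stand-ins for what a GUI would generate.

```python
from collections import defaultdict

# Hypothetical fact rows, already joined to their dimension attributes.
FACTS = [
    {"satellite": "SAT-A", "instrument": "Sounder", "season": "Winter", "temperature": 250.0},
    {"satellite": "SAT-A", "instrument": "Sounder", "season": "Summer", "temperature": 270.0},
    {"satellite": "SAT-B", "instrument": "Imager", "season": "Winter", "temperature": 260.0},
    {"satellite": "SAT-B", "instrument": "Imager", "season": "Summer", "temperature": 280.0},
]
# The dimensions a GUI would offer as choices.
DIMENSIONS = {"satellite", "instrument", "season"}

def slice_and_average(facts, selected, measure="temperature"):
    """Group facts by the user-selected dimensions and average the measure."""
    unknown = set(selected) - DIMENSIONS
    if unknown:
        raise ValueError(f"not a dimension: {sorted(unknown)}")
    sums, counts = defaultdict(float), defaultdict(int)
    for row in facts:
        key = tuple(row[d] for d in selected)  # one group per combination of values
        sums[key] += row[measure]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

print(slice_and_average(FACTS, ["season"]))
print(slice_and_average(FACTS, ["satellite", "season"]))
```

Checking the selection against a fixed set of dimensions is what lets a GUI offer free combination without letting users reference arbitrary fields.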

26 False Dichotomy: Relational “vs.” Dimensional
Relational and dimensional systems are not mutually exclusive
– Data warehouses usually extract fact tables from relational data bases
– Data warehouse capabilities are available as extensions in RDBMSs
The choice depends on the business
– Feasibility: Is the application data good enough for ETL?
– ROI: Does the business benefit outweigh the cost?
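The point that warehouses extract fact tables from relational databases can be sketched as a single `INSERT ... SELECT` in Python’s `sqlite3`. This assumes hypothetical normalized OLTP tables (`sale`, `sale_line`) and a fact table whose grain is one row per date, store, and product. All values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical normalized OLTP tables: a sale header and its line items.
    CREATE TABLE sale      (sale_id INTEGER PRIMARY KEY, store_id INTEGER, sale_date TEXT);
    CREATE TABLE sale_line (sale_id INTEGER REFERENCES sale(sale_id),
                            product_id INTEGER, units INTEGER, unit_price REAL);
    INSERT INTO sale VALUES (1, 7, '2000-01-06'), (2, 7, '2000-01-06'), (3, 8, '2000-01-08');
    INSERT INTO sale_line VALUES (1, 100, 2, 3.0), (1, 200, 1, 10.0),
                                 (2, 100, 1, 3.0), (3, 100, 4, 3.0);
    -- Fact table at the grain of one row per date, store, and product.
    CREATE TABLE sales_fact (sale_date TEXT, store_id INTEGER, product_id INTEGER,
                             units_sold INTEGER, dollars_sold REAL,
                             PRIMARY KEY (sale_date, store_id, product_id));
""")

# Transform and load in one statement: join the OLTP tables and aggregate to the grain.
conn.execute("""
    INSERT INTO sales_fact
    SELECT s.sale_date, s.store_id, l.product_id,
           SUM(l.units), SUM(l.units * l.unit_price)
    FROM sale s JOIN sale_line l ON s.sale_id = l.sale_id
    GROUP BY s.sale_date, s.store_id, l.product_id
""")
conn.commit()

for row in conn.execute("SELECT * FROM sales_fact ORDER BY sale_date, store_id, product_id"):
    print(row)
```

The OLTP tables stay normalized for transaction processing, and the fact table is derived from them. This is why the two approaches complement rather than replace each other.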

27 SUMMARY
– NOAA’s data mining challenge is similar to that of other enterprises
– A worldwide community of IT professionals uses a particular vocabulary to address the challenge
– Relational technologies and models are the essential first step
– Dimensional technologies and models come next

28 Questions
Stan Cutler
Mitretek Systems/NESDIS/OSD
301-457-5210 x 163
