The CRM Data Warehouse
I. Introduction to data warehouse
II. Data warehouse architecture
III. Data and process models
I. Introduction to data warehouse
1-1 Definition of data warehouse
1-2 Data warehouses and data marts
1-3 Data warehousing objectives
1-1 Definition of data warehouse
data warehouse: a large reservoir of detailed and summary data that describes the organization and its activities, organized by the various business dimensions in a way that facilitates easy retrieval of information describing those activities
1-2 Data warehouses and data marts
data mart: a subset of the data warehouse, tailored to meet the specialized needs of a particular group of users
bottom-up approach to data warehouse development: the data marts are created first and then integrated
1-3 Data warehousing objectives
five objectives of data warehousing:
(1) keep the warehouse data current
(2) ensure that the warehouse data is accurate
(3) keep the warehouse data secure
(4) make the warehouse data easily available to authorized users
(5) maintain descriptions of the warehouse data so that users and system developers can understand the meaning of each element
II. Data warehouse architecture
staging area: where data is prepared to be moved into the warehouse data repository and the metadata repository
metadata: data about data, or descriptions of the data in the data warehouse
Exhibit 4.1: A Data Warehouse System Model
II. Data warehouse architecture
2-1 Management and control
2-2 Staging area
2-3 Warehouse data repository
2-4 Metadata repository
2-1 Management and control
management and control component: like a traffic officer standing in the middle of a street intersection, controlling the flow of traffic through the intersection
2-2 Staging area
ETL: extraction, transformation, and loading, the activities of the staging area
extraction: obtaining data from the internal databases and files of source systems, according to a schedule
transformation: a process that includes cleaning, standardizing, reformatting, and summarizing
loading: writing the data into the data warehouse
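The three staging-area activities can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production ETL design: the source rows, the field names (`cust_no`, `name`, `amount`), and the in-memory list standing in for the warehouse are all hypothetical.

```python
# Hypothetical source rows, as they might arrive from an operational system.
SOURCE_ROWS = [
    {"cust_no": "00000042", "name": "  alice SMITH ", "amount": "19.50"},
    {"cust_no": "00000042", "name": "Alice Smith", "amount": "5.50"},
    {"cust_no": "00000107", "name": "bob jones", "amount": "12.00"},
    {"cust_no": "", "name": "unknown", "amount": "3.00"},  # fails cleaning
]


def extract(source):
    """Extraction: obtain rows from a source system (here, a list)."""
    return list(source)


def transform(rows):
    """Transformation: clean, standardize, reformat, and summarize."""
    totals = {}
    for row in rows:
        if not row["cust_no"]:               # cleaning: drop rows with no key
            continue
        key = row["cust_no"]
        name = row["name"].strip().title()   # standardizing the name
        amount = float(row["amount"])        # reformatting text to a number
        entry = totals.setdefault(
            key, {"cust_no": key, "name": name, "total": 0.0}
        )
        entry["total"] += amount             # summarizing per customer
    return list(totals.values())


def load(rows, warehouse):
    """Loading: write the prepared rows into the warehouse store."""
    warehouse.extend(rows)
    return len(rows)


warehouse = []  # assumption: a list stands in for the data repository
loaded = load(transform(extract(SOURCE_ROWS)), warehouse)
print(loaded, warehouse)
```

In practice the extraction step would run on a schedule against real databases and files; here it runs once against a list so the sketch stays self-contained.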
2-3 Warehouse data repository (1/4)
warehouse data repository: where the warehouse data is stored within the computer system or systems
Data content
customer picture: a compilation of:
  Geographic data
  Demographic data
  Activity data
  Psychographic data
  Behavior data
2-3 Warehouse data repository (2/4)
Data Characteristics
data characteristics: the types of data to be processed, including considerations of data granularity, data hierarchies, and data dimensions
2-3 Warehouse data repository (3/4)
Data Types
fixed-length format: for example, a customer number element may be specified as a field that is always 8 positions, with the positions always consisting of numeric data
variable-length format: for example, a customer's name, which might vary from 20 to 50 positions depending on name length, or textual data such as comments that a customer might enter
Data Granularity
data granularity: the degree of detail that is represented by the data, where the greater the detail, the finer the granularity
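The difference between the two data types can be illustrated with a pair of validation checks. This is only a sketch: the function names are hypothetical, and treating 50 positions as a hard upper limit on a name is an assumption drawn from the example above rather than a stated rule.

```python
import re

# Fixed-length format: always 8 positions, all numeric.
CUSTOMER_NUMBER_PATTERN = re.compile(r"[0-9]{8}")

# Variable-length format: 50 is the upper figure given in the example;
# enforcing it as a maximum is an assumption made for this sketch.
MAX_NAME_LENGTH = 50


def is_valid_customer_number(value: str) -> bool:
    """True only if the value is exactly 8 numeric positions."""
    return CUSTOMER_NUMBER_PATTERN.fullmatch(value) is not None


def is_valid_customer_name(value: str) -> bool:
    """True if the name is non-blank and no longer than the maximum."""
    return 0 < len(value.strip()) <= MAX_NAME_LENGTH


print(is_valid_customer_number("00012345"))
print(is_valid_customer_number("12345"))
print(is_valid_customer_name("Alice Smith"))
```

A fixed-length field can be checked against one exact pattern, while a variable-length field can only be checked against bounds.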
2-3 Warehouse data repository (4/4)
Data Hierarchies
data hierarchy: multiple attributes can describe a single entity, where an attribute is a data element that identifies or describes an occurrence of a data entity (e.g., a particular customer is identified by a customer number attribute)
Exhibit 4.2: An Example of a Data Hierarchy
Data Dimensions
dimensional structure: for example, a manager can query the data warehouse for a display of data according to salesperson, customer, product, and time
Exhibit 4.3: Every Data Record Contains the Time Element
2-4 Metadata repository
metadata repository: describes the flow of data from the time that the data is captured until it is archived; e.g., metadata in the metadata repository for the customer number attribute would describe its format, editing rules, and so on
Types of metadata
  Metadata for users
  Metadata for systems developers
  Sources of metadata
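As an illustration of what a metadata entry for the customer number attribute might hold, the sketch below models one record. The class name, field names, and all values are hypothetical; the text only says that such metadata would describe the attribute's format, editing rules, and so on.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeMetadata:
    """Hypothetical metadata record for one warehouse attribute."""
    name: str
    data_format: str
    editing_rules: list = field(default_factory=list)


# Illustrative values only; not taken from any real repository.
customer_number_meta = AttributeMetadata(
    name="customer_number",
    data_format="fixed length, 8 numeric positions",
    editing_rules=["required", "digits only"],
)


def describe(meta: AttributeMetadata) -> str:
    """Render the metadata as a one-line description for a user."""
    rules = ", ".join(meta.editing_rules)
    return f"{meta.name}: {meta.data_format}; rules: {rules}"


print(describe(customer_number_meta))
```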
Types of metadata
Metadata for Users
user metadata examples: identification of the source systems, the time of the last update, the different report formats that are available, and how to find data in the data warehouse
Metadata for Systems Developers
system developer metadata examples: data to allow developers to maintain, revise, and reengineer the data warehouse system, including the various rules that were employed in creating the warehouse data repository, and the rules for extraction, cleansing, transforming, purging, and archiving
Sources of Metadata
metadata sources: occur naturally as a byproduct of the organization's previous and ongoing systems development efforts; can come from data and process models, CASE tools, and database management systems
III. Data and process models
data model examples: object diagrams and entity-relationship diagrams
process model examples: use cases, use case diagrams, and data flow diagrams
CASE Tools
CASE: stands for computer-aided system engineering and is a way to use the computer to develop systems
DBMS Systems
database management systems: include a data dictionary component, which contains excellent descriptions of the data in the database or data warehouse
III. Data and process models
3-1 How data is stored in data warehouse
3-2 Information packages
3-3 Data warehouse navigation
3-4 Data warehouse security
3-1 How data is stored in data warehouse
dimension table: a list of all of the attributes that identify and describe a particular entity
Exhibit 4.4: A Sample Dimension Table
fact table: a list of all the facts that relate to some type of the organization's activity
Exhibit 4.5: A Sample Fact Table
3-2 Information packages
information package: a table that is maintained in the data warehouse repository that identifies both the dimensions and the facts that relate to a business activity
Exhibit 4.6: Information Package Format
Exhibit 4.7: A Sample Information Package
3-2 Information packages
star schema: the arrangement of an information package that usually identifies multiple dimension tables for a single fact table and has the appearance of a star, with the fact table in the center and the dimension tables forming the points
Exhibit 4.8: Star Schema Format
Exhibit 4.9: A Sample Star Schema
key: a number, such as a customer number, that identifies a particular occurrence of the dimension
foreign keys: a means of linking the fact table to the dimension tables through the keys identified at the top of the fact table; the keys identify other, “foreign” tables as opposed to the fact table
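A star schema can be sketched with Python's built-in `sqlite3` module. The table names, column names, and sample rows below are assumptions made for illustration and are not the contents of Exhibits 4.8 or 4.9. One fact table sits in the center, and each of its foreign keys points to a dimension table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per occurrence, identified by a key.
CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE product_dim  (product_key  INTEGER PRIMARY KEY, product_name  TEXT);
CREATE TABLE time_dim     (time_key     INTEGER PRIMARY KEY, month         TEXT);

-- Fact table: foreign keys first, then the facts themselves.
CREATE TABLE sales_fact (
    customer_key INTEGER REFERENCES customer_dim(customer_key),
    product_key  INTEGER REFERENCES product_dim(product_key),
    time_key     INTEGER REFERENCES time_dim(time_key),
    units        INTEGER,
    revenue      REAL
);

INSERT INTO customer_dim VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO product_dim  VALUES (10, 'Widget'), (20, 'Gadget');
INSERT INTO time_dim     VALUES (100, '2024-01'), (101, '2024-02');
INSERT INTO sales_fact VALUES
    (1, 10, 100, 5, 50.0),
    (1, 20, 101, 2, 80.0),
    (2, 10, 100, 3, 30.0);
""")

# Follow the customer foreign key from the fact table to its dimension.
rows = conn.execute("""
    SELECT c.customer_name, SUM(f.revenue)
    FROM sales_fact AS f
    JOIN customer_dim AS c ON c.customer_key = f.customer_key
    GROUP BY c.customer_name
    ORDER BY c.customer_name
""").fetchall()
print(rows)
```

The same join pattern works for the product and time dimensions, which is what gives the schema its star shape.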
3-3 Data warehouse navigation
Dr. Codd's 1993 guidelines for OLAP addressed ease of use
3-3 Data warehouse navigation
summary information: preprocessed data that provides the user with exactly the content that is needed
top-down navigation: the user seeks more detail in an effort to understand the summary information
roll up navigation: the user summarizes data to “see the forest rather than the trees” or to prepare summary graphs
drill across navigation: the user moves from one data hierarchy to another, e.g., from information on customer sales to salesperson sales and then product sales
Exhibit 4.10: Navigation Paths
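The navigation paths can be illustrated on a small set of detail records. This is a sketch under assumptions: the records and the function names are hypothetical, and `drill_down` is simply a name chosen here for what the text calls top-down navigation.

```python
from collections import defaultdict

# Hypothetical detail records; each one carries the time element.
SALES = [
    {"customer": "Acme", "salesperson": "Lee", "product": "Widget",
     "month": "2024-01", "amount": 50},
    {"customer": "Acme", "salesperson": "Kim", "product": "Gadget",
     "month": "2024-02", "amount": 80},
    {"customer": "Globex", "salesperson": "Lee", "product": "Widget",
     "month": "2024-01", "amount": 30},
]


def roll_up(records, dimension):
    """Roll up: summarize the detail records along one dimension."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec[dimension]] += rec["amount"]
    return dict(totals)


def drill_down(records, dimension, value):
    """Top-down: return the detail records behind one summary figure."""
    return [rec for rec in records if rec[dimension] == value]


by_customer = roll_up(SALES, "customer")              # summary view
acme_detail = drill_down(SALES, "customer", "Acme")   # more detail
by_salesperson = roll_up(SALES, "salesperson")        # drill across
by_product = roll_up(SALES, "product")                # drill across again
print(by_customer, by_salesperson, by_product)
```

Drilling across reuses the same summary operation on a different dimension, so the user moves sideways between hierarchies rather than up or down within one.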
3-4 Data warehouse security (1/5)
information systems security: those measures that are instituted to reduce or eliminate the risks that information systems face, including such acts as damage, destruction, theft, and misuse
Exhibit 4.11: The Security Action Cycle
3-4 Data warehouse security (2/5)
The Corporate Security Environment (1/2)
deterrence: security policies and procedures that are intended to deter security violations, such as guidelines for proper system use and the requirement that users change their passwords periodically
prevention: measures aimed at those persons who ignore deterrence, including such things as locks on computer rooms, user passwords, file permissions, or hiring outside consultants to “break into” the system
3-4 Data warehouse security (3/5)
The Corporate Security Environment (2/2)
detection: an alert to breaches or (ideally) potential breaches in security; proactive actions include system audits, reports of suspicious activity, and virus scanning software, while reactive actions take the form of investigations
remedies: using knowledge that a security breach has occurred and who committed it, the organization can respond with warnings, reprimands, termination of employment, or legal action
3-4 Data warehouse security (4/5)
onion measures: to access the data, you have to go through the various security layers that protect the network, files, and the database or data warehouse
network security: using procedures such as firewalls to restrict access to the network that houses the servers and data files, databases, data warehouses, and data marts
3-4 Data warehouse security (5/5)
data security: controls on obtaining access to data once access to the network has been achieved; data files may be located on multiple servers on the network, and the user must provide a second password and also be screened in terms of which files may be made available and/or which operations can be performed on the file data
database or data warehouse security: the security checks of the database management system (DBMS) that may include a third password, verification of user name, and also verification of access to particular data tables, records, and even record fields
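The layered ("onion") checks described on the last two slides can be sketched as a sequence of gates, each of which must be passed before the next is tried. Everything here is hypothetical: the user, passwords, file and table names, and the function `request_access` are made up for illustration, and a real system would store password hashes rather than plain text.

```python
# All users, passwords, and permissions below are made up for illustration.
NETWORK_USERS = {"jdoe": "net-pw"}
FILE_USERS = {"jdoe": "file-pw"}
FILE_RIGHTS = {("jdoe", "sales.dat"): {"read"}}
DBMS_USERS = {"jdoe": "db-pw"}
TABLE_RIGHTS = {"jdoe": {"sales_fact"}}


def _password_ok(store, user, password):
    """True only if the user is known and the password matches."""
    return user in store and store[user] == password


def request_access(user, net_pw, file_pw, db_pw, file_name, operation, table):
    """Walk inward through the layers; stop at the first one that refuses."""
    # Layer 1: network security.
    if not _password_ok(NETWORK_USERS, user, net_pw):
        return "denied: network"
    # Layer 2: data security (second password, file and operation screening).
    if not _password_ok(FILE_USERS, user, file_pw):
        return "denied: data"
    if operation not in FILE_RIGHTS.get((user, file_name), set()):
        return "denied: data"
    # Layer 3: DBMS security (third password, table-level check).
    if not _password_ok(DBMS_USERS, user, db_pw):
        return "denied: dbms"
    if table not in TABLE_RIGHTS.get(user, set()):
        return "denied: dbms"
    return "granted"


print(request_access("jdoe", "net-pw", "file-pw", "db-pw",
                     "sales.dat", "read", "sales_fact"))
```

Returning the name of the refusing layer makes it easy to see how far a request got, which is the point of layering the checks.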