Decision Support System Course

Dr. Aref Rashad, Part 3: Data Component, February 2013 (Decision Support Systems Course)

Components of a DSS

A values matrix helps DSS designers determine what information to include.

Characteristics of Useful Information
• Timeliness
• Sufficiency
• Level of Detail and Aggregation
• Redundancy
• Understandability
• Freedom from Bias
• Reliability
• Decision Relevance
• Cost Efficiency
• Comparability
• Quantifiability
• Appropriateness of Format

Timeliness of Data
Timeliness addresses whether the information is available to the decision maker soon enough for it to be meaningful.

Sufficiency, Level of Detail, Understandability
Sufficiency: whether the data are adequate to support the decision under consideration.
Level of Detail: the aggregation level of the data is also an important factor in determining the usefulness of information in a DSS.
Understandability: the key is to simplify the representation in the database without losing the meaning of the data.

Freedom from Bias, Decision Relevance, Comparability
Freedom from Bias: the designer should avoid biasing the analyses wherever possible. Bias can be caused by a wide variety of problems in the data, such as non-representativeness with regard to time horizon, variables, comparability, or sampling procedures.
Decision Relevance: perhaps the most obvious issue to consider when building a database is the relevance of the information to the choices under consideration.
Comparability: when deciding whether data are valuable, we need to assess whether they can be compared to other relevant data. Comparable means that, in important ways, measurement conditions have been held constant.

Reliability, Redundancy, Cost Efficiency
Reliability: decision makers will assume that data are correct if they are included in the database, so designers need to ensure that they are accurate. They should verify the input of data and the integrity of the database.
Redundancy: in a perfect world, the less information is repeated, the less storage is used. This goal is laudable as long as it does not limit the user's ability to link data from multiple sources.
Cost Efficiency: the benefit of improved decision-making capability must outweigh the cost of providing it, or there is no advantage in the improvement. Said differently, data are cost efficient in a database only if the value of the changed decision behavior associated with acting on them remains positive after the cost of obtaining those data is subtracted.
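The cost-efficiency test is simple arithmetic. The Python sketch below expresses it, assuming the decision maker can estimate the value of the decision outcome with and without the data; the function names and the figures in the usage lines are hypothetical.

```python
def net_value_of_data(value_with_data: float,
                      value_without_data: float,
                      cost_of_data: float) -> float:
    """Improvement in decision value from acting on the data, minus the cost of obtaining it."""
    improvement = value_with_data - value_without_data
    return improvement - cost_of_data


def is_cost_efficient(value_with_data: float,
                      value_without_data: float,
                      cost_of_data: float) -> bool:
    """Data earn a place in the database only if the net value is positive."""
    return net_value_of_data(value_with_data, value_without_data, cost_of_data) > 0


if __name__ == "__main__":
    # Hypothetical figures: a decision worth 120,000 with the data and 100,000
    # without it, where the data cost 5,000 to acquire and maintain.
    print(net_value_of_data(120_000, 100_000, 5_000))  # 15000
    # Data that do not change the decision are never cost efficient.
    print(is_cost_efficient(100_000, 100_000, 5_000))  # False
```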

Quantifiability, Appropriateness of Format
Quantifiability: quantifiability does not assume that all valuable measures are quantified. Rather, it means the data are quantified at the appropriate level and that only appropriate operations can be performed on them. The level of quantification, referred to as the scale, dictates the types of meaningful mathematical operations that can be performed with the data.
Appropriateness of Format: the final determinant of the value of information is whether it is displayed in an appropriate fashion. This refers to the medium used for presentation, the order in which data are presented to the decision maker, and the amount of graphics used.
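The slide does not name the scales. A minimal sketch, assuming Stevens' four levels of measurement (nominal, ordinal, interval, ratio), shows how a DSS could refuse operations that are not meaningful for a given scale; the names `Scale`, `MIN_SCALE`, and `is_meaningful` are hypothetical.

```python
from enum import IntEnum


class Scale(IntEnum):
    """Stevens' levels of measurement, weakest to strongest (assumed; the slide says only 'scale')."""
    NOMINAL = 1   # categories only, e.g. product colour
    ORDINAL = 2   # ordered categories, e.g. customer satisfaction rank
    INTERVAL = 3  # equal intervals, arbitrary zero, e.g. temperature in Celsius
    RATIO = 4     # true zero, e.g. units sold


# Weakest scale at which each operation becomes meaningful.
MIN_SCALE = {
    "count": Scale.NOMINAL,
    "mode": Scale.NOMINAL,
    "median": Scale.ORDINAL,
    "difference": Scale.INTERVAL,
    "mean": Scale.INTERVAL,
    "ratio": Scale.RATIO,
}


def is_meaningful(operation: str, scale: Scale) -> bool:
    """Return True if `operation` is meaningful for data measured on `scale`."""
    return scale >= MIN_SCALE[operation]


if __name__ == "__main__":
    print(is_meaningful("mean", Scale.ORDINAL))  # False: ranks are not averaged
    print(is_meaningful("ratio", Scale.RATIO))   # True: 200 units is twice 100 units
```

Because each stronger scale supports every operation of the weaker ones, a single ordered enum is enough to encode the rule.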

Data Sources
• Access needed to multiple sources, often enterprise-wide
• Disparate and heterogeneous databases
• XML becoming the language standard
• Web
• Intelligent agents
• Document management systems
• Content management systems
• Commercial databases: vendors sell access to specialized databases

Databases
Databases are collections of interrelated data. The goal behind the database concept is to store related data together in a format independent of the DSS. The data are linked so that information from different physical locations on the storage medium can be joined for transmission to the users' screens with a minimum of trouble.

Database Management Systems
The DBMS serves as a buffer between the needs of the applications and the physical storage of the data. It captures and extracts data from the appropriate physical location and feeds them to the application program in the manner requested. A DBMS:
• Is a software program that supplements the operating system
• Manages data
• Queries data and generates reports
• Provides data security
• Combines with a modeling language for the construction of a DSS
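To illustrate the DBMS acting as a buffer, the sketch below uses Python's built-in `sqlite3` module as a stand-in DBMS: the application states which data it wants in SQL, and the DBMS decides where the rows live and how to fetch them. The `sales` table and its rows are hypothetical.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database; physical placement of rows is the DBMS's concern."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, units INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("East", "Milk", 30), ("East", "Bread", 12), ("West", "Milk", 25)],
    )
    conn.commit()
    return conn


def units_by_region(conn: sqlite3.Connection) -> dict:
    """The application says *what* it wants; the DBMS works out *how* to retrieve it."""
    rows = conn.execute(
        "SELECT region, SUM(units) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = build_demo_db()
    print(units_by_region(conn))  # {'East': 42, 'West': 25}
    conn.close()
```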

Database Models
• Hierarchical: top down, like an inverted tree. Fields have only one “parent”, and each “parent” can have multiple “children”. Fast.
• Network: relationships are created through linked lists, using pointers. “Children” can have multiple “parents”. Greater flexibility, but substantial overhead.
• Relational: flat, two-dimensional tables with multiple access queries. Examines relations between multiple tables. Flexible, quick, and extendable, with data independence.
• Object oriented: data analyzed at the conceptual level. Inheritance, abstraction, encapsulation.

Enterprise Data Model

Data Warehouse
A data warehouse is a database management system that exists separately from the operational systems. Unlike the operational data, it is subject-oriented, time-variant, and integrated. It is nonvolatile and hence able to support a variety of analyses consistently.
The difficult steps in building a data warehouse are deciding:
• What data are relevant to particular decisions
• How the data should be represented and blended
• How to ensure they are meaningful, consistent, and accurate
The goal of the data warehouse is to bring together data from a variety of sources and merge them in a way that makes them useful for decision makers.

Data Warehouse
• Subject oriented
• Scrubbed so that data from heterogeneous sources are standardized
• Time series; no current status
• Nonvolatile: read only
• Summarized
• Not normalized; may be redundant
• Data from both internal and external sources are present
• Metadata included (data about data): business metadata, semantic metadata

Process of Building a Data Warehouse

Data Scrubbing
The first step in building the data warehouse is to load data from the disparate databases. The next step is to scrub, or clean, the data:
• Eliminate problems of misspelling, transposition of letters, variations in spelling, and typographical errors.
• Identify poorly documented data.
• Remove duplicate records.
• Remove obsolete data.
• Identify records not using corporate standards for coding.

Data Scrubbing
• Identify missing or inconsistent data.
• Merge third-party information.
• Remove spurious and invalid records.
• Enrich data with attributes.
• Validate data (especially with external databases).
• Identify and tag similar records suspected to be duplicates.
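Several of the scrubbing steps can be sketched in a few lines of Python. This is a toy illustration, not a production cleansing tool: it assumes a hypothetical corporate coding standard (`CANONICAL_CITY`), normalizes spelling variants, drops exact duplicates, sets aside records with missing fields, and tags near-identical names as suspected duplicates using the standard library's `difflib.SequenceMatcher`. The similarity threshold of 0.85 is an arbitrary assumption.

```python
from difflib import SequenceMatcher

# Hypothetical corporate coding standard: spelling variants -> canonical value.
CANONICAL_CITY = {"new york": "New York", "ny": "New York", "nyc": "New York",
                  "los angeles": "Los Angeles", "la": "Los Angeles"}


def normalize(record: dict) -> dict:
    """Collapse whitespace, fix capitalization, and map city variants to the standard."""
    city = (record.get("city") or "").strip()
    return {
        "name": " ".join((record.get("name") or "").split()).title(),
        "city": CANONICAL_CITY.get(city.lower(), city),
    }


def scrub(records: list, similarity: float = 0.85):
    """Return (clean, missing, suspects).

    clean:    normalized records with exact duplicates removed
    missing:  normalized records lacking a required field
    suspects: pairs of names similar enough to be tagged for review
    """
    clean, missing, seen = [], [], set()
    for raw in records:
        rec = normalize(raw)
        if not rec["name"] or not rec["city"]:
            missing.append(rec)
            continue
        key = (rec["name"], rec["city"])
        if key in seen:  # exact duplicate once normalized
            continue
        seen.add(key)
        clean.append(rec)

    suspects = []
    for i in range(len(clean)):
        for j in range(i + 1, len(clean)):
            a, b = clean[i]["name"], clean[j]["name"]
            if SequenceMatcher(None, a, b).ratio() >= similarity:
                suspects.append((a, b))
    return clean, missing, suspects


if __name__ == "__main__":
    raw = [
        {"name": "john  smith", "city": "NYC"},
        {"name": "John Smith", "city": "new york"},
        {"name": "Jon Smith", "city": "New York"},
        {"name": "Mary Jones", "city": ""},
    ]
    clean, missing, suspects = scrub(raw)
    print(len(clean), missing, suspects)
```

The first two raw records collapse into one after normalization, "Mary Jones" is set aside for missing data, and "John Smith"/"Jon Smith" are tagged rather than merged, leaving the final decision to a person.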

Data Adjustment
In a data warehouse we need to know not only the data at any given point in time but also how they relate to data at other points in time. Examples:
• Currency: needs to be consistent.
• Additional dimensions: provide dimensions that might make analyses richer.
• Time: needs to be included in the data warehouse.
The goal of these adjustments is to provide the best picture of the organization, its customers, suppliers, and competitors, and as many other outside influences as possible, so that the analyses are as reliable as possible.
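A minimal sketch of two of these adjustments, assuming a warehouse that reports in USD and a hypothetical table of monthly conversion rates (real rates would come from an external source): every incoming sale is converted into the single reporting currency and keeps its date, so the same foreign-currency amount can map to different values in different months.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical conversion rates into the reporting currency (USD), keyed by (currency, month).
RATES_TO_USD = {
    ("EUR", "2013-01"): 1.30,
    ("EUR", "2013-02"): 1.35,
    ("USD", "2013-01"): 1.00,
    ("USD", "2013-02"): 1.00,
}


@dataclass(frozen=True)
class SaleFact:
    sale_date: date        # time is stored with every warehouse row
    amount_usd: float      # all amounts expressed in one consistent currency
    source_currency: str   # kept for traceability back to the operational system


def adjust(sale_date: date, amount: float, currency: str) -> SaleFact:
    """Convert an operational sale into the warehouse's reporting currency, keeping its date."""
    rate = RATES_TO_USD[(currency, sale_date.strftime("%Y-%m"))]
    return SaleFact(sale_date, round(amount * rate, 2), currency)


if __name__ == "__main__":
    print(adjust(date(2013, 1, 20), 100.0, "EUR").amount_usd)  # 130.0
    print(adjust(date(2013, 2, 14), 100.0, "EUR").amount_usd)  # 135.0
```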

Architecture
A data warehouse may have one or more tiers, determined by how the application server, the data server, and the client workstation are distributed.
• One tier, where all run on the same platform, is rare.
• Two tier usually combines the application server and the data server.
• Three tier separates these functional parts.

Data Warehouse Tasks

Online Analytical Processing (OLAP)
OLAP is the interactive analysis of data, allowing data to be summarized and viewed in different ways online.
• Data that can be modeled as dimension attributes and measure attributes are called multidimensional data.
• Measure attributes measure some value that can be aggregated, e.g. the attribute number of the sales relation.
• Dimension attributes define the dimensions on which measure attributes (or aggregates thereof) are viewed, e.g. the attributes item_name, color, and size of the sales relation.

• Dimensions: Time, Product, Store
• Attributes: Product (upc, price, …), Store …
• Hierarchies: Product → Brand → …; Day → Week → Quarter; Store → Region → Country

Online Analytical Processing (OLAP)
• Pivoting: changing the dimensions used in a cross-tabulation
• Dicing: defining dimension increments
• Slicing: creating a cross-tab for fixed values only
• Rollup: moving from finer-granularity data to a coarser granularity
• Drill down: the opposite operation, moving from coarser-granularity data to finer-granularity data
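Slicing, rollup, drill down, and pivoting can be sketched over a few fact rows in plain Python. Here `units` is the measure attribute and day, week, store, and product are dimension attributes; the day-to-week mapping encodes one step of the Day → Week → Quarter hierarchy. All rows and function names are hypothetical, and dicing (defining dimension increments) is not shown.

```python
from collections import defaultdict

# Hypothetical fact rows: (day, week, store, product, units).
FACTS = [
    ("Feb 04", "W1", "LA", "Bread", 56),
    ("Feb 05", "W1", "LA", "Milk", 10),
    ("Feb 04", "W1", "NY", "Bread", 34),
    ("Feb 11", "W2", "NY", "Milk", 12),
]
FIELDS = ("day", "week", "store", "product", "units")


def as_dicts(rows):
    return [dict(zip(FIELDS, r)) for r in rows]


def slice_rows(rows, **fixed):
    """Slice: keep only rows whose dimensions equal the fixed values, e.g. store='LA'."""
    return [r for r in rows if all(r[k] == v for k, v in fixed.items())]


def rollup(rows, dims):
    """Sum the measure at the level of `dims`; coarser dims roll up, finer dims drill down."""
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r["units"]
    return dict(totals)


def crosstab(rows, row_dim, col_dim):
    """Cross-tabulation; pivoting means swapping row_dim and col_dim."""
    table = defaultdict(lambda: defaultdict(int))
    for r in rows:
        table[r[row_dim]][r[col_dim]] += r["units"]
    return {k: dict(v) for k, v in table.items()}


if __name__ == "__main__":
    rows = as_dicts(FACTS)
    print(rollup(rows, ["week"]))                # roll up day -> week
    print(rollup(rows, ["day"]))                 # drill down week -> day
    print(len(slice_rows(rows, store="LA")))     # 2
    print(crosstab(rows, "product", "store"))    # pivot: products down, stores across
```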


OLAP Implementation
• OLAP implementations that use only relational database features are called relational OLAP (ROLAP) systems.
• OLAP systems that use multidimensional arrays in memory to store data cubes are referred to as multidimensional OLAP (MOLAP) systems.
• Hybrid systems, which store some summaries in memory and store the base data and other summaries in a relational database, are called hybrid OLAP (HOLAP) systems.

Star Schema (in RDBMS)

Star Schema Example

Star Schema with Sample Data

Points to be noticed about ROLAP
• Defines complex, multidimensional data with a simple model
• Reduces the number of joins a query has to process
• Allows the data warehouse to evolve with relatively low maintenance
• Can contain both detailed and summarized data
• Is based on familiar, proven, and already selected technologies
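A star schema in an RDBMS can be sketched with Python's `sqlite3`: one fact table holds the measure plus a foreign key to each dimension table, so every dimension is a single join away from the facts. Table names, brand names, and rows are hypothetical; the dimensions and hierarchy levels follow the Time, Product, Store example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, day TEXT, week TEXT, quarter TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, brand TEXT, price REAL);
CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, city TEXT, region TEXT, country TEXT);
-- The fact table: one foreign key per dimension plus the measure.
CREATE TABLE fact_sales  (time_id INTEGER REFERENCES dim_time,
                          product_id INTEGER REFERENCES dim_product,
                          store_id INTEGER REFERENCES dim_store,
                          units INTEGER);
"""


def build_star() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?, ?)",
                     [(1, "Mon", "W1", "Q1"), (2, "Tue", "W1", "Q1")])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?, ?)",
                     [(1, "Bread", "BakeCo", 2.5), (2, "Milk", "DairyCo", 1.2)])
    conn.executemany("INSERT INTO dim_store VALUES (?, ?, ?, ?)",
                     [(1, "LA", "West", "USA"), (2, "NY", "East", "USA")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                     [(1, 1, 1, 56), (2, 2, 1, 10), (1, 1, 2, 34)])
    return conn


# Rolling up to region x brand needs only two joins, one per dimension used.
ROLLUP_SQL = """
SELECT s.region, p.brand, SUM(f.units)
FROM fact_sales f
JOIN dim_store s   ON f.store_id = s.store_id
JOIN dim_product p ON f.product_id = p.product_id
GROUP BY s.region, p.brand
ORDER BY s.region, p.brand
"""

if __name__ == "__main__":
    conn = build_star()
    for row in conn.execute(ROLLUP_SQL):
        print(row)
    conn.close()
```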

MOLAP: Dimensional Modeling Using the Multidimensional Model
• MDDB: a special-purpose data model
• Facts stored in multidimensional arrays
• Dimensions used to index the array
• Sometimes on top of a relational DB
• Products: Pilot, Arbor Essbase, Gentia

MOLAP

Data Cube
• A cube can have n dimensions; tables can be used as views on a data cube.
• Dimensions: Time, Product, Store
• Attributes: Product (upc, price, …), Store …
• Hierarchies: Product → Brand → …; Day → Week → Quarter; Store → Region → Country
[Figure: a cube with axes Product (Juice, Milk, Coke, Cream, Soap, Bread), Store (NY, SF, LA), and Time (M, T, W, Th, F, S, S), annotated with roll-up to brand, roll-up to region, and roll-up to week. Example cell: 56 units of bread sold in LA on Monday.]
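The MOLAP idea of facts stored in an array indexed by dimensions can be sketched with nested Python lists. Dimension members are mapped to array positions, so a lookup such as "bread, LA, Monday" is direct indexing rather than a join. The member lists follow the figure, except that the two "S" days are relabeled "Sa" and "Su" (an assumption, so that each index is unique); the second cell value is hypothetical.

```python
PRODUCTS = ["Juice", "Milk", "Coke", "Cream", "Soap", "Bread"]
STORES = ["NY", "SF", "LA"]
DAYS = ["M", "T", "W", "Th", "F", "Sa", "Su"]  # "Sa"/"Su" distinguish the figure's two "S" labels

# Dimensions index the array: each member maps to a position.
P_IDX = {name: i for i, name in enumerate(PRODUCTS)}
S_IDX = {name: i for i, name in enumerate(STORES)}
D_IDX = {name: i for i, name in enumerate(DAYS)}


def empty_cube():
    """A dense product x store x day array of unit counts, all zero."""
    return [[[0] * len(DAYS) for _ in STORES] for _ in PRODUCTS]


def set_cell(cube, product, store, day, units):
    cube[P_IDX[product]][S_IDX[store]][D_IDX[day]] = units


def get_cell(cube, product, store, day):
    return cube[P_IDX[product]][S_IDX[store]][D_IDX[day]]


def product_by_store(cube, product):
    """Roll up over Time: total units of one product in each store."""
    plane = cube[P_IDX[product]]
    return {store: sum(plane[S_IDX[store]]) for store in STORES}


if __name__ == "__main__":
    cube = empty_cube()
    set_cell(cube, "Bread", "LA", "M", 56)   # the cell highlighted in the figure
    set_cell(cube, "Bread", "NY", "T", 10)   # hypothetical value
    print(get_cell(cube, "Bread", "LA", "M"))  # 56
    print(product_by_store(cube, "Bread"))     # {'NY': 10, 'SF': 0, 'LA': 56}
```

Note that the dense array reserves a cell for every combination even when nothing was sold, which is one source of the storage overhead discussed for MOLAP.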

Dicing & slicing

Points to be noticed about MOLAP
• Pre-calculating or pre-consolidating transactional data improves speed.
• However, to fully pre-consolidate incoming data, MDDs require an enormous amount of overhead in both processing time and storage; an input file of 200 MB can easily expand to 5 GB.
• MDDs are good candidates for departmental data marts under 50 GB.
• They support rolling up and drilling down through aggregate data.
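One way to see why full pre-consolidation is expensive: each dimension can be grouped at any of its hierarchy levels or aggregated away entirely ("all"), so the number of group-bys (cuboids) to materialize is the product of (levels + 1) over the dimensions. The sketch assumes the hierarchies from the Dimensions slide, treating Product → Brand as two levels because the slide elides any further ones. It counts group-bys, not cells; actual storage also depends on the number of members and on sparsity.

```python
from itertools import product
from math import prod

# Hierarchy levels per dimension, finest first (Product's upper levels are elided on the slide).
HIERARCHIES = {
    "Product": ["product", "brand"],
    "Time": ["day", "week", "quarter"],
    "Store": ["store", "region", "country"],
}


def count_cuboids(hierarchies: dict) -> int:
    """Each dimension is grouped at one of its levels or fully aggregated ('all')."""
    return prod(len(levels) + 1 for levels in hierarchies.values())


def list_cuboids(hierarchies: dict) -> list:
    """Every group-by a fully pre-consolidated cube would have to compute and store."""
    choices = [levels + ["all"] for levels in hierarchies.values()]
    return list(product(*choices))


if __name__ == "__main__":
    print(count_cuboids(HIERARCHIES))                          # 48
    print(("all", "all", "all") in list_cuboids(HIERARCHIES))  # True: the grand total
```

Only one of the 48 group-bys is the base data itself; the other 47 are pre-computed summaries.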

HOLAP: Hybrid OLAP
HOLAP aims for the best of both worlds:
• Storing detailed data in an RDBMS
• Storing aggregated data in an MDBMS
• User access via MOLAP tools
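A toy sketch of the hybrid split, under simplifying assumptions: an in-memory dictionary stands in for the MDBMS holding pre-aggregated region totals, and an `sqlite3` table stands in for the RDBMS holding detailed rows. Aggregate questions are answered from the summary, while detailed questions reach through to the relational store. The class name `HybridStore` and the data are hypothetical.

```python
import sqlite3
from collections import defaultdict


class HybridStore:
    """Toy HOLAP: detail rows in a relational table, region totals pre-aggregated in memory."""

    def __init__(self, rows):
        # Relational side: the detailed data.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE sales (region TEXT, store TEXT, units INTEGER)")
        self.db.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        # Multidimensional side: summaries computed once at load time.
        self.region_totals = defaultdict(int)
        for region, _store, units in rows:
            self.region_totals[region] += units

    def total(self, region):
        """Aggregated query: answered from the in-memory summary."""
        return self.region_totals.get(region, 0)

    def detail(self, region):
        """Detailed query: reach through to the relational store."""
        return self.db.execute(
            "SELECT store, units FROM sales WHERE region = ? ORDER BY store", (region,)
        ).fetchall()


if __name__ == "__main__":
    h = HybridStore([("West", "LA", 56), ("West", "SF", 20), ("East", "NY", 34)])
    print(h.total("West"))   # 76
    print(h.detail("West"))  # [('LA', 56), ('SF', 20)]
```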

Data Flow in HOLAP
[Figure: data flow among an RDBMS server (user data, metadata, derived data), an MDBMS server (multidimensional data), and clients using a multidimensional viewer and a relational viewer; connections are labelled SQL-Read, SQL-Reach Through, and multidimensional access.]

When deciding which technology to go for, consider:
1) Performance: how fast will the system appear to the end user? MDD server vendors believe this is a key point in their favor.
2) Data volume and scalability: while MDD servers can handle up to 50 GB of storage, RDBMS servers can handle hundreds of gigabytes and terabytes.

What-if analysis
IF
A. You require write access
B. Your data is under 50 GB
C. Your timetable to implement is days
D. The lowest level is already aggregated
E. Data access is on the aggregated level
F. You’re developing a general-purpose application for inventory movement or asset management
THEN consider an MDD/MOLAP solution for your data mart.
IF
A. Your data is over 100 GB
B. You have a "read-only" requirement
C. You need historical data at the lowest level of granularity
D. You need detailed access and long-running queries
E. Data are assigned to lowest-level elements
THEN consider an RDBMS/ROLAP solution for your data mart.
IF
A. You need OLAP on aggregated and detailed data
B. You have different user groups
C. You need ease of use and detailed data
THEN consider HOLAP for your data mart.

Examples
• ROLAP: telecommunication startup (call data records, CDRs); e-commerce site; credit card company
• MOLAP: analysis and budgeting in a financial department; sales analysis
• HOLAP: sales department of a multinational company; banks and financial service providers

Tools available
• ROLAP: ORACLE 8i; ORACLE Reports; ORACLE Discoverer; ORACLE Warehouse Builder; MicroStrategy’s DSS Server; Platinum Technologies’ Platinum InfoBeacon
• MOLAP: ORACLE Express Server; ORACLE Express Clients (C/S and Web); Arbor Software’s Essbase
• HOLAP: ORACLE Express Server; ORACLE Relational Access Manager

Conclusion
• ROLAP: RDBMS -> star/snowflake schema
• MOLAP: MDD -> cube structures
• ROLAP vs. MOLAP: the data models used play a major role in performance differences
• MOLAP: for summarized and relatively smaller volumes of data (10-50 GB)
• ROLAP: for detailed and larger volumes of data
• Both storage methods have strengths and weaknesses
• The choice is requirement specific, though currently data warehouses are predominantly built using RDBMSs/ROLAP.

Data Mining vs OLAP

Data Mining
Data mining is the process of semi-automatically analyzing large databases to find useful patterns.
Prediction based on past history, for example:
• Predict whether a credit card applicant poses a good credit risk, based on some attributes (income, job type, age, ..) and past history.
• Predict whether a pattern of phone calling card usage is likely to be fraudulent.
Some examples of prediction mechanisms:
• Classification: given a new item whose class is unknown, predict to which class it belongs.
• Regression formulae: given a set of mappings for an unknown function, predict the function result for a new parameter value.
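Classification from past history can be sketched with k-nearest neighbours, one simple technique among many (the slide does not prescribe one). A new applicant is labeled with the majority outcome among the k most similar past applicants. The features (income in $1000s, years in current job), the history, and k = 3 are all hypothetical; a real system would also scale the features so that income does not dominate the distance.

```python
from collections import Counter
from math import dist

# Hypothetical past applicants: (income in $1000s, years in current job) -> outcome.
HISTORY = [
    ((85, 10), "good"),
    ((70, 8), "good"),
    ((60, 5), "good"),
    ((25, 1), "bad"),
    ((30, 0), "bad"),
    ((20, 2), "bad"),
]


def classify(applicant, history=HISTORY, k=3):
    """Predict the class of `applicant` as the majority label of its k nearest past cases."""
    nearest = sorted(history, key=lambda case: dist(case[0], applicant))[:k]
    votes = Counter(label for _features, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(classify((65, 7)))  # good
    print(classify((28, 1)))  # bad
```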

Data Mining
Associations
• Find books that are often bought by “similar” customers. If a new such customer buys one such book, suggest the others too.
• Associations may be used as a first step in detecting causation, e.g. an association between exposure to chemical X and cancer.
Clusters
• E.g. typhoid cases were clustered in an area surrounding a contaminated well.
• Detection of clusters remains important in detecting epidemics.
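The book-recommendation idea can be sketched with association rules scored by support and confidence, a common formulation that the slide does not spell out. Support is the fraction of baskets containing both books; confidence of A → B is the fraction of baskets with A that also contain B. This simplified version only counts pairs (it is not the full Apriori algorithm), and the baskets and thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets, one set of book titles per customer.
BASKETS = [
    {"Databases", "Data Mining", "Statistics"},
    {"Databases", "Data Mining"},
    {"Data Mining", "Statistics"},
    {"Databases", "Cooking"},
]


def pair_rules(baskets, min_support=0.5, min_confidence=0.6):
    """Return sorted (lhs, rhs, support, confidence) rules between pairs of items."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):  # a rule can hold in either direction
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


if __name__ == "__main__":
    for lhs, rhs, support, confidence in pair_rules(BASKETS):
        print(f"{lhs} -> {rhs}: support {support:.2f}, confidence {confidence:.2f}")
```

With these baskets, every buyer of "Statistics" also bought "Data Mining" (confidence 1.0), so a new customer buying "Statistics" would be offered "Data Mining".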

Other Types of Mining
Text mining: application of data mining to textual documents
• Cluster Web pages to find related pages
• Cluster pages a user has visited to organize their visit history
• Classify Web pages automatically into a Web directory
Data visualization systems help users examine large volumes of data and detect patterns visually
• Can visually encode large amounts of information on a single screen
• Humans are very good at detecting visual patterns

Data Mining Applications
Classes of problems addressed by data mining applications:
• Classification
• Clustering
• Association
• Sequencing
• Regression
• Forecasting
• Hypothesis or discovery driven
• ……

Tools and Techniques
• Data mining: statistical methods, decision trees, case-based reasoning, neural computing, intelligent agents, genetic algorithms
• Text mining: hidden content, group by themes, determine relationships
