Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization.


Data, Information, Knowledge
Data: items that are the most elementary descriptions of things, events, activities, and transactions.
Information: organized data that has meaning and value.
Knowledge: processed data or information that conveys understanding or learning applicable to a problem or activity.
A data source can be internal, external, or personal.

Data Collection, Problems, and Quality
Data collection can be done manually or by instruments and sensors.
Data collection methods include surveys (using questionnaires), observations (using video cameras), and collecting information from experts (e.g., using interviews).
In addition, sensors and scanners are used for automatic data collection.
Suggest a reliable method of data collection for identifying customers' buying patterns.

Data Collection, Problems, and Quality (con.)
Data problems: the major DSS data problems are summarized in the following table, along with some possible solutions.

Data Collection, Problems, and Quality (con.)
Data quality determines the usefulness of data as well as the quality of the decisions based on them.
Data quality problems are divided into the following four categories and dimensions:
  Contextual data quality
  Intrinsic data quality
  Accessibility data quality
  Representation data quality
Data quality is often neglected or casually handled, and problems are exposed when data is summarized.
Sources of data quality problems:
  System conversions and migrations
  Heterogeneous systems integration
  Inadequate database design of source systems
  Data aging
  Incomplete information from customers
  Input errors
  Internationalization/localization of systems
  Lack of data management policies/procedures
Types of Data Quality Problems
  Dummy values in source system fields
  Absence of data in source system fields
  Multipurpose fields
  Cryptic data
  Contradicting data
  Improper use of name and address lines
  Violation of business rules
  Reused primary keys
  Nonunique identifiers
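
A few of the problem types listed above (dummy values, absence of data, nonunique identifiers) can be detected mechanically. The following is a minimal sketch, not a complete profiling tool; the sample records, the field names, and the set of dummy placeholders are all made-up assumptions for illustration.

```python
from collections import Counter

# Hypothetical placeholders that a source system might use as dummy values.
DUMMY_VALUES = {"999-99-9999", "N/A", "XXXXX"}

# Made-up sample records; note the repeated id and the empty city.
records = [
    {"id": 1, "ssn": "123-45-6789", "city": "Akron"},
    {"id": 2, "ssn": "999-99-9999", "city": ""},
    {"id": 2, "ssn": "987-65-4321", "city": "Manoa"},
]


def profile(rows, key="id"):
    """Return (row_position, field, problem) tuples for three problem types."""
    issues = []
    key_counts = Counter(row[key] for row in rows)
    for pos, row in enumerate(rows):
        for field, value in row.items():
            if value in ("", None):
                issues.append((pos, field, "absent"))
            elif value in DUMMY_VALUES:
                issues.append((pos, field, "dummy value"))
        if key_counts[row[key]] > 1:
            issues.append((pos, key, "nonunique identifier"))
    return issues


issues = profile(records)
for issue in issues:
    print(issue)
```

Checks for cryptic data, contradicting data, or business-rule violations need domain knowledge and are not attempted here.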

Data Integrity
Data integrity assures the accuracy and consistency of data.
One of the major issues of data quality (DQ) is data integrity.
Data integrity issues:
  Uniformity
  Version
  Completeness check
  Conformity check
  Genealogy or drill-down

Data Access and Integration
Recognize what to access.
Integrate disparate and heterogeneous databases to develop enterprise-wide systems.
XML is becoming the standard language for database integration and data transfer.

Database Management Systems
A database management system (DBMS) is a software program for managing a database.
  Manages data (i.e., update, delete, insert, sort, manipulate, and retrieve data)
  Generates reports
  Provides better data security
  Combines with a modeling language for construction of DSS

Database Models
Relational
  Flat, two-dimensional tables with multiple access queries
  Simple for the user to learn, and easily expanded or altered
  Can be accessed in a number of formats not anticipated at the time of the initial design and development of the database
  Can support large amounts of data
Hierarchical
  Top down, like a tree
  Fields have only one “parent”; each “parent” can have multiple “children”
  Quick, and useful mainly in transaction processing
Network
  Relationships created through linked lists, using pointers
  “Children” can have multiple “parents”
  Can save storage space through the sharing of some items

Database Models (con.)
Object oriented
  Data analyzed at conceptual level
  Inheritance, abstraction, encapsulation
Multimedia based
  Multiple data formats such as JPEG, GIF, bitmap, PNG, sound, video, virtual reality
  Requires specific hardware for full feature availability
Document based
  Document storage and management
Intelligent
  Intelligent agents and artificial neural networks (ANN)
  Inference engines

Data Warehouse
A data warehouse is a comprehensive database that supports all decision analysis required by an organization by providing summarized and detailed information. It has access to all information relevant to the organization, which may come from many different sources, both internal and external.
© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition, Turban, Aronson, and Liang

Data Warehouse (con.)
  Data extraction: get data from sources
  Data cleaning: detect errors in the data and rectify them when possible
  Data transformation: convert data from host format to warehouse format and check integrity
  Load: sort, summarize, consolidate, compute views, and build indices and partitions
  Refresh: propagate the updates from the data sources to the warehouse
Data Extraction. Clearly identify all the internal data sources. Specify all the computing platforms and source files from which the data is to be extracted. If you are going to include external data sources, determine the compatibility of your data structures with those of the outside sources. Also indicate the methods for data extraction.
Data Transformation. Many types of transformation functions are needed before data can be mapped and prepared for loading into the data warehouse repository. These functions include input selection, separation of input structures, normalization and denormalization of source structures, and conversions of names and addresses. This turns out to be a long and complex list of functions. Examine each data element planned to be stored in the data warehouse.
Data Loading. Define the initial load. Determine how often each major group of data must be kept up-to-date in the data warehouse. How many of the updates will be nightly updates? Does your environment warrant more than one update cycle in a day? How are the changes going to be captured in the source systems? Define how the daily, weekly, and monthly updates will be initiated and carried out.
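
The extraction, cleaning, transformation, and load steps can be sketched as a chain of small functions. This is a toy illustration under stated assumptions, not a production ETL design: the two source systems, their field names (cust, amount, day), and the warehouse format are hypothetical, and the data is made up.

```python
from collections import defaultdict

# Hypothetical rows from two source systems (made-up sample data).
SOURCE_A = [
    {"cust": "C1", "amount": "120.50", "day": "2014-11-01"},
    {"cust": "C2", "amount": "n/a", "day": "2014-11-01"},
]
SOURCE_B = [
    {"cust": "c1", "amount": "80", "day": "2014-11-02"},
]


def extract(*sources):
    """Extraction: gather rows from every source."""
    return [dict(row) for source in sources for row in source]


def clean(rows):
    """Cleaning: drop rows whose amount cannot be parsed as a number."""
    good = []
    for row in rows:
        try:
            float(row["amount"])
        except ValueError:
            continue
        good.append(row)
    return good


def transform(rows):
    """Transformation: convert source formats to the assumed warehouse format."""
    return [
        {
            "customer_id": row["cust"].upper(),
            "amount": float(row["amount"]),
            "day": row["day"],
        }
        for row in rows
    ]


def load(rows):
    """Load: sort the detail rows and compute a summary view per customer."""
    detail = sorted(rows, key=lambda r: (r["day"], r["customer_id"]))
    totals = defaultdict(float)
    for row in detail:
        totals[row["customer_id"]] += row["amount"]
    return {"detail": detail, "total_by_customer": dict(totals)}


warehouse = load(transform(clean(extract(SOURCE_A, SOURCE_B))))
print(warehouse["total_by_customer"])
```

A real cleaning step would usually try to rectify bad rows rather than drop them; dropping keeps the sketch short.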

Data warehouse characteristics
  Subject oriented
  Data from both internal and external sources is presented
  Scrubbed so that data from heterogeneous sources are standardized
  Time-variant
  Nonvolatile
  Read only
  Not normalized; may be redundant
  Metadata included

Characteristics of Data Warehouses - Subject oriented
Organized around major subjects, such as product and sales.
Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.
Provides a simple and concise view around particular subject issues by excluding data that are not useful in the decision process.

Characteristics of Data Warehouses - Integrated
Constructed by integrating multiple, heterogeneous data sources.
Data cleaning and data integration techniques are applied.
Ensures consistency in:
  naming conventions (e.g., LastName and FamilyName in DB1 and DB2 have the same meaning)
  encoding structures (e.g., attribute User_Id is a long int in DB1 and a string in DB2)
  attribute measures (e.g., cm vs. inch)
  …
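
The three kinds of inconsistency named above (naming, encoding, units of measure) can each be resolved by a per-source mapping into one warehouse format. The sketch below assumes two hypothetical sources, DB1 and DB2, shaped like the examples on the slide; the target field names and the height attribute are invented for illustration.

```python
CM_PER_INCH = 2.54


def from_db1(record):
    """DB1 (hypothetical): LastName, integer User_Id, height in centimetres."""
    return {
        "user_id": str(record["User_Id"]),        # encoding: int -> string
        "family_name": record["LastName"],        # naming: LastName -> family_name
        "height_cm": float(record["height_cm"]),  # measure: already in cm
    }


def from_db2(record):
    """DB2 (hypothetical): FamilyName, string User_Id, height in inches."""
    return {
        "user_id": record["User_Id"],
        "family_name": record["FamilyName"],      # naming: FamilyName -> family_name
        "height_cm": round(record["height_in"] * CM_PER_INCH, 2),  # inch -> cm
    }


integrated = [
    from_db1({"User_Id": 1001, "LastName": "Lee", "height_cm": 170}),
    from_db2({"User_Id": "1002", "FamilyName": "Khan", "height_in": 70}),
]
for row in integrated:
    print(row)
```

After the mapping, every row has the same field names, the same identifier type, and the same unit, so the rows can be stored and queried together.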

Characteristics of Data Warehouses - Time Variant
Data warehouse data provide information from a historical perspective (e.g., the past 5-10 years).
Every data item in the data warehouse contains an element of time.

Characteristics of Data Warehouses - Non Volatile
Operational update of data does not occur in the data warehouse environment.
Does not require transaction processing, recovery, and concurrency control mechanisms.
Requires only two operations in data accessing: initial loading of data and querying.

Characteristics of Data Warehouses - Metadata included
Metadata refers to data about data.
The primary purpose of metadata should be to provide context to the data; that is, enriching information leading to knowledge.
Metadata plays a vital role in explaining how, why, and where data can be found, retrieved, stored, and used efficiently in an information system.

Data Warehouse vs. Heterogeneous DBMS
Traditional heterogeneous DB integration:
  Build wrappers/mediators on top of heterogeneous databases
  Query-driven approach: a query posed to a client site is transformed into queries appropriate for the individual heterogeneous sites involved, and the results are integrated into a global answer set
Data warehouse:
  Update-driven approach: information from heterogeneous sources is integrated in advance and stored in warehouses for direct query and analysis

Data Warehouse vs. operational databases
Data warehouse (DW):
  Large amount of data from multiple sources that may include different DB models or files acquired from independent systems and platforms.
  Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing. Optimizes for retrieval.
  Provides information from a historical perspective (e.g., the past 5-10 years).
  Nonvolatile.
  Supports DSS, data mining, and OLAP.
Traditional DB:
  A transactional database (relational, object-oriented).
  Focuses on daily operations or transaction processing. Optimizes for routine transaction processing.
  Current-value data.
  Transactions are the agent of change to the database.
  Supports OLTP.

From tables to Data cubes
A data warehouse is based on a multidimensional data model, which views data in the form of a data cube.
A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions:
  Dimension tables contain descriptions of the subject of the business, such as item (item_name, brand, type) or time (day, week, month, quarter, year).
  The fact table contains factual or quantitative data: measures (such as dollars_sold) and keys to each of the related dimension tables.
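
To make the relationship between dimension tables, the fact table, and cube cells concrete, here is a minimal sketch in plain Python. The attribute names item_name, brand, type, day, month, quarter, year, and dollars_sold come from the slide; the surrogate keys (item_key, time_key) and all values are made-up assumptions.

```python
from collections import defaultdict

# Dimension tables: descriptive attributes keyed by a (hypothetical) surrogate key.
item_dim = {
    1: {"item_name": "TV-40", "brand": "Acme", "type": "television"},
    2: {"item_name": "DVD-X", "brand": "Acme", "type": "dvd player"},
    3: {"item_name": "TV-55", "brand": "Zenit", "type": "television"},
}
time_dim = {
    10: {"day": "2014-01-15", "month": "Jan", "quarter": "Q1", "year": 2014},
    11: {"day": "2014-04-20", "month": "Apr", "quarter": "Q2", "year": 2014},
}

# Fact table: keys into each dimension table plus the measure dollars_sold.
sales_fact = [
    {"item_key": 1, "time_key": 10, "dollars_sold": 400.0},
    {"item_key": 2, "time_key": 10, "dollars_sold": 100.0},
    {"item_key": 3, "time_key": 10, "dollars_sold": 700.0},
    {"item_key": 1, "time_key": 11, "dollars_sold": 450.0},
]


def cube_cell_totals(facts):
    """Aggregate the measure into cube cells addressed by (brand, quarter)."""
    cells = defaultdict(float)
    for row in facts:
        brand = item_dim[row["item_key"]]["brand"]
        quarter = time_dim[row["time_key"]]["quarter"]
        cells[(brand, quarter)] += row["dollars_sold"]
    return dict(cells)


cells = cube_cell_totals(sales_fact)
print(cells)
```

Each cube cell is addressed by one member from each chosen dimension; picking different attributes (type and year, say) gives a different view of the same facts.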

From tables to Data cubes (cont.) Relational representation of pivot table

From tables to Data cubes (cont.) 2-D view of sales cross-tabulation (pivot table)
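
The step from the relational representation to the 2-D cross-tabulation can be sketched as follows. The original figures are not reproduced in this transcript, so the rows below are made-up sample data, and the "ALL" label for totals is an assumption of this sketch.

```python
# Relational representation: one (item, region, units_sold) row per combination.
rows = [
    ("TV", "East", 30),
    ("TV", "West", 20),
    ("DVD", "East", 10),
    ("DVD", "West", 15),
]


def cross_tab(records):
    """Build a pivot table with row, column, and grand totals under 'ALL'."""
    table = {}
    for row_label, col_label, value in records:
        # Each value contributes to its own cell, its row total,
        # its column total, and the grand total.
        for r in (row_label, "ALL"):
            for c in (col_label, "ALL"):
                cells = table.setdefault(r, {})
                cells[c] = cells.get(c, 0) + value
    return table


pivot_table = cross_tab(rows)
for label, cells in pivot_table.items():
    print(label, cells)
```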

From tables to Data cubes (cont.)

Conceptual Modeling of Data Warehouses
Modeling data warehouses: dimensions & measures
  Star schema: a fact table in the middle connected to a set of dimension tables.
  Snowflake schema: a refinement of the star schema in which some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake.
  Fact constellation: multiple fact tables share dimension tables; viewed as a collection of stars.
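
A star schema can be tried out with Python's built-in sqlite3 module. The sketch below is only an illustration: the table names (item_dim, location_dim, sales_fact), columns, and rows are assumptions, not taken from the slide figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables surround the fact table.
    CREATE TABLE item_dim (
        item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT
    );
    CREATE TABLE location_dim (
        location_key INTEGER PRIMARY KEY, city TEXT, country TEXT
    );
    -- The fact table holds the measure and a key to each dimension.
    CREATE TABLE sales_fact (
        item_key INTEGER REFERENCES item_dim(item_key),
        location_key INTEGER REFERENCES location_dim(location_key),
        dollars_sold REAL
    );
    INSERT INTO item_dim VALUES (1, 'TV-40', 'Acme'), (2, 'DVD-X', 'Acme');
    INSERT INTO location_dim VALUES (1, 'Toronto', 'Canada'), (2, 'Chicago', 'USA');
    INSERT INTO sales_fact VALUES (1, 1, 400.0), (2, 1, 100.0), (1, 2, 250.0);
    """
)

# A typical star join: one hop from the fact table to a dimension table.
query = """
    SELECT l.country, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN location_dim AS l ON l.location_key = f.location_key
    GROUP BY l.country
    ORDER BY l.country
"""
totals = conn.execute(query).fetchall()
print(totals)
```

Moving country out of location_dim into its own table referenced by a key would normalize that hierarchy and turn this star into a snowflake; adding a second fact table that reuses item_dim would give a fact constellation.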

Example of Star Schema

Example of Snowflake Schema

Example of Fact Constellation

Multidimensional Data
Dimensions are product, month, and region.
Measure is sales_amount.

Data Marts
A data mart is a subset of the data warehouse, typically consisting of a single subject area.
Dependent
  Created from the warehouse
  Replicated: functional subset of the warehouse
Independent
  Scaled down, less expensive version of a data warehouse
  Designed for a department or strategic business unit (SBU)
  Organization may have multiple data marts
  Difficult to integrate

OLAP
Online analytical processing (OLAP) refers to a variety of activities usually performed by end users in online systems.
There is no agreement on which activities are considered OLAP. However, an OLAP tool may include such activities as:
  Requesting ad hoc reports and graphs
  Conducting statistical analysis
  Modeling and visualization capabilities
  Building DSS

OLAP Tools
Also known as business intelligence, business analytics, decision support, data access, and database front ends
OLAP vs. OLTP tools
Codd’s 12 rules for OLAP tools:
  1. Multidimensional conceptual view
  2. Transparency
  3. Accessibility
  4. Consistent reporting performance
  5. Client-server architecture
  6. Generic dimensionality
  7. Dynamic sparse matrix handling
  8. Multi-user support
  9. Unrestricted cross-dimensional operations
  10. Intuitive data manipulation
  11. Flexible reporting
  12. Unlimited dimensions and aggregation levels

OLTP vs. OLAP
OLTP (On Line Transaction Processing) compared with OLAP (On Line Analytical Processing):
  User: anyone (OLTP); decision-makers, analysts (OLAP)
  Function: day-to-day operations (OLTP); decision support (OLAP)
  DB design: application-oriented, E-R based (OLTP); subject-oriented, star or snowflake (OLAP)
  Data: current (OLTP); historical (OLAP)
  View: detailed (OLTP); summarized (OLAP)
  Access: read/write (OLTP); read mostly (OLAP)
  Number of records accessed: tens (OLTP); millions (OLAP)
  Number of users: thousands (OLTP); hundreds (OLAP)
  DB size: 100 MB-GB (OLTP); 100 GB-TB (OLAP)

Typical OLAP operations
Roll up (drill-up): summarize data by climbing up a hierarchy or by dimension reduction.
Drill down (roll down): the reverse of roll-up; move from a higher-level summary to a lower-level summary or detailed data, or introduce new dimensions.
Slice and dice: project and select.
  Slice: performs a selection on one dimension of the given cube, resulting in a sub-cube. Reduces the dimensionality of the cube.
  Dice: refers to a range select condition on one dimension, or to a select condition on more than one dimension. Reduces the number of member values of one or more dimensions.
Pivot (rotate): reorient the cube for visualization, e.g., 3D to a series of 2D planes.
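
These operations can be sketched on a tiny cube stored as a Python dictionary keyed by (product, month, region) with sales_amount as the measure. This is an illustration under stated assumptions, not how an OLAP engine is implemented; the data, the month-to-quarter mapping, and the function names are all invented.

```python
from collections import defaultdict

DIMS = ("product", "month", "region")

# Made-up cube: (product, month, region) -> sales_amount.
cube = {
    ("TV", "Jan", "East"): 10,
    ("TV", "Feb", "East"): 12,
    ("TV", "Apr", "West"): 7,
    ("DVD", "Jan", "East"): 5,
    ("DVD", "Apr", "West"): 9,
}
MONTH_TO_QUARTER = {"Jan": "Q1", "Feb": "Q1", "Apr": "Q2"}


def roll_up_hierarchy(cells):
    """Roll up by climbing the time hierarchy from month to quarter."""
    out = defaultdict(int)
    for (product, month, region), amount in cells.items():
        out[(product, MONTH_TO_QUARTER[month], region)] += amount
    return dict(out)


def roll_up_drop(cells, dim):
    """Roll up by dimension reduction: aggregate the named dimension away."""
    i = DIMS.index(dim)
    out = defaultdict(int)
    for key, amount in cells.items():
        out[key[:i] + key[i + 1:]] += amount
    return dict(out)


def slice_cube(cells, dim, value):
    """Slice: fix one member on one dimension; that dimension disappears."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cells.items() if k[i] == value}


def dice(cells, **allowed):
    """Dice: keep cells whose members are in the allowed sets; all dimensions stay."""
    return {
        k: v
        for k, v in cells.items()
        if all(k[DIMS.index(d)] in members for d, members in allowed.items())
    }


def pivot(cells, order):
    """Pivot: reorder the axes without changing any value."""
    idx = [DIMS.index(d) for d in order]
    return {tuple(k[i] for i in idx): v for k, v in cells.items()}


by_quarter = roll_up_hierarchy(cube)
no_region = roll_up_drop(cube, "region")
east_only = slice_cube(cube, "region", "East")
tv_early = dice(cube, product={"TV"}, month={"Jan", "Feb"})
rotated = pivot(cube, ("region", "product", "month"))
print(by_quarter)
```

Drill down is the reverse direction: going from by_quarter back to the monthly cube, which requires that the detailed data has been kept.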

OLAP-Roll up (drill-up)

OLAP-Drill down (roll down)

OLAP-Slice

OLAP-Dice

Data Mining
A process that uses statistical, mathematical, artificial intelligence, and machine-learning techniques to extract and identify useful information and subsequent knowledge from large databases.
Automatic and quick data analysis.
Data mining includes tasks/activities known as:
  Knowledge extraction
  Data archaeology
  Data exploration
  Data pattern processing
  Data dredging
  Information harvesting

How Data Mining Works
Three types of methods are used to identify patterns in data:
  Simple models (SQL-based query, OLAP, human judgment)
  Intermediate models (regression, decision trees, clustering)
  Complex models (neural networks, other rule induction)
Data mining application classes:
  Classification
  Clustering
  Association
  Sequencing
  Regression
  Forecasting
  Others
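
As one concrete instance of an intermediate model, the sketch below fits a straight line by ordinary least squares and uses it to forecast. The choice of simple linear regression and the advertising/sales figures are assumptions made for illustration; the slides do not prescribe a particular regression method.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: advertising spend (x) and units sold (y).
ad_spend = [1, 2, 3, 4]
units_sold = [3, 5, 7, 9]

slope, intercept = fit_line(ad_spend, units_sold)
forecast = slope * 5 + intercept  # predicted units for a spend of 5
print(slope, intercept, forecast)
```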

Hypothesis vs. Discovery Driven Data Mining
Hypothesis-driven data mining begins with a proposition by the user, who then seeks to validate the truthfulness of the proposition. For example, a marketing manager may begin with the proposition, "Are DVD player sales related to sales of television sets?"
Discovery-driven data mining finds patterns, associations, and relationships among the data. It can uncover facts that were previously unknown.
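
One way to check the marketing manager's proposition is with the standard association measures support, confidence, and lift. The sketch below is a hypothesis-driven check on made-up purchase baskets; the item labels and the data are assumptions, and a real study would also need a significance test.

```python
# Made-up purchase baskets, one set of items per transaction.
baskets = [
    {"tv", "dvd"},
    {"tv", "dvd", "cable"},
    {"tv"},
    {"dvd"},
    {"cable"},
    {"tv", "dvd"},
    {"cable", "tv"},
    {"cable"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent)."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Values above 1 suggest the two itemsets occur together more than by chance."""
    return confidence(antecedent, consequent, transactions) / support(
        consequent, transactions
    )


tv, dvd = {"tv"}, {"dvd"}
conf_tv_dvd = confidence(tv, dvd, baskets)
lift_tv_dvd = lift(tv, dvd, baskets)
print(support(tv | dvd, baskets), conf_tv_dvd, lift_tv_dvd)
```

Discovery-driven mining would instead compute these measures for every candidate item pair and report the strongest ones, without a proposition supplied in advance.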

Tools and Techniques
Data mining tools and techniques:
  Statistical methods (association, regression, and cluster)
  Decision trees (classification, clustering)
  Case-based reasoning (pattern detection)
  Neural computing (pattern detection)
  Intelligent agents
  Genetic algorithms

Text Mining
Text mining is the application of data mining to nonstructured or less structured text files.
It helps the organization to:
  Find the "hidden" content of documents, including additional useful relationships.
  Relate documents across previously unnoticed divisions; for example, discover that customers in two different product divisions have the same characteristics.
  Group documents by common themes; for example, all the customers of an insurance firm who have similar complaints and cancel their policies.

Multidimensionality
Multidimensionality is an efficient way to organize data in different ways for analysis and presentation.
Its major advantage is that the data are organized according to managers' needs, not analysts'.
Three factors are considered in multidimensionality: dimensions, measures, and time. Here are some examples:
  Dimensions: products, salespeople, market segments, business units, geographic locations, distribution channels, countries, industries
  Measures: money, sales volume, head count, inventory profit, actual vs. forecasted
  Time: daily, weekly, monthly, quarterly, yearly

Data Visualization
Technologies supporting visualization and interpretation:
  Digital imaging, GIS, GUI, tables, multidimensions, graphs, VR, 3D, animation
Identify relationships and trends
Data manipulation allows a real-time look at performance data

Multidimensionality
Multidimensionality has some limitations:
  A multidimensional database can take up significantly more computer storage.
  Multidimensional products cost significantly more.
  Database loading consumes system resources and time, depending on data volume and number of dimensions.
  Interfaces and maintenance are more complex than in relational databases.

Geographic Information System (GIS)
Computerized system for managing and manipulating data with digitized maps
  Geographically oriented
  Geographic spreadsheet for models
  Software allows web access to maps
  Used for modeling and simulations

GIS (con.)
