Business Intelligence. Unit 1 Important Concepts.


Presentation on theme: "Business Intelligence. Unit 1 Important Concepts."— Presentation transcript:

1 Business Intelligence

2 Unit 1 Important Concepts

3 INTRODUCTION Organizations need business intelligence. Business intelligence (BI) – knowledge about your customers, competitors, business partners, competitive environment, and internal operations that lets you make effective, important, and strategic business decisions.

4 INTRODUCTION IT tools help process information to create business intelligence in two ways: OLTP and OLAP.

5 INTRODUCTION Online transaction processing (OLTP) – the gathering of input information, processing that information, and updating existing information to reflect the gathered and processed information. Databases support OLTP. Operational database – a database that supports OLTP.

6 INTRODUCTION Online analytical processing (OLAP) – the manipulation of information to support decision making. Databases can support some OLAP. Data warehouses support only OLAP, not OLTP. Data warehouses are special forms of databases that support decision making.

7 INTRODUCTION

8 What Is a Data Warehouse? Data warehouse – a logical collection of information, gathered from operational databases, used to create business intelligence that supports business analysis activities and decision-making tasks. “A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.” -- Barry Devlin, IBM Consultant

9 What Is a Data Warehouse?

10 What Is a Data Warehouse? Multidimensional: rows and columns, plus layers. Often called hypercubes.

11 Data Warehouses: a record of an enterprise's past transactional and operational information, designed to favor efficient data analysis and reporting. Data warehousing is not meant for current "live" data.

12 Data Warehouses: large amounts of data, sometimes subdivided into smaller logical units (dependent data marts).

13 What Are Data-Mining Tools? Data-mining tools – software tools that you use to query information in a data warehouse: query-and-reporting tools, intelligent agents, multidimensional analysis tools, and statistical tools.

14 Data Warehouses Components of a data warehouse: Sources -> Data Source Interaction; Data Transformation; Data Warehouse (Data Storage); Reporting (Data Presentation); Metadata.

15

16 Data Warehouses ADVANTAGES: complete control over the four main areas of data management systems. Clean data. Query processing: multiple options. Indexes: multiple types. Security: data and access.

17 Data Warehouses DISADVANTAGES: Adding new data sources takes time and carries a high cost. Data owners lose control over their data, raising ownership, security, and privacy issues. Initial implementation takes a long time and carries a high cost. It is difficult to accommodate changes in data types and ranges, data source schemas, indexes, and queries.

18 OLTP vs. OLAP OLTP (online transaction processing) describes processing at operational sites. OLAP (online analytical processing) describes processing at the warehouse.

19 OLTP Database vs. Data Warehouse Both are relational databases, which group data using common attributes found in the data set, but their objectives are different.

20 OLTP database: designed for real-time business operations. Data warehouse: designed for analysis of business measures by categories and attributes.

21 OLTP database: mostly updates; many small transactions; MB to GB of data. Data warehouse: mostly reads; queries are long and complex; GB to TB of data.

22 OLTP database: current snapshot; raw data; thousands of users (e.g., clerical users). Data warehouse: history; summarized, reconciled data; hundreds of users (e.g., decision-makers, analysts).

23 SUMMARY four questions for you

24 Which is the OLTP database and which is the data warehouse? (1) Designed for real-time business operations. (2) Designed for analysis of business measures by categories and attributes.

25 Answer: (1) OLTP database: designed for real-time business operations. (2) Data warehouse: designed for analysis of business measures by categories and attributes.

26 Which is the OLTP database and which is the data warehouse? (1) Optimized for bulk loads and large, complex, unpredictable queries that access many rows per table. (2) Optimized for a common set of transactions, usually adding or retrieving a single row at a time per table.

27 Answer: (1) Data warehouse: optimized for bulk loads and large, complex, unpredictable queries that access many rows per table. (2) OLTP database: optimized for a common set of transactions, usually adding or retrieving a single row at a time per table.

28 Which is the OLTP database and which is the data warehouse? (1) Loaded with consistent, valid data; requires no real-time validation. (2) Optimized for validation of incoming data during transactions; uses validation data tables.

29 Answer: (1) Data warehouse: loaded with consistent, valid data; requires no real-time validation. (2) OLTP database: optimized for validation of incoming data during transactions; uses validation data tables.

30 Which is the OLTP database and which is the data warehouse? (1) Supports thousands of concurrent users. (2) Supports few concurrent users relative to OLTP.

31 Answer: (1) OLTP database: supports thousands of concurrent users. (2) Data warehouse: supports few concurrent users relative to OLTP.

32 Data, Information & Knowledge Data is just symbols Information is data that are processed to be useful; provides answers to "who", "what", "where", and "when" questions Knowledge is application of data and information; answers "how" questions

33 Data Data is raw. It simply exists and has no significance beyond its existence (in and of itself). It can exist in any form, usable or not. It does not have meaning of itself. In computer parlance, a spreadsheet generally starts out by holding data.

34 Information Information is data that has been given meaning by way of relational connection. This "meaning" can be useful, but does not have to be. In computer parlance, a relational database makes information from the data stored within it.

35 Knowledge Knowledge is the appropriate collection of information, such that its intent is to be useful. Summaries of information in a database are one example; modeling and simulation tools also exercise some type of stored knowledge.

36 Examples - Supermarket. OLTP: the event is 3 cans of soup and 1 box of crackers bought; update the database to reflect that event. OLAP: last winter, in all stores in the northeast, how many customers bought soup and crackers together? Data mining: are there any interesting combinations of foods that customers frequently bought together? Copyright © 2005 Pearson Addison-Wesley. All rights reserved.
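
The three supermarket workloads can be sketched with Python's built-in `sqlite3` module. This is an illustrative sketch only: the `purchases` table, its columns, the helper `record_purchase`, and the sample rows are all assumptions made up for this example, and baskets stand in for customers.

```python
import sqlite3
from collections import Counter
from itertools import combinations

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases ("
    " basket_id INTEGER, region TEXT, season TEXT, item TEXT, qty INTEGER)"
)

def record_purchase(basket_id, region, season, items):
    """OLTP: one small transaction that records a single checkout event."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO purchases VALUES (?, ?, ?, ?, ?)",
            [(basket_id, region, season, item, qty) for item, qty in items],
        )

# Hypothetical sample events.
record_purchase(1, "northeast", "winter", [("soup", 3), ("crackers", 1)])
record_purchase(2, "northeast", "winter", [("soup", 1), ("bread", 1)])
record_purchase(3, "northeast", "winter", [("soup", 2), ("crackers", 2)])
record_purchase(4, "south", "winter", [("soup", 1), ("crackers", 1)])

# OLAP: a read-only question over accumulated history.
soup_and_crackers = conn.execute(
    """
    SELECT COUNT(*) FROM (
        SELECT basket_id FROM purchases
        WHERE region = 'northeast' AND season = 'winter'
          AND item IN ('soup', 'crackers')
        GROUP BY basket_id
        HAVING COUNT(DISTINCT item) = 2
    )
    """
).fetchone()[0]

# Data mining (greatly simplified): count every item pair bought together.
baskets = {}
for basket_id, item in conn.execute("SELECT basket_id, item FROM purchases"):
    baskets.setdefault(basket_id, set()).add(item)
pair_counts = Counter(
    pair for items in baskets.values() for pair in combinations(sorted(items), 2)
)
top_pair, top_count = pair_counts.most_common(1)[0]
print(soup_and_crackers, top_pair, top_count)
```

The OLTP step writes a few rows per event, the OLAP step asks a question the analyst already has in mind, and the mining step looks for pairs nobody asked about in advance.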

37 11 Database designing rules Rule 1: What is the nature of the application (OLTP or OLAP)? Rule 2: Break your data into logical pieces, make life simpler Rule 3: Do not get overdosed with rule 2 Rule 4: Treat duplicate non-uniform data as your biggest enemy Rule 5: Watch for data separated by separators

38 Database designing rules Rule 6: Watch for partial dependencies Rule 7: Choose derived columns carefully Rule 8: Do not be hard on avoiding redundancy, if performance is the key Rule 9: Multidimensional data is a different beast altogether Rule 10: Centralize name value table design Rule 11: For unlimited hierarchical data self-reference PK and FK

39

40 Normal form examples. 1NF: first name, middle name, and surname go in different columns. 2NF: the syllabus column for the 5th standard should depend on the whole composite primary key (roll no. and standard), not on just part of it. 3NF: an average column depends on marks and number of subjects, which are non-key columns. Normalization rules are important guidelines, but treating them as set in stone is asking for trouble.

41 Rule 1: What is the nature of the application (OLTP or OLAP)? Transactional: the end user is more interested in CRUD, i.e., creating, reading, updating, and deleting records. The official name for this kind of database is OLTP. Analytical: the end user is more interested in analysis, reporting, forecasting, etc. There are fewer inserts and updates; the main intention is to fetch and analyze data as fast as possible. The official name for this kind of database is OLAP.

42 Rule 1: What is the nature of the application (OLTP or OLAP)? In other words, if you think inserts, updates, and deletes are more prominent, go for a normalized table design; otherwise create a flat, denormalized database structure.

43

44 Rule 2: Break your data into logical pieces, make life simpler. This is the first rule of first normal form. If your queries use too many string-parsing functions such as substring or charindex, apply this rule. E.g., a query for student names containing “Koirala” but not “Harisingh” becomes very complex against a single name field. The better approach is to break the field into further logical pieces so that you can write clean and optimal queries.
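
A minimal sketch of Rule 2, assuming hypothetical `students_flat` and `students` tables and made-up names. It contrasts pattern matching on one name field with plain comparisons on split columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students_flat (full_name TEXT);
INSERT INTO students_flat VALUES
    ('Shiv Harisingh Koirala'), ('Raju Koirala'), ('Anil Kumar Sharma');

CREATE TABLE students (first_name TEXT, middle_name TEXT, last_name TEXT);
INSERT INTO students VALUES
    ('Shiv', 'Harisingh', 'Koirala'),
    ('Raju', NULL, 'Koirala'),
    ('Anil', 'Kumar', 'Sharma');
""")

# Single field: the query has to pattern-match inside the string.
flat_result = [r[0] for r in conn.execute(
    "SELECT full_name FROM students_flat "
    "WHERE full_name LIKE '%Koirala%' AND full_name NOT LIKE '%Harisingh%'"
)]

# Split fields: plain equality tests on individual columns.
split_result = [r[0] for r in conn.execute(
    "SELECT first_name FROM students "
    "WHERE last_name = 'Koirala' AND COALESCE(middle_name, '') <> 'Harisingh'"
)]
print(flat_result, split_result)
```

Both queries find the same student here, but the pattern-matching version would also match "Koirala" appearing as a first or middle name, which is the kind of ambiguity the rule is meant to remove.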

45

46 Rule 3: Do not get overdosed with rule 2. Is the decomposition needed? Decomposition should be logical. You will rarely operate on the ISD codes of phone numbers separately (unless your application demands it), so it is wise to leave the phone number as a single field; splitting it can lead to more complications.

47 Rule 4: Treat duplicate non-uniform data as your biggest enemy. Focus on duplicate data and refactor it; it creates confusion. For instance, “5th Standard” and “Fifth standard” mean the same thing.

48 Rule 4: Treat duplicate non-uniform data as your biggest enemy. One solution is to move the data into a separate master table and refer to it via foreign keys, e.g., a new master table called “Standards” linked through a simple foreign key.
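
A sketch of the master-table approach from Rule 4 in SQLite. The table layout, column names, and sample rows are assumptions for illustration; only the “Standards” table name comes from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE standards (
    standard_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE          -- one spelling per standard
);
CREATE TABLE students (
    roll_no INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    standard_id INTEGER NOT NULL REFERENCES standards(standard_id)
);
INSERT INTO standards VALUES (5, '5th Standard'), (6, '6th Standard');
INSERT INTO students VALUES (1, 'Asha', 5), (2, 'Ravi', 5), (3, 'Mina', 6);
""")

# Grouping is reliable because every student points at the same master row.
counts = conn.execute("""
    SELECT s.name, COUNT(*)
    FROM students st JOIN standards s USING (standard_id)
    GROUP BY s.name ORDER BY s.name
""").fetchall()

# A value that is not in the master table is rejected outright.
try:
    conn.execute("INSERT INTO students VALUES (4, 'Zed', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(counts, rejected)
```

With free-text values such as “5th Standard” and “Fifth standard”, the same `GROUP BY` would report two separate groups for one standard.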

49 Rule 5: Watch for data separated by separators. The second rule of first normal form says to avoid repeating groups. When too much data is stuffed into one column, such as the syllabus column, the field is termed a “repeating group”. Queries that manipulate this data become complex and their performance degrades.

50 Rule 5: Watch for data separated by separators. Columns stuffed with separator-delimited data need special attention; a better approach is to move those fields to a different table and link them with keys for better management.
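
One way to apply Rule 5, sketched under assumptions: the `students_flat` and `student_subjects` tables, the `;` separator, and the subjects are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students_flat (roll_no INTEGER PRIMARY KEY, syllabus TEXT);
INSERT INTO students_flat VALUES
    (1, 'English;Maths;Science'),
    (2, 'English;History');

CREATE TABLE student_subjects (
    roll_no INTEGER NOT NULL,
    subject TEXT NOT NULL,
    PRIMARY KEY (roll_no, subject)
);
""")

# Split each separator-delimited value into one row per subject.
rows = conn.execute("SELECT roll_no, syllabus FROM students_flat").fetchall()
with conn:
    conn.executemany(
        "INSERT INTO student_subjects VALUES (?, ?)",
        [(roll_no, subject.strip())
         for roll_no, syllabus in rows
         for subject in syllabus.split(";")],
    )

# Questions about a single subject no longer need string parsing.
maths_students = [r[0] for r in conn.execute(
    "SELECT roll_no FROM student_subjects WHERE subject = 'Maths' ORDER BY roll_no"
)]
subject_counts = dict(conn.execute(
    "SELECT subject, COUNT(*) FROM student_subjects GROUP BY subject"
))
print(maths_students, subject_counts)
```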

51 Rule 6: Watch for partial dependencies. Watch for fields that depend on only part of the primary key. E.g., the primary key is created on roll number and standard. The syllabus is associated with the standard in which the student is studying, not directly with the student. Move the syllabus field and attach it to the Standards table. This rule is second normal form: every non-key column should depend on the full primary key, not on part of it.
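
The partial dependency in Rule 6 can be sketched as a before-and-after schema. The table names `enrolment_flat` and `enrolment` and the syllabus labels are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Before: syllabus depends only on standard, so it repeats per student.
CREATE TABLE enrolment_flat (
    roll_no INTEGER, standard TEXT, syllabus TEXT,
    PRIMARY KEY (roll_no, standard)
);
INSERT INTO enrolment_flat VALUES
    (1, '5th', 'Syllabus 5A'),
    (2, '5th', 'Syllabus 5A'),
    (3, '6th', 'Syllabus 6A');

-- After: syllabus lives with the standard it actually depends on.
CREATE TABLE standards (standard TEXT PRIMARY KEY, syllabus TEXT);
CREATE TABLE enrolment (
    roll_no INTEGER,
    standard TEXT REFERENCES standards(standard),
    PRIMARY KEY (roll_no, standard)
);
INSERT INTO standards SELECT DISTINCT standard, syllabus FROM enrolment_flat;
INSERT INTO enrolment SELECT roll_no, standard FROM enrolment_flat;
""")

# Changing the 5th-standard syllabus now touches exactly one row.
with conn:
    changed = conn.execute(
        "UPDATE standards SET syllabus = 'Syllabus 5B' WHERE standard = '5th'"
    ).rowcount

# In the flat design the same change must be repeated for every student.
flat_rows_to_change = conn.execute(
    "SELECT COUNT(*) FROM enrolment_flat WHERE standard = '5th'"
).fetchone()[0]
print(changed, flat_rows_to_change)
```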

52

53 Rule 7: Choose derived columns carefully

54 OLTP applications: getting rid of derived columns is a good idea. OLAP: with a lot of summations and calculations, these kinds of fields are necessary to gain performance. Third normal form: “No column should depend on other non-primary key columns”. Assess the situation and then decide whether to implement third normal form.
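
A small sketch of the trade-off behind Rule 7, assuming a hypothetical `results` table. A stored derived column can go stale when its inputs change, while a view derives the value at read time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE results (
    roll_no INTEGER PRIMARY KEY,
    total_marks INTEGER NOT NULL,
    subjects INTEGER NOT NULL,
    stored_average REAL          -- derived column: total_marks / subjects
);
INSERT INTO results VALUES (1, 400, 5, 80.0);

-- Alternative that respects third normal form: compute on read.
CREATE VIEW results_with_average AS
    SELECT roll_no, total_marks, subjects,
           1.0 * total_marks / subjects AS average
    FROM results;

-- An update that forgets to maintain the derived column.
UPDATE results SET total_marks = 450 WHERE roll_no = 1;
""")

stored, computed = conn.execute("""
    SELECT r.stored_average, v.average
    FROM results r JOIN results_with_average v USING (roll_no)
""").fetchone()
print(stored, computed)
```

The stored value is cheap to read but now disagrees with the data, which is why the slide suggests keeping such columns mainly where read performance justifies the upkeep.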

55 Rule 8: Do not be hard on avoiding redundancy, if performance is the key. If performance is the priority, think about denormalization. Normalization: queries must join many tables. Denormalization: fewer joins, which can improve read performance.
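
To make the join trade-off in Rule 8 concrete, here is a sketch with an assumed three-table schema (`countries`, `customers`, `orders`) and a flattened copy, `orders_flat`. All names and figures are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (country_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY, name TEXT,
    country_id INTEGER REFERENCES countries(country_id)
);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    amount REAL
);
INSERT INTO countries VALUES (1, 'India'), (2, 'Nepal');
INSERT INTO customers VALUES (10, 'Asha', 1), (11, 'Ravi', 2);
INSERT INTO orders VALUES (100, 10, 50.0), (101, 10, 25.0), (102, 11, 40.0);

-- Denormalized copy: names are repeated on every order row.
CREATE TABLE orders_flat AS
    SELECT o.order_id, c.name AS customer_name, k.name AS country_name, o.amount
    FROM orders o
    JOIN customers c USING (customer_id)
    JOIN countries k USING (country_id);
""")

# Normalized: two joins to reach the country name.
normalized = conn.execute("""
    SELECT k.name, SUM(o.amount)
    FROM orders o
    JOIN customers c USING (customer_id)
    JOIN countries k USING (country_id)
    GROUP BY k.name ORDER BY k.name
""").fetchall()

# Denormalized: the same report with no joins at all.
denormalized = conn.execute("""
    SELECT country_name, SUM(amount)
    FROM orders_flat GROUP BY country_name ORDER BY country_name
""").fetchall()
print(normalized, denormalized)
```

The flat table answers the report without joins, at the cost of repeating customer and country names that must be kept in sync if they ever change.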

56 Rule 8: Do not be hard on avoiding redundancy, if performance is the key

57 Rule 9: Multidimensional data is a different beast altogether. OLAP projects mostly deal with multidimensional data. E.g., to get sales per country, customer, and date, each sales figure sits at the intersection of three dimensions.
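
The idea of a sales figure at the intersection of three dimensions can be sketched in plain Python. The `facts` list, the `DIMENSIONS` tuple, and the `roll_up` helper are hypothetical names for this illustration, not part of any OLAP product.

```python
from collections import defaultdict

# Each fact is one sales figure at the intersection of three dimensions.
DIMENSIONS = ("country", "customer", "date")
facts = [
    ("India", "Asha", "2024-01-01", 50.0),
    ("India", "Asha", "2024-01-02", 25.0),
    ("India", "Ravi", "2024-01-01", 10.0),
    ("Nepal", "Mina", "2024-01-01", 40.0),
]

def roll_up(facts, keep):
    """Sum sales over every dimension that is not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(float)
    for *dims, amount in facts:
        totals[tuple(dims[i] for i in idx)] += amount
    return dict(totals)

by_country = roll_up(facts, ["country"])
by_country_date = roll_up(facts, ["country", "date"])
grand_total = roll_up(facts, [])
print(by_country, by_country_date, grand_total)
```

Each call collapses the cube along the dimensions that were left out, which is the kind of aggregation an OLAP design is built to serve.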

58

59

60

61 Rule 10: Centralize name value table design. Name and value tables have a key and some data associated with the key, e.g., a currency table and a country table, each with only a key and a value. For such tables, creating one central table and differentiating the data with a type field makes more sense.
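
A sketch of a central name-value table, assuming a hypothetical `lookup` table whose `lookup_type` column plays the role of the type field; the codes and values are sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lookup (
    lookup_type TEXT NOT NULL,   -- distinguishes currencies from countries
    code TEXT NOT NULL,
    value TEXT NOT NULL,
    PRIMARY KEY (lookup_type, code)
);
INSERT INTO lookup VALUES
    ('currency', 'INR', 'Indian Rupee'),
    ('currency', 'USD', 'US Dollar'),
    ('country',  'IN',  'India'),
    ('country',  'NP',  'Nepal');
""")

def lookup_values(kind):
    """Return the code-to-value mapping for one lookup type."""
    rows = conn.execute(
        "SELECT code, value FROM lookup WHERE lookup_type = ? ORDER BY code",
        (kind,),
    )
    return dict(rows)

currencies = lookup_values("currency")
countries = lookup_values("country")
print(currencies, countries)
```

Adding a new kind of lookup then means inserting rows with a new type value instead of creating another two-column table.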

62

63 Rule 11: For unlimited hierarchical data, self-reference PK and FK. Unlimited parent-child hierarchy: e.g., a multi-level marketing scenario where a salesperson can have multiple salespeople below them. For such scenarios, a foreign key that references the same table's primary key models the hierarchy.
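
A sketch of the self-referencing design, with an assumed `sales_people` table and a hypothetical `downline` helper. The recursive query is one common way to walk such a hierarchy in SQLite, not something the slides prescribe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_people (
    person_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    manager_id INTEGER REFERENCES sales_people(person_id)  -- NULL at the top
);
INSERT INTO sales_people VALUES
    (1, 'Asha', NULL),
    (2, 'Ravi', 1),
    (3, 'Mina', 1),
    (4, 'Tara', 2);
""")

def downline(person_id):
    """Everyone below `person_id`, at any depth, with their level."""
    rows = conn.execute("""
        WITH RECURSIVE tree(person_id, name, depth) AS (
            SELECT person_id, name, 1
            FROM sales_people WHERE manager_id = ?
            UNION ALL
            SELECT s.person_id, s.name, t.depth + 1
            FROM sales_people s JOIN tree t ON s.manager_id = t.person_id
        )
        SELECT name, depth FROM tree ORDER BY depth, name
    """, (person_id,))
    return rows.fetchall()

print(downline(1))
```

Because the foreign key points back at the same table, the hierarchy can grow to any depth without schema changes.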

64

65 Business Models Depends on business requirements E.g. E-commerce business model

66 Data Marts Data warehouses can support all of an organization’s information. Data marts hold subsets of an organization-wide data warehouse. Data mart – subset of a data warehouse in which only a focused portion of the data warehouse information is kept.

67

68 Assignment 1 Differentiate between OLTP and OLAP. Explain the design aspects of OLTP and OLAP. What is BI, and what are its components?

69 References: OLTP vs. OLAP ppts; notes by Shivprasad Koirala

