Published by Bethanie Lawson. Modified over 9 years ago.
1
Do It Strategically with Microsoft Business Intelligence! Bojan Ciric Strategic Consultant
2
ARCHITECTURE
3
Technical aspects of the project Data Integration Data Analysis Data Visualization
4
Solution Architecture
5
SQL Server as a backbone
6
DATA INTEGRATION
7
ETL: Extract, Transform, and Load
1. Extract data from the source systems
2. Transform the data to convert it to a desired format
3. Load the data into the data warehouse
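The three steps can be sketched in a few lines of Python. This is a minimal illustration and not how SSIS implements them: the CSV content, the `fact_orders` table, and the function names are all invented for the example, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: a CSV export from an operational system.
SOURCE_CSV = """order_id,customer,amount,order_date
1,john,16.50,15/01/2009
2,MARY,20.00,16/01/2009
"""


def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: fix name casing, convert types, and switch dates to ISO format."""
    out = []
    for r in rows:
        day, month, year = r["order_date"].split("/")
        out.append((
            int(r["order_id"]),
            r["customer"].title(),
            float(r["amount"]),
            f"{year}-{month}-{day}",
        ))
    return out


def load(conn, rows):
    """Load: write the cleaned rows into the (stand-in) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute(
    "SELECT customer, amount, order_date FROM fact_orders ORDER BY order_id"
).fetchall()
```

Keeping the three steps as separate functions mirrors the slide: each one can be replaced or tested without touching the others.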
8
Data Integration More than Just ETL Transform corporate data into meaningful and actionable information Challenges – Retrieve and merge data from multiple sources – Cleanse and transform the data – Load the data into appropriate data stores for analysis and reporting Enterprises spend 60%–80% of their BI resources in the data integration stage
9
Top tips for successful ETL:
– Choose the right ETL tool: large data volumes and complex transformations require a powerful tool that can handle them.
– Use metadata extensively: the essence of ETL is the mapping from operational systems to the destination, plus the data transformations. Store these in metadata rather than in application code; over time this lowers maintenance costs.
– Engage experienced ETL developers: they are expensive, but several times more efficient than inexperienced ones.
– Choose the right destination data model: you cannot choose the operational sources, but a good destination model allows less complex transformations and offers greater information potential.
– Establish data governance: the ETL project is an effective moment to introduce data governance if the organization has none. Good data governance policies and procedures are an essential environment for ETL in day-to-day operation.
10
Let’s do ETL with SSIS SQL Server Integration Services (SSIS) service SSIS object model Two distinct runtime engines: – Control flow – Data flow 32-bit and 64-bit editions
11
The Package The basic unit of work, deployment, and execution An organized collection of: – Connection managers – Control flow components – Data flow components – Variables – Event handlers – Configurations Can be designed graphically or built programmatically Saved in XML format to the file system or SQL Server
12
Control Flow Control flow is a process-oriented workflow engine A package contains a single control flow Control flow elements – Containers – Tasks – Precedence constraints – Variables
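To make the idea of tasks linked by precedence constraints concrete, here is a small Python sketch. It is not the SSIS object model: the task names and the `run_control_flow` helper are hypothetical, and the sketch assumes tasks are listed in an order where every upstream task comes before its downstream tasks.

```python
def run_control_flow(tasks, constraints):
    """Run tasks in order, honouring precedence constraints.

    tasks: dict mapping task name -> callable (raises on failure).
    constraints: list of (upstream, downstream, required) tuples, where
    required is "success", "failure", or "completion".
    A task runs only if all of its incoming constraints are satisfied.
    """
    results = {}
    for name, task in tasks.items():
        incoming = [(up, req) for up, down, req in constraints if down == name]
        satisfied = all(
            up in results and (req == "completion" or results[up] == req)
            for up, req in incoming
        )
        if not satisfied:
            continue  # constraint not met, so the task is skipped
        try:
            task()
            results[name] = "success"
        except Exception:
            results[name] = "failure"
    return results


def load_staging():
    # Simulated failure of an upstream task.
    raise RuntimeError("source file missing")


tasks = {
    "truncate_staging": lambda: None,
    "load_staging": load_staging,
    "build_warehouse": lambda: None,
    "send_alert": lambda: None,
}
constraints = [
    ("truncate_staging", "load_staging", "success"),
    ("load_staging", "build_warehouse", "success"),
    ("load_staging", "send_alert", "failure"),
]
results = run_control_flow(tasks, constraints)
```

Here `build_warehouse` never runs because its success constraint is not met, while `send_alert` runs because the failure constraint is.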
13
Data Flow The Data Flow Task – Performs traditional ETL and more – Fast and scalable Data Flow Components – Extract data from Sources – Load data into Destinations – Modify data with Transformations Service Paths – Connect data flow components – Create the pipeline
14
Data Flow Sources Sources extract data from – Relational tables and views – Files – Analysis Services databases
15
Data Flow Destinations Destinations load data to – Relational tables and views – Files – Analysis Services databases and objects – DataReaders and Recordsets
16
Row Transformations Update column values or create new columns Transform each row in the pipeline input
17
Rowset Transformations Create new rowsets that can include – Aggregated values – Sorted values – Sample rowsets – Pivoted or unpivoted rowsets They are the heavyweight components of SSIS and are also called asynchronous components
18
Split and Join Transformations Distribute rows to different outputs Create copies of the transformation inputs Join multiple inputs into one output Perform lookup operations
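The difference between row, rowset, and split/lookup transformations can be illustrated with plain Python generators. This is only an analogy to the SSIS data flow, not its API; the sample rows, the `products` reference data, and all function names are assumptions made for the example.

```python
from collections import defaultdict

rows = [
    {"product_id": 1, "qty": 16, "unit_price": 2.0},
    {"product_id": 2, "qty": 3, "unit_price": 10.0},
    {"product_id": 1, "qty": 4, "unit_price": 2.0},
]
products = {1: "Ball", 2: "Bat"}  # hypothetical lookup reference data


def derive_amount(rows):
    """Row transformation: adds a column, one output row per input row."""
    for r in rows:
        yield {**r, "amount": r["qty"] * r["unit_price"]}


def lookup_product(rows, reference):
    """Lookup: enrich each row with a value from a reference set."""
    for r in rows:
        yield {**r, "product": reference.get(r["product_id"], "Unknown")}


def aggregate_by_product(rows):
    """Rowset transformation: must read every input row before emitting output."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["product"]] += r["amount"]
    return [{"product": p, "amount": a} for p, a in sorted(totals.items())]


def conditional_split(rows, predicate):
    """Split: route each row to one of two outputs."""
    matched, rest = [], []
    for r in rows:
        (matched if predicate(r) else rest).append(r)
    return matched, rest


pipeline = lookup_product(derive_amount(rows), products)
totals = aggregate_by_product(pipeline)
large, small = conditional_split(totals, lambda r: r["amount"] >= 35)
```

The row-level steps stream one row at a time, whereas the aggregate has to hold state for the whole input, which is why rowset transformations tend to be the expensive part of a pipeline.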
19
Custom Developed Components Fully aligned with IBM BDW model Source To BDW Mapping Classification Management Surrogate Key Management Accounting Unit Management Pexim Rank Resolver Pexim Message Mapper Classification Resolver Pexim Surrogate Key Pexim Accounting Unit Rank Management
20
ETL Concept Evolution – Custom Components
21
ETL Concept Evolution – Logic out of SQL Queries
22
DEMO Microsoft SQL Server Integration Services
23
DATA ANALYSIS OLAP, Multidimensional Data, Data Mining and so on
24
OLAP or Multidimensional Data Online Analytical Processing = Multidimensional Data Measures and Dimensions Uses a calculation engine for fast, flexible transformation of base data (such as aggregates) Supports discovery of business trends and statistics not directly visible in data warehouse queries
25
Cube (UDM) Unified Dimensional Model Combination of measures (from facts) and dimensions as one conceptual model Rich data model enhanced by – Calculations – Key Performance Indicators (KPIs) – Actions – Perspectives – Translations – Partitions Formally, a cube is called a UDM
26
Querying a Simple Cube What sales did we expect to achieve in North America for CY 2004 Q1? 5,005,000
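A cube query of this kind picks one member per dimension and sums the matching cells. The Python sketch below shows the idea with a tiny in-memory "cube"; the `Sales Quota` measure name and every cell value are invented for illustration and do not reproduce the figure on the slide.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    measure: str
    geography: tuple  # (continent, country)
    period: tuple     # (year, quarter)
    value: float


# Hypothetical leaf-level cells of a very small cube.
CELLS = [
    Cell("Sales Quota", ("North America", "United States"), ("CY 2004", "Q1"), 300.0),
    Cell("Sales Quota", ("North America", "Canada"), ("CY 2004", "Q1"), 200.0),
    Cell("Sales Quota", ("North America", "United States"), ("CY 2004", "Q2"), 350.0),
    Cell("Sales Quota", ("Europe", "France"), ("CY 2004", "Q1"), 150.0),
]


def query(cells, measure, geography=(), period=()):
    """Sum all cells under the requested member of each hierarchy.

    A member is given as a path prefix, e.g. ("North America",) selects
    every country below North America; an empty tuple means the All level.
    """
    return sum(
        c.value
        for c in cells
        if c.measure == measure
        and c.geography[:len(geography)] == tuple(geography)
        and c.period[:len(period)] == tuple(period)
    )


na_q1 = query(CELLS, "Sales Quota", ("North America",), ("CY 2004", "Q1"))
all_2004 = query(CELLS, "Sales Quota", period=("CY 2004",))
```

A real OLAP engine precomputes or caches many of these aggregates instead of scanning the cells on every query.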
27
Star Schema
28
Star Schema Benefits Transforms normalized data into a simpler model Delivers high-performance queries Delivers higher performing queries using Star Join Query Optimization Uses mature modeling techniques that are widely supported by many BI tools Requires low maintenance as the data warehouse design evolves
29
Snowflake Dimension Tables Define hierarchies using multiple dimension tables Support fact tables with varying granularity Simplify consolidation of data from multiple sources Potential for slower query performance in relational reporting No difference in performance in Analysis Services database
30
Hierarchies Benefits – View of data at different levels of summarization – Path to drill down or drill up Implementation – Denormalized star schema dimension – Normalized snowflake dimension – Self-referencing relationship
31
Fact Table Fundamentals Collection of measurements associated with a specific business process Specific column types – Foreign keys to dimensions – Measures – numeric and additive – Metadata and lineage Consistent granularity – the most atomic level by which the facts can be defined
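As a rough illustration of these column types, the following sketch builds a tiny star schema in an in-memory SQLite database. The table and column names (`fact_sales`, `dim_customer`, `dim_product`, `load_batch`) and the data are assumptions for the example, not part of the presented solution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, name TEXT);

-- Grain: one row per customer, product, and day.
CREATE TABLE fact_sales (
    date_key     INTEGER,                                      -- key to a date dimension
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    quantity     INTEGER,                                      -- additive measure
    load_batch   TEXT                                          -- lineage metadata
);

INSERT INTO dim_customer VALUES (1, 'John'), (2, 'Mary');
INSERT INTO dim_product  VALUES (1, 'Ball'), (2, 'Bat');
INSERT INTO fact_sales VALUES
    (20090115, 1, 1, 16, 'batch-001'),
    (20090115, 2, 1, 4,  'batch-001'),
    (20090116, 1, 2, 1,  'batch-002');
""")

# Because quantity is additive, it can be summed along any dimension.
by_product = conn.execute("""
    SELECT p.name, SUM(f.quantity)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
```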
32
Fact Table Examples
33
Parent-Child Hierarchy A dimension that contains a parent attribute A parent attribute describes a self-referencing relationship, or a self-join, within a dimension table Common examples – Organizational charts – General Ledger structures – Bill of Materials
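A self-referencing relationship is easy to see in code. The sketch below models a hypothetical organizational chart in which each member points to its parent, then rolls a measure up the hierarchy; the member names and sales values are invented.

```python
# employee_id -> (name, parent_id, own_sales); parent_id is the self-reference.
EMPLOYEES = {
    1: ("CEO", None, 0),
    2: ("Sales Manager", 1, 10),
    3: ("Rep A", 2, 30),
    4: ("Rep B", 2, 20),
    5: ("Finance Manager", 1, 5),
}


def children(parent_id):
    """Return the ids of members whose parent attribute equals parent_id."""
    return [eid for eid, (_, pid, _) in EMPLOYEES.items() if pid == parent_id]


def rollup(member_id):
    """Own value plus the rolled-up values of all descendants."""
    own = EMPLOYEES[member_id][2]
    return own + sum(rollup(child) for child in children(member_id))


sales_team_total = rollup(2)
company_total = rollup(1)
```

Unlike a fixed-level hierarchy, the depth here is determined by the data, which is why organizational charts and bills of materials fit this pattern.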
34
Parent-Child Hierarchy Example
35
Slowly Changing Dimensions Support primary role of data warehouse to describe the past accurately Maintain historical context as new or changed data is loaded into dimension tables Implement changes by Slowly Changing Dimension (SCD) type – Type 1: Overwrite the existing dimension record – Type 2: Insert a new ‘versioned’ dimension record – Type 3: Track limited history with attributes
36
SCD Type 1 Existing record is updated History is not preserved
37
SCD Type 2 Existing record is ‘expired’ and new record inserted History is preserved Most common form of SCD
38
SCD Type 3 Existing record is updated Limited history is preserved Implementation is rare
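The three SCD types differ only in what happens to the existing dimension row when an attribute changes. The following Python sketch shows one possible way to express each; the column names (`customer_key`, `customer_id`, `valid_from`, `valid_to`, `previous_city`) and the sample customer are assumptions for the example.

```python
from datetime import date


def make_dim():
    """A one-row customer dimension; customer_key is the surrogate key."""
    return [{
        "customer_key": 1, "customer_id": "C42", "city": "Belgrade",
        "valid_from": date(2008, 1, 1), "valid_to": None,
    }]


def scd_type1(dim, business_key, **changes):
    """Type 1: overwrite in place, so history is lost."""
    for row in dim:
        if row["customer_id"] == business_key:
            row.update(changes)


def scd_type2(dim, business_key, effective, **changes):
    """Type 2: expire the current row and insert a new version."""
    current = next(
        r for r in dim
        if r["customer_id"] == business_key and r["valid_to"] is None
    )
    new_row = {
        **current, **changes,
        "customer_key": max(r["customer_key"] for r in dim) + 1,
        "valid_from": effective, "valid_to": None,
    }
    current["valid_to"] = effective
    dim.append(new_row)


def scd_type3(dim, business_key, column, new_value):
    """Type 3: keep only the previous value in an extra column."""
    for row in dim:
        if row["customer_id"] == business_key:
            row["previous_" + column] = row[column]
            row[column] = new_value


t1, t2, t3 = make_dim(), make_dim(), make_dim()
scd_type1(t1, "C42", city="Novi Sad")
scd_type2(t2, "C42", date(2009, 1, 15), city="Novi Sad")
scd_type3(t3, "C42", "city", "Novi Sad")
```

After these calls `t1` still has one row with the new city, `t2` has two rows (the expired one and the current one), and `t3` has one row carrying both the new and the previous city.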
39
SQL Server 2008 Analysis Services OLAP component – Aggregates and organizes data from business data sources – Performs calculations difficult to perform using relational queries – Supports advanced business intelligence, such as Key Performance Indicators Data mining component – Discovers patterns in both relational and OLAP data – Enhances the OLAP component with discovered results
40
Cube = Unified Dimensional Model Multidimensional data Combination of measures and dimensions as one conceptual model – Measures are sourced from fact tables – Dimensions are sourced from dimension tables
41
Dimensions and Facts Basis of All BI Fact – something that happened – Sale, purchase, shipping... – Transaction or an event – Verb – Essentially a Measure Dimension – describes a fact – Customer, product, account... – Object – Noun A fact (measure) is expressed in terms of dimensions – 16 balls sold to John on 20090115.
42
Dimensions Describe business entities Contain attributes that provide context to numeric data Present data organized into hierarchies
43
Dimensions Members from tables/views in a data source view (based on a Data Warehouse) Contain attributes matching dimension columns Organize attributes as hierarchies – One All level and one leaf level – User hierarchies are multi-level combinations of attributes – Can be placed in display folders Used for slicing and dicing by attribute
44
Hierarchy Defined in Analysis Services Ordered collection of attributes into levels Navigation path through dimensional space Very important to get right!
45
Measure Group Group of measures with same dimensionality Analogous to a fact table Cube can contain more than one measure group – E.g. Sales, Inventory, Finance Defined by dimension relationships
46
Dimension Model
47
Calculations Expressions evaluated at query time for values that cannot be stored in fact table Types of calculations – Calculated members – Named sets – Scoped assignments Calculations are defined using MDX MDX = MultiDimensional EXpressions
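A ratio such as margin percentage is a typical value that cannot be stored in the fact table, because it is not additive: it has to be computed after the base measures are aggregated. The Python sketch below illustrates that idea only; it is not MDX, and the fact rows and function names are invented.

```python
# Hypothetical fact rows with two additive base measures.
FACTS = [
    {"region": "North", "sales": 100.0, "cost": 60.0},
    {"region": "North", "sales": 300.0, "cost": 240.0},
    {"region": "South", "sales": 200.0, "cost": 100.0},
]


def aggregate(facts, region=None):
    """Sum the stored measures for the selected slice (None means all regions)."""
    selected = [f for f in facts if region is None or f["region"] == region]
    return {
        "sales": sum(f["sales"] for f in selected),
        "cost": sum(f["cost"] for f in selected),
    }


def margin_pct(facts, region=None):
    """Calculated value: evaluated at query time from the aggregated measures."""
    totals = aggregate(facts, region)
    return (totals["sales"] - totals["cost"]) / totals["sales"]


north_margin = margin_pct(FACTS, "North")

# Averaging row-level margins gives a different (wrong) answer,
# which is why the ratio is not stored per fact row.
row_margins = [(f["sales"] - f["cost"]) / f["sales"] for f in FACTS if f["region"] == "North"]
naive_average = sum(row_margins) / len(row_margins)
```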
48
DEMO Microsoft SQL Server Analysis Services - OLAP
49
DATA MINING
50
Data Mining Discovery of (very) hidden patterns in mountains of data Correlation search engine Recent combination of statistics, probability analysis, database technologies, machine learning, and AI
51
What does Data Mining do?
52
Typical Uses of Data Mining Seek Profitable Customers Understand Customer Needs Anticipate Customer Churn Predict Sales Build Effective Marketing Campaigns Detect and Prevent Fraud Correct Data During ETL
53
Typical Scenarios
– Customer Classification & Segmentation: Who are our customers? Are there any relationships between their demographics and their buying power?
– Profitability and Risk: Who are our most profitable customers? Can I predict profit of a future customer based on demographics? Are they creditworthy? How much should I charge them to give a good loan and protect against losses?
– Customer Needs Analysis: How do they behave? What are they likely to do once they bought that really expensive car? Should I intervene?
– Forecasting: What are my sales going to be like in the next few months? Will I have credit problems? Will my server need an upgrade in the next 3 months?
54
Summary of Techniques (Algorithm: Description)
– Decision Trees: Finds the odds of an outcome based on values in a training set
– Association Rules: Identifies relationships between cases
– Clustering: Classifies cases into distinctive groups based on any attribute sets
– Naïve Bayes: Clearly shows the differences in a particular variable for various data elements
– Sequence Clustering: Groups or clusters data based on a sequence of previous events
– Time Series: Analyzes and forecasts time-based data combining the power of ARTXP (developed by Microsoft Research) for short-term predictions with ARIMA (in SQL 2008) for long-term accuracy
– Neural Nets: Seeks to uncover non-intuitive relationships in data
– Linear Regression: Determines the relationship between columns in order to predict an outcome
– Logistic Regression: Determines the relationship between columns in order to evaluate the probability that a column will contain a specific state
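To give a feel for the simplest of these techniques, here is a hand-written least-squares linear regression in Python. It is a generic textbook sketch, not the Analysis Services algorithm, and the data (months versus units sold) is invented.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input column: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Predict the outcome column for a new input value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training set: month number -> units sold.
months = [1, 2, 3, 4]
units = [3, 5, 7, 9]

model = fit_line(months, units)
forecast_month_5 = predict(model, 5)
```

The fitted line describes the relationship between the two columns, and `predict` uses it to estimate an outcome that has not been observed yet.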
55
Mining Process
56
DEMO Data Mining
57
Summary Data Mining is a powerful, predictive technology Turns data into valuable, decision-making knowledge SQL Server 2008 Analysis Services support Predictive Analytics Mine your mountains of data for gems of intelligence today!
58
DATA VISUALIZATION
59
Key Performance Indicator (KPI) Quantifiable measurement comparing business performance to goals Measure of overall organizational health when combined into a collection for a business scorecard Three main ways to build KPIs: – Using OLAP (cubes) – Directly in PerformancePoint Server – Using data mining (predictive KPI)
60
KPI Characteristics Value Goal Status Trend
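The four characteristics can be derived from three numbers: the current value, the goal, and the previous value. The sketch below is one possible scheme, assuming status and trend are expressed as -1, 0, or 1 and that a value within 90% of the goal counts as a warning; the `warn_ratio` threshold and the function name are assumptions, not a product setting.

```python
def kpi(value, goal, previous_value, warn_ratio=0.9):
    """Return the four KPI characteristics for one measurement.

    status:  1 = goal met, 0 = within warn_ratio of the goal, -1 = below that.
    trend:   1 = improving, 0 = flat, -1 = declining versus the previous value.
    """
    ratio = value / goal
    if ratio >= 1:
        status = 1
    elif ratio >= warn_ratio:
        status = 0
    else:
        status = -1
    trend = (value > previous_value) - (value < previous_value)
    return {"value": value, "goal": goal, "status": status, "trend": trend}


revenue = kpi(value=95, goal=100, previous_value=90)
churn_free = kpi(value=80, goal=100, previous_value=85)
on_target = kpi(value=120, goal=100, previous_value=120)
```

A scorecard is then simply a table of such records, one per KPI, with the status and trend rendered as icons.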
61
Dashboards and Scorecards Scorecard – Table (pivot-like) of KPIs Dashboard – Contains scorecards, reports, and other analytical visualisations
62
DEMO Building Dashboard with Microsoft Excel 2010
63
PowerPivot for Excel 2010 PowerPivoting Massive Data Volumes With a few mouse clicks, a user can create and publish intuitive and interactive self-service analysis solutions.
64
Empower Your Users—With Familiar Tools The ease of Excel now with: Unmatched computational power Advanced analytic expressions
65
Empower Your Users—With Familiar Data Users know reports: What’s available What’s useful Where to find them How to use them IT provisions reports: With security and scalability
66
Excel 2010 and Excel Services Interactive slicers enable users to look at the data from various directions in Excel 2010 and in the browser through PowerPivot for SharePoint and Excel Services. Interactive Slicing and Dicing
67
© 2010 Asseco SEE. All rights reserved. The information herein is for informational purposes only and represents the opinions and views of Asseco SEE and/or Bojan Ciric. The material presented is not certain and may vary based on several factors. Portions © 2009 Asseco SEE & entire material © 2009 Microsoft Corp. Some slides contain quotations from copyrighted materials by other authors, as individually attributed or as already covered by Microsoft Copyright ownerships. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Asseco SEE as of the date of this presentation. Because Asseco SEE & Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft and Asseco SEE cannot guarantee the accuracy of any information provided after the date of this presentation. Asseco SEE makes no warranties, express, implied or statutory, as to the information in this presentation. E&OE.