Published by Ross York. Modified over 9 years ago.
1
Enterprise Data Management Optimization
Dr. Boris Zibitsker, BEZ Systems
boris@bez.com
www.bez.com
St. Louis CMG
2
Outline
Enterprise Data Management with Moving Target
Enterprise Data Management Options and Tradeoffs
Role of Modeling in EDM Optimization
– How to use performance prediction models to evaluate and justify enterprise data management alternatives, set performance expectations, verify results and organize a continuous proactive EDM process
Examples Illustrating the Best Practice of EDM
Proactive Performance Management During Application and Information Life Cycle
Applying Modeling for Optimizing EDM Strategic Decisions
– How to justify enterprise data warehouse
– How to justify master data management
Applying Modeling for Optimizing EDM Tactical Decisions
– How to reduce time of loading growing volume of data
– How to reduce data access time
– How to predict the impact of new application implementation
Applying Modeling for Optimizing EDM Operational Decisions
– Predicting how change of the workload’s priority will affect performance
– Comparison of actual results vs. expected and organizing continuous proactive service level management
Summary
© Boris Zibitsker, BEZ, St Louis CMG - Feb 12, 2008
3
Challenges of Enterprise Data Management
– Changing business demand
– Loading more data
– Increasing number of users
– Implementing new applications
– Upgrading hardware and software
How to optimize EDM to provide accurate and timely information at minimum cost while the target keeps moving
4
Scaling Tradeoffs in a Multi-tier Distributed Environment
– Distribution: adding more servers, nodes
– Centralization: server consolidation
– Data compression
– More: CPUs/server, JVMs/server, disks/server
– Reduce queueing time
– Faster CPUs, disks
– Reduce service time
– Parallelization
– Concurrency
[Diagram labels: Web Servers, Application Servers, DBMS Servers, Storage Subsystem, EDW; workloads: Sales, Marketing, HR]
5
Optimization of Strategic, Tactical and Operational Enterprise Data Management Decisions
Strategic Decisions (Yearly)
– Architecture: Centralized EDW vs. Distributed DW and DM vs. Master Data Management
– Where to place data
– Where to run applications
Tactical Decisions (Weekly/Monthly)
– Dormant data
– Indexes
– Partitioning
– Compression
Operational Decisions (Hourly)
– Concurrency
– Parallelism
– Priority
– Resource reallocation
Compare different options
– Select criteria of comparison, such as cost, response time, throughput, availability, accuracy, consistency, manageability, flexibility
– Define the relative importance (weight) of each criterion
– Build models showing the relationship between different parameters and each criterion for each option
– Find an optimum option/solution as a compromise between different criteria
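The comparison steps on this slide (select criteria, weight them, score each option, pick a compromise) can be sketched as a weighted sum. This is only one possible way to combine criteria, not necessarily the method used in the presentation. The weights, the two option names and every score below are hypothetical values chosen for illustration; in practice the scores would come from the prediction models.

```python
# Hypothetical criteria weights; they do not come from the presentation.
WEIGHTS = {"cost": 0.4, "response_time": 0.3, "availability": 0.2, "flexibility": 0.1}

# Hypothetical normalized scores per option (0..1, higher is better).
OPTIONS = {
    "centralized_edw": {"cost": 0.5, "response_time": 0.8, "availability": 0.7, "flexibility": 0.6},
    "distributed_dm": {"cost": 0.6, "response_time": 0.5, "availability": 0.6, "flexibility": 0.8},
}


def weighted_score(scores, weights):
    """Weighted average of the criterion scores for one option."""
    total_weight = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total_weight


def rank_options(options, weights):
    """Option names sorted from the best weighted score to the worst."""
    return sorted(options, key=lambda name: weighted_score(options[name], weights), reverse=True)


if __name__ == "__main__":
    for name in rank_options(OPTIONS, WEIGHTS):
        print(name, round(weighted_score(OPTIONS[name], WEIGHTS), 3))
```

Changing the weights changes the ranking, which is the point of making the relative importance of each criterion explicit before comparing options.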
6
Wrong EDM Decisions Can Delay Action Time and Negatively Affect Business
[Diagram: business value vs. time. Labels: Business Event, ETL, Data Access, Information, Action, Action Time, Value Lost]
– How long will it take to load and aggregate more data?
– How long will it take to access more data?
– How can the accuracy and timeliness of information be improved?
7
Difference Between Efficiency and Effectiveness of EDM Decisions
Effectiveness
– Accurate and timely information
– Ability to make right decisions
– Impact on the bottom line
Efficiency
– Cost
– Performance
– Scalability
– Availability
– Consistency
[Diagram labels: Strategy, Tactics, Operations]
8
Performance Prediction & Optimization
Input
– Workloads
– Hardware
– Software (OS, DBMS Server, Application Server)
Options
– Hardware
– Software
– DBMS
Plan
– Workload Growth
– Database Size Growth
– Hardware Upgrade
– Software Parameters
– New Application
– Server Consolidation
DBMS Wizards
– Index Adviser
– MV Adviser
– Data Partitioning
– Data Compression
Prediction Engine
Optimization Engine
Output
– Recommendation & Expectations By Workload
9
Simplified Model of 3-tier Architecture
[Diagram labels: Users, Client, Network, Web Servers, Application Servers, DBMS Servers; per tier: servers 1..n with CPU, Disk and Memory; Arriving Requests, a "Max?" check on Threads or Active Sessions, Rejected Requests]
– # of Threads & Active Sessions Control Concurrency
– Memory Limitation
– Level of Parallelism Affects Performance
10
Workload Characterization
11
Each Workload Has Unique Performance, Data & Resource Utilization Profiles
[Diagram labels: Business Process, Workloads (User, Appl, SQL), Data (Table 1, Table 2, Table 3, … Table m), Resource Utilization (CPU, Disk)]
12
Workload Centric Approach to Service Level Management
Verification and Control
– Trend Analysis, Baseline Analysis for Fixed/Rolling Period
– Period-to-Period Comparisons & Change Validation
– Proactive Corrective Actions & New Expectations
Operational Decisions
– Problem Isolation
– Current and Predicted Service Breach
– Business to Infrastructure Drill Down
– Zoom In / Out
– Include / Exclude Filters
– Performance, Utilization, Data Access
– Scheduling, Workload Management
Strategic Decisions
– Justification of Architecture: Setting Realistic SLO and SLA
– Capacity Planning
– New Application Implementation
– Virtualization
– Consolidations
Tactical Decisions
– Concurrency Control
– Priority
– Database Tuning
– Index Creation
– Memory Adjustments
– Partitioning
– Compression
– Appl. Server Tuning
– #JVM & #JVM Threads
– Connection Pool Size
13
Typical Steps of Applying Modeling During Application and Information Life Cycle
Application Life Cycle
– Feasibility study
– New application implementation
– Performance management
– Capacity planning
– Disaster recovery
– Application consolidation
Information Life Cycle
– Data loading (ETL)
– Data modeling
– Database tuning
– Data growth
– Backup and restore
– Data replication
– Data consolidation
– Enterprise data management
– Information integration
Steps: Measure, Characterize, Plan, Advise, Manage, Model & Optimize
14
Example of Configuration Planning Tasks for Multi-tier Distributed Environment
– For each workload, identify how many users can be supported by one JVM
– How many JVMs will be required to support each of the workloads
– The number of servers required to support all workloads
– The optimum number of CPUs per server
– CPU type and speed
– Server memory size
– Number of host channels
– Storage subsystem type
– Control unit cache size
– Number of disk channels
– Number of disks per server
– Maximum number of active sessions within DBMS server per workload
– Dispatching priority for each workload
– Maximum degree of parallelism
– Indexing
– Materialized views
– Partitioning
– Data compression
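The first three planning tasks reduce to simple ceiling arithmetic once the users-per-JVM figure for each workload is known. The sketch below is a rough illustration only. The user counts, the users-per-JVM capacities and the limit of four JVMs per server are hypothetical; a real plan would take them from measurements or from the prediction model.

```python
import math


def jvms_required(concurrent_users, users_per_jvm):
    """Smallest number of JVMs that can hold the given number of users."""
    return math.ceil(concurrent_users / users_per_jvm)


def servers_required(total_jvms, jvms_per_server):
    """Smallest number of servers that can host all JVMs."""
    return math.ceil(total_jvms / jvms_per_server)


# Hypothetical inputs: workload -> (concurrent users, users one JVM can support).
WORKLOADS = {"Sales": (450, 100), "Marketing": (120, 60), "HR": (75, 50)}
JVMS_PER_SERVER = 4  # hypothetical limit per server

jvms_by_workload = {
    name: jvms_required(users, per_jvm) for name, (users, per_jvm) in WORKLOADS.items()
}
total_jvms = sum(jvms_by_workload.values())
servers = servers_required(total_jvms, JVMS_PER_SERVER)

if __name__ == "__main__":
    print(jvms_by_workload, total_jvms, servers)
```

Rounding up per workload rather than on the total keeps each workload in its own JVMs, which is an assumption of this sketch rather than something the slide states.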
15
Performance Prediction
16
Predicting Impact of Workload Growth

                          This Month   Next Month   In 2 Months   In 3 Months   In 4 Months
Arrival Rate (req/sec)    5            6            7             8             9
Service Time (sec)        0.1          0.12         0.14          0.16          0.18
Utilization (fraction)    0.5          0.6          0.7           0.8           0.9
Response Time (sec)       0.2          0.3          0.46          0.8           1.8

CPU example: A = 5 req/sec, Scpu = 0.1 sec
Utilization Law: U = A * S, so Ucpu = 5 req/sec * 0.1 sec = 0.5
Response Time Law: R = S / (1 - U), so Rcpu = 0.1 sec / (1 - 0.5) = 0.2 sec
Little’s Law: N = A * R

Based on expected workload growth of 20% per month, predict when the system will not be able to meet the SLO (0.6 sec). What will be the impact of doubling CPU speed? How long will the system satisfy the SLO?
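The three laws on this slide can be checked with a few lines of code. The sketch below first reproduces the worked CPU example, then applies the Response Time Law to the service time and utilization rows exactly as the slide prints them (it does not recompute utilization from the listed service times), and finally reports the first month in which the 0.6 sec SLO is exceeded. The function and variable names are illustrative only.

```python
def utilization(arrival_rate, service_time):
    """Utilization Law: U = A * S."""
    return arrival_rate * service_time


def response_time(service_time, util):
    """Response Time Law for a single queue: R = S / (1 - U)."""
    if not 0.0 <= util < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - util)


def requests_in_system(arrival_rate, resp_time):
    """Little's Law: N = A * R."""
    return arrival_rate * resp_time


# Worked example from the slide: A = 5 req/sec, Scpu = 0.1 sec.
u_cpu = utilization(5, 0.1)
r_cpu = response_time(0.1, u_cpu)
n_cpu = requests_in_system(5, r_cpu)

# Rows as printed on the slide.
MONTHS = ["This Month", "Next Month", "In 2 Months", "In 3 Months", "In 4 Months"]
SERVICE_TIME = [0.10, 0.12, 0.14, 0.16, 0.18]
UTILIZATION = [0.5, 0.6, 0.7, 0.8, 0.9]
SLO_SEC = 0.6

predicted = [response_time(s, u) for s, u in zip(SERVICE_TIME, UTILIZATION)]
first_breach = next((m for m, r in zip(MONTHS, predicted) if r > SLO_SEC), None)

if __name__ == "__main__":
    for month, resp in zip(MONTHS, predicted):
        print(f"{month}: {resp:.2f} sec")
    print("SLO first exceeded:", first_breach)
```

With the slide's figures the predicted response time crosses the 0.6 sec SLO in the "In 3 Months" column, where the table shows 0.8 sec.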
17
Predicting Impact of Doubling CPU Speed

                                       This Month   Next Month   In 2 Months   In 3 Months   In 4 Months
Arrival Rate (req/sec)                 5            6            7             8             9
Service Time (sec)                     0.1          0.12         0.14          0.16          0.18
Utilization (fraction)                 0.5          0.6          0.7           0.8           0.9
Response Time (sec)                    0.2          0.3          0.46          0.8           1.8
Response Time, doubled CPU speed (sec) 0.06         0.09         0.13          0.22          0.47

Based on expected workload growth of 20% per month, predict when the system will not be able to meet the SLO (0.6 sec). What will be the impact of doubling CPU speed? How long will the system satisfy the SLO?
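The "Doubling CPU Speed" row can be approximated with the same Response Time Law. The sketch assumes that a CPU twice as fast halves the service time and that utilization is then recomputed as A * S / 2; the slide does not state how the row was derived, so this is an interpretation. Under that assumption the results agree with the slide's row when truncated to two decimals.

```python
ARRIVAL_RATE = [5, 6, 7, 8, 9]                  # req/sec, from the slide
SERVICE_TIME = [0.10, 0.12, 0.14, 0.16, 0.18]   # sec, from the slide
SLO_SEC = 0.6
SPEEDUP = 2.0  # assumption: doubling CPU speed halves the service time


def response_time_after_upgrade(arrival_rate, service_time, speedup):
    """R = S' / (1 - A * S') with S' = S / speedup."""
    faster_service = service_time / speedup
    util = arrival_rate * faster_service
    if util >= 1.0:
        raise ValueError("system is saturated even after the upgrade")
    return faster_service / (1.0 - util)


upgraded = [
    response_time_after_upgrade(a, s, SPEEDUP)
    for a, s in zip(ARRIVAL_RATE, SERVICE_TIME)
]
meets_slo = [r <= SLO_SEC for r in upgraded]

if __name__ == "__main__":
    for a, r, ok in zip(ARRIVAL_RATE, upgraded, meets_slo):
        print(f"A={a} req/sec: R={r:.3f} sec, meets SLO: {ok}")
```

In this reading the upgraded system stays under the 0.6 sec SLO through the last column of the table, where the slide shows 0.47 sec.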
18
Example of Planning (see spreadsheet)
19
Workload Characterization & Forecasting
20
Modeling AS Hardware Upgrade Impact
21
Predicted AS Upgrade Impact
22
Modeling DBMS Server Upgrade Impact
23
Modeling DBMS Server Upgrade Impact
24
Predicted DBMS Server Upgrade Impact
25
Predicted Parallel Processing Impact
26
Predicted Parallel Processing Impact
27
Predicted Parallel Processing Impact
28
Strategic Decisions
How to Justify
– Enterprise Data Warehouse
– Master Data Management
– Hardware
– DBMS
29
Optimization of Placement of Data and Applications
[Diagram labels: storage tiers (Solid State, Small Disks, Large Disks, Very Large Disks, Tapes); Data (EDW, DW); Applications (AS, Hub)]
30
Justification of EDW: Predicting How EDW Will Affect ETL and Information Access Time
[Diagram labels: Source, Extract, Standard Transform, Stage, Data Mart Transform, Data Marts 1-6, EDW; compared quantities: ETL (DM) time vs. ETL (EDW) time, and Information Access Time (DM) vs. Information Access Time (EDW)]
Factors Affecting EDW Justification:
– Hardware cost
– Software licenses
– ETL process
– Support personnel
31
What Is the Best Architecture and Hardware Configuration for Specific EDW Workloads? DB2 UDB vs. Oracle RAC vs. Teradata
32
Differences Between Parallel Processing on Teradata and Oracle
Limited # of Available AMP Worker Tasks
33
Modeling Scaling Out
34
Predicting Impact of Different Hardware Platforms and Configurations
35
Prediction Results Show That an Increase in the # of Oracle RAC Nodes Will Reduce CPU Utilization and Improve Response Time and Throughput, but Will Increase Contention for Disk
[Chart label: I/O Rate * 10]
36
Master Data Store (MDS): Planning and Managing Challenges
What are the performance implications of supporting a centralized Master Data Store vs. distributed repositories of Master Data?
[Diagram labels: ODS, EDW, DM, MDS, MD; Current, Historical]
37
What Are the Performance Implications of Supporting Centralized MDM vs. Distributed Repositories for MDM? (Hub vs. Spoke Architectures)
Start with hub and, when frequency of accesses increases, consider spoke
[Diagram labels: Hub, MDS, Current & Historical Data]
38
Tactical Decisions
– How to Reduce Time to Load Growing Volume of Data
– How to Reduce Data Access Time
– How to Predict the Impact of New Application Implementation
39
Increase in Volume of Data and Changes in Data Access Patterns Affect Each Workload’s Performance
Increase in volume of data and pattern of data access affects:
– ETL time
– Disk utilization
– Aggregation and summarization time
– Data access time
– Session, thread usage time
– Buffer utilization and hit ratio
– DBMS server and application server CPU utilization
– Internode communication utilization
– Enterprise service bus utilization
– Response time and throughput
Steps:
– Predict Future Bottleneck
– Identify Critical Workload Users, SQL, Tables Which Will Cause Problems
– Use DBMS Wizards to Find Tuning Options
– Use Modeling to Justify Change & Verify Results
[Diagram labels: Technology, Processes, Workload, Data]
40
Predicting Database Tuning Impact: Creation of the Index (see spreadsheet)
41
Predicting Database Tuning Impact
42
Predicting Database Tuning Impact
43
Predicting Database Tuning Impact
44
Example: Can I load growing volume of data on time, and how will data load affect other workloads?
It will take 6 times longer to load growing volume of data in 10 months. RT for HR application will increase almost 2 times & throughput for ETL will be reduced almost 2 times
[Diagram labels: ETL Source (Extract, Transform), Transport, ETL Target (Transform, Load)]
45
What Is the Minimum Hardware Upgrade Required to Load Growing Volume of Data on Time?
46
What If We Increase the Number of Parallel ETL Utilities Loading Data in Parallel Starting Next Month (p2) and Upgrade Hardware (p5)?
An increase in the number of loads will significantly reduce load time, but RT for the HR, Marketing and Sales workloads will become much longer
47
Predicted Impact of the Implementation of Parallel Processing Based on Oracle 10g RAC
Implementation of parallel processing will improve response time for complex queries almost 2 times
48
Example Showing How Modeling Results Identify Potential Bottlenecks and SQL That Will Cause Problems in the Future by Workload
– When will SLO not be met?
– What will cause the problem?
– Who will cause the problem?
– How do you fix the problem?
– What are database and application tuning alternatives?
– What are the expected savings?
49
DB Advice
Capacity Planning Recommendations
Processing SQL through SQL/DBMS Access Advisor gives a list of recommendations
50
Example Showing Predicted Impact of Recommended Indexes
51
Predicted Data Compression Impact on Different Workloads
Data compression will have different impact on different workloads. DW workloads with primarily SELECT type of requests will benefit more.
52
Predicted Impact of Data Partitioning
Data partitioning will have a positive impact on performance for all workloads.
53
Performance Prediction Results Based on Oracle Memory Advisor Reflect the Impact of the Workload Growth and Memory Pool Size Change
54
Predicted Impact of Adding a New Application
– Set up realistic expectations and reduce risk of surprises
– Prediction of how the new application will perform in the production environment
– Prediction of how the new application will affect performance of existing applications
[Diagram labels: Test, Production, Alternatives; in the future: Database Replay]
55
Predicting New HR Application Implementation Impact
56
Modeling Results Help Customers to Set Up Realistic SLO and Negotiate SLA for Major Workloads
– Users and IT select the SLO level that will provide acceptable performance with acceptable Total Cost of Ownership (TCO)
– Prediction results allow customers to negotiate SLA between business and IT
– For expected workload and database size growth, IT guarantees delivery of a certain level of responsiveness and throughput
[Chart labels: Hardware Configuration & TCO, SLO, SLA, Expected workload & DB growth, Predicted RT for one of the workloads]
57
Operational Decisions
– Workload Priority
– Scheduling
– Organizing Continuous Proactive Performance Management
– Virtual Tape Library
58
Predicting How Change of the Workload’s Priority Will Affect Performance
Sales workload priority increase will improve Sales RT, but other workloads will suffer
59
Comparison of Actual Results vs. Expected and Organizing Continuous Proactive Service Level Management
– Find the difference between predicted results or expectations (red line) and actual measurement data
– Track how often the actual results do not meet expectation (SLA)
– When the number of exceptions exceeds the threshold, generate an alert
– Explain the difference and develop new corrective recommendations
– Identify when SLO will not be met for each workload
– Identify what will be a bottleneck
– Identify which workload will cause the performance degradation
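The exception-tracking loop described on this slide can be sketched in a few lines: compare each measurement with its expectation, count the misses, and raise an alert once the count passes a threshold. The 10% tolerance, the threshold of one exception and the sample response times below are hypothetical; the slide does not specify them.

```python
def count_exceptions(actual, expected, tolerance=0.0):
    """Number of intervals where the actual value exceeds the expectation.

    tolerance is a relative allowance, e.g. 0.1 accepts values up to 10% above
    the expected one.
    """
    return sum(1 for a, e in zip(actual, expected) if a > e * (1.0 + tolerance))


def should_alert(actual, expected, max_exceptions, tolerance=0.0):
    """True when the number of exceptions exceeds the allowed threshold."""
    return count_exceptions(actual, expected, tolerance) > max_exceptions


# Hypothetical response times in seconds for four consecutive intervals.
expected_rt = [0.20, 0.30, 0.46, 0.80]
measured_rt = [0.21, 0.29, 0.55, 0.95]

exceptions = count_exceptions(measured_rt, expected_rt, tolerance=0.1)
alert = should_alert(measured_rt, expected_rt, max_exceptions=1, tolerance=0.1)

if __name__ == "__main__":
    print("exceptions:", exceptions, "alert:", alert)
```

An alert is only the trigger; explaining the difference and producing new recommendations, as the slide describes, still requires going back to the model.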
60
Summary
– Use modeling to evaluate options and justify EDM strategic, tactical and operational decisions to satisfy conflicting business requirements for timeliness and flexibility, accuracy, acceptable performance and minimum cost
– Organize a continuous process of applying models for justifying EDM decisions, setting expectations, verifying results and finding effective proactive corrections during the application and information life cycle
– Workload characterization and modeling identify which data and applications are used by individual lines of business and business processes, and focus EDM decisions and efforts on proactively addressing the most important strategic, tactical and operational IT issues
61
Thank You! Questions? Dr. Boris Zibitsker boris@bez.com www.bez.com