Published by Pauline Benson. Modified over 9 years ago.
1
© 2009 IBM Corporation AIX Rightsizing Clea Zolotow Senior Technical Staff Member, IBM Corporation Nicholas Lydakis, Manager, Capacity Planning, WellPoint Corporation June 3, 2011
2
ABSTRACT
There are many ways to reduce cost in a datacenter. One of the easiest is to decrease the number of servers on the floor. Now, along with physical consolidation, we can logically simplify the datacenter by utilizing virtualization. Some technical barriers to virtualization are: performance concerns – workloads competing for resources; growth concerns – workloads cannot reserve space for growth; and architectural constraints – servers run out of IO or memory before they run out of CPU. This presentation provides a mass analysis methodology to address performance and growth concerns and architectural constraints, as well as methodologies that can be used to coadunate LPARs to achieve higher utilization rates at the hardware level. This methodology has been quite successful at IBM. Our biggest cost savings was a run rate of $2.4 million yearly in hardware and a $2 million software savings due to decreased engine utilization.
3
Virtualization = Infrastructure Simplification
Efficient virtualization provides the best ROI and minimizes the risk.
[Diagram: three stages of infrastructure, from separate Windows, Unix and Linux servers with their own storage, management servers and complex networking, through physical consolidation onto fewer servers and a SAN, to virtual servers, storage and networks.]
Starting point:
– 1 workload per server
– Manual provisioning
– No sharing
– Vertical silos
– Disparate mgmt tools
– Multiple sites
Physical Consolidation:
– Fewer sites
– Use of larger servers / SANs
– Mostly environmental savings
– Disparate management tools
– Labor-intense provisioning
– Workload mgmt and isolation issues
Logical Simplification (Virtualization):
– Multiple virtual servers (OSs) per physical server
– Significant savings – fewer servers, higher utilization
– Rapid “provisioning”
– Automatic workload mgmt
– Preserve logical “server to application” relations
4
Virtualization’s popularity today is based on its ability to optimize IT
Virtualization has been around for decades, and it is here to stay. Large and small organizations alike are rapidly adopting the technology.
Why do organizations adopt virtualization? For reasons that range from reduced IT costs to simplified IT environments, streamlined management and increased IT flexibility.
Virtualization motivators:
– Reduce costs: 57%
– Simplify IT infrastructure and administration: 48%
– Increase server utilization: 48%
– Increase scalability of infrastructure: 29%
– Enhance resiliency and reliability: 25%
– Improve application performance: 15%
– Manage a heterogeneous server environment: 9%
Source: IBM Systems and Technology Group (1Q06)
5
Each Workload is Evaluated for Suitability Based on Technical Attributes
Priority Workloads for Consolidation:
– WebSphere® applications
– Domino® applications
– Selected tools: Tivoli®, WebSphere® and internally developed
– WebSphere MQ
– DB2® Universal Database™
6
Current Mid-Range Server Location by State – Physical Consolidation Opportunities still exist!
7
Analysis methodology to address performance and growth concerns: Rightsize individual LPARs (CPU and Memory)
Know your current hardware utilization rates and derive potential cost savings to get customer/app owner buy-in.
Rightsize individual LPARs:
– Initial pass is “perfect world”.
– Second pass is the initial meeting with app owners.
– Third and subsequent passes take into account most-loved and business-critical applications.
Roll out resizing in waves:
– Capacity planning has to measure pre- and post-wave to ensure that there is headroom for processing.
– Find potential resource problems before the app owner does.
Actual hardware savings are usually 50% or less of the perfect-world analysis.
8
UNIX Virtualized vs. Non-Virtualized Utilization
Large Company – Recent Data
9
Capped and Uncapped Mode
In the configuration of Micro-Partitioning, two types are available: capped and uncapped. The difference is in the ability of a partition to use extra capacity available in the system. If a processor donates unused cycles back to the shared pool, or if the system has idle capacity (because there is not enough workload running), the extra cycles may be used by other partitions, depending on their type and configuration.
Capped mode: The processing capacity never exceeds the assigned processing capacity.
Uncapped mode: The processing capacity may be exceeded when the shared processing pool has available resources.
10
Capped Mode
A capped partition is defined with a hard maximum limit of processing capacity. That means that it cannot go over its defined maximum capacity in any situation, unless you change the configuration for that partition (either by modifying the partition profile or by executing a dynamic LPAR operation). Even if the system is otherwise idle, the capped partition cannot exceed its entitled capacity.
11
Uncapped Mode
With an uncapped partition, you must specify the uncapped weight of that partition. If multiple uncapped logical partitions require idle processing units, the managed system distributes idle processing units to the logical partitions in proportion to each logical partition's uncapped weight. The higher the uncapped weight of a logical partition, the more processing units the logical partition gets.
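The proportional split described on this slide can be sketched in a few lines of Python. This is only an illustration of "in proportion to uncapped weight": the function name, the LPAR names and the weight values are hypothetical, and the sketch ignores each partition's actual demand and virtual processor limits, which a real hypervisor would also take into account.

```python
def distribute_idle(idle_units, contenders, weights):
    """Split idle processing units among uncapped LPARs by uncapped weight.

    idle_units -- idle processing units available in the shared pool
    contenders -- names of the uncapped LPARs asking for extra capacity
    weights    -- mapping of LPAR name to uncapped weight (hypothetical values)
    """
    total_weight = sum(weights[name] for name in contenders)
    if total_weight == 0:
        # No contender carries any weight, so nobody receives extra units.
        return {name: 0.0 for name in contenders}
    return {name: idle_units * weights[name] / total_weight for name in contenders}


# Two uncapped LPARs compete for 3.0 idle processing units.
shares = distribute_idle(3.0, ["lpar_a", "lpar_b"], {"lpar_a": 128, "lpar_b": 64})
print(shares)
```

With a 2:1 weight ratio, lpar_a receives twice as many of the idle units as lpar_b.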
12
Min, Max and Desired
When assigning processor values you must specify minimum, desired, and maximum values for both processing units and virtual processors. If any of the three types of resources cannot satisfy the specified minimum and required values, the activation of a partition fails. If the available resources satisfy all the minimum and required values but do not satisfy the desired values, the activated partition will get as many of the resources as are available.
Min Processing Unit: 0.1
Desired Processing Unit: 0.5
Max Processing Unit: 1
Min Virtual CPU: 1
Desired Virtual CPU: 1
Max Virtual CPU: 2
The maximum value is used to limit the maximum processor resources when dynamic logical partitioning operations are performed on the partition.
This is the Cap
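The activation rule on this slide can be illustrated with a minimal Python sketch for a single resource (here, processing units). The function name and the pool sizes are hypothetical; the minimum and desired values are the ones shown on the slide. It is a simplification, not the hypervisor's actual logic, which checks every resource type before activating a partition.

```python
def units_at_activation(available, minimum, desired):
    """Return the processing units granted at activation, or None on failure.

    Activation fails when the pool cannot cover the minimum. Otherwise the
    partition receives its desired value, or whatever is available if the
    pool cannot cover the desired value.
    """
    if available < minimum:
        return None
    return min(available, desired)


# Slide values: minimum 0.1 and desired 0.5 processing units.
print(units_at_activation(2.0, 0.1, 0.5))   # pool covers the desired value
print(units_at_activation(0.3, 0.1, 0.5))   # pool covers only part of it
print(units_at_activation(0.05, 0.1, 0.5))  # pool cannot cover the minimum
```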
13
Rightsizing Methodology: AIX CPU Sizing Parameters (Uncapped)
Minimum = the lowest configuration available without rebooting
Physical Entitlement = the starting configuration of the LPAR
Maximum = the highest configuration available without rebooting
Virtual Entitlement = the maximum the LPAR can receive
14
Rightsizing Methodology: AIX CPU Sizing Parameters (Capped)
Minimum = the lowest configuration available without rebooting
Maximum = the highest configuration available without rebooting
Physical Entitlement = the capacity the LPAR can receive
15
Advanced Power Virtualization
[Diagram: a hypervisor hosting dynamically resizable AIX 5L V5.2, AIX 5L V5.3, Linux and i5/OS V5R3 partitions, a Virtual I/O Server partition sharing Ethernet and storage, and PLM-managed and unmanaged partitions.]
Virtual I/O Server
– Shared Ethernet
– Shared SCSI and Fibre Channel-attached disk subsystems
– Supports AIX 5L V5.3 and Linux partitions
Micro-Partitioning
– Share processors across multiple partitions
– Minimum partition 1/10th processor
Partition Load Manager
– Balances processor and memory requests
Managed via HMC or IVM
16
Tooling and Data Retrieval: SRM
To the right is the SRM methodology and data streams. This works like many other performance and capacity systems. Agents that sample every minute are deployed (1) and send their data to an interim holding spot (2), where the data gets processed and crunched to 15-minute or hourly intervals (3), then stored in DB2 (4) and presented on the SRM website (4).
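The "crunching" of per-minute samples into 15-minute intervals can be sketched as follows. The slide does not say how SRM summarizes each interval, so the use of a simple average here is an assumption, and the function name and sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def rollup(samples, interval_minutes=15):
    """Average per-minute samples into fixed-width intervals.

    samples -- iterable of (minute_of_day, cpu_pct) pairs
    Returns a dict mapping each interval's starting minute to its average.
    """
    buckets = defaultdict(list)
    for minute, value in samples:
        # Integer division finds the interval a sample belongs to.
        start = (minute // interval_minutes) * interval_minutes
        buckets[start].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}


# Thirty minutes of made-up data: 10% busy, then 30% busy.
samples = [(m, 10.0 if m < 15 else 30.0) for m in range(30)]
intervals = rollup(samples)
print(intervals)
```

Passing interval_minutes=60 gives the hourly roll-up instead. Note that averaging hides peaks within an interval, which is one reason an uplift is applied later in the sizing.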
17
Tooling and Data Retrieval: Brio (ODBC)
After the data is loaded to the SRM data warehouse, it is extracted to the PC utilizing Microsoft’s Open Database Connectivity (ODBC). There, the architectural and utilization information is merged together to produce three reports utilized for rightsizing and server consolidation studies.
18
Rightsizing Methodology: AIX CPU Sizing Parameters
Part One: Pull the data.
Part Two: Analyze it.
Use this later; start with the forest, not the trees.
=ROUNDUP(IF(A3="Capped",(G3*I3/100)*1.3,J3),0)
=IF(K3/10>M3,K3/10,M3)
=ROUNDUP(IF(A3="Capped",G3*I3/100,J3),1)
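The three spreadsheet formulas on this slide can be mirrored in Python, as sketched below. The slide does not show the column headings, so the meanings given to the columns are assumptions: A is taken as the LPAR mode, G as the physical CPUs allocated, I as the CPU utilization in percent, J as the physical CPUs consumed by an uncapped LPAR, K as the virtual CPU result and M as the physical entitlement result. The function names are hypothetical.

```python
from decimal import Decimal, ROUND_UP


def roundup(value, digits):
    """Mimic Excel ROUNDUP: round away from zero to `digits` decimal places."""
    quantum = Decimal(10) ** -digits  # 1 for digits=0, 0.1 for digits=1
    return float(Decimal(str(value)).quantize(quantum, rounding=ROUND_UP))


def virtual_cpus(mode, g, i, j, uplift=1.3):
    """=ROUNDUP(IF(A3="Capped",(G3*I3/100)*1.3,J3),0)"""
    return roundup((g * i / 100) * uplift if mode == "Capped" else j, 0)


def physical_entitlement(mode, g, i, j):
    """=ROUNDUP(IF(A3="Capped",G3*I3/100,J3),1)"""
    return roundup(g * i / 100 if mode == "Capped" else j, 1)


def entitlement_floor(k, m):
    """=IF(K3/10>M3,K3/10,M3): the larger of one tenth of K and M."""
    return max(k / 10, m)


# Made-up capped LPAR: 8 physical CPUs allocated, 50% busy.
vcpus = virtual_cpus("Capped", g=8, i=50, j=0)
entitlement = physical_entitlement("Capped", g=8, i=50, j=0)
print(vcpus, entitlement, entitlement_floor(vcpus, entitlement))
```

The second formula appears to keep the entitlement at no less than one tenth of a processor per virtual CPU, which matches the Micro-Partitioning minimum mentioned earlier in the deck, but that reading is an inference rather than something the slide states.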
19
The Big Picture
In the previous example, I chose only the 34 32-way boxes at this corporation (1,088 CPUs). 385 physical CPUs on capped LPARs are currently allocated to the workload. After rightsizing, in a perfect world, we uncapped all the LPARs and could run them on 261 virtual CPUs and 174.8 physical CPUs, or 5.5 32-way boxes, a savings of 25 physical frames after accounting for headroom (2 CPUs per frame) and 4 engines per frame dedicated to VIOS. Your mileage will vary.
20
Technical Barriers to Virtualization: Workloads Competing for Resources
Monitoring workloads is essential. Siloed corporations seem to believe that in shared-host systems, someone else is stealing their CPU. The next chart shows how physical utilization can be calculated at the frame level.
Uncapped LPAR utilization is calculated from the number of CPUs dispatched to service the workload and therefore includes any LPAR overhead or frame overhead (PURR value, physical processors consumed).
Capped LPAR utilization can be calculated in two ways:
– Simple count of engines, as they are no longer in the shared pool (i.e., the number of physical CPUs).
– CPU utilization * the number of physical CPUs assigned.
To prove to management that the boxes are underutilized and run a cost savings project, I usually use CPU utilization (as seen on the next page). To prove to application owners that the CPU isn’t being “stolen”, I use the “simple count of engines” for the capped environment and the CPU dispatched for the uncapped.
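The two ways of counting CPU at the frame level can be sketched in Python. The dictionary keys (`mode`, `physical_cpus`, `cpu_util_pct`, `physc`) and the sample LPARs are hypothetical; `physc` stands in for the physical processors consumed by an uncapped LPAR.

```python
def cpus_used(lpar, count_engines=False):
    """Physical CPUs attributed to one LPAR.

    Uncapped LPARs report the CPUs dispatched to them. Capped LPARs are
    counted either as whole engines or as utilization times engines.
    """
    if lpar["mode"] == "Uncapped":
        return lpar["physc"]
    if count_engines:
        return lpar["physical_cpus"]
    return lpar["physical_cpus"] * lpar["cpu_util_pct"] / 100


def frame_cpus_used(lpars, count_engines=False):
    """Sum the CPUs attributed to every LPAR on a frame."""
    return sum(cpus_used(lpar, count_engines) for lpar in lpars)


frame = [
    {"mode": "Capped", "physical_cpus": 4, "cpu_util_pct": 25.0},
    {"mode": "Uncapped", "physc": 1.5},
]
by_utilization = frame_cpus_used(frame)         # view shown to management
by_engine_count = frame_cpus_used(frame, True)  # view shown to app owners
print(by_utilization, by_engine_count)
```

The gap between the two totals is the capacity that is allocated to capped LPARs but sits idle.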
21
Technical Barriers to Virtualization: Workloads Competing for Resources
The top (yellow bar) is the number of physical CPUs, here 32. The red square is the 90th percentile of the CPU utilization of the frame, utilizing hourly data. The top of the blue line is the maximum CPU utilization of the frame. The bottom of the blue line is the average utilization of the frame.
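The three statistics plotted for each frame (average, 90th percentile and maximum of the hourly data) can be computed with Python's standard library, as in the sketch below. The hourly values are made up, and the slide does not say which percentile method was used, so the "inclusive" interpolation chosen here is an assumption.

```python
from statistics import mean, quantiles


def frame_summary(hourly_cpus_used):
    """Summarize a frame's hourly CPU consumption as the chart does."""
    # quantiles(..., n=10) returns the nine decile cut points;
    # the last one is the 90th percentile.
    p90 = quantiles(hourly_cpus_used, n=10, method="inclusive")[-1]
    return {
        "average": mean(hourly_cpus_used),
        "p90": p90,
        "maximum": max(hourly_cpus_used),
    }


physical_cpus = 32  # the yellow bar on the chart
hourly = [3.0, 4.5, 5.0, 6.5, 8.0, 12.0, 18.5, 21.0, 9.0, 4.0, 3.5, 3.0]
summary = frame_summary(hourly)
print(summary, "of", physical_cpus, "physical CPUs")
```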
22
Growth Concerns – Workloads Cannot Reserve Space for Growth
In an uncapped environment, workloads can reserve space for growth by utilizing the amount of virtualized CPUs available to the workload. This was used to “sell” the benefits of uncapped LPARs to the application owners.
In the previous example, a 30% uplift was built into the calculation for the virtual CPUs:
– =ROUNDUP(IF(A3="Capped",(G3*I3/100)*1.3,J3),0)
– As you work with your individual environment, you can customize that uplift.
– Note that the uplift covers not only growth but also intra-hour peaks (as I utilized hourly average data).
23
Architectural Constraints – Servers Run out of IO or Memory Before They Run out of CPU
These machines require 1,393,664 MB of memory to run their workload. (Memory optimization will have to wait for another day.) Spread evenly over 7 machines, each machine would require 199,095 MB of memory, which rounds up to 200,704 MB (a multiple of 4,096) or 204,800 MB (a multiple of 8,192). Unfortunately, these machines came with 131,072 MB. Further, there are 7 Oracle databases for which the application owner will not let the LPAR run on shared VIOS, adding to the number of frames and the number of engines.
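The memory arithmetic on this slide can be checked with a short Python sketch. The figures are the ones quoted on the slide; the helper name is hypothetical, and the final frame count is a derived illustration that considers memory only.

```python
import math


def round_up_to(value, increment):
    """Round value up to the next multiple of increment."""
    return math.ceil(value / increment) * increment


total_mb = 1_393_664   # memory required by the whole workload
frames = 7             # frames the workload would be spread over
installed_mb = 131_072 # memory each machine actually came with

per_frame_mb = math.ceil(total_mb / frames)
print(per_frame_mb)                     # memory needed per frame
print(round_up_to(per_frame_mb, 4096))  # next multiple of 4,096 MB
print(round_up_to(per_frame_mb, 8192))  # next multiple of 8,192 MB

# Frames needed if memory were the only constraint and nothing were upgraded.
frames_for_memory = math.ceil(total_mb / installed_mb)
print(frames_for_memory)
```

On these numbers memory, not CPU, sets the frame count: 7 frames would each need well over the installed 131,072 MB.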
24
Methodologies to Coadunate LPARs
coadunation: the state or condition of being united by growth. Adjective: coadunate.
25
Coadunation Example
Mixing workloads shares headroom, but you pay in response time at low utilization. Workload management shifts peaks based on business priorities to use "white space", but the response time of lower-priority work is traded off.
26
Data Preparation
Data is readily available from the SRM database at srmweb.raleigh.ibm.com. Data is extracted and normalized to the receiving machine using the Ideas International database. The CSV file is briefly edited, then run through SPOT. This extraction and load process takes about 20 minutes (depending on the response time of the SRM database). The SPOT tool takes about 10 minutes to run for each datacenter (Southbury and Boulder). Total study time is 60 minutes. Easy!
27
SPOT Screenshot #1
28
SPOT Screenshot #2
29
SPOT Screenshot #3
30
Results of Co-adunation Study, Boulder
Boulder has 24 physical frames holding 93 LPARs, averaging 3.875 LPARs per frame. Based on CPU utilization, the LPARs could all be deployed to 5 x445s, which would then run an average of 47.4% busy, a savings of 19 physical frames. 2 LPARs would be migrated to stand-alone. (This is an average of 18.2 LPARs per frame.) Current host utilization for Boulder for March 2007 was 7.33% busy.
31
Results of Co-adunation Study, Southbury
Southbury has 17 physical frames holding 59 LPARs, averaging 3.47 LPARs per frame. Based on CPU utilization, the LPARs could all be deployed to 4 x445s, which would then run an average of 45.6% busy, a savings of 13 physical frames. 2 LPARs would be migrated to stand-alone. (This is an average of 14.25 LPARs per frame.) Current utilization for Southbury was 7.62% busy.
32
Conclusion
There are many ways to reduce cost in a datacenter. Decrease the number of servers on the floor using physical or virtual consolidation.
Address concerns:
– Performance concerns – workloads competing for resources;
– Growth concerns – workloads cannot reserve space for growth; and
– Architectural constraints – servers run out of IO or memory before they run out of CPU.
Utilize a statistical or bin-packing mass analysis methodology to coadunate LPARs to achieve higher utilization rates at the hardware level.
Get those cost savings!
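As an illustration of what a bin-packing pass over LPARs might look like, here is a minimal first-fit-decreasing sketch in Python. It is not the algorithm used by SPOT, which the slides do not describe; the LPAR names, CPU demands and frame capacity are hypothetical, and the sketch packs on CPU alone, ignoring memory, IO and headroom.

```python
def first_fit_decreasing(demands, capacity):
    """Place LPARs on as few frames as a first-fit-decreasing pass finds.

    demands  -- mapping of LPAR name to CPUs required
    capacity -- usable CPUs per frame
    Returns a list of frames, each a dict with its LPARs and free CPUs.
    """
    frames = []
    # Largest LPARs first, so the hardest ones to place are placed early.
    for name, need in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        if need > capacity:
            raise ValueError(f"{name} does not fit on any frame")
        for frame in frames:
            if frame["free"] >= need:
                frame["lpars"].append(name)
                frame["free"] -= need
                break
        else:
            # No existing frame has room, so open a new one.
            frames.append({"lpars": [name], "free": capacity - need})
    return frames


placement = first_fit_decreasing({"a": 4, "b": 3, "c": 3, "d": 2}, capacity=6)
for number, frame in enumerate(placement, start=1):
    print(number, frame["lpars"], "free:", frame["free"])
```

In practice the capacity passed in would be the frame size less the headroom and the engines reserved for the Virtual I/O Server, and the demand would be a peak or high-percentile figure rather than an average.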
33
Questions?