CHECO Fall 2013 Conference
ERP Storage – What Works, What Does Not
Presenter: Rick Beck, Director, IT Application Services
September 17, 2013
Background

Previous environment:
- Red Hat Linux database servers with 32GB RAM
- EVA SAN with 8Gb Fibre Channel
- 5 production Oracle databases (Banner, ODS/EDW, help desk, Luminis, Web CMS) on 3 servers

Reason for change:
- Servers were 6 years old; storage was 5 years old and at capacity
- Slowdowns during peak usage (first week of school; first day of timed registration)
New Environment

- Virtualized with Oracle Virtual Machine (OVM) and Oracle Linux on 3 physical servers
- 64GB RAM dedicated to the production Banner database
- Failover with Data Guard to a standby database, not active (a role check is sketched below)
- HP P2000 SAN (small-business SAN) with 24 3TB drives at 7,200 rpm
- 3 solid-state 3TB drives local on the server
- 2 vdisks configured with 512GB LUNs and RAID 10
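The standby's "not active" state shows up in the role each instance reports. A minimal sketch of that check, assuming the cx_Oracle driver; the host names, service name, and credentials are placeholders, not details from this project:

```python
# Hypothetical sanity check: ask each instance what role Data Guard has it in.
import cx_Oracle

def report_role(dsn):
    # Credentials and DSNs are placeholders for illustration only.
    with cx_Oracle.connect("system", "not_the_real_password", dsn) as conn:
        cur = conn.cursor()
        cur.execute("SELECT name, database_role, protection_mode FROM v$database")
        name, role, mode = cur.fetchone()
        print(f"{dsn}: {name} runs as {role} ({mode})")

# The standby should report PHYSICAL STANDBY until a failover promotes it.
for dsn in ("banner-prod:1521/BANNER", "banner-stby:1521/BANNER"):
    report_role(dsn)
```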
The Players

- Infrastructure team – storage purchase and setup
- Database director and DBAs
- Oracle contractor with experience from 55 conversions to OVM and ASM
Implementation

- Hardware purchased in June 2012
- Project kickoff with contractors – October 2012
Staffing Issues

Departures before the implementation began:
- The database director
- 2 of the 3 DBAs (the 3rd DBA left before the first production go-live)
- The person who architected the storage solution
Plan Issues

- Production environments were to be converted first, then cloned to test after the production go-lives
- No consideration was given to the middle tier: the jobsub server and 10 application servers
- Plan changes added an additional $119k to the project
SAN Issues

- The HP P2000 needed drivers to work with OVM
- Solid-state drives could not be mixed with other drives on the SAN and were traded in
Other Issues

- Firewall issues arose because the new database servers were on a different subnet than the application servers
- Troubleshooting took significant time because the firewall was always a suspect (a reachability probe is sketched below)
- System admins did not receive OVM training until just before the production go-live
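When the firewall has to be ruled out on every incident, a cheap reachability probe against the listener port saves triage time. A minimal sketch using only the standard library; the host names are placeholders and 1521 is just the default Oracle listener port:

```python
# Probe whether the Oracle listener port answers across the subnet boundary.
import socket

def port_open(host, port, timeout=3):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Host names are placeholders; 1521 is the default listener port.
for host in ("banner-db01", "banner-db02", "banner-db03"):
    state = "reachable" if port_open(host, 1521) else "blocked or down"
    print(f"{host}:1521 -> {state}")
```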
Results

Instead of a 5-fold increase in speed, the system was sluggish, with I/O-intensive activities running up to 5 times slower!
Mitigation: Analysis

- The consultant reviewed logs to determine whether Oracle or OVM configuration issues were to blame; the resulting changes yielded minimal performance gains
- All analysis pointed to storage performance
- OEM and Quest Spotlight showed that slowdowns during peak OLTP and batch-job processing were due to I/O waits (see the sketch below)
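The I/O-wait signal that OEM and Spotlight surfaced is also visible in the database's cumulative wait statistics. A minimal sketch, again assuming cx_Oracle and placeholder credentials, that ranks non-idle wait events by time waited; on a storage-bound system, User I/O events such as db file sequential read dominate the top of the list:

```python
# Rank non-idle wait events by cumulative time waited since instance startup.
import cx_Oracle

conn = cx_Oracle.connect("system", "not_the_real_password",
                         "banner-prod:1521/BANNER")
cur = conn.cursor()
cur.execute("""
    SELECT event, wait_class, total_waits, time_waited_micro / 1e6 AS seconds
      FROM v$system_event
     WHERE wait_class <> 'Idle'
     ORDER BY time_waited_micro DESC""")
for event, wait_class, waits, seconds in cur.fetchmany(10):
    print(f"{seconds:12.0f}s {waits:12.0f} waits [{wait_class}] {event}")
conn.close()
```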
Mitigation: Actions Taken

- The cabinet had 24 more bays, so 24 additional drives were purchased: 15,000 rpm, 600GB
- Configured 2 vdisks:
  - 4.8TB as RAID 10 – used for Banner production
  - 3TB as RAID 5 (needed the space) – used for the warehouse database (ODS/EDW)
- ASM disk sizes were capped at 1.99TB, the maximum ASM could handle
- Used ASM to move the database files to the faster disks while the system was up (see the sketch below)
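The online move works through ASM's rebalance: adding the new disks and dropping the old ones in a single ALTER DISKGROUP statement makes ASM migrate the extents while the database stays open. A sketch of that operation; the disk group name, disk paths, and ASM connection details are hypothetical, not taken from this project:

```python
# Hypothetical online storage swap via ASM rebalance. Disk paths, the disk
# group name, and connection details are placeholders.
import cx_Oracle

# ASM administration requires a SYSASM connection to the ASM instance.
conn = cx_Oracle.connect("sys", "not_the_real_password", "asm-host:1521/+ASM",
                         mode=cx_Oracle.SYSASM)
cur = conn.cursor()

# Adding and dropping in one statement lets a single rebalance pass move
# extents straight from the old 7,200 rpm disks onto the new 15K rpm LUNs.
cur.execute("""
    ALTER DISKGROUP data
      ADD DISK '/dev/mapper/fast_lun01', '/dev/mapper/fast_lun02'
      DROP DISK slow_disk01, slow_disk02
      REBALANCE POWER 4""")

# The dropped disks are released only after the rebalance completes.
cur.execute("SELECT operation, state, est_minutes FROM v$asm_operation")
for op, state, est in cur:
    print(f"{op} {state}, about {est} minutes remaining")
conn.close()
```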
Mitigation: Results

- I/O waits are no longer a problem
- Survived fall term startup with the best Banner performance in years
- The ODS/EDW data warehouse still has load-speed issues
Lessons Learned

Need broader involvement in planning:
- The solid-state drives were supposed to be local disk – the people who knew had left
- Middle-tier issues would have surfaced earlier
- A vendor call was needed to get Oracle (for OVM), HP, and the implementation contractor together to discuss technical issues before project startup
Lessons Learned (Continued)

- Cutting corners may not end up saving money
- Need to develop internal knowledge of unfamiliar technologies: the new DBAs did not have OVM or ASM experience and are only now getting up to speed
- Oracle ASM can only handle virtual disk sizes up to 1.99TB
Questions?

Rick Beck
beckr@msudenver.edu