ERCOT Project Update
ERCOT Outage Evaluation Phase 2 (SCR745)
TDTWG
November 5, 2008
PR60006_01 Phase 2 ERCOT Update - Overview

Background:
SCR 745: To achieve improved Market performance and reliability through a reduction of ERCOT Retail Systems unplanned outages. This effort was believed to be achievable through the implementation of three efforts:
- PR60006_01 ERCOT Outage Evaluation Phase I and Phase II
  - Phase I: NAESB and Proxy Clustered (Delivered 02/2007, Goal Achieved)
  - Phase II: Paperfree Clustered environment with File Server Redundancy and High Availability (Clustered achieved. Redundancy outstanding)
- PR60006_02 ERCOT Outage Evaluation Phase III: Database Clustering Solution (agreed to cancel due to stability received from AIX DB Transition)
PR60006_01 Phase 2 ERCOT Update – Outages

Retail Transaction Processing Unplanned Outages by # of Incidents
[Table: columns NAESB, Seebeyond, TIBCO, Paperfree, Siebel, TML, Retail Databases, Total]
* Based on IT Incident Report on 11/05/2008 and metrics in SCR745 posted on
PR60006_01 Phase 2 ERCOT Update – PF Outage Details (3yrs)

| Issue Date | # min | SLA Impacted | Application Impacted | Issue Description | Root Cause | Service Impact | Service Impact Detail |
|---|---|---|---|---|---|---|---|
| 9/25/06 | 829 | Retail | Paperfree | Paperfree File Server not responding | Infrastructure | Outage | Unplanned Outage |
| 10/2/06 | 18 | Retail | Paperfree | Paperfree File Server network outage | Infrastructure | Outage | Unplanned Outage |
| 1/3/07 | 130 | Retail | Paperfree | Memory failure in the clustered environment | Infrastructure | Outage | Unplanned Outage |
| 1/5/07 | 270 | Retail | Paperfree | Problem pulling data from NAESB | Infrastructure | Outage | Unplanned Outage |
| 1/8/07 | 195 | Retail | Paperfree | Attempted to replace the Paperfree architecture as identified by the ongoing Paperfree issues analysis | Infrastructure | Outage | Unplanned Outage |
| 2/7/07 | 85 | Retail | Paperfree | Connectivity issue between application and SAN | Infrastructure | Outage | Unplanned Outage |
| 3/20/08 | 105 | Retail | | Market Degradation Issues Post SCR745 Phase 2 solution | Polyserve Application/TRXN Volumes | Outage | Unplanned Outage |
| 3/22/08 | 240 | Retail | | Market Rollback from SCR745 Phase 2 implementation | Polyserve Application/TRXN Volumes | Outage | Unplanned Outage |
| 8/11/08 | 53 | Retail | | ERCOT experienced an unplanned outage of the Electronic Data Interface (EDI) server. | Paperfree File Server | Outage | Unplanned Outage |
| 8/26/08 | 54 | Retail | | ERCOT experienced an unplanned outage of the Electronic Data Interface (EDI) server. | Paperfree File Server | Outage | Unplanned Outage |
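As a rough cross-check, the outage minutes listed above can be tallied per calendar year. The following is a minimal sketch only: the names `OUTAGES` and `minutes_by_year` are hypothetical, and the dates and durations are copied from the outage details on this slide, so it covers only the incidents listed here.

```python
from collections import defaultdict

# (issue date as M/D/YY, outage minutes), copied from the outage details above
OUTAGES = [
    ("9/25/06", 829),
    ("10/2/06", 18),
    ("1/3/07", 130),
    ("1/5/07", 270),
    ("1/8/07", 195),
    ("2/7/07", 85),
    ("3/20/08", 105),
    ("3/22/08", 240),
    ("8/11/08", 53),
    ("8/26/08", 54),
]


def minutes_by_year(outages):
    """Sum outage minutes per calendar year, reading the year from the M/D/YY date."""
    totals = defaultdict(int)
    for date, minutes in outages:
        # Assumption: all two-digit years in this table fall in the 2000s.
        year = 2000 + int(date.rsplit("/", 1)[1])
        totals[year] += minutes
    return dict(totals)


if __name__ == "__main__":
    for year, total in sorted(minutes_by_year(OUTAGES).items()):
        print(year, total)
```

Counting only the incidents listed here, this gives 847 minutes for 2006, 680 for 2007, and 452 for 2008.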
PR60006_01 Phase 2 ERCOT Update – Next Steps

ERCOT Status and Recommendation:

Status:
- ERCOT received and reviewed recommendations from HP for the performance improvements necessary for the Polyserve File Clustering Solution.
- These recommendations require architectural changes, server rebuilds, and testing, which are difficult to deliver in 2009 due to resource and environment constraints from Nodal and Zonal projects.

Accomplishments:
- Paperfree Blades Implementation – 2004 (provided improved scalability)
- Paperfree 3950 Upgrade – 2007 (improved processing)
- Paperfree code changes – 2007 (improved exception processing)
- Increased monitoring and training for support staff (improved support) – Ongoing
- AIX implementation (DB stability)

Recommendation:
ERCOT is seeking a TDTWG recommendation to close the project based on:
(A) Reduction in overall outages due to the previous efforts above
(B) ERCOT evaluating replacement of the Paperfree application under the Retail Application Upgrade Project due to:
  - File based application
    - There are more efficient products available, such as TIBCO, Inovus TLE, etc.
  - Old application architecture
    - Does not have optimal performance for future growth needs
  - Development has been experiencing issues in upgrading to the new version in the development environment
Risks/Impacts

Other impacts from moving forward with Paperfree architectural changes at this time:
- Nodal – IT resource constraints
- Distributed Generation, Small Renewables – Code/Resources
- Advanced Metering, Interim Solution – Code/Resources
TDTWG Questions