ERCOT SCR745 Update ERCOT Outage Evaluation Phase 1 and Phase 2 TDTWG April 2, 2008.

2 PR60006_01 ERCOT Update

Background:
SCR 745: To achieve improved Market performance and reliability through a reduction of ERCOT Retail Systems unplanned outages.

This effort was planned to be implemented in two subprojects:
PR60006_01: ERCOT Outage Evaluation Phase I and Phase II
–Phase I, NAESB and Proxy Clustered (Delivered 02/2007)
–Phase II, Paperfree Clustered environment with File Server Redundancy
PR60006_02: Phase III, Database Clustered environment (below PPL cut line for 2008)

Phase II Status:
02/27/2008 – Integration, Performance/Volume and Failover Testing
03/08/2008 – Production Implementation
03/22/2008 – Rollback to previous Paperfree Infrastructure due to Performance Issues

3 PR60006_01 ERCOT Update - Continued

Testing Results:
11 High Availability / Fault Tolerance tests – completed.
Steady transaction flow volume test – completed.
–1 related open defect; to be addressed in future release(s).
Description: Node Fencing on shutdown from RSA results in application failure. This type of event is believed to be low probability and would indicate a catastrophic event.

ERCOT recommendation to Go-Live. Despite the open defect with PolyServe software, the advantages provided would include:
–Local E and G drives (removes Application SMB protocol issues)
–Maintenance capabilities without affecting all nodes in cluster
–High Availability / Fault Tolerance
–Hardware Performance and Reliability

4 Incident log (columns: Date, Description, Resolution, Root Cause)

Date: 03/12/2008
Description: Retail Application Outage
Resolution: Restart processes in order
Root Cause: Human Error (See SLA Update)

Date: 03/12/
Description: files not loading into L*
Resolution: Permissions were granted
Root Cause: Permissions issue (See SLA Update)

Date: 03/19/2008
Description: Hard Crash of PolyServe Cluster due to SAN Switch Failure
Resolution: Moved PolyServe cluster to different switch
Root Cause: SAN Switch Failure caused Node Fencing: if PolyServe loses connectivity to SAN, the cluster will lock. HP Ticket logged 12/11/2007 (see slide 3).

Date: 03/12/2008 – 03/22/2008
Description: Performance degradation
Resolution:
1. 03/19/2008 Implemented SIR to add additional transaction processing enhancements.
2. 03/22/2008 Rollback to old infrastructure until performance tuning recommendations from HP can be implemented / tested
Root Cause: Unknown

5 PR60006_01 ERCOT Update - Next Steps

1. Complete. Roll iTEST back to old infrastructure of Paperfree Fan Out (Blades). Required to mitigate impact to PR60008: Ts&Cs and PUCT Performance Measures
2. TDTWG Meeting to discuss issues – 04/02/
3. Complete. Analyze performance tuning options provided by HP for feasibility.
4. In Progress. Replan Effort for Execution Schedule (Test & Implementation)

Things to consider:
PaperFree Availability Metrics Prior to March 2008 as a result of 2007 Intermediate Resolutions
–Previous Logged incident for PaperFree file server – 02/ /2008 – 100% availability (meeting SCR Goal)
Intermediate Resolutions
Code Changes
–File Management (Copy / Move / Delete) Retry
–Re-Map drives before processing vs. application startup
Hardware Replacement
–Implementation of 3950 (4-Way) server for file server
Increased Training
Increased Monitoring
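
The "File Management (Copy / Move / Delete) Retry" code change listed on this slide is not described in any detail. As a rough illustration only, a retry wrapper around file operations might look like the Python sketch below. The function names, the retry count, the delay and the choice of Python are all assumptions made for this example; they are not taken from the ERCOT or PaperFree implementation.

```python
import os
import shutil
import time


def retry_file_op(op, *args, attempts=3, delay_seconds=0.5):
    """Call op(*args); on OSError, wait and retry, up to `attempts` calls in total.

    Hypothetical helper: the attempt count and delay are illustrative defaults.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return op(*args)
        except OSError as exc:
            # Typical transient causes: a share that is briefly unavailable
            # or a file that is still locked by another process.
            last_error = exc
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise last_error


def copy_with_retry(src, dst, **retry_options):
    """Copy a file, retrying on transient errors."""
    return retry_file_op(shutil.copy2, src, dst, **retry_options)


def move_with_retry(src, dst, **retry_options):
    """Move a file, retrying on transient errors."""
    return retry_file_op(shutil.move, src, dst, **retry_options)


def delete_with_retry(path, **retry_options):
    """Delete a file, retrying on transient errors."""
    return retry_file_op(os.remove, path, **retry_options)
```

A call such as copy_with_retry("inbound/file.txt", "archive/file.txt", attempts=5) would attempt the copy up to five times before raising the last error. The paths here are placeholders.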

6 PR60006_02 ERCOT Update

PR60006_02: Phase III, Database Clustered environment
Recommendation from ERCOT to Cancel this project – Resolved with AIX deployment
Last Incident logged – 01/05/ /2008 – 100% Availability