1 TDTWG Report to RMS SCR 745 ERCOT Unplanned System Outages Wednesday, July 13th.

Presentation transcript:

1 TDTWG Report to RMS SCR 745 ERCOT Unplanned System Outages Wednesday, July 13th

2 Motion
SCR 745 includes (1) a system evaluation and (2) a recommended solution based on a review of the evaluation. SCR 745 will be sent to the TAC and Board for consideration and possible approval.

3 SCR 745 Analysis Approach
SCR 745 requested that ERCOT perform an in-depth analysis to determine the root causes of unplanned system outages. ERCOT's in-depth analysis indicates that the current architecture supporting the Retail Market contains multiple single points of failure. While it is not possible to eliminate every possibility of an ERCOT system outage, it is possible to implement solutions that drastically reduce unplanned system outages by removing these single points of failure. This presentation includes the solutions identified.

4 Retail Systems
[Diagram; labels: Market Participant, NAESB, PaperFree, TCH-EAI (Transaction Clearing House), All Retail (Database Server)]

5 Options Included
The following options are presented to assist RMS in reviewing and eventually approving the best solutions for resolving unplanned ERCOT system outages. The 4 selections to be made are:
1 of 2 options for NAESB Proxy Server improvements
1 of 3 options for NAESB Application (dependent on NAESB Proxy Server option)
1 of 2 options for PaperFree improvements
1 of 3 options for Database Server for All Retail System

6 Current NAESB Architecture
The Retail Transaction communication system uses the North American Energy Standards Board Electronic Delivery Mechanism (NAESB EDM) V 1.6, an Internet-based protocol. The current NAESB architecture includes 2 NAESB Proxy servers in Taylor and 2 NAESB Proxy servers in Austin (to be used for disaster recovery only). Due to the large quantity of data and the critical timing for that data, the current NAESB architecture is insufficient for supporting the Texas Retail Market.

7 NAESB Proxy Server Options
Option 1 – Fully Clustered* V880 Solution – 4 V880 NAESB Proxy Servers
Summary – Maximum reliability solution. This option will provide a fully clustered and fault-tolerant solution, with the opportunity to consolidate the current 18 production proxy servers, including the servers identified in Option 2. This option virtually eliminates the potential for NAESB proxy outages, unplanned or planned. This option will provide 99.99% availability for the NAESB proxy servers.
*Cluster: A group of servers that are typically on different physical machines and have the same applications configured within them, but operate as a single logical server.
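As a rough illustration of what the 99.99% figure means in practice, the following Python sketch converts an availability percentage into the maximum downtime it allows per year. The function name and the 365-day year are assumptions made for this illustration; they are not part of SCR 745.

```python
# Minutes in a 365-day year (assumption: leap years are ignored).
MINUTES_PER_YEAR = 365 * 24 * 60


def annual_downtime_minutes(availability_percent: float) -> float:
    """Return the maximum yearly downtime, in minutes, implied by an availability percentage."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    # 99.99% is the availability quoted for Option 1.
    print(round(annual_downtime_minutes(99.99), 2))
```

Under these assumptions, 99.99% availability corresponds to a little under one hour of total downtime per year.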

8 NAESB Proxy Server Options
Option 2 – 4 V120 NAESB Proxy Servers
Summary – Minimum reliability solution. This option will provide redundancy to address the single point of failure. Two servers will be located in Taylor and two servers will be located in Austin. This will not be a clustered solution; it will be a load-balanced solution, because V120 servers cannot cluster. This solution will reduce the frequency and duration of proxy outages. It is less costly than Option 1 but also not as robust.
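To make the load-balanced (non-clustered) idea concrete, here is a minimal Python sketch of round-robin distribution that skips servers marked as down. It is not ERCOT's implementation; the class, its methods, and the server names are hypothetical and exist only for illustration.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hypothetical round-robin balancer that skips servers marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        """Return the next healthy server, or raise if none is available."""
        if not self.healthy:
            raise RuntimeError("no healthy proxy servers available")
        # At most one full pass over the ring is needed to find a healthy server.
        for _ in range(len(self.servers)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy proxy servers available")


if __name__ == "__main__":
    # Illustrative names only: two servers per site, as described in Option 2.
    balancer = RoundRobinBalancer(["taylor-1", "taylor-2", "austin-1", "austin-2"])
    balancer.mark_down("taylor-1")
    print([balancer.next_server() for _ in range(4)])
```

Unlike a cluster, each server here is independent: a failed server is simply bypassed, and any work it held in progress is not taken over by the others.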

9 NAESB Application Options
Option 3 – Separate Application Server Cluster
This option moves peripheral NAESB processes (data encryption and decryption) to the PaperFree cluster and separates inbound and outbound transmissions into disconnected clusters.

10 NAESB Application Options
Option 4 – Hybrid Application Cluster
This option creates an application cluster for inbound transactions and moves outbound transaction processing to the PaperFree system in order to utilize PaperFree’s load balancing and high availability capabilities.

11 NAESB Application Options
Option 5 – Combined Application Cluster
This option combines inbound and outbound transaction processing into a single application cluster.

12 Summary of NAESB Application Cost
Option 1 V880 Server Cluster: $370,000
Option 2 V120 Server Redundancy: $97,000
Option 3 Separate Application Server Cluster: $175,000
Option 4 Hybrid Application Cluster: $165,000
Option 5 Combined Application Cluster: $235,000
Must choose one of Option 1 or Option 2 and one of Option 3, Option 4 or Option 5. An additional cost of $66,105 is identified for Training, Business Process and Monitoring.
Blue highlighting identifies recommended solution
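The selection rule on this slide (one proxy option plus one application option) yields six possible combinations. The Python sketch below enumerates them using the costs listed on the slide. It assumes the additional $66,105 applies once to whichever combination is chosen, which the slide does not state explicitly; the variable and function names are illustrative.

```python
from itertools import product

# Costs as listed on the slide.
PROXY_OPTIONS = {
    "Option 1 V880 Server Cluster": 370_000,
    "Option 2 V120 Server Redundancy": 97_000,
}
APPLICATION_OPTIONS = {
    "Option 3 Separate Application Server Cluster": 175_000,
    "Option 4 Hybrid Application Cluster": 165_000,
    "Option 5 Combined Application Cluster": 235_000,
}
# Training, Business Process and Monitoring.
# Assumption: added once to every combination.
ADDITIONAL_COST = 66_105


def combination_totals():
    """Return {(proxy option, application option): total cost} for all six pairings."""
    totals = {}
    for (proxy, proxy_cost), (app, app_cost) in product(
        PROXY_OPTIONS.items(), APPLICATION_OPTIONS.items()
    ):
        totals[(proxy, app)] = proxy_cost + app_cost + ADDITIONAL_COST
    return totals


if __name__ == "__main__":
    for (proxy, app), total in sorted(combination_totals().items(), key=lambda kv: kv[1]):
        print(f"{proxy} + {app}: ${total:,}")
```

Under that assumption, the totals range from $328,105 (Option 2 with Option 4) to $671,105 (Option 1 with Option 5).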

13 PaperFree
PaperFree includes the data validation and transformation system. The current architecture contains a single disk share for multiple load-balanced application servers. This disk share is the single point of failure for this system.

14 PaperFree Options
Option 1 – Clustered File System Server Solution
This option represents the maximum availability solution.

15 PaperFree Options
Option 2 – Local File System Solution
–This option supports the load-balancing applications.
–The system will still be active with a single server failure; however, server interruptions may result in delays in processing persistent data for the server experiencing an interruption.

16 Summary of PaperFree Costs
Option 1 – Clustered File System Server Solution: $75,000
Option 2 – Local File System Solution: $105,000
Blue highlighting identifies recommended solution

17 All Retail System

18 All Retail System
The All Retail System is the database server that houses each system’s database (NAESB, PaperFree, Siebel and TCH-EAI). This database server is a single point of failure for multiple Retail Systems.
All Retail System Goal: Provide high availability for all databases that support the Retail Applications, including NAESB, PaperFree, Siebel and TCH-EAI. This will allow processing of data to continue in the event of a database server failure.
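The effect of a shared database server on end-to-end availability can be sketched with basic probability, assuming component failures are independent. The availability figures below (99.9% for an application tier, 99% for a single database server) are hypothetical values chosen only for illustration; the presentation gives no measured figures.

```python
from math import prod


def serial_availability(availabilities):
    """Availability of a chain in which every component must be up."""
    return prod(availabilities)


def redundant_availability(availability, copies):
    """Availability of a group that is up as long as at least one copy is up."""
    return 1 - (1 - availability) ** copies


# Hypothetical figures, for illustration only.
APP_TIER = 0.999
DB_SERVER = 0.99

# An application that depends on one database server.
single_db = serial_availability([APP_TIER, DB_SERVER])
# The same application with two redundant database servers.
redundant_db = serial_availability([APP_TIER, redundant_availability(DB_SERVER, 2)])

if __name__ == "__main__":
    print(f"single database server:   {single_db:.4f}")
    print(f"redundant database pair:  {redundant_db:.4f}")
```

With these illustrative numbers, the single database server caps the whole chain at roughly 98.9%, while a redundant pair raises it to roughly 99.9%, which is why removing the shared single point of failure matters for every system that depends on it.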

19 Database Server High Availability Options
Option 1 - All HP-UX Oracle Real Application Cluster (RAC)
Option 2 - All Linux Oracle Real Application Cluster (RAC)
For Options 1 and 2:
–Provides active redundancy for database connectivity for all retail databases
–Complex to implement
–Removes single point of failure at the database server level

20 Database Server High Availability Options
Option 3:
–NAESB Linux Oracle RAC, with a different standby/cluster solution for the rest of the Retail databases
Provides active redundancy for database connectivity for the NAESB database
Less complex to implement, as the NAESB database is small and easier to migrate
Provides the option to migrate PaperFree and Siebel into this RAC
Removes single point of failure at the database server level
–Veritas cluster, Oracle Standby, or Oracle RAC for the other databases on HP-UX or Linux, according to their availability requirements
Phased implementation: NAESB first and other databases next
Removes single point of failure at the database server level

21 Database Server High Availability Options
Summary
–All three options provide the highest-availability architecture for the NAESB database.
–Options 1 and 2 provide the highest-availability architecture for all databases; however, they are the most expensive and complex to implement and manage.
–Option 3 provides the highest-availability option for the NAESB database and will provide appropriate high-availability solutions for the rest of the retail databases in subsequent phases. It is easier to implement in a phased manner, addressing acute availability needs first.

22 Summary of Database Server High Availability Costs
–Options 1 & 2: Oracle RAC
Hardware: $450,000
Cluster SW: $400,000
Oracle RAC SW: $400,000
Cluster Ext Service: $100,000
Oracle RAC Ext Service: $100,000
Internal project cost (FTE): $180,000
Total: $1,630,000
–Option 3: Partial Oracle RAC + Alternate Solution for remaining databases
Hardware: $400,000 - $600,000
Cluster SW: $100,000 - $400,000
Oracle RAC SW: $0 - $400,000
Cluster Ext Service: $0 - $120,000
Oracle RAC Ext Service: $120,000 - $180,000
Internal project cost (FTE): $120,000 - $180,000
Total: $890,000 - $1,650,000
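The line items on this slide can be checked with simple arithmetic. The Python sketch below sums the Options 1 & 2 items and, for Option 3, sums the low and high ends of each range separately. The variable names are illustrative. The Options 1 & 2 items add up to the stated $1,630,000. For Option 3, summing the range endpoints gives $740,000 and $1,880,000, which differs from the stated total of $890,000 - $1,650,000; the slide does not explain how its total range was derived, so one possible reading (an assumption) is that the line items are not expected to reach their extremes together.

```python
# Line items as listed on the slide, in dollars.
OPTIONS_1_AND_2 = {
    "Hardware": 450_000,
    "Cluster SW": 400_000,
    "Oracle RAC SW": 400_000,
    "Cluster Ext Service": 100_000,
    "Oracle RAC Ext Service": 100_000,
    "Internal project cost (FTE)": 180_000,
}

# Option 3 line items as (low, high) ranges, in dollars.
OPTION_3_RANGES = {
    "Hardware": (400_000, 600_000),
    "Cluster SW": (100_000, 400_000),
    "Oracle RAC SW": (0, 400_000),
    "Cluster Ext Service": (0, 120_000),
    "Oracle RAC Ext Service": (120_000, 180_000),
    "Internal project cost (FTE)": (120_000, 180_000),
}

total_1_and_2 = sum(OPTIONS_1_AND_2.values())
# Naive bounds: every item at its low end, then every item at its high end.
option_3_low = sum(low for low, _ in OPTION_3_RANGES.values())
option_3_high = sum(high for _, high in OPTION_3_RANGES.values())

if __name__ == "__main__":
    print(f"Options 1 & 2 total: ${total_1_and_2:,}")
    print(f"Option 3 endpoint sums: ${option_3_low:,} - ${option_3_high:,}")
```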

23 Next Steps
If recommended by RMS today, TDTWG will facilitate a technical workshop to be held before the next RMS meeting. This workshop is intended to help RMS members and interested Market Participants review the in-depth system evaluation in order to select recommended solution(s) for approval at the August RMS meeting.

24 Questions