What’s New in OpenEdge Replication?

June 5, 2017 Jeff Owen, Dapeng Wu, Safina Khanem, Jim Clark Progress

Agenda
PICA Queue Monitoring
Agent Restart
Replication Sets
Replication Monitor, VSTs, DSRUTIL, builddb Utility

We’ll cover changes in promon that improve monitoring of the PICA queue. Then we’ll talk about the ability to restart an agent on a target replica. Next, we’ll talk about the significant new functionality that has been added for Replication Sets. We’ll finish with improvements to the Replication Monitor, VSTs, and utilities.

10.2B08 and 11.3.0: Improved PICA Queue Monitoring

10.2B08: PICA Queue Monitoring Improvements
Display Db Service Manager Screen

04/27/17 Status: Database Service Manager
Communication Area Size : KB
Total Message Entries :
Free Message Entries :
Used Message Entries : 0
Used HighWater Mark : 1
Area Filled Count : 0
Service Latch Holder : -1
Access Count : 5304
Access Collisions : 0

Useful for sizing -pica. Is the PICA queue filling often?

The Db Service Manager screen was enhanced in 10.2B08. It now shows the used high-water mark, area filled count, access count, and access collisions.
The used high-water mark helps with understanding PICA queue sizing requirements: is -pica set higher than necessary?
The area filled count tells us how often the queue filled to capacity. Has it happened? More than once?
Access collisions give us insight into the level of contention: is the queue a source of contention?
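As a rough illustration of how these counters might be read together, the Python sketch below applies the three questions from the slide to one set of values. The names PicaStats and assess_pica, both thresholds, and the total entry count of 1000 are hypothetical and chosen only for illustration; none of them come from promon or any OpenEdge API.

```python
from dataclasses import dataclass


@dataclass
class PicaStats:
    """Counters copied by hand from the Db Service Manager screen."""
    total_entries: int
    used_high_water: int
    area_filled_count: int
    access_count: int
    access_collisions: int


def assess_pica(stats, low_use=0.25, high_collisions=0.05):
    """Return plain-language notes; both thresholds are illustrative assumptions."""
    notes = []
    if stats.area_filled_count > 0:
        # The queue reached capacity at least once.
        notes.append("queue filled to capacity; consider a larger -pica")
    elif stats.total_entries and stats.used_high_water / stats.total_entries < low_use:
        # The peak usage never came close to the configured size.
        notes.append("high-water mark is far below capacity; -pica may be larger than needed")
    if stats.access_count and stats.access_collisions / stats.access_count > high_collisions:
        notes.append("noticeable contention on the queue")
    return notes or ["no sizing or contention concerns"]


# Values from the screen above, except total_entries, which is a made-up placeholder.
example = PicaStats(total_entries=1000, used_high_water=1,
                    area_filled_count=0, access_count=5304, access_collisions=0)
for note in assess_pica(example):
    print(note)
```

The point of the sketch is only the order of the checks: a non-zero area filled count matters more than a low high-water mark, and contention is judged separately from sizing.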

11.6.0: Restart Agent

What is Restart Agent?
The ability to restart the Replication Agent without restarting the target database.
Command: dsrutil <targetdbname> -C restart agent

This is helpful when replication has stopped or the agent has been terminated. The agent can be restarted without shutting down the target database, disconnecting all the clients, and restarting the database to allow the agent to connect with the Replication Server.

Requirements: the target database needs to be up and running. For multiple agents, this command needs to be run for each target database.

When an agent is brought back up, the Replication Server attempts to connect to the agent. If the server does not connect automatically, use the ‘dsrutil -C connectagent <agentname>’ command to initiate the connection with the restarted agent.

Restart Agent - Use Cases
1. Improvements to the disaster recovery environment
2. Make changes to properties files
3. Agent terminates due to timeout while applying AI files

Improve the disaster recovery environment: if the transition properties of the agent need to be changed, the agent can be terminated and then restarted after the change is made in the properties file.
Make changes to properties files: Connect timeout, transition-timeout, minport, maxport.
Agent terminates due to timeout while applying AI files (when we are preparing for a manual transition):
Trigger transition on the agent
Apply AI extent
Wait until AI extents are copied from the source machine (or a backup is restored)
Agent times out and exits
Restart the agent
Continue to apply AI extents

11.7.0: Synchronous Commit Mode For Two Targets

Two target support for Synchronous Commit Mode
[Diagram: Source Replica, AI Block, Commit!, Target Replica One, Target Replica Two]

Although we support synchronous commit for two target databases, we still recommend asynchronous mode because it performs better than two-target synchronous commit.

11.7.0: Replication Sets

Replication Set: A replication environment that contains a source replica and two target replicas that can transition together

There are several characteristics of Replication Sets:
AI enabled and running on all replicas, including the targets
A property to enable Replication Set functionality
The ability for agents to communicate with each other
A property to identify the priority for target replicas to transition

Replication Set Inter-agent Communication
Let’s start with inter-agent communication.

Inter-agent Communication
Allows the targets to transition together if the source database is gone. One becomes the new source, the other remains a target, and replication can continue.
Targets communicate their location relative to the source. This is used to determine whether AI extents can be unlocked. If one target is behind, the healthier target may need to keep a number of extents locked to synchronize with the less healthy target.
Allows a health check at the beginning of a recovery transition to determine the coordinator.

Inter-agent communication provides the important ability for agents to transition together. The data communicated between targets includes how current each target is, and is used as part of managing AI extents. The agents also share and compare data to determine which target is the healthiest (most up to date).

Inter-agent Communication
Specify replication-set=1 in the [transition] section of the properties file:

[transition]
replication-set=1
database-role=reverse
transition-to-agents=agent1,agent2
restart-after-transition=1
source-startup-arguments=-S pi replserv

When the agents receive the inter-agent communication message from the server, they initiate their connections and send a response to the server.

replication-set=1 must be set in the properties file for each database. It is the property that enables Replication Sets, and in turn enables inter-agent communication. At Replication Server startup, the server connects to both agents and then tells the agents to connect to each other. Inter-agent communication is established before the server synchronizes with the agents. Each agent should have the other agent specified in its properties file and for , the remote agent should be listed in the control-agents property for the target.

What if…
Server goes offline?
Targets remain connected if they remain running. When the server is restarted, it will re-initiate the network.
Agent goes offline?
Replication between the source and the other agent continues. When the agent is restarted and the server is told to connect to it, the network will be re-initiated.
Agents are started without the server?
Agents will depend on their properties files to connect to each other.

If the server goes offline, the agents will either shut down or go into pre-transition, depending on the agent-shutdown-action specified in their properties files. If the agents remain running, they will remain connected during pre-transition.
If an agent goes offline due to a clean shutdown, the agent can be restarted and the dsrutil -C connectagent command can be used on the server. The server will perform recovery for the agent and re-initiate the network during recovery. Note that the connectagent command is not allowed if the server lost contact and failed recovery for that agent. In that case, terminating and restarting the server is the better approach.
If the agents were never contacted by the server, they will try to connect to each other using the information in their control-agent sections. For , this must be specified.

Inter-agent Communication Visibility
Messages in the database log file:
Source: (18458) Attempting to establish inter-agent network connection on agent <agent name>.
Target: (18476) The inter-agent network on agent <agent name> is complete.
Monitor and VST enhancements to track the status on the target database:
New _Repl-InterAgentActivity VST
Replication inter-agent status screen in dsrutil -C monitor
New status on the Replication Server for initiating and recovering inter-agent connections. This is a short-lived status, unlikely to be seen unless there is a problem with an agent responding.
Examples to follow.

AI Extents Remaining Locked for a Long Time on a Target
Check that the agents are connected to each other, through the monitor or VST.
Verify the local agent’s information for the remote agent’s last acknowledged TX Sequence: the inter-agent status screen in the monitor, or _ReplIntAgtAct-RemoteLastBlockSeq and _ReplIntAgtAct-RemoteSourceTRID.
Check if an agent is falling significantly behind the source.

If you think you have a problem with AI extents remaining locked on a target, there are a few ways to debug the problem.
First, make sure you know which source AI Sequence and source TXN Sequence the local agent is on. The local agent is the agent whose extents remain locked for a long time.
Check that the agents are connected to each other. This can be checked on the local target through the inter-agent status screen in the monitor, and also through the new VST. If they are not connected, the extents will remain locked.
On the local agent, check where it thinks the remote agent is. This can be checked through the inter-agent status screen in the monitor, and also through the new VST.
Check if the agent is falling behind the source.
AI extents remaining locked on a target is necessary for recovery transition to work. Recovery transition cannot work if the agents cannot synchronize. However, the extents will be unlocked on the local target when the agent decides it no longer needs them.

Inter-agent Status in Monitor
OpenEdge Replication Monitor Page 1
Database: /testdb/target2
Database is enabled as OpenEdge Replication: Target

Local Agent: agent2
ID: 2
Host name:
Port: 4388
Server latency at last message: 0 second(s)
Synchronization points: 1

Remote Agent: agent1
ID: 1
Host name: localhost
Port: 4387
Database name: /testdb/target1
Connection state: Connected
Last message sent at: Wed May 3 15:42:
Last message received at: Wed May 3 15:42:
Last ping sent at: Wed May 3 15:42:
Last ping received at: Wed May 3 15:42:
Messages sent: 266
Messages received: 266
Last acknowledged Source AI sequence: 2
Last acknowledged Transaction: 22

This is a snapshot of the new screen in the monitor. It shows information about the local agent, followed by the remote agent. The section for the remote agent contains some identifying information about the remote agent, as well as information about the connection. The highlighted area at the bottom of the screen shows where the local agent, agent2, believes the remote agent, agent1, is in its processing of AI blocks from the source. This information can be useful for determining why AI extents are locked on this target.

Fail Over Transition

Fail Over Transition: A transition where a source replica and a target replica switch roles
The Transition Failover command causes two replica databases to “switch roles”.
Useful for maintenance operations.
Both database replicas must be online.
When completed:
The original target becomes the new replication source
The original source becomes a replication target
Source and target(s) restart
Normal replication processing resumes, and the replication set is maintained

Initiating Failing Over
dsrutil <source_db_name> -C transition failover
Note: This is the opposite of the transition command!

So how is the target database specified? Through the transition-to-agents property:
transition-to-agents=agent1,agent2
The transition-to-agents property contains a list of agent names. The command transitions to the first agent specified in the list. If the first agent cannot be contacted, it transitions to the second agent in the list.
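The selection rule described here (try the first agent in the list, fall back to the second) can be sketched in a few lines of Python. This is only a model of the rule as stated on the slide, not the actual dsrutil logic; choose_failover_agent and the is_reachable callback are hypothetical names introduced for illustration.

```python
def choose_failover_agent(transition_to_agents, is_reachable):
    """Return the first agent in priority order that can be contacted, or None.

    transition_to_agents: the property value, e.g. "agent1,agent2".
    is_reachable: caller-supplied function (hypothetical) that reports
                  whether a named agent can be contacted.
    """
    for name in (part.strip() for part in transition_to_agents.split(",")):
        if name and is_reachable(name):
            return name
    return None


# agent1 is listed first, so it wins whenever it can be contacted.
print(choose_failover_agent("agent1,agent2", lambda name: True))
# If agent1 cannot be contacted, the second entry is used.
print(choose_failover_agent("agent1,agent2", lambda name: name != "agent1"))
```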

Fail Over Transition Configuration
[Diagram: Source, Target One, Target Two]
First, inter-agent communication is paused.

When is Transition Failover Useful?
Maintenance is required for the server hosting the source replica (for hardware and software upgrades)
The server hosting the source replica is being replaced (a new server replaces the existing server)
The server hosting the source replica is being moved (changes to the disaster recovery plan require machine moves)

Best Practices for Fail Over Transition
All replica properties files must have these properties set:
transition=manual
role=reverse
agent-shutdown-action=recovery
replication-set=1
transition-to-agents=agent1,agent2

Best Practices for Fail Over Transition
All replica properties files should define [control-agents.xxx] section(s).
Define the same transition-to-agents property in each replica: transition-to-agents=agent1,agent2
All replica properties files should define an [agent] section.

Define control agents in case the database runs as a replication source.
transition-to-agents defines the priority order of the targets that will fail over. In this example, the replica for agent1 has priority to transition to a source replica. If agent1 is not available, agent2 has priority to transition to a source replica.
An [agent] section is recommended for source replicas that may transition to a target.

Best Practices to Revert Configuration
Edit the current source replica’s properties file, identifying the new target replica: transition-to-agents=agent0
Terminate and restart the replication server running on replica 1
Trigger the role switch for the replicas: dsrutil targetdb1 -C transition failover
After the transition completes, the original configuration is restored:
The newest target replica will have reverted to its original role as a source
The new source replica will have reverted to its original role as a target
Edit the previous source replica’s properties file, restoring the original values: transition-to-agents=agent1,agent2
Terminate and restart agent1 on replica 1

Replication Set: Recovery Transition

Recovery Transition: The transition of a database from the role of a target replica to a source replica after the failure of the original source replica

Recovery Transition
Database role changes after a failure in the source database.
Target can transition to a new source, or to a normal database.
Possible failures:
Source database crashes
Site failures
Storage failures
Network failures

[Diagram: Source, Target One, and Target Two, with an Inter-Agent Connection between the targets]

Recovery transition is used to handle unexpected events such as failures of the source database. With a replication set, when recovery transition happens, one target becomes the new source and the other target becomes the target of the new source.

11.7.0: Two Ways to Configure Recovery Transition
Manual Transition
Transition manually with the “dsrutil -C transition” command
Source is lost, but AI data is still available
Apply AI extents before transition
Recover as much data as possible
Targets can be restarted after failures, then begin transition
[Server] properties: transition=manual

Auto Transition
Transition happens automatically after “transition-timeout”
Transition as soon as possible (HA)
transition=auto

In manual mode, transition will not occur until it is manually initiated by running a transition command. This mode gives DBAs more control over the transition process, for example, to apply AI extents before transition.
Targets can be restarted after failures and then begin transition. For example, the source site is completely lost, but the targets only had power outages and their databases are still intact. In this case, the targets can be restarted and transitioned.
Auto transition should be used for HA. When a source failure is detected, transition starts automatically after the transition-timeout value has expired.
Auto transition is more likely to lose data than manual transition, since it may not have the chance to apply AI extents. Manual transition may also lose data, depending on how much AI data is available at the time the crash happens.
Zero data loss is possible, with serious performance impact. It requires -Mf 0 and synchronous mode for at least one target.

11.7.0: Which Agent will transition?
The priority of the agents to transition is identified in the properties file. The transition-to-agents property specifies the priority:

[transition]
replication-set=1
database-role=reverse
transition-to-agents=agent1,agent2
restart-after-transition=1
source-startup-arguments=-S pi replserv

Two issues related to replication sets:
Which agent will transition to be the new source?
How to handle inter-agent latency?

Agent1 has higher priority than Agent2: Agent1 is the priority agent, and Target1 is the priority target. Agent2 will start transition after some time if it hasn’t received any transition information from Agent1.

Introducing the Transition Coordinator
Transition Coordinator:
A healthier target that serves as a temporary source to bring the remaining targets up to date
Doesn’t have to be the same as the priority target

Coordination Phase: a phase in the recovery transition process
Perform a health check on the targets at the beginning of the transition
Decide which target will become the coordinator
Target databases will temporarily switch roles
Synchronize with the less healthy target

Another problem related to the two-target configuration: the two targets are very likely to be out of sync when the source crashes. How does the recovery transition handle the difference between the targets?
Two major tasks for the coordinator:
Health check
Sync with the less healthy target
The health check happens just before the transition begins. It must be 100% accurate to avoid any replication failures that might happen later.
Any details on how the health check works?

Transition Coordinator: Synchronization with less healthy target
[Diagram: timeline for Source, Target 1, and Target 2 showing Transaction #1, Transaction #2, the source crash, and then Health Check and Synchronization of Transaction #2]

After coordination, any committed transactions will be synchronized on both databases; any uncommitted transactions will be rolled back on both databases.

How Does the Coordinator Work?
Which target will be the coordinator if the source is lost?
Which AI extents will the coordinator use to bring the other target up to date?
Which AI extents on a target can be unlocked and recycled?

We will give some technical details about the coordinator, especially how it synchronizes with the less healthy target. After the next slide, we expect to be able to answer these three questions. We will also explain the role of AI extents during normal replication and during the coordination phase.

11.7.0: From where to synchronize with the less healthy target?
[Diagram: Source, Target 1, and Target 2, each with AI extents .a1, .a2, and .a3; a TxnSeq to AI Seq map (100 to 1, 200 to 2, 300 to 3); a TxnSeq exchange between the two targets]

Source Txn Seq/AI Seq    Target 1 Local/Remote TxnSeq    Target 2 Local/Remote TxnSeq
100/1                    100/50                          50/100
200/2                    200/100                         100/200
300/3                    300/200                         200/300

Three questions:
Which target will be the coordinator if the source is lost? Target 1.
Which AI extents will the coordinator use to bring the other target up to date? Target 1 extents .a2 and .a3.
Which AI extents on Target 1 can be unlocked and recycled? .a1.

Inter-agent communication during normal replication exchanges the TxnSeq: the last transaction sequence started or applied in a database. The TxnSeq exchange happens at sync points (a source or target AI switch).
AI Seq is the sequence number of an AI extent. A new AI Seq is assigned whenever an AI switch occurs.
Internally, we maintain a map of TxnSeq to AI Seq on both the source and the targets, so either value can be found from the other. For example, 100/1 on the source means that TxnSeq 100 was started and is located in AI Seq 1 on the source. The same map exists on both targets.

The table shows the information that has been collected during normal replication. Take the third row: Txn 300 has been started on the source, and it is located in source AI Seq 3. At the same time, Target 1 has applied TxnSeq 300 and Target 2 has applied TxnSeq 200. So Target 1 is ahead of Target 2, and is therefore the coordinator.
Target 1 can refer to its own map (upper-right corner) to locate the AI Seq that contains TxnSeq 200, which is where Target 2 was at the time of the source crash. It maps to AI Seq 2, which means Target 1 should start from AI Seq 2 to synchronize with Target 2. After AI Seq 2 is completed, it moves on to AI Seq 3, until Target 2 is fully in sync with Target 1.
AI Seq 1 on Target 1 is not in use. It is not required to stay locked, because a recovery transition at this point would only use AI Seq 2 and 3.

AI extents can be locked due to the latency between the targets. This is the same behavior as AI extents on the source being locked due to the latency between the source and the targets.
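The worked example above can be restated as a small Python sketch. It is a simplified model of the reasoning in the notes, not the replication agent's actual implementation: plan_coordination and its data structures are hypothetical, and the sketch assumes that the lagging target's last applied TxnSeq appears as a key in the TxnSeq to AI Seq map (that is, both values were recorded at a sync point).

```python
def plan_coordination(txn_to_ai, applied):
    """Answer the three questions for a two-target replication set.

    txn_to_ai: map of TxnSeq -> AI Seq, as kept on the targets.
    applied:   map of target name -> last TxnSeq that target has applied.
    Returns (coordinator, AI seqs needed for sync, AI seqs that can be unlocked).
    """
    # The target that has applied the most is the healthier one.
    coordinator = max(applied, key=applied.get)
    behind = min(applied, key=applied.get)

    # Find the extent holding the lagging target's position, and the
    # extent holding the coordinator's own position.
    start_seq = txn_to_ai[applied[behind]]
    last_seq = txn_to_ai[applied[coordinator]]

    all_seqs = sorted(set(txn_to_ai.values()))
    needed = [seq for seq in all_seqs if start_seq <= seq <= last_seq]
    unlockable = [seq for seq in all_seqs if seq < start_seq]
    return coordinator, needed, unlockable


# Third row of the table: Target 1 has applied 300, Target 2 has applied 200.
txn_to_ai = {100: 1, 200: 2, 300: 3}
print(plan_coordination(txn_to_ai, {"target1": 300, "target2": 200}))
# -> ('target1', [2, 3], [1])
```

The result matches the answers on the slide: Target 1 coordinates, it replays AI Seq 2 and 3, and AI Seq 1 no longer needs to stay locked.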

11.7.0: Recovery Transition step by step
[Diagram: timeline for Source, Target 1, and Target 2]
Source crash!
Enter Pre-Transition
Transition Starts
Determine the Coordinator: Target1 is the Coordinator OR Target2 is the Coordinator
Coordinator starts up and synchronizes with the other Target
Database role changes
The Source starts up and synchronizes with the new Target
Result: New Source, New Target

To restore the original configuration of one source and two targets, a rebase will be required.

11.7.0: Best Practices For Configuring Recovery Transition
All databases should have identical transition-to-agents properties
Consider a larger connection-timeout value for the targets
Allow for time to apply AI extents to the priority target, if desired
Consider the time required to reseed a new target when:
Setting the control-agents properties (control-agents=agent2 for the priority target, agent1)
Setting the connection-timeout properties
Manage AI with the AI Archiver on source and targets
Recovery Backup

Configure the “control-agents” property:
control-agents=agent2 for the priority target (agent1)
Configure the “database-role” property:
database-role=reverse on the priority target
database-role=reverse on the secondary target to keep replication enabled
database-role=normal on the secondary target to have replication disabled
Manage AI with the AI Archiver:
On all databases (source and targets)
Additional protection for AI data
Unlock and recycle AI extents
Recovery Backup:
Pros: additional protection against a failure during transition.
Cons: slows down the transition process, especially for large databases.
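The first recommendation, identical transition-to-agents properties on all databases, is easy to check before a transition is ever needed. The Python sketch below compares the property across replicas. It assumes the properties files are simple enough to be read by Python's configparser module, which is an assumption about the file format rather than something OpenEdge guarantees; transition_agents and same_priority are hypothetical helper names.

```python
import configparser


def transition_agents(properties_text):
    """Return the transition-to-agents list from one properties file's text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(properties_text)
    raw = parser.get("transition", "transition-to-agents", fallback="")
    return [name.strip() for name in raw.split(",") if name.strip()]


def same_priority(properties_by_replica):
    """Check that every replica lists the same agents in the same order.

    properties_by_replica: map of replica name -> properties file text.
    Returns (all_identical, per-replica lists) so mismatches can be reported.
    """
    lists = {name: transition_agents(text)
             for name, text in properties_by_replica.items()}
    distinct = {tuple(agents) for agents in lists.values()}
    return len(distinct) == 1, lists


source = "[transition]\nreplication-set=1\ntransition-to-agents=agent1,agent2\n"
target = "[transition]\nreplication-set=1\ntransition-to-agents=agent2,agent1\n"
ok, found = same_priority({"source": source, "target1": target})
print(ok, found)
```

Order matters in the comparison because the list expresses priority, so agent1,agent2 and agent2,agent1 are treated as a mismatch.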

11.6.0: Replication Virtual System Tables: Enhancements

What’s New With VSTs
1. Three VST tables expanded
2. Four new VST tables
3. Static vs. dynamic distinction in tables

Expanded VSTs now include all information found in the replication monitor that was not previously in the replication VSTs.
Two new VSTs hold the information found in the Database Service Manager and Database Service Manager Objects screens in promon. Two new VSTs hold the dynamic (volatile) information found in the replication agent and replication remote agent screens in the replication monitor.
All static (unchanging) information in the replication monitor was added to the appropriate existing VSTs. All dynamic (volatile) information in the replication monitor was added to the new replication VSTs.

Expanded VSTs
1. _Repl-Server
2. _Repl-Agent
3. _Repl-AgentControl

_Repl-Server: Expanded to include all information from the replication monitor server screen not previously found in the VST.
_Repl-Agent: Expanded to include all information from the replication monitor agent screen not previously found in the VST.
_Repl-AgentControl: Expanded to include all information from the replication monitor remote agent screen not previously found in the VST.

New VSTs
1. _DbServiceManager
2. _DbServiceManagerObjects
3. _Repl-AgentActivity
4. _Repl-AgentControlActivity

_DbServiceManager: New VST containing all information found in the DB Service Manager screen in promon. It holds useful information related to the PICA queue, which is used for sending and receiving messages between the replication server and the replication agent.
_DbServiceManagerObjects: New VST containing all information found in the DB Service Manager Objects screen in promon. It holds useful information related to the registered plugins of the active database.
_Repl-AgentActivity: New VST containing all dynamic information found in both the _Repl-Agent VST and the replication agent screen in the replication monitor.
_Repl-AgentControlActivity: New VST containing all dynamic information found in both the _Repl-AgentControl VST and the replication remote agent screen in the replication monitor.

11.7.0: Replication Virtual System Tables: Enhancements

What’s New With VSTs
1. Replication Properties
2. New VST: Inter-agent Communication Data

Many of the replication properties useful for transition were added to the _Repl-Server and _Repl-Agent VSTs.
To help with replication recovery transition involving replication sets, a new VST was added for replication inter-agent communication data.

What’s New With VSTs
1. _Repl-Server and _Repl-Agent
2. _Repl-InterAgentActivity

_Repl-Server: Expanded to include all of the replication server’s transition properties, in addition to the replication keep alive, schema lock action, and agent shutdown action properties.
_Repl-Agent: Expanded to include all of the replication agent’s transition properties, in addition to the replication agent’s connect timeout, minimum port, and maximum port properties.
_Repl-InterAgentActivity: New VST for all inter-agent communication activity, relevant to transitions involving a replication set. Some useful fields include the last time a local agent received a ping message from a remote agent, the last acknowledged source AI sequence a remote agent has sent to the server (and the local agent), and the last acknowledged source transaction number a remote agent has sent to the server (and the local agent).

11.7.0: Replication Monitor Enhancements

Replication Monitor Enhancements
Prior to 11.7, the monitor was missing some information found only in the VSTs.
All additions to existing screens are at the bottom, so they should not break your screen scraper.
Notable enhancements include the addition of transition properties, as well as other properties, to the agent and server status screens. This allows sanity checks on transition behavior before invoking transition!
A new Inter-agent status screen is available on target databases.

One of our goals is to keep promon consistent with the database VSTs, and the same goal applies to the Replication Monitor and its VSTs. These enhancements make the monitor and the VSTs consistent, and also provide additional information such as the transition properties. All of these changes to existing screens are at the bottom to preserve any screen scrapers. The changes are useful because they provide a way to validate the current transition properties for the agent or server before invoking transition. This way you can expect the right behavior!

builddb Enhancement

11.7.0: prostrct builddb support
Re-creates a control area (.db) from the structure description (.st) of an existing database.
Used to recover when an existing database control area is lost or damaged.
Prior to OpenEdge 11.7, the command disabled replication and caused a rebase.
In OpenEdge 11.7, replication remains enabled and no rebase is needed.

In 11.7, a change was introduced to let a DBA correct an issue involving the loss of the .db file from either the source or the target replication-enabled database. Customers have sometimes reported the accidental deletion of the .db file. For a database that is not replication-enabled, the DBA was able to regenerate the correct .db file by using the prostrct utility with the builddb option. In previous releases, using this option on a replication-enabled database caused replication to be disabled and, in some cases, after-imaging to be disabled as well. In that case, the use of builddb caused a replication rebase.

Now, in 11.7, before running prostrct with the builddb option, the DBA first needs to make sure there is a current .st file for the database where the accidental deletion occurred. Best practice dictates that a current .st file is always available. Incorrect .st files will yield incorrect results and may corrupt open access to the database. Review the current .st file and make sure all extents (of all types) are present and match the file directory version of the database.

If the correct .st file is present, prostrct with the builddb option can be used to regenerate the appropriate .db file for the database. Replication and after-imaging remain enabled, and the database remains a source or target replication database. The source and target databases synchronize at startup after this operation completes successfully.

11.7.0: DSRUTIL Status Verbose

dsrutil status verbose
Example:
Command line: dsrutil db1 -C status agent1 -verbose
3048: Startup Synchronization

Use the -verbose option of the status command when you want the status together with its description. For example, dsrutil db0 -C status agent1 -verbose returns the status number of the agent along with its description. There is a table at the end of the presentation that lists the status values and their descriptive text.

For more information…
Demonstrations of the new features will be available at the Reception and Expo.
Other sessions for your consideration:
541: What’s new with the VSTs
419: OpenEdge Roadmap and Vision
437: OpenEdge Deployment Info Exchange

We Have Questions for you!
How many in the audience have single target environments?
How many in the audience have two target environments?
There are many transition use cases available today. We would like to reduce the use cases to:
Fail Over and Recovery transition require that all replicas are online
Transition to a source or to a normal database may happen offline
Are there other use cases we should consider?

Questions for us?

Appendix 1: Replication server state values and their description
Status  Description
        Unknown
6001    Server Initialization
6002    Connecting to Agents
6003    Configuring Agent(s)
6004    Recovery Processing
6005    Startup Synchronization
6021    Normal Processing
6022    Performing Transition
6023    Replication is Suspended
6060    Server is ended

Appendix 2: Replication Agent state values and their description
Status  Description
        Unknown
1001    Initial Connection
1002    Initializing
1003    Target Database in Quiet Point
1032    Initial Connection Failed
1033    Recovery Failed
1034    Invalid Target Database Configuration
1035    Agent Failed
1036    Agent is Ignored
1037    Agent is Stopped
1038    Agent is Terminated
1063    Agent is Ended
2080    Pre Transition
2081    Applying After-image Extent
2082    Transitioning
2083    Listening
2084    Waiting while JTA transactions are resolved
3048    Startup Synchronization
3049    Normal Processing
3050    Recovery Synchronization
3051    Online backup of Target Database
3052    Target Database in a Quiet Point
3053    Target Database in BI Stall
3054    Target Database in AI Stall
3055    Being Transitioned
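For scripts that only have the numeric agent state (for example, from a VST query or a non-verbose status call), the table above can be carried along as a lookup. The Python sketch below copies the codes from this appendix; the names AGENT_STATUS and describe_agent_status are hypothetical, and the "code: description" output shape simply mirrors the example shown on the status verbose slide.

```python
# Agent state values and descriptions, copied from Appendix 2.
AGENT_STATUS = {
    1001: "Initial Connection",
    1002: "Initializing",
    1003: "Target Database in Quiet Point",
    1032: "Initial Connection Failed",
    1033: "Recovery Failed",
    1034: "Invalid Target Database Configuration",
    1035: "Agent Failed",
    1036: "Agent is Ignored",
    1037: "Agent is Stopped",
    1038: "Agent is Terminated",
    1063: "Agent is Ended",
    2080: "Pre Transition",
    2081: "Applying After-image Extent",
    2082: "Transitioning",
    2083: "Listening",
    2084: "Waiting while JTA transactions are resolved",
    3048: "Startup Synchronization",
    3049: "Normal Processing",
    3050: "Recovery Synchronization",
    3051: "Online backup of Target Database",
    3052: "Target Database in a Quiet Point",
    3053: "Target Database in BI Stall",
    3054: "Target Database in AI Stall",
    3055: "Being Transitioned",
}


def describe_agent_status(code):
    """Format a numeric agent state as 'code: description'."""
    return f"{code}: {AGENT_STATUS.get(code, 'Unknown')}"


print(describe_agent_status(3048))
# -> 3048: Startup Synchronization
```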
