OpenEdge High Availability
Adam Backman
Grand Poobah – White Star Software

About the speaker
• Head Winemaker – White Star Software: one of the oldest and most respected consulting and training companies in the Progress OpenEdge sector
• Lackey – DBAppraise: managed database services backed by experienced Progress OpenEdge professionals, not rookies off the bench
• Read a book or two
• Snappy dresser
• Knows a bit about systems and OpenEdge

Agenda
• Are you really 24X7?
• Redundancy
• Replication
• Maintenance
• Failing over
• Conclusion

What is High Availability?  A real business need that requires full access to current data at any time of the day or night  Many sites are kind of 24X7 but only a small percentage of companies have real business requirements that necessitate access to the data 24 hours a day.  Some applications have high availability needs but only during given hours which simplifies maintenance  The need is growing every day

Are You Really 24X7?
• Business runs 24 hours a day: 3-shift manufacturing, utility, casino, website, …
• Business needs access 24 hours a day: work during the day, report and plan at night
• Weekend requirements

What is High Availability?  The ability to keep running your business  Continuous Access which allows for failures with zero impact to the users  Minimally Invasive failure management like using HACMP clustering with OpenEdge as a cluster service  Major Failover where physical location of the application must be changed  Minimal recovery time in case of disaster  It is not disaster recovery – DR is only used when HA fails

Before you begin
• Understand your business
• Understand the cost of downtime
• Do not build a solution that costs more than what you are protecting
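The "cost of downtime" rule can be checked with simple arithmetic. The sketch below is a hypothetical model, not one from the talk: it treats expected yearly loss as the hourly cost of an outage times the expected outage hours per year, and compares the loss an HA solution avoids with that solution's yearly cost. All figures in the usage example are made up.

```python
def expected_annual_loss(cost_per_hour: float, outage_hours_per_year: float) -> float:
    """Expected yearly cost of downtime under a simple linear model (an assumption)."""
    return cost_per_hour * outage_hours_per_year


def solution_is_justified(cost_per_hour: float,
                          hours_without: float,
                          hours_with: float,
                          solution_cost_per_year: float) -> tuple[bool, float]:
    """Return (justified?, loss avoided per year) for a proposed HA solution."""
    avoided = (expected_annual_loss(cost_per_hour, hours_without)
               - expected_annual_loss(cost_per_hour, hours_with))
    # Only worth building if it saves more than it costs to run.
    return avoided > solution_cost_per_year, avoided


if __name__ == "__main__":
    # Hypothetical: 5,000 per hour, 40 h/yr of outages today, 2 h/yr with HA, HA costs 150,000/yr.
    ok, avoided = solution_is_justified(5_000, 40, 2, 150_000)
    print(f"loss avoided per year: {avoided:,.0f}; justified: {ok}")
```

A linear model ignores effects such as lost customers after a long outage, so treat the result as a floor for the real cost of downtime.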

People  Who “owns” the data  Be inclusive with invites most will drop out  This is not solely an IT decision −You are the keeper, not owner of the data −You know what is technically possible −You know the cost of the tech needed to build the solution  The goal is to eliminate surprises if/when a problem occurs

Planning  Budget – it is not free  Hardware – fault tolerant, redundancy, …  Software – OpenEdge plus ALL the other stuff you have to run the operation  Knowledge – Buy or Rent  Time – schedule and outage time  Personnel constraints – Who is on call and who is their backup

Causes of Downtime
• Hardware
  − Disks are the most vulnerable component, as they are the only moving parts unless you use SSDs
  − Power: all the hardware requires power
• Software
  − OS bug
  − OpenEdge (core or application) bug
• Natural disaster
  − Fire
  − Flood
• Sabotage
• Human error

Basic Rules
• Good hardware
  − Trusted vendor
  − Good support (local support if possible)
• No Windows (OK, maybe 2008)
• You need a good recovery plan
• Run with after-imaging enabled

Redundancy  Hardware  Software  Personnel

Redundancy: Hardware
• Power (UPS, or UPS plus generator)
• Mirrored disks
• Network, both within the machine and in the general network
• Non-interleaved memory (some use fault-tolerant memory)
• Multiple CPUs
• Support hardware (PCs, terminals, phones, …)
• Complete failover environment

Hardware  Why have a UPS and a generator? −UPS has limited capacity −Generators can run for a long time −Have a reliable source of extra fuel

Hardware  Do not let standby systems sit idle  Use them for development or test  Keep copies of all support files −.pf −.ini −.d

Redundancy: Software  Host-based are least fault tolerant  Web-based can provide a good environment provided the AppServer calls are stateless  In client/server model remember that file servers need to be redundant as well

Redundancy: Software  NameServer on the broadcast and clustered  Don’t use the NameServer  Cluster your AppServers so if a single AppServer fails there is another to pick up the load

Redundancy: Staffing
• Is the failover machine close?
• Can it be reliably accessed remotely? (Remote access is itself a failure point)
• Is it possible to call in additional resources?
  − More hands
  − Different skills
  − Relief for tired staff
• Is it necessary to support all functions, or only core functions?

Replication of Data
• Database data
  − OpenEdge Replication (synchronous)
  − Log-based replication (asynchronous)
  − Hardware-based replication (?)
• Application and user files
  − OS utility (fsync, rsync, …)
  − Hardware (remote mirroring)
  − Third-party (PolyServe)

Replication: OpenEdge
• Pros
  − Supported product
  − Synchronous
  − Fast (really fast)
• Cons
  − Cost
  − Yet another thing to support
  − Additional resource usage

Replication: Log-based
• Pros
  − Cheap (not free, but close)
  − Easy to set up and maintain
• Cons
  − No formal support
  − Additional resource utilization
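Log-based replication depends on applying archived after-image files to the target in sequence, with none missing. Below is a minimal sketch of a pre-check a replication script might run. It assumes a hypothetical naming convention in which each archived file name ends in its sequence number (for example `prod.a.000123`), works out which files still need to be applied after the last applied sequence, and refuses to continue if there is a gap.

```python
import re
from typing import Iterable

# Hypothetical convention: the sequence number is the final dot-separated field.
SEQ_RE = re.compile(r"\.(\d+)$")


def pending_files(archived: Iterable[str], last_applied: int) -> list[str]:
    """Return archived files newer than last_applied, in apply order.

    Raises ValueError if a sequence number is missing, because applying a later
    file over a gap would leave the target inconsistent.
    """
    by_seq = {}
    for name in archived:
        match = SEQ_RE.search(name)
        if match:
            by_seq[int(match.group(1))] = name
    pending = []
    expected = last_applied + 1
    for seq in sorted(s for s in by_seq if s > last_applied):
        if seq != expected:
            raise ValueError(f"gap: expected sequence {expected}, found {seq}")
        pending.append(by_seq[seq])
        expected += 1
    return pending
```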

Hardware Replication
• Pros
  − Easy setup
  − Easy maintenance
• Cons
  − Expensive
  − Possibility of data corruption unless ALL writes are guaranteed

Maintenance  Script everything to eliminate human error  Scheduled Maintenance −Application changes −Backups −Index maintenance −Adding space  Unscheduled maintenance −Eliminate unscheduled maintenance buy monitoring and trending

Maintenance: Application
• Schema
  − Use a fast schema add, then add the default value
  − Some changes still require an outage due to table locks
• Code changes
  − If you are n-tier, you can stop the AppServer to reduce the interruption
  − Switch to a different propath and move clients over through natural attrition

Maintenance: Backups
• Progress backup
  − Reliable
  − Online option
• Split-mirror backup
• Replication backup
  − Eliminates overhead on the production database
  − Must be a no-recover backup for log-based replication

Maintenance: Index
• Index rebuild cannot be run against a replicated database
• Use online index compact instead: proutil db-name -C idxcompact table-name.index-name
• Notes
  − Watch for open transactions, as index compact does a significant amount of logging
  − Schedule it outside busy times so that replication can keep up

Maintenance: Add Space (online and offline approaches)
• Use prostrct addonline to add space while the database is running
• Online process
  − Make sure your umask is correct
  − Validate your add.st file
  − prostrct addonline db add.st
• prostrct is supported for both source and target databases, with the exception of prostrct unlock
• Offline process
  − Shut down source and target
  − Make changes to source
  − Make changes to target
  − Start both databases
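The umask matters because new extent files inherit permissions from it. A minimal sketch of a pre-flight check follows. It assumes the utility requests mode 0666 for new files, as most programs do, and that your site policy is that owner and group both need read/write access (0660). That policy value is an assumption; substitute your own.

```python
import os


def current_umask() -> int:
    """Read the process umask (it can only be read by setting it, so restore it at once)."""
    old = os.umask(0o022)
    os.umask(old)
    return old


def new_file_mode(umask: int, requested: int = 0o666) -> int:
    """Permission bits a newly created file receives: requested mode minus umask bits."""
    return requested & ~umask


def umask_allows(umask: int, required: int) -> bool:
    """True if files created under this umask keep every permission bit in `required`."""
    return (new_file_mode(umask) & required) == required


if __name__ == "__main__":
    mask = current_umask()
    # Hypothetical policy: owner and group must be able to read and write new extents.
    print(f"umask {mask:03o} allows 0660: {umask_allows(mask, 0o660)}")
```

With a common default umask of 022, new files get mode 0644, so the check fails if the database is run by a group rather than a single owner.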

Maintenance
• All maintenance should be scripted and tested in a test environment before the production run
  − Eliminate the human element (no typos)
  − Know how long it will take
  − Make sure maintenance does not cause a problem
  − Apply and test schema changes thoroughly
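Here is a minimal sketch of "scripted and timed" maintenance. Each step is a named command that runs in order, the run stops at the first non-zero exit code, and each step's duration is recorded so you know how long the production run will take. The step commands are placeholders (harmless Python one-liners); in practice they would be your own tested utility invocations.

```python
import subprocess
import sys
import time
from dataclasses import dataclass


@dataclass
class StepResult:
    name: str
    returncode: int
    seconds: float


def run_steps(steps: list[tuple[str, list[str]]]) -> list[StepResult]:
    """Run (name, argv) steps in order; stop after the first step that fails."""
    results = []
    for name, argv in steps:
        start = time.monotonic()
        proc = subprocess.run(argv)  # argv list, no shell: avoids quoting surprises
        results.append(StepResult(name, proc.returncode, time.monotonic() - start))
        if proc.returncode != 0:
            break  # never run later steps on top of a failed one
    return results


if __name__ == "__main__":
    plan = [  # placeholder steps for illustration only
        ("pre-check", [sys.executable, "-c", "print('pre-check ok')"]),
        ("failing step", [sys.executable, "-c", "import sys; sys.exit(3)"]),
        ("never runs", [sys.executable, "-c", "print('skipped')"]),
    ]
    for r in run_steps(plan):
        print(f"{r.name}: rc={r.returncode} in {r.seconds:.2f}s")
```

Running the same plan in test first gives you the timings to quote when you schedule the production outage.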

Building a failover plan  Who −Business and technical personnel −Gets informed – email, conference call, call tree,… −Makes Decisions −Does the work  What −What resources are affected?  Where −Location of physical resources −Location of personnel −Location of replacement/replication target

Building a failover plan (continued)
• When
  − Times of backups
  − Times of data archiving
  − Times of backup archiving
  − Times of log archiving
• Why
  − What are we protecting ourselves from?
  − Why did we choose not to deal with some events?

Risk Assessment
• Things to consider
  − Risk: natural disaster, human-caused, hardware, …
  − Likelihood
  − Impact on the application environment
  − Time to recover
• It is OK to say that you considered a risk and judged its likelihood too low to justify a solution
• Determine the dependencies at each level
  − Hardware requires power
  − The OpenEdge application requires PostalSoft
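Recording dependencies as data lets you derive a safe start-up and recovery order, and it exposes circular dependencies early. This minimal sketch uses Python's standard `graphlib` module (Python 3.9+). The power, hardware, OpenEdge application and PostalSoft entries follow the slide. The other components and all of the edges are a hypothetical inventory.

```python
from graphlib import CycleError, TopologicalSorter

# Each component maps to the components it depends on (hypothetical inventory).
DEPENDS_ON = {
    "hardware": {"power"},
    "network": {"power"},
    "database": {"hardware"},
    "PostalSoft": {"hardware"},
    "OpenEdge application": {"database", "PostalSoft", "network"},
}


def startup_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Order components so each one starts only after everything it depends on.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    try:
        print(" -> ".join(startup_order(DEPENDS_ON)))
    except CycleError as exc:
        print("circular dependency:", exc.args[1])
```

Reading the order backwards gives a safe shutdown sequence, and any component that appears with no redundant alternative is a candidate for the "redundancy missing" list.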

Solutions  Document redundancy where it exists  Document places where redundancy is missing or unknown (on purpose or omission)  Ensure reasonable software update procedures are in place and documented  Verify security, division of responsibilities and software release policies per layer  Need to develop Risk Assessment form

Aspects of a failover plan
• When
  − When do we decide to move to the standby environment?
  − Who makes the decision?
  − Who does the work, and who is their backup?
  − Defined process
  − Service level agreements with customers
  − Milestones in the process
• Why
  − This is a tougher decision than you think
  − Fix or flee: lost time vs. lost data

Documenting your plan
• Your plan should be executable by anyone
• You cannot have too much detail
• Automate as much of the process as possible to eliminate the human element
• Document and automate both the failover and the failback

Test your plan
• Switch over to your standby environment and run for a day or more
• Avoid causing an extended outage while testing your plan
• You will only find issues if you run at full load
• Do this at least once a year
• Follow your document and correct mistakes as you go

Keep documents and support files up-to-date
• Keep your failover and failback documents up-to-date
• Keep contact lists up-to-date
• Keep all individual process documents up-to-date
• Keep copies of your support files
  − Scripts
  − Application files (.pf, .ini, .properties, …)
• Practice good password management
• Keep everything accessible (online and hard copies)
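Copying support files is easy to automate. This minimal sketch copies files with the listed extensions (.pf, .ini, .properties), plus `.sh` scripts, from an application directory tree into a dated backup directory, preserving the relative layout. The `.sh` extension for scripts and all paths are assumptions. Keep the backup root outside the source tree so earlier backups are not copied again.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Optional

# .pf, .ini and .properties come from the slide; .sh for scripts is an assumption.
SUPPORT_SUFFIXES = {".pf", ".ini", ".properties", ".sh"}


def copy_support_files(src_root: Path, backup_root: Path,
                       today: Optional[date] = None) -> list[Path]:
    """Copy support files under src_root into backup_root/<YYYY-MM-DD>/, keeping subdirectories."""
    dest_root = backup_root / (today or date.today()).isoformat()
    copied = []
    for path in sorted(src_root.rglob("*")):
        if path.is_file() and path.suffix.lower() in SUPPORT_SUFFIXES:
            target = dest_root / path.relative_to(src_root)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also preserves timestamps
            copied.append(target)
    return copied
```

Running this on a schedule and syncing the backup root to the standby site keeps the copies current without relying on anyone remembering to do it.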

Points to Remember
• Build redundancy into all aspects of your operation
• Look at the likelihood of a failure and its impact on the customer
• Protect your entire application environment, both hardware and software
• Build a total solution, but think about the cost/benefit of each component
• Automate tasks to eliminate human error
• Test your failover plan at least once a year

Questions?
Adam Backman
adam@wss.com

Thank you for your time!
