Tivoli Workload Automation: Planner functionality and recovery actions © 2012 IBM Corporation
TWS Distributed – 8.2.x

Components: Job Scheduling Console 8.2, Command Line, TWS Master 8.2, TWS Domain Managers 8.2, TWS FTA 8.2

Limitations:
- TMF must be installed, managed and administered for JSC usage only (TMF dependency for the JSC connection)
- Platform support is tied to TMF availability
- No open APIs for external applications or Web GUI usage
- No LDAP support for security authentication
- The scheduling DB schema cannot be exported or accessed by external applications
- DB recovery is difficult: it needs additional software or hardware for ad-hoc recovery procedures and to keep the DB updated on the Backup Master
- Adding improved planning capabilities requires a lot of data restructuring and development effort
TWS Distributed – 8.6

Components: WebUI server (TDWC 8.6), HTTP Load Balancer, CLI, TWS Master 8.6 + Backup Master, TWS Domain Managers 8.6 + Backup DM, FTA 8.6, DB2 in HA

Improvements:
- RDBMS: public DB views; standard procedure for recovery, backup and sharing across Masters; facilitates the addition of new planning capabilities
- WAS: very limited administration required; more platforms; open APIs (J2EE EJBs and WS); LDAP, Kerberos
Production Plan Generation

A new script, JnextPlan, replaces the old Jnextday script and creates the production plan.

The following command produces a production plan that starts today at 00:00 (start of day) and ends at 23:59:

    JnextPlan

The script can also create a production plan covering multiple days or only a few hours. The following command produces a production plan that starts today at 00:00 and finishes tomorrow at 23:59:

    JnextPlan -for 4800

JnextPlan can also be run with a zero-minute extension. This updates static information such as Workstations, Windows users and Calendars, and removes completed Job Streams (Carry Forward must be set to ALL), without adding new job stream instances:

    JnextPlan -for 0000
Production Plan Generation

JnextPlan syntax:

    JnextPlan [-from mm/dd/[yy]yy [hhmm [tz | timezone tzname]]]
              [-to mm/dd/[yy]yy [hhmm [tz | timezone tzname]]] | [-for [h]hhmm] [-days n]

-from sets the start time of the new production plan. The format of the date is specified in the localopts file; hhmm identifies the hours and the minutes, and tz is the time zone. The "start of day" option can be updated with a new command line called "optman", which replaces the globalopts file.

-to is the new plan end time; the date format is the same as for the -from parameter. If it is not specified, the default value is the date and time specified in the -from field + 23 hours and 59 minutes.

-for is the plan extension in terms of time, in the format hhhmm, where hhh are the hours and mm are the minutes. If it is not specified, the default value is 24 hours.

-days is the plan extension in terms of days.

Default values maintain backward compatibility! Use "optman ls" to show the default values (including the "start of day" time) stored in the DB (in this release the globalopts file no longer exists).
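The relationship between the -for value and the resulting plan window can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not part of the product: the function names `parse_for` and `plan_window` are hypothetical, and treating a zero-minute extension as an empty window is an assumption.

```python
from datetime import datetime, timedelta


def parse_for(value: str) -> timedelta:
    """Parse a -for extension written as [h]hhmm, e.g. "2400", "4800", "0000"."""
    if not value.isdigit() or len(value) not in (4, 5):
        raise ValueError(f"expected [h]hhmm, got {value!r}")
    hours, minutes = int(value[:-2]), int(value[-2:])
    if minutes > 59:
        raise ValueError(f"minutes out of range in {value!r}")
    return timedelta(hours=hours, minutes=minutes)


def plan_window(start: datetime, for_value: str = "2400"):
    """Return (start, end) of the plan; the end is the last minute covered."""
    extension = parse_for(for_value)
    if extension == timedelta(0):
        # Assumption: a zero-minute extension adds no new period.
        return start, start
    return start, start + extension - timedelta(minutes=1)


start_of_day = datetime(2012, 1, 10, 0, 0)
print(plan_window(start_of_day))          # default 24-hour plan
print(plan_window(start_of_day, "4800"))  # two-day plan
```

With a start of day of 00:00, the default call ends at 23:59 on the same day and the "4800" call ends at 23:59 on the following day, matching the two commands shown earlier.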
Jnextday and JnextPlan

TWS 8.6.0 (JnextPlan)                        TWS 8.2.x (Jnextday)

StartAppServer
MakePlan
  conman "continue & link @!@;noask"         conman "continue & link @!@;noask"
  planman (preproduction plan generation,    schedulr
    Symnew creation)                         compiler
  reptr -pre Symnew                          reptr -pre Symnew
SwitchPlan
  conman "continue & stop @!@;wait;noask"    conman "continue & stop @!@;wait;noask"
  stageman                                   stageman
  planman confirm                            wmaeutil.cmd ALL -stop
  conman "continue & start"                  conman "continue & start"
CreatePostReports
  reptr -post …/schedlog/M$DATE              reptr -post …/schedlog/M$DATE
  rep8 -F …. -i …/schedlog/M$DATE            rep8 -F …. -i …/schedlog/M$DATE
UpdateStats
  logman                                     logman
Final Schedule & JnextPlan

The Final Job Stream is now made up of five different jobs:
- StartAppServer: checks that WAS is running and starts it if not
- MakePlan: creates Symnew and makes the pre-production report
- SwitchPlan: stops the TWS agents; runs stageman to merge the old Symphony and Symnew; confirms the switch of the plan to the planner; starts the Master and the Symphony distribution
- CreatePostReports: creates post-production reports
- UpdateStats: runs logman to update the Pre-Production plan, job history and statistics; in FINAL it runs in parallel with CreatePostReports
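The ordering of the five jobs can be pictured as a small dependency graph. The sketch below uses Python's standard `graphlib` module (Python 3.9+). The dependency edges are an assumption read from the order of the slide and from the note that UpdateStats runs in parallel with CreatePostReports; they are not copied from the actual FINAL definition.

```python
from graphlib import TopologicalSorter

# Job -> set of jobs it waits for (assumed from the slide, not from the product).
FINAL_DEPENDENCIES = {
    "StartAppServer": set(),
    "MakePlan": {"StartAppServer"},
    "SwitchPlan": {"MakePlan"},
    "CreatePostReports": {"SwitchPlan"},
    "UpdateStats": {"SwitchPlan"},
}


def execution_waves(dependencies):
    """Group jobs into waves; jobs in the same wave can run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for wave in execution_waves(FINAL_DEPENDENCIES):
    print(" + ".join(wave))
```

Under these assumed edges the last wave contains both CreatePostReports and UpdateStats, which is the parallel step mentioned above.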
FINAL Job Stream
Production Plan status

    planman showinfo

This command retrieves the information related to the production plan status.
Production Plan Extension

JnextPlan and Production Plan extension

When the production plan already exists and JnextPlan is run, the production plan is extended (by default by 24 hours). After the extension, the new production plan contains the new instances for the extension period plus all the job stream instances not yet completed, which are carried forward.

Note: in TWS 8.6 the enCarryForward keyword specifies the "carry forward" property. This keyword is stored in the database and its default value is ALL. When data is migrated from a previous version, the value is copied from the previous configuration.

enCarryForward keyword values:
- all: carries forward all uncompleted job streams, regardless of whether the Carry Forward key is enabled in the job stream definitions
- yes: carries forward only those uncompleted job streams that have the Carry Forward key enabled
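The two keyword values described above amount to a simple filter over the uncompleted job stream instances. The following Python sketch illustrates that rule only; the `JobStreamInstance` class and the `carried_forward` function are hypothetical names, and the planner's real selection logic is not shown in the slides.

```python
from dataclasses import dataclass


@dataclass
class JobStreamInstance:
    name: str
    completed: bool
    carry_forward: bool  # the Carry Forward key in the job stream definition


def carried_forward(instances, en_carry_forward="all"):
    """Return the names of the instances carried into the extended plan."""
    mode = en_carry_forward.lower()
    if mode not in ("all", "yes"):
        # Only the two values described in the text are modelled here.
        raise ValueError(f"unsupported value: {en_carry_forward!r}")
    selected = []
    for instance in instances:
        if instance.completed:
            continue  # completed job streams are removed from the plan
        if mode == "all" or instance.carry_forward:
            selected.append(instance.name)
    return selected


plan = [
    JobStreamInstance("PAYROLL", completed=True, carry_forward=True),
    JobStreamInstance("BILLING", completed=False, carry_forward=True),
    JobStreamInstance("REPORTS", completed=False, carry_forward=False),
]
print(carried_forward(plan, "all"))
print(carried_forward(plan, "yes"))
```

In this made-up plan, "all" keeps both uncompleted instances, while "yes" keeps only the one whose Carry Forward key is enabled.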
Planning considerations

JobStreams are not copied into the Current Plan:
- If they don't have a Run Cycle
- If the Run Cycle doesn't result in a "run" day in the planning period

Ad-hoc submission is allowed:
- If the Job Stream is defined in the database
- If the Job Stream is not draft and is valid
- The ON request flag is no longer mandatory (it was already ignored in previous releases)
Planning considerations (continued)

DRAFT / ACTIVE definition
- DRAFT: defined in the database; not used for the Production Plan
- ACTIVE: defined in the database and used for the Production Plan

Sample usage scenario: JobStreamA must not run tomorrow
- Set JobStreamA to DRAFT
- Extend the Current Plan
- JobStreamA is not included in the Production Plan

VALID FROM definition
- A JobStream can have multiple versions
- The VALID FROM date specification differentiates JobStream versions

Sample usage scenario: JobStreamA runs JobA -> JobB -> JobC; JobD must be added to the workflow and be ready to run in 2 days, when the new application goes live in production
- Modify JobStreamA, insert the new definition (JobA … JobC -> JobD) and add the VALID FROM date specification
- Extend the Plan
- JobStreamA will have 2 versions in the plan, in accordance with the dates
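One way to picture the VALID FROM scenario is as a lookup of the version in force on a given day. The Python sketch below assumes that the applicable version is the one with the latest VALID FROM date not later than the run date, and that a version without a VALID FROM date is always valid; both are assumptions made for illustration, and `version_for` is a hypothetical helper.

```python
from datetime import date


def version_for(versions, run_date):
    """Pick the definition whose VALID FROM date is the latest one <= run_date.

    `versions` is a list of (valid_from, definition) pairs; valid_from may be
    None for a version without a VALID FROM date (assumed always valid).
    """
    applicable = [
        (valid_from or date.min, definition)
        for valid_from, definition in versions
        if valid_from is None or valid_from <= run_date
    ]
    if not applicable:
        return None
    return max(applicable, key=lambda pair: pair[0])[1]


# Made-up dates: the new workflow becomes valid on 12 January 2012.
job_stream_a = [
    (None, ["JobA", "JobB", "JobC"]),
    (date(2012, 1, 12), ["JobA", "JobB", "JobC", "JobD"]),
]
print(version_for(job_stream_a, date(2012, 1, 11)))
print(version_for(job_stream_a, date(2012, 1, 12)))
```

A plan spanning both dates would therefore contain instances of both versions, as in the scenario above.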
Planning considerations (continued)

Production Plan
- By default starts at 00:00, but this can be modified
- By default covers 24 hours (normal workdays)
- A longer or shorter period can be specified
- Can span a few days (during weekends or holidays) or more
- Once created the first time, it is always extended
- An extension can vary from 1 minute to days

Symphony size
- Same size and structure as the expanded Symphony of 8.2.x
- Longer plans produce larger Symphony files
- At least 512 bytes (1 Symphony record) for each Job Stream instance and for each Job instance
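The sizing rule above gives a quick lower bound for the Symphony file. The following Python sketch only applies the 512-byte figure from the slide; the function name and the instance counts are made up for the example, and real files also hold records for other objects, so the actual size will be larger.

```python
SYMPHONY_RECORD_BYTES = 512  # one Symphony record, as stated above


def min_symphony_bytes(job_stream_instances: int, job_instances: int) -> int:
    """Lower bound: one record per job stream instance and per job instance."""
    return SYMPHONY_RECORD_BYTES * (job_stream_instances + job_instances)


one_day = min_symphony_bytes(job_stream_instances=1_000, job_instances=10_000)
print(one_day)  # 5632000 bytes, roughly 5.4 MiB

# Assuming the same workload every day, a three-day plan needs about
# three times the space.
print(3 * one_day)
```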
Symphony with more than 24hrs

- Database (Jobs, Job Streams, Workstations, NT Users, Prompts, Resources, Calendars): the collection of all defined scheduling objects.
- Pre-Production Plan (LTP): job stream instances and external dependencies for several days.
- Plan ("Symphony"): the scheduled workload for one or more production days.

The Symphony file contains the objects needed for the production plan period: Workstations, Calendars, Job Streams, Jobs, Dependencies.

JnextPlan runs on the Master Scheduler as part of the production plan: it extends the Production Plan and creates a new Symphony file.

The Pre-Production Plan contains job stream instances calculated in advance for several days, with external dependencies resolved on those instances according to matching criteria.

JnextPlan takes modeling data from the database and creates the production plan by building the Symphony file:
- The plan can cover more or less than 24 hours
- JnextPlan does not evaluate run cycles every time; it uses an intermediate Pre-Production Plan to cache run cycle evaluation and dependency resolution
Production Plan Extension

[Diagram: Database (Workstations, Resources, Job Streams, Calendars, Jobs); Pre-Production Plan (today, tomorrow, … 10 days); Symnew; Current Plan Extension from Old Symphony to New Symphony: remove completed job streams, add detail for the next plan period]

The plan is not recreated, but extended. JnextPlan takes the modeling definitions and the pre-production plan to create the objects for the next period, and merges them with the old plan, removing the completed job streams.

The current plan uses sources similar to those of the long term plan, and additionally considers the resources database, the long term plan and the old current plan (if available). During the extension, it reads all this information and generates detailed information on which jobs to run and at what time, together with the dependency and timing requirements for the jobs.
StartAppServer

Checks that WAS is running and starts it if not.

In case of failure: rerun the job.
MakePlan

- Replans or extends the Pre-Production plan if needed
- Produces the Symnew file
- Generates Pre-Production reports in the joblog

In case of failure:
- The global lock may be left set; use planman unlock to reset it
- Rerun the job to recover: the Pre-Production plan is automatically re-verified and updated, and Symnew is recreated
MakePlan

How to stop it:
- Stopping the job may not stop the processing still running inside WAS or on the DB
- If a DB statement is running too long, force its closure; this causes MakePlan to abend
- A WAS restart is required if processing is still running in WAS and MakePlan does not terminate

Best practice: check whether database statistics are enabled. If not, it is strongly suggested to schedule the runstatistics script stored in the dbtools TWS directory.
MakePlan – Error messages

If the MakePlan stdlist shows the following messages:

AWSBEH023E Unable to establish communication with the server on host "127.0.0.1" using port "31116".
    The application server (eWAS) is down and MakePlan cannot continue. Start eWAS and check the eWAS logs to identify why it stopped.

AWSBEH021E The user "twsuser" is not authorized to access the server on host "127.0.0.1" using port "31116".
    This is an authorization error. Check the twsuser credentials in the useropts file.

AWSJPL018E The database is already locked.
    A previous MakePlan operation was stopped and the global lock was not reset. To recover, run "planman unlock".
MakePlan – Error messages (continued)

If the MakePlan stdlist shows the following messages:

AWSJPL006E An internal error has occurred. A database object "xxxx" cannot be loaded from the database.
    In general "xxxx" is an object such as a workstation, job or job stream. This error means that the connection with the database is broken. Check SystemOut.log and the ffdc directory, where additional information about the database issue is logged.

AWSJPL017E The production plan cannot be created because a previous action on the production plan did not complete successfully. See the message help for more details.
    A previous operation on the preproduction plan was performed but finished with an error. In general this appears when "ResetPlan -scratch" was run but did not finish successfully.

AWSJPL704E An internal error has occurred. The planner is unable to extend the preproduction plan.
    MakePlan is not able to extend the preproduction plan. Several root causes are possible, in general always related to the database, such as no space left in the tablespace or full transaction logs. Check SystemOut.log or the ffdc directory for more information.
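The MakePlan messages and suggested actions listed on these two slides can be collected into a small lookup for scanning a job's stdlist. The Python sketch below is an illustration only: the hint texts paraphrase the slides, the `triage` function is hypothetical, and the regular expression assumes message identifiers of the form AWS + three letters + three digits + E, as in the examples above.

```python
import re

# Message id -> hint, paraphrased from the slides.
MAKEPLAN_HINTS = {
    "AWSBEH023E": "eWAS is down: start it and check the eWAS logs",
    "AWSBEH021E": "authorization error: check twsuser credentials in useropts",
    "AWSJPL018E": "global lock left set: run 'planman unlock'",
    "AWSJPL006E": "database connection broken: check SystemOut.log and ffdc",
    "AWSJPL017E": "a previous preproduction plan action failed "
                  "(often an unfinished 'ResetPlan -scratch')",
    "AWSJPL704E": "cannot extend preproduction plan: check tablespace space "
                  "and transaction logs, then SystemOut.log and ffdc",
}

MESSAGE_ID = re.compile(r"\bAWS[A-Z]{3}\d{3}E\b")


def triage(stdlist_text: str):
    """Return (message id, hint) pairs for known ids, in order of appearance."""
    findings = []
    for message_id in MESSAGE_ID.findall(stdlist_text):
        hint = MAKEPLAN_HINTS.get(message_id)
        if hint and (message_id, hint) not in findings:
            findings.append((message_id, hint))
    return findings


sample = "AWSJPL018E The database is already locked."
for message_id, hint in triage(sample):
    print(f"{message_id}: {hint}")
```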
SwitchPlan

- Stops all the CPUs
- Runs stageman to merge the old Symphony file with Symnew and to archive the old Symphony file in the schedlog directory
- Runs planman confirm to update plan status information in the DB (e.g. plan end date and current run number)
- Restarts the master to distribute the Symphony file and restart scheduling

In case of failure:
- planman confirm has not been run yet (check the logs and "planman showinfo"): rerun SwitchPlan
- planman confirm has failed: manually run "planman confirm" and "conman start"
- planman confirm has already been run (e.g. the plan end date has been updated): run "conman start"

How to stop it: if conman stop is hanging, just kill the conman command. This may impact plan distribution, which will need to stop the agents left running before distributing the new Symphony.
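The three failure cases above form a small decision table keyed on how far planman confirm got. The Python sketch below encodes just that table; the state names ("not_run", "failed", "done") and the function name are hypothetical labels for the three cases, and determining the state still requires checking the logs and "planman showinfo" by hand.

```python
# State of 'planman confirm' -> recovery actions, in order, as listed above.
SWITCHPLAN_RECOVERY = {
    "not_run": ["rerun SwitchPlan"],
    "failed": ["planman confirm", "conman start"],
    "done": ["conman start"],
}


def switchplan_recovery(confirm_state: str):
    """Return the recovery actions for a failed SwitchPlan run."""
    try:
        return list(SWITCHPLAN_RECOVERY[confirm_state])
    except KeyError:
        raise ValueError(
            f"unknown state {confirm_state!r}; "
            f"expected one of {sorted(SWITCHPLAN_RECOVERY)}"
        ) from None


for state in ("not_run", "failed", "done"):
    print(state, "->", "; ".join(switchplan_recovery(state)))
```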
SwitchPlan – Error messages

If the SwitchPlan stdlist shows the following messages:

STAGEMAN:AWSBHV082E The previous Symphony file and the Symnew file have the same run number. They cannot be merged to form the new Symphony file.
    There are several possible causes for the Symphony and Symnew run numbers to be the same:
    1. MAKEPLAN did not extend the run number in the Symnew file.
    2. SWITCHPLAN was executed before MAKEPLAN.
    3. The stageman process has been run twice on the same Symnew file without resetting the plan or deleting the Symphony file.

AWSJCL054E The command "CONFIRM" has failed.
AWSJPL016E An internal error has occurred. A global option "confirm run number" cannot be set
    In general, these error messages appear when the last step of SwitchPlan, "planman confirm", fails. Analyze SystemOut.log for more information and rerun "planman confirm".
UpdateStats

- Runs logman to update job statistics and history
- Extends the Pre-Production plan if its length is shorter than minLen

In case of failure:
- Rerun the job or manually run "logman <file>" on the latest schedlog file
- If it is not run, the statistics and history will be partial
- The Pre-Production plan is updated anyway at the beginning of MakePlan

How to stop it: kill the job or the logman process. The statistics and history will be partial until the job or logman is rerun.
CreatePostReports

Generates Post-Production reports in the job output.

In case of failure: rerun the job if the reports are needed.
Recovery Plan Procedure

Symphony Corruption

Follow these steps on the master domain manager:
1. Set the job limit to 0, using conman or the Tivoli Dynamic Workload Console. This prevents all jobs from starting.
2. logman -prod
   Updates the Pre-Production Plan.
3. planman showinfo
   Retrieves the start time of the first non-completed job stream instance and the end time of the production plan.
4. ResetPlan
   Archives the current Symphony file.
5. JnextPlan -from -to
   Creates a new Symphony file for the period in which there are still outstanding jobs. Only incomplete job stream instances are included in the new Symphony file.
6. Set the job limit to the previous value. The Symphony file is distributed and the production cycle starts again.
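As a checklist aid, the procedure above can be rendered with the two dates from "planman showinfo" filled into the JnextPlan command. The Python sketch below only formats text and runs nothing. The function name is hypothetical, the dates are made up, and the mm/dd/yyyy hhmm layout is an assumption taken from the JnextPlan syntax; the date format actually in force is the one set in localopts.

```python
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y %H%M"  # assumed mm/dd/yyyy hhmm; check localopts


def recovery_steps(first_open_start: datetime, plan_end: datetime,
                   previous_limit: int):
    """Return the recovery steps as an ordered list of strings.

    first_open_start and plan_end are the two values read from the
    'planman showinfo' output by the operator.
    """
    return [
        "set the job limit to 0 (conman or Tivoli Dynamic Workload Console)",
        "logman -prod",
        "planman showinfo",
        "ResetPlan",
        "JnextPlan -from {} -to {}".format(
            first_open_start.strftime(DATE_FORMAT),
            plan_end.strftime(DATE_FORMAT),
        ),
        f"set the job limit back to {previous_limit}",
    ]


steps = recovery_steps(
    first_open_start=datetime(2012, 1, 10, 6, 0),
    plan_end=datetime(2012, 1, 11, 23, 59),
    previous_limit=10,
)
for number, step in enumerate(steps, start=1):
    print(f"{number}. {step}")
```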