Monday 12 October 2015: CA update procedure. Hélène Cordier, IN2P3/CNRS Computing Centre, Lyon, France.


1 Monday 12 October 2015: CA update procedure. Hélène Cordier, IN2P3/CNRS Computing Centre, Lyon, France

2 Contents
Context
Rationale
Feedback and Suggestions
Conclusions

3 Context
http://goc.grid.sinica.edu.tw/gocwiki/Procedure_for_new_CA_release
1. D. Groep opens a ticket in GGUS.
2. SA3 creates a test repository with the new CA rpms.
3. SAM team makes a new version of the CA sensor.
4. CERN-PPS upgrades to the new CA rpms.
5. SAM validation instance runs over CERN-PPS to test the new sensor.
6. The new sensor is put into the SAM production instance.
7. SA3 updates the lcg-CA production repository.
8. Broadcast to sites.

4 Context
Friday 01/02/08, https://gus.fzk.de/ws/ticket_info.php?ticket=31993&from=search
"Due to miscommunication step 5 was already executed while the rest of the procedure was blocked on step 3, which was finally done Monday at the end of the afternoon, but then step 6 could not be executed due to a problem with AFS permissions, which should get solved on Tuesday morning..."
Earlier: "Release of CA 1.17-1: sites are complaining that they are in a "warning status" without being told that there is a new set of rpms, i.e. *without* being told when the rpms are put in the repository. Step 8 of the process seems not to be properly followed up. In this specific instance, sites were appearing in "warning state" while the new CA version was being updated. The associated GGUS ticket was closed, with SAM tests and rpms released, without the relevant "broadcasts" being published."
The topic came up again at the EGEE'07 ROC managers meeting and again on the ROC list.
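The incident quoted above was a step being executed while an earlier one was still blocked. As a minimal sketch only, assuming the eight steps from slide 3 are meant to run strictly in order, a checklist could refuse out-of-order completion. The `ReleaseChecklist` class and its method names are hypothetical and not part of any GGUS or SAM tool.

```python
# Hypothetical sketch: step texts are taken from slide 3, everything else is illustrative.
STEPS = [
    "D. Groep opens a ticket in GGUS",
    "SA3 creates a test repository with the new CA rpms",
    "SAM team makes a new version of the CA sensor",
    "CERN-PPS upgrades to the new CA rpms",
    "SAM validation instance runs over CERN-PPS to test the new sensor",
    "The new sensor is put into the SAM production instance",
    "SA3 updates the lcg-CA production repository",
    "Broadcast to sites",
]


class ReleaseChecklist:
    """Tracks progress through the procedure and rejects out-of-order steps."""

    def __init__(self) -> None:
        self.completed = 0  # number of steps finished so far

    def is_finished(self) -> bool:
        return self.completed == len(STEPS)

    def complete(self, step_number: int) -> None:
        """Mark a step (1-based) as done; only the next pending step is accepted."""
        if self.is_finished():
            raise RuntimeError("all steps are already complete")
        expected = self.completed + 1
        if step_number != expected:
            raise RuntimeError(
                f"step {step_number} requested but step {expected} is still pending"
            )
        self.completed = step_number
```

With such a guard, an attempt to record step 5 while step 3 is still pending raises an error instead of silently succeeding.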

5 Rationale
A round table was recently held with the SAM, integration, deployment and security teams involved, and no fundamental objection to the current process emerged. Apart from the request for SAM test modifications recorded below, the recommendations to be examined by the ROC managers mainly concern mechanisms around the external CA upgrade procedure: ensuring that the procedure is followed smoothly and rapidly (i.e. in less than a day) through to the very last step, and helping to meet the sites' need for communication about the process.

6 Feedback from mailing lists: SAM CA tests
SAM, GGUS ticket #32204: how about SAM test modifications?
This comes from Stephen Burke's remark of Feb 4th on rollout, which echoes several others (M. Litmaath, Jeremy Coles, COD meetings).
"Now, none of this would have been a problem if the CA sensor only required a _minimum_ version instead of an exact version. Maybe I am missing some technical detail here, but I argue that the test should be changed." M. Litmaath
LHC Computer Grid - Rollout [mailto:LCG-ROLLOUT@JISCMAIL.AC.UK] On Behalf Of Maarten Litmaath said:
> Yes, that is a possibility, but it gives more work to the SAM team.
"As opposed to causing trouble for 250+ sites... anyway, the current procedure manages to have a switch from generating warnings to errors, is it that hard to have two switches - nothing -> warning -> error?" Stephen
No specific answer has ever been given except a non-warning "one-day grace period" (Nov 19th 2007).
Last update on the ticket:
From the SAM team: now proposes procedure modifications; no improvements in the tests so far.
From H. Cordier: please take into account the SAM test improvement suggestions and the help that have been proposed on rollout, namely from Eygene Ryabinkin (rea@GRID.KIAE.RU).
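The two suggestions quoted above (require a minimum version rather than an exact one, and use two switches, nothing -> warning -> error) can be sketched as follows. This is an illustration under stated assumptions, not the SAM sensor's actual logic: the function names, the `dotted-dash` version format inferred from "1.17-1", and the `error_after_days` threshold are all hypothetical. Only the one-day grace period comes from the slide.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a release string such as '1.17-1' into the tuple (1, 17, 1)."""
    return tuple(int(part) for part in version.replace("-", ".").split("."))


def ca_status(installed: str, required: str, days_since_release: int,
              grace_days: int = 1, error_after_days: int = 7) -> str:
    """Return 'ok', 'warning' or 'error' for a site's installed CA release.

    grace_days reflects the one-day grace period mentioned on the slide;
    error_after_days is an assumed value chosen only for illustration.
    """
    # Minimum-version check: anything at or above the required release passes.
    if parse_version(installed) >= parse_version(required):
        return "ok"
    # First switch: stay silent during the grace period.
    if days_since_release < grace_days:
        return "ok"
    # Second switch: warn for a while before escalating to an error.
    if days_since_release < error_after_days:
        return "warning"
    return "error"
```

Comparing integer tuples rather than strings keeps, for example, release 1.9 ordered before 1.17, and a site that has already moved to a newer release is never flagged.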

7 Suggestions / feedback from the round table 1/2
The procedure is followed up to the end
1. Start the whole process on Mondays only: proposed by SAM yesterday (#32204). Wednesdays 12:00 seem more reasonable, as urgent updates need to take place, and in an emergency the whole process should not take longer than a day.
2. Involve the OSCT-DC in the ticket so that they close the GGUS ticket at the last step of the process. This decouples the people carrying out the procedure from the people verifying it, who make sure that the repositories are updated and the broadcast is done. OSCT confirm the need for an external observer, and their involvement is validated by the CERN teams.

8 Suggestions / feedback from the round table 2/2
Improve communication towards sites
3. Introduce a CA release process indicator, to allow sites to follow the process, in particular to know when a release is about to be prepared. Site admins who wish to be informed could simply subscribe to an RSS feed that tracks status changes of this CA release process indicator. Site admins would then know which stage the new CA release process has reached. The integration team (J. Flammer) supports this idea and mentions that the GGUS ticket number could be published together with the process indicator. This small page could be developed within the integration team (TBC).

9 Example of process indicator for sites
As a first step, D. Groep could change the status of the CA release process indicator from "done" to "initialized" at D-15.
On D-Day, D. Groep creates the GGUS ticket according to the current procedure *and* sets the CA release process indicator to "in progress".
The procedure then continues unchanged until the last step, where the integration team, instead of closing the ticket directly at the end of step 8, assigns the ticket to OSCT-On-Duty.
An extra "step 9" could be: the OSCT-DC closes the GGUS ticket after checking that both links in step 7 of the procedure are correctly updated and that the broadcast in step 8 has been done. Finally, the OSCT-DC sets the CA release process indicator to "done".
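The indicator described above behaves like a small state machine: "done" -> "initialized" -> "in progress" -> "done". The following sketch is one possible way to model it, assuming those three statuses are the only ones. The class, its method names and the `history` list (which could feed the proposed RSS feed) are hypothetical, and the ticket identifier in any usage is a placeholder.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Allowed transitions, taken from the statuses named on the slide.
NEXT_STATUS = {
    "done": "initialized",
    "initialized": "in progress",
    "in progress": "done",
}


@dataclass
class CAReleaseIndicator:
    status: str = "done"
    ticket: Optional[str] = None  # GGUS ticket published with the indicator
    history: List[str] = field(default_factory=list)  # status changes, oldest first

    def _move_to(self, new_status: str) -> None:
        if NEXT_STATUS[self.status] != new_status:
            raise ValueError(f"cannot go from {self.status!r} to {new_status!r}")
        self.status = new_status
        self.history.append(new_status)

    def announce(self) -> None:
        """D-15: a release is about to be prepared."""
        self._move_to("initialized")

    def start(self, ticket: str) -> None:
        """D-Day: the GGUS ticket is created and the release is in progress."""
        self._move_to("in progress")
        self.ticket = ticket

    def close(self, repositories_updated: bool, broadcast_sent: bool) -> None:
        """Extra step 9: close only after the step 7 and step 8 checks pass."""
        if not (repositories_updated and broadcast_sent):
            raise ValueError("step 7 links and step 8 broadcast must be verified first")
        self._move_to("done")
```

Making `close` take the two verification results as arguments mirrors the idea that the indicator only returns to "done" once an external observer has confirmed the repository update and the broadcast.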

10 Additional Remarks
SAM test improvements (#32204): OCC
The remaining room for improvement seems to depend on the internal organisation of SAM/integration/deployment. It is very difficult to gain leverage on (priorities?) and is out of scope here, except that ROCs/sites should mention *each time* that the existing process does not sufficiently compensate for this lack. Namely:
Improve direct communication between CERN teams / phone numbers (cf. J. Flammer, F. Schaer).
Add an INFO status within SAM tests in addition to the WARNING status (proposal by Gergely D., Fred Schaer, S. Burke).

11 Summary / Conclusions
Round table closed. Sites should add comments in their RC reports for further debate.
Actions needed from ROC and OCC to validate and follow up:
– 3 proposals on the procedure itself.
– SAM tests modification and GGUS ticket #32204.
– Nominate a responsible body for keeping the procedure updated. By the way, the 2nd link in item 7 of the procedure does not work (mail from Romain on January 23rd 2008).

