UTC-Timing problem
P.Charrue for the BE/CO/Timing team

Observations
– Starting last Wednesday at 20:49:14, the UTC timestamps of some systems were observed to be wrong.
– Impact of this problem:
  – Logging and post mortem data are tagged with the wrong UTC time.
– Normal events and SMP data were correctly distributed (Ruediger).

What is the problem
– Every 1 ms: 8 events are sent on the timing network.
– Every 100 ms: the SMP system sends a few events to the central timing to be distributed.
– Every 1 s: the UTC event is sent on the timing network.

The problem is a result of two issues:
1. Central Timing priorities configuration
2. Bug in the Timing Receiver Card firmware

Central Timing priorities configuration
The central timing firmware is configured with the following priorities:
– 1st => events
– 2nd => SMP, asynchronous events
– 3rd => UTC
When the SMP events distribution coincides with the UTC-second frame distribution, the central timing does not send the UTC frame, in favor of the SMP events.

Bug in the Timing Receiver Card (CTR) firmware
A bug was discovered in the CTR firmware in 2010: when the UTC frame does not arrive, the CTR substitutes an older frame, resulting in a wrong UTC time.
– A new, corrected CTR firmware has been available since January.
– EPC has already upgraded and experienced NO problem last night.
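The combination of the two issues can be sketched as a toy simulation. All names, the collision pattern, and the scheduling model below are illustrative, not the actual firmware logic:

```python
# Toy model of the priority clash and the CTR fallback bug.
# Function names, timings and the collision pattern are illustrative.

def send_utc_frame(second, smp_pending, utc_has_priority):
    """Central timing: return the UTC frame for this second, or None
    if it is dropped in favour of pending SMP asynchronous events."""
    if smp_pending and not utc_has_priority:
        return None                      # old firmware: SMP wins the slot
    return second                        # the frame carries the UTC second

def ctr_timestamp(frame, last_utc):
    """Buggy CTR firmware: when the UTC frame is missing, silently
    reuse the previous one, producing a stale timestamp."""
    return frame if frame is not None else last_utc

# Suppose SMP traffic coincides with the UTC slot at seconds 3 and 4.
collisions = {3, 4}
last_utc, stamps = 0, []
for second in range(1, 7):
    frame = send_utc_frame(second, second in collisions,
                           utc_has_priority=False)
    last_utc = ctr_timestamp(frame, last_utc)
    stamps.append(last_utc)

print(stamps)  # [1, 2, 2, 2, 5, 6] -> time appears frozen during the clash
```

The two faults are only visible together: dropping the frame alone would be a gap, and the fallback alone would be harmless, but combined they silently tag data with an old UTC second.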

Progress Report
– The timing team managed to reproduce the problem in their lab.
– A new version of the central timing firmware is ready:
  – The priority of the UTC slot is increased.
  – Asynchronous events will no longer preempt the distribution of UTC.
– The new firmware was tested in the timing lab and the UTC problem did not appear.
– The new firmware has been deployed on the BE/CO testbed; after 16 hours of intense event-distribution tests, no problem was observed.
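In the same spirit, a minimal sketch of the corrected arbitration (names and model illustrative only): with the UTC slot at top priority, pending asynchronous events no longer suppress the frame, so the received second always advances:

```python
# Illustrative sketch of the new central-timing behaviour: the UTC
# frame is never dropped, even when asynchronous events are pending.

def send_utc_frame(second, smp_pending, utc_has_priority):
    if smp_pending and not utc_has_priority:
        return None                      # only possible with old priorities
    return second

stamps, last = [], 0
for second in range(1, 7):
    frame = send_utc_frame(second, smp_pending=second in {3, 4},
                           utc_has_priority=True)   # new firmware behaviour
    last = frame if frame is not None else last
    stamps.append(last)

print(stamps)  # [1, 2, 3, 4, 5, 6] -> UTC advances every second
```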

Actions

Central Timing Server
– Today at 9h00 the new firmware will be deployed on the operational central timing master server (A).
  – The slave server (B) will remain on the previous version.
  – The timing team will be in the CCC to monitor the events.
– A ramp of low intensity (3 bunches?) will be done to check that all is working ok.
– The decision to declare the new firmware operational will be taken by the EIC and the timing experts.

Timing Receiver Cards
– CO will coordinate the upgrade of the CTR firmware with the remaining equipment groups which did not take the new version in January (~200 CTRs).
  – The firmware upgrade takes a few minutes and can be done remotely.

Reminder
– The timing system was not designed with safety-critical applications in mind, and event losses can occur under certain conditions.
– Critical or protection functionality should NOT rely on the timing system.
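The reminder suggests a defensive pattern on the client side: rather than trusting the received UTC stream blindly, a consumer can flag seconds that fail to advance. This is a hypothetical guard, not part of the timing system:

```python
# Hypothetical client-side sanity check (not part of the timing system):
# flag positions where the received UTC second did not advance.

def check_utc_stream(seconds):
    """Return the indices at which the UTC second repeated or went back."""
    suspect = []
    for i in range(1, len(seconds)):
        if seconds[i] <= seconds[i - 1]:
            suspect.append(i)
    return suspect

print(check_utc_stream([1, 2, 2, 2, 5, 6]))  # [2, 3]
```

A check like this would have flagged the stale timestamps in logging and post mortem data immediately, instead of the wrong UTC time being discovered after the fact.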