BaBar Transition: Computing/Monitoring

Slides:



Advertisements
Similar presentations
ClearCube Blade Manager 4.0 Overview and Demonstration Rev
Advertisements

André Augustinus ALICE Detector Control System  ALICE DCS is responsible for safe, stable and efficient operation of the experiment  Central monitoring.
RPC Trigger Software ESR, July Tasks subsystem DCS subsystem Run Control online monitoring of the subsystem provide tools needed to perform on-
April 28, 2005 EPICS Collaboration Controls Group Status of the Channel Access Zippy Archiver (CZAR) B. Bevins, et. al.
70-290: MCSE Guide to Managing a Microsoft Windows Server 2003 Environment Chapter 11: Monitoring Server Performance.
Terri Lahey LCLS FAC: Update on Security Issues 12 Nov 2008 SLAC National Accelerator Laboratory 1 Update on Security Issues LCLS.
70-293: MCSE Guide to Planning a Microsoft Windows Server 2003 Network, Enhanced Chapter 14: Problem Recovery.
Terri Lahey EPICS Collaboration Meeting June June 2006 LCLS Network & Support Planning Terri Lahey.
Module 8: Designing Active Directory Disaster Recovery in Windows Server 2008.
MICE CM26 March '10Jean-Sebastien GraulichSlide 1 Detector DAQ Issues o Achievements Since CM25 o DAQ System Upgrade o Luminosity Monitors o Sequels of.
09/12/2009ALICE TOF General meeting 1 Online Controls Andrea Alici INFN and Universtity of Bologna, Bologna, Italy ALICE TOF General Meeting CERN building.
THIN CLIENT COMPUTING USING ANDROID CLIENT for XYZ School.
Designing a HEP Experiment Control System, Lessons to be Learned From 10 Years Evolution and Operation of the DELPHI Experiment. André Augustinus 8 February.
Data management in the field Ari Haukijärvi 2nd EHES training seminar.
Chapter 16 Designing Effective Output. E – 2 Before H000 Produce Hardware Investment Report HI000 Produce Hardware Investment Lines H100 Read Hardware.
Requirements Review – July 21, Requirements for CMS Patricia McBride July 21, 2005.
The SLAC Cluster Chuck Boeheim Assistant Director, SLAC Computing Services.
70-290: MCSE Guide to Managing a Microsoft Windows Server 2003 Environment, Enhanced Chapter 11: Monitoring Server Performance.
03/27/2003CHEP20031 Remote Operation of a Monte Carlo Production Farm Using Globus Dirk Hufnagel, Teela Pulliam, Thomas Allmendinger, Klaus Honscheid (Ohio.
ICT Standards and Guidelines The Structure of the Project Akram Najjar CNSI – Senior Consultant Director of InfoConsult.
Dec 8-10, 2004EPICS Collaboration Meeting – Tokai, Japan MicroIOC: A Simple Robust Platform for Integrating Devices Mark Pleško
Status of NA62 straw electronics and services Peter LICHARD, Johan Morant, Vito PALLADINO.
System Security Chapter no 16. Computer Security Computer security is concerned with taking care of hardware, Software and data The cost of creating data.
California State University, Northridge Certification Process Team B Carlos Guzman John Kramer Stacey LaMotte University of Phoenix.
André Augustinus 10 September 2001 DCS Architecture Issues Food for thoughts and discussion.
ATF Control System and Interface to sub-systems Nobuhiro Terunuma, KEK 21/Nov/2007.
The Scheme of Slow Control System in BESIII Xie Xiaoxi Gao Cuishan
70-290: MCSE Guide to Managing a Microsoft Windows Server 2003 Environment, Enhanced Chapter 11: Monitoring Server Performance.
March 2008EPICS Meeting in Shanghai1 KEKB Control System Status Mar Tatsuro NAKAMURA KEKB Control Group, KEK.
Clara Gaspar, March 2005 LHCb Online & the Conditions DB.
Computing Division Requests The following is a list of tasks about to be officially submitted to the Computing Division for requested support. D0 personnel.
Oct 8-9, 2005ACS Collaboration Meeting – Archamps, France The MicroIOC From Custom To Production First customer: PSI 25 pieces.
Introduction CMS database workshop 23 rd to 25 th of February 2004 Frank Glege.
Databases in CMS Conditions DB workshop 8 th /9 th December 2003 Frank Glege.
Chapter 2 Securing Network Server and User Workstations.
Online Software 8-July-98 Commissioning Working Group DØ Workshop S. Fuess Objective: Define for you, the customers of the Online system, the products.
Moving Your Paperwork Online Western Washington University E-Sign Web Forms.
1 EIR Nov 4-8, 2002 DAQ and Online WBS 1.3 S. Fuess, Fermilab P. Slattery, U. of Rochester.
DØ Online Workshop3-June-1999S. Fuess Online Computing Overview DØ Online Workshop 3-June-1999 Stu Fuess.
DoE Review January 1998 Online System WBS 1.5  One-page review  Accomplishments  System description  Progress  Status  Goals Outline Stu Fuess.
Reliability of KLOE Computing Paolo Santangelo for the KLOE Collaboration INFN LNF Commissione Scientifica Nazionale 1 Roma, 13 Ottobre 2003.
Randy MelenApril 14, Stanford Linear Accelerator Center Site Report April 1999 Randy Melen SLAC Computing Services/Systems HPC Team Leader.
LAV thresholds requirements Paolo Valente. LAV answers for Valeri’s questions (old) 1.List of hardware to control (HV, LV, crates, temperatures, pressure,
The BaBar Online Detector Control System Upgrade Matthias Wittgen, SLAC.
COMPUTER NETWORKS Quizzes 5% First practical exam 5% Final practical exam 10% LANGUAGE.
Monitoring Dynamic IOC Installations Using the alive Record Dohn Arms Beamline Controls & Data Acquisition Group Advanced Photon Source.
 1- Definition  2- Helpdesk  3- Asset management  4- Analytics  5- Tools.
I/Watch™ Weekly Sales Conference Call Presentation (See next slide for dial-in details) Andrew May Technical Product Manager Dax French Product Specialist.
Networking Objectives Understand what the following policies will contain – Disaster recovery – Backup – Archiving – Acceptable use – failover.
Artificial Intelligence In Power System Author Doshi Pratik H.Darakh Bharat P.
«My future profession»
Gu Minhao, DAQ group Experimental Center of IHEP February 2011
Trigger, DAQ, & Online Planning and R&D needs
Chapter 7. Identifying Assets and Activities to Be Protected
Presented by Li Gang Accelerator Control Group
SNS Status Report Karen S. White 10/15/08.
ATF/ATF2 Control System
ATLAS MDT HV – LV Detector Control System (DCS)
How SCADA Systems Work?.
APARTMENT MAINTENANCE SYSTEM
Physical Architecture Layer Design
Information Technology Unit State Treasury Agency Ministry of Finance Azerbaijan Republic Elnur Aliev Baku April 11, 2018.
Bluetooth Based Smart Sensor Network
The IFR Online Detector Control at the BaBar experiment at SLAC
The IFR Online Detector Control at the BaBar experiment at SLAC
The Online Detector Control at the BaBar experiment at SLAC
Technology Department Annual Update
WPIC Department of Psychiatry Office of Academic Computing
Presentation transcript:

BaBar Transition: Computing/Monitoring Steffen Luitz BaBar Online Coordinator 8/6/07

Content The current BaBar Online System Requirements for minimal maintenance State Minimal maintenance state implementation Conclusion

The Current BaBar Online System – Main Components Data Acquisition ~150 single-board computers (SBCs) in ~ 20 9u VME crates Level-3 Trigger / Event Processing Farm 50 1u Dual CPU/Dual Core servers Detector Controls ~ 20 SBCs Monitoring and controlling tens of thousands of channels HV, LV, temperatures, pressures, flows, strains, etc. Computing Infrastructure 10s of file and application servers Operator and user consoles Network switches and routers

Minimum Maintenance State Requirements (1) Detector monitoring / controls Temperature, humidity, gas flows, strain gauges, power status, etc. Few 100s of “channels” Automatic alerting (e-mail/paging) Logging / archiving of monitoring data Maintain history

Minimum Maintenance State Requirements (2) Data Acquisition System Considered to maintain a capability to take calibration data In case it would be needed to understand beam data Significantly more labor intensive to maintain Active maintenance of online software required Keep up with operating system versions, etc. Regular exercising of hardware and software Detect and repair hardware and software rot Not very useful without on-detector front-end electronics powered up Prohibitive amounts of effort needed to power up front-ends Even if done, the credibility of tests may be very limited For now we decided against maintaining such a capability

Minimum Maintenance State Requirements (3) Computing and Network Infrastructure Minimize number of systems / devices Minimize power consumption Strong preference for no external A/C required Maximize UPS bridge time Maximizes re-use of still usable systems / devices Secure remote power control and console access UPS for all monitoring – 24h minimum Make everything conform to SLAC computer center standards Hardware and Software

Minimum Maintenance State Requirements (4) Other requirements Re-use as many systems/devices as possible BaBar Offline Other SLAC Preserve ALL central Online system data currently on BaBar online system disks (and tapes) Archived ambient data Error logs Software archives Databases Physical and Cyber Security Network devices and servers need to be installed in secured locations Isolate controls system network Restrict access to controls system network

Minimal Maintenance State Implementation (1) Network infrastructure Use our 3 existing 1u Cisco 3750 to build monitoring network infrastructure No new hardware required Low-powered devices, no A/C required Estimate: ~1FTE-month to rebuild network Asset preservation Re-use 1 Cisco 6500-720 (2 yrs old in 08) Retire 2 old Cisco 6500-SUP1

Minimal Maintenance State Implementation (2) Data Acquisition System / Online Event Processing Turn off VME crates and Online Farm Asset preservation Re-use 50 Online Farm nodes (1 yr old in 08) Install in computer center building for BaBar offline processing VME crates and power supplies may be re-usable

Minimal Maintenance State Implementation (3) Detector monitoring and controls Set up 1 file server and 1 console Have file sever managed by SCCS Including automated backup of software and archive Needs small number (1-3) of 6u VME crates. Use our most modern Linux IOCs to build monitoring Keep current (already frozen) BaBar EPICS version Build small monitoring software system “From scratch”, re-using existing BaBar detector controls code Use EPICS standard archiver to record monitoring data Asset preservation Except for custom VME cards, and crates, most controls system components are too old to be redeployed in any useful way Effort: total 3 FTE-months to build, 1-2h/week to maintain

Minimal Maintenance State Implementation (4) Computing Infrastructure Turn off Asset preservation Re-use newer file servers for BaBar offline Re-use newer disk arrays (T4) for BaBar offline Most other components are to old to be redeployed

Minimal Maintenance State Implementation (5) Preservation of data Move data to file servers in SLAC computer center Will need ca. 6 TByte Convert data that is only accessible through special servers (e.g. CMLOG) to flat files for easy browsing Move Oracle server to computer center Holds e.g. electronic logbook

Summary Minimal maintenance state requirements well understood Overall implementation plan well understood Next steps: work out details and schedules