STANDARD WR STRAW-MAN ARCHITECTURES FOR PHASE II & WRS RELIABILITY

Presentation transcript:

STANDARD WR STRAW-MAN ARCHITECTURES FOR PHASE II & WRS RELIABILITY
Diego Real Máñez
Time Calibration Workshop (IFIC), 16-05-2017

Phase II - White Rabbit Switch on DU Base (BASIC)
Time and data together.
Diagram labels: ON-SHORE STATION; 2 x WRS (cross-configured); OFF-SHORE STATION.
Total DU Base ports: 18 DOM ports + 2 ports (timing + data: 4 colours) + 1 CLB = 21
Total ports available: 2 x 18 = 36

Phase II - White Rabbit Switch (REDUNDANCY)
Time and data together.
Diagram labels: ON-SHORE STATION; OFF-SHORE STATION.
Under study: redundancy with a simple splitter, with one CLB connected to two ports. Use the MDIO Control Register, which could disable the TX of a port. (8 and 7 ports available for redundancy; 15 DOMs in total.)
Total DU Base ports: 18 DOM ports + 2 ports (timing + data: 4 colours) + 1 CLB = 21
Total ports available: 2 x 18 = 36. For redundancy: 36 - 21 = 15 ports, covering 83% of the 18 DOMs.

Phase II - White Rabbit Switch (TIMING SEPARATED)
Diagram labels (shown twice): Timing (to WR switch on the on-shore station); Data (to non-WR switch on the on-shore station) ….
Total DU Base ports: 18 DOM ports + 2 timing ports + 2 data ports + 1 CLB = 23 (8 colours, difficult to achieve)
Free for redundancy: 36 - 23 = 13 ports, covering 72% of the 18 DOMs.
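The port arithmetic on the three architecture slides can be checked with a short sketch. The numbers (18 operational ports per WRS, 18 DOM ports per DU base, 1 CLB port, 2 or 4 uplink ports) come from the slides; the names `PortBudget` and `port_budget` are hypothetical, and the coverage figure assumes one spare port per DOM, which is how the 83% and 72% appear to be obtained.

```python
from dataclasses import dataclass

PORTS_PER_WRS = 18  # operational ports per White Rabbit Switch (from the slides)
DOMS_PER_DU = 18    # DOM ports needed on one DU base (from the slides)


@dataclass
class PortBudget:
    used: int       # ports needed without any redundancy
    available: int  # ports offered by all switches on the DU base

    @property
    def spare(self) -> int:
        """Ports left over once every DOM, uplink and CLB has one port."""
        return self.available - self.used

    @property
    def dom_coverage(self) -> float:
        """Fraction of DOMs that could get a second port (assumption: 1 spare port per DOM)."""
        return min(self.spare, DOMS_PER_DU) / DOMS_PER_DU


def port_budget(uplink_ports: int, clb_ports: int = 1, switches: int = 2) -> PortBudget:
    """Port budget for one DU base equipped with `switches` cross-configured WRS."""
    used = DOMS_PER_DU + uplink_ports + clb_ports
    return PortBudget(used=used, available=switches * PORTS_PER_WRS)


combined = port_budget(uplink_ports=2)   # timing and data share 2 uplink ports
separated = port_budget(uplink_ports=4)  # 2 timing ports + 2 data ports

print(combined.used, combined.spare, round(100 * combined.dom_coverage))     # 21 15 83
print(separated.used, separated.spare, round(100 * separated.dom_coverage))  # 23 13 72
```

Separating timing from data costs two extra ports per DU base, which is why the redundancy coverage drops from 15 to 13 DOMs.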

Phase II - White Rabbit Switch
Picture labels: PORT; FAN; CARRIER; AC/DC power supply; SCB.
• 18 operational ports.
• SCB: 20 ports routed, but only 18 available.
• Fan, AC/DC power supply, and mechanics are not needed for KM3NeT (own mechanics and cooling).
• A small-form-factor carrier with 20 ports is being developed for the CHROMIUM project (it can be used in KM3NeT).

Phase II - WRS RELIABILITY - MTBF
Informal data from the WR list:
1.- Failure of an operational switch due to a power supply failure.
2.- At GSI, some problems with fans. Fans are problematic since most generic types have an MTBF of 3 to 5 years at room temperature and much less at elevated temperatures.
2.1.- Power supply failure a couple of times, and fan failures in at least 9 switches. These switches were working in racks at a stable room temperature (21 ºC).
3.- Problems with a very old switch (3.3, with a small FPGA), also due to the power supply.
4.- At GSI and at Nikhef, EEPROMs were corrupted on some SFPs. It was not established whether the problem was caused by the switch or by the SFP itself.
WR hardware for KM3NeT: SCB + CARRIER → Qualification: FIDES + HALT / HASS

RELIABILITY - FIDES
• FIDES is based on reliability engineering.
• Latest updated handbook (based on more than 500 billion hours of operating data).
• Considers environment, quality factors, and processes.
• Less pessimistic (but be careful not to be optimistic!).
• Provides a theoretical FIT (or MTBF) and the weak points for improvement.
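FIT and MTBF are two ways of quoting the same rate: FIT counts failures per 10^9 device-hours. The sketch below converts between them and estimates a survival probability. It assumes a constant failure rate (exponential model) and a simple series system, which is a simplification and not the full FIDES methodology; all function names are hypothetical. The fan figures reuse the 3 to 5 year MTBF quoted from the WR list.

```python
import math

HOURS_PER_YEAR = 8760.0  # 365 days x 24 h


def mtbf_hours_to_fit(mtbf_hours: float) -> float:
    """FIT = failures per 1e9 device-hours, for a constant failure rate."""
    return 1e9 / mtbf_hours


def fit_to_mtbf_hours(fit: float) -> float:
    """Inverse conversion: MTBF in hours for a given FIT."""
    return 1e9 / fit


def series_fit(fits: list[float]) -> float:
    """FIT of a series system (any part failing fails the system): rates add up."""
    return sum(fits)


def survival_probability(mtbf_hours: float, mission_hours: float) -> float:
    """Probability of no failure during the mission, exponential model (assumption)."""
    return math.exp(-mission_hours / mtbf_hours)


for years in (3, 5):
    mtbf = years * HOURS_PER_YEAR
    print(
        f"fan MTBF {years} y: "
        f"{mtbf_hours_to_fit(mtbf):.0f} FIT, "
        f"P(no failure in 1 year) = {survival_probability(mtbf, HOURS_PER_YEAR):.2f}"
    )
```

Under these assumptions, a fan with a 3-year MTBF has roughly a 28% chance of failing within its first year, which is consistent with the slide's point that fans dominate the informal failure reports.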

Reliability: HALT
Highly Accelerated Life Test: a design test used to improve the robustness/reliability of a product through a test-fail-fix process in which the applied stresses are beyond the specified operating limits.
Performing HALT
HALT testing is normally performed in a HALT environmental chamber, a chamber that can simultaneously provide temperature control and vibration to the device under test. It must be possible to apply incremental increases (and decreases) in temperature and vibration to levels in excess of those specified for normal product operation.
During testing, it is essential to exercise product operation and ensure functionality. Test setups should be optimized to maximize functional test coverage. The test setup should also allow for remote operation of the test and product from outside the environmental chamber.

Reliability: HALT
Considerations (Stress Application Ordering): the order in which stresses are applied is governed by their likelihood of precipitating catastrophic failures. The following order is recommended:
• Decreasing temperature (Phase I to CLB+PB)
• Increasing temperature (Phase I to CLB+PB)
• Increasing vibration
Minimum sample size: multiple samples (at least 2 units).

Reliability: HALT
Example of temperature profile (figure).

Reliability: HALT
Testing Beyond the Failure Point
The HALT test should not stop when a failure is encountered. If possible, the failure mode should be analyzed and fixed to allow the test to continue beyond the stress level at which the failure occurred. If a fix is not possible, then testing should allow for and accommodate the known failure mode during further testing.

Reliability: HALT
Recording Failures: for each failure identified during testing, the following information should be recorded:
• Failure point
• Failure description
• Root cause of failure mode
• Type of failure (catastrophic or recoverable)
• Class of failure (generic or non-generic)
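The five items to record per failure map naturally onto a small record type. The following is only a sketch of how such a log entry could be structured; the class and field names are hypothetical, and the example entry is made up purely for illustration, not taken from any KM3NeT test.

```python
from dataclasses import asdict, dataclass
from enum import Enum


class FailureType(Enum):
    CATASTROPHIC = "catastrophic"
    RECOVERABLE = "recoverable"


class FailureClass(Enum):
    GENERIC = "generic"
    NON_GENERIC = "non-generic"


@dataclass(frozen=True)
class HaltFailure:
    failure_point: str           # stress level at which the failure appeared
    description: str             # what was observed
    root_cause: str              # root cause of the failure mode, once known
    failure_type: FailureType    # catastrophic or recoverable
    failure_class: FailureClass  # generic or non-generic

    def as_row(self) -> dict:
        """Flatten the record to plain strings, e.g. for a CSV or spreadsheet log."""
        row = asdict(self)
        row["failure_type"] = self.failure_type.value
        row["failure_class"] = self.failure_class.value
        return row


# Illustrative entry only: every value below is invented.
example = HaltFailure(
    failure_point="hot step, chamber setpoint 85 C",
    description="uplink port lost link, recovered after reset",
    root_cause="not yet determined",
    failure_type=FailureType.RECOVERABLE,
    failure_class=FailureClass.NON_GENERIC,
)
print(example.as_row())
```

Using enumerations for the type and class keeps the log restricted to the two values the slide allows for each.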

Reliability: HALT / HASS in KM3NeT
HALT for KM3NeT electronics (Highly Accelerated Life Test)
• Find a HALT (climatic) chamber and apply the HALT procedure to any board (WRS, WRS CARRIER, and other KM3NeT boards).
• Write a document defining the HALT procedure for the KM3NeT electronics.
HASS: infant mortality removal (Highly Accelerated Stress Screening)
• Procedure already defined for KM3NeT: KM3NeT_ELEC_PRR_2014_003_Burn_In_Test.docx
https://drive.google.com/open?id=0B2KaXS_rxnigTEdVVDJONWlXSjg

Reliability: HALT for WRS: First draft
SCB board HALT, together with the carrier (only temperature).

Reliability: HALT for WRS: First proposal
SCB board HALT, together with the carrier, starting with the WRS carrier and continuing with the CHROMIUM carrier (only temperature).
1.- 10 ports on: 9 connected to CLBs or similar, and 1 uplink port (rate around 200 Mbit/s).
2.- Measure the power consumption.
3.- Connect to the configuration port of the SCB.
4.- Starting at 25 ºC, increase the temperature by 5 ºC per step (ramp of 10 ºC per minute, or the maximum the climatic chamber allows).
5.- After each step, check that everything is working as expected.
6.- Perform a reset of the SCB.
7.- Recheck everything.
8.- Continue increasing until failure.
9.- If possible, fix the failure and continue increasing the temperature.
4bis.- For the cold test: starting at 25 ºC, decrease the temperature by 10 ºC per step, 10 minutes per step.
3 SCBs and carriers are being ordered to start the tests.
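The step-stress loop in the proposal (set a temperature, check, reset, recheck, step again until failure) can be sketched as follows. This is a dry-run outline, not chamber control code: `check_dut` and `reset_dut` are hypothetical callbacks standing in for the real functional test and SCB reset, and the chamber limits (120 ºC and -60 ºC) and simulated failure thresholds are assumptions chosen only to make the example run.

```python
from typing import Callable, Iterator, Optional


def setpoints(start_c: float, step_c: float, limit_c: float) -> Iterator[float]:
    """Yield chamber setpoints from start_c towards limit_c; the sign of step_c sets the direction."""
    if step_c == 0:
        raise ValueError("step_c must be non-zero")
    temp = start_c
    while (temp <= limit_c) if step_c > 0 else (temp >= limit_c):
        yield temp
        temp += step_c


def run_step_stress(
    start_c: float,
    step_c: float,
    limit_c: float,
    check_dut: Callable[[float], bool],
    reset_dut: Callable[[], None],
) -> Optional[float]:
    """Return the first setpoint at which a check fails, or None if the limit is reached."""
    for temp in setpoints(start_c, step_c, limit_c):
        # A real setup would command the chamber here and wait for the dwell time.
        if not check_dut(temp):  # step 5: check after each step
            return temp
        reset_dut()              # step 6: reset the SCB
        if not check_dut(temp):  # step 7: recheck everything
            return temp
    return None


# Dry run with a simulated device (thresholds are invented for the demo).
hot_fail = run_step_stress(25.0, 5.0, 120.0, lambda t: t < 85.0, lambda: None)
cold_fail = run_step_stress(25.0, -10.0, -60.0, lambda t: t > -40.0, lambda: None)
print(hot_fail, cold_fail)  # 85.0 -45.0
```

Returning the failing setpoint rather than raising makes it straightforward to log the failure, apply a fix if one is possible, and restart the loop from that level, in line with testing beyond the failure point.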

Phase II - DELL Switch
• DELL has provided partial information about the reliability of the boards.
• DELL would also be able to lend an S3124F for a month.
• It is to be sent to Granada for tests.