GLAST LAT Project, 7 June 2005
GLAST Large Area Telescope: NCR 529
Systems I&T Quality
Gamma-ray Large Area Space Telescope

Agenda
Goal for today
– Review the NCR history to be sure that everyone is on the same page
– Discuss possible sources of the problem
– Review configuration changes and NCR history to see whether we can identify a leading candidate
– Discuss possible tests
– Decide on the next steps

NCR 529 Single Run History
The NCR started with a TEM register test that also exercises the Tracker registers
– The GTFE calibration mask read back did not match what had been written to the GTFE
  GTFE 23 in GTRC 8 in GTCC 2 in Bay 9 (Tracker SN 2)
Specifics of this test and the follow-on tests are below and in the next several slides
– Included single runs in several configurations
– Included two different tests run over clock frequency

Single Run Errors
135002864
– Module 'TEM [9] TCC [2] TRC [8] TFE [23]' passed 48 tests, 5 errors, 0 warnings
– Errors:
  'calib_mask' ReadWrite error: written 0x5a5a5a5a5a5a5a5a, read 0x5a59b4b5b4b5b4a8
  'calib_mask' ReadWrite error: written 0xa5a5a5a5a5a5a5a5, read 0xa58b4b4b4b4b4b48
  'calib_mask' BcastWriteUcastRead error: broadcasted 0x5a5a5a5a5a5a5a5a, read 0x5a59b4b5b4b5b568
  'calib_mask' BcastWriteUcastRead error: broadcasted 0xa5a5a5a5a5a5a5a5, read 0xa58b4b179697ad28
  module aborted with reason *** errors limit reached
135002865
– Module 'TEM [9] TCC [2] TRC [8] TFE [23]' passed 49 tests, 4 errors, 0 warnings
– Errors:
  'calib_mask' ReadWrite error: written 0xa5a5a5a5a5a5a5a5, read 0xa58b169796979694
  'calib_mask' BcastWriteUcastRead error: broadcasted 0x5a5a5a5a5a5a5a5a, read 0x5a59b4b5b4b76968
  'calib_mask' BcastWriteUcastRead error: broadcasted 0xa5a5a5a5a5a5a5a5, read 0xa5a58b4b4b4b5694
  module aborted with reason *** errors limit reached
135002867
– CALIB_MASK (GTCC 2, GTRC 8, GTFE 23): loading 0xAAAAAAAAAAAAAAAAL and reading back 0xAAAAAAAAAAA01554L.
135002868
– Layer X17, Beginning test with 0/24 split:, ERROR: Reading out channel 548 from layer X17 (not included in calib mask).

Single Run Errors (continued)
135002882
– Module 'TEM [9] TCC [2] TRC [8] TFE [23]' passed 48 tests, 5 errors, 0 warnings
– Errors:
  'calib_mask' ReadWrite error: written 0x5a5a5a5a5a5a5a5a, read 0x5a5a5a5b3dea6968
  'calib_mask' ReadWrite error: written 0xa5a5a5a5a5a5a5a5, read 0xa58b4b17962e2d28
  'calib_mask' BcastWriteUcastRead error: broadcasted 0x5a5a5a5a5a5a5a5a, read 0x58b262d5d2d5d2d0
  'calib_mask' BcastWriteUcastRead error: broadcasted 0xa5a5a5a5a5a5a5a5, read 0xa58b169796970a50
  module aborted with reason *** errors limit reached
135003064
– Module 'TEM [9] TCC [2] TRC [8] TFE [23]' passed 48 tests, 5 errors, 0 warnings
– Errors:
  'calib_mask' ReadWrite error: written 0x5a5a5a5a5a5a5a5a, read 0x5a5a5a5a5a5a5a50
  'calib_mask' ReadWrite error: written 0xa5a5a5a5a5a5a5a5, read 0xa5a58b179697ad28
  'calib_mask' BcastWriteUcastRead error: broadcasted 0x5a5a5a5a5a5a5a5a, read 0x5a5a5a5a5a5a5a14
  'calib_mask' BcastWriteUcastRead error: broadcasted 0xa5a5a5a5a5a5a5a5, read 0xa5a5a5a5a1013d28
  module aborted with reason *** errors limit reached
135003067
– Module 'TEM [9] TCC [2] TRC [8] TFE [23]' passed 48 tests, 5 errors, 0 warnings
– Errors:
  'calib_mask' ReadWrite error: written 0x5a5a5a5a5a5a5a5a, read 0x5a5bfcb5b4b5b4b4
  'calib_mask' ReadWrite error: written 0xa5a5a5a5a5a5a5a5, read 0xa5a58b0906979694
  'calib_mask' BcastWriteUcastRead error: broadcasted 0x5a5a5a5a5a5a5a5a, read 0x5a59b4b5b4b5b4b4
  'calib_mask' BcastWriteUcastRead error: broadcasted 0xa5a5a5a5a5a5a5a5, read 0xa58b5edf04060a50
  module aborted with reason *** errors limit reached
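
The Discussion of Symptoms slide notes that the read-back patterns do not show a repeatable error. One simple way to compare runs is to XOR each read-back value with what was written and record how many bits, and which bit positions, came back wrong. The helper below is a sketch of that bookkeeping only; `diff_report` is a hypothetical name, not part of the LAT test software, and the sample pairs are copied from the run logs above.

```python
def diff_report(written: int, read: int, width: int = 64) -> dict:
    """Summarize which bits of a read-back value disagree with what was written."""
    mask = (1 << width) - 1
    diff = (written ^ read) & mask  # 1 in every bit position that came back wrong
    bad_bits = [i for i in range(width) if (diff >> i) & 1]
    return {
        "xor": diff,
        "n_bad_bits": len(bad_bits),
        "msb_bad": bad_bits[-1] if bad_bits else None,  # highest wrong bit
        "lsb_bad": bad_bits[0] if bad_bits else None,   # lowest wrong bit
    }

# (written, read) pairs copied from the single-run logs.
samples = [
    (0x5A5A5A5A5A5A5A5A, 0x5A5A5A5A5A5A5A50),  # run 135003064, ReadWrite
    (0xAAAAAAAAAAAAAAAA, 0xAAAAAAAAAAA01554),  # run 135002867, GTFE check script
    (0xA5A5A5A5A5A5A5A5, 0xA58B4B4B4B4B4B48),  # run 135002864, ReadWrite
]
for w, r in samples:
    rep = diff_report(w, r)
    print(f"{w:016x} -> {r:016x}: xor={rep['xor']:016x}, "
          f"{rep['n_bad_bits']} bad bits, bits {rep['lsb_bad']}..{rep['msb_bad']}")
```

For run 135003064 only bits 1 and 3 are wrong, while for run 135002867 the wrong bits span bits 1 through 19. A summary like this shows how much the failure signature varies between reads; it does not identify the cause by itself.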

Summary of Tests vs Frequency

Register Tests
TEM register tests
– Read/write tests are performed using multiple patterns
  Test patterns for the 64-bit registers are:
    0x0000000000000000L
    0xFFFFFFFFFFFFFFFFL
    0x5A5A5A5A5A5A5A5AL
    0xA5A5A5A5A5A5A5A5L
  Each mask register is tested two ways:
    Direct write followed by a read, using the 4 patterns in sequence
    Broadcast write followed by a read, using the 4 patterns in sequence
TKR GTFE check script
– Writes two patterns: 0xAAAAAAAAAAAAAAAA and 0x5555555555555555
– Might stop on error
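
The TEM register test flow described above (four patterns, first as direct writes and then as broadcast writes, each followed by a unicast read) can be sketched as follows. The register accessors are passed in as plain functions because the real TEM/EGSE interface is not described here; `register_test`, `FakeRegister`, and the error limit of 4 are hypothetical stand-ins, not the actual test script.

```python
from typing import Callable, List, Optional, Tuple

# The four 64-bit patterns listed on the "Register Tests" slide.
PATTERNS = [
    0x0000000000000000,
    0xFFFFFFFFFFFFFFFF,
    0x5A5A5A5A5A5A5A5A,
    0xA5A5A5A5A5A5A5A5,
]

def register_test(
    ucast_write: Callable[[int], None],  # hypothetical: write one GTFE mask register
    bcast_write: Callable[[int], None],  # hypothetical: broadcast write to all GTFEs
    ucast_read: Callable[[], int],       # hypothetical: read back the same register
    error_limit: int = 4,                # hypothetical abort threshold
) -> List[Tuple[str, int, int]]:
    """Direct and broadcast write/read passes; returns (test, written, read) mismatches."""
    errors: List[Tuple[str, int, int]] = []
    for test_name, writer in (("ReadWrite", ucast_write),
                              ("BcastWriteUcastRead", bcast_write)):
        for pattern in PATTERNS:
            writer(pattern)
            value = ucast_read()
            if value != pattern:
                errors.append((test_name, pattern, value))
                if len(errors) >= error_limit:
                    return errors  # analogous to "module aborted ... errors limit reached"
    return errors

class FakeRegister:
    """Stand-in for a 64-bit mask register; `corrupt` models a faulty readback path."""

    def __init__(self, corrupt: Optional[Callable[[int], int]] = None) -> None:
        self.value = 0
        self.corrupt = corrupt or (lambda v: v)

    def write(self, v: int) -> None:
        self.value = v & 0xFFFFFFFFFFFFFFFF

    def read(self) -> int:
        return self.corrupt(self.value)

good = FakeRegister()
print(register_test(good.write, good.write, good.read))  # []
flaky = FakeRegister(corrupt=lambda v: v ^ 0x1)  # every read returns bit 0 flipped
print(len(register_test(flaky.write, flaky.write, flaky.read)))  # 4 (hits error_limit)
```

Passing the accessors in as functions keeps the pattern logic separate from the hardware interface, so the same loop could drive a simulated register, as here, or a real one.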

Discussion of Symptoms
Failures are intermittent (I believe this is consistent with the Tracker-level experience with NCR 104)
– Not all reads fail
– Not all instances of reading the same pattern fail
– The read-back patterns do not show a repeatable error
There appears to be some pattern sensitivity
The three readouts that fail may have clock frequency sensitivity
Have tested using both sides of the EM GASU
Have tested with and without powering the Calorimeter
Conclusion
– The symptoms match what was seen in Tracker-level NCR 104
– We can’t tell whether Tracker SN 3 performance has changed or whether the errors are induced by the test configuration

What Could Be Wrong?
Tracker GTFE sensitivity has increased
– Circuit damage or degradation increased sensitivity
Tracker, flight TEM/TPS and EM GASU adversely interact (i.e. this is just how this particular tracker will work in the LAT)
– TEM/TPS voltage is higher, potentially increasing sensitivity
Clock reaching the tracker is noisy enough to disturb the readout circuit
– Termination resistance in tracker MCM incorrect
– Termination resistance in flex cable incorrect
– Bad cable mate in flex cable to TEM
– TEM/TPS is providing signals out of spec
    TPS noise feeds into clock
    LVDS signal distorted (e.g. one side with soft short to ground)
– GASU to TEM cable introducing noise
    Bad mate at either end
    Build error in cable
– EM GASU is providing signals out of spec
    This specific GASU output connector has a problem
    GASU removal and replacement introduced error
Other external noise sources disturbing normal operation of the tower
– EGSE power supply
– Chiller running or adding a ground loop
– Changes to Building 33 power or grounding
Software
Schemas
Other possibilities?

NCR History
Tracker
– Detailed Tracker SN 3 NCR information is available in the backup charts
    Nothing obviously related to this issue (at least to me)
– NCRs 104 and 107 document issues at a lower level of assembly
    NCR 104 documents a register readout issue similar to this NCR
      Shows some sensitivity to duty cycle
    NCR 107 documents the duty cycle issue, which is a different problem
TEM/TPS History
– NCR review complete
    Some workmanship NCRs, but nothing directly relevant
    NCRs attached at the end of the presentation for reference
– Data package still in process
– This unit had 60 operating hours

Recent Configuration Changes
Changes in the configuration from 4-tower testing until now
– Towers 8 and 9 installed and cabled with flight cables
– Towers 4 and 5 were de-mated on the side facing towers 8 and 9, then re-mated
– Installed the chiller on the wings of the LAT
– Installed shear plates over bays 0, 1, 4, 5

Possibly Related NCRs
Bay # | NCR # | Summary
5     | 542   | TkrReadingConfigurationTest, debug showed phasing errors; likely TEM Busy configuration issue
5     | 539   | TkrDataTaking failure; failure is due to the OR-Layer being misinterpreted as too low; likely one-shot interaction with low-level Tracker test, no issue
5     | 538   | TkrTreqCheck failures; currently suspect software
4     | 532   | TEM register test, Trigger Interface Controller [Mux-config(register)] failed
9     | 529   | Failed register test, TKR front-end portion
      | 505   | CAL LPT and TKR LPT have errors when tests are run at margined clock rates
5*    | 487   | (Met bay) Failed calf_exr_p01; diagnosis was 2nd online process writing to register
9*    |       | (Met bay) Failed calf_exr_p01 (6/10)

Potential Tests
Set the register to a known value, then repeatedly read it out
– The failure pattern may help pin down the source (if it’s 60 Hz…)
Connect EGSE to other GASU ports and check clock quality (or other signals) to see whether the GASU meets spec
Connect the tower to a proven GASU port to see whether the port the tower is currently connected to has a problem
Connect the tower to an external crate to remove the EM GASU from the picture
– Configuration matches the earlier receiving test except for use of the flight TEM/TPS
Put in a breakout box, check the tracker termination resistance, and verify that TEM/TPS outputs are as expected
Based on our discussions, are there any others?
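
One way to act on the first suggestion (repeatedly read a known value and see whether failures track 60 Hz) is to time-stamp each failing read and measure how tightly the failure phases cluster within the 60 Hz cycle. The sketch below uses the mean resultant length from circular statistics and a first-order Rayleigh-test p-value. It assumes each read can be time-stamped; the timestamps and function names are hypothetical.

```python
import math
from typing import Sequence

def phase_coherence(times_s: Sequence[float], freq_hz: float = 60.0) -> float:
    """Mean resultant length of the failure phases at freq_hz.

    Near 1: failures cluster at one phase of the freq_hz cycle.
    Near 0: failures are spread evenly over the cycle.
    """
    if not times_s:
        raise ValueError("need at least one failure timestamp")
    angles = [2.0 * math.pi * freq_hz * t for t in times_s]
    c = sum(math.cos(a) for a in angles)
    s = sum(math.sin(a) for a in angles)
    return math.hypot(c, s) / len(times_s)

def rayleigh_p_approx(times_s: Sequence[float], freq_hz: float = 60.0) -> float:
    """First-order Rayleigh-test p-value, exp(-n * R^2); crude for small n."""
    r = phase_coherence(times_s, freq_hz)
    return math.exp(-len(times_s) * r * r)

# Hypothetical timestamps (seconds) of failing reads from a repeated-read run.
failures = [0.0031, 0.0199, 0.1031, 0.2698, 0.5365]
print(f"R = {phase_coherence(failures):.3f}, p ~ {rayleigh_p_approx(failures):.2g}")
```

If the reads themselves are paced at a fixed interval, the read times alone can show phase structure, so a fairer check compares the statistic for the failing reads against the same statistic computed over all reads.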

New Topics for Discussion
Understand the root cause of the failure
– Known failure due to lack of rework
– Failure due to environment
– Failure due to degradation
After the root cause is determined, we need to understand what options, and resulting impacts, are available
– Is rework an option? Better access now than later?
– Wait and watch? What opportunities will be missed?
– Access now vs. later?
– Window between 6 and 8 towers?
– What is the impact of use as is?
    The MCM may still work if tested with the EM TEM and EGSE used for the TKR test
Suggested tests:
– Change the GASU port (this may have been done already)
– Change the cable between GASU and TEM
– Change the TEM
– Connect the TKR to the EM TEM and EGSE

New Information
Question 1) The TKR cable interface to the TEM is the only feasible access point to the TKR. Is there anything that can be learned (or confirmed) by making measurements at this interface?
– Measuring the TKR/TEM interface could check whether there is an additional contributing problem on the cable. It does not give any visibility into the internal MCM termination resistor.
Question 2) Is it likely that an MCM with the 100 ohm GCRC termination resistor would successfully make it through the TKR test regime and not fail until LAT-level testing?
– The fact that this MCM made it through the TKR test regime and then failed at LAT-level testing may reflect a difference in the environment it sees in the LAT. If it has the wrong termination resistor, it will be sensitive to changes in the clock and voltage.
Question 3) If multiple MCMs, say 10, contain the 100 ohm GCRC termination resistor, is it likely that all of them make it through the TKR test regime and only 1 fails at LAT-level testing?
– Yes. When this problem was first found more than a year ago, it showed up in fewer than 10% of the MCMs when tested at low temperature. Due to natural variation, some chips have more margin than others, even with the 100 ohm termination.
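
The answer to Question 3 can be made roughly quantitative with a binomial model. The sketch assumes that each MCM fails independently with the same probability, and it borrows the ~10% figure from the low-temperature observation purely as a stand-in for the unknown failure probability in the LAT environment.

```python
from math import comb

def binom_pmf(n: int, k: int, p: float) -> float:
    """Probability of exactly k failures among n independent MCMs."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Assumed: 10 MCMs with the 100 ohm resistor, each failing with probability ~0.10.
n, p = 10, 0.10
p_none = binom_pmf(n, 0, p)
p_one = binom_pmf(n, 1, p)
print(f"P(0 fail) = {p_none:.3f}, P(exactly 1) = {p_one:.3f}, "
      f"P(<=1) = {p_none + p_one:.3f}")
```

Under these assumptions P(0 fail) is about 0.349, P(exactly 1) about 0.387, and P(at most 1) about 0.736, so seeing one failure out of ten is the single most likely outcome, which is consistent with the "Yes" above.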

New Information
Teledyne paperwork for the termination resistor rework does not indicate that rework was performed on MCM 11377.
– However, Teledyne paperwork is not trustworthy.
– Based on SLAC visual inspection, it is likely that the MCM does have the correct termination resistors.
The tower worked successfully with many EM2 EGSE at SLAC and in Italy, which indicates that it works with the majority of EM2 TEMs.
– It also worked over a range of conditions: DVDD = 2.5-2.8 V, clock frequency = 18-22 MHz, clock duty cycle = 42-58%.
The tower started exhibiting the problem as soon as it was connected to a flight TEM, indicating that it does not like the clock characteristics of this TEM.
– There is a chance that the tower will work with another flight TEM.
– Suggest installing another TEM on this TKR tower.

Backup Charts

TEM/TPS 1835 NCRs - GTC
GTC NCMR Report
NCMR 2294
– Item description: TPS CCA
– Serial #: GLAT 1774, 1775, 1776, 1778, 1779, 1780, 1781, 1782
– Disposition: Use-as-is; CLOSED 4/15
– Corrective action: All personnel involved in this operation to receive additional training. This includes the SLAC on-site QA representative. The SLAC on-site QA representative will attend a course to obtain formal certification to IPC/EIA J-STD-001C and J-STD-001CS.
– Description of nonconformance: IS: Insufficient staking on tantalum capacitors. SB: Staking material should be in contact with both end faces of the component.
NCMR 2305
– Item description: TPS CCA
– Serial #: GT104 thru GT122, GLAT1774 thru GLAT1792
– Disposition: Rework; CLOSED 4/15
– Corrective action: Schedule an MRB meeting with SLAC and GTC to determine the method of removal, pre-tinning, and rework of these components. Remove and replace with properly removed gold per J-STD-001CS Para 5.4.1. The SLAC on-site QA representative will attend a course to obtain formal certification to IPC/EIA J-STD-001C and J-STD-001CS.
– Description of nonconformance: IS: On components Q10, Q11 and Q12 the solder is grainy and cold, exhibiting evidence of gold embrittlement. SB: Should exhibit a properly wetted solder joint.
NCMR 2323
– Item description: TPS CCA
– Serial #: ALL TPS
– Disposition: Rework; CLOSED 4/15
– Corrective action: Install cable ties with a properly adjusted tool or manually per NASA-STD-8739.4 Para. 9.6.2. Stake with Hysol 0151. Add staking requirements to drawing. Rework all assemblies per rework traveler.
– Description of nonconformance: IS: Cable ties are trimmed below the strap head. SB: Per NASA-STD-8739.4 Para 9.6.2, cable ties should be trimmed flush at the strap head.

TEM/TPS 1835 NCRs - SLAC
Electronics NCR Report
Request ID 00399
– Item description: TPS Subassembly Electrical Interface Continuity and Isolation Test
– Serial #: TPS CCA 1774, 1775, 1776, 1778
– Disposition: NA; OPEN
– Description of nonconformance:
    1. A hard copy of the RED LINE MASTER was not generated and controlled under the Responsible Engineer.
    2. Test procedure is RED LINED. Pages 27, 29, 31: procedure number IS: LAT-DS-04099; SB: LAT-TD-40850. Page 33, Para. 5.2.2.1-1: removed Chassis GND and GND continuity measurement. Page 44, Para. 5.2.3.3-1: removed TEM current monitor High and TEM current monitor LOW measurements.
Request ID 00400
– Item description: TPS Subassembly SVT
– Serial #: TPS CCA GLAT 1753
– Disposition: NA; OPEN
– Description of nonconformance:
    1. A hard copy of the RED LINE MASTER was not generated and controlled under the Responsible Engineer.
    2. Test procedure is RED LINED. Pages 39, 42: procedure number IS: LAT-DS-04849; SB: LAT-TD-04849. Page 44, Para. 5.2.2-7: external power supply limits IS: +26.5 to +29.5; SB: +27 to +33.

TEM/TPS 1835 NCRs - SLAC (continued)
Electronics NCR Report
Request ID 00506
– Item description: Temperature deviation during initial pump-down in TV chamber test
– Serial #: GLAT1832, GLAT1835
– Disposition: NA; OPEN
– Description of nonconformance: Temperature deviation during initial pump-down in the TV chamber test. During this test the thermal control unit was commanded to +25C at the beginning of pump-down; however, if the chiller is not commanded after this point it can lock up and the temperature will drift. In this case the temperature went to -9C; the flight hardware was not powered. The condition was automatically reset by the transition to +60C. As a correction, the new version of the TV control software performs active control during the entire pump-down period.
Request ID 00507
– Item description: TEM/TPS
– Serial #: GLAT1835
– Disposition: Assigned (cullinan, 6/2/05)
– Corrective action: Replace locking helicoil per print. Record lot # of helicoil on W.O. 669 for s/n GLAT 1835. OK to proceed to I&T testing. Helicoil must be installed prior to integration of GLAT 1835 into Grid. Report completion of helicoil installation to this NCR.
– Description of nonconformance: IS: Helicoil (#8-32) was removed during disassembly of the TEM/TPS module from the T/V test plate. SB: Not removed.

Tracker NCR History
NCR 104
– Description
  During the functional test at a temperature of -30C, the following errors occurred:
    a) SN 1099, GTFE #23 sometimes did not read back the calibration mask correctly
    b) SN 480, GTFE #22 sometimes did not read back the data and trigger masks correctly
    c) SN 269, GTFE #23 sometimes did not read back the data and trigger masks correctly
    d) SN 261, GTFEs #21, 22, 23 sometimes did not read back the trigger and calibration masks correctly
  No errors occurred in the same setup at ambient temperature, at +25C, or at +60C. None of the other 11 MCMs in the setup gave any errors. The test procedure specifies that an NCR should be filled out in case of a failure at any of these temperatures, but the burn-in may proceed. Hence the burn-in at 85C is presently in progress for this set of 15 MCMs.
– Disposition
  8/9/2004 2:19:01 PM marsh
    All MCM units are to be reworked by replacing two 100 Ohm termination resistors with 75 Ohm resistors. Rework of these MCM units will be performed by Zentek. Source inspections will be performed on 100% of units after rework, before conformal coat, and again at final inspection. Final acceptance of MCM units will be performed upon successful retesting of the reworked MCMs.
  8/13/2004 8:08:31 AM marsh
    NCR closure was approved by Bill Jimenez, LAT Quality Engineering, and Robert Johnson, LAT Tracker Subsystem Manager (e-mails on file).
– Root Cause
  The root cause has been determined to be crosstalk between the register readback output and the clock signal, which caused a glitch on the clock resulting in improper shifting of the register.
– Corrective Action
  The corrective action consists of changing termination resistors R41 and R44 from 100 Ohm to 75 Ohm. Revision 10 to LAT-DS-00898 and Revision 9 to LAT-DS-00899 implement the change. Teledyne's production line will change over in the near future. Until then, all MCMs received from Teledyne (except S/Ns 259, 260, 263, 314, 346 [short], and 1088 [tall]) with 100 Ohm termination resistors at R41 and R44 will be sent to Zentek for rework to the current revision.
– Status
  Closed

Tracker NCR History
Attachment to NCR 104 (page 1 of 2)
– NCR Supporting Document
    Initiator Name: Robert Johnson
    Found by: in-process test
    Type of Nonconformance: minor
    Discrepancy Level: flight hardware
    Subsystem: Tracker
    Item Description: MCM
    Drawing #/Revision #: LAT-DS-00898-9 and LAT-DS-00899-8
    Supplier: Teledyne Electronic Technologies
    Location: Los Angeles
    Lot #: N/A
    Serial Numbers: DS-00899: 1099; DS-00898: 480, 269, and 261
    Test Procedure: LAT-TD-002367-6
– Description of Nonconformance:
  During the functional test at a temperature of -30C, the following errors occurred:
    SN 1099, GTFE #23 sometimes did not read back the calibration mask correctly
    SN 480, GTFE #22 sometimes did not read back the data and trigger masks correctly
    SN 269, GTFE #23 sometimes did not read back the data and trigger masks correctly
    SN 261, GTFEs #21, 22, 23 sometimes did not read back the trigger and calibration masks correctly
  No errors occurred in the same setup at ambient temperature, at +25C, or at +60C. None of the other 11 MCMs in the setup gave any errors. The test procedure specifies that an NCR should be filled out in case of a failure at any of these temperatures, but the burn-in may proceed. Hence the burn-in at 85C is presently in progress for this set of 15 MCMs.

Tracker NCR History
Attachment to NCR 104 (page 2 of 2)
– Discussion:
  The point in the MCM design with the most limited performance margin is well known to be the transfer of register information from the GTFE chip to the GTRC chip. This transfer is done with single-ended CMOS levels on a 3-state bus. Unlike the data readout, it was not made LVDS, because register readback should never be done during running (and hence there is no issue of interference with the detectors). Also unlike the data readout, it was not made left-right redundant, because failure of this functional feature would not significantly impair science operations.
  The problem is that the 3-state bus runs the length of the MCM and has 26 drivers and receivers hanging on it. It therefore has a large capacitance, and the driver on the GTFE chip takes quite a bit of time to charge it up. This time is added to the time to send the LVDS clock from GTRC to GTFE, the time to receive the clock in the GTFE chip, and the time to pass through several layers of gates and buffers in the GTFE chip to move the data into and through the driver. If the data become stable on the bus (i.e. above the GTRC threshold) only after the arrival of the next clock edge, then the wrong bits get transferred and an error is detected. The timing margin is affected by temperature, voltage, and clock frequency.
  In this case the -30C test temperature is far below the Tracker operational range, by at least 50 degrees. Such a test demonstrates margin in some way, although in a much less straightforward way than simply raising the clock frequency. All of these MCMs passed the tests specified in procedure LAT-PS-01971, including a test that was simultaneously 10% high in frequency and 5% low in voltage.
  It is very doubtful that these MCMs should be rejected as a result of the register readback errors, because:
    the error occurs far outside the MCM operational range and even well outside the Tracker test range;
    the mask and calibration register readback are not essential features, in case they really do fail during operation. The fault is always detectable, because the whole point of reading back the register is to check the result against what was loaded. If there is a fault, one can easily verify that the register was loaded properly by doing a charge injection run, in which case the data come back through a path that was proven, even in these MCMs, to work at all temperatures.
  Nevertheless, we first need to be sure of the cause of the bit errors. It is possible that it is simply the result of bad solder connections on the cables in the test system; in fact, we have had problems with that before. I recommend that, following the burn-in, we connect these 4 MCMs to different cables, or to different locations on the same cables, and repeat the test at -30C.
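
The timing argument above is a budget: the clock path delays plus the time for the GTFE driver to charge the bus past the GTRC threshold must fit within one clock period. The sketch below writes that budget down, modeling the bus charge time as a simple RC step response. Every resistance, capacitance, threshold, and delay in it is a placeholder rather than MCM data, and the function names are hypothetical.

```python
import math

def rc_threshold_time_ns(r_ohm: float, c_pf: float, vdd: float, vth: float) -> float:
    """Time for a node driven through r_ohm into c_pf, starting at 0 V, to reach vth."""
    if not 0.0 < vth < vdd:
        raise ValueError("vth must lie strictly between 0 and vdd")
    tau_ns = r_ohm * c_pf * 1e-3  # ohm * pF = ps; * 1e-3 converts to ns
    return tau_ns * math.log(vdd / (vdd - vth))

def readback_slack_ns(clock_mhz: float, t_clk_send_ns: float, t_clk_rx_ns: float,
                      t_logic_ns: float, t_bus_ns: float) -> float:
    """Clock period minus the register-readback path delay; negative means a bit error."""
    period_ns = 1e3 / clock_mhz
    return period_ns - (t_clk_send_ns + t_clk_rx_ns + t_logic_ns + t_bus_ns)

# Every number below is a made-up placeholder, not a measured MCM value.
t_bus = rc_threshold_time_ns(r_ohm=500.0, c_pf=60.0, vdd=2.5, vth=1.25)  # about 20.8 ns
for f_mhz in (20.0, 22.0):  # nominal clock and 10% high, as in the margin test
    slack = readback_slack_ns(f_mhz, 4.0, 3.0, 12.0, t_bus)
    print(f"{f_mhz} MHz: slack = {slack:.1f} ns")
```

Raising the clock from 20 MHz to 22 MHz shortens the period by about 4.5 ns, and anything that lengthens the bus term (more capacitance, a weaker driver) consumes the same slack. That is why frequency, voltage, and temperature all appear in the margin discussion.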

Tracker NCR History
NCR 107
– Description
  These MCMs had some event-data (from charge injection) readback errors only when reading out to the right-hand side and only in the +60C test (they were fine at -30C and 25C). We found that the errors from SN-366 went away if the VDD voltage was raised to 2.65 V, but the errors from SN-592 did not. We tried some other relevant temperatures as follows:
    Tracker Acceptance Test Limit, 35C: both function perfectly.
    Tracker Qualification Limit, 50C: SN-366 functions perfectly but SN-592 has errors.
– Disposition
  Use-As-Is: all units that pass retest with 75 Ohm test cables, in addition to the duty cycle testing, are acceptable for flight use. NCR closure approved by Bill Jimenez, LAT Quality Engineering, and Robert Johnson, LAT Tracker Subsystem Manager (e-mails on file).
– Root Cause
  MCM timing issue. Failures always occur in one of two internal memory buffers in the GTRC chips while attempting to access them. Some chips are faster than others, and the modules that failed each have one GTRC chip on the tail of the distribution. In general, high temperature, high clock frequency, and a high clock duty factor (greater than 50%) all contribute to failures in modules with marginal timing. Investigation determined that the use of 100 Ohm termination resistors on the test cables further eroded the timing margins.
– Corrective Action
  An MRB conclusion was that replacing the 100 Ohm termination resistors with 75 Ohm resistors would resolve the problem.
    (1) Existing flight and test cables are all being reworked to incorporate 75 Ohm resistors.
    (2) Drawings will be updated, and all new cables will incorporate 75 Ohm resistors.
    (3) Duty cycle testing will be performed on all MCMs prior to shipment to INFN. The burn-in procedure has been modified by Dr. Johnson to include duty cycle testing and is under review, while implementation of the duty cycle testing in the burn-in script is in process by Marcus Ziegler.
    (4) The ICD is being revised by Richard Bright to specify that the tracker function at 20 MHz over the duty cycle range of 45% to 55%.
    (5) Implementation of a test of this part of the ICD in the Tracker test plan is in process by Hiro Tajima.
– Status
  Closed
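
The duty-cycle requirement translates directly into minimum high and low phase widths of the clock. The sketch below computes the narrowest phase over a set of frequency and duty-cycle corners. Which clock edge the GTRC/GTFE timing actually depends on is not stated here, so both phases are considered; the function names are hypothetical.

```python
from typing import Iterable, Tuple

def pulse_widths_ns(clock_mhz: float, duty: float) -> Tuple[float, float]:
    """(high, low) phase widths in ns for a clock with the given high-time fraction."""
    if not 0.0 < duty < 1.0:
        raise ValueError("duty must be a fraction strictly between 0 and 1")
    period_ns = 1e3 / clock_mhz
    return duty * period_ns, (1.0 - duty) * period_ns

def narrowest_phase_ns(freqs_mhz: Iterable[float], duties: Iterable[float]) -> float:
    """Narrowest high or low phase over every (frequency, duty) corner."""
    duty_list = list(duties)  # materialize so it can be reused for each frequency
    return min(min(pulse_widths_ns(f, d)) for f in freqs_mhz for d in duty_list)

# ICD requirement quoted above: 20 MHz, 45% to 55% duty cycle.
print(f"ICD corners:    {narrowest_phase_ns([20.0], [0.45, 0.55]):.2f} ns")
# Range over which the tower reportedly worked with EM2 EGSE: 18-22 MHz, 42-58%.
print(f"Tested corners: {narrowest_phase_ns([18.0, 22.0], [0.42, 0.58]):.2f} ns")
```

At the ICD corners the narrowest phase is 22.5 ns. Over the 18-22 MHz, 42-58% range quoted on the New Information slide, it drops to about 19.1 ns, which gives a rough sense of how much wider the tower's demonstrated window is than the ICD requirement.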

Tracker NCR History
Attachment to NCR 107 (page 1 of 3)

Tracker NCR History
Attachment to NCR 107 (page 2 of 3)

Tracker NCR History
Attachment to NCR 107 (page 3 of 3)

Tracker SN 3 NCR History