ALMA Integrated Computing Team Coordination & Planning Meeting #3 Socorro, 17-19 June 2014 Control Group Planning Rafael Hiriart.



ICT-CPM June 2014

Control Group Resources
- Ralph Marson (general control development, observing modes, mount software)
- Rachel Rosen (data capturer, observing modes)
- Patrick Brandt (total power processor, HW devices)
- Jorge Avarias (secondary task after Scheduling, data capturer)
- Rodrigo Amestica (CDP correlator software)
- Jesus Perez (CCC correlator software)
- Matias Mora (ALMA Phasing Project)
- Rafael Hiriart (group management, tools)

Outline
- Status since ICPM2 (January 2014, Δt ~ 4.5 months): 91 tickets implemented (27 new features, 17 improvements, and 47 bug fixes).
- Ticket status review and prioritization are done weekly in our CSCG meetings, with active participation from SoftOps, Operations and EOC. Engineering participation has been minimal. Do we need more?
- Bugs. Are we doing better? What can be done to reduce the support load? Can we improve testing?
- Features & Improvements. Summary of what has been done, and proposed plan for the next 6 months.
- Some special topics.

Bugs submitted since ICPM2
- 102 bugs submitted in 4.5 months (~5.6 bugs/week).
- SoftOps, on the other hand, currently receives ~9 software-related bugs per day.
- The load peaks after a new release and stabilizes after some months.

Bugs submitted since 2013/01/01
[Chart annotations: ~150 bugs / 4 months; <100 bugs / 4 months; "The strike"]
We seem to be doing better, although we should wait for the next period to see whether there is really a trend here. It could also be that we were just "catching up" after the strike.

Bugs fixed since ICPM2
47 bugs fixed in 4.5 months = ~2.6 bugs fixed/week.
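The weekly rates quoted for submitted and fixed bugs follow from dividing the ticket counts by the elapsed time. A minimal sketch of that arithmetic, assuming the 4.5-month period is counted as 18 weeks (4 weeks per month); this convention is an assumption, chosen because it reproduces the quoted figures:

```python
# Assumption: one month is counted as 4 weeks, so 4.5 months = 18 weeks.
WEEKS_PER_MONTH = 4
PERIOD_MONTHS = 4.5


def per_week(count, months=PERIOD_MONTHS):
    """Average number of tickets per week over the period."""
    return count / (months * WEEKS_PER_MONTH)


submitted_rate = per_week(102)  # bugs submitted since ICPM2
fixed_rate = per_week(47)       # bugs fixed since ICPM2

print(f"submitted: {submitted_rate:.2f} bugs/week")
print(f"fixed:     {fixed_rate:.2f} bugs/week")
print(f"input/output ratio: {submitted_rate / fixed_rate:.2f}")
```

Under this assumption the input rate is a little more than twice the bug-fix rate, which is the gap discussed in "About bugs".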

About bugs
The bug input rate is still rather high. It is close to our "all ticket (bugs + features) output" rate. On the other hand, not all bugs submitted end up being new problems. Some are rejected as HW or configuration problems, or as duplicates. Some are also one-off, hard-to-reproduce problems.
What can be done?
- Decrease the number of bugs being introduced through more rigorous testing/coding.
- Facilitate the task of (correctly) diagnosing problems during operations.
- Add robustness to the software (suppressing faults before they become failures).
- Empower SoftOps to diagnose problems locally.
- Other ideas?

What can be done about bugs (1)
Testing/coding
- Integration testing in CONTROL is fairly complete. We execute several observations with Control and Correlator each night (CONTROL/IntTest). We plan to continue improving this simulation. We need to improve/complete our unit tests.
- Recently Alexis has normalized our test STEs and implemented an automatic test framework based on Jenkins. The plan is to deploy this in Chile as well, and complete the tests.
- We want to start doing code reviews, but have not done much in this direction during this period, due to pressure to implement features and fix bugs.
Facilitate diagnosing problems
- We need to improve the way errors are reported in our user interfaces. We are relying too much on the logs. The yellow triangle should come with an indicative error message, for example.
- User interfaces to show the status of the system should be completed/improved.
- ACD permanent error should be investigated and discarded if possible.
- The alarm configuration & pending features should be completed.
- Tiger team?

What can be done about bugs (2)
Add robustness to the software. Implement ICT-719:
- Make the observation resilient to antenna container crashes.
- Make the observation resilient in case of cartridges in stop state.
- LS and WCA lock failures should propagate exceptions to obs. scripts.
- Make the fraction of antennas allowed to fail configurable.
- Make the observation resilient in case of cartridges not powered up.
- Make the Control software able to tolerate the physical disconnection of antennas.
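The item about making the fraction of antennas allowed to fail configurable can be illustrated with a small check. This is only a sketch under stated assumptions: `FailurePolicy`, `max_failed_fraction` and `can_continue` are hypothetical names invented for illustration, not part of the Control software or of ICT-719.

```python
from dataclasses import dataclass


@dataclass
class FailurePolicy:
    # Hypothetical configuration value: largest fraction of the array's
    # antennas that may fail before the observation is aborted.
    max_failed_fraction: float = 0.2

    def can_continue(self, total_antennas: int, failed_antennas: int) -> bool:
        """Return True if the observation may continue with the remaining antennas."""
        if total_antennas <= 0:
            return False
        if not 0 <= failed_antennas <= total_antennas:
            raise ValueError("failed_antennas must be between 0 and total_antennas")
        return failed_antennas / total_antennas <= self.max_failed_fraction


policy = FailurePolicy(max_failed_fraction=0.25)
print(policy.can_continue(total_antennas=40, failed_antennas=10))
print(policy.can_continue(total_antennas=40, failed_antennas=11))
```

Keeping the threshold in one configurable value would let operations tune the tolerance per array without code changes; how the real system would store it is not stated in the slides.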

What can be done about bugs (3)
Empowering SoftOps.
- We could improve the documentation of key areas.
- Recently we have been discussing where the line should be drawn regarding how deeply tickets should be investigated by SoftOps. I believe we have agreed on the following: SoftOps works at the integration level. It should investigate a problem until it becomes clear which group (ACS, Control, TelCal, etc.) should follow up on the issue. After this, SoftOps provides local support for the investigation. This is non-expert support. SoftOps should be "generalists", i.e., not experts on specific subsystems. There are not enough resources for SoftOps to become experts on all the system areas.
- Can we help in other ways?

Features
27 new features and 17 improvements, including:
- Fast scanning
- Sub-arrays
- New QuickLook GUI
- APP phasing loop & H/W control
- TPP improvements
- Performance improvements
- ACA-specific delays in the TMCDB
Two main sources of requirements for long-term planning:
- The important features list from Stuartt/Denis/Neil (January).
- The Observing Modes Meeting, which prioritized EOC activities for Cycle 3.

Important Features from January

Control:
Priority | Description | Status
High | Optimize instantiation/deactivation of observing mode | Done
High | Artificial beacon | De-scoped?
High | Focus/delay updated with temperature values | Focus done
High | Fast scanning | Done
High | Safe parking of offline WCAs for spurious signals | Done
Medium | Make optimization targets baseband dependent | Not done
Medium | Single dish sideband separation | Need EOC work first
Medium | Sharing arrays/subarrays (ACA then BLC in priority order) | Not done
Medium | Second generation scan sequences | Not done
Medium | Fast frequency switching | Not done; we can do the LS pre-tuning workaround (~2 weeks)
Low | Frequency switching | Binning? We need requirements.
Low | Decouple CASA | Done

Important Features (2)

Priority | Description | Status
Low | Forward look for functions about elevation | Not done (1 week)
Low | Replace most target utility separation/direction calls with Control functions | Not done (2-4 weeks)
Very low | Actual velocity instead of distance change for Ephemeris | Not done (2 weeks)
Very low | Nutator (assuming fast scanning works) | Not done. Status?
Very low | Priority for focus subarrays or subscan sequences | What was this?

Correlator:
Priority | Description | Status
High | Subarrays | Done! (hopefully)
Medium | 90 degrees WF switching | After subarrays
Medium | Flagging/pegging the high edge channels | Not done
Low | 3x3, 4x4 and double Nyquist modes (exotic correlator modes) | Not done
Low | Multi-resolution modes | Not done

Important Features (3)

TMCDB:
Priority | Description | Status
- | WVR bandwidth, center frequency and coupling efficiency | Not done
- | Uncertainties in model parameters | Not done

DC/ASDM:
Priority | Description | Status
- | DiffGainCal intent | Not done
- | WVR specific information | Not done

Observing Modes Meeting (1)

Observing Modes Meeting (2)
- Long baselines. If problems show up during testing campaigns, they will probably be high priority.
- Implement QA0 flags, required for the pipeline.
- Document the executed SchedBlock in the ASDM.
- No actions on Solar Observing for the moment.

Proposed Plan for Next 6 Months
The plan assumes 50% of the time goes to support, so new features are planned against 3 months of development time.
Ralph
- Error handling & reporting.
- Observing Modes support.
- Shadowing flagging.
- LS pre-tuning (pre-tune expires after 10 minutes).
- No Nutator, no artificial source, no scan seq. 2, no dynamic sub-arrays.
Rachel
- DataCapturer support.
- WVR parameters in the ASDM.
- SchedBlock into the ASDM.
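The LS pre-tuning item in Ralph's list mentions that a pre-tune expires after 10 minutes. A minimal sketch of such a time-limited validity check follows; the `PreTune` class and its fields are hypothetical names used only for illustration, and only the 10-minute lifetime comes from the slide.

```python
from dataclasses import dataclass

# From the slide: a pre-tune expires after 10 minutes.
PRETUNE_LIFETIME_S = 10 * 60


@dataclass(frozen=True)
class PreTune:
    frequency_hz: float  # hypothetical field: frequency that was pre-tuned
    tuned_at_s: float    # time of the pre-tune, in seconds (e.g. time.monotonic())

    def is_valid(self, now_s: float) -> bool:
        """Return True while the pre-tune is younger than its lifetime."""
        age_s = now_s - self.tuned_at_s
        return 0 <= age_s < PRETUNE_LIFETIME_S


pretune = PreTune(frequency_hz=100e9, tuned_at_s=0.0)
print(pretune.is_valid(now_s=599.0))
print(pretune.is_valid(now_s=600.0))
```

A caller would re-tune whenever `is_valid` returns False; whether the real software re-tunes automatically or raises an error is not stated in the slides.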

Proposed Plan (2)
Patrick
- Operational confidence (OMC reloaded).
- Alarms.
- TotalPowerProcessor, FrontEnd and HW devices support.
- No porting to 64 bits; it will be done during the first semester of 2015.
Jorge
- Scheduling
Matias
- APP

Proposed Plan (3)
Rafael
- Management.
- QuickLook improvements, including QA0 flags into ASDM (1m).
- TMCDB explorer split, so ACS can take over the SW deployment side (2m).
- No other tool improvements.

Proposed Plan (4)
Jesus and Rodrigo
- Sub-arrays support.
- 90 degrees WF (1m).
- Flagging/pegging high edge channels (ICT-2284) (2w).
- 3-bit quantization correction (1m).
- No exotic correlator modes (1m).
- No multi-resolution (2m).
- No LO offsetting sideband separation (2m).
- No porting to 64 bits (deferred to first semester 2015) (3m).

Special Topics
- SSR Feature Development. New development is done in a separate branch and integrated during phase C testing. It does not go through phase A/B testing.
- Tools & Control's scope. Discussed in a separate presentation.
- 64-bit porting. To be completed during the first semester of 2015.
- Moving casac to ICD. Are we done? Deployment issues?
- ACA requests coming? For example, sending SQLD data to the ACA. No official request has come; this is not in our plan for the next 6 months.
- Correlator data rate and 32/16 bit scaling. Is this issue well understood? Any action for us?
- Acceptance plan for Cycle 3. What release will be used?
- ICT-1805, "what NRAO telescopes are observing right now". Path forward?