ALMA Integrated Computing Team
Coordination & Planning Meeting #3
Socorro, June 2014

Control Group Planning
Rafael Hiriart
ICT-CPM June 2014

Control Group Resources
- Ralph Marson (general control development, observing modes, mount software)
- Rachel Rosen (data capturer, observing modes)
- Patrick Brandt (total power processor, HW devices)
- Jorge Avarias (secondary task after Scheduling, data capturer)
- Rodrigo Amestica (CDP correlator software)
- Jesus Perez (CCC correlator software)
- Matias Mora (ALMA Phasing Project)
- Rafael Hiriart (group management, tools)

Outline
- Status since the last meeting, ICPM2 (January 2014, Δt ~ 4.5 months): 91 tickets implemented (27 new features, 17 improvements, and 47 bugs).
  - Status review and prioritization are done weekly in our CSCG meetings, with active participation from SoftOps, Operations and EOC.
  - Engineering participation has been minimal. Do we need more?
- Bugs. Are we doing better? What can be done to reduce the support load? Can we improve testing?
- Features & Improvements. Summary of what has been done, and proposed plan for the next 6 months.
- Some special topics.

Bugs submitted since ICPM2
- 102 bugs submitted in 4.5 months, ~5.6 bugs/week.
- SoftOps, on the other hand, currently receives ~9 software-related bugs per day.
- The load peaks after a new release and stabilizes after some months.

Bugs submitted since 2013/01/01
[Chart annotations: ~150 bugs / 4 months; <100 bugs / 4 months; the strike]
- We seem to be doing better, although we should wait for the next period to see whether there is really a trend here.
- It could also be that we were just "catching up" after the strike.

Bugs fixed since ICPM2
- 47 bugs fixed in 4.5 months = ~2.6 bugs fixed/week.
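
The weekly rates quoted on the bug slides follow from simple division. The sketch below reproduces them from the counts given in the slides (102 bugs submitted, 47 bugs fixed, 91 tickets implemented over ~4.5 months). The conversion of 4.5 months to 18 weeks is an assumption made here for illustration; the slides do not state the exact number of weeks, so the results only approximate the quoted figures.

```python
# Counts taken from the slides; WEEKS is an assumption (4.5 months taken as 18 weeks).
MONTHS = 4.5
WEEKS_PER_MONTH = 4  # assumed conversion, not stated in the slides
WEEKS = MONTHS * WEEKS_PER_MONTH

bugs_submitted = 102
bugs_fixed = 47
tickets_implemented = 91  # 27 new features + 17 improvements + 47 bugs


def per_week(count, weeks=WEEKS):
    """Average number of items per week over the period."""
    return count / weeks


submitted_rate = per_week(bugs_submitted)
fixed_rate = per_week(bugs_fixed)
ticket_rate = per_week(tickets_implemented)

print(f"bugs submitted per week:      {submitted_rate:.1f}")
print(f"bugs fixed per week:          {fixed_rate:.1f}")
print(f"tickets implemented per week: {ticket_rate:.1f}")
print(f"fraction of submitted bugs fixed: {bugs_fixed / bugs_submitted:.0%}")
```

Comparing the first and third figures shows how close the bug input rate is to the overall ticket output rate.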

About bugs
- The bug input rate is still rather high. It is close to our "all ticket (bugs + features) output" rate.
- On the other hand, not all submitted bugs end up being new problems. Some are rejected as HW or configuration problems, or as duplicates. Some are also one-off, hard-to-reproduce problems.

What can be done?
- Decrease the number of bugs being introduced, through more rigorous testing/coding.
- Facilitate the task of (correctly) diagnosing problems during operations.
- Add robustness to the software (suppressing faults before they become failures).
- Empower SoftOps to diagnose problems locally.
- Other ideas?

What can be done about bugs (1)

Testing/coding
- Integration testing in CONTROL is fairly complete. We execute several observations with Control and Correlator each night (CONTROL/IntTest). We plan to continue improving this simulation.
- We need to improve/complete our unit tests.
- Recently Alexis has normalized our test STEs and implemented an automatic test framework based on Jenkins. The plan is to deploy this in Chile as well, and to complete the tests.
- We want to start doing code reviews, but have not done much in this direction during this period because of the pressure to implement features and fix bugs.

Facilitate diagnosing problems
- We need to improve the way errors are reported in our user interfaces. We are relying too much on the logs. The yellow triangle should come with an indicative error message, for example.
- User interfaces that show the status of the system should be completed/improved.
- The ACD permanent error should be investigated and discarded if possible.
- The alarm configuration and pending features should be completed.
- Tiger team?

What can be done about bugs (2)

Add robustness to the software.
- Implement ICT-719: make the observation resilient to antenna container crashes.
- Make the observation resilient to cartridges in stop state.
- LS and WCA lock failures should propagate exceptions to the observing scripts.
- Make the fraction of antennas allowed to fail configurable.
- Make the observation resilient to cartridges that are not powered up.
- Make the Control software able to tolerate the physical disconnection of antennas.
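
As an illustration of the "fraction of antennas allowed to fail" item above, here is a minimal sketch of what a configurable threshold could look like. It is not the Control software's actual design: the names `FailurePolicy`, `max_failed_fraction` and `can_continue` are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailurePolicy:
    """Hypothetical configuration item: the largest fraction of antennas
    in an array that may fail before the observation is aborted."""
    max_failed_fraction: float

    def __post_init__(self):
        if not 0.0 <= self.max_failed_fraction <= 1.0:
            raise ValueError("max_failed_fraction must be between 0 and 1")


def can_continue(total_antennas: int, failed_antennas: int,
                 policy: FailurePolicy) -> bool:
    """Return True if the observation may proceed with the given failures."""
    if total_antennas <= 0:
        raise ValueError("total_antennas must be positive")
    if not 0 <= failed_antennas <= total_antennas:
        raise ValueError("failed_antennas must be between 0 and total_antennas")
    return failed_antennas / total_antennas <= policy.max_failed_fraction


# Example: tolerate up to a quarter of the array failing.
policy = FailurePolicy(max_failed_fraction=0.25)
print(can_continue(total_antennas=10, failed_antennas=2, policy=policy))
print(can_continue(total_antennas=10, failed_antennas=3, policy=policy))
```

Keeping the threshold in a single validated configuration object means operations could tune it per array without code changes, which is the intent of the item as stated.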

What can be done about bugs (3)

Empowering SoftOps.
- We could improve the documentation of key areas.
- Recently we have been discussing where the line should be drawn regarding how deeply tickets should be investigated by SoftOps. I believe we have agreed on:
  - SoftOps works at the integration level. It should investigate a problem until it becomes clear which group (ACS, Control, TelCal, etc.) should follow up the issue.
  - After this, SoftOps provides local support on the investigation. This is non-expert support.
  - SoftOps should be "generalists", i.e., not experts on specific subsystems. There are not enough resources for SoftOps to become experts on all the system areas.
- Can we help in other ways?

Features

27 new features, 17 improvements, including:
- Fast scanning
- Sub-arrays
- New QuickLook GUI
- APP phasing loop & H/W control
- TPP improvements
- Performance improvements
- ACA-specific delays in the TMCDB

Two main sources of requirements for long-term planning:
- Stuartt/Denis/Neil important features list from January.
- Observing Modes Meeting, which prioritized EOC activities for Cycle 3.

Important Features from January

Control:

Priority  Description                                                  Status
High      Optimize instantiation/deactivation of observing mode        Done
          Artificial beacon                                            De-scoped?
          Focus/delay updated with temperature values                  Focus done
          Fast scanning                                                Done
          Safe parking of offline WCAs for spurious signals            Done
Medium    Make optimization targets baseband dependent                 Not done
          Single dish sideband separation                              Need EOC work first
          Sharing arrays/subarrays (ACA then BLC in priority order)    Not done
          Second generation scan sequences                             Not done
          Fast frequency switching                                     Not done; we can do the LS pre-tuning workaround (~2 weeks)
Low       Frequency switching                                          Binning? We need requirements.
          Decouple CASA                                                Done

Important Features (2)

Control (continued):

Priority  Description                                                  Status
Low       Forward look for functions about elevation                   Not done (1 week)
          Replace most target utility separation/direction calls       Not done (2-4 weeks)
          with Control functions
Very low  Actual velocity instead of distance change for Ephemeris     Not done (2 weeks)
          Nutator (assuming fast scanning works)                       Not done. Status?
          Priority for focus subarrays or subscan sequences            What was this?

Correlator:

Priority  Description                                                  Status
High      Subarrays                                                    Done! (hopefully)
Medium    90 degrees WF switching                                      After subarrays
          Flagging/pegging the high edge channels                      Not done
Low       3x3, 4x4 and double Nyquist modes (exotic correlator modes)  Not done
          Multi-resolution modes                                       Not done

Important Features (3)

TMCDB:

Priority  Description                                                  Status
          WVR bandwidth, center frequency and coupling efficiency      Not done
          Uncertainties in model parameters                            Not done

DC/ASDM:

Priority  Description                                                  Status
          DiffGainCal intent                                           Not done
          WVR specific information                                     Not done

Observing Modes Meeting (1)

Observing Modes Meeting (2)
- Long baselines. If problems show up during testing campaigns, they will probably be high priority.
- Implement QA0 flags, required for the pipeline.
- Document the executed SchedBlock into the ASDM.
- No actions about Solar Observing for the moment.

Proposed Plan for Next 6 Months

The plan assumes that 50% of the time goes to support, so new features are planned against 3 months of development time.

Ralph
- Error handling & reporting.
- Observing Modes support.
- Shadowing flagging.
- LS pre-tuning (pre-tune expires after 10 minutes).
- No Nutator, no artificial source, no second-generation scan sequences, no dynamic sub-arrays.

Rachel
- DataCapturer support.
- WVR parameters in the ASDM.
- SchedBlock into the ASDM.

Proposed Plan (2)

Patrick
- Operational confidence (OMC reloaded).
- Alarms.
- TotalPowerProcessor, FrontEnd and HW devices support.
- No porting to 64 bits; it will be done during the first semester of 2015.

Jorge
- Scheduling

Matias
- APP

Proposed Plan (3)

Rafael
- Management.
- QuickLook improvements, including QA0 flags into ASDM (1m).
- TMCDB explorer split, so ACS can take over the SW deployment side (2m).
- No other tool improvements.

Proposed Plan (4)

Jesus and Rodrigo
- Sub-arrays support.
- 90 degrees WF (1m).
- Flagging/pegging high edge channels (ICT-2284) (2w).
- 3-bit quantization correction (1m).
- No exotic correlator modes (1m).
- No multi-resolution (2m).
- No LO offsetting sideband separation (2m).
- No porting to 64 bits (deferred to the first semester of 2015) (3m).
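
The effort estimates on this slide can be checked against the planning assumption stated earlier (6 months at 50% support, so 3 months of development time). The sketch below is only a rough tally under stated assumptions: 1 month is taken as 4 weeks, "Sub-arrays support" is left out because the slide gives no estimate for it, and how the work is split between the two developers is not stated, so the available time is shown per developer.

```python
WEEKS_PER_MONTH = 4  # assumed conversion, not stated in the slides


def effort_in_weeks(estimate: str) -> float:
    """Convert a slide estimate such as '1m' or '2w' into weeks."""
    value, unit = float(estimate[:-1]), estimate[-1]
    if unit == "m":
        return value * WEEKS_PER_MONTH
    if unit == "w":
        return value
    raise ValueError(f"unknown unit in estimate: {estimate!r}")


# Items in the plan that carry an estimate on the slide.
planned = {
    "90 degrees WF": "1m",
    "Flagging/pegging high edge channels (ICT-2284)": "2w",
    "3-bit quantization correction": "1m",
}

# Items explicitly left out of the plan ("No ...").
deferred = {
    "Exotic correlator modes": "1m",
    "Multi-resolution": "2m",
    "LO offsetting sideband separation": "2m",
    "Porting to 64 bits": "3m",
}

# 6 months at 50% support leaves 3 months of development per developer.
available_weeks = 6 * WEEKS_PER_MONTH * 0.5
planned_weeks = sum(effort_in_weeks(e) for e in planned.values())
deferred_weeks = sum(effort_in_weeks(e) for e in deferred.values())

print(f"available per developer: {available_weeks:.0f} weeks")
print(f"planned (estimated items only): {planned_weeks:.0f} weeks")
print(f"deferred: {deferred_weeks:.0f} weeks")
```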

Special Topics
- SSR Feature Development. New development is done in a separate branch and integrated during phase C testing. It does not go through phase A/B testing.
- Tools & Control's scope. Discussed in a separate presentation.
- 64-bit porting. To be completed during the first semester of 2015.
- Moving casac to ICD. Are we done? Deployment issues?
- ACA requests coming? For example, sending SQLD data to the ACA. No official request has come; this is not in our plan for the next 6 months.
- Correlator data rate and 32/16 bit scaling. Is this issue well understood? Any action for us?
- Acceptance plan for Cycle 3. What release will be used?
- ICT-1805, "what NRAO telescopes are observing right now". Path forward?