ICALEPCS’2005 - Geneva The ALMA Computing Project Update and Management Approach Brian Glendenning (1) Gianni Raffi (2)

(1) National Radio Astronomy Observatory (NRAO), Socorro, NM, USA
(2) European Southern Observatory (ESO), Munich, Germany

ALMA partner organizations

ALMA Project in Summary
64 x 12 m antennas, 30-950 GHz
  – => Reality check: 50 antennas proposed for the time being
Array configurations: 150 m to 14 km
Near San Pedro de Atacama, Chile, at 5000 m
EU and North America as equal partners
  – Japan will add the Compact Array: 12 x 7 m + 4 x 12 m antennas, plus an extra correlator and receivers
2 prototype antennas (in Socorro, NM)
Construction phase 2003-2011
Early Science foreseen for 2009

ALMA Antenna Configurations

ALMA Computing requirements
Control of antennas and receivers
Correlator control / data acquisition (input: 96 Gb/s per antenna, output to archive up to 64 MB/s)
On-line Pipeline (quicklook, flagging, images), Off-line Data Reduction, Telescope Calibration
Archiving (data rate >10 MB/s, 300 TB/year)
Observing Preparation, Scheduling
  – Support for novices: science intent is turned into Scheduling Blocks
  – Dynamic scheduling to take advantage of weather
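As a rough cross-check of the archiving figures above, the following sketch converts a sustained data rate into a yearly volume and back. It assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and a 365-day year of continuous operation; these conventions and the function names are assumptions made for illustration, not taken from the slides.

```python
# Assumption: 365-day year with no downtime.
SECONDS_PER_YEAR = 365 * 24 * 3600


def tb_per_year(rate_mb_per_s: float) -> float:
    """Yearly volume in TB for a sustained rate given in MB/s (decimal units)."""
    return rate_mb_per_s * 1e6 * SECONDS_PER_YEAR / 1e12


def mb_per_s(volume_tb_per_year: float) -> float:
    """Average rate in MB/s needed to accumulate the given TB in one year."""
    return volume_tb_per_year * 1e12 / SECONDS_PER_YEAR / 1e6


if __name__ == "__main__":
    print(f"10 MB/s sustained  -> {tb_per_year(10):.1f} TB/year")
    print(f"300 TB/year        -> {mb_per_s(300):.2f} MB/s on average")
```

Under these assumptions, 10 MB/s sustained corresponds to roughly 315 TB/year, and 300 TB/year to roughly 9.5 MB/s on average, so the two quoted figures are mutually consistent to within rounding.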

Software Scope
From the cradle…
  – Proposal Preparation
  – Proposal Review
  – Program Preparation
  – Dynamic Scheduling of Programs
  – Observation
  – Calibration & Imaging
  – Data Delivery & Archiving
Afterlife:
  – Archival Research & VO Compliance

Trilateral Computing IPT Organisation
Total bilateral staff now: 40 FTEs
Total trilateral staff now: 65 FTEs

ALMA Computing
Large but extremely distributed team
  – 40 Full Time Equivalents (FTEs) for the whole end-to-end (E2E) software
  – Total development effort to 2011: ~280 FTE-years
The fundamental output of the CIPT will be a ~2M SLOC “end to end” software system running on over 200 computers on 4 continents.
  – (The 2M figure does not include comments, tests, documentation, or adopted/modified products like AIPS++, NGAS, ATM, etc.)
Staff in 14 institutions in Europe, North America and Japan
  – Japanese Computing fully integrated. It includes:
      – Staff in Japan working on ACA: ~30 FTE-years
      – Staff and cash for developments in Europe and the US: ~60 FTE-years

Software Architecture

AOS Network
[Network diagram. Labels: 1 Gb fibers from antenna pads; 10 Gb fibers to OSF; Correlator Room; Computer Room; Office Area; Patch Panel Room; Patch Panel; structured copper cabling; fiber; copper; 10 Gb; x 64; x 250; 16 CDP Beowulf nodes; CDP Master; CCC Computer; ARTM, GPS.. (diskless computers); Terminal PCs (diskless + RFI quiet); IP telephony; SRST router]

ALMA software development process
Software to be developed in two main phases: Array software by 2008, Observatory software by 2011
Incremental synchronized development via 6-monthly releases at FIXED dates
  – Allows adjusting priorities to status
  – We consider a fixed-date development pacing to be crucial in our distributed environment
Monthly integration tags (end of month) and inter-subsystem interface freezes (middle of month)
Releases every 6 months (alternating major/minor)
  – We believe development of an integrated system requires integrations from the beginning to avoid the well-known “integration hell” problem
Non-regression tests + user tests (test cases); goal: 20% of effort
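The fixed-date pacing described above can be illustrated with a small calendar sketch. It is only an illustration under stated assumptions: the slides say "middle of month" and "end of month" without giving days, so the 15th and the last calendar day are assumed here, and the starting month and the choice of which release is "major" first are hypothetical parameters.

```python
import calendar
from datetime import date


def month_milestones(year: int, month: int) -> dict:
    """Monthly pacing: interface freeze mid-month, integration tag at month end.

    Assumption: "middle of month" is taken as the 15th.
    """
    last_day = calendar.monthrange(year, month)[1]
    return {
        "interface_freeze": date(year, month, 15),
        "integration_tag": date(year, month, last_day),
    }


def release_schedule(first_year: int, first_month: int, count: int,
                     first_kind: str = "major") -> list:
    """Return (year, month, kind) for releases every 6 months, alternating kinds."""
    kinds = ["major", "minor"] if first_kind == "major" else ["minor", "major"]
    schedule = []
    for i in range(count):
        months_since_year_start = (first_month - 1) + 6 * i
        year = first_year + months_since_year_start // 12
        month = months_since_year_start % 12 + 1
        schedule.append((year, month, kinds[i % 2]))
    return schedule


if __name__ == "__main__":
    print(month_milestones(2006, 2))
    for entry in release_schedule(2005, 10, 4):
        print(entry)
```

Because every date is derived from the calendar rather than from the state of any subsystem, the schedule stays the same regardless of development progress; only the content of each release changes.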

ALMA software approach
We have had requirements since the beginning: Science + Operation Requirements => Architecture
  – => We are tracking them (vs. features, tests, delivery time) using Telelogic’s DOORS
Prototypes were done (using ACS, see below)
  – Software for prototype antennas, first correlator
Common infrastructure (software rather than rules):
  – ALMA Common Software (ACS), started very early and now getting more and more stable
  – Software engineering procedures, integration, tests

ACS Concepts
Component-Container
  – Supports Separation of Concerns between technology and specific applications
  – Same idea as .NET, EJB, CCM
  [Diagram labels: Client ...; Container; Component 1; Component 2; Component 3; ACS]
Entity objects
  – Structured data, e.g. Scheduling Blocks, to be passed between components, defined & serialized with XML
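The two ideas on this slide can be sketched in a few lines. This is a generic illustration of the component-container pattern and of an XML-serialized entity object, not the ACS API: every class, method, and XML element name below (`Component`, `Container`, `SchedBlock`, `bandGHz`, and so on) is hypothetical, and the fields of the scheduling block are invented for the example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


class Component:
    """Application logic only; it does not know how it is deployed or started."""

    def __init__(self, name: str):
        self.name = name
        self.started = False

    def initialize(self) -> None:
        self.started = True

    def cleanup(self) -> None:
        self.started = False


class Container:
    """Technology side: owns component lifecycle and lookup."""

    def __init__(self):
        self._components = {}

    def activate(self, component: Component) -> Component:
        component.initialize()
        self._components[component.name] = component
        return component

    def get(self, name: str) -> Component:
        return self._components[name]

    def shutdown(self) -> None:
        for component in self._components.values():
            component.cleanup()
        self._components.clear()


@dataclass
class SchedBlock:
    """Hypothetical entity object passed between components as XML."""
    uid: str
    target: str
    band_ghz: float

    def to_xml(self) -> str:
        root = ET.Element("SchedBlock", uid=self.uid)
        ET.SubElement(root, "target").text = self.target
        ET.SubElement(root, "bandGHz").text = repr(self.band_ghz)
        return ET.tostring(root, encoding="unicode")

    @classmethod
    def from_xml(cls, text: str) -> "SchedBlock":
        root = ET.fromstring(text)
        return cls(root.get("uid"), root.findtext("target"),
                   float(root.findtext("bandGHz")))


if __name__ == "__main__":
    container = Container()
    container.activate(Component("SCHEDULER"))
    wire = SchedBlock("uid-001", "Orion KL", 230.0).to_xml()
    print(wire)
    print(SchedBlock.from_xml(wire))
    container.shutdown()
```

The point of the split is that a component can be written and tested without reference to how it is started, located, or shut down, while the XML form gives components a shared, language-neutral representation of the data they exchange.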

ALMA Computing Project Management & Oversight
Oversight
  – Yearly reviews
  – Assignment of “subsystem scientists”
  – Subsystem contact meetings
Planning, Control
  – Plan the coming year in some detail (high-level requirements decomposed into granular features); place the remaining features in a backlog, to be drawn from in priority order
  – Verify (trace) feature completion via end-user tests
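The planning scheme above (detailed plan for the coming year, remainder in a prioritized backlog) can be sketched as follows. This is a minimal illustration, not the project's actual planning tool: the `Feature` fields, the effort unit, the capacity figure, and the rule that planning stops at the first feature that no longer fits are all assumptions.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Feature:
    priority: int                          # assumption: lower number = more urgent
    name: str = field(compare=False)
    effort: float = field(compare=False)   # hypothetical unit, e.g. FTE-months


def plan_year(features, capacity):
    """Split features into (planned for the coming year, backlog).

    Features are taken strictly in priority order; the first one that does not
    fit into the remaining capacity, and everything after it, goes to the backlog.
    """
    ordered = sorted(features)
    planned, remaining = [], capacity
    for i, feature in enumerate(ordered):
        if feature.effort > remaining:
            return planned, ordered[i:]
        planned.append(feature)
        remaining -= feature.effort
    return planned, []


def draw_next(backlog):
    """Draw the highest-priority feature; a sorted list is already a valid heap."""
    return heapq.heappop(backlog)


if __name__ == "__main__":
    features = [Feature(2, "dynamic scheduler", 6),
                Feature(1, "antenna control", 8),
                Feature(3, "quicklook display", 4)]
    planned, backlog = plan_year(features, capacity=12)
    print([f.name for f in planned])
    print(draw_next(backlog).name)
```

A greedy variant that skips an oversized feature and keeps filling with lower-priority ones would use capacity more fully, but it would no longer draw strictly in priority order, which is the property the slide emphasizes.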

Planning: R3 Master Test Plan

Computing Group Communications and Reporting
  – Yearly Incremental Design Reviews; Review Plans revised every 6 months
  – TWiki is used and is useful for orderly discussions
  – Contact meetings with subsystems and among subsystem leads
  – Yearly subsystem leads meetings (design and interface discussions)
  – People meet by working together at each other’s sites
  – Videoconferences are more troublesome than teleconferences

Tests will grade full/partial requirements.
The SSR signs off on a requirement as ‘Adequate’ by grading requirements as shown in the example below.
[Example table. Column headers: Overall Grade; Test Grades]

Status
Passed external PDR (2003) and CDR2 (’04), and internal CDR1 (’04) and CDR3 (’05)
Delivered releases R0-R3 (+ Rx.1 releases)
Prototype control/correlator used with prototype antennas
Every subsystem has a dedicated astronomer, who checks developed features twice per year (release validation).

Status (cont.)
Most subsystems have substantial development with infrastructure in place, external interfaces defined and implemented, and some functionality.
  – Most subsystems have had external user tests
  – Integrated tests with simulated/elementary data have taken place
  – Internal testing of the system at the VLA site in early 2006
Antenna evaluation required significant software, but was done essentially via scripting of control components
ACA (Japanese compact array) and Observatory Support software still in early design

[Chart labels: ~850 kSLOC (Oct. 2005); in-kind contributions (NGAS, AIPS++, ATM) not included; Test Interferometer Control Software prototype]

Lessons learned
Geographical distribution with this size & pace is difficult (*):
  – Computing subsystems are mixed across continents (sometimes this was inevitable)
  – Acceptance of common software (optimized for the system, not for everybody’s taste, and mandatory; in general OK) => Requires team spirit
  – Stability of interfaces among subsystems => No last-minute changes
  – Difficulty of integration. Subsystems tend to give priority to their own development over stability of the system (but we are still in the early phases). => It takes two months to get an integrated system. Continuous integration remains a goal (dream?)
  – When problems arise, finger-pointing at “the others” occurs too quickly
  – Some inefficiency has to be accepted (balanced by more discussion, better design)
We gave some thought to Agile development, but we are at the wrong end of the spectrum (vs. a small local team). At least: light documentation + some form of emergency “pair programming” at integration time.
(*) Not a statement against collaborations (typically among labs with different projects). We believe we are a very good example of a collaborative project (hopefully we will also have successful software to show at the end).

Prototype Antennas at the VLA Site (New Mexico)
[Photo labels: Vertex/RSI; Alcatel/EIE]
Evaluated using prototype control software (with ACS)

First Operator GUI

ALMA Sites in Chile
[Diagram labels: Antenna Operations Site (AOS); Operation Support Facility (OSF); Santiago Central Office (SCO); 60 MB/s (peak); 6 MB/s (average)]

Earthwork for the OSF Technical Facilities

ALMA Operation Support Facility today

ALMA Operation Support Facility (2900 m, Atacama desert)
ALMA operated from here up to 2009

Antenna Operation Site Technical Building Concept

ALMA Santiago Office
Support operation from Santiago with:
  – Final master archive
  – Pipeline monitoring
ALMA Regional Centers in Europe, US, Japan
  – Wide area network connectivity
  – Copies of archive data
  – Support of users in proposal preparation & final data reduction

ALMA Related Papers and Posters at ICALEPCS’2005
Sat.-Sun.: ALMA Common Software (ACS) Workshop
WE1.4-4: Advanced Hardware Technology in ALMA Back End and Correlator, F. Biancat Marchet et al.
WE4A.2-5: A generic software interface simulator for ALMA common software, D. Fugate et al.
WE2.4-6: The ALMA Common Software ACS Status and Developments, G. Chiozzi et al.
WE3A.3-6: The ALMA Telescope Control System, A. Farris et al.
PO1.012-1: Development of the control system for the 40m radiotelescope of the OAN using the Alma Common Software, P. de Vicente et al.
PO1.032-6: Transmitting huge amounts of data: design, implementation and performance of the bulk data transfer mechanism in ALMA ACS, P. Di Marcantonio et al.
PO2.067-4: ALMA Correlator Real-Time Data Processor, J. Pisano et al.
PO1.100-8: Migration from ACS 1.1 to ACS 4 at ANKA, I. Križnar et al.

ALMA Sites: Chajnantor +

