Trends in AO at ESO, or 10+ years of Octopus
Miska Le Louarn and all the Octopus gals and guys along the years: Clémentine Béchet, Jérémy Braud, Richard Clare, Rodolphe Conan, Visa Korkiakoski, Christophe Vérinaud

Different roles for end-to-end AO sims
- Rough concept validation: "a PYR is better than an SH! Let's make it."
- TLR / performance definition: what performance can you get with our chosen PYR / DM?
- Provide PSFs to astronomers for science simulations / ETCs
- System design / component and tolerance specification / CDR / PDR: how well do you need to align the PYR with respect to the DM?
- System performance validation: yes, in the lab / on the sky we get what we simulated; if not, why?
- System debugging: why is this not working?
- R&D: FrIM, calibrations, testing of new concepts
- Other: RTC simulation, atmospheric simulations, WFS studies, ...

General principles of Octopus
- The atmosphere is simulated by von Karman-spectrum phase screens (pixel maps of turbulence)
- The phase at the telescope is the sum of the phase screens in one particular direction (geometric propagation)
- A wavefront sensor model measures that phase; this usually includes Fourier transforms of the phase
- From those measurements, commands to the DM(s) are calculated
- The DM shape is calculated (through wavefront reconstruction) and subtracted from the incoming phase
- Commands are time-filtered (simple integrator, or POLC, or ...)
- Phase screens are shifted to reproduce temporal evolution by the wind (frozen-flow hypothesis)
- Go back to the beginning of this slide and iterate for "some time"
- Many options exist for several of the steps above (a minimal sketch of this loop is given below)
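To make the loop above concrete, here is a minimal single-WFS, single-DM skeleton in C. It is only a hedged illustration: the array sizes, function names and trivial stub bodies are hypothetical placeholders, not Octopus code.

```c
/* Minimal sketch of an Octopus-style end-to-end loop: one WFS, one DM,
 * simple integrator. Hypothetical names and sizes, not actual Octopus code. */
#include <stdlib.h>

#define N_PIX    64     /* pupil sampling (pixels across the pupil) */
#define N_LAYERS 3      /* turbulent phase screens                  */
#define N_SLOPES 128    /* 2 x number of sub-apertures              */
#define N_ACT    80     /* DM actuators                             */
#define N_ITER   1000   /* iterate for "some time"                  */
#define GAIN     0.5    /* integrator gain                          */

/* Placeholder physical models: trivial stubs so the sketch compiles.
 * The real code implements von Karman screens, an FFT-based WFS model, etc. */
static void shift_screens_frozen_flow(double *s)                    { (void)s; }
static void sum_screens_along_direction(const double *s, double *p) { (void)s; (void)p; }
static void subtract_dm_shape(const double *c, double *p)           { (void)c; (void)p; }
static void wfs_measure(const double *p, double *sl)                { (void)p; (void)sl; }
static void reconstruct(const double *sl, double *d)                { (void)sl; (void)d; }

int main(void)
{
    double *screens  = calloc(N_LAYERS * N_PIX * N_PIX, sizeof(double));
    double *phase    = calloc(N_PIX * N_PIX, sizeof(double));
    double *slopes   = calloc(N_SLOPES, sizeof(double));
    double *delta    = calloc(N_ACT, sizeof(double));
    double *commands = calloc(N_ACT, sizeof(double));

    for (int it = 0; it < N_ITER; it++) {
        shift_screens_frozen_flow(screens);           /* temporal evolution (wind)      */
        sum_screens_along_direction(screens, phase);  /* geometric propagation          */
        subtract_dm_shape(commands, phase);           /* closed loop: residual phase    */
        wfs_measure(phase, slopes);                   /* WFS model (FFT-based)          */
        reconstruct(slopes, delta);                   /* e.g. MVM: slopes -> DM update  */
        for (int i = 0; i < N_ACT; i++)               /* simple integrator              */
            commands[i] += GAIN * delta[i];
        /* accumulate residual-phase statistics / long-exposure PSF here */
    }

    free(screens); free(phase); free(slopes); free(delta); free(commands);
    return 0;
}
```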

Archeology: why Octopus?
- OWL: the 100 m ancestor of the E-ELT, ~year
- Before Octopus there were a few single-CPU simulations (FRI's aosimul.pro (→ yao), CHAOS, the ESO Matlab tool, ...)
- Limitations: 2 GB of RAM (32-bit systems), single-threaded
- 1st challenge: simulate 100 m SH SCAO, on cheap desktop machines, with 2 GB of RAM per machine, in a "reasonable" time ✓
- 2nd challenge: MAD-like on 100 m, or MCAO (6 LGS, 3 DMs) for the 40 m class ✓
- 3rd challenge: EPICS (i.e. XAO, 200x200 subapertures) for the 42 m ✓
- Open to new concepts: pyramid, layer-oriented, MOAO, POLC, new reconstructors, ...

Octopus: features
- Octopus: software to simulate ELT AO / large AO systems
- Has to be reasonably fast on LARGE systems; not optimized for small systems, although it also works on them
- End-to-end (Monte Carlo); many effects (alignments, actuator geometry, ...) included
- Open to new reconstructors:
  - MVM + simple integrator (this is the original scheme, the rest are add-ons)
  - FrIM + internal model control / POLC
  - FTR + simple integrator
  - Austrian "Cure3D" and others
- Several WFS types:
  - SH (with spatial filter if needed)
  - PYR (incl. modulation)
  - SO/LO
- OL, SCAO, GLAO, LTAO, MCAO, MOAO can be simulated
- LGS-specific aspects:
  - Different orders for the sensors (e.g. a 3x3 NGS sensor)
  - Image sharpening for TT
  - Spot elongation with central / side launch, non-Gaussian profiles, ...
  - Different centroiding algorithms (a plain centre-of-gravity example is sketched below)
  - "Complex" software to handle all those cases
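As a small illustration of the simplest of those centroiding algorithms, here is a generic centre-of-gravity centroider for one Shack-Hartmann sub-aperture image. It is a textbook version, not the Octopus implementation; the function name and the tiny 4x4 test spot are made up.

```c
/* Plain centre-of-gravity centroid of one sub-aperture spot image.
 * Generic textbook algorithm, not the Octopus implementation. */
#include <stdio.h>

/* img: npix x npix spot image, row-major. cx, cy: centroid in pixels,
 * measured from the sub-aperture centre. Returns the total flux. */
static double cog_centroid(const double *img, int npix, double *cx, double *cy)
{
    double sum = 0.0, sx = 0.0, sy = 0.0;
    for (int y = 0; y < npix; y++) {
        for (int x = 0; x < npix; x++) {
            double v = img[y * npix + x];
            sum += v;
            sx  += v * x;
            sy  += v * y;
        }
    }
    if (sum > 0.0) {
        *cx = sx / sum - 0.5 * (npix - 1);   /* offset from the geometric centre */
        *cy = sy / sum - 0.5 * (npix - 1);
    } else {
        *cx = *cy = 0.0;                     /* dead / dark sub-aperture */
    }
    return sum;
}

int main(void)
{
    /* tiny 4x4 test spot, brightest column shifted towards +x */
    double img[16] = { 0,0,0,0,  0,1,2,0,  0,1,2,0,  0,0,0,0 };
    double cx, cy;
    cog_centroid(img, 4, &cx, &cy);
    printf("centroid: (%.3f, %.3f)\n", cx, cy);
    return 0;
}
```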

Hardware / software side
- Hardware to simulate ELT AO: Linux + a cluster of PCs
- AO simulation cluster at ESO: ~60 nodes, up to 128 GB of RAM per node
- Heterogeneous architecture (some machines faster / newer than others)
- Gigabit Ethernet switch (quite old now → a 10G upgrade is being considered)
- Software (open source, maximum portability & versatility): gcc, MPICH2, GSL, FFTW2, ScaLAPACK (all open source); parallel debugger (DDT, not open source)
- The code is very portable. Also tested: Linux / PC clusters at Arcetri and Leiden (LOFAR project), IBM Blue Gene/L (PPC architecture), a single multi-core workstation
- This shows the limits of a single machine: a many-core machine has slower cores than machines with fewer cores
- The cluster approach allows extremely large systems to be tackled without changing the code at all: to simulate bigger systems, just add machines

Parallelization
- Almost everything in Octopus is "somehow" parallelized: atmospheric propagation, the WFS, MVM, matrix operations, matrix creation (= calibration), PSF calculations
- Several levels of parallelization for wavefront sensing: across multiple WFSs, and within a single WFS
- Parallelization is done "explicitly", with coarse-grain parallelization (i.e. big "functional" blocks are parallelized); this introduces a level of complexity not necessarily seen in "conventional" AO simulators
- Parallelization is done with MPI (a minimal sketch of the pattern follows below):
  - allows many machines to be used ("distributed memory"), and memory to be added by adding machines
  - also allows a single machine with multiple cores to be used ("shared memory", with some overhead): not optimal, but portable
- Although not optimized for every case, the code runs and is useful on both kinds of architecture (shared and distributed memory), BUT it is not optimal in the shared-memory case!
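A minimal sketch of the kind of coarse-grain MPI pattern described above: each rank simulates its share of the wavefront sensors, and the full slope vector is then gathered on every rank before reconstruction. The decomposition, sizes and function names are illustrative assumptions, not the actual Octopus code.

```c
/* Coarse-grain MPI sketch: each rank simulates a subset of the WFSs and the
 * full slope vector is gathered on every rank for the reconstruction step.
 * Illustrative pattern only, not the actual Octopus decomposition. */
#include <mpi.h>
#include <stdlib.h>

#define N_WFS          6      /* e.g. 6 LGS WFSs             */
#define SLOPES_PER_WFS 128    /* slopes produced by one WFS  */

/* placeholder WFS model: fills the slope vector of one WFS */
static void simulate_one_wfs(int wfs, double *slopes)
{
    for (int i = 0; i < SLOPES_PER_WFS; i++) slopes[i] = 0.0;
    (void)wfs;
}

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* simple static distribution; assumes the number of ranks divides N_WFS */
    int wfs_per_rank = N_WFS / size;
    double *my_slopes  = calloc((size_t)wfs_per_rank * SLOPES_PER_WFS, sizeof(double));
    double *all_slopes = calloc((size_t)N_WFS * SLOPES_PER_WFS, sizeof(double));

    for (int w = 0; w < wfs_per_rank; w++)
        simulate_one_wfs(rank * wfs_per_rank + w, &my_slopes[w * SLOPES_PER_WFS]);

    /* every rank gets the full slope vector, e.g. for its block of the MVM */
    MPI_Allgather(my_slopes,  wfs_per_rank * SLOPES_PER_WFS, MPI_DOUBLE,
                  all_slopes, wfs_per_rank * SLOPES_PER_WFS, MPI_DOUBLE,
                  MPI_COMM_WORLD);

    /* ... each rank now applies its rows of the reconstruction matrix ... */

    free(my_slopes);
    free(all_slopes);
    MPI_Finalize();
    return 0;
}
```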

Recent upgrades
- Noise-optimal reconstructors for spot elongation ("SCAO", GLAO, LTAO) with central / side launch: Richard for the MVM, Clémentine for FrIM, all the Austrian reconstructors
- Spot elongation with non-Gaussian Na profiles
- New MVM reconstructor with MMSE tomography (ATLAS, MAORY); the ONERA algorithm is being made Octopus-compatible
- Significant acceleration (x5!) of the code with large spot elongation
- Skipping of the PSF calculation: just rms WFE + a TT fudge → Strehl (acceleration; see the sketch below)
- Most accelerations have been obtained through better approximations and improved modelling of the physics → Octopus is a mix of AO physics modelling and computer-science optimizations
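The "rms WFE + TT fudge → Strehl" shortcut presumably relies on something like the extended Maréchal approximation, S ≈ exp(-(2πσ/λ)²). A minimal sketch is given below; the exact tip-tilt correction used in Octopus is not described in the slides, so it is deliberately left out.

```c
/* "rms WFE -> Strehl" shortcut via the extended Marechal approximation:
 * S ~= exp( -(2*pi*sigma/lambda)^2 ).
 * The additional tip-tilt term ("TT fudge") used in Octopus is not described
 * in the slides and is therefore not reproduced here. */
#include <math.h>
#include <stdio.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

static double strehl_from_wfe(double rms_wfe_nm, double lambda_nm)
{
    double sigma_rad = 2.0 * M_PI * rms_wfe_nm / lambda_nm;  /* rms phase, radians */
    return exp(-sigma_rad * sigma_rad);
}

int main(void)
{
    /* example: 100 nm rms residual wavefront error, K band (2.2 um) */
    printf("Strehl: %.3f\n", strehl_from_wfe(100.0, 2200.0));
    return 0;
}
```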

System-based customizations
- Each AO system is somehow unique
- At some phase of the system analysis, the particularities of the system need to be integrated: the actual DM geometry, influence functions (IFs), a particular error budget (vibrations, static errors, ...), particular outputs (TT residuals, EE, PSF, ...)
- The code then "diverges" from the main branch (OR grows an enormous array of "if this then that")
- How do you deal with *a lot* of configurations, each somehow special?

Octopus validations
- Recurrent question: "How is Octopus validated?"
- Against other simulators: several "campaigns" of validation with yao (→ Gemini MCAO), the TMT simulator (NFIRAOS), and analytical models (Cibola, ONERA, error-budget-type formulas for fitting / aliasing / temporal delay, ...); NACO simulations compared to Octopus
- Against MAD: there are so many variables that you never know for sure (e.g. an integration of X seconds with constantly varying seeing vs. a Y-second simulation with fixed seeing, Cn2, ...); satisfactory agreement when "reasonable" assumptions are made
- Indirectly: for example, FrIM uses an internal AO model, which also allowed Octopus methods to be tested and showed the impact of SH non-linearities
- The simulation only simulates what you put in: if the system is not well maintained, simulations and reality will disagree. The problem is rather: what did you forget to model in the PARTICULAR system you are investigating? (e.g. vibrations, Cn2, the telescope, ...)

An example of “validation”

Difficulties with Octopus
- It is written in C and parallelized: adding new features is more difficult than with higher-level simulation tools. This is the price to pay for high speed & portability. One could move some things to a higher-level language (yorick?) to simplify, without much loss of performance
- Some Linux and command-line knowledge is needed
- It is also complex because many concepts are simulated, in parallel; a single-threaded SCAO SH code would be much simpler
- Many things are "quick and dirty" solutions which need cleaning up: written by a physicist, it is a research code. I think that's OK: we never do the same system twice, so there are always things to add / change (ERIS is the latest example)
- New concepts pop up and need new implementations: for example, spot elongation required breaking the nice paradigm that all sub-apertures are equal (→ impact on the parallelization); LGS with a PYR might also introduce some mess
- [...]

A faster Octopus?
- One very efficient way to accelerate is to reduce accuracy. Example: the SPHERE model for SPARTA:
  - reduce the pupil sampling (→ the FOV of the sub-apertures gets smaller)
  - reduce the number of turbulent layers (→ OK for SPHERE)
  - don't calculate the PSF, just the Strehl (→ OK for the SPARTA use case)
  - no spatial filter (→ OK for SPARTA)
  - [...]
  - → the simulation accelerates by a factor 5-10 (!), to 120 Hz (can be improved)
- The Octopus cluster allows at least 5-10 simulations to run simultaneously: some of the time "lost" (wrt GPU codes) can be regained simply by launching many simulations in parallel
- Xeon Phi tested: we managed to run the code, but for the moment it is very slow (unusable) on the Phi. The vectorization needs to be improved, and the parallelization must be improved to use the cores efficiently. Is it worth the time??? Vectorization should be improved for sure (it also improves CPU performance). What's the future of the Xeon Phi?

A faster Octopus?
- An option is to use more dedicated hardware: a TMT / COMPASS-like approach of porting the ~whole code to GPUs
  - Harder to add new concepts quickly to GPU code, because it is so specialized
  - Large porting effort requiring GPU & AO expertise
  - We lose the possibility to go to a large cluster (supercomputer) if needed: if a huge AO simulation is required (for example a 2nd-generation MCAO for the ELT), we risk being stuck by hardware limitations if the hardware is too specific
  - This is clearly a risk, since we are very much influenced by external ideas (≠ TMT); we cannot have a dedicated simulation tool per project
- Compromise: porting parts of Octopus to GPUs is possible without loss of generality (but also with loss of the maximum achievable performance)
  - e.g. the SH could be accelerated "easily" by porting the FFTs to GPUs, but with what gain? Same for the PSF calculation (maybe; those are large FFTs), but with what gain?
  - Porting the atmospheric propagation would require much more work (→ TMT)
  - A huge effort in terms of manpower is needed for this approach...
- Use COMPASS for some cases?

Octopus external tools
- A set of tools to analyze Octopus data: plot DM shapes, slopes, commands, IMs, ...
- Pretty much everybody wants different things: Matlab, yorick, IDL, ...
- Matlab Engine (using the Matlab compiler to produce libraries) to call Octopus from Matlab and vice versa
- External code can also be used with Octopus: reconstructors (FrIM, the Austrian ones, soon ONERA), power-spectrum calculators (→ Richard)
- Analysis of residual phases, slopes, commands, ... through dumps to disk

Future software directions?
- RTC testing platform: use Octopus to generate slopes to feed to SPARTA; SPARTA generates commands; the commands are sent back to Octopus → allows SPARTA loops that need true atmospheric data (e.g. r0 estimation, optimizations, ...) to be tested. "A loop in the computer": it doesn't need the highest-accuracy simulation, BUT extreme speed. A first "proof of concept" demonstration has been done with Octopus (a hypothetical sketch follows below)
- GPUs / FPGAs / ...: to get more speed in some areas of the simulation (or the complete simulation...)
- More: calibrations of the AOF, AIT of the AOF, algorithms, temporal behaviour, [...], PYR with LGS?
- We need to carefully weigh what we lose in coding time (optimizing / re-coding, re-engineering) vs. what we gain in simulation time. Very often we are not limited by simulation speed but by setting up / checking / thinking / gathering and comparing results...
- I prefer a set of small evolutions in steps to a complete rewrite
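The "loop in the computer" could in principle use any transport between the simulator and the RTC; below is a hedged sketch of the simulator side of such an exchange over TCP, in plain C. The actual Octopus/SPARTA interface, message framing, vector sizes and port are not described in the slides, so everything here is hypothetical.

```c
/* Hypothetical sketch of the "loop in the computer": the simulator sends a
 * slope vector to an RTC process over TCP and reads back DM commands.
 * The real Octopus/SPARTA interface and its message format are not public;
 * port, sizes and framing below are made up for illustration only. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define N_SLOPES 2480          /* hypothetical slope vector length   */
#define N_ACT    1156          /* hypothetical command vector length */
#define RTC_PORT 5555          /* hypothetical port                  */

int main(void)
{
    float slopes[N_SLOPES] = {0}, commands[N_ACT];

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(RTC_PORT);
    inet_pton(AF_INET, "127.0.0.1", &addr.sin_addr);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }

    for (int it = 0; it < 1000; it++) {
        /* ... run one simulator iteration up to the WFS, filling 'slopes' ... */
        if (write(fd, slopes, sizeof(slopes)) != (ssize_t)sizeof(slopes)) break;

        /* blocking read of the command vector computed by the RTC */
        size_t got = 0;
        while (got < sizeof(commands)) {
            ssize_t n = read(fd, (char *)commands + got, sizeof(commands) - got);
            if (n <= 0) { perror("read"); close(fd); return 1; }
            got += (size_t)n;
        }
        /* ... apply 'commands' to the simulated DM and continue the loop ... */
    }
    close(fd);
    return 0;
}
```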

Simulated systems
Over the years, many systems have been simulated:
- AOF: GRAAL, GALACSI WFM and NFM
- OWL (100 m): SCAO, GLAO, MCAO
- E-ELT (50 m, 42 m, 39 m): SCAO, GLAO, MCAO, LTAO, XAO, [MOAO]
- "TMT NFIRAOS"
- ERIS; "Gemini-like MCAO" (for ERIS)
- MAD (SCAO, GLAO, MCAO)
- "NACO"
- "SPHERE"
- NAOMI
- [...]

Conclusions
- Octopus has shown its ability to deliver simulations for all major AO systems at ESO
- It is fast enough on large AO systems, and scalable to anything we can imagine
- Many accelerations have been done recently, so it is even faster
- With the current software & hardware we can do the study (up to FDR) of any one (maybe two) complex ELT AO systems, in addition to ERIS / VLT systems, today. We are more people-limited than CPU-limited
- Well tested (which doesn't mean bug-free ;-) )
- It has been demonstrated to be open to new concepts, and able to deliver results on those new concepts in a relatively short time