Big Data Imperial, June 2013. Dr Paul Calleja, Director HPCS. The SKA: the world's largest big-data project.

Presentation transcript:

Big Data Imperial, June 2013. Dr Paul Calleja, Director HPCS. The SKA: the world's largest big-data project.

HPCS activities & focus:
- Cambridge HPC Service
- Dell HPC Solution Centre
- Academic / industrial HPC cloud

Square Kilometre Array (SKA): the next-generation radio telescope
- 100x more sensitive, with a vastly faster survey speed
- 5 square km of dish spread over 3000 km
- The next big science project, and currently the world's most ambitious IT project
- Cambridge leads the computational design: HPC compute design, HPC storage design, HPC operations

SKA location: a continental-sized radio telescope
- Needs a radio-quiet site: very low population density and a large amount of space
- Two sites: Western Australia and the Karoo Desert, RSA

What is radio astronomy? [Slide diagram: two antennas separated by baseline B receive an astronomical signal (an EM wave) from direction s.] The signal chain: detect & amplify → digitise & delay → correlate → process (calibrate, grid, FFT) → integrate → sky image. A toy sketch of the grid-and-FFT step follows.
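To make the "calibrate, grid, FFT" step concrete, here is a minimal numpy sketch of the imaging stage: visibilities measured on baselines (u, v) are gridded onto a regular UV plane and Fourier-transformed into a (dirty) sky image. This is a toy illustration, not the SKA pipeline; the nearest-neighbour gridding scheme, image size and cell size are all illustrative assumptions.

```python
import numpy as np

def dirty_image(u, v, vis, npix=256, cell=1.0):
    """Nearest-neighbour gridding of visibilities followed by an inverse FFT.

    u, v : baseline coordinates in wavelengths (1-D arrays)
    vis  : complex visibilities, same length as u and v
    npix : image size in pixels (illustrative)
    cell : UV grid cell size in wavelengths (illustrative)
    """
    grid = np.zeros((npix, npix), dtype=complex)
    # Map (u, v) onto grid indices, centred on the middle of the plane
    iu = np.round(u / cell).astype(int) + npix // 2
    iv = np.round(v / cell).astype(int) + npix // 2
    ok = (iu >= 0) & (iu < npix) & (iv >= 0) & (iv < npix)
    np.add.at(grid, (iv[ok], iu[ok]), vis[ok])  # accumulate duplicate cells correctly
    # Inverse FFT of the gridded visibilities gives the dirty image
    img = np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(grid)))
    return img.real

# Toy usage: a point source at the phase centre has flat visibilities
rng = np.random.default_rng(0)
u = rng.uniform(-100, 100, 500)
v = rng.uniform(-100, 100, 500)
image = dirty_image(u, v, np.ones(500, dtype=complex))
```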

Why SKA? Key scientific drivers:
- Are we alone?
- Cosmic magnetism
- Evolution of galaxies
- Pulsar surveys and gravitational waves
- Exploring the dark ages

SKA timeline (latest first):
- Operations: SKA 1 from 2019, with full SKA operations to follow
- Construction of the full SKA (SKA 2): €1.5B
- First-phase SKA construction (SKA 1): €300M
- 2012: site selection
- Pre-construction (€90M): 1 yr detailed design / PEP, 3 yr production readiness
- System design and refinement of specification
- Initial concepts stage: preliminary ideas and R&D

SKA project structure [org chart]: the SKA Board oversees the Director General, who is supported by the Project Office (OSKAO) and by Advisory Committees (Science, Engineering, Finance, Funding …). Below the project office sit Work Package Consortia 1 … n, which are locally funded.

Work package breakdown (under the SPO, the SKA Project Office):
1. System
2. Science
3. Maintenance and support / operations plan
4. Site preparation
5. Dishes
6. Aperture arrays
7. Signal transport
8. Data networks
9. Signal processing
10. Science Data Processor
11. Monitor and control
12. Power

The Science Data Processor consortium: UK (lead), AU (CSIRO …), NL (ASTRON …), South Africa SKA, and industry (Intel, IBM …).

SKA data flow [slide diagram showing data rates between pipeline stages: 16 Tb/s, 4 Pb/s, 24 Tb/s, 20 Gb/s, 1000 Tb/s].

Science data processor pipeline [slide diagram]. Data path: incoming data from collectors → switch → buffer store → correlator / beamformer → UV processor → image processor → bulk store. After corner turning and coarse delays the pipeline splits into two branches:
- Imaging: fine F-step / correlation → visibility steering → observation buffer → gridding visibilities → imaging → image storage
- Non-imaging: beamforming / de-dispersion → beam steering → observation buffer → time-series searching → search analysis → object/timing storage
Indicative figures from the diagram: compute of 10 Pflop, 100 Pflop, 200 Pflop, 1 Eflop and 2.5 Eflop at successive stages; data rates of 3200 GB/s and 128,000 GB/s; storage of 135 PB, 3 EB and 5.40 EB (SKA 1 and SKA 2 figures). Software complexity is a key challenge. A buffer-sizing sketch follows.
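As a back-of-envelope check on what an observation buffer at these rates implies, the sketch below sizes a buffer assuming, purely for illustration, that the 3200 GB/s rate quoted above is sustained into the buffer store for a 6-hour observation; the observation length and sustained rate are assumptions, not SKA design values.

```python
# Toy buffer-sizing arithmetic; both inputs are illustrative assumptions.
ingest_gb_per_s = 3200            # GB/s into the buffer store (figure from the slide)
obs_seconds = 6 * 3600            # assumed 6-hour observation
buffer_pb = ingest_gb_per_s * obs_seconds / 1e6   # GB -> PB
print(f"Buffer capacity needed: ~{buffer_pb:.0f} PB")  # ~69 PB
```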

SKA exascale computing in the desert. The SKA SDP compute facility will be, at the time of deployment, one of the largest HPC systems in existence. Operational management of large HPC systems is challenging at the best of times, even when they are housed in well-established research centres with good IT logistics and experienced Linux HPC staff. The SKA SDP will instead be housed in a desert location with little surrounding IT infrastructure, poor IT logistics and little prior HPC history at the site. A potential SKA SDP exascale system is likely to consist of ~100,000 nodes, occupy 800 cabinets and consume 30 MW: around 5 times the size of today's largest supercomputer, the Cray Titan at Oak Ridge National Laboratory. SKA SDP HPC operations will be very challenging.

The challenge is tractable. Although the operational aspects of the SKA SDP exascale facility are challenging, they are tractable if dealt with systematically and in collaboration with the HPC community.

SKA HPC operations: functional elements. We can describe the operational aspects by functional element (** marks items expanded on later slides):
- Machine room requirements **
- SDP data connectivity requirements
- SDP workflow requirements
- System service level requirements
- System management software requirements **
- Commissioning & acceptance test procedures
- System administration procedures
- User access procedures
- Security procedures
- Maintenance & logistical procedures **
- Refresh procedures
- System staffing & training procedures **

Machine room requirements. Machine room infrastructure for exascale HPC facilities is challenging:
- 800 racks over 1,600 m²; 30 MW IT load; ~40 kW of heat per rack
- Cooling efficiency and heat-density management are vital
- Machine infrastructure at this scale is in the £150M bracket, with a design and implementation timescale of 2-3 years
- The power cost alone, at today's prices, is £30M per year (see the sketch below)
- A desert location presents particular problems for a data centre: hot ambient temperatures and a lack of water make compressor-less cooling difficult, very dry air makes humidification difficult, and remoteness makes DC maintenance difficult
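As a sanity check on the £30M/year figure, the sketch below computes the annual energy bill for a 30 MW IT load under an assumed flat tariff of £0.11/kWh; the tariff is an illustrative assumption, not a quoted price.

```python
# Annual power cost for a 30 MW IT load under an assumed £0.11/kWh tariff.
it_load_mw = 30
hours_per_year = 24 * 365
energy_kwh = it_load_mw * 1000 * hours_per_year   # 262.8 million kWh/year
cost_gbp = energy_kwh * 0.11                      # assumed tariff
print(f"~£{cost_gbp / 1e6:.0f}M per year")        # ~£29M, consistent with the slide
```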

System management software. System management software is the vital element in HPC operations, yet today's system management software does not scale to exascale; a worldwide coordinated effort is under way to develop system management software for exascale. Elements of the system management software stack (** marks items of particular concern here):
- Power management **
- Network management
- Storage management
- Workflow management
- OS / runtime environment **
- Security management
- System resilience **
- System monitoring **
- System data analytics **
- Development tools
A sketch of the per-node telemetry feeding the monitoring and analytics layers follows.
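Below is a minimal sketch of the kind of per-node telemetry the system monitoring and data analytics layers would consume, using the psutil library. The specific metrics and alert thresholds are illustrative assumptions; at ~100,000 nodes these samples would feed a hierarchical aggregation pipeline rather than per-node alerting.

```python
import psutil  # cross-platform system-metrics library (third-party)

def node_health():
    """Sample a few basic health metrics on one node."""
    return {
        "cpu_percent": psutil.cpu_percent(interval=1),
        "mem_percent": psutil.virtual_memory().percent,
        "disk_percent": psutil.disk_usage("/").percent,
        "load_1min": psutil.getloadavg()[0],
    }

if __name__ == "__main__":
    h = node_health()
    # Thresholds below are assumptions for illustration only
    if h["mem_percent"] > 90 or h["disk_percent"] > 95:
        print("ALERT", h)
    else:
        print("OK", h)
```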

Maintenance logistics. Current HPC technology MTBF for hardware and system software results in failure rates of ~2 nodes per week on a cluster of ~600 nodes. SKA exascale systems are expected to contain ~100,000 nodes, so failure rates of ~300 nodes per week are realistic, and during system commissioning this will be 3-4x higher. Fixing nodes quickly is vital, otherwise the system will soon degrade into a non-functional state. The manual engineering processes used for fault detection and diagnosis at 600 nodes will not scale to 100,000 nodes; this needs to be automated by the system software layer (see the sketch below). Scalable maintenance procedures need to be developed between HPC system administrators, system software and smart hands in the DC, and vendor hardware-replacement logistics need to cope with high turnaround rates.
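A minimal sketch of what automating the first step could look like on a Slurm-managed cluster (an assumption; the slide does not name a scheduler): list nodes that Slurm reports as failed and drain them so no new work lands there while smart hands and vendor logistics take over. The set of states acted on and the drain reason are illustrative; real fault diagnosis would be far richer.

```python
import subprocess

def failed_nodes():
    """Return node names that Slurm reports in a failed state."""
    # sinfo format: %N = node name, %t = compact state; -h drops the header
    out = subprocess.run(["sinfo", "-N", "-h", "-o", "%N %t"],
                         capture_output=True, text=True, check=True).stdout
    bad = {"down", "fail"}  # assumed set of states to act on
    return sorted({name for name, state in
                   (line.split() for line in out.splitlines() if line.strip())
                   if state.rstrip("*") in bad})  # '*' marks non-responding nodes

def drain(node, reason="auto: health check failed"):
    """Mark a node DRAIN so the scheduler stops placing jobs on it."""
    subprocess.run(["scontrol", "update", f"NodeName={node}",
                    "State=DRAIN", f"Reason={reason}"], check=True)

for node in failed_nodes():
    drain(node)
    print("drained", node, "- handed off to DC smart hands")
```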

Staffing levels and training. Providing functional staffing levels and experience at a remote desert location will be challenging. It is hard enough finding good HPC staff to run small-scale HPC systems in Cambridge; finding orders of magnitude more staff to run much more complicated systems in a remote desert location will be very challenging. Operational procedures combining remote system administration staff with DC smart hands will be needed, and HPC training programmes need to be implemented to build skills well in advance. The HPCS, in partnership with the South African national HPC provider and the SKA organisation, is already building out pan-African HPC training activities.

Early Cambridge SKA solution: EDSAC 1.