Yin Chen Towards the Big Data Strategies for EISCAT-3D.

Presentation transcript:

Yin Chen Towards the Big Data Strategies for EISCAT-3D

Opportunities for new Research
 EISCAT-3D New Measurement Capabilities
 Instantaneous, adaptive control of beam positions
 Simultaneous multiple beams / interlaced beams
 High-resolution coding of polarisation, phase and amplitude
 Aperture synthesis imaging: small-scale 3D imaging (sub-beam-width)
 Multi-beam volume imaging: large-scale 3D imaging
 Full-profile vector measurements: large/small-scale 3D vector imaging
 High-speed object tracking
* Estimated for a 3 MW Tx: improvement of at least 10x

Opportunities for new Research
 EISCAT-3D e-Infrastructure capabilities
 Real-time data access
 Virtual observation
 Support for long-tail scientists
 Search through all levels of data, e.g.:
   o Find specific signatures at all levels
   o Plasma features, meteors, space debris, astronomical features
 Search for other ISR data resources
 User-specified data analysis/processing
 New applications, e.g.:
 Space weather
 Visualisation

The Big Data Challenges in EISCAT-3D
 3 + 1 Vs
 Volume
 5 PB/year in 2018, 40 PB/year in 2023
 Operation for 30 years; data products to be stored for > 10 years (a rough lifetime total is sketched below)
 Velocity
 Each antenna: 120 MB/s
 160 antenna groups (100 antennas each): 2 Gbit/s per group
 5 ring buffers: 125 TB/h each
 Variety
 Measurements: different versions, formats, replicas, external sources...
 System information: configuration, monitoring, logs/provenance...
 Users' metadata/data: experiments, analysis, sharing, communications...
 Value
 Meaningful insights that deliver analytics/patterns from deep, complex analysis based on machine learning, statistical modelling, graph algorithms...
 Going beyond traditional approaches to space science
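The volume figures above can be cross-checked with a simple back-of-envelope calculation. The sketch below is illustrative only: it uses the per-stage rates quoted on this slide and an assumed split of the 30-year lifetime between the two stages (5 years at the first-stage rate, 25 at the second), which is not stated in the slides.

```python
# Back-of-envelope check of the archive volume figures quoted above.
# Rates are the approximate values from the slide; the stage split is assumed.

PB = 1e15  # bytes

stage1_rate = 5e15   # ~5 PB/year from 2018 (1st stage)
stage2_rate = 40e15  # ~40 PB/year from 2023 (2nd stage)

# Assumed: 5 years at the 1st-stage rate, 25 years at the 2nd-stage rate.
total = 5 * stage1_rate + 25 * stage2_rate
print(f"Rough lifetime archive volume: {total / PB:.0f} PB (~{total / 1e18:.1f} EB)")
```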

EISCAT-3D Data Acquisition
 5 types of data
 Raw antenna (group) data (10 PB/day)
 Voltage beam-formed data (10 PB/year)
 Correlated products (1 PB/year)
 Fitted data (1 GB/year)
 (User) specialised products

EISCAT-3D Data Acquisition
 Each antenna
 30 Msamples/s (120 MB/s)
 Antenna group (core site)
 Computes a number of (broad) beams from a small number of antennas (FPGAs)
 100 antennas → 1 beam, 2 polarisations
 At 30 MHz I/Q this is 32 * 30 * 2 ≈ 2 Gbit/s per group (see the sketch below)
 These data are stored in a ring buffer
 160 groups → 125 TB/h
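A minimal sketch of the rate arithmetic on this slide (sample rate, word size and polarisations per group, then the aggregate ring-buffer rate for 160 groups). The constants are those quoted above; the result reproduces the slide's figures only to order of magnitude.

```python
# Per-antenna, per-group and ring-buffer data rates, using the slide's figures.
sample_rate = 30e6      # 30 Msamples/s
word_bits = 32          # 32-bit words
polarisations = 2
groups = 160

antenna_rate = sample_rate * word_bits / 8                  # bytes/s per antenna stream
group_rate_bits = sample_rate * word_bits * polarisations   # bit/s per antenna group
ringbuffer_rate = groups * group_rate_bits / 8              # bytes/s over all groups

print(f"Per antenna: {antenna_rate / 1e6:.0f} MB/s (slide: 120 MB/s)")
print(f"Per group:   {group_rate_bits / 1e9:.2f} Gbit/s (slide: ~2 Gbit/s)")
print(f"All groups:  {ringbuffer_rate * 3600 / 1e12:.0f} TB/h (slide: ~125 TB/h)")
```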

EISCAT-3D Data Acquisition
 2nd-stage beamforming
 160 antenna groups → 100 beams
 Decimation to 1 MHz → 200 Gflop/s
 Continuous sampling of 32-bit words (I/Q): 100 * 1e6 * 2 * 32 bit → 1 GB/s
 2 * 10 MHz bands of correlated data → 2 GB/s
 In total 10 TB/h to be stored in the archive (see the sketch below)
 Lag profile inversion
 2-3 Tflop/s per beam
 Total: beams * (2-3) Tflops
 8-13 Tflops for 1 beam
 Tflops/s for 100 beams
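A similar sketch for the second-stage figures: 100 decimated beams sampled as 32-bit I/Q words, plus the correlated-data stream, give the archive rate of roughly 10 TB/h quoted above. The constants, including the 2 GB/s correlated-data figure, are taken from the slide.

```python
# Second-stage beamforming and archive rates, using the slide's figures.
beams = 100
decimated_rate = 1e6        # 1 MHz sample rate after decimation
word_bits = 32              # 32-bit words
iq = 2                      # I and Q samples

beamformed = beams * decimated_rate * iq * word_bits / 8   # bytes/s of beam-formed data
correlated = 2e9                                           # ~2 GB/s for 2 x 10 MHz bands (from slide)

archive_rate = beamformed + correlated
print(f"Beam-formed: {beamformed / 1e9:.1f} GB/s (slide: ~1 GB/s)")
print(f"Archive:     {archive_rate * 3600 / 1e12:.0f} TB/h (slide: ~10 TB/h)")
```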

EISCAT-3D Data Acquisition
 One may occasionally want to do offline work on the ring-buffer data
 This needs a transfer to HPC
 Network link or physical transport: 1 Tb/s → 1 month; better to do the calculations on-site?
 125 TB/h * 1 day → 3 PB
 In total ~10 PB of storage at the HPC site (72 h of data; see the sketch below)
 HPC computing
 Higher resolutions (spatial and temporal)
 4 Pflop/s * 24 h → 10⁵ Pflop
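The storage and compute budget above follows from the same ring-buffer rate. A sketch, again using only the figures quoted on the slide:

```python
# Offline/HPC budget derived from the ring-buffer rate quoted above.
ringbuffer_rate_tb_per_h = 125

one_day = 24 * ringbuffer_rate_tb_per_h / 1000      # PB accumulated in one day
three_days = 72 * ringbuffer_rate_tb_per_h / 1000   # PB for a 72 h buffer

compute = 4e15 * 24 * 3600                          # flop: 4 Pflop/s sustained for 24 h
print(f"1 day of ring-buffer data: ~{one_day:.0f} PB (slide: 3 PB)")
print(f"72 h of ring-buffer data:  ~{three_days:.0f} PB (slide: ~10 PB)")
print(f"24 h at 4 Pflop/s:         ~{compute / 1e15:.1e} Pflop (slide: ~10^5 Pflop)")
```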

EISCAT-3D Data Curation (diagram: Data Acquisition → Tier 0 → Tier 1)

EISCAT-3D Tier 0 Curation
 Existing EISCAT
 Small: EISCAT archive ( ) 60 TB
 EISCAT_3D 1st stage (2018)
 Moderate: EISCAT archive 1 PB/year
 2-3 mirrors (northern + southern Europe + Japan)
 Analysis software + search engines
 HPC for detailed studies/developments
 Storage 1 PB, 1 Pflop/run
 EISCAT_3D 2nd stage (2023)
 High: EISCAT archive 10 PB/year
 HPC: storage 10 PB, 10 Eflop/run

EISCAT-3D Tier 1 Curation (diagram: Tier 0 → Tier 1 → Data Access & Processing)

EISCAT-3D Tier 1 Curation
 Data staging
 Long-term preservation
 Security services, e.g. single sign-on, authentication, authorisation
 Large-scale virtualisation of data/compute centre resources to achieve on-demand compute capacity
 Computing sites and workload management
 Metadata service
 File catalogue, application registration (a hypothetical record layout is sketched below)
 Safe replication service, e.g. for dynamic data streams
 Simple store, e.g. a Dropbox-like service for data
 Semantic annotation services
 A web-based science gateway system
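As an illustration of what a Tier-1 catalogue entry tying these services together might hold, here is a purely hypothetical record layout. The field names, PID prefix and replica sites are invented for illustration and are not part of any EISCAT or EUDAT specification.

```python
# Hypothetical Tier-1 catalogue record: fields, PID and sites are invented
# for illustration only, not an actual EISCAT/EUDAT schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class CatalogueEntry:
    pid: str                  # persistent identifier for the data object
    level: str                # data level, e.g. "correlated", "fitted"
    experiment: str           # experiment / programme name
    checksum: str             # integrity check used by the replication service
    replicas: List[str] = field(default_factory=list)  # Tier-1 sites holding copies

entry = CatalogueEntry(
    pid="hdl:21.T99999/eiscat3d-example",   # invented handle
    level="correlated",
    experiment="example-beam-scan",
    checksum="sha256:...",
    replicas=["tier1-north.example.org", "tier1-south.example.org"],
)
print(entry.pid, entry.replicas)
```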

EISCAT-3D Data Access & Processing
 Unlock the hidden value of the big data
 Discovery & access
 Intelligent filters
 Signature search, similarity, patterns (one possible approach is sketched below)
 Connecting big data with existing research analysis
 To support discovery of "unknowns"
 Metadata-based
 Integration of other resources
 Processing
 Statistical analysis, correlation processing
 Visualisation
 Domain applications, e.g. a space weather service
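One simple way to realise the "signature search, similarity, patterns" idea is a normalised cross-correlation of a query signature against archived time series. The sketch below is a generic illustration of that approach, not the EISCAT-3D search implementation.

```python
# Generic signature-similarity search: normalised cross-correlation of a short
# query template against a longer archived series. Illustrative only.
import numpy as np

def signature_search(series: np.ndarray, template: np.ndarray) -> int:
    """Return the offset where the template matches the series best."""
    t = (template - template.mean()) / (template.std() + 1e-12)
    scores = []
    for i in range(len(series) - len(template) + 1):
        w = series[i:i + len(template)]
        w = (w - w.mean()) / (w.std() + 1e-12)
        scores.append(float(np.dot(w, t)) / len(template))
    return int(np.argmax(scores))

# Toy example: find a Gaussian-shaped enhancement buried in noise.
rng = np.random.default_rng(0)
series = rng.normal(size=2000)
template = np.exp(-0.5 * ((np.arange(50) - 25) / 5.0) ** 2)
series[700:750] += 3 * template
print("Best match near index:", signature_search(series, template))
```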

EISCAT-3D Data Access & Processing (a diagram will be provided here …)

Objective 1: Support EISCAT Science Community
 Real-time data access
 Community-driven design
 Virtual research environments
 Support for long-tail scientists
 Global data sharing and integration

Objective 2: Common Services for Big Data
 Identify common requirements, challenging issues and state-of-the-art design experience from:
 LOFAR
 LHC
 SKA
 The Pierre Auger Observatory
 Cherenkov Telescope Array
 Advance existing technologies
 Proofs of concept / prototypes of data-infrastructure-enabling software
 Support the evolution of EGI
 EUDAT:
 Common storage, computing and metadata services for large research communities (typically ESFRIs)
 Robust solutions to replicate data and optimise data access

Objective 3: Training of Data Scientists
 The 4th paradigm for science
 A new data-centric way of conceptualising, organising and carrying out research activities
 New approaches to solve problems that were previously considered extremely hard or impossible to solve
 This will lead to serendipitous discoveries and significant scientific breakthroughs

Participating Organisations
 Cardiff University
 CERN
 CSC
 CSC provides IT support and resources for academia and research institutes.
 CSC is part of the Finnish national collaboration on building EISCAT-3D, in coordination with the other member states.
 Its planned role is to provide capacity and expertise in data management, HPC/Cloud services, and in connecting the EISCAT stations with high-speed networks.
 CSC's modular data centre in Kajaani offers >2200 Tflops of HPC capacity in
 EGI
 EISCAT
 EUDAT
 University of Edinburgh