Slide 1: The Large Scale Data Management and Analysis Project (LSDMA)
Dr. Andreas Heiss, Steinbuch Centre for Computing (SCC)
KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association

Slide 2: Overview
- Introducing KIT and SCC
- Big Data infrastructures at KIT: GridKa and the Large Scale Data Facility (LSDF)
- Large Scale Data Management and Analysis (LSDMA)
- Summary and outlook

Slide 3: Introducing KIT
KIT is both a state university with research and teaching, and a research center of the Helmholtz Association with program-oriented provident research.
Objectives: research, teaching, innovation.
Numbers:
- 24,000 students
- 9,400 employees
- 3,200 PhD researchers
- 370 professors
- 790 million EUR annual budget in 2012

Slide 4: Introducing the Steinbuch Centre for Computing
- Provisioning and development of IT services for KIT and beyond
- R&D: high performance computing, grids and clouds, Big Data
- ~200 employees in total: 50% scientists, 50% technicians, administrative personnel and student assistants
- Named after Karl Steinbuch, professor at Karlsruhe University and coiner of the term "Informatik" (the German word for computer science)

Slide 5/6: Big Data – comparing Google Trends
[Google Trends charts comparing search interest in "cloud computing", "Big Data" and "grid computing"]

Slide 7: Big Data 2000 years ago
"In those days Caesar Augustus issued a decree that a census should be taken of the entire Roman world." (Luke 2:1)
- Clearly defined purpose for collecting the data: tax lists of all taxpayers
- Data collection: distributed, analog, time-consuming
- Distributed storage of the data; tedious data aggregation

Slide 8: Big Data today – one buzzword, various challenges
Industry:
- Data mining and business intelligence: getting additional information out of (often) already existing data
- Typically O(10) or O(100) TB of data
- A new field to make money: products and services; a market shared between a few "big players" and many start-ups and spin-offs
Science:
- Handling huge amounts of data (petabytes)
- Distributed data sources and/or storage
- (Global) data management
- High throughput
- Data preservation

Slide 9: Definition of Data Science
[Venn diagram of data science by Drew Conway (IA Ventures)]

Slide 10/11: Big Data in science – the LHC at CERN
Goals: search for the origin of mass; understanding the early state of the universe.
- The LHC went live in 2008; four detectors
- Main discovery so far: a Higgs boson
Trigger chain and data rates:
- Collisions: 40 MHz (equivalent to 1,000 TB/s)
- Level 1 (hardware): 100 kHz (100 GB/s digitized)
- Level 2 (online farm): 5 kHz (5 GB/s)
- Level 3 (online farm): 300 Hz (250 MB/s) to storage
- Goal for 2015: …; 25 PB of data taken
- Worldwide LHC community: O(1000) physicists distributed around the globe
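The reduction factors in this trigger chain follow directly from the quoted rates. A minimal sketch that cross-checks them, assuming the per-event sizes are simply the quoted data rate divided by the quoted event rate (an illustration, not official experiment figures):

```python
# Back-of-the-envelope check of the LHC trigger-chain rates quoted above.
# Per-event sizes are derived from the quoted aggregate rates and are
# therefore illustrative assumptions only.

TRIGGER_LEVELS = [
    # (name, event rate in Hz, data rate in bytes/s)
    ("collisions",          40e6, 1000e12),  # 40 MHz, ~1,000 TB/s equivalent
    ("level 1 (hardware)", 100e3,   100e9),  # 100 kHz, 100 GB/s digitized
    ("level 2 (farm)",       5e3,     5e9),  # 5 kHz, 5 GB/s
    ("level 3 (farm)",       300,   250e6),  # 300 Hz, 250 MB/s to storage
]

for name, rate_hz, bytes_per_s in TRIGGER_LEVELS:
    event_size_mb = bytes_per_s / rate_hz / 1e6
    print(f"{name:20s} {rate_hz:12,.0f} Hz  ~{event_size_mb:.1f} MB/event")

# Overall selectivity from collisions down to permanently stored events:
print(f"only 1 in {40e6 / 300:,.0f} events is kept")
```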

Slide 12: Worldwide LHC Computing Grid – hierarchical tier structure
A hierarchy of services, response times and availability:
- 1 Tier-0 centre at CERN: copy of all raw data (tape); first-pass reconstruction
- 11 Tier-1 centres worldwide: 2 to 3 distributed copies of the raw data; large-scale data reprocessing; storage of simulated data from the Tier-2 centres; tape storage
- ~150 Tier-2 centres worldwide: user analysis; simulations
The strictly hierarchical model has since been relaxed towards a mesh. (Figure courtesy of Ian Bird, CERN.)
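To make the replication scheme concrete, here is a small model of the tier structure described above. The tier counts and roles come from the slide; representing them as a Python data structure is purely illustrative:

```python
# Illustrative model of the WLCG tier hierarchy described above; the counts
# and roles are taken from the slide, everything else is a sketch.
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    sites: int
    roles: tuple
    raw_copies: float  # total copies of the raw data held at this tier

WLCG = [
    Tier("Tier-0 (CERN)", 1, ("tape archive", "first-pass reconstruction"), 1.0),
    Tier("Tier-1", 11, ("raw-data reprocessing", "tape storage"), 2.5),  # "2 to 3"
    Tier("Tier-2", 150, ("user analysis", "simulations"), 0.0),
]

for tier in WLCG:
    print(f"{tier.name}: {tier.sites} site(s) – {', '.join(tier.roles)}")
print(f"raw data exists in ~{sum(t.raw_copies for t in WLCG):.1f} copies overall")
```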

Slide 13: Big Data in science – DNA sequencing
[Chart of DNA sequencing data volumes, with axes labelled in MB and GB]

Slide 14: Big Data in science – synchrotron light sources
[Photos of synchrotron facilities; sources: Wikipedia, KIT]

Slide 15: Big Data in science – synchrotron light sources
Dectris Pilatus 6M detector:
- 2463 x 2527 pixels
- 7 MB images at 25 frames/s, i.e. 175 MB/s
- Several TB per day
- The data no longer fits on a USB drive
- Users are usually not affiliated with the synchrotron lab
- Users come from physics, biology, chemistry, material sciences, …
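The quoted throughput follows from the frame size and frame rate; a quick sanity check (the eight-hour measurement shift is an assumed figure for illustration, not from the slide):

```python
# Sanity check of the Pilatus 6M data rates quoted above.
frame_mb = 7   # MB per image (from the slide)
fps = 25       # frames per second (from the slide)

rate_mb_s = frame_mb * fps
print(f"sustained rate: {rate_mb_s} MB/s")  # -> 175 MB/s

# An 8-hour measurement shift is an assumption made for this sketch.
shift_hours = 8
volume_tb = rate_mb_s * 3600 * shift_hours / 1e6
print(f"volume per {shift_hours} h shift: {volume_tb:.1f} TB")  # ~5 TB, i.e. "several TB/day"
```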

Slide 16: Big Data in science – high throughput imaging
- Imaging machines / microscopes: 1 to 100 frames/s, up to 800 MB/s, i.e. O(10) TB per day
- Example: reconstruction of zebrafish early embryonic development

Slide 17: Big Data in science
- Many research areas where data growth is very fast: biology, chemistry, earth sciences, …
- Data sets have become too big to take home
- Data rates require dedicated IT infrastructures to record and store the data
- Data analysis requires farms and clusters; single PCs are no longer sufficient
- Collaborations require distributed infrastructures and networks
- Data management becomes a challenge
- Fewer IT-experienced and IT-interested people than, e.g., in physics

Slide 18: Definition of Data Science
[Venn diagram by Drew Conway (IA Ventures), annotated with where physicists and biologists, chemists, … typically sit]

Slide 19: KIT infrastructures – GridKa
The German WLCG Tier-1 centre:
- Supports all LHC experiments, plus Belle II, several small communities and older experiments
- >10,000 cores; 12 PB of disk space, 17 PB of tape space
- 6 x 10 Gbit/s network connectivity
- ~15% of the LHC data is permanently stored at GridKa
- Services: file transfer, workload management, file catalogue, …
- Global Grid User Support (GGUS): service development and operation of the trouble-ticket system for the worldwide LHC Grid
- Annual international GridKa School; 2013: ~140 participants from 19 countries

Slide 20: GridKa experiences
- Evolving demands and usage patterns; no common workflows
- Hardware is commodity, software is not
- Hierarchical storage with tape is challenging
- Data access and I/O are the central issue: different users and user communities have different data access methods and access patterns
- On-site experiment representation is highly useful

Slide 21: KIT infrastructure – the Large Scale Data Facility (LSDF)
Main goals:
- Provision of storage for multiple research groups at KIT and the University of Heidelberg
- Support of research groups in data analysis
Resources and access:
- 6 PB of online storage, 6 PB of archival storage
- 100 GbE connection between KIT and the University of Heidelberg
- Analysis cluster of 58 x 8 cores
- A variety of storage protocols
Jointly funded by the Helmholtz Association and the state of Baden-Württemberg.

Slide 22: LSDF set-up at KIT
[Diagram of the LSDF installation at KIT]

Slide 23: LSDF experiences
- High demand for storage, analysis and archival
- Research groups vary in research topic (from genetic sequencing to geophysics), size, IT expertise, and the services and protocols they need
- Important needs common to many user groups: sharing data with other groups; data security and preservation; "consulting"
- Many small groups depend on the LSDF

Slide 24: The Large Scale Data Management and Analysis (LSDMA) project – facts and figures
- Helmholtz portfolio extension
- Initial project duration: …
- Partners: …; project coordinator: Achim Streit (KIT)
- Sustainability: inclusion of the activities into the respective Helmholtz program-oriented funding in 2015
- Next annual international symposium: September 24th at KIT

Slide 25: Scientific Data Life Cycle
[Diagram of the scientific data life cycle]

Slide 26: LSDMA – a dual approach
Data Life Cycle Labs (DLCLs):
- Joint R&D with the scientific user communities
- Optimization of the data life cycle
- Community-specific data analysis tools and services
Data Services Integration Team (DSIT):
- R&D on generic methods
- Data analysis tools and services common to several DLCLs
- Interface between federated data infrastructures and the DLCLs/communities

Slide 27: Selected LSDMA activities (I)
DLCL Energy (KIT, U Ulm):
- Analyzing stereoscopic satellite images with Hadoop to estimate the efficiency of solar energy (a minimal sketch of this processing style follows below)
- Privacy policies for personal energy data
DLCL Key Technologies (KIT, U Heidelberg, U Dresden):
- Optimization of tomographic reconstruction using data-intensive computing
- Visualization for high-throughput microscopy
DLCL Health (FZJ):
- Workflow support for data-intensive parameter studies
- Efficient metadata administration and indexing
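The slide names Hadoop only as the framework for the satellite-image analysis. As a hedged illustration of that style of processing, here is a minimal Hadoop-streaming-compatible mapper/reducer pair; the tile input format and the mean-brightness statistic are invented for this sketch:

```python
#!/usr/bin/env python3
# Minimal Hadoop-streaming-style mapper/reducer pair illustrating
# embarrassingly parallel per-tile image statistics. The input format
# ("region_id<TAB>pixel,pixel,..." per satellite image tile) and the
# mean-brightness metric are invented for this example.
import sys

def mapper(lines):
    # Emit per-region partial sums so the reducer can aggregate them.
    for line in lines:
        region, pixels = line.rstrip("\n").split("\t")
        values = [int(p) for p in pixels.split(",")]
        print(f"{region}\t{sum(values)}\t{len(values)}")

def reducer(lines):
    totals = {}  # region -> [pixel sum, pixel count]
    for line in lines:
        region, s, n = line.rstrip("\n").split("\t")
        acc = totals.setdefault(region, [0, 0])
        acc[0] += int(s)
        acc[1] += int(n)
    for region, (s, n) in sorted(totals.items()):
        print(f"{region}\tmean_brightness={s / n:.2f}")

if __name__ == "__main__":
    # Run as "... map" for the map phase, anything else for the reduce phase.
    mapper(sys.stdin) if sys.argv[1:] == ["map"] else reducer(sys.stdin)
```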

Slide 28: Selected LSDMA activities (II)
DLCL Earth & Environment (KIT, DKRZ):
- MongoDB for data and metadata of meteorological satellite data (see the sketch below)
- Data replication within the European EUDAT project using iRODS
DLCL Structure of Matter (DESY, GSI, HTW):
- Development of a portal for PETRA III data
- Determining the computing requirements for FAIR data analysis
DSIT (all partners):
- Federated identity management
- Archive
- Federated storage (e.g. dCache)
- …
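As an illustration of the MongoDB item above, a minimal pymongo sketch for storing and querying satellite-data metadata; the database name, collection name and document schema are invented for this example:

```python
# Minimal sketch of MongoDB as a metadata store for satellite data, as
# mentioned above. Requires pymongo; all names and fields are invented.
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder address
meta = client["earth_env"]["satellite_metadata"]

# Store one metadata document per data product.
meta.insert_one({
    "product": "cloud_cover_map",
    "satellite": "hypothetical-sat-1",
    "acquired": datetime(2013, 9, 12, 6, 0, tzinfo=timezone.utc),
    "bbox": {"lat": [47.0, 50.0], "lon": [7.0, 11.0]},
    "file_url": "https://example.org/data/product-0001.nc",
})

# Query all products of one satellite acquired after a cutoff date.
cutoff = datetime(2013, 9, 1, tzinfo=timezone.utc)
for doc in meta.find({"satellite": "hypothetical-sat-1",
                      "acquired": {"$gt": cutoff}}):
    print(doc["product"], doc["acquired"], doc["file_url"])
```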

Slide 29: LSDMA challenges
Communities differ in:
- previous knowledge
- the level of specification of their data life cycle
- the tools and services used
Needs are driven by:
- the increasing amount of data
- cooperation between groups
- policies: open access / open data, long-term preservation
Within the communities:
- the focus is on data analysis
- high fluctuation of the computing experts who run tools and services
Lessons learned:
- an interoperable AAI is crucial
- data privacy is very challenging, both legally and technically
- communities need evolution, not revolution
- needs can be very specific

Slide 30: Summary and outlook
- Data facilities and R&D are very important for KIT; extensive experience from GridKa and the LSDF
- A wide variety of user communities, often with very specific needs
- Interoperable AAI and data privacy are crucial topics
- Today, data is important to basically all research topics; more projects on the state, national and international levels are to come
- LSDMA: research on generic data methods, workflows and services, plus community-specific support and R&D