Polish Infrastructure for Supporting Computational Science in the European Research Space FiVO/QStorMan: toolkit for supporting data-oriented applications.


Polish Infrastructure for Supporting Computational Science in the European Research Space
FiVO/QStorMan: toolkit for supporting data-oriented applications in PL-Grid
R. Słota (1,2), D. Król (1), K. Skałkowski (1), B. Kryza (1), D. Nikolow (1,2), and J. Kitowski (1,2)
(1) ACC Cyfronet AGH, Kraków, Poland
(2) Institute of Computer Science AGH-UST, Kraków, Poland
KU KDM 2011, Zakopane

Agenda
1. Data intensive applications
2. Research and implementation goals
3. Non-functional requirements in data management
4. FiVO/QStorMan toolkit components and architecture
5. FiVO/QStorMan usage
6. Testing scenarios
7. Results
8. Conclusions

Data intensive applications
Main features:
- Generate gigabytes (or more) of data per day.
- Produce different types of data, which require different types of storage.
- Make heavy use of read/write operations.
- Their run time depends mainly on storage access time and transfer speed rather than on computation time.
Scientific examples (from Wikipedia):
- The LHC experiments produce 15 PB/year = ~42 TB/day = ~1 GB/s.
- The German Climate Computing Center (DKRZ) has a storage capacity of 60 petabytes of climate data.

Research and implementation goals
The main objective of the presented research is to manage data coming from Grid applications using the following concepts:
- allowing users to explicitly define non-functional requirements for storage devices,
- exploiting a knowledge base of the Virtual Organization (VO), extended with descriptions of storage elements,
- exploiting information from storage monitoring systems and the VO knowledge base to find the most suitable storage device compliant with the defined requirements.

Non-functional requirements in data management
- Data intensive applications may have different requirements, e.g. important data should be replicated.
- Abstraction of storage elements prevents users from influencing the actual location of data.
- Distribution of data among available storage elements according to the defined requirements.
Sample non-functional requirements:
- freeCapacity
- currentReadTransferRate
- averageWriteTransferRate
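The matching idea behind these requirements can be illustrated with a minimal sketch (the type and field names below are assumptions for illustration, not the FiVO/QStorMan data model): given the monitored attributes of the available storage elements, pick one that satisfies every requirement the user actually set.

    // Illustrative only -- not QStorMan code.
    #include <optional>
    #include <string>
    #include <vector>

    // Monitored attributes of a storage element, as a monitoring system might report them.
    struct StorageElement {
        std::string name;
        double free_capacity_gb;
        double current_read_mbps;
        double average_write_mbps;
    };

    // A user's non-functional requirements; unset fields mean "no constraint".
    struct Requirements {
        std::optional<double> min_free_capacity_gb;   // freeCapacity
        std::optional<double> min_read_mbps;          // currentReadTransferRate
        std::optional<double> min_write_mbps;         // averageWriteTransferRate
    };

    // Return the first storage element compliant with every defined requirement.
    std::optional<StorageElement> select_storage(const std::vector<StorageElement> &elements,
                                                 const Requirements &req) {
        for (const auto &se : elements) {
            if (req.min_free_capacity_gb && se.free_capacity_gb < *req.min_free_capacity_gb) continue;
            if (req.min_read_mbps && se.current_read_mbps < *req.min_read_mbps) continue;
            if (req.min_write_mbps && se.average_write_mbps < *req.min_write_mbps) continue;
            return se;
        }
        return std::nullopt;  // no element satisfies the requirements
    }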

FiVO/QStorMan toolkit

FiVO/QStorMan usage
1. Using the QStorMan portal:
   Declare your non-functional requirements in the QStorMan portlet.
   Copy the returned text from the portlet and paste it into your JDL file.
2. Using the C++ programming library (libses):
   #include
   using namespace lustre_api_library;
   LustreManager manager;
   StoragePolicy policy;
   policy.setAverageReadTransferRate(50);
   policy.setCapacity(100);
   int descriptor = manager.createFile("nazwa_pliku.dat", &policy);
3. Using the system C library:
   Declare your non-functional requirements in the GOM knowledge base.
   export LD_PRELOAD=
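The third option relies on the standard Linux LD_PRELOAD mechanism: a preloaded shared library intercepts file-related libc calls and redirects them according to the requirements stored in the knowledge base. The slides do not show that library, so the following is only a minimal sketch of how such interposition generally works; the function body and comments are assumptions, not the actual QStorMan implementation. It would be built with something like g++ -shared -fPIC -o interpose.so interpose.cpp -ldl and activated via LD_PRELOAD.

    // interpose.cpp -- illustrative LD_PRELOAD interposer sketch (not QStorMan code)
    #include <dlfcn.h>    // dlsym, RTLD_NEXT (needs _GNU_SOURCE; g++ defines it by default on Linux)
    #include <fcntl.h>    // O_CREAT, mode_t
    #include <cstdarg>

    extern "C" int open(const char *path, int flags, ...) {
        using open_fn = int (*)(const char *, int, ...);
        // Look up the "real" open() from the next object in the link chain (glibc).
        static open_fn real_open =
            reinterpret_cast<open_fn>(dlsym(RTLD_NEXT, "open"));

        mode_t mode = 0;
        if (flags & O_CREAT) {  // the mode argument is only passed when O_CREAT is set
            va_list ap;
            va_start(ap, flags);
            mode = static_cast<mode_t>(va_arg(ap, int));
            va_end(ap);
        }

        // A real interposer would consult the knowledge base here and rewrite `path`
        // so the file lands on a storage element satisfying the user's requirements.
        return (flags & O_CREAT) ? real_open(path, flags, mode)
                                 : real_open(path, flags);
    }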

FiVO/QStorMan testing environment
ACC Cyfronet AGH (Cracow):
- Scientific Linux SL release 5.5 (Boron)
- 2x Intel(R) Xeon(R) CPU 2.50 GHz (4 cores, 1 thread per core), MB RAM
- ~12 TB storage capacity, ~150 MB/s read transfer rate, ~70 MB/s write transfer rate
PCSS (Poznan):
- Scientific Linux CERN SLC release 5.5 (Boron)
- Intel(R) Xeon(R) CPU 3.00 GHz (2 cores, 1 thread per core), 1000 MB RAM
- ~14 TB storage capacity, ~55 MB/s read transfer rate, ~46 MB/s write transfer rate
ICM (Warsaw):
- CentOS release 5.5 (Final)
- Intel(R) Xeon(R) CPU 2.40 GHz (4 cores, 1 thread per core), 7975 MB RAM
- ~5 TB storage capacity, ~50 MB/s read transfer rate, ~27 MB/s write transfer rate

Testing scenario
Scenario: simulates a Grid job which is scheduled to run in the most suitable data center. The job performs a computation and then writes data.
Scenario parameters:
- Number of users – 3 (2 users using QStorMan) and 4
- File size – 512 MB
- Number of files to write – 20, 30, 40
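The slides do not include the job code itself; as a rough illustration of the write phase described above, the following minimal sketch (file names and target directory are hypothetical) writes the configured number of 512 MB files.

    // write_phase.cpp -- illustrative sketch of the scenario's write phase (not the actual PL-Grid job)
    #include <cstddef>
    #include <fstream>
    #include <string>
    #include <vector>

    // Write `count` files of `size_mb` megabytes each into `dir`.
    void write_test_files(const std::string &dir, int count, std::size_t size_mb) {
        std::vector<char> chunk(1024 * 1024, 'x');  // 1 MB buffer of dummy data
        for (int i = 0; i < count; ++i) {
            std::ofstream out(dir + "/test_" + std::to_string(i) + ".dat", std::ios::binary);
            for (std::size_t mb = 0; mb < size_mb; ++mb)
                out.write(chunk.data(), static_cast<std::streamsize>(chunk.size()));
        }
    }

    int main() {
        // Scenario parameters from the slide: 512 MB per file, 20/30/40 files.
        write_test_files(".", 20, 512);  // e.g. 20 files written to the current directory
        return 0;
    }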

Results

Conclusions and Future work
- The goal of the presented research is to develop new approaches to storage management in Grid environments.
- Explicit definitions of non-functional requirements are necessary in data intensive applications.
- The toolkit can accelerate data-oriented Grid applications by ~45% without any modifications to their source code.
Future work:
- Integration with Grid queuing systems
- Integration with Virtual Organizations

Do you want to know more?