Rafał Słota, Michał Wrzeszcz, Renata G. Słota, Łukasz Dutka, Jacek Kitowski ACC Cyfronet AGH Department of Computer Science, AGH - UST CGW 2015 Kraków,

Presentation transcript:

Efficient Storing of Metadata for Distributed Data Management
Rafał Słota, Michał Wrzeszcz, Renata G. Słota, Łukasz Dutka, Jacek Kitowski
ACC Cyfronet AGH
Department of Computer Science, AGH-UST
CGW 2015, Kraków, Poland, October 26-28, 2015

Agenda
- Distributed data management in global environment
- onedata
- System's description
- Data and metadata organization
- Metadata challenges in onedata
- Analyzed solutions
- Proposed solution
- Performance tests
- Conclusions

Distributed Data Management in Global Environment
- Managing data across different storage solutions in globally dispersed environments is a hot topic.
- Global data management challenges are being investigated by many research and commercial groups.

Onedata: overall description
Onedata is a distributed data management system that virtualizes access to organizationally distributed data and hides the complexity of an environment in which resource providers do not trust one another.
Data and metadata organization is the key to providing:
- an easy view of data for each user,
- automatic data management for better efficiency.

Onedata: work in distributed environment
- Direct access whenever possible
- Management of blocks' replicas to minimize delays
- Caching, prefetching and fast parallel transport

Data organization
[Diagram labels: Spaces, Logical files, Providers, Storages, Users, Groups]
Organizing logical files into spaces shields users from the problems of managing resources and data locations.

Results of data organization design
- Easy management and sharing of data for users.
- Limitation of the metadata that each provider stores and processes.

Metadata organization
3 levels of metadata for data organization and usage description:
1. Metadata used to coordinate providers' cooperation
2. File metadata stored by each provider
3. Current usage metadata
Usage optimization:
Lower level -> more frequent usage -> higher distribution

Metadata challenges in onedata
- Storing metadata is too slow when all metadata is stored on disk.
- There is a risk of losing important metadata when metadata is saved only in memory.
Examples:
- Metadata that describes the location of the actual data file has to be persistent.
- Metadata that describes the way files are used by current sessions needs to be available only as long as the session is active, but must be accessible extremely fast.

Analysed solutions
Various solutions:
- In-memory vs. persistent databases
- Standalone vs. built-in applications
- Examples: Mnesia, Redis, Riak, Couchbase, Cassandra
No solution with all 3 features:
- Safety
- High throughput (many operations per second)
- Low delay

Proposed solution: datastore
- Models: API that defines how specific types of metadata should be stored (e.g. in global memory)
- Stores: elements where data is kept
- Worker with API: set of functionalities for data access optimization
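The split between models and stores can be illustrated with a minimal Python sketch. It is not the onedata implementation: the class names (`StoreLevel`, `Store`, `Model`, `Datastore`), the three store levels, and the assignment of `file_meta` to disk and `session` to global memory are all assumptions made for illustration. The slides only state that a model defines how a type of metadata is stored and that stores are where the data is kept.

```python
from dataclasses import dataclass, field
from enum import Enum


class StoreLevel(Enum):
    # Hypothetical levels; the slides name only "global memory" as an example.
    LOCAL_MEMORY = "local_memory"
    GLOBAL_MEMORY = "global_memory"
    DISK = "disk"


@dataclass
class Store:
    """An element where data is kept (a plain dict stands in for a real backend)."""
    level: StoreLevel
    records: dict = field(default_factory=dict)

    def save(self, key, value):
        self.records[key] = value

    def get(self, key):
        return self.records.get(key)


@dataclass(frozen=True)
class Model:
    """Declares how one type of metadata should be stored."""
    name: str
    level: StoreLevel


class Datastore:
    """Routes each operation to the store chosen by the model."""

    def __init__(self):
        self.stores = {level: Store(level) for level in StoreLevel}

    def save(self, model, key, value):
        self.stores[model.level].save((model.name, key), value)

    def get(self, model, key):
        return self.stores[model.level].get((model.name, key))


# Assumed mapping: file locations must survive restarts, sessions need speed.
FILE_META = Model("file_meta", StoreLevel.DISK)
SESSION = Model("session", StoreLevel.GLOBAL_MEMORY)

datastore = Datastore()
datastore.save(FILE_META, "file-1", {"location": "provider-A"})
datastore.save(SESSION, "sess-1", {"user": "alice"})
```

In this sketch the caller never picks a store directly; changing where a metadata type lives means changing one model declaration.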

Datastore key features and examples
Dynamic Cache System:
- Datastore allows one store to be set as a cache for another
- Reads and writes are done on the cache
- Writes are aggregated and performed asynchronously
- Dynamic load/unload of data from the cache when needed
Hooks for models cooperation:
- Separation of models
- Easy reaction to other models' actions
Exemplary models: file_meta, session, task_pool
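The cache behaviour described on this slide can be sketched as a small write-behind cache in Python. This is an illustration under stated assumptions, not the datastore's actual code: the class `WriteBehindCache` and its methods are hypothetical, a dict stands in for the persistent store, and the asynchronous write is replaced by an explicit `flush()` call so that the example stays deterministic.

```python
class WriteBehindCache:
    """One store (a dict in memory) acting as a cache for another (backing_store)."""

    def __init__(self, backing_store):
        self.backing_store = backing_store  # dict standing in for a persistent store
        self.cache = {}
        self.dirty = set()          # keys changed since the last flush
        self.post_save_hooks = []   # callbacks that let other models react

    def on_save(self, hook):
        self.post_save_hooks.append(hook)

    def save(self, key, value):
        # Writes touch only the cache; the backing store is updated later.
        self.cache[key] = value
        self.dirty.add(key)
        for hook in self.post_save_hooks:
            hook(key, value)

    def get(self, key):
        # Load into the cache on demand if the key was unloaded earlier.
        if key not in self.cache and key in self.backing_store:
            self.cache[key] = self.backing_store[key]
        return self.cache.get(key)

    def flush(self):
        # Aggregation: several saves of one key result in a single write.
        batch = {key: self.cache[key] for key in self.dirty}
        self.backing_store.update(batch)
        self.dirty.clear()
        return len(batch)

    def evict(self, key):
        # Unload from the cache, persisting first if the entry is still dirty.
        if key in self.dirty:
            self.backing_store[key] = self.cache[key]
            self.dirty.discard(key)
        self.cache.pop(key, None)


disk = {}
cache = WriteBehindCache(disk)
events = []
cache.on_save(lambda key, value: events.append(key))

cache.save("file-1", {"size": 1})
cache.save("file-1", {"size": 2})
cache.save("file-2", {"size": 5})
written = cache.flush()          # three saves, two records written
cache.evict("file-1")
reloaded = cache.get("file-1")   # loaded back from the backing store
```

Until `flush()` runs, the newest values exist only in memory, which is the speed versus risk trade-off that the cache is meant to balance.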

Performance tests
- Speed vs. risk of metadata loss
- Cache as a compromise

Conclusions
- For systems that globalize data access, efficient metadata management is a key element.
- The proposed datastore provides a flexible, efficient and safe solution for storing metadata.
- The proposed datastore allows onedata to provide data access in a globally distributed environment.

Thank you
onedata homepage:
See also:
- Łukasz Dutka, Michał Wrzeszcz, Tomasz Lichoń, Rafał Słota, Konrad Zemek, Krzysztof Trzepla, Łukasz Opioła, Renata Słota, and Jacek Kitowski. Onedata - a Step Forward towards Globalization of Data Access for Computing Infrastructures. ICCS 2015 Computational Science at the Gates of Nature, Procedia Computer Science, volume 51, pages 2843–
- M. Wrzeszcz, T. Lichoń, R. Słota, K. Zemek, K. Trzepla, Ł. Opioła, D. Nikolow, Ł. Dutka, R. Słota, and J. Kitowski. Metadata Organization and Management for Globalization of Data Access with onedata. PPAM 2015: book of abstracts, 2015, pp. 31
- Michał Wrzeszcz, Łukasz Dutka, Renata Słota, and Jacek Kitowski. VeilFS - A new face of Storage as a Service. In eChallenges e-2014, 2014 Conference, pages 1–10, Oct
- Łukasz Dutka, Renata Słota, Michał Wrzeszcz, Dariusz Król, and Jacek Kitowski. Uniform and Efficient Access to Data in Organizationally Distributed Environments. eScience on Distributed Computing Infrastructure, volume 8500 of Lecture Notes in Computer Science, pages 178–194. Springer International Publishing,
- Słota, R., Dutka, Ł., Wrzeszcz, M., Kryza, B., Nikolow, D., Król, D., Kitowski, J.: Storage Management Systems for Organizationally Distributed Environments - PLGrid PLUS Case Study. Lecture Notes in Computer Science, Vol. 8384, 2014, pp. 724–733.