Metadata Organization and Management for Globalization of Data Access with Michał Wrzeszcz, Krzysztof Trzepla, Rafał Słota, Konrad Zemek, Tomasz Lichoń,

Metadata Organization and Management for Globalization of Data Access with Onedata
Michał Wrzeszcz, Krzysztof Trzepla, Rafał Słota, Konrad Zemek, Tomasz Lichoń, Łukasz Opioła, Darin Nikolow, Łukasz Dutka, Renata Słota, Jacek Kitowski
ACC Cyfronet AGH; Department of Computer Science, AGH - UST
PPAM 2015, Krakow, Poland, September 6-9, 2015

Agenda
- Motivation
- Problems with Global Data Access
- Is a new tool needed?
- Onedata Design Assumptions
- Key Aspects of Data Access
  - Global data organization
  - Globally distributed metadata
- Results
- Conclusions

Motivation
Scientific communities require global data access that integrates independently managed resources. Metadata organization and management is key to making global access effective, simple and convenient.

Problems with Global Data Access
- Storage heterogeneity and latency/bandwidth issues.
- Manual transfer of data before and after computations.
- No integration of user accounts, which leads to difficult access (security issues) and problematic data sharing.

Is a new tool needed?
iRODS, LFC, Dropbox, Google Drive, Globus Connect, Gluster, PanFS, BeeFS, Parrot

Onedata - Design Assumptions
- All organizations (providers) supporting a user have access to all data and metadata concerning that user.
- No central metadata server, for the sake of performance and availability.
- No replication of everything to everyone; redundant data is managed optimally.
- Data access efficiency: minimal overhead when the data is close to the client, and efficient access to fragments when the data is remote.

Onedata - Key Aspects of Data Access
Global data organization:
- Hides the complexity of data distribution from users.
- Indicates which remote data each organization should observe.
Globally distributed metadata:
- No trust between providers.
- Caching vs. coherency.

Global data organization
- Easy management and sharing of data for users.
- Limits the metadata that each provider needs to know.
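To make these two goals concrete, here is a minimal Python sketch. It assumes that data is grouped into spaces (the term used on the Level 1 slide), that each space is supported by one or more providers, and that users are members of spaces; the `Registry` class and its methods are hypothetical names, not the Onedata API. The user sees a single namespace of spaces with no provider names in it, while each provider only needs the metadata of the spaces it supports.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    # Hypothetical model: space name -> set of IDs of providers supporting it
    space_support: dict[str, set[str]] = field(default_factory=dict)
    # Hypothetical model: user ID -> set of space names the user belongs to
    user_spaces: dict[str, set[str]] = field(default_factory=dict)

    def user_view(self, user: str) -> list[str]:
        """Spaces the user sees as one namespace; no provider names are exposed."""
        return sorted(self.user_spaces.get(user, set()))

    def provider_scope(self, provider: str) -> set[str]:
        """Spaces whose metadata this provider must know: only those it supports."""
        return {s for s, provs in self.space_support.items() if provider in provs}


reg = Registry(
    space_support={"physics": {"p1", "p2"}, "chem": {"p2"}},
    user_spaces={"alice": {"physics", "chem"}},
)
print(reg.user_view("alice"))     # ['chem', 'physics']
print(reg.provider_scope("p1"))   # {'physics'}
```

Provider `p1` never needs metadata about `chem`, even though Alice uses both spaces.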

Global metadata distribution
Three metadata levels:
1. Metadata used to coordinate the providers' cooperation.
2. File metadata stored by each provider.
3. Current-usage metadata.
Usage optimization:
- Lower level -> more frequent usage -> higher distribution.
- Caching and aggregation of changes.
- Pushing changes to caches.
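Pushing changes to caches can be illustrated with a simple observer pattern. In this hypothetical sketch, `MetadataStore` holds the authoritative copy of one metadata level and calls every registered cache as soon as a value changes, so caches never have to poll. The class names are illustrative only and say nothing about how Onedata actually transports the updates.

```python
from collections.abc import Callable


class MetadataStore:
    """Authoritative copy of one metadata level; pushes every change to registered caches."""

    def __init__(self) -> None:
        self._data: dict[str, object] = {}
        self._subscribers: list[Callable[[str, object], None]] = []

    def subscribe(self, callback: Callable[[str, object], None]) -> None:
        self._subscribers.append(callback)

    def put(self, key: str, value: object) -> None:
        self._data[key] = value
        for notify in self._subscribers:  # push model: caches are updated immediately
            notify(key, value)


class Cache:
    """Local copy kept up to date by pushes from a MetadataStore."""

    def __init__(self, store: MetadataStore) -> None:
        self.entries: dict[str, object] = {}
        store.subscribe(self._on_change)

    def _on_change(self, key: str, value: object) -> None:
        self.entries[key] = value

    def get(self, key: str) -> object:
        return self.entries.get(key)


store = MetadataStore()
cache_a, cache_b = Cache(store), Cache(store)
store.put("space:physics", {"p1", "p2"})
print(cache_b.get("space:physics"))  # {'p1', 'p2'} (set order may vary)
```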

Global metadata distribution: Level 1
- Supports cooperation (integration of user accounts).
- Indicates which lower-level metadata should be synchronized with whom (space metadata).
- Stored by the Global Registry, a distributed application that acts as a trusted mediator.
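Level 1 metadata tells a provider whom to synchronize with. A minimal sketch, assuming the Global Registry exposes space support as a mapping from space name to the set of supporting provider IDs (a hypothetical representation, as is the `sync_partners` function): for each space a provider supports, its synchronization partners are the other supporters of that space.

```python
def sync_partners(space_support: dict[str, set[str]], provider: str) -> dict[str, set[str]]:
    """For each space this provider supports, return the other providers
    it must synchronize that space's Level 2 metadata with."""
    return {
        space: providers - {provider}
        for space, providers in space_support.items()
        if provider in providers
    }


support = {"physics": {"p1", "p2", "p3"}, "chem": {"p2"}, "bio": {"p3"}}
print(sync_partners(support, "p2"))  # {'physics': {'p1', 'p3'}, 'chem': set()}
```

A space supported by a single provider yields an empty partner set, so its Level 2 metadata needs no synchronization at all, and spaces a provider does not support never appear in its result.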

Global metadata distribution: Level 2
- File metadata, including a description of where file parts are located.
- Stored by each provider that supports a particular space.
- Fast access to the needed metadata.
- Limited number of synchronization operations.
- Changes propagated on the basis of Level 1 metadata.
- Aggregation of changes.
- Automatic conflict resolution.
- Caching of Level 1 metadata.
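The file parts location description can be pictured as a list of byte ranges, each tagged with the provider that holds it. The sketch below uses hypothetical `Part` and `locate` names (the actual Onedata encoding is not described on the slide). It splits a requested range into pieces served by known providers plus any gaps, which is the information a client needs for efficient access to fragments of remote data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    start: int     # first byte offset (inclusive)
    end: int       # last byte offset (exclusive)
    provider: str  # provider holding this byte range


def locate(parts: list[Part], start: int, end: int) -> tuple[list[Part], list[tuple[int, int]]]:
    """Split the request [start, end) into pieces held by known providers
    and the gaps for which no location is known."""
    found: list[Part] = []
    gaps: list[tuple[int, int]] = []
    pos = start
    for p in sorted(parts, key=lambda p: p.start):
        if p.end <= pos or p.start >= end:
            continue  # no overlap with the still-unresolved part of the request
        if p.start > pos:
            gaps.append((pos, p.start))
            pos = p.start
        piece_end = min(p.end, end)
        found.append(Part(pos, piece_end, p.provider))
        pos = piece_end
        if pos >= end:
            break
    if pos < end:
        gaps.append((pos, end))
    return found, gaps


parts = [Part(0, 100, "p1"), Part(100, 200, "p2"), Part(300, 400, "p1")]
pieces, missing = locate(parts, 50, 350)
print(pieces)   # pieces 50-100 on p1, 100-200 on p2, 300-350 on p1
print(missing)  # [(200, 300)]
```

Only the bytes actually requested are assigned to providers, so a read of a small fragment never triggers a transfer of the whole file.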

Global metadata distribution: Level 3
- Metadata about current file usage: who should be notified about a file change, and where data is currently being modified.
- Stored by providers, cached by clients.
- Two-stage aggregation: first at the client side, then at the provider.
- Updates Level 2 metadata.
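Two-stage aggregation can be sketched as merging change events twice. Assuming, for illustration only, that Level 3 change events are writes to byte ranges of a file, each client first merges its own overlapping or adjacent ranges, and the provider then merges the batches received from all clients before updating Level 2 metadata. The function names are hypothetical.

```python
from collections import defaultdict

Range = tuple[int, int]  # [start, end) byte range


def merge_ranges(ranges: list[Range]) -> list[Range]:
    """Coalesce overlapping or adjacent ranges into a minimal sorted list."""
    merged: list[Range] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def aggregate(events: list[tuple[str, Range]]) -> dict[str, list[Range]]:
    """First stage (client): group write events by file and merge each file's ranges."""
    by_file: dict[str, list[Range]] = defaultdict(list)
    for file_id, rng in events:
        by_file[file_id].append(rng)
    return {f: merge_ranges(r) for f, r in by_file.items()}


def provider_aggregate(batches: list[dict[str, list[Range]]]) -> dict[str, list[Range]]:
    """Second stage (provider): merge already-aggregated batches from many clients."""
    events = [(f, r) for batch in batches for f, rs in batch.items() for r in rs]
    return aggregate(events)


client1 = aggregate([("a", (0, 10)), ("a", (10, 20)), ("b", (5, 6))])
client2 = aggregate([("a", (15, 30))])
print(provider_aggregate([client1, client2]))  # {'a': [(0, 30)], 'b': [(5, 6)]}
```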

Global metadata distribution: Sum up
- Caching and aggregation trade off against the time needed to reach global consistency.
- The balance is set at the provider level (dynamic reconfiguration of clients).
- Locks provide immediate consistency.
[Figure: the Global Registry holds Level 1 metadata; Provider 1 and Provider 2 each hold Level 2 and Level 3 metadata and a Level 1 cache; a client holds a Level 3 cache. More changes -> lower level -> more power.]
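The trade-off between aggregation and consistency can be made tunable. This hypothetical client-side `ChangeBuffer` batches changes and flushes them when either a size or an age threshold is reached; the provider adjusts the balance by calling `reconfigure`, and a lock request flushes pending changes immediately. Thresholds, method names, and the injectable clock are assumptions for illustration. The age check runs only when a change is recorded, so a real client would also need a timer.

```python
import time
from collections.abc import Callable
from typing import Optional


class ChangeBuffer:
    """Client-side buffer: aggregates changes, flushes on size/age thresholds or before a lock."""

    def __init__(self, send: Callable[[list[str]], None], max_batch: int,
                 max_age_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send
        self._clock = clock
        self._pending: list[str] = []
        self._oldest: Optional[float] = None
        self.reconfigure(max_batch, max_age_s)

    def reconfigure(self, max_batch: int, max_age_s: float) -> None:
        """Hypothetical hook for the provider to change the balance at run time."""
        self.max_batch = max_batch
        self.max_age_s = max_age_s

    def record(self, change: str) -> None:
        if not self._pending:
            self._oldest = self._clock()
        self._pending.append(change)
        too_big = len(self._pending) >= self.max_batch
        too_old = self._clock() - self._oldest >= self.max_age_s
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        if self._pending:
            self._send(self._pending)
            self._pending = []
            self._oldest = None

    def on_lock_requested(self) -> None:
        """Give up batching for immediate consistency; the lock protocol itself is not modeled."""
        self.flush()
```

Large batches and long ages favour throughput; small values, or taking a lock, favour fast global consistency.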

Results: Simplicity
- Easy organization of data.
- Global distribution hidden from users.
- Easy publishing of results.

Results: Cooperation

Results: Efficiency

Conclusions
- The data organization hides global distribution from users while keeping providers independent.
- Ready for global cooperation between users.
- Efficient enough for computations.
Onedata status
- Onedata v1 is installed in the production environment of ACC Cyfronet AGH.
- Onedata v2 is currently being tested by international organizations.

Thank you
Onedata homepage: