Optimisation of Data Access in Grid Environment*
Darin Nikolow (1), Renata Słota (1), Łukasz Dutka (1), Jacek Kitowski (1,2), Piotr Nyczyk (1), Mariusz Dziewierz (1)
(1) Institute of Computer Science, AGH
(2) Academic Computer Centre CYFRONET, AGH
University of Mining and Metallurgy, Cracow, Poland
Cracow Grid Workshop, Nov. 5-6, 2001
* CrossGrid Project, Task 3.4

Outline
- Background
- Bottom-top approach
- Media management software
  - middleware for existing HSM
  - dedicated VTSS
- Local component-expert systems
- Global policy for migration/replication
- For more info...

Motivation
- Large and growing amounts of data
- Multimedia database systems (applications: medical, educational, virtual reality, virtual laboratories, digital libraries, advanced simulations, ...)
- Solution: Tertiary Storage Systems (TSS) = Media Libraries + Management Software
- Examples of existing TSS: HPSS, DataCutter, APRIL, Condor, OmniStore, UniTree, ...
- Possible directions:
  - data access time estimation system - efficient usage
  - data distribution and grid implementation - large-scale experiments
  - expert system for data management
  - replication policies

Background
PARMED Project (University of Klagenfurt and University of Mining & Metallurgy, Cracow) - supporting physicians with telematic services for:
- long-distance collaboration of medical centres
- medical tele-education
- case archives
[Diagram: four client sites (with video servers, storage servers and a disk server) connected over a WAN to a shared meta-database.]

Bottom-top approach - Major Components
[Diagram: layered architecture - storage systems (HSM: UniTree, Castor, HPSS) at the bottom, then replica management, replica selection, a metadata repository (LDAP, ...) and resource management on top.]
Assumptions:
- mechanism neutrality
- policy neutrality
- compatibility with grid infrastructure
- uniformity of information infrastructure

Media Management Software
- Nikolow, D., Słota, R., Kitowski, J., Nyczyk, P., Otfinowski, J., "Tertiary Storage System for Index-Based Retrieving of Video Sequences", Proc. Int. Conf. HPCN, Amsterdam, June 25-27, 2001, Lect. Notes in Comp. Sci. 2110, Springer, 2001.
- Nikolow, D., Słota, R., Kitowski, J., "Benchmarking Tertiary Storage Systems with File Fragmentation", PPAM Conf., Nałęczów, Lect. Notes in Comp. Sci., accepted.

Media Management Software and its usage in X# Darin Nikolow

Motivation
Main purpose of the developed TSS: efficient index-based retrieval of video fragments (instead of file fragments)
- specific requirements for frequent data reading:
  - startup latency
  - transfer time
  - minimal transfer rate > video bitrate
Two prototypes proposed and benchmarked:
- middleware layer for an existing HSM
- dedicated TSS
The developed systems are of general use -> possible grid implementations

Multimedia Storage and Retrieval System (MMSRS)
Requirements:
- use existing software (UniTree HSM)
- reduce latency (start-up delay), i.e. reduce file granularity by file fragmentation (subfiles)
Implementation:
- splitting files into pieces of similar size
- middleware layer on top of the HSM
Consists of:
- Automated Media Library
- UniTree HSM managing system
- MPEG extension for HSM (MEH)
MEH receives the name of a video file and a frame range (start/end frames) and delivers the output stream via HTTP (a mapping sketch follows below).
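A minimal C sketch of the subfile-mapping idea behind MEH: a frame range is translated to the subfiles that must be staged from the HSM. The function names, the 16 MB constant and the constant-bitrate assumption are illustrative, not taken from the MMSRS sources.

```c
/* Map a frame range to the subfiles that contain it, assuming a fixed
 * subfile size and a constant number of bytes per frame (CBR video).
 * Illustrative sketch only; names are not from the MMSRS code. */
#include <stdio.h>

#define SUBFILE_SIZE (16UL * 1024 * 1024)   /* 16 MB, as benchmarked */

void frames_to_subfiles(unsigned long start_frame, unsigned long end_frame,
                        double bytes_per_frame,
                        unsigned long *first_sub, unsigned long *last_sub)
{
    unsigned long start_byte = (unsigned long)(start_frame * bytes_per_frame);
    unsigned long end_byte   = (unsigned long)((end_frame + 1) * bytes_per_frame) - 1;
    *first_sub = start_byte / SUBFILE_SIZE;
    *last_sub  = end_byte / SUBFILE_SIZE;
}

int main(void)
{
    /* 0.4 MB/s at an assumed 25 frames/s -> about 16 kB per frame. */
    double bpf = 0.4 * 1024 * 1024 / 25.0;
    unsigned long first, last;
    frames_to_subfiles(1000, 5000, bpf, &first, &last);
    printf("stage subfiles %lu..%lu\n", first, last);
    return 0;
}
```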

Video Tertiary Storage System (VTSS)
A dedicated TSS built from two daemons, implemented in C using Unix sockets:
- Repository Daemon (REPD) - keeps repository information
- Tertiary File Manager Daemon (TFMD) - manages:
  - filedb - tape identifier and start position of each fragment
  - tapedb - information about tape usage
Client requests to VTSS can be of the following kinds: write a new file to VTSS, read a file fragment from VTSS, delete a file from VTSS. The fragment range is defined in frame units (see the request sketch below).
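Since the slide names exactly three request kinds with frame-unit ranges, here is a minimal C sketch of what the client/TFMD interface could look like. The struct layout, field names and handler are assumptions for illustration, not the actual VTSS protocol.

```c
/* Illustrative request interface for the three VTSS operations.
 * Layout and names are assumptions, not the real wire format. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum vtss_op { VTSS_WRITE = 1, VTSS_READ_FRAGMENT = 2, VTSS_DELETE = 3 };

struct vtss_request {
    uint32_t op;            /* one of enum vtss_op */
    char     name[256];     /* video file name */
    uint32_t start_frame;   /* fragment range in frame units (read only) */
    uint32_t end_frame;
};

/* TFMD-style dispatch: filedb would map the name to a tape identifier
 * and start position; the actions are stubbed here. */
static void handle_request(const struct vtss_request *req)
{
    switch (req->op) {
    case VTSS_WRITE:
        printf("write %s: allocate tape space, update filedb/tapedb\n", req->name);
        break;
    case VTSS_READ_FRAGMENT:
        printf("read %s frames %u-%u: filedb -> tape id + start position\n",
               req->name, (unsigned)req->start_frame, (unsigned)req->end_frame);
        break;
    case VTSS_DELETE:
        printf("delete %s: mark tape space reusable in tapedb\n", req->name);
        break;
    default:
        fprintf(stderr, "unknown request %u\n", (unsigned)req->op);
    }
}

int main(void)
{
    struct vtss_request req;
    memset(&req, 0, sizeof req);
    req.op = VTSS_READ_FRAGMENT;
    strcpy(req.name, "lecture.mpg");
    req.start_frame = 1000;
    req.end_frame = 5000;
    handle_request(&req);
    return 0;
}
```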

MMSRS and VTSS performance
Hardware (AML Quantum|ATL):
- ATL 4/52 (DLT 2000)
- ATL 7100 (DLT 7000)
- HP D-class server (with UniTree HSM)
Data:
- 790 MB MPEG-1 file with B = 0.4 MB/s bitrate (33 min)
- subfile size for MMSRS: 16 MB (8, 16, 32 MB tested) - as short as possible to keep playback smooth while lowering latency
The "optimal" subfile length depends on positioning time, drive transfer rate, and the bitrate of the video file (a toy model follows below).
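A back-of-envelope C model of that trade-off: smaller subfiles cut the startup delay, but each extra subfile boundary adds repositioning time that the client's buffer must absorb. The positioning and repositioning times below are assumed values for illustration; only the 0.4 MB/s bitrate comes from the slides.

```c
/* Toy model of the subfile-length trade-off; parameter values are
 * illustrative assumptions, not measurements from the paper. */
#include <stdio.h>

int main(void)
{
    double t_pos = 60.0;  /* initial tape positioning time, s (assumed)  */
    double t_gap = 10.0;  /* repositioning between subfiles, s (assumed) */
    double rate  = 1.25;  /* drive transfer rate, MB/s (assumed)         */
    double br    = 0.4;   /* video bitrate, MB/s (from the slides)       */

    for (double s = 8.0; s <= 32.0; s *= 2.0) {
        /* Playback can start once the first subfile has been delivered. */
        double latency = t_pos + s / rate;
        /* While one subfile plays, the drive must reposition and read
         * the next one, or the client buffer underruns. */
        double margin = s / br - (s / rate + t_gap);
        printf("%4.0f MB subfile: startup ~%5.1f s, per-subfile margin %+.1f s\n",
               s, latency, margin);
    }
    return 0;
}
```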

Benchmarks
- Startup latency - time elapsed from issuing the request to receiving the first byte
- Transfer time - time from receiving the first byte till the end of transmission
- Minimal rate - minimal transfer rate experienced by a client with an endless buffer (should be greater than the bitrate of the video stream for smooth playback)

Startup latency
[Chart: startup latency for MMSRS (DLT2000), VTSS (DLT2000) and VTSS (DLT7000); UniTree reference startup latency = 718 s.]

Transfer time (beginning part shown only)
[Chart: transfer time for MMSRS (DLT2000), VTSS (DLT2000) and VTSS (DLT7000); UniTree reference transfer time = 135 s.]

System performance for the whole video file transfer (DLT2000)

Definitions (for VTSS)
Minimal transfer rate:
  $V_{min} = \min_{1 \le i \le n} \frac{\sum_{j=1}^{i} B_j}{t_i}$
where:
- n - number of packets
- B_j - number of bytes in the j-th packet
- t_i - time when the i-th packet was received
Time offset for tape changing direction (assuming no bad blocks):
  $Q_{dt} = \frac{T}{N \cdot B_r}$
where:
- T - tape capacity in MB
- N - number of tracks
- B_r - bitrate of the video file in MB/s

Minimal transfer rate
[Chart: minimal transfer rate for MMSRS (DLT2000), VTSS (DLT2000) and VTSS (DLT7000).]
For DLT2000: T = 10 GB, N = 64, B_r = 0.4 MB/s, hence Q_dt = 400 s
For DLT7000: T = 35 GB, N = 52, B_r = 0.4 MB/s, hence Q_dt = 1723 s
(A numeric check follows below.)
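A short C check of both formulas above: Q_dt = T/(N·B_r) reproduces the 400 s and 1723 s figures exactly, and V_min is computed over a packet trace. The trace values are invented for illustration, not measurements.

```c
/* Numeric check of V_min and Q_dt from the definitions slide. */
#include <stdio.h>

/* V_min = min over i of (sum_{j<=i} B_j) / t_i */
double minimal_rate(int n, const double *bytes, const double *t)
{
    double sum = 0.0, vmin = 1e30;
    for (int i = 0; i < n; i++) {
        sum += bytes[i];
        double v = sum / t[i];
        if (v < vmin) vmin = v;
    }
    return vmin;
}

int main(void)
{
    /* Q_dt = T / (N * B_r), with T converted from GB to MB. */
    printf("Q_dt DLT2000: %.0f s\n", 10.0 * 1024 / (64 * 0.4));  /* 400 s  */
    printf("Q_dt DLT7000: %.0f s\n", 35.0 * 1024 / (52 * 0.4));  /* 1723 s */

    /* Invented packet trace: bytes per packet and receive times (s);
     * the delivery stall before packet 3 drags V_min down. */
    double B[] = { 4e6, 4e6, 4e6, 4e6 };
    double t[] = { 5.0, 9.0, 20.0, 24.0 };
    printf("V_min = %.2f MB/s\n", minimal_rate(4, B, t) / 1e6);
    return 0;
}
```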

Access Time Estimation: Motivation for X#
- Retrieving a file from a TSS can take from a few seconds to a few hours
- Users' satisfaction increases when the access time of the data is known (e.g. a user waiting to watch a selected video, or an administrator recovering from backup)
- Efficient use of storage resources in a Grid environment (data replication subsystem)

Access Time Estimation: Approaches
- Open TSS approach - requires source code changes; will be used as an experimental platform
- Black Box TSS approach - for the existing HSMs at X# sites; retrieves the TSS's state via its native tools and available internal files

Access Time Estimation - Open TSS Approach
[Diagram: the client sends a request to the TSS [1] and receives a request id [2]; the TSS reports events to a TSS simulator; the client asks the simulator for the ETA of the request id [3] and receives the ETA [4]; the data is delivered by the TSS.]
* TSS source code changes: adding event-reporting functions

Access Time Estimation - Black Box TSS Approach
[Diagram: a Request Monitor & Proxy sits between the client and the TSS; a TSS Monitor collects events from the TSS databases, configuration files, logs, native monitoring tools and disk cache, and updates [4] the TSS state [5] in a TSS Simulator; the client asks for the ETA of a fileid [1], the simulator computes it from the fileid [2] and queue state [3] and returns the ETA [6, 7]; the file itself is then requested [8, 9] and delivered [10, 11], with feedback [12] used to tune the simulator.]
Information needed by the Simulator: number of drives, tape labels, media types, position of the file on the media, number of requests, ... (a toy replay model follows below)
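To make the black-box idea concrete, here is a toy C replay of an observed request queue: given the externally gathered state (queue contents, file positions, drive parameters), it estimates when a given file would arrive. The single-drive FIFO model and all cost parameters are assumptions; a real HSM scheduler is far richer.

```c
/* Toy ETA estimator for the black-box approach; the cost model and
 * names are illustrative assumptions, not the actual simulator. */
#include <stdio.h>
#include <string.h>

struct tss_request {
    const char *fileid;
    double position_mb;   /* file offset on its tape, from TSS metadata */
    double size_mb;
};

/* Single-drive FIFO replay: each request costs mount + seek + transfer. */
double estimate_eta(const struct tss_request *queue, int n,
                    const char *fileid,
                    double mount_s, double seek_s_per_gb, double rate_mb_s)
{
    double clock = 0.0;
    for (int i = 0; i < n; i++) {
        clock += mount_s
               + seek_s_per_gb * queue[i].position_mb / 1024.0
               + queue[i].size_mb / rate_mb_s;
        if (strcmp(queue[i].fileid, fileid) == 0)
            return clock;
    }
    return -1.0;   /* file is not in the observed queue */
}

int main(void)
{
    /* Observed queue: another request is ahead of ours. */
    struct tss_request q[] = {
        { "backup.tar",  2000.0, 790.0 },
        { "lecture.mpg",  500.0, 790.0 },
    };
    double eta = estimate_eta(q, 2, "lecture.mpg", 60.0, 40.0, 1.25);
    printf("ETA(lecture.mpg) ~ %.0f s\n", eta);
    return 0;
}
```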

Conclusions
- MMSRS and VTSS are more efficient than the standard UniTree HSM
- MMSRS is efficient enough to be used as middleware for an existing HSM of the UniTree type (at X# sites)
- The proposed measurements could be used for:
  - building more sophisticated distributed storage systems (faster access to files stored in a TSS)
  - building an access time estimation subsystem
- The access time estimation subsystem becomes an information provider for X# replication and migration of data

Component-Expert Systems
- Dutka, Ł., Kitowski, J., "Implementation of expert technologies in information systems based on a component methodology", MSK 2001 Conf., Nov. 2001, Cracow, accepted (in Polish).
- Dutka, Ł., Kitowski, J., "Component-expert technology in mass-storage grid applications", ICCS 2002 Conf., April 2002, Amsterdam, in preparation.

Basics of Component-Expert Technology and its usage in X# Łukasz Dutka

Classical component strategy

Component-expert strategy

Component structure

Component header structure

Structure of component code

Call-Environment
- Describes the state of the call place
- Describes the call place's requirements
- Carries information about the user's or programmer's wishes
The expert system processes the Call-Environment and finds the best component for it.

Expert Subsystem
- Rule-based expert system
- A typical rule looks like: If log-expr Then action1 Else action2
- The rules define what is meant by "the best component for a given Call-Environment"
- The expert system logs calls and stores deduction results for further analysis (a selection sketch follows below)
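A minimal C sketch of that selection loop: the Call-Environment is a bag of attributes, each rule is a logical expression over it, and a matching rule names the component to use. The attributes, rules and component names below are illustrative assumptions, not the actual X# rule base.

```c
/* Minimal component-expert selection: first matching rule wins.
 * All names and rules are illustrative assumptions. */
#include <stdio.h>
#include <string.h>

struct call_env {
    const char *data_type;      /* e.g. "video", "blocks"      */
    double      required_mbps;  /* throughput the caller needs */
    int         critical;       /* mission-critical flag       */
};

struct rule {
    int        (*matches)(const struct call_env *);  /* the log-expr */
    const char  *component;                          /* the action   */
};

static int is_critical_fast(const struct call_env *e)
{ return e->critical && e->required_mbps > 1.0; }

static int is_video(const struct call_env *e)
{ return strcmp(e->data_type, "video") == 0; }

static int always(const struct call_env *e)
{ (void)e; return 1; }

/* Real deductions would also be logged for later analysis. */
const char *select_component(const struct rule *rules, int n,
                             const struct call_env *env)
{
    for (int i = 0; i < n; i++)
        if (rules[i].matches(env))
            return rules[i].component;
    return NULL;
}

int main(void)
{
    struct rule rules[] = {
        { is_critical_fast, "replicated-disk-access" },
        { is_video,         "vtss-stream-access"     },
        { always,           "generic-hsm-access"     },
    };
    struct call_env env = { "video", 0.4, 0 };
    printf("chosen: %s\n", select_component(rules, 3, &env));
    return 0;
}
```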

Benefits of Component-Expert technology
- The system can be extended dynamically
- Ease of solving new problems
- Minimises the programmer's responsibility for component choice
- Ease of programming in a heterogeneous environment
- Maximal reusability of components
- Internal simplicity of component code
- Increased efficiency of the programming process

Component-Expert Technology for X# Task 3.4

Basic analysis of data-access problems in X#
- Different data set types
- Huge data files
- Distributed environment
- Long-distance connections
- Mission-critical applications
- Heterogeneous data storage systems
- Heterogeneous computing systems
- Open system
- Unpredictable file types

Basic connection diagram

Sequence Diagram

Example of Component-Expert technology usage for data access in X#
Sample attributes:
- User ID
- Computing Node ID
- Preferred replica localisation
- Required throughput
- Application purpose
- Data sharing
- Criticality level
- Replica expiration
- ...
Examples of local decisions:
- device selection (according to availability and type)
- storage format (blocks, multimedia streams, ...)
- available delivery performance (network, storage devices, ...)
- ... and much more

Control System for Migration/Replication Strategies (1/2)
Assumptions:
- replicas are file instances
- read only
- no updates, no coherence
[Diagram: a replica catalog repository maps each logical file (logical file metadata) to file instance pointers; each file instance lives in a storage system and carries file instance metadata and storage metadata; the catalog is fed from the replica manager.]
(A data-layout sketch follows below.)
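A small C sketch of that catalog layout: a logical file carries its metadata plus pointers to its read-only instances, each recording which storage system holds it. The field names are illustrative assumptions, not the actual catalog schema.

```c
/* Replica catalog layout sketch; field names are assumptions. */
#include <stdio.h>

struct file_instance {
    const char *storage_system;   /* e.g. an HSM site identifier */
    const char *path;             /* location inside that system */
};

struct logical_file {
    const char           *name;        /* logical file metadata  */
    unsigned long         size_mb;
    int                   n_instances; /* file instance pointers */
    struct file_instance *instances;
};

int main(void)
{
    struct file_instance copies[] = {
        { "hsm://site-a", "/tapes/vol12/lecture.mpg" },
        { "hsm://site-b", "/tapes/vol03/lecture.mpg" },
    };
    struct logical_file lf = { "lecture.mpg", 790, 2, copies };

    /* Replica selection would pick the instance with the lowest ETA;
     * read-only instances need no coherence protocol. */
    for (int i = 0; i < lf.n_instances; i++)
        printf("%s -> %s%s\n", lf.name,
               copies[i].storage_system, copies[i].path);
    return 0;
}
```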

System Management for Migration/Replication Strategies (2/2)
- In cooperation with other projects
- High-level control system (e.g. cooperating with LDAP)
- Two possible realizations:
  - heuristic reinforcement learning, based on heuristic migration/replication strategies and the system state
  - classical rule-based expert system

Conclusions
- Some elements have been defined and implemented
- Work continues on the higher-level structure and on cooperation with other X# modules and services