Enabling Data-Intensive Science with Tactical Storage Systems Douglas Thain

Cooperative Computing Lab
Sharing is Hard!
Despite decades of research in distributed systems and operating systems, sharing computing resources is still technically and socially difficult!
Most existing systems for sharing require:
– Kernel-level software.
– A privileged login.
– Centralized trust.
– Loss of control over resources that you own.

Example: Grid Computing
Robert Gardner et al. (102 authors), The Grid2003 Production Grid: Principles and Practice, IEEE HPDC 2004.
The Grid2003 Project has deployed a multi-virtual organization, application-driven grid laboratory that has sustained for several months the production-level services required by… ATLAS, CMS, SDSS, LIGO…

Grid Computing Experience
The good news:
– 27 sites with 2800 CPUs
– 40985 CPU-days provided over 6 months
– 10 applications with 1300 simultaneous jobs
The bad news:
– 40-70 percent utilization
– 30 percent of jobs would fail
– 90 percent of failures were site problems
– Most site failures were due to disk space.

A Strange Problem
Storage is Plentiful!
– Large disks on every CPU, PDA, and iPod.
– A typical cluster has unused disks on each node.
– MS filesystem study: most disks 90% free.
– Tools for sharing: AFS, NFS, FTP, SCP...
The problem:
– Users are fixed to the abstractions provided by administrators: e.g. one NFS file system.
– Result: 1000 people share one 40 GB disk.

What if...
Users could use any storage anywhere?
I could borrow an unused disk for NFS?
An entire cluster could be used as storage?
Multiple clusters could be combined?
All this could be done without root?
Solution: Tactical Storage System (TSS)

Outline
Why is Sharing Data so Hard?
Tactical Storage Systems
– File Servers, Abstractions, Adapters
Performance Comparison
Application: High-Energy Physics
Application: Bioinformatics Database
Conclusion

Tactical Storage Systems (TSS)
A TSS allows any node to serve as a file server or as a file system client.
All components can be deployed without special privileges – but with security.
Users can build up complex structures.
– Filesystems, databases, caches,...
Two Independent Concepts:
– Resources – The raw storage to be used.
– Abstractions – The organization of storage.

[Figure: applications attach through adapters to a central filesystem, a distributed database abstraction, and a distributed filesystem abstraction, each built from file servers layered on existing file systems. The cluster administrator controls policy on all storage in the cluster; UNIX workstation owners control policy on each machine.]

Three Components
User-Level File Servers
– Secure remote file access without root.
Storage Abstractions
– Combine several file servers into one.
Application Adapters
– Attach existing applications without root.

User-Level File Servers
Unix-Like Access to Existing File Systems
Complete Independence
– choose friends
– limit bandwidth
– evict users?
Trivial to Deploy
– three steps
Flexible Access Control
[Figure: clients reach file servers, layered on an existing file system, via the Chirp protocol]

Access Control in File Servers
Unix Security is not Sufficient for the Job
Authentication
– Globus, Kerberos, Unix, Hostname, Address
Authorization
– Each directory has an access control list:
globus:/O=INFN/CN=Paolo_Mazzanti RWLA RWL hostname:*.bo.infn.it RL address: * RWLA
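
The per-directory access control described on this slide can be sketched roughly as follows. This is a minimal illustration, not the file server's actual implementation: the function names are hypothetical, shell-style wildcard matching and "union of all matching entries" semantics are assumptions, and only the two ACL entries that survive intact on the slide are used.

```python
from fnmatch import fnmatchcase

# Hypothetical ACL for one directory, as (subject pattern, rights) pairs.
# Only the two complete entries from the slide are reproduced here.
ACL = [
    ("globus:/O=INFN/CN=Paolo_Mazzanti", "RWLA"),
    ("hostname:*.bo.infn.it", "RL"),
]


def rights_for(subject, acl):
    """Return the union of rights from every entry whose pattern matches.

    Assumption: a subject matching several entries receives all of their
    rights. The slides do not state how overlapping entries combine.
    """
    granted = set()
    for pattern, rights in acl:
        if fnmatchcase(subject, pattern):
            granted |= set(rights)
    return granted


def allowed(subject, needed, acl):
    """True if the subject holds every right letter in `needed`."""
    return set(needed) <= rights_for(subject, acl)
```

A subject string here is the authentication method followed by the name it proved, so a host in the bo.infn.it domain would be checked as `allowed("hostname:pc1.bo.infn.it", "R", ACL)`.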

Widely Shared Storage Servers
[Figure: a file server directory with ACL globus:/O=INFN/CN=* RWLAX, holding a.out, test.c, test.dat, and cms.exe]

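Reservation Right (V)
[Figure: the file server's top-level directory has ACL globus:/O=INFN/CN=* V(RWLA), which permits mkdir only. A mkdir by /O=INFN/CN=Mazzanti creates a directory with ACL /O=INFN/CN=Mazzanti RWLA, holding a.out and test.c. A mkdir by /O=INFN/CN=Berlusconi creates a separate directory with ACL /O=INFN/CN=Berlusconi RWLA, holding a.out and test.c.]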
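
One way to picture the reservation right is the sketch below. It assumes that `V(RWLA)` means "the caller may only mkdir, and the new directory's ACL grants the caller RWLA", which is how the slide's diagram reads. The `Directory` class, `split_rights`, and `mkdir` are hypothetical names, and the fallback of copying the parent ACL when the caller holds plain W is an assumption.

```python
import re
from fnmatch import fnmatchcase


class Directory:
    """A directory with an ACL (subject pattern -> rights) and children."""

    def __init__(self, acl):
        self.acl = dict(acl)
        self.children = {}


def split_rights(rights):
    """Split 'RLV(RWLA)' into plain rights 'RL' and reserved rights 'RWLA'.

    Returns (plain, None) when the string carries no V(...) part.
    """
    match = re.fullmatch(r"([A-Z]*?)(?:V\(([A-Z]*)\))?", rights)
    if match is None:
        raise ValueError(f"malformed rights string: {rights!r}")
    return match.group(1), match.group(2)


def mkdir(parent, subject, name):
    """Create a child directory according to the first usable ACL entry."""
    for pattern, rights in parent.acl.items():
        if not fnmatchcase(subject, pattern):
            continue
        plain, reserved = split_rights(rights)
        if reserved is not None:
            # Reservation: the new directory belongs to the caller alone.
            child = Directory({subject: reserved})
        elif "W" in plain:
            # Assumption: an ordinary mkdir copies the parent's ACL.
            child = Directory(parent.acl)
        else:
            continue
        parent.children[name] = child
        return child
    raise PermissionError(f"{subject} may not mkdir here")
```

With a root ACL of `{"globus:/O=INFN/CN=*": "V(RWLA)"}`, each INFN user who calls `mkdir` ends up with a private directory, which matches the two separate directories shown on the slide.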

Abstractions
Users Create Higher Level Structures
– Admins do not know/care about abstractions.
Current Abstraction Types:
– CFS – Central File System
– DSFS – Distributed Shared File System
– DSDB – Distributed Shared Database
Abstractions Under Development:
– Striped File System
– Distributed Time Travel Backup System

CFS: Central File System
[Figure: an application, through its adapter, accesses a file on a single file server]

DSFS: Distributed Shared File System
[Figure: an application, through its adapter, follows a pointer (ptr) stored on one file server to a file held on another file server]
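
The pointer arrangement in the DSFS diagram might be sketched as below, with in-memory dictionaries standing in for file servers. The class names, the round-robin placement policy, and the `objN` naming of data objects are all assumptions made for illustration. The slide shows only that one server stores pointers and others store the files.

```python
class FileServer:
    """In-memory stand-in for a user-level file server."""

    def __init__(self, name):
        self.name = name
        self.files = {}


class DSFS:
    """One directory server holds pointers; data servers hold file bodies."""

    def __init__(self, directory, data_servers):
        self.directory = directory
        self.data_servers = {s.name: s for s in data_servers}
        self.hosts = sorted(self.data_servers)
        self.counter = 0

    def write(self, path, data):
        if path in self.directory.files:
            # Existing file: reuse its pointer and overwrite the body.
            host, key = self.directory.files[path]
        else:
            # New file: pick a data server round-robin (assumed policy)
            # and record a (host, key) pointer on the directory server.
            host = self.hosts[self.counter % len(self.hosts)]
            key = f"obj{self.counter}"
            self.counter += 1
            self.directory.files[path] = (host, key)
        self.data_servers[host].files[key] = data

    def read(self, path):
        # Follow the pointer, then fetch the body from the data server.
        host, key = self.directory.files[path]
        return self.data_servers[host].files[key]
```

Because the directory server stores only small pointers, bulk data traffic goes straight to the data servers, which is what lets the abstraction spread load across many disks.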

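DSDB: Distributed Shared Database
[Figure: an application, through its adapter, sends insert and query operations to a database server that keeps a file index. On insert, the database server prepares and creates a file on one of the file servers; the application then reads and writes file data by direct access to that file server.]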
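
A rough sketch of the insert, query, and direct-access steps named in the DSDB diagram follows. The class and method names are hypothetical, the "least-loaded server" placement rule and the `/db/` path layout are assumptions, and in-memory dictionaries stand in for real file servers.

```python
class FileServer:
    """In-memory stand-in for a user-level file server."""

    def __init__(self, name):
        self.name = name
        self.files = {}


class DatabaseServer:
    """Keeps an index of record name -> (file server, path). Holds no data."""

    def __init__(self, file_servers):
        self.file_servers = list(file_servers)
        self.index = {}

    def insert(self, name):
        """Prepare a location for a new record and return it to the client."""
        # Assumed policy: place the record on the server with fewest files.
        server = min(self.file_servers, key=lambda s: len(s.files))
        path = f"/db/{name}"
        server.files[path] = b""          # create an empty file
        self.index[name] = (server, path)
        return server, path

    def query(self, name):
        """Return where a record lives; the client fetches it directly."""
        return self.index[name]
```

A client would call `insert`, write the data to the returned server itself, and later call `query` to find the record again. The database server never sits on the data path.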

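DSDB Authentication
[Figure: the application asks the database server to insert a file for /O=INFN/CN=Mazzanti. The file server's top-level ACL is hostname:database.infn.it V(RWLA), so the database server's mkdir creates a directory with ACL hostname:database.infn.it RWLA. The database server then uses setacl to add /O=INFN/CN=Mazzanti RWL, giving the directory the ACL hostname:database.infn.it RWLA, globus:/O=INFN/CN=Mazzanti RWL. The application's adapter then transfers the data (file.dat) directly to the file server.]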

Adapter - Parrot
[Figure: tcsh, cat, and vi run on top of the adapter, an "Enhanced Operating System" that traps their system calls through the ptrace interface and keeps its own process table and file table]
Like an OS Kernel
– Tracks processes, files, etc.
– Adds new capabilities.
– Enforces owner’s policies.
Delegated Syscalls
– Trapped via ptrace interface.
– Action taken by Parrot.
– Resources charged to Parrot.
Research Platform
– Distributed file systems.
– Grid application environments.
– Debugging.
– Easier than OS coding!
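
The idea of delegated system calls can be illustrated with a toy dispatcher. This is only a simulation in plain Python and does not use ptrace. The `/chirp/` path prefix, the class names, and the `trapped_open` and `trapped_read` methods are assumptions introduced for the example, and the "remote" side is an in-memory dictionary.

```python
class LocalDriver:
    """Serves paths from a local (here: in-memory) file system."""

    def __init__(self, files):
        self.files = files

    def read(self, path):
        return self.files[path]


class RemoteDriver:
    """Stands in for a client of remote file servers, keyed by host name."""

    def __init__(self, servers):
        self.servers = servers  # host -> {path: bytes}

    def read(self, path):
        # Path arrives as 'host/dir/file'; split off the host name.
        host, _, rest = path.partition("/")
        return self.servers[host]["/" + rest]


class Adapter:
    """Keeps its own file table and routes each call to a driver."""

    def __init__(self, local, remote, remote_prefix="/chirp/"):
        self.local = local
        self.remote = remote
        self.remote_prefix = remote_prefix
        self.file_table = {}
        self.next_fd = 3  # 0-2 are conventionally stdin/stdout/stderr

    def trapped_open(self, path):
        """Decide which driver owns the path and record it in the file table."""
        if path.startswith(self.remote_prefix):
            entry = (self.remote, path[len(self.remote_prefix):])
        else:
            entry = (self.local, path)
        fd = self.next_fd
        self.next_fd += 1
        self.file_table[fd] = entry
        return fd

    def trapped_read(self, fd):
        """Carry out the read on the application's behalf."""
        driver, inner_path = self.file_table[fd]
        return driver.read(inner_path)
```

The application sees ordinary file descriptors in both cases. Only the adapter knows that one of them is backed by a remote server.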

Prototype
Storage in Computer Science Dept
– Office Workstations
– Instructional Labs
– Research Clusters
– Storage Bricks
Each Owner Controls Local Storage
– Access Control List
– Evicts Users if Needed.
– Collaborate Offsite

Demo Time!

Performance Considerations
Nothing comes for free!
– System calls: order of magnitude slower.
– Memory bandwidth overhead: extra copies.
Compared to NFS:
– TSS slightly better on small operations.
– TSS much better in network bandwidth.
On real applications:
– Measurable slowdown
– Benefit: far more flexible and scalable.

Performance – System Calls

Performance – Applications (Parrot only)

Performance – I/O Calls

Performance – Bandwidth

Performance – DSFS

Performance Conclusion
TSS has measurable slowdown.
TSS is comparable to NFS.
TSS can create scalable, parallel file systems.
To do better, must modify kernel.

Application: High-Energy Physics
SP5 Monte Carlo Simulation
– Component of BaBar at SLAC
– Collaboration with Sander Klous at NIKHEF
Difficult to Deploy on a Grid
– Complex Software Structure
– Custom Shared Libraries
– Objectivity Database
– (Similar Difficulties with Other Applications)

SP5 on a Standalone Machine
[Figure: the manually started sp5 application, linked with libobjy, issues file system operations on the scripts, data, and sp5 files, and database lock operations to the lock server]

Ideal SP5 Deployment
[Figure: many sp5 instances, each with libobjy, share the same scripts, data, and sp5 files through file system ops and the same lock server through database lock ops]

SP5 with Tactical Storage
[Figure: each sp5 instance runs under an adapter; the adapters reach the scripts, data, sp5 files, and libobjy through a file server using GSI, and the lock server through database lock ops]

Performance on EDG Testbed
Setup, Time to Init, Time/Event
Unix 446 +/ /- 4664s
LAN/NFS / s
LAN/TSS / s
WAN/TSS / s

Thoughts on SP5 + TSS
“With this project we have shown that computer scientists can solve the complications of grid computing and physicists can just use it.”
“The most important issue is: Who has to do the work?”

Application: Molecular Dynamics
Researchers in MD are much like those in HEP:
– Long-running simulations that explore a parameter space.
– Collaborating/competing on similar simulations.
– “What parameters have I explored?”
– “How can I share results with friends?”
– “Replicate these data for safety.”
GEMS: Grid Enabled Molecular Sims
– Distributed database for MD simulations at Notre Dame.
– Collaborators: Dr. Jesus Izaguirre, Dr. Aaron Striegel

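GEMS Distributed Database
[Figure: a database server, alongside catalog servers, maps each XML metadata record to replicas on file servers (XML -> host1:fileA host7:fileB host3:fileC; XML -> host6:fileX host2:fileY host5:fileZ). A client inserts data plus XML, or issues a query such as Temp>300K, Mol==CH4 and receives replica locations such as host5:fileZ and host6:fileX.]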
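
The query path in the GEMS diagram could be sketched as below: metadata records point at several replicas, a query filters on metadata, and the client picks a replica on a host that is currently reachable. The record layout, the concrete temperature values, and the function names are hypothetical. Only the host:file pairs and the query terms (Temp>300K, Mol==CH4) come from the slide.

```python
# Hypothetical metadata records; each lists the replicas of one data set.
RECORDS = [
    {
        "meta": {"Temp": 310, "Mol": "CH4"},
        "replicas": ["host6:fileX", "host2:fileY", "host5:fileZ"],
    },
    {
        "meta": {"Temp": 280, "Mol": "CH4"},
        "replicas": ["host1:fileA", "host7:fileB", "host3:fileC"],
    },
]


def query(records, predicate):
    """Return the replica list of every record whose metadata matches."""
    return [r["replicas"] for r in records if predicate(r["meta"])]


def first_available(replicas, live_hosts):
    """Pick the first replica stored on a host that is currently reachable."""
    for replica in replicas:
        host, _, _ = replica.partition(":")
        if host in live_hosts:
            return replica
    return None
```

For example, `query(RECORDS, lambda m: m["Temp"] > 300 and m["Mol"] == "CH4")` yields the replica list of the first record, and `first_available` then skips any replica whose host has left the system.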

GEMS and Tactical Storage
Dynamic System Configuration
– Add/remove servers, discovered via catalog
Policy Control in File Servers
– Groups can Collaborate within Constraints
– Security Implemented within File Servers
Direct Access via Adapters
– Unmodified Simulations can use Database

Survivability

Tactical Storage Systems
Separate Abstractions from Resources
Components:
– File servers, abstractions, adapters.
– Completely user level.
– Performance acceptable for real applications.
Independent but Cooperating Components
– Owners of file servers set policy.
– Users must work within policies.
– Large numbers of users: V right.

Future Work
More powerful abstractions
– Striping, replicating, indexing, searching.
More fine-grained control of storage
– Allocation, accounting, and management of bandwidth and storage space.
Applications and Deployment

Tactical Storage Systems put power in the hands of the users, not administrators!

Collaborators
NIKHEF and Vrije University
– Sander Klous
University of Notre Dame
– Aaron Striegel, Jesus Izaguirre
Hard-working students:
– Justin Wozniak, Paul Brenner
– Paul Madrid, Chris Moretti

Publications
Tactical Storage Systems
– UND CSE Dept Tech Report , May
Transparent Access to Grid Resources for User Software
– Accepted to Concurrency and Computation: Practice and Experience,
Gluttony and Generosity in GEMS: Grid Enabled Molecular Storage
– High Performance Distributed Comp,
Parrot: Transparent User-Level Middleware for Data-Intensive Computing
– Workshop on Adaptive Grid Middleware, 2003.

For more information...
Cooperative Computing Lab
Cooperative Computing Tools
Douglas Thain