Enabling Data-Intensive Science with Tactical Storage Systems Douglas Thain

1 Enabling Data-Intensive Science with Tactical Storage Systems Douglas Thain http://www.cse.nd.edu/~dthain

2 Cooperative Computing Lab http://www.cse.nd.edu/~ccl Sharing is Hard! Despite decades of research in distributed systems and operating systems, sharing computing resources is still technically and socially difficult! Most existing systems for sharing require: –Kernel level software. –A privileged login. –Centralized trust. –Loss of control over resources that you own.

3 Example: Grid Computing Robert Gardner, et al. (102 authors), "The Grid2003 Production Grid: Principles and Practice," IEEE HPDC 2004. The Grid2003 Project has deployed a multi-virtual organization, application-driven grid laboratory that has sustained for several months the production-level services required by… ATLAS, CMS, SDSS, LIGO…

4 Grid Computing Experience The good news: –27 sites with 2800 CPUs –40985 CPU-days provided over 6 months –10 applications with 1300 simultaneous jobs The bad news: –40-70 percent utilization –30 percent of jobs would fail –90 percent of failures were site problems –Most site failures were due to disk space.

5 A Strange Problem Storage is Plentiful! –Large disks on every CPU, PDA, and iPod. –A typical cluster has unused disks on each node. –MS filesystem study: most disks 90% free. –Tools for sharing: AFS, NFS, FTP, SCP... The problem: –Users are fixed to the abstractions provided by administrators: e.g. one NFS file system. –Result: 1000 people share one 40 GB disk.

6 What if... Users could use any storage anywhere? I could borrow an unused disk for NFS? An entire cluster can be used as storage? Multiple clusters could be combined? All this could be done without root? Solution: Tactical Storage System (TSS)

7 Outline Why is Sharing Data so Hard? Tactical Storage Systems –File Servers, Abstractions, Adapters Performance Comparison Application: High-Energy Physics Application: Bioinformatics Database Conclusion

8 Tactical Storage Systems (TSS) A TSS allows any node to serve as a file server or as a file system client. All components can be deployed without special privileges – but with security. Users can build up complex structures. –Filesystems, databases, caches,... Two Independent Concepts: –Resources – The raw storage to be used. –Abstractions – The organization of storage.

9 [Diagram: applications attach through adapters to abstractions (central filesystem, distributed database, distributed filesystem) that are built from file servers, each layered over a UNIX file system. A cluster administrator controls policy on all storage in a cluster; owners of UNIX workstations control policy on each machine.]

10 Three Components User-Level File Servers –Secure Remote File Access w/out root Storage Abstractions –Combine several file servers into one. Application Adapters –Attach existing applications w/out root.

11 User-Level File Servers Unix-Like Access to Existing File Systems Complete Independence –choose friends –limit bandwidth –evict users? Trivial to Deploy –three steps Flexible Access Control [Diagram: file servers export an existing file system over the Chirp protocol.]

12 Access Control in File Servers Unix Security is not Sufficient for the Job Authentication –Globus, Kerberos, Unix, Hostname, Address Authorization –Each directory has an access control list:
globus:/O=INFN/CN=Paolo_Mazzanti RWLA
kerberos:dthain@nd.edu RWL
hostname:*.bo.infn.it RL
address:192.168.1.* RWLA
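
As an illustration of the per-directory access control entries on this slide, here is a minimal Python sketch of how such a list might be evaluated. It is not the file server's actual code: the function names `rights_for` and `is_allowed` are hypothetical, shell-style wildcard matching is assumed for the `*` in subjects, and taking the union of all matching entries is an assumption as well.

```python
from fnmatch import fnmatchcase

# ACL for one directory, using the example entries from the slide.
ACL = [
    ("globus:/O=INFN/CN=Paolo_Mazzanti", "RWLA"),
    ("kerberos:dthain@nd.edu", "RWL"),
    ("hostname:*.bo.infn.it", "RL"),
    ("address:192.168.1.*", "RWLA"),
]


def rights_for(subject, acl):
    """Return the union of rights from every entry whose pattern matches subject.

    Assumption: a subject is the string "method:name" produced by authentication,
    and "*" in an ACL pattern is a shell-style wildcard.
    """
    granted = set()
    for pattern, rights in acl:
        if fnmatchcase(subject, pattern):
            granted.update(rights)
    return granted


def is_allowed(subject, right, acl):
    """True if subject holds the single-letter right (e.g. "R" or "W")."""
    return right in rights_for(subject, acl)
```

Under these assumptions, a client authenticated as `hostname:node1.bo.infn.it` could read and list the directory but not write to it, while a subject that matches no entry receives no rights at all.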

13 Widely Shared Storage Servers [Diagram: a file server directory with the ACL globus:/O=INFN/CN=* RWLAX holds the files a.out, test.c, test.dat, and cms.exe, shared by every matching user.]

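14 Reservation Right (V) [Diagram: the top-level directory of a file server has the ACL globus:/O=INFN/CN=* V(RWLA), which allows mkdir only. When /O=INFN/CN=Mazzanti calls mkdir, the new directory gets the ACL /O=INFN/CN=Mazzanti RWLA and holds that user's a.out and test.c; when /O=INFN/CN=Berlusconi calls mkdir, the new directory gets the ACL /O=INFN/CN=Berlusconi RWLA and holds that user's a.out and test.c.]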
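
The reservation right can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the file server's implementation: the `Directory` class and `split_rights` helper are hypothetical, wildcard matching is assumed to be shell-style, and the rule that an ordinary `W` holder's new directory copies the parent ACL is an assumption added only to contrast with the reserved case.

```python
import re
from fnmatch import fnmatchcase


def split_rights(spec):
    """Split a spec such as "RWLA" or "V(RWLA)" into (direct, reserved) sets."""
    match = re.search(r"V\(([A-Z]*)\)", spec)
    reserved = set(match.group(1)) if match else set()
    direct = set(re.sub(r"V\([A-Z]*\)", "", spec))
    return direct, reserved


class Directory:
    """Toy directory carrying its own ACL (subject pattern -> rights spec)."""

    def __init__(self, acl):
        self.acl = dict(acl)
        self.children = {}

    def rights(self, subject):
        direct, reserved = set(), set()
        for pattern, spec in self.acl.items():
            if fnmatchcase(subject, pattern):
                d, r = split_rights(spec)
                direct |= d
                reserved |= r
        return direct, reserved

    def mkdir(self, subject, name):
        direct, reserved = self.rights(subject)
        if "W" in direct:
            # Assumed ordinary case: the new directory copies the parent's ACL.
            child = Directory(self.acl)
        elif reserved:
            # Reservation: only the creator appears in the new ACL, holding
            # exactly the rights listed inside V(...).
            child = Directory({subject: "".join(sorted(reserved))})
        else:
            raise PermissionError(f"{subject} may not mkdir here")
        self.children[name] = child
        return child
```

For example, with a root ACL of `globus:/O=INFN/CN=* V(RWLA)`, a `mkdir` by `globus:/O=INFN/CN=Mazzanti` yields a directory that only Mazzanti can use, and Berlusconi holds no rights inside it. This is how one server can admit a large community without giving its members access to each other's files.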

15 Abstractions Users Create Higher Level Structures –Admins do not know/care about abstractions. Current Abstraction Types: –CFS – Central File System –DSFS – Dist Shared File System –DSDB – Dist Shared Database Abstractions Under Development: –Striped File System –Distributed Time Travel Backup System

16 CFS: Central File System [Diagram: an appl uses an adapter to reach a file on a single file server.]

17 DSFS: Dist. Shared File System [Diagram: an appl uses an adapter to reach one file server that holds a pointer (ptr) to a file stored on one of several other file servers.]
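
The pointer arrangement in the DSFS slide can be illustrated with a small Python sketch. Everything here is a stand-in: the `DSFS` class is hypothetical, the servers are plain dictionaries rather than remote file servers, and round-robin placement of new files is an assumption made only so the example has a placement policy.

```python
import itertools


class DSFS:
    """Toy distributed shared file system: pointers on one server, data on others."""

    def __init__(self, directory, data_servers):
        self.directory = directory          # logical path -> (host, physical name)
        self.data_servers = data_servers    # host -> {physical name: bytes}
        # Assumed placement policy: rotate through the data servers in name order.
        self._next_host = itertools.cycle(sorted(data_servers))
        self._counter = itertools.count()

    def write(self, path, data):
        if path not in self.directory:
            host = next(self._next_host)
            self.directory[path] = (host, f"file{next(self._counter)}")
        host, name = self.directory[path]
        self.data_servers[host][name] = data

    def read(self, path):
        host, name = self.directory[path]       # follow the pointer
        return self.data_servers[host][name]    # then fetch from the data server
```

The point of the design is that the server holding pointers handles only small name lookups, while bulk data moves directly between the client and whichever file server stores the file.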

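18 DSDB: Dist. Shared Database [Diagram: an appl uses an adapter to send insert and query requests to a database server that holds the file index. On insert, the database server prepares and creates the file on a file server; after a query, the adapter accesses the file directly on the file servers.]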

19 DSDB Authentication [Diagram: the file server's top-level ACL is hostname:database.infn.it V(RWLA). When the appl asks the database server to insert a file for /O=INFN/CN=Mazzanti, the database server calls mkdir, which creates a directory with the ACL hostname:database.infn.it RWLA, then calls setacl to add globus:/O=INFN/CN=Mazzanti RWL. The appl's adapter then transfers the data for file.dat to the file server.]

20 Adapter - Parrot Like an OS Kernel –Tracks procs, files, etc. –Adds new capabilities. –Enforces owner’s policies. Delegated Syscalls –Trapped via ptrace interface. –Action taken by Parrot. –Resources charged to Parrot. Research Platform –Distributed file systems. –Grid appl. environments. –Debugging. –Easier than OS coding! [Diagram: processes such as tcsh, cat, and vi run above Parrot, which traps their system calls through the ptrace interface and keeps its own file table and process table, acting as an enhanced operating system.]

21 [Diagram: file system and file server.]

22 Prototype Storage in Computer Science Dept - Office Workstations - Instructional Labs - Research Clusters - Storage Bricks Each Owner Controls Local Storage - Access Control List - Evicts Users if Needed. - Collaborate Offsite

23 Demo Time!

24 Outline Why is Sharing Data so Hard? Tactical Storage Systems –File Servers, Abstractions, Adapters Performance Comparison Application: High-Energy Physics Application: Bioinformatics Database Conclusion

25 Performance Considerations Nothing comes for free! –System calls: order of magnitude slower. –Memory bandwidth overhead: extra copies. Compared to NFS: –TSS slightly better on small operations. –TSS much better in network bandwidth. On real applications: –Measurable slowdown –Benefit: far more flexible and scalable.

26 Performance – System Calls

27 Performance - Applications parrot only

28 Performance – I/O Calls

29 Performance – Bandwidth

30 Performance – DSFS

31 Performance Conclusion TSS has measurable slowdown. TSS is comparable to NFS. TSS can create scalable, parallel filesys. To do better, must modify kernel.

32 Outline Why is Sharing Data so Hard? Tactical Storage Systems –File Servers, Abstractions, Adapters Performance Comparison Application: High-Energy Physics Application: Bioinformatics Database Conclusion

33 Application: High-Energy Physics SP5 Monte Carlo Simulation –Component of BaBar at SLAC –Collaboration with Sander Klous at NIKHEF Difficult to Deploy on a Grid –Complex Software Structure –Custom Shared Libraries –Objectivity Database –(Similar Difficulties with Other Applications)

34 SP5 on a Standalone Machine [Diagram: a manually started sp5 application, linked with libobjy, performs file system operations on local scripts and data and database lock operations against a lock server.]

35 Ideal SP5 Deployment [Diagram: many sp5 processes, each linked with libobjy, share one set of scripts, data, and a lock server through file system ops and database lock ops.]

36 SP5 with Tactical Storage [Diagram: many sp5 processes, each with libobjy and an adapter, reach the scripts, data, and lock server through a file server authenticated with GSI; file system ops and database lock ops pass through the adapters.]

37 Performance on EDG Testbed
Setup     Time to Init    Time/Event
Unix      446 +/- 46      64s
LAN/NFS   4464 +/- 172    113s
LAN/TSS   4505 +/- 155    113s
WAN/TSS   6275 +/- 330    88s

38 Thoughts on SP5 + TSS “With this project we have shown that computer scientists can solve the complications of grid computing and physicists can just use it.” “The most important issue is: Who has to do the work?”

39 Outline Why is Sharing Data so Hard? Tactical Storage Systems –File Servers, Abstractions, Adapters Performance Comparison Application: High-Energy Physics Application: Bioinformatics Database Conclusion

40 Application: Molecular Dynamics Researchers in MD are much like HEP: –Long running simulations, explore space. –Collaborating/competing on similar simulations. –“What parameters have I explored?” –“How can I share results with friends?” –“Replicate these data for safety.” GEMS: Grid Enabled Molecular Sims –Distributed database for MD simulations at Notre Dame. –Collaborators: Dr. Jesus Izaguirre, Dr. Aaron Striegel

41 GEMS Distributed Database [Diagram: a database server maps XML metadata records to the locations of their data, e.g. XML -> host1:fileA, host7:fileB, host3:fileC and XML -> host6:fileX, host2:fileY, host5:fileZ, with files A, B, C and X, Y, Z spread across file servers that report to catalog servers. A query of the form XML + Temp>300K, Mol==CH4 returns locations such as host5:fileZ and host6:fileX.]
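
To make the metadata-to-location mapping in the GEMS diagram concrete, here is a hedged Python sketch. The `MetadataDatabase` class and its methods are hypothetical, the attribute names `Temp` and `Mol` come from the slide's query, and the attribute values attached to each record are invented purely for illustration.

```python
import operator

# Comparison operators accepted in a query condition.
OPS = {"==": operator.eq, ">": operator.gt, "<": operator.lt}


class MetadataDatabase:
    """Toy index from metadata records to "host:file" replica locations."""

    def __init__(self):
        self.records = []  # list of (metadata dict, list of replica locations)

    def insert(self, metadata, replicas):
        self.records.append((dict(metadata), list(replicas)))

    def query(self, *conditions):
        """Return replica locations of records that satisfy every condition.

        Each condition is a (key, op, value) triple, e.g. ("Temp", ">", 300).
        """
        hits = []
        for metadata, replicas in self.records:
            if all(
                key in metadata and OPS[op](metadata[key], value)
                for key, op, value in conditions
            ):
                hits.extend(replicas)
        return hits


db = MetadataDatabase()
# Illustrative values only; the slide does not give the records' attributes.
db.insert({"Temp": 280, "Mol": "CH4"}, ["host1:fileA", "host7:fileB", "host3:fileC"])
db.insert({"Temp": 310, "Mol": "CH4"}, ["host6:fileX", "host2:fileY", "host5:fileZ"])
locations = db.query(("Temp", ">", 300), ("Mol", "==", "CH4"))
```

A client would then pick any of the returned locations and read the file directly from that file server, so the database holds only the index while the data stays on the storage nodes.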

42 GEMS and Tactical Storage Dynamic System Configuration –Add/remove servers, discovered via catalog Policy Control in File Servers –Groups can Collaborate within Constraints –Security Implemented within File Servers Direct Access via Adapters –Unmodified Simulations can use Database

43 Survivability

44 Outline Why is Sharing Data so Hard? Tactical Storage Systems –File Servers, Abstractions, Adapters Performance Comparison Application: High-Energy Physics Application: Bioinformatics Database Conclusion

45 Tactical Storage Systems Separate Abstractions from Resources Components: –File servers, abstractions, adapters. –Completely user level. –Performance acceptable for real applications. Independent but Cooperating Components –Owners of file servers set policy. –Users must work within policies. –Large numbers of users: V right.

46 Future Work More powerful abstractions –Striping, replicating, indexing, searching. More fine grained control of storage –Allocation, accounting, and management of bandwidth and storage space. Applications and Deployment

47 Tactical Storage Systems put power in the hands of the users, not administrators!

48 Collaborators NIKHEF and Vrije University –Sander Klous University of Notre Dame –Aaron Striegel, Jesus Izaguirre Hard working students: –Justin Wozniak, Paul Brenner –Paul Madrid, Chris Moretti

49 Publications Tactical Storage Systems –UND CSE Dept Tech Report 2005-07, May 2005. Transparent Access to Grid Resources for User Software –Accepted to Concurrency and Computation: Practice and Experience, 2005. Gluttony and Generosity in GEMS: Grid Enabled Molecular Storage –High Performance Distributed Comp, 2005. Parrot: Transparent User-Level Middleware for Data-Intensive Computing –Workshop on Adaptive Grid Middleware, 2003.

50 For more information... Cooperative Computing Lab http://www.cse.nd.edu/~ccl Cooperative Computing Tools http://www.cctools.org Douglas Thain –dthain@cse.nd.edu –http://www.cse.nd.edu/~dthain

