1
Thor: a Fast, Distributed, Persistent Object System
Andrew Myers CS 632—Advanced Database Systems 22 Feb 01
2
Persistence
Question: what is the right programming model for accessing persistent data? What is the right data model?
Current persistent data:
- Much data stored in relational databases
- Much less data stored in object databases
- Lots of data in flat files in file systems (every Windows machine in the world!)
- Structure of data sometimes encoded in directory structure (& relations)
- More often implicit in application code
3
Structuring Persistent Data
- Huge amounts of data are going into digital formats (e.g., digital libraries)
- Defining suitable models for persistent data is important -- the earlier the better
- Models must be flexible, extensible
- Should support safe sharing of data across applications, across a distributed computing environment
- Good performance also important
4
Impedance mismatch
- Problem: popular persistent data formats don’t look much like popular programming language models
- Persistent data: no pointers, no object identity, weak referential integrity, no garbage collection, no type checking
- Important only for volatile data?
5
Effect on Programs
- Program reads “file” of persistent data
- Creates convenient volatile in-memory data structures using parsing routines
- Data manipulated in volatile form
- Explicitly “saved” by converting back to persistent format (unparsing)
- Extra parsing & unparsing code with no support for correctness
- No fine-grained, concurrent sharing
- No pointers, no garbage collection
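The read, parse, manipulate, unparse, save cycle listed above might look like the following minimal sketch. The flat format (one `name,quantity` record per line) and the names `Part`, `parse_parts`, and `unparse_parts` are assumptions invented for illustration; they do not come from the slides.

```python
from dataclasses import dataclass


@dataclass
class Part:
    # Hypothetical record type, used only for this illustration.
    name: str
    quantity: int


def parse_parts(text):
    """Parsing: build volatile in-memory objects from the flat persistent format."""
    parts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, quantity = line.split(",")
        parts[name] = Part(name, int(quantity))
    return parts


def unparse_parts(parts):
    """Unparsing: convert the volatile objects back to the flat persistent format."""
    return "".join(f"{p.name},{p.quantity}\n" for p in parts.values())


# Stand-in for the contents of a file on disk.
persistent = "bolt,10\nnut,25\n"

parts = parse_parts(persistent)       # read + parse
parts["bolt"].quantity -= 3           # manipulate in volatile form
persistent = unparse_parts(parts)     # explicit "save"
```

Nothing in the language checks that `parse_parts` and `unparse_parts` remain inverses of each other as the format evolves, which is one way to read the slide's "no support for correctness".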
6
Orthogonal Persistence
- Idea: write application in any language you like (e.g., Java)
- Objects manipulated by the program transparently persistent or volatile
- Persistence defined by reachability from root, not by type or explicit annotation
- Result: persistence for free; low-cost software development; more robust code
7
Thor
- Provides standard single-machine programming model, but supports distributed persistent data transparently
- Persistent objects with semantics:
  - rich type system (Java+)
  - referential integrity
  - garbage collection
- Distributed storage + caching
- Sequential consistency -- hides concurrent access, failures
- Heterogeneous language support
8
Thor architecture
- Front ends (FEs) do computation, cache objects, provide application interface to persistence
- Object repository (OR) provides persistent storage of objects
[Diagram: three Clients, three FEs, three ORs]
9
Programming model
- Each FE caches part of object universe
- Objects automatically fetched as needed
- 64-bit persistent object ids; 32-bit in-memory ptrs
- Safe languages supported: Java, Theta
[Diagram: FE (2^31 objects); 2^64 objects]
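The "fetched as needed" behaviour can be sketched as a cache keyed by persistent object id that goes to the repository only on a miss. This is a toy model under stated assumptions: the class names `ObjectRepository` and `FrontEnd`, the dict-based object layout, and the per-object fetch granularity are all invented here, and the sketch does not model 32-bit in-memory pointers.

```python
class ObjectRepository:
    """Stand-in for an OR: maps persistent object ids to stored fields."""

    def __init__(self, objects):
        self.objects = objects
        self.fetches = 0  # how many times an FE had to ask the OR

    def fetch(self, oid):
        self.fetches += 1
        return dict(self.objects[oid])  # a copy is shipped to the FE


class FrontEnd:
    """Stand-in for an FE: caches part of the object universe."""

    def __init__(self, repo):
        self.repo = repo
        self.cache = {}  # oid -> cached copy

    def get(self, oid):
        # Fetch only on a cache miss; later accesses are served locally.
        if oid not in self.cache:
            self.cache[oid] = self.repo.fetch(oid)
        return self.cache[oid]

    def follow(self, oid, field):
        """Dereference a field that holds an oid, fetching the target if needed."""
        return self.get(self.get(oid)[field])


# Hypothetical 64-bit object ids.
ROOT_OID = 0x0000000100000001
CHILD_OID = 0x0000000100000002

repo = ObjectRepository({ROOT_OID: {"child": CHILD_OID}, CHILD_OID: {"value": 42}})
fe = FrontEnd(repo)
first = fe.follow(ROOT_OID, "child")   # two misses: root, then child
second = fe.follow(ROOT_OID, "child")  # both served from the FE cache
```

The application code calling `follow` never mentions fetching; the miss handling is hidden inside `get`, which is the transparency the slide describes.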
10
Veneers
- Applications may be in unsafe language (C, C++)
- Object operations invoked via veneer
  - automatically generated stubs for app language
- Reflective object system
  - objects point to their own implementations
  - impls are objects in OR
  - can discover interfaces dynamically
[Diagram: Client (unsafe lang.) with Veneer; shared-memory pipe; FE (safe lang.); ORs]
11
Object References
[Diagram: two Clients, each with an FE holding cached copies; surrogate object (node marking); unswizzled pointer (edge marking); two ORs holding persistent objects; intra-node reference (32 bit); inter-node reference (64 bit via forwarding obj)]
12
Transactions
- Computation at FE is broken up into transactions separated by checkpoints
- Transaction is committed atomically to participating ORs via two-phase commit
[Diagram: Client, FE, three ORs]
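The atomic commit across several ORs can be illustrated with a textbook two-phase commit. This is a minimal in-process sketch, not Thor's protocol: the `Participant` class and `two_phase_commit` function are hypothetical names, and logging, timeouts, and coordinator recovery are left out entirely.

```python
class Participant:
    """Stand-in for an OR taking part in a commit."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulated vote
        self.state = "idle"
        self.data = {}
        self.pending = None

    def prepare(self, writes):
        # Phase 1: hold the writes tentatively and vote.
        if self.can_commit:
            self.pending = writes
            self.state = "prepared"
            return True
        self.state = "aborted"
        return False

    def commit(self):
        # Phase 2 (all voted yes): make the tentative writes visible.
        self.data.update(self.pending)
        self.pending = None
        self.state = "committed"

    def abort(self):
        # Phase 2 (someone voted no): discard the tentative writes.
        self.pending = None
        self.state = "aborted"


def two_phase_commit(work):
    """work is a list of (participant, writes) pairs; returns True on commit."""
    votes = [participant.prepare(writes) for participant, writes in work]
    if all(votes):
        for participant, _ in work:
            participant.commit()
        return True
    for participant, _ in work:
        participant.abort()
    return False
```

With two willing participants the writes appear at both; if any one votes no, none of the writes become visible anywhere, which is the all-or-nothing property the slide relies on.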
13
Persistence by Reachability
- An OR has a root object
  - always persistent
  - always reachable
  - a light-weight directory
- Any object reachable from root becomes persistent at transaction commit
- No explicit declaration of persistence needed
- No type distinction between persistent and volatile objects: orthogonal persistence
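Persistence by reachability amounts to a graph traversal from the root at commit time. The sketch below assumes a hypothetical `Obj` class with a `refs` list and models "becoming persistent" as adding the object's name to a set; neither detail comes from Thor.

```python
class Obj:
    """Hypothetical object with outgoing references; no 'persistent' flag or subtype."""

    def __init__(self, name):
        self.name = name
        self.refs = []


def reachable_from(root):
    """Return every object reachable from root (cycles are handled)."""
    seen = set()
    stack = [root]
    found = []
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        found.append(obj)
        stack.extend(obj.refs)
    return found


def commit(root, persistent_names):
    """At commit, everything reachable from the root is made persistent."""
    for obj in reachable_from(root):
        persistent_names.add(obj.name)


root, a, b, scratch = Obj("root"), Obj("a"), Obj("b"), Obj("scratch")
root.refs.append(a)
a.refs.extend([b, root])   # includes a cycle back to the root
scratch.refs.append(a)     # scratch points in, but nothing points to it

persistent_names = set()
commit(root, persistent_names)
```

`scratch` stays volatile even though it has the same type as the other objects, because nothing reachable from the root refers to it.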
14
Convenient programming model, strong semantic guarantees, and high performance too?
15
Performance
- Performance comparison: OO7 benchmark
  - Most generally-accepted object-oriented database benchmark
  - Similar to a CAD database -- good model
  - mixture of very small and large objects (4W-32K)
  - various recursive traversals (w/ & w/o modification) of complex pointer structure
  - must run in a fixed amount of memory (so that only fraction of database can fit in memory)
16
Implementation options
- Relational database
- Conventional file system with read/write
- Conventional file system with memory-mapped files
- Object-oriented database
- Distributed object-oriented database (Thor)
17
Using Relational Database
- Problem: relational databases don’t implement pointers (object references) efficiently
- Must introduce extra keys, use index to find appropriate records: extra storage, locality problems
[Diagram label: 15 levels]
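To make the "extra keys plus index" point concrete, here is a small sketch using Python's built-in `sqlite3` module. The schema (`part`, `connection`) and the function `reachable_names` are assumptions for illustration, not the OO7 schema used in the talk. Every pointer hop becomes a key lookup through an index rather than a direct memory reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Objects become rows identified by an extra key column.
conn.execute("CREATE TABLE part (id INTEGER PRIMARY KEY, name TEXT)")
# Each object reference becomes a row pairing two keys.
conn.execute("CREATE TABLE connection (src INTEGER, dst INTEGER)")
# An index is needed so that following a reference is not a full table scan.
conn.execute("CREATE INDEX connection_src ON connection (src)")

conn.executemany("INSERT INTO part VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
conn.executemany("INSERT INTO connection VALUES (?, ?)", [(1, 2), (2, 3)])


def reachable_names(start_id):
    """Traverse the 'pointer' structure: one or two queries per object visited."""
    names, frontier, seen = [], [start_id], set()
    while frontier:
        part_id = frontier.pop()
        if part_id in seen:
            continue
        seen.add(part_id)
        (name,) = conn.execute(
            "SELECT name FROM part WHERE id = ?", (part_id,)
        ).fetchone()
        names.append(name)
        rows = conn.execute(
            "SELECT dst FROM connection WHERE src = ?", (part_id,)
        )
        frontier.extend(dst for (dst,) in rows)
    return names
```

The `connection` table and its index are the extra storage the slide mentions, and related rows need not sit near each other on disk, which is where the locality problems come from.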
18
Memory-mapped files
- Memory-mapped files (mmap) avoid data duplication between application and OS file buffer cache
- Buffer cache memory mapped directly into application VM
- Conventional file I/O uses twice the memory; can cache only half as much of persistent data in memory
[Diagram: Application (volatile data) and OS kernel (buffer cache)]
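For readers who have not used memory-mapped files, the sketch below updates a value in a file in place through Python's standard `mmap` module, without a separate read into an application buffer. The file layout (a single little-endian 64-bit counter) and the function names are assumptions chosen for the example.

```python
import mmap
import os
import struct
import tempfile


def increment_counter_in_file(path):
    """Map the file into memory and update its first 8 bytes in place."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as view:  # length 0 maps the whole file
            (value,) = struct.unpack_from("<q", view, 0)
            struct.pack_into("<q", view, 0, value + 1)
            view.flush()  # ask the OS to write the dirty page back
            return value + 1


def demo():
    """Create a temporary file holding 41, increment it via mmap, re-read it."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(struct.pack("<q", 41))
        new_value = increment_counter_in_file(path)
        with open(path, "rb") as f:
            (on_disk,) = struct.unpack("<q", f.read(8))
        return new_value, on_disk
    finally:
        os.remove(path)
```

The application reads and writes the mapped bytes as if they were ordinary memory; the operating system's cached pages are the only copy of the data, which is the duplication saving the slide describes.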
19
Relative Performance for OO7
[Diagram labels, in extraction order, with a question mark: Non-distributed object databases; Memory-mapped files; Thor; Simple file I/O; Object-relational databases; Relational databases]
20
Relative Performance
- Object data in OO7 does not fit in memory
- Fetches of persistent data into memory dominate performance
- System with fewest fetches wins
21
OO7 in C++, memory-mapped
- C++/OS application implementing OO7 benchmark
- Objects in memory-mapped file
- close() on file flushes memory to disk
- Weak semantic guarantees:
  - no concurrency control
  - no array bounds checks
  - no support for failure during write
22
Traversals
- Sparse vs. dense traversals
  - dense traversals use every page of disk storage effectively (unrealistic) (91%)
  - sparse traversal only touches a few objects on each page (3%)
  - Realistic bound [TN92]: 15-41% hit rate per page
- Read-only vs. read-write traversals
  - read-write traversals accumulate changes that must be written back to disk
23
Thor vs. C++/mmap (dense)
[Chart: time in seconds (axis 50 to 200) for traversals T2a and T2b; labels: C++/mmap, 18MB]
24
Dense read-only traversal
[Chart: time in seconds (axis 50 to 200) vs. FE cache size in MB (axis 10 to 50) for C++/mmap and Thor; annotations: 25% speedup, 15× speedup, 40% slowdown]
25
Other traversals
- C++/OS does best on unrealistically dense traversals
- Sparse traversals: Thor has up to 1000× relative performance
- C++ NFS server was given much more memory than Thor OR server (137MB vs. 36MB)
26
Conclusion
File systems are obsolete -- they provide sub-optimal performance and an even worse interface for programmers to write applications
27
Thor vs. Quickstore
- Quickstore (commercial object-oriented database) has best published performance results for any OODB
- Not a distributed system
- Built on memory-mapped files -- uses page-based memory management
28
Results
[Table: number of fetches, values given in k; columns: sparse, dense; rows: Thor, Quickstore]
- Thor has 21-25% fewer fetches
- No Quickstore results for medium-sized traversals; even more advantageous for Thor
- Conclusion: object caching beats page caching
29
Front end features
- Object storage managed by Hybrid Adaptive Caching (HAC) algorithm
- CLOCC optimistic concurrency control algorithm provides sequential consistency, best performance
- Techniques may be applicable to more conventional databases
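The general shape of optimistic concurrency control (run without locks, validate at commit, abort on conflict) can be sketched as follows. This is deliberately not CLOCC: the sketch swaps in plain per-object version numbers checked at a single server, and the names `Server` and `Transaction` are hypothetical.

```python
class Server:
    """Holds committed state as key -> (value, version)."""

    def __init__(self, values):
        self.store = {key: (value, 0) for key, value in values.items()}

    def read(self, key):
        return self.store[key]

    def validate_and_commit(self, read_versions, writes):
        # Validation: every object this transaction read must be unchanged.
        for key, version in read_versions.items():
            if self.store[key][1] != version:
                return False  # conflict: the transaction must abort
        # Install the writes, bumping each object's version.
        for key, value in writes.items():
            _, version = self.store.get(key, (None, -1))
            self.store[key] = (value, version + 1)
        return True


class Transaction:
    """Runs at the client without locks; conflicts surface only at commit."""

    def __init__(self, server):
        self.server = server
        self.read_versions = {}  # key -> version observed on first read
        self.writes = {}         # key -> tentative new value

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        value, version = self.server.read(key)
        self.read_versions.setdefault(key, version)
        return value

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        return self.server.validate_and_commit(self.read_versions, self.writes)
```

If two transactions both read `x` and then write it, the first to commit succeeds and the second fails validation because the version it read is stale; it would then have to be re-run against fresh data.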
30
Object repository
- Server cache speeds up client fetches
- Modified Object Buffer (MOB)
  - keeps track of object mods separately from cache
  - defers writes until necessary
  - reduces installation reads, allows write absorption
[Diagram: Server with page cache (read), log (commit, abort), flusher, MOB]
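Write absorption and deferred writes can be illustrated with a small buffer that keeps only the latest committed value per object and installs modifications page by page when flushed. This is a toy model under assumptions: the class name `ModifiedObjectBuffer`, the dict-of-dicts "disk", and the capacity trigger are invented here, and the stable log that would make deferral safe is not modelled.

```python
class ModifiedObjectBuffer:
    """Toy MOB: buffers committed object modifications apart from the page cache."""

    def __init__(self, disk_pages, capacity):
        self.disk_pages = disk_pages  # page_id -> {oid: value}
        self.capacity = capacity      # max buffered modifications before a flush
        self.pending = {}             # (page_id, oid) -> latest committed value
        self.page_writes = 0

    def commit(self, page_id, oid, value):
        # A later modification overwrites an earlier one: write absorption.
        self.pending[(page_id, oid)] = value
        if len(self.pending) > self.capacity:
            self.flush()  # writes are deferred until the buffer fills

    def read(self, page_id, oid):
        # The newest committed value may still be in the buffer, not on disk.
        key = (page_id, oid)
        if key in self.pending:
            return self.pending[key]
        return self.disk_pages[page_id][oid]

    def flush(self):
        # Group modifications so each affected page is written only once.
        by_page = {}
        for (page_id, oid), value in self.pending.items():
            by_page.setdefault(page_id, {})[oid] = value
        for page_id, mods in by_page.items():
            self.disk_pages.setdefault(page_id, {}).update(mods)
            self.page_writes += 1
        self.pending.clear()
```

Three commits to the same object followed by one commit to a neighbour on the same page cost a single page write at flush time instead of four.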
31
More OR features
- Replicated ORs (log stability via replication)
- Referential integrity
- Object mobility (multiple oids per object)
  - supported through OR surrogate objects, lazy forwarding
  - no centralized location service
- Distributed GC algorithm collects cycles efficiently
[Diagram: FE, three ORs]
32
Other Issues
- Queries
  - not directly supported in standard PLs or in Thor
  - can be coded using conventional data structures, but can high-performance queries be achieved?
  - may require moving code to data (function shipping); Thor model is data shipping
  - relational databases not obsolete
- Schema evolution: how to handle changes to software and data objects?
- Disconnected operation/long transactions
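The difference between data shipping and function shipping for a query can be shown by counting how many records cross the network in each case. The `Server` class and the two query functions below are hypothetical, and a record count stands in for real transfer cost.

```python
class Server:
    """Stand-in for a storage server that counts records sent to clients."""

    def __init__(self, records):
        self.records = records
        self.records_sent = 0

    def fetch_all(self):
        # Data shipping: the data moves to the code.
        self.records_sent += len(self.records)
        return list(self.records)

    def run_query(self, predicate):
        # Function shipping: the code moves to the data; only matches return.
        matches = [r for r in self.records if predicate(r)]
        self.records_sent += len(matches)
        return matches


def query_by_data_shipping(server, predicate):
    """Client pulls everything and filters locally."""
    return [r for r in server.fetch_all() if predicate(r)]


def query_by_function_shipping(server, predicate):
    """Client sends the predicate; the server filters."""
    return server.run_query(predicate)
```

For a selective predicate over many records, both approaches return the same answer, but data shipping transfers every record while function shipping transfers only the matches, which is why the slide raises function shipping for high-performance queries.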
33
Reading
- Providing Persistent Objects in Distributed Systems (ECOOP ’99)
- Hybrid Adaptive Caching for Distributed Storage Systems (SOSP ’97)
- Safe and Efficient Sharing of Persistent Objects in Thor (SIGMOD ’96)
- The Language-Independent Interface of the Thor Persistent Object System, in Object-Oriented Multidatabase Systems