The Spensa File System
Douglas Santry
Computer Laboratory, University of Cambridge
Target Environment
- “Lots” of physical machines in a machine room
- Physical machines interconnected by a “high” quality network
- Machines are cheap and stuffed with “large” ATA disk drives
What are they doing?
- Machines are running virtual machines (Xen or VMware)
- Virtual machines are mobile; that is, they migrate between physical machines
- There is very little explicit file sharing between virtual machines
- Candidates include corporate data centres, “service” providers, e-commerce sites
Challenges
- Data availability and reliability
- Load balancing and performance tuning
- Service differentiation and guarantees
- Location transparency: virtual machines and data need to move transparently to one another
- ATA disks are cheap: they WILL fail
Spensa Features
- Service differentiation
- Service guarantees
- Service isolation
- Automatic load balancing
- Automatic performance tuning
Spensa: A Distributed File System
- Two components: a client file system and a server
- Servers store opaque objects; they have no notion of file systems
- The client file system is backed by objects on the servers and offers the traditional file system hierarchy and name space
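The split between client and server can be pictured with a minimal sketch. Everything named here (`ObjectServer`, `ClientFS`, integer object ids, an in-memory path table) is a hypothetical illustration, not Spensa's actual interface: the server only stores and returns opaque bytes, while the client alone knows about paths and directories.

```python
class ObjectServer:
    """Stores opaque byte objects keyed by object id; knows nothing about files."""

    def __init__(self):
        self._objects = {}

    def put(self, oid, data):
        self._objects[oid] = bytes(data)

    def get(self, oid):
        return self._objects[oid]


class ClientFS:
    """Maps a hierarchical name space onto object ids held by a server."""

    def __init__(self, server):
        self._server = server
        self._names = {}      # path -> object id (assumed layout, for illustration)
        self._next_oid = 0

    def write_file(self, path, data):
        oid = self._names.get(path)
        if oid is None:
            # First write to this path: allocate a fresh object id.
            oid = self._next_oid
            self._next_oid += 1
            self._names[path] = oid
        self._server.put(oid, data)

    def read_file(self, path):
        return self._server.get(self._names[path])

    def listdir(self, directory):
        # Directory listing is computed entirely on the client side.
        prefix = directory.rstrip("/") + "/"
        return sorted({p[len(prefix):].split("/")[0]
                       for p in self._names if p.startswith(prefix)})


fs = ClientFS(ObjectServer())
fs.write_file("/usr/bin/ls", b"binary")
fs.write_file("/home/foo/notes", b"hello")
print(fs.listdir("/"))
print(fs.read_file("/home/foo/notes"))
```

The point of the sketch is that `ObjectServer` never sees a path; moving or replicating objects would therefore not require any file system knowledge on the server.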
An instance of a Spensa (Name: foo)
[Figure: the file system tree of foo (/, usr, home, mnt) drawn over Machine A, Machine B and Machine C, labelled “Foo’s bascauda”]
- Spensa operates on objects
[Figure: three machines labelled Bascauda A, Bascauda B and Bascauda C, with VMs labelled Mounted Spensa B, Mounted Spensa C and Mounted Spensa A]
Spensa continued
- Every physical machine runs application virtual machines and a Spensa server
- Spensa servers run inside dedicated virtual machines, one per physical machine
Reliability and Availability
- Replication
- At 50 cents/GB one can be free with it
- Replication factor is specified on a per-Spensa basis
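As a rough illustration of a per-Spensa replication factor, the sketch below picks that many distinct machines for each object. The placement rule (object id modulo the machine count, then the next machines in order) and the name `choose_replicas` are assumptions made for the example; the slides do not say how Spensa places replicas.

```python
def choose_replicas(object_id, machines, factor):
    """Return `factor` distinct machines to hold copies of one object.

    Placement rule is an illustrative assumption, not Spensa's policy.
    """
    if not 1 <= factor <= len(machines):
        raise ValueError("replication factor must be between 1 and the machine count")
    start = object_id % len(machines)
    return [machines[(start + i) % len(machines)] for i in range(factor)]


machines = ["A", "B", "C"]
print(choose_replicas(4, machines, factor=2))
print(choose_replicas(2, machines, factor=3))
```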
Reading Replicas
- The Spensa client broadcasts a request for data to all copies of it
- The first machine to fetch it answers and cancels the fetch on its peers
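The read path described above can be sketched with threads standing in for replica servers. This is only a simulation under stated assumptions: `fetch` fakes disk latency with a sleep, and a shared `threading.Event` plays the role of the cancel message sent to peers; neither is Spensa's real protocol.

```python
import concurrent.futures
import threading
import time


def fetch(replica_name, delay_s, cancelled):
    """Pretend to read from disk; give up early if a peer already answered."""
    deadline = time.monotonic() + delay_s
    while time.monotonic() < deadline:
        if cancelled.is_set():
            return None          # fetch cancelled by the winning replica
        time.sleep(0.005)
    return replica_name, b"data"


def read_from_replicas(replicas):
    """Send the request to every replica and return the first answer."""
    cancelled = threading.Event()
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        futures = [pool.submit(fetch, name, delay, cancelled)
                   for name, delay in replicas.items()]
        for fut in concurrent.futures.as_completed(futures):
            result = fut.result()
            if result is not None:
                cancelled.set()  # tell the slower peers to stop fetching
                return result
    raise RuntimeError("no replica answered")


# Hypothetical per-replica latencies in seconds.
winner, data = read_from_replicas({"A": 0.3, "B": 0.02, "C": 0.3})
print(winner, data)
```

Broadcasting trades extra disk and network work for latency: the client sees the response time of the least loaded replica, and cancellation limits the wasted work on the others.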
Caching
- Servers reside in virtual machines alongside all of the other virtual machines, so memory is critical
- Servers do not cache client data
- Servers cache path-critical metadata to minimize latency (the backing file system’s inodes, bitmaps, etc.)
Service
- Service can be specified in terms of time or bandwidth
- Time is specified as a percentage
- Bandwidth is specified in KB/s
- Latency is specified in milliseconds
- A server is configured for either time or bandwidth; they are mutually exclusive
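A service specification with these units could be modelled as follows. The class and field names (`ServiceSpec`, `Server`, `time_percent`, `bandwidth_kb_s`, `latency_ms`) are hypothetical; the sketch only encodes the rules stated on the slide: a specification uses time or bandwidth, never both, and a server accepts only the mode it is configured for.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class ServiceSpec:
    time_percent: Optional[float] = None     # share of server time, in percent
    bandwidth_kb_s: Optional[float] = None   # bandwidth, in KB/s
    latency_ms: Optional[float] = None       # latency bound, in milliseconds

    def __post_init__(self):
        # Exactly one of the two modes must be given.
        if (self.time_percent is None) == (self.bandwidth_kb_s is None):
            raise ValueError("give exactly one of time_percent or bandwidth_kb_s")
        if self.time_percent is not None and not 0 < self.time_percent <= 100:
            raise ValueError("time_percent must be in (0, 100]")
        if self.bandwidth_kb_s is not None and self.bandwidth_kb_s <= 0:
            raise ValueError("bandwidth_kb_s must be positive")
        if self.latency_ms is not None and self.latency_ms <= 0:
            raise ValueError("latency_ms must be positive")

    @property
    def mode(self) -> str:
        return "time" if self.time_percent is not None else "bandwidth"


@dataclass
class Server:
    mode: str                                 # "time" or "bandwidth"
    admitted: Dict[str, ServiceSpec] = field(default_factory=dict)

    def admit(self, spensa_name: str, spec: ServiceSpec) -> None:
        if spec.mode != self.mode:
            raise ValueError(f"server is configured for {self.mode}, not {spec.mode}")
        self.admitted[spensa_name] = spec


server = Server(mode="bandwidth")
server.admit("foo", ServiceSpec(bandwidth_kb_s=512, latency_ms=20))
print(server.admitted["foo"])
```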
Service Continued
- Enforcement is distributed: there are no centralised or interposed enforcement machines or mechanisms
- Bandwidth seems to be more intuitive for humans to specify
- Bandwidth offers tighter short-term control
Load Balancing
- There are too many machines (real and virtual) for a human to make provisioning decisions, so Spensa auto-provisions
- Load balancing mitigates poor decisions
- Virtual diffusion with direct migration
Diffusion
- Bascaudae need to be decomposed for partial migration
- Bascaudae are decomposed in the object name space (the servers have no knowledge of the file system’s name space)
- Traffic is not Poisson, so use the real distribution
- Servers keep a per-bascauda load and address reference histogram
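One way to picture the per-bascauda histogram is a fixed set of buckets over the object id range, with a split point read off the measured counts rather than a modelled distribution. The class `ReferenceHistogram`, the equal-width bucketing and the `split_point` rule are illustrative assumptions; the slides do not describe the actual data structure.

```python
class ReferenceHistogram:
    """Counts references per range of object ids for one bascauda."""

    def __init__(self, object_id_limit, buckets=8):
        # Object ids are assumed to lie in [0, object_id_limit).
        self.bucket_width = -(-object_id_limit // buckets)  # ceiling division
        self.counts = [0] * buckets

    def record(self, object_id):
        self.counts[object_id // self.bucket_width] += 1

    def split_point(self, fraction):
        """Smallest bucket boundary whose lower range carries >= fraction of the load."""
        total = sum(self.counts)
        if total == 0:
            raise ValueError("no references recorded")
        target = fraction * total
        running = 0
        for i, count in enumerate(self.counts):
            running += count
            if running >= target:
                return (i + 1) * self.bucket_width
        return len(self.counts) * self.bucket_width


hist = ReferenceHistogram(object_id_limit=80, buckets=8)
for oid, hits in [(5, 10), (25, 30), (75, 60)]:   # made-up reference counts
    for _ in range(hits):
        hist.record(oid)

# Objects below the returned id could be migrated as one piece.
print(hist.split_point(0.25))
```

Because the split is computed in the object name space, a server can pick the range to migrate without knowing which files or directories those objects belong to.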