New layout for describing block devices and file systems
Luis Fernando Muñoz Mejías, Universidad Autónoma de Madrid
4th Quattor Workshop (UAM, 2007)


1 New layout for describing block devices and file systems
Luis Fernando Muñoz Mejías, Universidad Autónoma de Madrid
4th Quattor Workshop (UAM, 2007)

2 Outline
➲ Current layout
  ● Limitations
➲ Tim Bell and Andras Horvath's model
➲ New model
➲ Example
➲ Conclusions
➲ What's next

3 Current layout
➲ Oriented to partitions
➲ Mixes partition and file system definitions
➲ Based on the set_partitions function:

    set_partitions(nlist("hda1", nlist("mountpoint", "/",
                                       "size", 10*GB,
                                       "type", "ext3"...

4 Limitations of current layout
➲ Poor control of advanced features (large partitions, tuning options...)
➲ Software RAID is difficult to achieve, at best
➲ It depends on "hda"-like naming
  ● There are other naming schemes, for instance MegaRAID
➲ It's not meant to control hardware RAID

5 Tim Bell and Andras Horvath's model
➲ Separation between file systems and block devices
  ● A file system just references the block device it lies on
  ● Easy to extend
➲ Very natural for humans
  ● Partitions are part of the disk structure, logical volumes are part of the volume group structure...

6 Tim Bell and Andras Horvath's model: file systems

    type filesystem = {
        "preserve" : boolean
        "format" : boolean
        "type" : string
        "block_device" : string    # reference
        "mountpoint" : string
        ...
    };

7 Tim Bell and Andras Horvath's model: disks

    type disk = {
        "partitions" : partition{}    # "natural": partitions are disk members
        "label" ? string              # allows for large partitions
        ...
    };

8 Tim Bell and Andras Horvath's model: hardware RAID

    type hwraid = {
        "partitions" : partition{}
        "label" : string
        "level" : string
        ...
    };

Duplicated from disks!

9 Tim Bell and Andras Horvath's model: LVM and software RAID

    type volume_group = {
        "device_list" : string[]                  # list of references
        "logical_volumes" : logical_volume{}
    };
    type sw_raid = {
        "device_list" : string[]
        "raid_level" : string
    };

Slide callouts: list of references; bi-directional referencing.

10 Tim Bell and Andras Horvath's model: problems
➲ Difficult to implement
➲ What is natural for humans is not the best way for computers
  ● Bi-directional creation is complex, slow and error-prone
  ● The model doesn't let file systems control creation, destruction and modification

11 New model
➲ Based on Tim's and Andras'
➲ Top-down only
➲ File systems control creation, growth and shrinking

12 New model: top-down
File systems and block devices can be modelled as a tree-like structure rooted at the file system:

    File system: /Homer
      Blockdev: LV /dev/Springfield/EvergreenTrc
        Blockdev: VG /dev/Springfield
          Blockdev: partition /dev/sda1
            Blockdev: disk /dev/sda
          Blockdev: partition /dev/sda2
            Blockdev: disk /dev/sda
          Blockdev: disk /dev/ciss/c0d0

Diagram labels: "Tree structure" and "Non-tree structure" (/dev/sda is shared by both partitions).

13 New model: disks and hardware RAID

    type physical_dev = {
        "label" : string          # msdos, gpt, bsd...
        "raid_level" ? string
        "raid_members" ?...
    };

➲ Hardware RAID and disks are merged
➲ Partitions are defined outside their disks

14 New model: partitions

    type partition = {
        "size" ? long             # optional!
        "holding_dev" : string
        "type" : string           # primary, extended, logical
        ...
    };

➲ Partitions reference the disk they lie on
  ● More flexibility in naming schemes
➲ The "grow" flag is gone

15 New model: LVM

    type volume_group = {
        "device_list" : string[]
    };
    type logical_volume = {
        "volume_group" : string
        "size" ? long
    };

➲ Volume groups don't know about the logical volumes they hold
  ● This enforces the top-down approach

16 New model: software RAID

    type md = {
        "raid_level" : string
        "device_list" : string[]    # references to defined block devices
    };

➲ Software RAID can lie on arbitrary devices
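As an illustration of the md type above, a software RAID entry might look like the following. This is only a sketch: the array name "md0", the second disk's partition "sdb1" and the "RAID1" spelling of the level are assumptions, not taken from the slides. The "partitions/..." reference form follows the LVM example on slide 25.

```pan
# Hypothetical RAID1 array built on two partitions.
# "md0", "sdb1" and the value "RAID1" are illustrative assumptions.
"/system/blockdevices/md" = nlist(
    "md0", nlist(
        "raid_level", "RAID1",
        # Each entry is a reference relative to /system/blockdevices
        "device_list", list("partitions/sda1", "partitions/sdb1")
    )
);
```

Because device_list holds plain references, the same array could equally sit on whole disks or other block devices defined elsewhere in the profile.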

17 New model: files

    type file = {
        "size" : string
        "owner" : string
        "group" : string
        "perms" : long
    };

➲ Files can hold file systems with the loopback module

18 New model: file systems

    type filesystem = {
        "mountpoint" : string
        "type" : string
        "tuneopts" ? string
        "preserve" : boolean
        "format" : boolean
        "mount" : boolean
        "freq" : long
        "sync" : long
        "block_device" : string
    };

➲ File systems only reference the block device they lie on
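Since block_device is just a string, nothing in the type above stops it from pointing at a device that was never defined. One possible way to catch that at compile time is a constrained string type, sketched below. The type name blockdev_ref is hypothetical and the slides do not show this check; it assumes references are written relative to /system/blockdevices, as in the example slides.

```pan
# Sketch only: a string that must name an existing entry
# under /system/blockdevices (e.g. "logical_volumes/EvergreenTrc").
# "blockdev_ref" is a hypothetical name, not part of the presented schema.
type blockdev_ref = string with
    exists("/system/blockdevices/" + SELF)
    || error("block device " + SELF + " is not defined under /system/blockdevices");
```

With such a type, "block_device" : blockdev_ref would make a profile fail validation when a file system references a missing logical volume, partition or disk.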

19 New model: tying it all together

    type blockdevices = {
        "physical_devs" ? physical_dev{}
        "partitions" ? partition{}
        "volume_groups" ? volume_group{}
        "logical_volumes" ? logical_volume{}
        "md" ? md{}
        "files" ? file{}
    };
    bind "/system/blockdevices" = blockdevices;
    bind "/system/filesystems" = filesystem[];    # this is a list!

20 Some advice
➲ Don't use extended/logical partitions
  ● Use LVM instead
➲ Always use partitions
  ● Don't place file systems directly on disks; they might get destroyed by Quattor

21 Let's see an example

22 Example: diagram

    File system: /Homer
      Blockdev: LV /dev/Springfield/EvergreenTrc
        Blockdev: VG /dev/Springfield
          Blockdev: partition /dev/sda1
            Blockdev: disk /dev/sda
          Blockdev: partition /dev/sda2
            Blockdev: disk /dev/sda
          Blockdev: disk /dev/ciss/c0d0

Diagram labels: "Tree structure" and "Non-tree structure" (/dev/sda is shared by both partitions).

23 Example
Let's suppose that /dev/sda1 takes 1 GB, /dev/sda2 takes the rest of the disk, and the logical volume Springfield/EvergreenTrc uses its whole volume group.

24 Example: the file system

    "/system/filesystems" = list(
        nlist("mountpoint", "/Homer",
              "preserve", true,
              "format", false,
              "type", "xfs",
              "block_device", "logical_volumes/EvergreenTrc",
              "mount", true
        )
    );

25 Example: the LVM

    "/system/blockdevices/logical_volumes" = nlist(
        # "Springfield" is relative to /system/blockdevices/volume_groups
        "EvergreenTrc", nlist("volume_group", "Springfield")
    );
    "/system/blockdevices/volume_groups" = nlist(
        "Springfield", nlist("device_list", list(
            # entries are relative to /system/blockdevices
            "partitions/sda1",
            "partitions/sda2",
            "physical_devs/" + escape("ciss/c0d0")
        ))
    );

26 Example: the partitions

    "/system/blockdevices/partitions" = nlist(
        # "holding_dev" is relative to /system/blockdevices/physical_devs
        "sda1", nlist("holding_dev", "sda", "size", 1*GB),
        "sda2", nlist("holding_dev", "sda")    # no size: fills the rest of the disk
    );

A primary partition is assumed when no type is given.

27 Example: the disks

    "/system/blockdevices/physical_devs" = nlist(
        "sda", nlist("label", "msdos"),
        escape("ciss/c0d0"), nlist("label", "none")
    );

A PV lies directly on /dev/ciss/c0d0, without partitions, so no disk label must be set for it (hence "none").

28 Conclusion
➲ The new layout is more flexible and easier to extend
➲ Implemented in AII and ncm-filesystems
  ● See next presentations
➲ Temporary path under /software/components/filesystems
➲ Ready to stabilize on /system/...

29 What's next
➲ LVM snapshots
  ● Are they needed?
  ● Are they Quattor business at all?
➲ LVM striping?
➲ Software RAID monitoring?
➲ Quota definition
➲ Other stuff...

30 More information
CERN's twiki on the new layout

