Chap. 10 Indexed Sequential File Access and Prefix B+ Trees
Seoul National University, School of Computer Science and Engineering, Object-Oriented Systems Laboratory (SNU-OOPSLA-LAB)
Prof. Kim Hyoung-Joo (김형주)
File Structures by Folk, Zoellick, and Riccardi

Chapter Objectives
- Introduce indexed sequential files
- Describe operations on a sequence set of blocks that maintains records in order by key
- Show how an index set can be built on top of the sequence set to produce an indexed sequential file structure
- Introduce the use of a B-tree to maintain the index set, thereby introducing B+ trees and simple prefix B+ trees
- Illustrate how the B-tree index in a simple prefix B+ tree can be of variable order, holding a variable number of separators
- Compare the strengths and weaknesses of B+ trees, simple prefix B+ trees, and B-trees

Contents
- 10.1 Indexed Sequential Access
- 10.2 Maintaining a Sequence Set
- 10.3 Adding a Simple Index to the Sequence Set
- 10.4 The Content of the Index: Separators Instead of Keys
- 10.5 The Simple Prefix B+ Tree
- 10.6 Simple Prefix B+ Tree Maintenance
- 10.7 Index Set Block Size
- 10.8 Internal Structure of Index Set Blocks: A Variable-Order B-Tree
- 10.9 Loading a Simple Prefix B+ Tree
- 10.10 B+ Trees
- 10.11 B-Trees, B+ Trees, and Simple Prefix B+ Trees in Perspective

10.1 Indexed Sequential Access
- Two alternative views
  - Indexed: records are indexed by keys
    - not good for sequential processing
  - Sequential: records can be accessed sequentially
    - not good for accessing, inserting, or deleting records in random order
- In Chap. 9 we saw the B-tree; now we want to derive Indexed + Sequential ==> B+ tree with the help of the idea of the sequence set
- Sequential file ==> Indexed sequential file ==> B+ tree
- Indexed sequential file = Indexed Sequential Access Method (ISAM)

Overview: ISAM File
[Figure: Example of an indexed sequential structure (when using overflow chains). Nodes are labeled R and a through i; R is held in main memory and the rest in secondary memory. The data nodes hold part description records with primary key PART # and a PART-Type field.]

Overview: ISAM File (2)
- Compared with an ordered relative file
  - Ordered on a key, like an ordered relative file
  - Can be accessed through an index, a structure that records where a record with a given key is located (usually intermingled with blocks of records)
  - A tree search of the index replaces the binary search of ordered relative files

Indexed Sequential Files
- Block types
  - Index block
  - Primary data block
  - Overflow data block
[Figure: an index block above a row of data blocks, with chains of overflow data blocks attached to data blocks.]

Indexed Sequential Files: Retrieval
1. Retrieve parts_file where part# = 60
  - Primary key search: nodes R, a, b, g accessed
  - 3 primary block accesses, 1 overflow block access
2. Retrieve parts_file where part# = 101 and part_type = C (overqualified)
  - Primary key search: nodes R, a, c, h accessed
  - 3 primary block accesses
- Block "accesses" are really block fetches. The blocks may already be in main-memory buffers, in which case no actual block access is performed.

Indexed Sequential Files: Retrieval (2)
3. Retrieve parts_file where part# = 101 or part_type = C
  - Scan: nodes R, d, e, f, g, h, i accessed
  - 6 primary block "accesses"
  - overflow block "accesses"

[Figure: Retrieval in the indexed sequential structure. The same structure (nodes R and a through i), with the access paths of retrievals 1, 2, and 3 marked.]

Indexed Sequential Files: Insertion
- (Step 1) Locate, via key search, the data-level node in which to insert the record.
- (Step 2) Determine whether the record is to be inserted into the primary block or the overflow chain so that the primary key order of the records is maintained.
- (Step 3a) If the record is to be placed in the primary block and the block is not full, shift all records with higher-valued primary keys to the right and place the new record into the vacated slot. STOP.

Indexed Sequential Files: Insertion (2)
- (Step 3b) If the record is to be placed in the primary block and the block is full, move the record with the highest-valued primary key out of the block so that it becomes the first record on the overflow chain. The primary block is now not full. Go to Step 3a.
- (Step 4) If the record is to be placed on the overflow chain, place it at the appropriate position on the chain so that primary key sequencing is maintained. STOP.

[Figure: Example of insertion. Records 150 D, 180 C, 110 F, and 170 G are inserted into a data node, with the results labeled "yields (step 3a)", "yields (step 4)", "yields (step 3b)", and "yields (step 4)". The primary block passes through the states 120 A, 130 E, 150 D and 110 F, 120 A, 130 E; the overflow chain grows from 180 C to 150 D, 180 C to 150 D, 170 G, 180 C.]
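Steps 2 through 4 of the insertion procedure can be sketched for a single data node as follows. This is a minimal illustration, not the slides' implementation: the `DataNode` class, the capacity of 3, and storing bare keys instead of full records are assumptions, and the key sequence follows the insertion example as far as it can be read off the figure. The sketch also assumes that a key higher than every key in the primary block goes to the overflow chain whenever the chain already exists or the block is full.

```python
import bisect


class DataNode:
    """Hypothetical data-level node: a primary block plus its overflow chain."""

    def __init__(self, capacity, keys=()):
        self.capacity = capacity
        self.primary = sorted(keys)  # keys in the primary block, ascending
        self.overflow = []           # overflow chain, ascending


def insert(node, key):
    # Step 2: decide between primary block and overflow chain.
    beyond_primary = bool(node.primary) and key > node.primary[-1]
    block_full = len(node.primary) >= node.capacity
    if beyond_primary and (node.overflow or block_full):
        # Step 4: keep the overflow chain in primary key order.
        bisect.insort(node.overflow, key)
        return
    if block_full:
        # Step 3b: the highest key moves to the front of the overflow chain.
        node.overflow.insert(0, node.primary.pop())
    # Step 3a: insort shifts the higher keys right and fills the gap.
    bisect.insort(node.primary, key)


node = DataNode(capacity=3, keys=[120, 130])
for part_no in (150, 180, 110, 170):
    insert(node, part_no)
print(node.primary, node.overflow)
```

With these inputs the node ends with 110, 120, 130 in the primary block and 150, 170, 180 on the overflow chain.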

Indexed Sequential Files: Deletion
- (Step 1) Locate the record to delete by primary key search.
- (Step 2) If the record is in the primary block, free its slot and shift all records in the block with higher-valued primary keys to the left. STOP.
- (Step 3) If the record is on the overflow chain, remove it from the chain. STOP.

[Figure: Example of deletion. Records 120 A and 150 D are removed; the result is a primary block holding 110 F, 130 E with overflow chain 170 G, 180 C.]
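A small sketch of the deletion steps, under the assumption that a data node is modeled as two Python lists of keys (the function name `delete` and the return values are illustrative only). Note that, as in the steps above, nothing is pulled back from the overflow chain when the primary block gains a free slot.

```python
def delete(primary, overflow, key):
    """Remove key from a data node; return where it was found, or None."""
    if key in primary:
        # Step 2: list.remove frees the slot and shifts higher keys left.
        primary.remove(key)
        return "primary"
    if key in overflow:
        # Step 3: unlink the record from the overflow chain.
        overflow.remove(key)
        return "overflow"
    return None


primary, overflow = [110, 120, 130], [150, 170, 180]
delete(primary, overflow, 120)
delete(primary, overflow, 150)
print(primary, overflow)
```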

Indexed Sequential Files: Update
- (Step 1) Locate the record to update by primary key search.
- (Step 2) If the primary key was not altered, simply replace the stored copy of the record with the updated copy. STOP.
- (Step 3) If the primary key was altered, delete the located record, then insert the updated record as if it were a new record. STOP.

Indexed Sequential Files: Reorganization
- Read the records out of the old file in primary key order
- Build a new indexed sequential structure with no records in overflow (file creation)
- Reorganization is really hectic!
- Definitions
  - Loading factor = average number of records per node
  - Initial loading factor = loading factor when the file is created

[Figure: Example of reorganization. The rebuilt indexed sequential structure, with the root in main memory and the remaining nodes in secondary memory.]

Indexed Sequential Files: Creation
- (Step 1) Using a specified initial loading factor LF, pack LF records per node and create the data level of the new indexed sequential file structure. (The last node on the data level will hold from 1 to LF records.)
- (Step 2) Build consecutive levels of index nodes until a level is reached that has only a single node. The root node is then created and placed on the next higher level. Index blocks are to be packed as full as possible. STOP.
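The creation steps can be sketched roughly as below. The function name `build_isam`, the `index_fanout` parameter, and the choice of storing each child's highest key as its index entry are assumptions made for illustration; the sketch stops at the first level that has a single node and does not model a separate root placed above it.

```python
def build_isam(sorted_keys, loading_factor, index_fanout):
    """Return the levels of the structure, data level first (a sketch)."""
    if not sorted_keys or loading_factor < 1 or index_fanout < 2:
        raise ValueError("need keys, loading_factor >= 1, index_fanout >= 2")
    # Step 1: pack LF records per data node; the last node gets 1..LF records.
    level = [sorted_keys[i:i + loading_factor]
             for i in range(0, len(sorted_keys), loading_factor)]
    levels = [level]
    # Step 2: build index levels, packed full, until one node remains.
    while True:
        high_keys = [node[-1] for node in level]  # assumed entry: child's highest key
        level = [high_keys[i:i + index_fanout]
                 for i in range(0, len(high_keys), index_fanout)]
        levels.append(level)
        if len(level) == 1:
            return levels


levels = build_isam(list(range(1, 11)), loading_factor=3, index_fanout=2)
for depth, nodes in enumerate(levels):
    print(depth, nodes)
```

Reorganization, as described earlier, amounts to reading the old file in key order and passing the result to a routine like this one, so that no record starts out on an overflow chain.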

10.2 Maintaining a Sequence Set
- A sequence set (similar terms: ordered file, sequential set)
  - a set of records in physical order by key
  - Sequence set + simple index ===> simple prefix B+ tree
- The use of blocks
  - We want to rule out sorting and re-sorting of the sequence set
  - insertion of records into a block: overflow -> split
  - deletion of records: underflow -> redistribution, concatenation
- Costs of avoiding sorting
  - more space overhead (internal fragmentation within a block) -> redistribution in place of splitting, two-to-three splitting
  - the maximum guaranteed extent of physical sequentiality is within a block -> choice of block size

Block splitting and concatenation (1)

(a) Initial blocked sequence set
  Block 1: ADAMS... BAIRD... BIXBY... BOONE...
  Block 2: BYNUM... CARSON... COLE... DAVIS...
  Block 3: DENVER... ELLIS...

(b) Sequence set after insertion of the CARTER record: block 2 splits, and the contents are divided between blocks 2 and 4
  Block 1: ADAMS... BAIRD... BIXBY... BOONE...
  Block 2: BYNUM... CARSON... CARTER...
  Block 3: DENVER... ELLIS...
  Block 4: COLE... DAVIS...

(continued)

Block splitting and concatenation (2)

(c) Sequence set after deletion of the DAVIS record: block 4 is less than half full, so it is concatenated with block 3
  Block 1: ADAMS... BAIRD... BIXBY... BOONE...
  Block 2: BYNUM... CARSON... CARTER...
  Block 3: available for use
  Block 4: COLE... DENVER... ELLIS...
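The split and concatenation shown in parts (a) through (c) can be imitated with a short sketch. It assumes four records per block, a half-full minimum, and a Python list of blocks kept in logical (key) order; physical block numbers and the free list are not modeled, and all function names are illustrative.

```python
import bisect

CAPACITY = 4               # records per block (assumed)
MIN_FILL = CAPACITY // 2   # a block below this is underfull (assumed)


def find_block(blocks, key):
    """Index of the first block whose last key is >= key, else the last block."""
    for i, block in enumerate(blocks):
        if block and key <= block[-1]:
            return i
    return len(blocks) - 1


def insert(blocks, key):
    i = find_block(blocks, key)
    bisect.insort(blocks[i], key)
    if len(blocks[i]) > CAPACITY:                    # overflow -> split
        mid = (len(blocks[i]) + 1) // 2
        blocks[i:i + 1] = [blocks[i][:mid], blocks[i][mid:]]


def delete(blocks, key):
    i = find_block(blocks, key)
    blocks[i].remove(key)
    if len(blocks[i]) >= MIN_FILL or len(blocks) == 1:
        return
    j = i + 1 if i + 1 < len(blocks) else i - 1      # pick a logical neighbor
    lo, hi = min(i, j), max(i, j)
    merged = blocks[lo] + blocks[hi]
    if len(merged) <= CAPACITY:                      # underflow -> concatenation
        blocks[lo:hi + 1] = [merged]
    else:                                            # underflow -> redistribution
        mid = len(merged) // 2
        blocks[lo:hi + 1] = [merged[:mid], merged[mid:]]


blocks = [["ADAMS", "BAIRD", "BIXBY", "BOONE"],
          ["BYNUM", "CARSON", "COLE", "DAVIS"],
          ["DENVER", "ELLIS"]]
insert(blocks, "CARTER")   # splits the BYNUM block
delete(blocks, "DAVIS")    # COLE block underflows and is concatenated
print(blocks)
```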

Issue: Choice of Block Size
- Block: the basic unit of I/O
- The maximum guaranteed extent of physical sequentiality
- Two considerations
  - several blocks should fit in RAM at once
    - e.g. a split or a concatenation needs at least two blocks in RAM
  - reading or writing a block should not take very long
- Cluster
  - the minimum number of sectors allocated at a time
  - the minimum size of a file
- Reasonable suggestion: block size == cluster size
  - a block can be accessed without seeking within a cluster

10.3 Adding a Simple Index to the Sequence Set (1)
- We need an efficient way to locate the specific block containing a particular record, given the record's key
  - build index records containing the key of the last record in each block
- Possible index structures
  - simple index
    - binary search of the index
    - works well while the entire index is in RAM
  - B+ tree
    - B-tree index + a sequence set with the actual records

Sequence of blocks
  Block 1: ADAMS-BERNE
  Block 2: BOLEN-CAGE
  Block 3: CAMP-DUTTON
  Block 4: EMBRY-EVANS
  Block 5: FABER-FOLK
  Block 6: FOLKS-GADDIS

Simple index
  Key      Block number
  BERNE    1
  CAGE     2
  DUTTON   3
  EVANS    4
  FOLK     5
  GADDIS   6
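A binary search over this simple index might look like the sketch below. It assumes the whole index fits in memory as two parallel lists; `find_block` is an illustrative name, and the standard-library `bisect_left` call does the binary search for the first index key that is not less than the search key.

```python
import bisect

# Last key of each sequence-set block, and the block holding it.
index_keys = ["BERNE", "CAGE", "DUTTON", "EVANS", "FOLK", "GADDIS"]
block_numbers = [1, 2, 3, 4, 5, 6]


def find_block(key):
    """Return the block that would hold key, or None if key is past the last block."""
    pos = bisect.bisect_left(index_keys, key)
    return block_numbers[pos] if pos < len(index_keys) else None


print(find_block("BURNS"), find_block("EMBRY"), find_block("ZEBRA"))
```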

10.4 The Content of the Index: Separators Instead of Keys
- The index set need not contain actual keys
  - what we really need is separators
- Separator: distinguishes between two blocks
  - among the many candidates, the shortest separator is preferable
  - there is not always a unique shortest separator

Separators between blocks in the sequence set
  Block 1: ADAMS-BERNE
    separator: BO
  Block 2: BOLEN-CAGE
    separator: CAM
  Block 3: CAMP-DUTTON
    separator: E
  Block 4: EMBRY-EVANS
    separator: F
  Block 5: FABER-FOLK
    separator: FOLKS
  Block 6: FOLKS-GADDIS

A list of potential separators between CAMP-DUTTON and EMBRY-EVANS
  DUTU, DVXGHSJF, DZ, E, EBQX, ELEEMOSYNARY
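One way to pick a shortest separator is to take the shortest prefix of the first key in the right-hand block that still compares greater than the last key in the left-hand block. The sketch below assumes that rule, along with the convention that keys greater than or equal to a separator belong to the right; `shortest_separator` is a hypothetical helper name.

```python
def shortest_separator(left_last, right_first):
    """Shortest prefix of right_first that is strictly greater than left_last."""
    for length in range(1, len(right_first) + 1):
        prefix = right_first[:length]
        if prefix > left_last:
            return prefix
    raise ValueError("keys must satisfy left_last < right_first")


boundaries = [("BERNE", "BOLEN"), ("CAGE", "CAMP"), ("DUTTON", "EMBRY"),
              ("EVANS", "FABER"), ("FOLK", "FOLKS")]
separators = [shortest_separator(left, right) for left, right in boundaries]
print(separators)
```

For the five block boundaries listed, this rule yields BO, CAM, E, F, and FOLKS. Other strings such as DZ would also separate CAMP-DUTTON from EMBRY-EVANS, but they are not prefixes of a key and are no shorter than E.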

10.5 The Simple Prefix B+ Tree
- A B-tree-like index + the blocks of the sequence set
- The use of simple prefixes
  - prefixes of the keys rather than actual keys
  - the index contains shortest separators
  - N separators -> N+1 children
- Properties of a B+ tree
  - B-tree-like index
  - Sequential data set
  - Indexed sequential file

A B-tree index set for the sequence set, forming a simple prefix B+ tree

Index set
  Root: E
    Left node: BO, CAM
    Right node: F, FOLKS

Sequence set
  Block 1: ADAMS-BERNE
  Block 2: BOLEN-CAGE
  Block 3: CAMP-DUTTON
  Block 4: EMBRY-EVANS
  Block 5: FABER-FOLK
  Block 6: FOLKS-GADDIS
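Searching this tree can be sketched as a descent through the index set that ends at a sequence-set block number. The nested-dictionary representation and the `search` function are assumptions for illustration; the separators (root E, nodes BO CAM and F FOLKS) are the ones in the figure, and a key equal to a separator is sent to the right.

```python
import bisect

# Index set from the figure; leaves of the index are block numbers 1..6.
tree = {
    "seps": ["E"],
    "children": [
        {"seps": ["BO", "CAM"], "children": [1, 2, 3]},
        {"seps": ["F", "FOLKS"], "children": [4, 5, 6]},
    ],
}


def search(node, key):
    """Follow separators down the index set; return the block number for key."""
    while isinstance(node, dict):
        # N separators select among N+1 children.
        pos = bisect.bisect_right(node["seps"], key)
        node = node["children"][pos]
    return node


print(search(tree, "EMBRY"), search(tree, "FOLK"), search(tree, "FOLKS"))
```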

10.6 Simple Prefix B+ Tree Maintenance (1)
- Changes localized to single blocks in the sequence set
  - deletion without concatenation or redistribution
    - e.g. delete EMBRY, FOLKS
  - insertion without splitting
    - e.g. insert EATON

Deletion of the EMBRY and FOLKS records from the sequence set

Index set (unchanged)
  Root: E
    Left node: BO, CAM
    Right node: F, FOLKS

Sequence set
  Block 1: ADAMS-BERNE
  Block 2: BOLEN-CAGE
  Block 3: CAMP-DUTTON
  Block 4: ERVIN-EVANS
  Block 5: FABER-FOLK
  Block 6: FROST-GADDIS

10.6 Simple Prefix B+ Tree Maintenance (2)
- Changes involving multiple blocks in the sequence set
  - splits and concatenations propagate to the index set
  - change in the number of blocks in the sequence set => change in the number of separators => change in the index set
  - insertion with splitting
    - e.g. overflow in block 1 -> split into blocks 1 and 7, with separator AY
  - deletion with concatenation or redistribution
    - e.g. underflow in block 2 -> concatenation of blocks 2 and 3

An insertion into block 1 causes a split and the consequent addition of block 7

Index set
  Root: BO, E
    Node: AY
    Node: CAM
    Node: F, FOLKS

Sequence set
  Block 1: ADAMS-AVERY
  Block 7: AYERS-BERNE
  Block 2: BOLEN-CAGE
  Block 3: CAMP-DUTTON
  Block 4: ERVIN-EVANS
  Block 5: FABER-FOLK
  Block 6: FROST-GADDIS

A deletion from block 2 causes underflow and the consequent concatenation of blocks 2 and 3

Index set
  Root: E
    Node: AY, BO
    Node: F, FOLKS

Sequence set
  Block 1: ADAMS-AVERY
  Block 7: AYERS-BERNE
  Block 2: BOLEN-DUTTON
  Block 4: ERVIN-EVANS
  Block 5: FABER-FOLK
  Block 6: FROST-GADDIS

Bottom-up procedure to handle changes
Insert into or delete from the sequence set as if there were no B-tree index set; then:
- if blocks are split, a new separator must be inserted into the index set
- if blocks are concatenated, a separator must be removed from the index set
- if records are redistributed between blocks, the value of a separator in the index set must be changed
- otherwise, nothing propagates to the index set

10.7 Index Set Block Size
- Size of an index node in the index set == size of a data block in the sequence set
- Reasons for using a common block size
  - the best size for the sequence set is usually also the best for the index set
  - a common block size makes it easier to implement a buffering scheme
  - the index set blocks and sequence set blocks are often mingled within the same file
    - to avoid seeking between separate files while accessing the simple prefix B+ tree

10.8 Internal Structure of Index Set Blocks: A Variable-Order B-Tree
- Variable-length shortest separators
  - possibility of packing them into a node
  - separator index (fixed length): a means of performing binary searches on a list of variable-length entities
- A simple prefix B+ tree with variable order
  - not maximum order -> not minimum depth
  - decisions about when to split, concatenate, or redistribute become more complicated

Variable-length separators and corresponding index
  Separators: As, Ba, Bro, C, Ch, Cra, Dele, Edi, Err, Fa, Fle
  Concatenated separators: AsBaBroCChCraDeleEdiErrFaFle
  Index to separators: 00 02 04 07 08 10 13 17 20 23 25

Structure of an index set block
  Separator count: 11
  Total length of separators: 28
  Separators: AsBaBroCChCraDeleEdiErrFaFle
  Index to separators: 00 02 04 07 08 10 13 17 20 23 25
  Relative block numbers: B00 B01 ... B10 B11
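The role of the fixed-length separator index can be illustrated with a sketch that binary-searches the concatenated separators through their offsets. The helper names are hypothetical, the children are represented by the labels B00 through B11, and the last separator is taken to be Fle, which is the spelling the stated total length of 28 implies.

```python
separators = "AsBaBroCChCraDeleEdiErrFaFle"          # 28 characters in total
offsets = [0, 2, 4, 7, 8, 10, 13, 17, 20, 23, 25]    # index to separators
children = [f"B{n:02d}" for n in range(len(offsets) + 1)]  # B00..B11


def separator(i):
    """Slice the i-th variable-length separator out of the packed string."""
    end = offsets[i + 1] if i + 1 < len(offsets) else len(separators)
    return separators[offsets[i]:end]


def child_for(key):
    """Binary search on the offsets; keys >= a separator go to its right."""
    lo, hi = 0, len(offsets)
    while lo < hi:
        mid = (lo + hi) // 2
        if key < separator(mid):
            hi = mid
        else:
            lo = mid + 1
    return children[lo]


print(separator(6), child_for("Beck"), child_for("Zed"))
```

Because every offset has the same width, the search can jump to the middle entry directly even though the separators themselves differ in length.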

10.9 Loading a Simple Prefix B+ Tree (1)
- One way is successive insertions and splits
- The other way is to use a separate loading process
  - work from a sorted file
  - place the records into a sequence set block
  - when a block is full
    - determine the separator and insert it into the index set block
    - place the following records into a new sequence set block

10.9 Loading a Simple Prefix B+ Tree (2)
- Advantages of using a separate loading process
  - the output can be written sequentially
  - simpler than successive insertions and splits
  - performance during loading
  - blocks can be loaded to 100% utilization (cf. insert and split, which produces blocks between 67% and 80% full)
  - creates a degree of spatial locality
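A separate loading process of this kind can be sketched as a single pass over sorted keys. The sketch makes several simplifying assumptions: keys are distinct, blocks are filled to 100%, the index set is a single flat list of separators rather than a multi-level B-tree, and the separator rule is the shortest-prefix rule. `load` and `shortest_separator` are illustrative names.

```python
def shortest_separator(left_last, right_first):
    """Shortest prefix of right_first that is strictly greater than left_last."""
    for length in range(1, len(right_first) + 1):
        if right_first[:length] > left_last:
            return right_first[:length]
    raise ValueError("keys must be distinct and sorted")


def load(sorted_keys, block_capacity):
    """Fill sequence-set blocks in order, emitting one separator per new block."""
    blocks, index_set = [], []
    for start in range(0, len(sorted_keys), block_capacity):
        block = sorted_keys[start:start + block_capacity]
        if blocks:
            # The previous block is full: record the separator between it
            # and the block now being started.
            index_set.append(shortest_separator(blocks[-1][-1], block[0]))
        blocks.append(block)
    return blocks, index_set


keys = ["ADAMS", "BERNE", "BOLEN", "CAGE", "CAMP", "DUTTON"]
blocks, index_set = load(keys, block_capacity=2)
print(blocks, index_set)
```

Since blocks and separators are produced strictly in key order, both can be appended to the output sequentially, which is the first advantage listed above.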

10.10 B+ Trees
- The index set contains copies of actual keys
  - cf. simple prefix B+ tree: separators
[Figure: an index set block holding the keys ALWAYS, ASPECT, BETTER (index to separators 00, 06, 12) above the sequence set blocks ACCESS-ALSO, ALWAYS-ASK, ASPECT-BEST, BETTER-CAST, CATCH-CHECK. Labels: "Next separator: CATCH" and "Next sequence set block".]

10.11 B-Trees, B+ Trees, and Simple Prefix B+ Trees in Perspective
- Shared characteristics
  - Paged index structures: broad and shallow
  - Height-balanced
  - Grow from the bottom up
  - Greater storage efficiency is possible through two-to-three block splitting, concatenation, and redistribution
  - Can be implemented as virtual tree structures
  - Can be adapted for variable-length records

B-Trees
- General characteristics
  - Information can be found at any level of the B-tree
  - A B-tree takes up less space than a B+ tree (a B+ tree needs additional space)
- Ordered sequential access
  - Through in-order traversal of the tree (a virtual tree is necessary)
  - Not workable when the records are kept in a separate file (the B-tree holds only pointers)

B+ Trees
- General characteristics
  - Separation of index set and sequence set
  - Separators: copies of keys
  - Shallower tree than a B-tree
- Ordered sequential access
  - The sequence set is truly linear => efficient access to records in order by key

Simple Prefix B+ Trees
- General characteristics
  - Separators: smaller than actual keys
  - Shallower than B+ trees
  - Overhead of separator compression and variable-length field management
- Ordered sequential access
  - The sequence set is truly linear (same as the B+ tree)

Let's Review!
- 10.1 Indexed Sequential Access
- 10.2 Maintaining a Sequence Set
- 10.3 Adding a Simple Index to the Sequence Set
- 10.4 The Content of the Index: Separators Instead of Keys
- 10.5 The Simple Prefix B+ Tree
- 10.6 Simple Prefix B+ Tree Maintenance
- 10.7 Index Set Block Size
- 10.8 Internal Structure of Index Set Blocks: A Variable-Order B-Tree
- 10.9 Loading a Simple Prefix B+ Tree
- 10.10 B+ Trees
- 10.11 B-Trees, B+ Trees, and Simple Prefix B+ Trees in Perspective
