Published by Cornelia Gibson. Modified over 9 years ago.
1
CSC 213 – Large Scale Programming
2
Dictionaries in Real World
Often need large database on many machines
Split search terms across machines
Updating & searching work split between machines
Database way too large for any single machine
If you think about it, this is incredibly common
Where?
3
Split Dictionaries
4
Splitting Keys From Values
In real world, we often have many indices
Simple units measure where we can find values
Values could be searched for in multiple ways
6
Index & Data Files
Split information into two (or more) files
Data file uses fixed-size records to store data
Index files contain search terms & data locations
Fixed-size records usually used in data file
Each record will use exactly that much space
Extra space wasted if the value is smaller
But limits data size, cannot get more space
Makes it far easier to reuse space & rebuild index
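A minimal sketch of what a fixed-size record could look like in Java. The 112-byte size matches the positions (0, 112, 224) used in the example slides, but the field layout (100-byte name, 4-byte price, 8-byte symbol) and the class name `FixedRecord` are assumptions made only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical 112-byte record: 100-byte name, 4-byte int price, 8-byte symbol. */
final class FixedRecord {
    static final int NAME_BYTES = 100;
    static final int SYMBOL_BYTES = 8;
    static final int RECORD_SIZE = NAME_BYTES + Integer.BYTES + SYMBOL_BYTES; // 112

    /** Copies text into a zero-padded field; text that does not fit is rejected. */
    static byte[] pad(String text, int width) {
        byte[] raw = text.getBytes(StandardCharsets.US_ASCII);
        if (raw.length > width) {
            // The record cannot grow, so oversized values have nowhere to go.
            throw new IllegalArgumentException("value needs more than " + width + " bytes");
        }
        return Arrays.copyOf(raw, width); // unused tail stays zero: the wasted space
    }

    /** Packs the three fields into exactly RECORD_SIZE bytes. */
    static byte[] encode(String name, int price, String symbol) {
        ByteBuffer buf = ByteBuffer.allocate(RECORD_SIZE);
        buf.put(pad(name, NAME_BYTES)).putInt(price).put(pad(symbol, SYMBOL_BYTES));
        return buf.array();
    }

    /** Reads a padded text field back, dropping the zero padding. */
    static String unpad(byte[] record, int offset, int width) {
        int end = offset;
        while (end < offset + width && record[end] != 0) {
            end++;
        }
        return new String(record, offset, end - offset, StandardCharsets.US_ASCII);
    }

    /** Every record has the same size, so its position is simple arithmetic. */
    static long positionOf(int recordNumber) {
        return (long) recordNumber * RECORD_SIZE;
    }
}
```

Because every record occupies the same number of bytes, record number 2 always starts at byte 224, and a freed slot can hold any later record.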
7
Index File Format
No standard format – depends on type of data
Often variable sized, but this is not a specific requirement
Each entry in index file begins with exact search term
Followed by position containing matching data
As a result, often find indexes smushed together
Can read indexes at start of program execution
Reasonably assumes index file smaller than data file
Changes written immediately, however
When program starts, do NOT read data file
8
Never Read Data File
9
Indexed Files
Enables splitting search terms across computers
Alphabetical split: searches faster on many servers
Example split: A-C, D-E, F-H, I-P, Q-R, S-T, U-X, Y-Z
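One way the alphabetical split might be expressed in code, as a sketch only. The range boundaries come from the slide; the class name `AlphabeticalRouter` and the server names `server-0` through `server-7` are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Routes a search term to the server that owns its first letter. */
final class AlphabeticalRouter {
    // First letter of each range -> hypothetical server name.
    private final TreeMap<Character, String> rangeStarts = new TreeMap<>();

    AlphabeticalRouter() {
        // Ranges from the slide: A-C, D-E, F-H, I-P, Q-R, S-T, U-X, Y-Z.
        char[] starts = {'A', 'D', 'F', 'I', 'Q', 'S', 'U', 'Y'};
        for (int i = 0; i < starts.length; i++) {
            rangeStarts.put(starts[i], "server-" + i);
        }
    }

    String serverFor(String searchTerm) {
        if (searchTerm.isEmpty()) {
            throw new IllegalArgumentException("empty search term");
        }
        char first = Character.toUpperCase(searchTerm.charAt(0));
        if (first < 'A' || first > 'Z') {
            throw new IllegalArgumentException("term must start with a letter A-Z");
        }
        // floorEntry finds the range whose starting letter is the closest one
        // at or before the term's first letter.
        Map.Entry<Character, String> owner = rangeStarts.floorEntry(first);
        return owner.getValue();
    }
}
```

With this split, `serverFor("Citibank")` lands in the A-C range and `serverFor("IBM")` in the I-P range, so each server only has to hold and search its own slice of the index.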
10
Indexed Files
Enables splitting search terms across computers
Create indexes for different types of searching
Examples: song name, song length
11
How Does This Work?
Using index files is simplified by using positions
Look in index structure to find position of data in file
With this position, can then seek to specific record
Create instance & initialize by reading data from file
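The three steps above could be sketched with `java.io.RandomAccessFile` as follows. The in-memory `Map` standing in for the index structure, the `Company` class, and the record layout (108-byte name followed by a 4-byte price, 112 bytes in total) are assumptions for illustration, not the course's required design.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class IndexedLookup {
    static final int NAME_BYTES = 108;                          // hypothetical layout
    static final int RECORD_SIZE = NAME_BYTES + Integer.BYTES;  // 112 bytes per record

    /** Instance that initializes itself by reading one record from the data file. */
    static final class Company {
        final String name;
        final int price;

        Company(RandomAccessFile data) throws IOException {
            byte[] raw = new byte[NAME_BYTES];
            data.readFully(raw);
            name = new String(raw, StandardCharsets.US_ASCII).trim(); // drops zero padding
            price = data.readInt();
        }
    }

    /** Helper for the demo: writes one record (names over 108 bytes are truncated). */
    static void writeRecord(RandomAccessFile data, long position, String name, int price)
            throws IOException {
        data.seek(position);
        data.write(Arrays.copyOf(name.getBytes(StandardCharsets.US_ASCII), NAME_BYTES));
        data.writeInt(price);
    }

    static Company find(Map<String, Long> index, RandomAccessFile data, String key)
            throws IOException {
        Long position = index.get(key);   // 1. look in the index structure
        if (position == null) {
            return null;                  //    key not indexed: data file is never touched
        }
        data.seek(position);              // 2. seek to that specific record
        return new Company(data);         // 3. create instance from the file contents
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("companies", ".dat");
        file.deleteOnExit();
        try (RandomAccessFile data = new RandomAccessFile(file, "rw")) {
            writeRecord(data, 0, "IBM", 106);
            writeRecord(data, 112, "AT & T", 23);
            writeRecord(data, 224, "Ford", 2);

            Map<String, Long> bySymbol = new HashMap<>();
            bySymbol.put("IBM", 0L);
            bySymbol.put("T", 112L);
            bySymbol.put("F", 224L);

            Company found = find(bySymbol, data, "T");
            System.out.println(found.name + " " + found.price); // prints: AT & T 23
        }
    }
}
```

Only the one record named by the index is read; the rest of the data file is never scanned.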
12
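Starting with Indexed Files
Data file records: IBM 106 IBM | AT & T 23 T | Ford 2 F
Index entries (name, position): American Telephone & Telegraph 0 | International Business Machines 112 | Ford Motorcars, Inc. 224
Index entries (symbol, position): F | IBM 0 | T 112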
13
How Does This Work?
Adding new records takes only a few steps
Add space for record with setLength on data file
Update index structure(s) to include new record
Records in data file updated at each change
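A sketch of the add steps, assuming the data file is a `java.io.RandomAccessFile` and the index structure is an in-memory `Map` from search term to position. The `RecordAppender` class and the 108-byte name plus 4-byte price layout are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

final class RecordAppender {
    static final int NAME_BYTES = 108;                          // hypothetical layout
    static final int RECORD_SIZE = NAME_BYTES + Integer.BYTES;  // 112 bytes per record

    /** Appends one record and returns the position where it was stored. */
    static long add(RandomAccessFile data, Map<String, Long> index,
                    String key, String name, int price) throws IOException {
        long position = data.length();            // new record starts at the old end of file
        data.setLength(position + RECORD_SIZE);   // add space for exactly one record
        data.seek(position);
        data.write(Arrays.copyOf(name.getBytes(StandardCharsets.US_ASCII), NAME_BYTES));
        data.writeInt(price);                     // data file is updated at each change
        index.put(key, position);                 // index structure now includes the record
        return position;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("append", ".dat");
        file.deleteOnExit();
        try (RandomAccessFile data = new RandomAccessFile(file, "rw")) {
            Map<String, Long> bySymbol = new TreeMap<>();
            add(data, bySymbol, "IBM", "IBM", 106);
            add(data, bySymbol, "T", "AT & T", 23);
            add(data, bySymbol, "F", "Ford", 2);
            long where = add(data, bySymbol, "C", "Citibank", -2);
            // prints: 336 {C=336, F=224, IBM=0, T=112}
            System.out.println(where + " " + bySymbol);
        }
    }
}
```

If several indexes exist (for example by name and by symbol), each one would need its own `put` with the same position.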
14
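Adding New Data To The Files
Data file records: IBM 106 IBM | AT & T 23 T | Ford 2 F | 0
Index entries (symbol, position): C 336 | F 224 | IBM 0 | T 112
Index entries (name, position): American Telephone & Telegraph 0 | Citibank 336 | International Business Machines 112 | Ford Motorcars, Inc. 224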
15
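Adding New Data To The Files
Data file records: IBM 106 IBM | AT & T 23 T | Ford 2 F | Citibank -2 C
Index entries (symbol, position): C 336 | F 224 | IBM 0 | T 112
Index entries (name, position): American Telephone & Telegraph 0 | Citibank 336 | International Business Machines 112 | Ford Motorcars, Inc. 224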
16
How Does This Work?
Removing records is even easier
To prevent using record, remove items from indexes
Do NOT update index file(s) until program completes
Use impossible magic numbers for record in data file
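A sketch of removal under the same assumptions as before: a `RandomAccessFile` data file, an in-memory `Map` as the index structure, and a hypothetical layout of a 108-byte name followed by a 4-byte price. Using `Integer.MIN_VALUE` as the "impossible" magic number is an assumption chosen for this example; any value that can never occur in real data would work.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Map;

final class RecordRemover {
    static final int NAME_BYTES = 108;                          // hypothetical layout
    static final int RECORD_SIZE = NAME_BYTES + Integer.BYTES;  // 112 bytes per record
    static final int DELETED = Integer.MIN_VALUE;               // hypothetical magic number

    /** Removes the key from the in-memory index and marks its record as unused. */
    static boolean remove(RandomAccessFile data, Map<String, Long> index, String key)
            throws IOException {
        Long position = index.remove(key);  // record can no longer be found by this key
        if (position == null) {
            return false;                   // nothing indexed under this key
        }
        data.seek(position + NAME_BYTES);   // jump to the price field of that record
        data.writeInt(DELETED);             // mark the slot; the file does not shrink
        return true;
    }

    /** True if the record at this position carries the magic number. */
    static boolean isDeleted(RandomAccessFile data, long position) throws IOException {
        data.seek(position + NAME_BYTES);
        return data.readInt() == DELETED;
    }
}
```

The index file on disk is left alone here; it would be rewritten from the in-memory index when the program completes. A later scan of the data file can use `isDeleted` to skip marked slots when rebuilding an index, or to find space to reuse.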
17
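Removing Data As We Go
Data file records: IBM 106 IBM | AT & T 23 T | Ford 2 F | Citibank -2 C
Index entries (symbol, position): C 336 | F 224 | IBM 0 | T 112
Index entries (name, position): American Telephone & Telegraph 0 | Citibank 336 | International Business Machines 112 | Ford Motorcars, Inc. 224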
18
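Removing Data As We Go
Data file records: IBM 106 IBM | AT & T 23 T | Ford 0 Ø | Citibank -2 C
Index entries (symbol, position): C 336 | IBM 0 | T 112
Index entries (name, position): American Telephone & Telegraph 0 | Citibank 336 | International Business Machines 112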
19
For Next Lecture
Weekly assignment still available online
Continues to be due Wednesday at 5PM
Ask me questions, if you have trouble on a problem
Reading Section 9.1 in textbook about Map ADT
How do we look up data?
What other ADTs are out there?
How could they relate to today's lecture?