
1 CSC 213 – Large Scale Programming

2 Dictionaries in Real World
- Often need large database on many machines
  - Split search terms across machines
  - Updating & searching work split between machines
  - Database way too large for any single machine
- If you think about it, this is incredibly common
  - Where?

3 Split Dictionaries

4 Splitting Keys From Values
- In real world, we often have many indices
  - Simple units measure where we can find values
  - Values could be searched for in multiple ways

6 Index & Data Files
- Split information into two (or more) files
  - Data file uses fixed-size records to store data
  - Index files contain search terms & data locations
- Fixed-size records usually used in data file
  - Each record will use exactly that much space
  - Extra space wasted if the value is smaller
  - But limits data size, cannot get more space
  - Makes it far easier to reuse space & rebuild index
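
The fixed-size record idea above can be sketched in Java. The layout here is an assumption, not something the slides specify: a 100-byte name, a 4-byte int, and an 8-byte symbol, chosen only because they add up to the 112-byte spacing that the positions on later slides suggest. `FixedRecord` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class FixedRecord {
    // Assumed layout (not from the slides): 100-byte name + 4-byte int + 8-byte symbol.
    static final int NAME_BYTES = 100;
    static final int SYMBOL_BYTES = 8;
    static final int RECORD_SIZE = NAME_BYTES + Integer.BYTES + SYMBOL_BYTES; // 112

    // Pads short text with zero bytes (the wasted space) and rejects text
    // that does not fit (the size limit).
    static byte[] fixedWidth(String text, int width) {
        byte[] raw = text.getBytes(StandardCharsets.US_ASCII);
        if (raw.length > width) {
            throw new IllegalArgumentException("text needs more than " + width + " bytes");
        }
        return Arrays.copyOf(raw, width);
    }

    // Every record comes out exactly RECORD_SIZE bytes long.
    static byte[] encode(String name, int value, String symbol) {
        ByteBuffer buf = ByteBuffer.allocate(RECORD_SIZE);
        buf.put(fixedWidth(name, NAME_BYTES));
        buf.putInt(value);
        buf.put(fixedWidth(symbol, SYMBOL_BYTES));
        return buf.array();
    }

    // Reads a text field back, stopping at the first padding byte.
    static String readText(byte[] record, int offset, int width) {
        int end = offset;
        while (end < offset + width && record[end] != 0) {
            end++;
        }
        return new String(record, offset, end - offset, StandardCharsets.US_ASCII);
    }

    static int readValue(byte[] record) {
        return ByteBuffer.wrap(record).getInt(NAME_BYTES);
    }

    public static void main(String[] args) {
        byte[] rec = encode("IBM", 106, "IBM");
        System.out.println(rec.length + " bytes: "
                + readText(rec, 0, NAME_BYTES) + " " + readValue(rec));
    }
}
```

Because every record has the same length, record number n always starts at n * RECORD_SIZE, which is what makes reusing a slot or rescanning the file to rebuild an index straightforward.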

7 Index File Format
- No standard format – depends on type of data
  - Often variable sized, but this is not a specific requirement
  - Each entry in index file begins with exact search term
  - Followed by position containing matching data
  - As a result, often find indexes smushed together
- Can read indexes at start of program execution
  - Reasonably assumes index file smaller than data file
  - Changes written immediately, however
- When program starts, do NOT read data file
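
Since there is no standard index format, the following Java sketch picks one purely for illustration: an entry count, then each search term followed by the position of its data. The `IndexFile` class and this layout are assumptions; the slides only require that each entry start with the search term and end with a position.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Map;
import java.util.TreeMap;

class IndexFile {
    // Writes the entry count, then each entry as: search term, then position.
    // Entries are variable sized because the terms differ in length.
    static void write(Map<String, Long> index, OutputStream out) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(index.size());
        for (Map.Entry<String, Long> entry : index.entrySet()) {
            data.writeUTF(entry.getKey());
            data.writeLong(entry.getValue());
        }
        data.flush();
    }

    // Reads the whole index into memory, as a program would do at startup.
    // The data file is not touched here.
    static TreeMap<String, Long> read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        TreeMap<String, Long> index = new TreeMap<>();
        int count = data.readInt();
        for (int i = 0; i < count; i++) {
            String term = data.readUTF();
            long position = data.readLong();
            index.put(term, position);
        }
        return index;
    }

    public static void main(String[] args) throws IOException {
        TreeMap<String, Long> bySymbol = new TreeMap<>();
        bySymbol.put("IBM", 0L);
        bySymbol.put("T", 112L);
        bySymbol.put("F", 224L);

        // An in-memory buffer stands in for the index file in this demo.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        write(bySymbol, buffer);
        System.out.println(read(new ByteArrayInputStream(buffer.toByteArray())));
    }
}
```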

8 Never Read Data File

9 Indexed Files
- Enables splitting search terms across computers
  - Alphabetical split searches faster on many servers
[Figure: servers labeled A-C, D-E, F-H, I-P, Q-R, S-T, U-X, Y-Z]
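
One way to route a search term to the machine holding its part of the alphabet is sketched below in Java. The letter ranges come from the slide; the `AlphabeticalRouter` class and the `server-N` names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

class AlphabeticalRouter {
    // Maps the first letter of each range to the (hypothetical) server owning it.
    private final TreeMap<Character, String> rangeStartToServer = new TreeMap<>();

    AlphabeticalRouter() {
        // Range starts for A-C, D-E, F-H, I-P, Q-R, S-T, U-X, Y-Z.
        char[] starts = {'A', 'D', 'F', 'I', 'Q', 'S', 'U', 'Y'};
        for (int i = 0; i < starts.length; i++) {
            rangeStartToServer.put(starts[i], "server-" + i);
        }
    }

    // Picks the range whose starting letter is the closest one at or before
    // the term's first letter.
    String serverFor(String term) {
        if (term.isEmpty()) {
            throw new IllegalArgumentException("empty search term");
        }
        char first = Character.toUpperCase(term.charAt(0));
        Map.Entry<Character, String> range = rangeStartToServer.floorEntry(first);
        if (range == null || first > 'Z') {
            throw new IllegalArgumentException("no server handles '" + first + "'");
        }
        return range.getValue();
    }

    public static void main(String[] args) {
        AlphabeticalRouter router = new AlphabeticalRouter();
        System.out.println("Citibank is handled by " + router.serverFor("Citibank"));
    }
}
```

Each server then only needs the index entries for its own range, so no single machine has to hold every search term.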

10 Indexed Files
- Enables splitting search terms across computers
  - Create indexes for different types of searching
[Figure labels: Song name, Song length]
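
Keeping several indexes over the same data file might look like the Java sketch below. The `SongIndexes` class, its method names, and the use of seconds for song length are all assumptions made for illustration; both indexes store positions in one shared data file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SongIndexes {
    // Exact-match index: song name -> position of its record in the data file.
    private final Map<String, Long> byName = new HashMap<>();
    // Ordered index: length in seconds -> positions of every song of that length.
    private final TreeMap<Integer, List<Long>> byLength = new TreeMap<>();

    // One record in the data file gets an entry in each index.
    void add(String name, int lengthSeconds, long position) {
        byName.put(name, position);
        byLength.computeIfAbsent(lengthSeconds, key -> new ArrayList<>()).add(position);
    }

    // Returns null when no song has this name.
    Long positionOf(String name) {
        return byName.get(name);
    }

    // Range search that only the ordered index can answer efficiently.
    List<Long> positionsWithLengthBetween(int minSeconds, int maxSeconds) {
        List<Long> result = new ArrayList<>();
        for (List<Long> positions
                : byLength.subMap(minSeconds, true, maxSeconds, true).values()) {
            result.addAll(positions);
        }
        return result;
    }
}
```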

11 How Does This Work?
- Using index files simplified using positions
  - Look in index structure to find position of data in file
  - With this position can then seek to specific record
  - Create instance & initialize by reading data from file
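
The three lookup steps above can be sketched with `java.io.RandomAccessFile`. The record layout (100-byte name, 4-byte int, 8-byte symbol) and the `IndexedLookup` and `Company` names are assumptions for this example only.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class IndexedLookup {
    // Assumed layout: 100-byte name + 4-byte int + 8-byte symbol = 112 bytes.
    static final int NAME_BYTES = 100;
    static final int SYMBOL_BYTES = 8;

    static final class Company {
        final String name;
        final int value;
        final String symbol;

        Company(String name, int value, String symbol) {
            this.name = name;
            this.value = value;
            this.symbol = symbol;
        }
    }

    // Zero-pads text to the field width (silently truncates text that is too long).
    static byte[] pad(String text, int width) {
        return Arrays.copyOf(text.getBytes(StandardCharsets.US_ASCII), width);
    }

    static void writeRecord(RandomAccessFile dataFile, long position, Company c)
            throws IOException {
        dataFile.seek(position);
        dataFile.write(pad(c.name, NAME_BYTES));
        dataFile.writeInt(c.value);
        dataFile.write(pad(c.symbol, SYMBOL_BYTES));
    }

    static Company lookup(Map<String, Long> index, RandomAccessFile dataFile, String term)
            throws IOException {
        Long position = index.get(term);   // 1. find the position in the index
        if (position == null) {
            return null;
        }
        dataFile.seek(position);           // 2. seek straight to that record
        byte[] name = new byte[NAME_BYTES];
        dataFile.readFully(name);
        int value = dataFile.readInt();
        byte[] symbol = new byte[SYMBOL_BYTES];
        dataFile.readFully(symbol);
        // 3. create an instance from the bytes; trim() drops the zero padding
        return new Company(new String(name, StandardCharsets.US_ASCII).trim(), value,
                new String(symbol, StandardCharsets.US_ASCII).trim());
    }

    public static void main(String[] args) throws IOException {
        File temp = File.createTempFile("companies", ".dat");
        temp.deleteOnExit();
        try (RandomAccessFile dataFile = new RandomAccessFile(temp, "rw")) {
            writeRecord(dataFile, 0, new Company("IBM", 106, "IBM"));
            writeRecord(dataFile, 112, new Company("AT & T", 23, "T"));
            writeRecord(dataFile, 224, new Company("Ford", 2, "F"));
            Map<String, Long> bySymbol = new HashMap<>();
            bySymbol.put("IBM", 0L);
            bySymbol.put("T", 112L);
            bySymbol.put("F", 224L);
            Company found = lookup(bySymbol, dataFile, "T");
            System.out.println(found.name + " " + found.value);
        }
    }
}
```

Only the one matching record is read; the rest of the data file is never loaded.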

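12 Starting with Indexed Files
[Figure: data file and two index files]
Data file records: (IBM, 106, IBM); (AT & T, 23, T); (Ford, 2, F)
Index by full name: American Telephone & Telegraph = 0; International Business Machines = 112; Ford Motorcars, Inc. = 224
Index by symbol: F = 224; IBM = 0; T = 112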

13 How Does This Work?
- Adding new records takes only a few steps
  - Add space for record with setLength on data file
  - Update index structure(s) to include new record
  - Records in data file updated at each change
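
A minimal Java sketch of the first two steps, assuming 112-byte records (inferred from the positions shown on the example slides) and a hypothetical `RecordAdder` class:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Map;
import java.util.TreeMap;

class RecordAdder {
    static final int RECORD_SIZE = 112; // assumed fixed record size

    // Grows the data file by one record and registers the new position in the index.
    static long addRecord(RandomAccessFile dataFile, Map<String, Long> index, String term)
            throws IOException {
        long position = dataFile.length();
        // Step 1: make room. The contents of the new space are not defined
        // until the record is actually written.
        dataFile.setLength(position + RECORD_SIZE);
        // Step 2: update the in-memory index so searches can find the record.
        index.put(term, position);
        return position;
    }

    public static void main(String[] args) throws IOException {
        File temp = File.createTempFile("companies", ".dat");
        temp.deleteOnExit();
        try (RandomAccessFile dataFile = new RandomAccessFile(temp, "rw")) {
            dataFile.setLength(3 * RECORD_SIZE); // three existing records
            Map<String, Long> bySymbol = new TreeMap<>();
            bySymbol.put("IBM", 0L);
            bySymbol.put("T", 112L);
            bySymbol.put("F", 224L);
            long position = addRecord(dataFile, bySymbol, "C");
            System.out.println("C stored at " + position
                    + ", file is now " + dataFile.length() + " bytes");
        }
    }
}
```

With three existing records the new one lands at position 336, matching the C entry in the example slides. Writing the record's fields at that position would be the remaining step.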

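14 Adding New Data To The Files
[Figure: data file and two index files after space is added]
Data file records: (IBM, 106, IBM); (AT & T, 23, T); (Ford, 2, F); new record: 0
Index by symbol: C = 336; F = 224; IBM = 0; T = 112
Index by full name: American Telephone & Telegraph = 0; Citibank = 336; International Business Machines = 112; Ford Motorcars, Inc. = 224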

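15 Adding New Data To The Files
[Figure: data file and two index files after the new record is written]
Data file records: (IBM, 106, IBM); (AT & T, 23, T); (Ford, 2, F); (Citibank, -2, C)
Index by symbol: C = 336; F = 224; IBM = 0; T = 112
Index by full name: American Telephone & Telegraph = 0; Citibank = 336; International Business Machines = 112; Ford Motorcars, Inc. = 224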

16 How Does This Work?
- Removing records even easier
  - To prevent using record, remove items from indexes
  - Do NOT update index file(s) until program completes
  - Use impossible magic numbers for record in data file
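
Removal might be sketched in Java as below. The marker value `Integer.MIN_VALUE`, the field offset, and the `RecordRemover` class are assumptions; the slides only say that an impossible magic number marks the record, and their own example marks it differently.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Map;

class RecordRemover {
    static final int RECORD_SIZE = 112;            // assumed fixed record size
    static final int VALUE_OFFSET = 100;           // assumed: int field follows a 100-byte name
    static final int DELETED = Integer.MIN_VALUE;  // hypothetical "impossible" magic number

    // Drops the term from the in-memory index and marks its record as unused.
    // The index file itself is left alone until the program finishes.
    static boolean remove(RandomAccessFile dataFile, Map<String, Long> index, String term)
            throws IOException {
        Long position = index.remove(term);
        if (position == null) {
            return false; // nothing indexed under this term
        }
        dataFile.seek(position + VALUE_OFFSET);
        dataFile.writeInt(DELETED);
        return true;
    }

    static boolean isDeleted(RandomAccessFile dataFile, long position) throws IOException {
        dataFile.seek(position + VALUE_OFFSET);
        return dataFile.readInt() == DELETED;
    }

    // Scans the fixed-size slots for one that can be reused; returns -1 if none.
    static long firstFreeSlot(RandomAccessFile dataFile) throws IOException {
        for (long p = 0; p + RECORD_SIZE <= dataFile.length(); p += RECORD_SIZE) {
            if (isDeleted(dataFile, p)) {
                return p;
            }
        }
        return -1;
    }
}
```

The marker lets a later scan of the data file skip removed records when rebuilding an index, and lets a new record reuse the slot instead of growing the file.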

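17 Removing Data As We Go
[Figure: data file and two index files before removal]
Data file records: (IBM, 106, IBM); (AT & T, 23, T); (Ford, 2, F); (Citibank, -2, C)
Index by symbol: C = 336; F = 224; IBM = 0; T = 112
Index by full name: American Telephone & Telegraph = 0; Citibank = 336; International Business Machines = 112; Ford Motorcars, Inc. = 224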

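18 Removing Data As We Go
[Figure: data file and two index files after removing Ford]
Data file records: (IBM, 106, IBM); (AT & T, 23, T); (Ford, 0, Ø); (Citibank, -2, C)
Index by symbol: C = 336; IBM = 0; T = 112
Index by full name: American Telephone & Telegraph = 0; Citibank = 336; International Business Machines = 112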

19 For Next Lecture
- Weekly assignment still available online
  - Continues to be due Wednesday at 5PM
  - Ask me questions, if you have trouble on a problem
- Reading Section 9.1 in textbook about Map ADT
  - How do we look up data?
  - What other ADTs are out there?
  - How could they relate to today's lecture?

