Published by Leslie Harris. Modified over 9 years ago.
1
CSC 213 – Large Scale Programming
2
Today’s Goals
  Look at how Dictionarys are used in real world
  Where this would occur & why they are used there
  In real world setting, what problems can/do occur
  Indexed file usage presented and shown
  How & why we split index & data files
  Formatting of each file and how they get used
  Describe what problems are solved using indexed files
  Java coding techniques that simplify using these files
  Idea needed when using multiple indexes shown
3
Dictionaries in Real World
  Often need large database on many machines
  Split search terms across machines
  Updating & searching work split between machines
  Database way too large for any single machine
  If you think about it, this is incredibly common
  Where?
4
Split Dictionaries
6
Splitting Keys From Values
  In real world, we often have many indices
  Simple units measure where we can find values
  Values could be searched for in multiple ways
8
Index & Data Files
  Split information into two (or more) files
  Data file uses fixed-size records to store data
  Index files contain search terms & data locations
  Fixed-size records usually used in data file
  Each record will use exactly that much space
  Extra space wasted if the value is smaller
  But limits data size, cannot get more space
  Makes it far easier to reuse space & rebuild index
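A minimal sketch of what a fixed-size field could look like in Java. The class name FixedField, the zero-byte padding, and the ASCII-only assumption are illustrative choices, not something the slides specify.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper (not from the slides): stores a String in a field
// that always occupies exactly `size` bytes, padding with zero bytes.
class FixedField {
    // Assumes the value is plain ASCII, so one character is one byte.
    static byte[] encode(String value, int size) {
        byte[] raw = value.getBytes(StandardCharsets.US_ASCII);
        if (raw.length > size) {
            // Fixed-size records limit data size: there is no extra space
            throw new IllegalArgumentException("value longer than field");
        }
        // copyOf pads the unused tail with zeros (the "wasted" space)
        return Arrays.copyOf(raw, size);
    }

    // Reads the text back, stopping at the first padding byte.
    static String decode(byte[] field) {
        int end = 0;
        while (end < field.length && field[end] != 0) {
            end++;
        }
        return new String(field, 0, end, StandardCharsets.US_ASCII);
    }
}
```

Because every field has the same length in every record, a record's position is enough to find any of its fields by adding a constant offset.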
9
Index File Format
  No standard format – depends on type of data
  Often variable sized, but this is not a specific requirement
  Each entry in index file begins with exact search term
  Followed by position containing matching data
  As a result, often find indexes smushed together
  Can read indexes at start of program execution
  Reasonably assumes index file smaller than data file
  Changes written immediately, however
  When program starts, do NOT read data file
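Since there is no standard index format, the sketch below assumes one purely for illustration: a text file with one entry per line, the search term followed by a tab and the position. The class name IndexLoader is also hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical loader (not from the slides) for an assumed index format:
// "<search term>\t<position>" on each line.
class IndexLoader {
    // Reads the whole index into memory; the data file is never touched.
    static Map<String, Long> load(Reader source) throws IOException {
        Map<String, Long> index = new TreeMap<>();
        BufferedReader in = new BufferedReader(source);
        String line;
        while ((line = in.readLine()) != null) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue; // skip blank lines and lines with no position
            }
            String term = line.substring(0, tab);
            // Throws NumberFormatException if the position is not a number
            long position = Long.parseLong(line.substring(tab + 1).trim());
            index.put(term, position);
        }
        return index;
    }
}
```

Passing a FileReader for the index file at program start would give a Map from search term to record position, ready for lookups.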
10
Never Read Entire Data File
11
Indexed Files
  Enables splitting search terms across computers
  Alphabetical split searches faster on many servers
  Servers by first letter: A-C, D-E, F-H, I-P, Q-R, S-T, U-X, Y-Z
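One way the alphabetical split could be expressed in code. The letter ranges come from the slide; the class name AlphabetShards, the server numbering, and the handling of non-letters are assumptions.

```java
// Hypothetical router (not from the slides): picks which server holds
// the index entries for a search term, based on its first letter.
class AlphabetShards {
    // Last letter handled by each server: A-C, D-E, F-H, I-P, Q-R, S-T, U-X, Y-Z
    static final char[] LAST_LETTER = {'C', 'E', 'H', 'P', 'R', 'T', 'X', 'Z'};

    // Assumes term is non-empty. Characters sorting before 'A' (such as
    // digits) land on server 0; anything after 'Z' lands on the last server.
    static int serverFor(String term) {
        char first = Character.toUpperCase(term.charAt(0));
        for (int i = 0; i < LAST_LETTER.length; i++) {
            if (first <= LAST_LETTER[i]) {
                return i;
            }
        }
        return LAST_LETTER.length - 1;
    }
}
```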
12
Indexed Files
  Enables splitting search terms across computers
  Create indexes for different types of searching
  Example indexes: Song name, Song Length
13
How Does This Work?
  Using index files is simplified by using positions
  Look in index structure to find position of data in file
  With this position can then seek to specific record
  Create instance & initialize by reading data from file
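The lookup steps above can be sketched as follows. The class name IndexedLookup and the price offset of 50 bytes are assumptions made for this example; seek and readInt are standard RandomAccessFile methods.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Map;

// Hypothetical helper (not from the slides): index lookup, then seek, then read.
class IndexedLookup {
    // Assumed layout: a 4-byte int price sits 50 bytes into each record.
    static final int PRICE_OFFSET = 50;

    // Returns the price stored for key, or null if the index has no such key.
    static Integer readPrice(Map<String, Long> index, RandomAccessFile data, String key)
            throws IOException {
        Long position = index.get(key);      // 1. look in index for the position
        if (position == null) {
            return null;
        }
        data.seek(position + PRICE_OFFSET);  // 2. seek to the specific record
        return data.readInt();               // 3. read the data from the file
    }
}
```

Only the bytes of the one requested field are read, no matter how large the data file is.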
14
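Starting with Indexed Files
  Name index (search term, position in data file):
    American Telephone & Telegraph   112
    International Business Machines  0
    Ford Motorcars, Inc.             224
  Ticker index (search term, position in data file):
    F    224
    IBM  0
    T    112
  Data file records (name, price, ticker):
    IBM     106  IBM
    AT & T  23   T
    Ford    12   F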
15
Where Was "Searching" Used?
  Indexed files used in Maps and Dictionarys
  Read data into searchable object after opening file
  For each record, Entry uses indexed data as its key
  Single data file has multiple indexes to search it
  Not a problem, each index has own Collection
  Cannot have multiple instances for each data item
  Cannot have single instance for each data item
  Then how can we construct each Entry's value?
16
Proxy Pattern For The Win!
17
  Create proxy instances to use as Entry's value
  Proxy pretends it has data by defining getters & setters
  Data's position & file are the only fields these objects have
  Whenever method called, looks up & returns data
  Other classes will think proxy has fields declared
  Simplifies using class & ensures up-to-date data used
  But little memory needed, since data resides on disk!
18
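Starting with Indexed Files
  Name index (search term, position in data file):
    American Telephone & Telegraph   112
    International Business Machines  0
    Ford Motorcars, Inc.             224
  Ticker index (search term, position in data file):
    F    224
    IBM  0
    T    112
  Data file records (name, price, ticker):
    IBM     106  IBM
    AT & T  23   T
    Ford    12   F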
19
Coding
import java.io.RandomAccessFile;

public class Stock {
    // *_OFF: offset in record to field start
    // *_SZ:  fixed max. size of each field
    private static final int NAME_OFF = 0;
    private static final int NAME_SZ = 50;
    private static final int PRC_OFF = NAME_OFF + NAME_SZ;
    private static final int PRC_SZ = 4;
    private static final int TICK_OFF = PRC_OFF + PRC_SZ;
    private static final int TICK_SZ = 6;
    // Fixed size of a record in data file
    private static final int SIZE = TICK_OFF + TICK_SZ;

    // The proxy's only fields: where its record starts and which file holds it
    private long position;
    private RandomAccessFile theFile;

    public Stock(long pos, RandomAccessFile file) {
        position = pos;
        theFile = file;
    }
    // Getters & setters continue on the next slide
}
22
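Coding
import java.io.IOException;

public class Stock { // Continues from last time
    public int getStockPrice() throws IOException {
        theFile.seek(position + PRC_OFF);
        return theFile.readInt();
    }

    public void setStockPrice(int price) throws IOException {
        theFile.seek(position + PRC_OFF);
        theFile.writeInt(price);
    }

    // writeUTF stores a 2-byte length before the text, so sym must
    // encode to at most TICK_SZ - 2 bytes to stay inside its field
    public void setTickerSymbol(String sym) throws IOException {
        theFile.seek(position + TICK_OFF);
        theFile.writeUTF(sym);
    }
    // More getters & setters from here…
}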
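A cut-down, self-contained proxy may help show why the data stays up to date. PriceProxy is a hypothetical stand-in for Stock that handles only the price field, with the offset of 50 bytes assumed from the constants on the earlier slide.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical cut-down proxy (not the slides' Stock class): it stores
// only a position and a file, and goes to disk on every call.
class PriceProxy {
    private static final int PRICE_OFFSET = 50; // assumed: price follows a 50-byte name

    private final long position;
    private final RandomAccessFile file;

    PriceProxy(long position, RandomAccessFile file) {
        this.position = position;
        this.file = file;
    }

    // Looks the value up in the file each time, so it is never stale.
    int getPrice() throws IOException {
        file.seek(position + PRICE_OFFSET);
        return file.readInt();
    }

    // Writes straight through to the record in the data file.
    void setPrice(int price) throws IOException {
        file.seek(position + PRICE_OFFSET);
        file.writeInt(price);
    }
}
```

Two proxies created for the same position see each other's changes, because neither keeps a copy of the price in memory.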
23
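Visualizing Indexed Files
  Name index (search term, position in data file):
    American Telephone & Telegraph   112
    International Business Machines  0
    Ford Motorcars, Inc.             224
  Ticker index (search term, position in data file):
    F    224
    IBM  0
    T    112
  Data file records (name, price, ticker):
    IBM     106  IBM
    AT & T  23   T
    Ford    12   F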
24
How Do We Add Data?
  Adding new records takes only a few steps
  Add space for record with setLength on data file
  Update index structure(s) to include new record
  Records in data file updated at each change
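The steps for adding a record can be sketched as below. The class name RecordAppender and the record size of 60 bytes (50 + 4 + 6 from the Stock constants) are assumptions; setLength and length are standard RandomAccessFile methods.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Map;

// Hypothetical helper (not from the slides): grows the data file by one
// record and registers the new record's position in an index.
class RecordAppender {
    static final int RECORD_SIZE = 60; // assumed: 50-byte name + 4-byte price + 6-byte ticker

    static long append(RandomAccessFile data, Map<String, Long> index, String key)
            throws IOException {
        long position = data.length();          // new record starts at the old end
        data.setLength(position + RECORD_SIZE); // add space for the record
        index.put(key, position);               // update index structure
        // The contents of the added space are not defined by setLength, so
        // the caller should write every field (for example through a proxy).
        return position;
    }
}
```

With several indexes, the same position would be added to each of them under its own search term.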
25
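Adding New Data To The Files
  Name index (search term, position in data file):
    American Telephone & Telegraph   112
    Citibank                         336
    International Business Machines  0
    Ford Motorcars, Inc.             224
  Ticker index (search term, position in data file):
    C    336
    F    224
    IBM  0
    T    112
  Data file records (name, price, ticker):
    IBM     106  IBM
    AT & T  23   T
    Ford    12   F
    (new record, still blank: 0 Ø)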
26
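Adding New Data To The Files
  Name index (search term, position in data file):
    American Telephone & Telegraph   112
    Citibank                         336
    International Business Machines  0
    Ford Motorcars, Inc.             224
  Ticker index (search term, position in data file):
    C    336
    F    224
    IBM  0
    T    112
  Data file records (name, price, ticker):
    IBM       106  IBM
    AT & T    23   T
    Ford      12   F
    Citibank  -2   C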
27
How Does This Work?
  Removing records even easier
  To prevent using record, remove items from indexes
  Do NOT update index file(s) until program completes
  Use impossible magic numbers for record in data file
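A sketch of removal under stated assumptions: the class name RecordRemover, the price offset of 50 bytes, and the choice of Integer.MIN_VALUE as the impossible magic number are all illustrative, not taken from the slides.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Map;

// Hypothetical helper (not from the slides): drops a key from an in-memory
// index and marks its record in the data file as unused.
class RecordRemover {
    static final int PRICE_OFFSET = 50;                 // assumed record layout
    static final int DELETED_PRICE = Integer.MIN_VALUE; // assumed "impossible" price

    static void remove(RandomAccessFile data, Map<String, Long> index, String key)
            throws IOException {
        Long position = index.remove(key); // record can no longer be found
        if (position == null) {
            return;                        // key was not indexed
        }
        // Mark the record so a later index rebuild knows to skip this space
        data.seek(position + PRICE_OFFSET);
        data.writeInt(DELETED_PRICE);
    }
}
```

Only the in-memory index changes here; writing the index file back out is left for when the program completes, as the slide advises.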
28
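Removing Data As We Go
  Name index (search term, position in data file):
    American Telephone & Telegraph   112
    Citibank                         336
    International Business Machines  0
    Ford Motorcars, Inc.             224
  Ticker index (search term, position in data file):
    C    336
    F    224
    IBM  0
    T    112
  Data file records (name, price, ticker):
    IBM       106  IBM
    AT & T    23   T
    Ford      12   F
    Citibank  -2   C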
29
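Removing Data As We Go
  Name index (search term, position in data file):
    American Telephone & Telegraph   112
    Citibank                         336
    International Business Machines  0
  Ticker index (search term, position in data file):
    C    336
    IBM  0
    T    112
  Data file records (name, price, ticker):
    IBM       106  IBM
    AT & T    23   T
    (removed record, now blank: 0 Ø)
    Citibank  -2   C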
30
Using Multiple Indexes
  Multiple indexes for data file very often needed
  Provides many ways of searching for important data
  Since each index file is read individually, this could also create a problem
  Multiple proxy instances for data could be created
  Duplicates of instance are created for each index
  Makes removing them all difficult, since not linked
  Very easy to solve: use Map while loading index
  Converts positions in file to proxy instances to solve this
31
Linking Multiple Indexes
  Use one Map instance while reading all indexes
  For each position in file, check if already in Map
  Use existing proxy instance, if position already in Map
  If a search in Map returns null, create new instance
  Make sure to call put() when we must create proxy
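The check-then-put steps above can be sketched as follows. ProxyCache and its nested Proxy class are hypothetical names; Proxy stands in for the slides' Stock and records only its position to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache (not from the slides): guarantees one proxy instance
// per file position, no matter how many indexes refer to that position.
class ProxyCache {
    // Stand-in for the real proxy class; holds only the record position.
    static final class Proxy {
        final long position;

        Proxy(long position) {
            this.position = position;
        }
    }

    // One Map shared while reading all of the index files
    private final Map<Long, Proxy> byPosition = new HashMap<>();

    Proxy proxyFor(long position) {
        Proxy proxy = byPosition.get(position); // already created for another index?
        if (proxy == null) {
            proxy = new Proxy(position);        // first time this position is seen
            byPosition.put(position, proxy);    // remember it for later indexes
        }
        return proxy;
    }
}
```

While loading the name index and the ticker index, each entry would call proxyFor with its position, so both indexes end up sharing the same instance for a given record.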
32
For Next Lecture
  Angel now has week #9 assignment (due 3/20)
  This is after break, but might want to get started now
  Angel will also have project #2 available
  Has staggered submissions like previous project
  Based upon index files, so can start working now!
  Will discuss implementing space-efficient BST: red-black trees
  Start coloring nodes red & black
  Keeps balanced, but limits amount of movement