
1 CSC 213 – Large Scale Programming

2 Today’s Goals
• Look at how Dictionaries are used in the real world
• Where this occurs & why they are used there
• In a real-world setting, what problems can & do occur
• Indexed file usage presented and shown
• How & why we split index & data files
• Formatting of each file and how they get used
• Describe what problems are solved using indexed files
• Java coding techniques that simplify using these files
• Idea needed when using multiple indexes

3 Dictionaries in Real World
• Often need large database on many machines
• Split search terms across machines
• Updating & searching work split between machines
• Database way too large for any single machine
• If you think about it, this is incredibly common
• Where?

4 Split Dictionaries

6 Splitting Keys From Values
• In real world, we often have many indices
• Simple units measure where we can find values
• Values could be searched for in multiple ways

8 Index & Data Files
• Split information into two (or more) files
• Data file uses fixed-size records to store data
• Index files contain search terms & data locations
• Fixed-size records usually used in data file
• Each record will use exactly that much space
• Extra space wasted if the value is smaller
• But limits data size, cannot get more space
• Makes it far easier to reuse space & rebuild index
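A minimal sketch of what a fixed-size field means in practice, assuming ASCII text and zero-byte padding (both are assumptions, as are the `FixedField` class and its method names; the slides do not say how values are padded). It also shows why fixed-size records make positions easy to compute.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not from the slides: stores text in a field of a fixed byte size.
class FixedField {
    /** Encodes text into exactly size bytes, padding with zero bytes; rejects values that do not fit. */
    static byte[] encode(String text, int size) {
        byte[] raw = text.getBytes(StandardCharsets.US_ASCII);
        if (raw.length > size) {
            throw new IllegalArgumentException(
                    "value needs " + raw.length + " bytes, field holds " + size);
        }
        return Arrays.copyOf(raw, size); // copyOf fills the unused tail with zeros (wasted space)
    }

    /** Decodes a fixed-size field, dropping the zero padding. */
    static String decode(byte[] field) {
        int end = 0;
        while (end < field.length && field[end] != 0) {
            end++;
        }
        return new String(field, 0, end, StandardCharsets.US_ASCII);
    }

    /** Byte position of the n-th record (0-based) when every record uses recordSize bytes. */
    static long positionOf(long recordNumber, int recordSize) {
        return recordNumber * recordSize;
    }
}
```

Because every record has the same size, a freed slot can be reused by any new record, and the index can be rebuilt by stepping through the data file one record size at a time.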

9 Index File Format
• No standard format: depends on type of data
• Often variable sized, but this is not a requirement
• Each entry in index file begins with exact search term
• Followed by position containing matching data
• As a result, often find indexes smushed together
• Can read indexes at start of program execution
• Reasonably assumes index file smaller than data file
• Changes written immediately, however
• When program starts, do NOT read data file
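Since there is no standard index format, the following is only a sketch under an assumed one: a text file with one entry per line, the search term first and the position after the last tab. The `IndexLoader` class is hypothetical. It reads the whole index into memory at startup and never touches the data file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical loader for an assumed "term<TAB>position" index format.
class IndexLoader {
    /** Reads every index entry into a sorted map from search term to data-file position. */
    static Map<String, Long> load(Reader source) throws IOException {
        Map<String, Long> index = new TreeMap<>();
        BufferedReader in = new BufferedReader(source);
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) {
                continue; // tolerate blank lines
            }
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                throw new IOException("malformed index entry: " + line);
            }
            String term = line.substring(0, tab);
            long position = Long.parseLong(line.substring(tab + 1).trim());
            index.put(term, position);
        }
        return index;
    }
}
```

For example, passing a `FileReader` opened on the index file gives a map that can answer searches without any further disk reads until a record is actually needed.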

10 Never Read Entire Data File

11 Indexed Files
• Enables splitting search terms across computers
• Alphabetical split makes searches faster on many servers
• Example split: A-C, D-E, F-H, I-P, Q-R, S-T, U-X, Y-Z
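One way to picture the alphabetical split is a lookup from a term's first letter to the machine holding that range. This is a sketch only: the `KeyRouter` class and the server names are made up, and the ranges are the ones listed on the slide.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical router: picks the server responsible for a search term's first letter.
class KeyRouter {
    // First letter of each range mapped to an invented server name.
    private final TreeMap<Character, String> rangeStarts = new TreeMap<>();

    KeyRouter() {
        rangeStarts.put('A', "server-1"); // A-C
        rangeStarts.put('D', "server-2"); // D-E
        rangeStarts.put('F', "server-3"); // F-H
        rangeStarts.put('I', "server-4"); // I-P
        rangeStarts.put('Q', "server-5"); // Q-R
        rangeStarts.put('S', "server-6"); // S-T
        rangeStarts.put('U', "server-7"); // U-X
        rangeStarts.put('Y', "server-8"); // Y-Z
    }

    /** Returns the server whose letter range contains the term's first letter. */
    String serverFor(String term) {
        if (term.isEmpty()) {
            throw new IllegalArgumentException("empty search term");
        }
        char first = Character.toUpperCase(term.charAt(0));
        // floorEntry finds the closest range start at or below the letter.
        Map.Entry<Character, String> range = rangeStarts.floorEntry(first);
        if (range == null || first > 'Z') {
            throw new IllegalArgumentException("no server handles terms starting with " + first);
        }
        return range.getValue();
    }
}
```

Each server then only needs the part of the index that falls in its own range.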

12 Indexed Files
• Enables splitting search terms across computers
• Create indexes for different types of searching
• Example indexes: song name, song length

13 How Does This Work?
• Using index files is simplified by using positions
• Look in index structure to find position of data in file
• With this position, can then seek to specific record
• Create instance & initialize by reading data from file
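The lookup steps above can be sketched as follows, assuming the simplest possible record (a single 4-byte price) and an in-memory `Map` as the index structure. `IndexedLookup` and `priceOf` are hypothetical names used only for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

// Hypothetical example: find a record through the index, then read it from the data file.
class IndexedLookup {
    /** Returns the price stored for term, or null when the index has no entry for it. */
    static Integer priceOf(Map<String, Long> index, RandomAccessFile data, String term)
            throws IOException {
        Long position = index.get(term); // 1. look in the index structure
        if (position == null) {
            return null;
        }
        data.seek(position);             // 2. seek to that record in the data file
        return data.readInt();           // 3. read the data stored there
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("prices", ".dat");
        file.deleteOnExit();
        try (RandomAccessFile data = new RandomAccessFile(file, "rw")) {
            Map<String, Long> index = new HashMap<>();
            String[] tickers = {"IBM", "T", "F"};
            int[] prices = {106, 23, 12};
            for (int i = 0; i < tickers.length; i++) {
                index.put(tickers[i], data.getFilePointer()); // remember where the record starts
                data.writeInt(prices[i]);
            }
            System.out.println(priceOf(index, data, "T"));
        }
    }
}
```

Only the one record that was asked for is read from disk; the rest of the data file is never loaded.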

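14 Starting with Indexed Files
Name index (search term -> position in data file):
    American Telephone & Telegraph -> 112
    International Business Machines -> 0
    Ford Motorcars, Inc. -> 224
Ticker index:
    F -> 224
    IBM -> 0
    T -> 112
Data file (position: name, price, ticker):
    0: IBM, 106, IBM
    112: AT & T, 23, T
    224: Ford, 2, F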

15 Where Was "Searching" Used?
• Indexed files used in Maps and Dictionaries
• Read data into searchable object after opening file
• For each record, Entry uses indexed data as its key
• Single data file has multiple indexes to search it
• Not a problem, each index has own Collection
• Cannot have multiple instances for each data item
• Cannot have single instance for each data item
• Then how can we construct each Entry's value?

16 Proxy Pattern For The Win!

17
• Create proxy instances to use as Entry's value
• Proxy pretends it has the data by defining getters & setters
• Data's position & file are the only fields these objects have
• Whenever a method is called, it looks up & returns the data
• Other classes will think proxy has the fields declared
• Simplifies using class & ensures up-to-date data used
• But little memory needed, since data resides on disk!

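18 Starting with Indexed Files
Name index (search term -> position in data file):
    American Telephone & Telegraph -> 112
    International Business Machines -> 0
    Ford Motorcars, Inc. -> 224
Ticker index:
    F -> 224
    IBM -> 0
    T -> 112
Data file (position: name, price, ticker):
    0: IBM, 106, IBM
    112: AT & T, 23, T
    224: Ford, 12, F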

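19 Coding
    import java.io.RandomAccessFile;

    public class Stock {
        // Offset & maximum size (in bytes) of each field within a record
        private static final int NAME_OFF = 0;
        private static final int NAME_SZ = 50;
        private static final int PRC_OFF = NAME_OFF + NAME_SZ;
        private static final int PRC_SZ = 4;
        private static final int TICK_OFF = PRC_OFF + PRC_SZ;
        private static final int TICK_SZ = 6;
        // Size of one complete record in the data file
        private static final int SIZE = TICK_OFF + TICK_SZ;

        private long position;            // where this record starts in the data file
        private RandomAccessFile theFile; // data file holding the record

        public Stock(long pos, RandomAccessFile file) {
            position = pos;
            theFile = file;
        }
        // Class continues on slide 22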

20 Coding (same code as slide 19)
• NAME_SZ, PRC_SZ & TICK_SZ: fixed max. size of each field
• SIZE: fixed size of a record in data file

21 Coding (same code as slide 19)
• NAME_OFF, PRC_OFF & TICK_OFF: offset in record to field start

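22 Coding
    public class Stock { // Continues from last time
        // seek, readInt, writeInt & writeUTF can all throw java.io.IOException

        public int getStockPrice() throws IOException {
            theFile.seek(position + PRC_OFF); // jump to this record's price field
            return theFile.readInt();
        }

        public void setStockPrice(int price) throws IOException {
            theFile.seek(position + PRC_OFF);
            theFile.writeInt(price);
        }

        public void setTickerSymbol(String sym) throws IOException {
            theFile.seek(position + TICK_OFF);
            theFile.writeUTF(sym); // writes a 2-byte length, then the encoded characters
        }
        // More getters & setters from here…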
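To see why a proxy always returns up-to-date data, here is a cut-down, self-contained sketch. `PriceProxy` is a hypothetical stand-in for the Stock class with a single price field stored at the record's position; it is not the class from the slides.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical, cut-down proxy: holds only a position and a file, never the data itself.
class PriceProxy {
    private final long position;
    private final RandomAccessFile file;

    PriceProxy(long position, RandomAccessFile file) {
        this.position = position;
        this.file = file;
    }

    /** Reads the price from disk every time it is called. */
    int getPrice() throws IOException {
        file.seek(position);
        return file.readInt();
    }

    /** Writes the new price straight to disk. */
    void setPrice(int price) throws IOException {
        file.seek(position);
        file.writeInt(price);
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("proxy", ".dat");
        tmp.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
            raf.writeInt(106);                     // one record holding a price
            PriceProxy a = new PriceProxy(0, raf); // two proxies for the same record
            PriceProxy b = new PriceProxy(0, raf);
            a.setPrice(110);
            System.out.println(b.getPrice());      // b sees the change made through a
        }
    }
}
```

Neither proxy caches the price, so a change made through one is visible through the other, and each proxy costs only a `long` and a reference in memory.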

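23 Visualizing Indexed Files
Name index (search term -> position in data file):
    American Telephone & Telegraph -> 112
    International Business Machines -> 0
    Ford Motorcars, Inc. -> 224
Ticker index:
    F -> 224
    IBM -> 0
    T -> 112
Data file (position: name, price, ticker):
    0: IBM, 106, IBM
    112: AT & T, 23, T
    224: Ford, 12, F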

24 How Do We Add Data?
• Adding new records takes only a few steps
• Add space for record with setLength on data file
• Update index structure(s) to include new record
• Records in data file updated at each change
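A sketch of the first two steps, assuming a record size of 112 bytes (the spacing the example figures use) and an in-memory `Map` as the index structure. `RecordAppender` and `append` are hypothetical names.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Map;

// Hypothetical helper: grows the data file by one record and indexes the new slot.
class RecordAppender {
    static final int RECORD_SIZE = 112; // assumed size, matching the positions in the figures

    /** Reserves space for one record at the end of the file and returns its position. */
    static long append(RandomAccessFile data, Map<String, Long> index, String term)
            throws IOException {
        long position = data.length();          // new record starts at the current end
        data.setLength(position + RECORD_SIZE); // add space for the record
        index.put(term, position);              // update the index structure
        return position;
    }
}
```

The Java documentation leaves the contents of the extended portion of the file undefined, so the new record's fields should be written right after the space is reserved.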

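25 Adding New Data To The Files
Ticker index (search term -> position in data file):
    C -> 336
    F -> 224
    IBM -> 0
    T -> 112
Name index:
    American Telephone & Telegraph -> 112
    Citibank -> 336
    International Business Machines -> 0
    Ford Motorcars, Inc. -> 224
Data file (position: name, price, ticker):
    0: IBM, 106, IBM
    112: AT & T, 23, T
    224: Ford, 12, F
    336: new record, still empty (0, Ø)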

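26 Adding New Data To The Files
Ticker index (search term -> position in data file):
    C -> 336
    F -> 224
    IBM -> 0
    T -> 112
Name index:
    American Telephone & Telegraph -> 112
    Citibank -> 336
    International Business Machines -> 0
    Ford Motorcars, Inc. -> 224
Data file (position: name, price, ticker):
    0: IBM, 106, IBM
    112: AT & T, 23, T
    224: Ford, 12, F
    336: Citibank, -2, C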

27 How Does This Work?
• Removing records even easier
• To prevent using record, remove items from indexes
• Do NOT update index file(s) until program completes
• Use impossible magic numbers for record in data file
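A sketch of removal under two assumptions: each record starts with a 4-byte price, and `Integer.MIN_VALUE` is a value no real record would hold. The slides do not say which magic number is used, so the sentinel, the `RecordRemover` class, and its method names are all hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Map;

// Hypothetical helper: removes a record by dropping it from the index and marking its slot.
class RecordRemover {
    static final int DELETED = Integer.MIN_VALUE; // assumed "impossible" magic number

    /** Removes term from the in-memory index and marks its record as deleted on disk. */
    static boolean remove(RandomAccessFile data, Map<String, Long> index, String term)
            throws IOException {
        Long position = index.remove(term); // record can no longer be found by searching
        if (position == null) {
            return false;                   // nothing to remove
        }
        data.seek(position);
        data.writeInt(DELETED);             // slot can be recognized & reused later
        return true;
    }

    /** Reports whether the record at position carries the deleted marker. */
    static boolean isDeleted(RandomAccessFile data, long position) throws IOException {
        data.seek(position);
        return data.readInt() == DELETED;
    }
}
```

With several indexes, the term would have to be removed from each of them; the index files themselves are left alone until the program finishes.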

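28 Removing Data As We Go
Ticker index (search term -> position in data file):
    C -> 336
    F -> 224
    IBM -> 0
    T -> 112
Name index:
    American Telephone & Telegraph -> 112
    Citibank -> 336
    International Business Machines -> 0
    Ford Motorcars, Inc. -> 224
Data file (position: name, price, ticker):
    0: IBM, 106, IBM
    112: AT & T, 23, T
    224: Ford, 12, F
    336: Citibank, -2, C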

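29 Removing Data As We Go
Ticker index (search term -> position in data file):
    C -> 336
    IBM -> 0
    T -> 112
Name index:
    American Telephone & Telegraph -> 112
    Citibank -> 336
    International Business Machines -> 0
Data file (position: name, price, ticker):
    0: IBM, 106, IBM
    112: AT & T, 23, T
    224: removed record (0, Ø)
    336: Citibank, -2, C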

30 Using Multiple Indexes
• Multiple indexes for data file very often needed
• Provides many ways of searching for important data
• Since each index file is read individually, this could also create a problem
• Multiple proxy instances for the same data could be created
• Duplicates of instance are created for each index
• Makes removing them all difficult, since not linked
• Very easy to solve: use Map while loading indexes
• Map converts positions in file to proxy instances

31 Linking Multiple Indexes
• Use one Map instance while reading all indexes
• For each position in file, check if already in Map
• Use existing proxy instance, if position already in Map
• If a search in Map returns null, create new instance
• Make sure to call put() when we must create proxy
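The steps above can be sketched with a single `Map` from file position to proxy. `ProxyRegistry` and its nested `RecordProxy` are hypothetical names; the proxy here holds only a position to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry: guarantees one proxy instance per position in the data file.
class ProxyRegistry {
    /** Stand-in proxy that only remembers where its record lives. */
    static final class RecordProxy {
        final long position;

        RecordProxy(long position) {
            this.position = position;
        }
    }

    // One Map shared while reading every index file.
    private final Map<Long, RecordProxy> byPosition = new HashMap<>();

    /** Returns the existing proxy for position, creating & registering one if needed. */
    RecordProxy proxyFor(long position) {
        RecordProxy proxy = byPosition.get(position);
        if (proxy == null) {                 // first index to mention this position
            proxy = new RecordProxy(position);
            byPosition.put(position, proxy); // later indexes will reuse it
        }
        return proxy;
    }

    /** Number of distinct records seen so far. */
    int size() {
        return byPosition.size();
    }
}
```

While loading the name index and the ticker index, both would call `proxyFor` with the position read from their file, so the entries for IBM in each index end up sharing one proxy.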

32 For Next Lecture
• Angel now has week #9 assignment (due 3/20)
• This is after break, but might want to get started now
• Angel will also have project #2 available
• Has staggered submissions like previous project
• Based upon index files, so can start working now!
• Will discuss implementing space-efficient BST (red-black)
• Start coloring nodes red & black
• Keeps balanced, but limits amount of movement

