
1 CSC 213 – Large Scale Programming

2 Today’s Goals
- Look at how Dictionaries are used in the real world
- Where this would occur & why they are used there
- In a real-world setting, what problems can & do occur
- Indexed file usage presented and shown
- How & why we split index & data files
- Formatting of each file and how they get used
- Describe what problems are solved using indexed files
- Java coding techniques that simplify using these files
- Idea needed when using multiple indexes shown

3 Dictionaries in Real World
- Often need large database on many machines
- Split search terms across machines
- Updating & searching work split between machines
- Database way too large for any single machine
- If you think about it, this is incredibly common
- Where?

4 Split Dictionaries

6 Splitting Keys From Values
- In real world, we often have many indices
- Simple units measure where we can find values
- Values could be searched for in multiple ways

8 Index & Data Files
- Split information into two (or more) files
- Data file uses fixed-size records to store data
- Index files contain search terms & data locations
- Fixed-size records usually used in data file
- Each record will use exactly that much space
- Extra space wasted if the value is smaller
- But limits data size, cannot get more space
- Makes it far easier to reuse space & rebuild index
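A minimal sketch of the fixed-size record idea, assuming text fields are padded with zero bytes. The class name FixedField and its method names are illustrative and do not come from the slides.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: stores a string in exactly `size` bytes.
class FixedField {
    // Pads with zero bytes when the value is short; truncates when it is too long.
    static byte[] encode(String value, int size) {
        byte[] raw = value.getBytes(StandardCharsets.US_ASCII);
        return Arrays.copyOf(raw, size);
    }

    // Reverses encode() by dropping the trailing zero padding.
    static String decode(byte[] field) {
        int len = 0;
        while (len < field.length && field[len] != 0) {
            len++;
        }
        return new String(field, 0, len, StandardCharsets.US_ASCII);
    }

    // With fixed-size records, record i always starts at i * recordSize.
    static long positionOf(int recordNumber, int recordSize) {
        return (long) recordNumber * recordSize;
    }
}
```

The trade-off from the slide shows up directly: short values waste the padding bytes, and long values are cut off at the field size.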

9 Index File Format
- No standard format – depends on type of data
- Often variable sized, but this is not a specific requirement
- Each entry in index file begins with exact search term
- Followed by position containing matching data
- As a result, often find indexes smushed together
- Can read indexes at start of program execution
- Reasonably assumes index file smaller than data file
- Changes written immediately, however
- When program starts, do NOT read data file
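One possible index layout, sketched under assumptions: each entry is the search term written with writeUTF followed by the position written with writeLong. The slides say there is no standard format, so this layout and the class name IndexFile are only illustrative.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical index reader/writer; the entry layout is an assumption.
class IndexFile {
    // Writes each entry as: search term, then position of the matching data.
    static void save(Map<String, Long> index, OutputStream out) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        for (Map.Entry<String, Long> e : index.entrySet()) {
            data.writeUTF(e.getKey());
            data.writeLong(e.getValue());
        }
        data.flush();
    }

    // Reads the whole index into memory; the data file is never touched here.
    static Map<String, Long> load(InputStream in) throws IOException {
        Map<String, Long> index = new TreeMap<>();
        DataInputStream data = new DataInputStream(in);
        while (true) {
            String term;
            try {
                term = data.readUTF();
            } catch (EOFException endOfIndex) {
                return index; // no more entries
            }
            index.put(term, data.readLong());
        }
    }
}
```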

10 Never Read Entire Data File

11 Indexed Files
- Enables splitting search terms across computers
- Alphabetical split makes searches faster on many servers
[Figure: servers assigned the key ranges A-C, D-E, F-H, I-P, Q-R, S-T, U-X, Y-Z]
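A rough sketch of routing a search term to the server holding its alphabetical range. The range boundaries follow the figure; the class name AlphabetRouter and the server names are made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical router: picks a server from the first letter of the key.
class AlphabetRouter {
    // Maps the first letter of each range to a server name.
    private final TreeMap<Character, String> rangeStarts = new TreeMap<>();

    AlphabetRouter() {
        // Ranges A-C, D-E, F-H, I-P, Q-R, S-T, U-X, Y-Z
        char[] starts = {'A', 'D', 'F', 'I', 'Q', 'S', 'U', 'Y'};
        for (int i = 0; i < starts.length; i++) {
            rangeStarts.put(starts[i], "server" + i);
        }
    }

    // Finds the range whose starting letter is the closest one at or before the key's first letter.
    String serverFor(String key) {
        char first = Character.toUpperCase(key.charAt(0));
        Map.Entry<Character, String> e = rangeStarts.floorEntry(first);
        // Keys sorting before 'A' (digits, for example) fall back to the first server.
        return e == null ? rangeStarts.firstEntry().getValue() : e.getValue();
    }
}
```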

12 Indexed Files
- Enables splitting search terms across computers
- Create indexes for different types of searching
[Figure: separate indexes by song name and by song length]

13 How Does This Work?
- Using index files is simplified by using positions
- Look in index structure to find position of data in file
- With this position, can then seek to specific record
- Create instance & initialize by reading data from file
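The lookup steps above can be sketched as follows, assuming the index has already been loaded into a Map from search term to record position. IndexedLookup and the priceOffset parameter are illustrative names, not from the slides.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Map;

// Hypothetical lookup helper: index gives the position, seek jumps to the record.
class IndexedLookup {
    // Returns the int field of the record for `key`, or null when the key is not indexed.
    static Integer readPrice(Map<String, Long> index, RandomAccessFile dataFile,
                             String key, int priceOffset) throws IOException {
        Long position = index.get(key);          // 1. find position in the index structure
        if (position == null) {
            return null;
        }
        dataFile.seek(position + priceOffset);   // 2. seek to the field inside that record
        return dataFile.readInt();               // 3. read only the bytes that are needed
    }
}
```

Nothing else in the data file is read, which is why the data file never needs to be loaded in full.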

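14 Starting with Indexed Files
[Figure: two index files pointing into one data file]
Name index (name → position): American Telephone & Telegraph → 112; International Business Machines → 0; Ford Motorcars, Inc. → 224
Ticker index (ticker → position): F → 224; IBM → 0; T → 112
Data file records (name, price, ticker): IBM, 106, IBM | AT & T, 23, T | Ford, 2, F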

15 Where Was "Searching" Used?
- Indexed files used in Maps and Dictionaries
- Read data into searchable object after opening file
- For each record, Entry uses indexed data as its key
- Single data file has multiple indexes to search it
- Not a problem, each index has own Collection
- Cannot have multiple instances for each data item
- Cannot have single instance for each data item
- Then how can we construct each Entry's value?

16 Proxy Pattern For The Win!

17
- Create proxy instances to use as Entry's value
- Proxy pretends it has the data by defining getters & setters
- Data's position & file are the only fields these objects have
- Whenever a method is called, it looks up & returns data
- Other classes will think proxy has the fields declared
- Simplifies using class & ensures up-to-date data used
- But little memory needed, since data resides on disk!

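18 Starting with Indexed Files
[Figure: two index files pointing into one data file]
Name index (name → position): American Telephone & Telegraph → 112; International Business Machines → 0; Ford Motorcars, Inc. → 224
Ticker index (ticker → position): F → 224; IBM → 0; T → 112
Data file records (name, price, ticker): IBM, 106, IBM | AT & T, 23, T | Ford, 12, F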

19 Coding

    import java.io.RandomAccessFile;

    public class Stock {
        // Offset in record to each field's start, and fixed max. size of each field
        private static final int NAME_OFF = 0;
        private static final int NAME_SZ = 50;
        private static final int PRC_OFF = NAME_OFF + NAME_SZ;
        private static final int PRC_SZ = 4;
        private static final int TICK_OFF = PRC_OFF + PRC_SZ;
        private static final int TICK_SZ = 6;
        // Fixed size of a record in data file
        private static final int SIZE = TICK_OFF + TICK_SZ;

        // The proxy's only fields: where the record starts and which file holds it
        private long position;
        private RandomAccessFile theFile;

        public Stock(long pos, RandomAccessFile file) {
            position = pos;
            theFile = file;
        }
        // Getters & setters continue on a later slide
    }

22 Coding

    import java.io.IOException;

    public class Stock {
        // Continues from last time
        public int getStockPrice() throws IOException {
            theFile.seek(position + PRC_OFF);
            return theFile.readInt();
        }

        public void setStockPrice(int price) throws IOException {
            theFile.seek(position + PRC_OFF);
            theFile.writeInt(price);
        }

        public void setTickerSymbol(String sym) throws IOException {
            theFile.seek(position + TICK_OFF);
            // writeUTF writes a 2-byte length before the characters,
            // so sym must encode to at most TICK_SZ - 2 bytes
            theFile.writeUTF(sym);
        }
        // More getters & setters from here…
    }

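23 Visualizing Indexed Files
[Figure: two index files pointing into one data file]
Name index (name → position): American Telephone & Telegraph → 112; International Business Machines → 0; Ford Motorcars, Inc. → 224
Ticker index (ticker → position): F → 224; IBM → 0; T → 112
Data file records (name, price, ticker): IBM, 106, IBM | AT & T, 23, T | Ford, 12, F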

24 How Do We Add Data?
- Adding new records takes only a few steps
- Add space for record with setLength on data file
- Update index structure(s) to include new record
- Records in data file updated at each change
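The adding steps above can be sketched as follows, assuming a single in-memory index that maps a key to a record position. RecordAppender and its parameters are illustrative names, not from the slides.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Map;

// Hypothetical helper: grows the data file by one record and indexes it.
class RecordAppender {
    // Returns the position of the new record. Per the RandomAccessFile docs, the
    // contents of the added space are undefined until the record's fields are written.
    static long addRecord(RandomAccessFile dataFile, Map<String, Long> index,
                          String key, int recordSize) throws IOException {
        long position = dataFile.length();           // new record goes at the end
        dataFile.setLength(position + recordSize);   // add space for the record
        index.put(key, position);                    // update the index structure
        return position;
    }
}
```

After this call, a proxy created for the returned position can fill in the fields through its setters.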

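25 Adding New Data To The Files
[Figure: index files and data file after space for a new record is added]
Ticker index (ticker → position): C → 336; F → 224; IBM → 0; T → 112
Name index (name → position): American Telephone & Telegraph → 112; Citibank → 336; International Business Machines → 0; Ford Motorcars, Inc. → 224
Data file records (name, price, ticker): IBM, 106, IBM | AT & T, 23, T | Ford, 12, F | new record not yet filled in (shown as 0, Ø)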

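26 Adding New Data To The Files
[Figure: index files and data file after the new record is filled in]
Ticker index (ticker → position): C → 336; F → 224; IBM → 0; T → 112
Name index (name → position): American Telephone & Telegraph → 112; Citibank → 336; International Business Machines → 0; Ford Motorcars, Inc. → 224
Data file records (name, price, ticker): IBM, 106, IBM | AT & T, 23, T | Ford, 12, F | Citibank, -2, C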

27 How Does This Work?
- Removing records is even easier
- To prevent using record, remove items from indexes
- Do NOT update index file(s) until program completes
- Use impossible magic numbers for record in data file
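A minimal sketch of removal with a magic number, assuming one in-memory index and an int price field. The sentinel Integer.MIN_VALUE and the name RecordRemover are assumptions; the slides only say to use an impossible value.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Map;

// Hypothetical helper: unlinks a record from the index and marks its slot as free.
class RecordRemover {
    // Assumed sentinel: a value that never occurs as a real price.
    static final int DELETED_PRICE = Integer.MIN_VALUE;

    static void remove(RandomAccessFile dataFile, Map<String, Long> index,
                       String key, int priceOffset) throws IOException {
        Long position = index.remove(key);   // searches can no longer reach the record
        if (position == null) {
            return;                          // key was never indexed
        }
        dataFile.seek(position + priceOffset);
        dataFile.writeInt(DELETED_PRICE);    // mark the slot so it can be reused later
    }

    // Lets an index rebuild or a space-reuse pass skip removed records.
    static boolean isDeleted(RandomAccessFile dataFile, long position,
                             int priceOffset) throws IOException {
        dataFile.seek(position + priceOffset);
        return dataFile.readInt() == DELETED_PRICE;
    }
}
```

With several indexes, the same record would have to be removed from each of them; this sketch handles only one.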

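28 Removing Data As We Go
[Figure: index files and data file before a record is removed]
Ticker index (ticker → position): C → 336; F → 224; IBM → 0; T → 112
Name index (name → position): American Telephone & Telegraph → 112; Citibank → 336; International Business Machines → 0; Ford Motorcars, Inc. → 224
Data file records (name, price, ticker): IBM, 106, IBM | AT & T, 23, T | Ford, 12, F | Citibank, -2, C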

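29 Removing Data As We Go
[Figure: index files and data file after the Ford record is removed]
Ticker index (ticker → position): C → 336; IBM → 0; T → 112
Name index (name → position): American Telephone & Telegraph → 112; Citibank → 336; International Business Machines → 0
Data file records (name, price, ticker): IBM, 106, IBM | AT & T, 23, T | removed record (shown as 0, Ø) | Citibank, -2, C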

30 Using Multiple Indexes
- Multiple indexes for data file very often needed
- Provides many ways of searching for important data
- Since each index file is read individually, this could also create a problem
- Multiple proxy instances for data could be created
- Duplicates of instance are created for each index
- Makes removing them all difficult, since not linked
- Very easy to solve: use Map while loading index
- Converts positions in file to proxy instances to solve this

31 Linking Multiple Indexes
- Use one Map instance while reading all indexes
- For each position in file, check if already in Map
- Use existing proxy instance, if position already in Map
- If a search in Map returns null, create new instance
- Make sure to call put() when we must create proxy
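The steps above can be sketched as follows. RecordProxy is a stand-in for the slides' proxy class (it keeps only the position), and ProxyCache is an illustrative name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache shared by every index while the index files are loaded.
class ProxyCache {
    // Stand-in proxy: a real one would also hold the data file and define getters & setters.
    static final class RecordProxy {
        final long position;

        RecordProxy(long position) {
            this.position = position;
        }
    }

    private final Map<Long, RecordProxy> byPosition = new HashMap<>();

    // Returns the one proxy for this position, creating it only on first use.
    RecordProxy proxyFor(long position) {
        RecordProxy proxy = byPosition.get(position);
        if (proxy == null) {                    // search returned null: not seen before
            proxy = new RecordProxy(position);
            byPosition.put(position, proxy);    // remember it for the other indexes
        }
        return proxy;
    }

    int size() {
        return byPosition.size();
    }
}
```

Each index then stores the proxy returned by proxyFor as its Entry's value, so the name index and the ticker index share one instance per record.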

32 What to Study for Midterm
- Study your Maps and Dictionaries. Why?
- When would we use each of the ADTs? Why?
- What do their methods do? Why do they differ?
- Consider each implementation of these ADTs
- Explain why method has its given big-Oh complexity
- Why use an implementation? Where is it used?
- What are negatives or limitations of implementation?
- What fields are needed by implementation? Why is this?

34 What to Study for Midterm
- List-based approaches – Why? When?
- Hash tables
  - How do hash functions work? What does mod do?
  - How do we add & remove data from hash table?
  - What are collisions & how do we handle them?
  - What is real & pretend big-Oh complexity? Why?
- Binary Search Trees
  - How do we add, remove, & search in these trees?
  - How are data in BSTs organized? Tricks to their use?
  - How do we code & use BSTs? What methods exist?

35 What to Study for Midterm
- AVL Trees
  - How do we add, remove, & search in these trees?
  - How are data in them organized? Tricks to their use?
  - When must we reorganize tree? How is this done?
- Splay Trees
  - How do we add, remove, & search in these trees?
  - For each method is node splayed & which one?
  - How to chain splayings together? When do we stop?

36 What to Study for Midterm
- Class selection & design
  - Where do classes come from? How do we know?
  - When to use each connection between classes?
  - How to list methods & fields in UML class diagram?
- Comments & Outlines
  - When, where, and how much?
  - What should & should not be included?

37 Midterm Process
- Open-book & open-note test; do not memorize
- But have methods & information at your fingertips
- Use my slides ONLY with note(s) on that day's slides
- Cannot use daily or weekly activities
- Must submit all printed pages along with test
- Problems resemble tone of those already seen
- All new problems, however; do not memorize answers
- Includes tracing, showing state of ADT, method returns
- Coding, big-Oh analysis, and more can be asked

38 For Next Lecture

