CSC 213 – Large Scale Programming

Today’s Goals
- Look at how Dictionaries are used in the real world
- Where this would occur & why they are used there
- In a real-world setting, what problems can/do occur
- Indexed file usage presented and shown
- How & why we split index & data files
- Formatting of each file and how they get used
- Describe what problems are solved using indexed files
- Java coding techniques that simplify using these files
- Idea needed when using multiple indexes shown

Dictionaries in Real World
- Often need large database on many machines
- Split search terms across machines
- Updating & searching work split between machines
- Database way too large for any single machine
- If you think about it, this is incredibly common
- Where?

Split Dictionaries

Splitting Keys From Values
- In real world, we often have many indices
- Simple units measure where we can find values
- Values could be searched for in multiple ways

Index & Data Files
- Split information into two (or more) files
- Data file uses fixed-size records to store data
- Index files contain search terms & data locations
- Fixed-size records usually used in data file
- Each record will use exactly that much space
- Extra space wasted if the value is smaller
- But limits data size, cannot get more space
- Makes it far easier to reuse space & rebuild index
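
The fixed-size record idea can be sketched as below. This is a minimal illustration, not the lecture's code: the class name, the zero-byte padding scheme, and the field widths are all assumptions chosen for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of fixed-size fields and records (names and sizes are assumptions).
class FixedSizeRecords {
    static final int NAME_SZ = 50;                    // assumed width of the name field, in bytes
    static final int PRICE_SZ = 4;                    // an int price
    static final int RECORD_SZ = NAME_SZ + PRICE_SZ;  // every record uses exactly this much space

    // Byte offset where record number n starts in the data file.
    static long positionOf(int recordNumber) {
        return (long) recordNumber * RECORD_SZ;
    }

    // Pads a value with zero bytes to fill its field; rejects values that do not fit.
    static byte[] toField(String value, int fieldSize) {
        byte[] raw = value.getBytes(StandardCharsets.UTF_8);
        if (raw.length > fieldSize) {
            throw new IllegalArgumentException(
                    "value needs " + raw.length + " bytes, field holds " + fieldSize);
        }
        return Arrays.copyOf(raw, fieldSize); // unused bytes are zero-filled (the wasted space)
    }

    // Recovers the value by dropping the zero padding.
    static String fromField(byte[] field) {
        int end = 0;
        while (end < field.length && field[end] != 0) {
            end++;
        }
        return new String(field, 0, end, StandardCharsets.UTF_8);
    }
}
```

Because every record has the same size, a record's position follows from its number alone, and a freed slot can hold any later record.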

Index File Format
- No standard format – depends on type of data
- Often variable sized, but this is not a specific requirement
- Each entry in index file begins with exact search term
- Followed by position containing matching data
- As a result, often find indexes smushed together
- Can read indexes at start of program execution
- Reasonably assumes index file smaller than data file
- Changes written immediately, however
- When program starts, do NOT read data file
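
One possible index layout, sketched under assumptions: each entry is the search term (written with `writeUTF`) followed by the record position (written with `writeLong`). The slides say there is no standard format, so this layout, the class name, and the `roundTrip` helper are purely illustrative.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical index-file sketch: entry = search term, then position in the data file.
class IndexFileSketch {
    static void writeIndex(File file, Map<String, Long> index) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            for (Map.Entry<String, Long> entry : index.entrySet()) {
                out.writeUTF(entry.getKey());    // variable-sized search term
                out.writeLong(entry.getValue()); // position of the matching record
            }
        }
    }

    // Reads the whole index into memory, as a program might do at startup.
    static Map<String, Long> readIndex(File file) throws IOException {
        Map<String, Long> index = new TreeMap<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            while (true) {
                String term;
                try {
                    term = in.readUTF();
                } catch (EOFException endOfIndex) {
                    break; // no more entries
                }
                index.put(term, in.readLong());
            }
        }
        return index;
    }

    // Convenience for trying the sketch: writes to a temp file and reads it back.
    static Map<String, Long> roundTrip(Map<String, Long> index) {
        try {
            File file = File.createTempFile("ticker", ".idx");
            file.deleteOnExit();
            writeIndex(file, index);
            return readIndex(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Only the index is loaded here; the data file is never read in bulk.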

Never Read Entire Data File

Indexed Files
- Enables splitting search terms across computers
- Alphabetical split searches faster on many servers
- Server ranges shown: A-C, D-E, F-H, I-P, Q-R, S-T, U-X, Y-Z
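
A rough sketch of routing a search term to the server that owns its alphabetical range. The ranges come from the slide; the class name and the `server0` to `server7` names are made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: pick the server whose alphabetical range contains a search term.
class RangeRouter {
    // Maps the first letter of each range to an (invented) server name.
    private final TreeMap<Character, String> rangeStarts = new TreeMap<>();

    RangeRouter() {
        // Range starts for A-C, D-E, F-H, I-P, Q-R, S-T, U-X, Y-Z.
        char[] starts = {'A', 'D', 'F', 'I', 'Q', 'S', 'U', 'Y'};
        for (int i = 0; i < starts.length; i++) {
            rangeStarts.put(starts[i], "server" + i);
        }
    }

    // Returns the server responsible for the term's first letter.
    String serverFor(String term) {
        if (term.isEmpty()) {
            throw new IllegalArgumentException("empty search term");
        }
        char first = Character.toUpperCase(term.charAt(0));
        // floorEntry finds the greatest range start that is <= the first letter.
        Map.Entry<Character, String> owner = rangeStarts.floorEntry(first);
        if (owner == null || first > 'Z') {
            throw new IllegalArgumentException("term must start with a letter A-Z: " + term);
        }
        return owner.getValue();
    }
}
```

For example, `new RangeRouter().serverFor("Ford")` falls in the F-H range and so maps to the third server in this sketch.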

Indexed Files
- Enables splitting search terms across computers
- Create indexes for different types of searching
- Example indexes shown: song name, song length

How Does This Work?
- Using index files simplified using positions
- Look in index structure to find position of data in file
- With this position can then seek to specific record
- Create instance & initialize by reading data from file
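
The lookup-then-seek sequence might look like the sketch below. It assumes a simplified record that stores only an `int` price at the record's position; the class and method names are hypothetical, and the positions 0, 112 and 224 are borrowed from the figures in these slides.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: find the position in the index, then seek straight to the record.
class SeekLookupSketch {
    static int priceOf(String ticker, Map<String, Long> index, RandomAccessFile data)
            throws IOException {
        Long position = index.get(ticker);      // step 1: look in the index structure
        if (position == null) {
            throw new IllegalArgumentException("unknown ticker: " + ticker);
        }
        data.seek(position);                    // step 2: jump to that record
        return data.readInt();                  // step 3: read only what is needed
    }

    // Builds a tiny data file and index, then performs one lookup.
    static int demo(String ticker) {
        Map<String, Long> index = new HashMap<>();
        index.put("IBM", 0L);
        index.put("T", 112L);
        index.put("F", 224L);
        try {
            File file = File.createTempFile("stocks", ".dat");
            file.deleteOnExit();
            try (RandomAccessFile data = new RandomAccessFile(file, "rw")) {
                data.seek(0L);
                data.writeInt(106);
                data.seek(112L);
                data.writeInt(23);
                data.seek(224L);
                data.writeInt(12);
                return priceOf(ticker, index, data);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

No other record is touched during the lookup, which is why the data file never needs to be read in full.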

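Starting with Indexed Files
Name index (name → position in data file):
- American Telephone & Telegraph → 112
- International Business Machines → 0
- Ford Motorcars, Inc. → 224
Ticker index (ticker → position in data file):
- F → 224
- IBM → 0
- T → 112
Data file records (name, price, ticker):
- IBM, 106, IBM
- AT & T, 23, T
- Ford, 2, F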

Where Was "Searching" Used?
- Indexed files used in Maps and Dictionaries
- Read data into searchable object after opening file
- For each record, Entry uses indexed data as its key
- Single data file has multiple indexes to search it
- Not a problem, each index has own Collection
- Cannot have multiple instances for each data item
- Cannot have single instance for each data item
- Then how can we construct each Entry's value?

Proxy Pattern For The Win!

- Create proxy instances to use as Entry's value
- Proxy pretends it has data by defining getters & setters
- Data's position & file are the only fields these objects have
- Whenever method called, looks up & returns data
- Other classes will think proxy has fields declared
- Simplifies using class & ensures up-to-date data used
- But little memory needed, since data resides on disk!

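Starting with Indexed Files
Name index (name → position in data file):
- American Telephone & Telegraph → 112
- International Business Machines → 0
- Ford Motorcars, Inc. → 224
Ticker index (ticker → position in data file):
- F → 224
- IBM → 0
- T → 112
Data file records (name, price, ticker):
- IBM, 106, IBM
- AT & T, 23, T
- Ford, 12, F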

Coding
    import java.io.IOException;
    import java.io.RandomAccessFile;

    public class Stock {
        // *_OFF: offset in record to field start; *_SZ: fixed max. size of each field
        private static final int NAME_OFF = 0;
        private static final int NAME_SZ = 50;
        private static final int PRC_OFF = NAME_OFF + NAME_SZ;
        private static final int PRC_SZ = 4;
        private static final int TICK_OFF = PRC_OFF + PRC_SZ;
        private static final int TICK_SZ = 6;
        // Fixed size of a record in data file
        private static final int SIZE = TICK_OFF + TICK_SZ;

        // The proxy's only fields: where the record is & which file holds it
        private long position;
        private RandomAccessFile theFile;

        public Stock(long pos, RandomAccessFile file) {
            position = pos;
            theFile = file;
        }

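Coding
    public class Stock { // Continues from last time
        // Every accessor seeks to its field in the record, so data always comes from the file
        public int getStockPrice() throws IOException {
            theFile.seek(position + PRC_OFF);
            return theFile.readInt();
        }

        public void setStockPrice(int price) throws IOException {
            theFile.seek(position + PRC_OFF);
            theFile.writeInt(price);
        }

        public void setTickerSymbol(String sym) throws IOException {
            theFile.seek(position + TICK_OFF);
            // writeUTF stores a 2-byte length before the characters
            theFile.writeUTF(sym);
        }
        // More getters & setters from here…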

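Visualizing Indexed Files
Name index (name → position in data file):
- American Telephone & Telegraph → 112
- International Business Machines → 0
- Ford Motorcars, Inc. → 224
Ticker index (ticker → position in data file):
- F → 224
- IBM → 0
- T → 112
Data file records (name, price, ticker):
- IBM, 106, IBM
- AT & T, 23, T
- Ford, 12, F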

How Do We Add Data?
- Adding new records takes only a few steps
- Add space for record with setLength on data file
- Update index structure(s) to include new record
- Records in data file updated at each change
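
A minimal sketch of those steps, assuming records are spaced 112 bytes apart (the spacing suggested by the positions in the figures) and that the price is the first thing written in a record. Both assumptions, and all names here, are illustrative only.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: grow the data file by one record and register it in an index.
class AppendRecordSketch {
    static final int RECORD_SZ = 112; // assumed spacing between records

    static long addRecord(RandomAccessFile data, Map<String, Long> index,
                          String ticker, int price) throws IOException {
        long position = data.length();         // new record starts at the old end of file
        data.setLength(position + RECORD_SZ);  // add space for exactly one record
        data.seek(position);
        data.writeInt(price);                  // assumed layout: price written first
        index.put(ticker, position);           // update the in-memory index
        return position;
    }

    // Starts from a file that already holds three records, then adds a fourth.
    static long demo() {
        try {
            File file = File.createTempFile("stocks", ".dat");
            file.deleteOnExit();
            try (RandomAccessFile data = new RandomAccessFile(file, "rw")) {
                data.setLength(3L * RECORD_SZ);
                Map<String, Long> index = new HashMap<>();
                return addRecord(data, index, "C", -2);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With three existing records, the new one lands at position 336, which matches where Citibank appears in the figures that follow.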

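Adding New Data To The Files
Ticker index (ticker → position in data file):
- C → 336
- F → 224
- IBM → 0
- T → 112
Name index (name → position in data file):
- American Telephone & Telegraph → 112
- Citibank → 336
- International Business Machines → 0
- Ford Motorcars, Inc. → 224
Data file records (name, price, ticker):
- IBM, 106, IBM
- AT & T, 23, T
- Ford, 12, F
- 0, Ø (new, still-empty record)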

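Adding New Data To The Files
Ticker index (ticker → position in data file):
- C → 336
- F → 224
- IBM → 0
- T → 112
Name index (name → position in data file):
- American Telephone & Telegraph → 112
- Citibank → 336
- International Business Machines → 0
- Ford Motorcars, Inc. → 224
Data file records (name, price, ticker):
- IBM, 106, IBM
- AT & T, 23, T
- Ford, 12, F
- Citibank, -2, C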

How Does This Work?
- Removing records even easier
- To prevent using record, remove items from indexes
- Do NOT update index file(s) until program completes
- Use impossible magic numbers for record in data file
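
Removal could be sketched as follows. The sentinel value `Integer.MIN_VALUE`, the price-first record layout, and every name below are assumptions made for this example; the slides do not say which magic number is used.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: drop a record from the index and flag its slot in the data file.
class RemoveRecordSketch {
    static final int DELETED = Integer.MIN_VALUE; // assumed "impossible" price value

    static boolean remove(String ticker, Map<String, Long> index, RandomAccessFile data)
            throws IOException {
        Long position = index.remove(ticker);  // searches can no longer reach the record
        if (position == null) {
            return false;
        }
        data.seek(position);
        data.writeInt(DELETED);                // mark the slot as free in the data file
        return true;
    }

    static boolean isDeleted(RandomAccessFile data, long position) throws IOException {
        data.seek(position);
        return data.readInt() == DELETED;
    }

    // Builds three records, removes one, and reports whether the state looks right.
    static boolean demo() {
        try {
            File file = File.createTempFile("stocks", ".dat");
            file.deleteOnExit();
            try (RandomAccessFile data = new RandomAccessFile(file, "rw")) {
                Map<String, Long> index = new HashMap<>();
                long[] positions = {0L, 112L, 224L};
                String[] tickers = {"IBM", "T", "F"};
                int[] prices = {106, 23, 12};
                for (int i = 0; i < positions.length; i++) {
                    data.seek(positions[i]);
                    data.writeInt(prices[i]);
                    index.put(tickers[i], positions[i]);
                }
                boolean removed = remove("F", index, data);
                return removed && !index.containsKey("F")
                        && isDeleted(data, 224L) && !isDeleted(data, 0L);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The index file on disk is left alone in this sketch; only the in-memory index and the flagged slot change.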

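Removing Data As We Go
Ticker index (ticker → position in data file):
- C → 336
- F → 224
- IBM → 0
- T → 112
Name index (name → position in data file):
- American Telephone & Telegraph → 112
- Citibank → 336
- International Business Machines → 0
- Ford Motorcars, Inc. → 224
Data file records (name, price, ticker):
- Citibank, -2, C
- IBM, 106, IBM
- AT & T, 23, T
- Ford, 12, F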

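Removing Data As We Go
Ticker index (ticker → position in data file):
- C → 336
- IBM → 0
- T → 112
Name index (name → position in data file):
- American Telephone & Telegraph → 112
- Citibank → 336
- International Business Machines → 0
Data file records (name, price, ticker):
- Citibank, -2, C
- IBM, 106, IBM
- AT & T, 23, T
- 0, Ø (removed record)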

Using Multiple Indexes
- Multiple indexes for data file very often needed
- Provides many ways of searching for important data
- Since each index file is read individually, this could also create a problem
- Multiple proxy instances for data could be created
- Duplicates of instance are created for each index
- Makes removing them all difficult, since not linked
- Very easy to solve: use Map while loading index
- Converts positions in file to proxy instances to solve this

Linking Multiple Indexes
- Use one Map instance while reading all indexes
- For each position in file, check if already in Map
- Use existing proxy instance, if position already in Map
- If a search in Map returns null, create new instance
- Make sure to call put() when we must create proxy
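
The shared-Map idea might be sketched like this. `StockProxy` is a stand-in for the real proxy class and holds only a position; the class and method names are assumptions for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one Map from file position to proxy, shared while loading every index.
class SharedProxySketch {
    // Stand-in proxy: it only remembers where its record lives.
    static final class StockProxy {
        final long position;

        StockProxy(long position) {
            this.position = position;
        }
    }

    private final Map<Long, StockProxy> proxiesByPosition = new HashMap<>();

    // Returns the existing proxy for a position, creating and storing one only if needed.
    StockProxy proxyFor(long position) {
        StockProxy proxy = proxiesByPosition.get(position);
        if (proxy == null) {
            proxy = new StockProxy(position);
            proxiesByPosition.put(position, proxy); // remember it for the other indexes
        }
        return proxy;
    }

    // Converts one index (search term -> position) into search term -> shared proxy.
    Map<String, StockProxy> loadIndex(Map<String, Long> rawIndex) {
        Map<String, StockProxy> loaded = new HashMap<>();
        for (Map.Entry<String, Long> entry : rawIndex.entrySet()) {
            loaded.put(entry.getKey(), proxyFor(entry.getValue()));
        }
        return loaded;
    }
}
```

Loading a ticker index and a name index through the same `SharedProxySketch` yields the same proxy object for position 0 from both, so removing it from one index removes the only instance. `Map.computeIfAbsent` would shorten `proxyFor`, at the cost of hiding the null check the slide describes.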

For Next Lecture
- Angel now has week #9 assignment (due 3/20)
- This is after break, but might want to get started now
- Angel will also have project #2 available
- Has staggered submissions like previous project
- Based upon index files, so can start working now!
- Will discuss implementing space-efficient BST: red-black trees
- Start coloring nodes red & black
- Keeps balanced, but limits amount of movement