Exercise (Answers to Module 1)

Open the Store database and copy the m-customer dataset into a file called Custfile. Then look at the contents of Custfile.

>base store
>get m-customer
>output custfile
>xeq

>input custfile
>output *
>list
>list char

In addition to Output *, try these commands: List and List Char.

Exercise (Answers to Module 2)

GET versus CHAIN: quick, choose one! The Ord-Line detail dataset has 2.3 million records of 308 bytes. Ordfile has 162,000 key values, which will select 261,000 records.

Chain method:
>chain ord-line,ord-num=my-table
>table my-table,ord-num,file,ordfile
>output myfile
>xeq

Get method:
>get ord-line
>table my-table,ord-num,file,ordfile
>if $lookup(my-table,ord-num)
>sort ord-num
>output myfile
>xeq

The Chain method will use 162,000 disc reads for Dbfinds and 261,000 reads for Dbgets, a total of 423,000 disc reads. Because Suprtool maintains the Ord-nums in the table in sorted sequence, and the Chain command reads the records in table sequence, there is no need to sort the records: they are retrieved in the desired order.

The Get command reads many contiguous records with each disc access. Each Get reads 50,000 bytes, so Suprtool will read 162 records with each disc read. At 162 records per read, the entire 2.3-million-record dataset will be read using only 14,197 disc reads. Compared to Chain's 423,000 disc reads, that is a saving of almost 97 percent of the I/O. It's true that with the Get method you must sort the records, but adding the sort still results in a major saving over the Chain method.

With the Ord-Line dataset, the Get command will always take 14,197 reads, regardless of the number of records selected. The performance of Chain, however, depends on the number of Ord-nums and records selected. If there are few enough records to be selected, Chain will be faster. So, while it may be more intuitive to do a chained read when you have the index values of the required records, it is often more efficient to read the whole dataset sequentially and simply not select the unwanted records.

When in doubt, use the Get command to read the dataset sequentially, with Set Statistics On. If the "Input FREAD calls" figure returned is less than the number of records selected, Get will be faster than Chain.

This was published in the What's Up, DOCumentation newsletter, 1994, issue #6.
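As a cross-check of the arithmetic above, here is a small Python sketch (illustration only; the record size, bytes per read, and record counts are the figures quoted on the slide):

record_bytes = 308          # size of one Ord-Line record
read_bytes = 50000          # bytes Suprtool transfers per Get disc read
dataset_records = 2300000   # records in the Ord-Line dataset
key_values = 162000         # Ord-nums in Ordfile (one Dbfind each)
selected_records = 261000   # records selected (one Dbget each)

chain_reads = key_values + selected_records       # 423,000 disc reads
records_per_read = read_bytes // record_bytes     # 162 records per disc read
get_reads = dataset_records // records_per_read   # about 14,197 disc reads
saving = 1 - get_reads / chain_reads              # roughly 0.97

print(chain_reads, records_per_read, get_reads, round(saving, 3))

Chain only wins when the number of Dbfinds plus Dbgets falls below the fixed cost of the serial read, which is why the rule of thumb above compares "Input FREAD calls" with the number of records selected.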

Exercise (Answers to Module 2)

Create a listing of the Alberta customers. Create the following report from the STORE database:

Mar 20, 1995 20:32      Alberta Customers       Page 1

Account#   Name       City
10004      Rogers     Edmonton
10005      Coyle      Edmonton
10006      Frahm      Calgary
10007      Tiernan    Calgary
10015      Young      Edmonton
10016      Bamford    Edmonton
10017      Morrison   Calgary
10018      Johnston   Calgary

>base store
>get m-customer
>if state-code = "AL"
>extract cust-account
>extract name-last
>extract city
>list standard,&
>>title "Alberta Customers",&
>>heading "Account# Name City"
>sort cust-account
>xeq

Exercise: Duplicates, Duplicates, Duplicates, Duplicates (Answers to Module 4)

Exercise 1: Create a list of all the states/provinces in which we have customers.
Exercise 2: List all the dates on which we made more than one sale.
Bonus Exercise 3: List all the sales made on those dates.
Hints: requires two passes, and the Table command.

*** Exercise 1
>base store
>get m-customer
>sort state-code
>duplicate none keys
>extract state-code
>list standard
>output states,link
>xeq

*** Exercise 2 and first pass of Exercise 3
>get d-sales
>sort purch-date
>duplicate only keys
>extract purch-date
>output purchdt,link
>xeq

*** Second pass of Exercise 3
>table date-tbl,purch-date,sorted,purchdt
>if $lookup(date-tbl,purch-date)
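The slide shows only the Table and If commands for the second pass; a minimal Python sketch of the complete two-pass logic might look like this (the sample records and values are invented for illustration):

from collections import Counter

# Stand-in for the D-Sales detail dataset (dates and accounts are invented).
sales = [
    {"purch-date": 19950320, "cust-account": 10004},
    {"purch-date": 19950320, "cust-account": 10007},
    {"purch-date": 19950321, "cust-account": 10015},
]

# Pass 1: dates that occur more than once (Sort + Duplicate Only Keys).
counts = Counter(rec["purch-date"] for rec in sales)
repeat_dates = {d for d, n in counts.items() if n > 1}

# Pass 2: keep only the sales whose date is in that table (If $lookup).
repeat_sales = [rec for rec in sales if rec["purch-date"] in repeat_dates]
print(sorted(repeat_dates))
print(repeat_sales)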

Suprlink Exercise 1 (Answers to Module 5)

From the Store database, find all the products of British Columbia suppliers with inventories of less than 20. You should include the product number, quantity in stock, as well as the supplier's name and number.

Solution:

>get d-inventory
>if on-hand-qty < 20
>sort supplier-no
>extract supplier-no,product-no,on-hand-qty
>output prodfile,temp,link
>xeq

>get m-supplier
>if state-code = "BC"
>sort supplier-no
>extract supplier-no,supplier-name
>output suppfile,temp,link
>xeq

>link input prodfile
>link link suppfile
>link output listfile,temp
>link xeq

>input listfile;list standard;xeq
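Conceptually, the Link step merges two files that are both sorted by supplier-no. A rough Python sketch of that idea (the sample records are invented, and only products with a matching supplier record are kept in this sketch):

def merge_link(products, suppliers):
    """Join product records to supplier records; both lists are sorted by supplier-no."""
    i = 0
    for prod in products:
        # Advance through the supplier file until its key catches up with the product's key.
        while i < len(suppliers) and suppliers[i]["supplier-no"] < prod["supplier-no"]:
            i += 1
        if i < len(suppliers) and suppliers[i]["supplier-no"] == prod["supplier-no"]:
            yield {**prod, "supplier-name": suppliers[i]["supplier-name"]}

products = [{"supplier-no": 5051, "product-no": 50512501, "on-hand-qty": 7}]
suppliers = [{"supplier-no": 5051, "supplier-name": "Makita Canada Inc."}]
print(list(merge_link(products, suppliers)))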

Suprlink Exercise 2 (Answers to Module 5)

Add the product price to the list produced in Suprlink Exercise 1:

SUPPLIER-  PRODUCT-N  ON-HAND-QTY  SUPPLIER-NAME
5051       50512501             7  Makita Canada Inc.
5051       50511501             5  Makita Canada Inc.
5051       50512001             2  Makita Canada Inc.
5051       50513001             3  Makita Canada Inc.
5052       50521001            10  Black & Decker
...

Solution:

>get d-sales
>sort product-no
>item product-price,decimal,2
>extract product-no,product-price
>output descfile,temp,link
>xeq

>input listfile
>sort product-no
>output = input
>xeq

>link input listfile
>link link descfile
>link output ordrfile,temp
>link xeq

Listfile must be re-sorted by product-no before the second link, because Suprlink requires the input file and the link file to be sorted by the same key.

[COMPLETE PRICE LIST IN DEMO]

HTML Exercise (Answers to Module 6)

Create an HTML table that looks like this: [sample table image not included in the transcript]

>export input custsale
>export html table title "Purchase History" &
  heading "Customer Purchase History"
>export date mmddyy "-" invalid " "
>export sign none
>export heading none
>export heading column "Acct #"
>export heading column "Surname"
>export heading column "Given Name"
>export heading column "Credit Limit"
>export heading column "Total Amount Purchased"
>export heading column "# of Purchases"
>export heading column "Earliest Purchase"
>export heading column "Latest Purchase"
>export purge htmlout
>export output htmlout
>export xeq
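The output is an ordinary HTML table. As a rough illustration of what gets generated, this hypothetical Python sketch builds a similar table from the column headings above (the sample data row is invented):

headings = ["Acct #", "Surname", "Given Name", "Credit Limit",
            "Total Amount Purchased", "# of Purchases",
            "Earliest Purchase", "Latest Purchase"]
rows = [["10004", "Rogers", "Brian", "2000.00", "1234.56", "3",
         "01-15-95", "03-20-95"]]   # invented sample values

lines = ["<table>",
         "<caption>Customer Purchase History</caption>",
         "<tr>" + "".join(f"<th>{h}</th>" for h in headings) + "</tr>"]
for row in rows:
    lines.append("<tr>" + "".join(f"<td>{v}</td>" for v in row) + "</tr>")
lines.append("</table>")
print("\n".join(lines))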

HowMessy Exercise #1, Master (Answers to Module 11)

                                             Load   Secon-  Max Blks     Blk
Data Set    Type   Capacity    Entries      Factor  daries  (Highwater)  Fact
A-MASTER    Ato    14505679    9709758       66.9%   36.8%      2395       29

                    Max     Ave      Std     Expd    Avg     Ineff  Elong-
Search Field        Chain   Chain    Dev     Blocks  Blocks  Ptrs   ation
MASTER-KEY             37    1.58    1.26      1.00    1.88  48.5%    1.88

Suggestions: The number of secondaries, MaxBlks, MaxChain and inefficient pointers are all a bit high. Look at the search item's data type: there is a possibility that some clustering is happening here. MaxChain could be an indication, but with a dataset of this size it is hard to tell for sure.
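On the "search item data type" point: a common cause of clustering is a binary key, which TurboIMAGE places essentially by a key-mod-capacity calculation rather than hashing it. The toy Python sketch below assumes that simplified placement model (it is not the exact IMAGE algorithm) to show how patterned key values end up sharing a few addresses:

# Toy model: placement of a binary key is roughly key mod capacity.
capacity = 100
keys = [k * 10 for k in range(100)]            # e.g. account numbers issued in steps of 10
addresses = [k % capacity + 1 for k in keys]
distinct = len(set(addresses))
print(distinct, "distinct addresses for", len(keys), "keys")   # 10 distinct -> 90 synonyms

A character-type key would instead be randomized by the hashing algorithm, which usually spreads such values much more evenly.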

HowMessy Exercise #2, Detail (Answers to Module 11)

                                             Load   Secon-  Max Blks     Blk
Data Set    Type   Capacity    Entries      Factor  daries  (Highwater)  Fact
D-ITEMS     Det      620571     119213       19.2%          (  242025)      7

                     Max      Ave       Std       Expd      Avg      Ineff  Elong-
Search Field         Chain    Chain     Dev       Blocks    Blocks   Ptrs   ation
S ! ITEM-NO              3     1.00      0.02       1.00      1.00    0.0%    1.00
S   SUPPLIER-NO         23     8.07      3.25       1.77      3.30   28.4%    1.86
    LOCATION          5938    11.62     63.64       2.24      2.53   13.2%    1.13
    BO-STATUS        99999 99999.99      0.00   17031.00  17047.00   14.3%    1.00
    DISCOUNT         99999   120.18   1337.15       3.73     39.37   31.9%   10.55

Suggestions:

The Highwater mark is more than twice the number of entries; repacking would definitely help serial scans.

The primary path should be on the Supplier-no or Location search path.

There might be some logical inconsistency on the Item-no search path: the average chain is 1.00 and MaxChain is 3, so there is a good chance there should be only one entry per item number. If that is the case, the sorted path on Item-no is probably not needed.

The BO-STATUS path is unnecessary. MaxChain and AveChain are very high and probably equal. The current number of entries can be stored in 17,031 blocks, which is the number of expected blocks for this path. This indicates that all the entries contain the same value.

The last path, DISCOUNT, could possibly be removed; you have to look at how it is used and how frequently it is accessed. MaxChain is very high, indicating that a large number of records have the same value, and reading along that chain will be very slow. Unlike BO-STATUS, there are entries with other values and much smaller chains, as indicated by AveChain and StdDev; reading along those chains will be fairly fast.
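One way to read the Elongation column above: it appears to be Avg Blocks divided by Expd Blocks, i.e. how many times more blocks a chain actually spans than a freshly packed chain would need. A quick Python check against two of the paths (figures taken from the report):

paths = {
    "SUPPLIER-NO": (1.77, 3.30),   # (Expd Blocks, Avg Blocks)
    "DISCOUNT":    (3.73, 39.37),
}
for name, (expected, actual) in paths.items():
    print(name, round(actual / expected, 2))   # prints 1.86 and 10.55, matching the report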