Database Management Systems


Database Management Systems Chapter 1 Introduction

Goal: Build a Business Application
Tools: database design, SQL (queries), programming.
Best: Spend your time on design and SQL.
Worst: Compensate for poor design and limited SQL with programming.

DBMS
Database: A collection of data stored in a standardized format, designed to be shared by multiple users.
Database Management System: Software that defines a database, stores the data, supports a query language, produces reports, and creates data entry screens.

Application Development tasks, in time order:
Feasibility: Identify scope, costs, and schedule.
Analysis: Gather information from users.
Design: Define tables, relationships, forms, reports.
Development: Create forms, reports, and help; test.
Implementation: Transfer data, install, train, review.

DBMS Application Design 1. Identify business rules. 2. Define tables and relationships. 3. Create input forms and reports. 4. Combine as applications for users.

DBMS Features/Components Database engine Storage Retrieval Update Query Processor Data dictionary Utilities Security Report writer Forms generator (input screens) Application generator Communications 3GL Interface

DBMS Engine, Security, Utilities
Data tables (managed by the database engine):
  Product(ItemID, Description): 887 Dog food; 946 Cat food
  Order(OrderID, ODate): 9874 3-3-97; 9888 3-9-97
  Customer(CustomerID, Name): 1195 Jones; 2355 Rojas
Data dictionary:
  Product: ItemID Integer, Unique; Description Text, 100 char
  Customer: CustomerID Integer, Unique; Name Text, 50 char
Security: User Identification, Access Rights
Concurrency and Lock Manager
Utilities: Backup and Recovery, Administration

Database Tables (Access)

Database Tables (Oracle)

DBMS Query Processor All Data Database Engine Data Dictionary Animal AnimalID Name Category Breed Category CountOfAnimalID Dog 100 Cat 47 Bird 15 Fish 14 Reptile 6 Mammal Spider 3 Field Category AnimalID Table Animal Totals Group By Count Sort Descending Criteria Or
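The query grid on this slide (Group By on Category, Count of AnimalID, sorted descending) corresponds to a SQL aggregate query. The following is a minimal sketch using Python's built-in sqlite3 module, not the slide's own tool; the sample rows are invented for illustration and are not the counts shown on the slide.

```python
import sqlite3

def count_by_category(rows):
    """Return (Category, CountOfAnimalID) pairs, largest count first."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Animal (AnimalID INTEGER PRIMARY KEY, Name TEXT, "
        "Category TEXT, Breed TEXT)"
    )
    conn.executemany("INSERT INTO Animal VALUES (?, ?, ?, ?)", rows)
    # Group By Category, Count AnimalID, Sort Descending (as in the query grid).
    result = conn.execute(
        "SELECT Category, COUNT(AnimalID) AS CountOfAnimalID "
        "FROM Animal GROUP BY Category "
        "ORDER BY CountOfAnimalID DESC, Category"
    ).fetchall()
    conn.close()
    return result

# Hypothetical sample rows (not from the slides).
SAMPLE_ANIMALS = [
    (1, "Rex", "Dog", "Beagle"),
    (2, "Fido", "Dog", "Collie"),
    (3, "Tom", "Cat", "Siamese"),
    (4, "Spot", "Dog", "Dalmatian"),
    (5, "Tweety", "Bird", "Canary"),
    (6, "Felix", "Cat", "Manx"),
]

for category, count in count_by_category(SAMPLE_ANIMALS):
    print(category, count)
```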

DBMS Report Writer All Data Database Engine Data Dictionary Report Query Processor Report Writer Report Format and Query

Report Writer (Oracle)

DBMS Input Forms All Data Database Engine Data Dictionary Input Form Query Processor Form Builder Input Form Design

DBMS Components All Data Database Engine Data Dictionary Security 3GL Communication Network Database Engine Data Dictionary Security 3GL Connector Query Processor Form Builder Report Writer Program Application Generator

Advantages of Database Approach Minimal data redundancy. Data consistency. Integration of data. Sharing of data. Enforcement of standards. Ease of application development. Uniform security, privacy and integrity. Data independence.

Database Management Approach Data is most important Data defined first Standard format Access through DBMS Queries, Reports, Forms Application Programs 3GL Interface Data independence Change data definition without changing code Alter code without changing data Move/split data without changing code All Data DBMS Program1 Queries Reports Program2

Modifying Data with DBMS Field Name Data Type Description EmployeeID Number Autonumber.. TaxpayerID Text Federal ID LastName Text FirstName Text . . . Phone Text CellPhone Text Cellular . . . Add cell number to employee table Open table definition Add data element If desired, modify reports Use report writer No programming Existing reports, queries, code will all run as before with no changes.
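The claim that existing queries keep running after a column is added can be checked with a small sketch. It assumes SQLite (through Python's sqlite3 module) as a stand-in DBMS, a reduced Employee table, and one hypothetical row; the point is only that a query naming its own columns is unaffected by the new CellPhone column.

```python
import sqlite3

# An existing query that names the columns it needs.
EXISTING_QUERY = "SELECT EmployeeID, LastName, Phone FROM Employee"

def add_cell_phone_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY, TaxpayerID TEXT, "
        "LastName TEXT, FirstName TEXT, Phone TEXT)"
    )
    # Hypothetical row, for illustration only.
    conn.execute(
        "INSERT INTO Employee VALUES (1, '000-00-0000', 'Doe', 'Pat', '555-0100')"
    )
    before = conn.execute(EXISTING_QUERY).fetchall()

    # Add the new data element; no application code is rewritten.
    conn.execute("ALTER TABLE Employee ADD COLUMN CellPhone TEXT")

    after = conn.execute(EXISTING_QUERY).fetchall()
    columns = [row[1] for row in conn.execute("PRAGMA table_info(Employee)")]
    conn.close()
    return before, after, columns

before, after, columns = add_cell_phone_demo()
print(before == after, columns)
```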

Drawbacks of old File methods Uncontrolled Duplication Wastes space Hard to update all files Inconsistent data Inflexibility Hard to change data Hard to change programs Limited data sharing Poor enforcement of standards Poor programmer productivity Excessive program maintenance

File Method Problems Files defined in program Cannot read file without definition Hard to find definition Every time you alter file, you must rewrite code Change in a program/file will crash other code Cannot tell which programs use each file Multiuser problems Concurrency Security Access Backup & Restore Efficiency Indexes Programmer talent System Application

Old File Method/3GL Programs Files Pay History Benefits Employee Payroll Pay History Data Definition File 1 … File 2 Benefits Benefits Data Definition File A File 2 File C … Employee Employee Choices

Example of File Method v DBMS COBOL Employee File File Division 01 Employees 02 ID 02 Name 02 Address 01 Department 02 . . . 112 Davy Jones 999 Elm Street . . . 113 Peter Smith 101 Oak St . . . Add to file (e.g.Cell phone) Write code to copy employee file and add empty cell phone slot. Find all programs that use employee file. Modify file definitions. Modify reports (as needed) Recompile, fix new bugs. Easier: Keep two employee files? 02 Cell Phone More programs File Division 01 Employees ...

Examples of Commercial Systems: Oracle; Informix (Unix); DB2, SQL/DS (IBM); Access (Microsoft); SQL Server (Microsoft +); many older (Focus, IMS, ...); MySQL; PostgreSQL

Hierarchical Database Customers Customer Order Items Ordered Orders To retrieve data, you must start at the top (customer). When you retrieve a customer, you retrieve all nested data. Items Item Description Quantity 998 Dog Food 12 764 Cat Food 11

Network Database Customer Items Order Ordered Items Entry point

Relational Database Customer(CustomerID, Name, … Order(OrderID, CustomerID, OrderDate, … ItemsOrdered(OrderID, ItemID, Quantity, … Items(ItemID, Description, Price, …
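The four relations above can be written as table definitions and joined on their shared keys. This is a minimal sketch in Python with SQLite, limited to the columns the slide names (the elided columns are left out). The sample rows reuse values that appear elsewhere in these slides, but linking customer 1195 to order 9874 and the prices are assumptions made for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Customer (
    CustomerID INTEGER PRIMARY KEY,
    Name       TEXT
);
CREATE TABLE "Order" (              -- quoted because ORDER is a reserved word
    OrderID    INTEGER PRIMARY KEY,
    CustomerID INTEGER REFERENCES Customer(CustomerID),
    OrderDate  TEXT
);
CREATE TABLE Items (
    ItemID      INTEGER PRIMARY KEY,
    Description TEXT,
    Price       REAL
);
CREATE TABLE ItemsOrdered (
    OrderID  INTEGER REFERENCES "Order"(OrderID),
    ItemID   INTEGER REFERENCES Items(ItemID),
    Quantity INTEGER,
    PRIMARY KEY (OrderID, ItemID)
);
"""

def make_sales_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Customer VALUES (1195, 'Jones')")
    conn.execute("INSERT INTO \"Order\" VALUES (9874, 1195, '3-3-97')")
    # Prices are hypothetical; the slides do not give them.
    conn.executemany("INSERT INTO Items VALUES (?, ?, ?)",
                     [(998, "Dog Food", 10.0), (764, "Cat Food", 8.0)])
    conn.executemany("INSERT INTO ItemsOrdered VALUES (?, ?, ?)",
                     [(9874, 998, 12), (9874, 764, 11)])
    return conn

def order_lines(conn, order_id):
    """List (Description, Quantity) for one order by joining on ItemID."""
    return conn.execute(
        "SELECT i.Description, io.Quantity "
        "FROM ItemsOrdered AS io JOIN Items AS i ON i.ItemID = io.ItemID "
        "WHERE io.OrderID = ? ORDER BY i.Description",
        (order_id,),
    ).fetchall()

print(order_lines(make_sales_db(), 9874))
```

Unlike the hierarchical model, the query can start from any table; nothing forces retrieval to begin at the customer.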

Object-Oriented DBMS Order Customer Government Customer Commercial OrderID CustomerID … CustomerID Name … Government Customer ContactName ContactPhone Discount, … NewContact Commercial Customer ContactName ContactPhone … NewContact NewOrder DeleteOrder … Add Customer Drop Customer Change Address OrderItem Item OrderID ItemID … ItemID Description … OrderItem DropOrderItem … New Item Sell Item Buy Item …

Base Data Types
Numbers: Integers, Reals
Text: Length, International
Date/Time
Images: Bitmap, Vector
Sound: Samples, MIDI
Video
Numbers, text, and dates are stored as binary values; for example, 12 + 8 = 20 is 000001100 + 000001000 = 000010100.
Images, sound (pitch and volume sampled over time), and video are likewise stored as long sequences of bits.
[Figure: input, process, output, with sample bit patterns for each data type]

Objects Object Definition--encapsulation. Object Name Properties Methods Most existing DBMS do not handle inheritance. Combine into one table. Use multiple tables and link by primary key. More efficient. Need to add rows to many tables. Class name Customer CustomerID Address Phone AddCustomer DropCustomer Properties Methods Inheritance Commercial Contact VolumeDiscount ComputeDiscount Government Contact BalanceDue BillLateFees AddCustomer Polymorphism

Objects in a Relational Database Separate inherited classes. Link by primary key. Adding a new customer requires new rows in each table. Definitely need cascade delete. Customer CustomerID Address Phone CommercialCustomer GovernmentCustomer CustomerID Contact VolumeDiscount CustomerID Contact BalanceDue
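One way to realize this design is to give each subclass table the same primary key as Customer and declare it as a foreign key with cascade delete. The sketch below assumes SQLite via Python's sqlite3 module; the helper names and the sample values are hypothetical.

```python
import sqlite3

def make_customer_db():
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE Customer (
            CustomerID INTEGER PRIMARY KEY,
            Address    TEXT,
            Phone      TEXT
        );
        CREATE TABLE CommercialCustomer (
            CustomerID     INTEGER PRIMARY KEY
                           REFERENCES Customer(CustomerID) ON DELETE CASCADE,
            Contact        TEXT,
            VolumeDiscount REAL
        );
        CREATE TABLE GovernmentCustomer (
            CustomerID INTEGER PRIMARY KEY
                       REFERENCES Customer(CustomerID) ON DELETE CASCADE,
            Contact    TEXT,
            BalanceDue REAL
        );
    """)
    return conn

def add_commercial_customer(conn, customer_id, address, phone, contact, discount):
    """A new commercial customer needs one row in each of two tables."""
    conn.execute("INSERT INTO Customer VALUES (?, ?, ?)",
                 (customer_id, address, phone))
    conn.execute("INSERT INTO CommercialCustomer VALUES (?, ?, ?)",
                 (customer_id, contact, discount))

def count_rows(conn, table):
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

conn = make_customer_db()
add_commercial_customer(conn, 1, "1 Example Road", "555-0100", "Lee", 0.05)
conn.execute("DELETE FROM Customer WHERE CustomerID = 1")
# The subclass row is removed together with the parent row.
print(count_rows(conn, "CommercialCustomer"))
```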

OO Difficulties: Methods IBM Server Unix Server Database Object Personal Computer Database Object How can a method run on different computers? Different processors use different code. Possibility: Java Customer Method: Add New Customer Application Customer Name Address Phone Program code

SQL 99: OO Features Abstract data type User defined data types. Equality and ordering functions. Encapsulation: Public, Private, Protected. Inheritance. Sub-tables that inherit all columns from another table. Persistent Stored Modules (Programming Language). Create methods. SQL and extensions. External language. User defined operators. Triggers for events. External language support Call-Level Interface (CLI) Direct access to DBMS Embedded SQL SQL commands in an external language.

Abstract Data Types
GeoPoint: Latitude, Longitude, Altitude
GeoLine: NumberOfPoints, ListOfGeoPoints
Procedure: DrawRegion {
  Find region components.
  SQL: Select …
  For each component {
    Fetch MapLine
    Set line attributes
    MapLine.Draw
  }
}

SQL 99 Sub-Tables
CREATE SET TABLE Customer (
  CustomerID INTEGER,
  Address VARCHAR,
  Phone CHAR(15)
)
CREATE SET TABLE CommercialCustomer (
  Contact VARCHAR,
  VolumeDiscount NUMERIC(5,2)
) UNDER Customer;
CommercialCustomer inherits columns from Customer.
Tables: Customer(CustomerID, Address, Phone); CommercialCustomer(Contact, VolumeDiscount)

SQL 99: Programming Database External Programs Data Types Tables, … Embedded SQL Call-Level Interface Persistent Stored Modules SQL Extended SQL code External language code CURSOR … SELECT … FETCH …

OODBMS Vendors GemStone Systems, Inc. Hewlett-Packard, Inc. (OpenODB) IBEX Corporation, SA. Illustra (Informix, Inc.) Matisse Software, Inc. O2 Technology, Inc. Objectivity, Inc. Object Design, Inc. ONTOS, Inc. POET Software Corporation UniSQL Unisys Corporation (OSMOS) Versant Object Technology

Why don’t all developers use a DBMS? Most new projects (in last 5 years) do use a DBMS Need specialized personnel Programmers Designers/Analysts Database administrators Need to define data for organization Cost PC: $400 - $2000 Large: $100,000 +

How do you sell a DBMS approach? Applications change a lot, but same data. Need for ad hoc questions and queries. Need to reduce development times. Need shared data. Improve quality of data. Enable users to do more development.

Building the Right System: Feasibility Costs Up-front/one-time Software ($ millions !) Hardware Communications Data conversion Studies and Design Training On-going costs Personnel Software upgrades Supplies Support Software & Hardware maintenance Benefits Cost Savings Software maintenance Fewer errors Less data maintenance Less user training Increased Value Better access to data Better decisions Better communication More timely reports Faster reaction to change New products & services Strategic Advantages Lock out competitors Easy to estimate Hard to value

Economic Feasibility: NPV =NPV(B14,$D$7:$D$11)+$D$6 =NPV(rate, range) + starting
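The spreadsheet NPV function discounts every value in the range, including the first, by at least one period, which is why the starting (time-zero) amount is added outside the function. A minimal Python sketch of the same arithmetic follows; the function names and the sample figures are illustrative and not taken from the slide's worksheet.

```python
def npv(rate, cash_flows):
    """Spreadsheet-style NPV: the first cash flow is discounted one period."""
    return sum(value / (1 + rate) ** period
               for period, value in enumerate(cash_flows, start=1))

def project_npv(rate, starting, later_flows):
    """Equivalent of =NPV(rate, range) + starting."""
    return npv(rate, later_flows) + starting

# Hypothetical project: pay 1000 now, receive 400 per year for three years.
print(round(project_npv(0.10, -1000.0, [400.0, 400.0, 400.0]), 2))
```

The starting amount is normally a cost, so it is entered as a negative number.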

Exercise: Build a First Database
Employee(EmployeeID, LastName, FirstName, Address, DateHired)
332  Ant  Adam  354 Elm  5/5/1964
442  Bono  Sonny  765 Pine  8/8/1972
553  Cass  Mama  886 Oak  2/2/1985
673  Donovan  Michael  421 Willow  3/3/1971
773  Moon  Keith  554 Cherry  4/4/1972
847  Morrison  Jim  676 Sandalwood  5/5/1968
Client(ClientID, LastName, FirstName, Balance, EmployeeID)
1101  Jones  Joe  113.42  442
2203  Smith  Mary  993.55  673
2256  Brown  Laura  225.44  332
4456  Dieter  Jackie  664.90  442
5543  Wodkoski  John  984.00  847
6673  Sanchez  Paula  194.87  773
7353  Chen  Charles  487.34  332
7775  Hagen  Fritz  595.55  673
8890  Hauer  Marianne  627.39  773
9662  Nguyen  Suzie  433.88  553
9983  Martin  Mark  983.31  847

Exercise: Report
Ant, Adam  5/5/1964
    Brown, Laura  225.24
    Chen, Charles  487.34
        712.58
Bono, Sonny  8/8/1972
    Dieter, Jackie  664.90
    Jones, Joe  114.32
        779.22
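A report of this shape comes from joining Client to Employee on EmployeeID and subtotaling the balances per employee. The sketch below is one possible way to do it in Python with SQLite, not the exercise's intended tool. Subtotals are computed from the Balance values in the data listing, which differ slightly from two of the figures printed in the sample report.

```python
import sqlite3
from itertools import groupby

EMPLOYEES = [
    (332, "Ant", "Adam", "354 Elm", "5/5/1964"),
    (442, "Bono", "Sonny", "765 Pine", "8/8/1972"),
    (553, "Cass", "Mama", "886 Oak", "2/2/1985"),
    (673, "Donovan", "Michael", "421 Willow", "3/3/1971"),
    (773, "Moon", "Keith", "554 Cherry", "4/4/1972"),
    (847, "Morrison", "Jim", "676 Sandalwood", "5/5/1968"),
]
CLIENTS = [
    (1101, "Jones", "Joe", 113.42, 442),
    (2203, "Smith", "Mary", 993.55, 673),
    (2256, "Brown", "Laura", 225.44, 332),
    (4456, "Dieter", "Jackie", 664.90, 442),
    (5543, "Wodkoski", "John", 984.00, 847),
    (6673, "Sanchez", "Paula", 194.87, 773),
    (7353, "Chen", "Charles", 487.34, 332),
    (7775, "Hagen", "Fritz", 595.55, 673),
    (8890, "Hauer", "Marianne", 627.39, 773),
    (9662, "Nguyen", "Suzie", 433.88, 553),
    (9983, "Martin", "Mark", 983.31, 847),
]

def build_report():
    """Return [(employee, date_hired, [(client, balance), ...], subtotal), ...]."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY, "
                 "LastName TEXT, FirstName TEXT, Address TEXT, DateHired TEXT)")
    conn.execute("CREATE TABLE Client (ClientID INTEGER PRIMARY KEY, "
                 "LastName TEXT, FirstName TEXT, Balance REAL, "
                 "EmployeeID INTEGER REFERENCES Employee(EmployeeID))")
    conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?, ?)", EMPLOYEES)
    conn.executemany("INSERT INTO Client VALUES (?, ?, ?, ?, ?)", CLIENTS)
    rows = conn.execute("""
        SELECT e.EmployeeID,
               e.LastName || ', ' || e.FirstName,
               e.DateHired,
               c.LastName || ', ' || c.FirstName,
               c.Balance
        FROM Employee AS e JOIN Client AS c ON c.EmployeeID = e.EmployeeID
        ORDER BY e.LastName, e.EmployeeID, c.LastName
    """).fetchall()
    conn.close()
    report = []
    # Rows arrive sorted by employee, so groupby yields one group per employee.
    for _, group in groupby(rows, key=lambda row: row[0]):
        group = list(group)
        clients = [(name, balance) for _, _, _, name, balance in group]
        subtotal = round(sum(balance for _, balance in clients), 2)
        report.append((group[0][1], group[0][2], clients, subtotal))
    return report

for employee, hired, clients, subtotal in build_report():
    print(employee, hired)
    for name, balance in clients:
        print(f"    {name:<20}{balance:>10.2f}")
    print(f"    {'':<20}{subtotal:>10.2f}")
```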

Database Management Systems Chapter 2 Database Design

Database System Design User views of data. Conceptual data model. Implementation (relational) data model. Physical data storage. Class diagram that shows business entities, relationships, and rules. List of nicely-behaved tables. Use data normalization to derive the list. Indexes and storage methods to improve performance.

The Need for Design Goal: To produce an information system that adds value for the user Reduce costs Increase sales/revenue Provide competitive advantage Objective: To understand the system To improve it To communicate with users and IT staff Methodology: Build models of the system

Designing Systems Designs are a model of existing & proposed systems They provide a picture or representation of reality They are a simplification Someone should be able to read your design (model) and describe the features of the actual system. You build models by talking with the users Identify processes Identify objects Determine current problems and future needs Collect user documents (views) Break complex systems into pieces and levels

Design Stages
Initiation: Scope; Feasibility; Cost & Time estimates
Requirements Analysis: User Views & Needs (Forms, Reports); Processes & Events; Objects & Attributes
Conceptual Design: Models (Data flow diagram, Entity Relationships, Objects); User feedback
Physical Design: Table definitions; Application development (Queries, Forms, Reports, Application integration); Data storage; Security; Procedures
Implementation: Training; Purchases; Data conversion; Installation
Evaluation & Review

Initial Steps of Design 1. Identify the exact goals of the system. 2. Talk with the users to identify the basic forms and reports. 3. Identify the data items to be stored. 4. Design the classes (tables) and relationships. 5. Identify any business constraints. 6. Verify the design matches the business rules.

Entities/Classes Customer Name CustomerID LastName FirstName Phone Address City State ZIP Code Properties Methods (optional for database) Add Customer Delete Customer

Definitions
Relational database: A collection of tables.
Table: A collection of columns (attributes) describing an entity. Individual objects are stored as rows of data in the table.
Property (attribute): A characteristic or descriptor of a class or entity.
Every table has a primary key: the smallest set of columns that uniquely identifies any row.
Primary keys can span more than one column (concatenated keys).
We often create a primary key to ensure uniqueness (e.g., CustomerID, Product#, . . .), called a surrogate key.
Example class Employee (primary key: EmployeeID; the columns are properties; each row is an object):
EmployeeID  TaxpayerID  LastName  FirstName  HomePhone  Address
12512  888-22-5552  Cartom  Abdul  (603) 323-9893  252 South Street
15293  222-55-3737  Venetiaan  Roland  (804) 888-6667  937 Paramaribo Lane
22343  293-87-4343  Johnson  John  (703) 222-9384  234 Main Street
29387  837-36-2933  Stenheim  Susan  (410) 330-9837  8934 W. Maple

Unified Modeling Language (UML) A relatively new method to design systems. Contains several types of diagrams. The class diagram is the most important for database design.

Definitions Entity: Something in the real world that we wish to describe or track. Class: Description of an entity, that includes its attributes (properties) and behavior (methods). Object: One instance of a class with specific data. Property: A characteristic or descriptor of a class or entity. Method: A function that is performed by the class. Association: A relationship between two or more classes. Pet Store Examples Entity: Customer, Merchandise, Sales Class: Customer, Merchandise, Sale Object: Joe Jones, Premium Cat Food, Sale #32 Property: LastName, Description, SaleDate Method: AddCustomer, UpdateInventory, ComputeTotal Association: Each Sale can have only one Customer.

Associations
General: One-to-one (1:1), One-to-many (1:M), Many-to-many (M:N).
Relationships represent business rules: sometimes common sense, sometimes unique to an organization. Users often know current relationships, rarely future ones.
Objects related to objects: An employee can work in only one department. Many departments can work on many different products.
Objects related to properties: An employee can have only one name. Many employees can have the same last name.
[Diagram examples: Breed and Animal; Cust. places Sale; Emp performs Tasks; Purch. Order sent to Supplier]

Class Diagram 1 … 1 0 … * 0 … * 1 … * Customer Class/Entity (box) . Customer Class/Entity (box) Association/Relationship Lines Minimum 0: optional 1: required Maximum Arrows 1, M 1 … 1 0 … * Order 0 … * 1 … * Item .

Sample Association Rules (Multiplicity)
An order must have exactly 1 customer: 1 … 1 (minimum of 1, maximum of 1).
And at least one item: 1 … * (minimum of 1, maximum many).
An item can show up on no orders or many orders: 0 … * (optional (0), maximum many).
[Diagram: Customer 1 … 1 to 0 … * Sale; Sale 0 … * to 1 … * Item]

N-ary Associations
[Diagram: Employee * to * Component; Component * to * Product]
Associations can connect more than two classes.
Associations can become classes: events, many-to-many, need to keep data.
Example has two many-to-many relationships. We know which components go into each product. We know which employees worked on a product. We need to expand the relationships to show which employees installed which components into each product. Each assembly entry lists one employee, one component, and one product. By appearing on many assembly rows, the many-to-many relationships can still exist.

N-ary Association Example Employee Name ... 1 * Assembly 1 * * 1 Component CompID Type Name Product ProductID Type Name Assembly EmployeeID CompID ProductID Multiplicity is defined as the number of items that could appear if the other N-1 objects are fixed. Almost always “many.”

Association Details: Aggregation Sale Item contains * * SaleDate Employee Description Cost Aggregation: the Sale consists of a set of Items being sold.

Association Details: Composition Bicycle Wheels Bicycle Size Model Type … 1 built from 2 Rims Spokes … Size Model Type … 1 1 Wheels Crank 1 ItemID Weight Crank Stem Two ways to display composition. 1 Stem ItemID Weight Size Composition: aggregation where the components become the new object.

Association Details: Generalization Animal DateBorn Name Gender Color ListPrice {disjoint} Mammal Fish Spider LitterSize TailLength Claws FreshWater ScaleCondition Venomous Habitat

Inheritance
Class definition (encapsulation): class name, properties, methods.
Inheritance relationships: generic classes; focus on differences.
Polymorphism.
Most existing DBMS do not handle inheritance.
Example: Accounts (properties: AccountID, CustomerID, DateOpened, CurrentBalance; methods: OpenAccount, CloseAccount). Savings Accounts inherit from Accounts and add InterestRate and PayInterest. Checking Accounts inherit from Accounts and add MinimumBalance, Overdrafts, BillOverdraftFees, and their own CloseAccount (polymorphism).

Multiple Parents Vehicle Motorized Human Powered On-Road Off-Road or Car Bicycle

Association Details: Reflexive Relationship manages worker * Employee 0…1 manager A reflexive relationship is an association from one class back to itself. In this example, an employee can also be a manager of other employees.
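In a relational table, a reflexive relationship is usually stored as a foreign key that points back to the same table, and it is queried with a self-join. The sketch below assumes SQLite via Python's sqlite3 module; the ManagerID column name and the sample employees are hypothetical.

```python
import sqlite3

def make_employee_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("""
        CREATE TABLE Employee (
            EmployeeID INTEGER PRIMARY KEY,
            Name       TEXT NOT NULL,
            -- NULL means no manager, matching the 0..1 end of the association
            ManagerID  INTEGER REFERENCES Employee(EmployeeID)
        )
    """)
    # Hypothetical rows: Avery manages Blake and Casey; Blake manages Drew.
    conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", [
        (1, "Avery", None),
        (2, "Blake", 1),
        (3, "Casey", 1),
        (4, "Drew", 2),
    ])
    return conn

def workers_of(conn, manager_name):
    """Self-join: the table is used twice, once as worker and once as manager."""
    rows = conn.execute("""
        SELECT w.Name
        FROM Employee AS w
        JOIN Employee AS m ON w.ManagerID = m.EmployeeID
        WHERE m.Name = ?
        ORDER BY w.Name
    """, (manager_name,)).fetchall()
    return [row[0] for row in rows]

print(workers_of(make_employee_db(), "Avery"))
```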

Defining Packages for High-Level Views Purchase Animals Sell Animals Supplier Employee Customer Purchase Merchandise Sell Merchandise

PetStore Overview Class Diagram * Animal * 1 * Animal Purchase * 1 1 1 1 * * 1 Supplier Employee Sale Customer 1 1 * Merchandise Purchase * * * Merchandise * *

Pet Store Class Diagram: Access

Data Types (Domain)
Common data types:
Text
  Fixed length: 1 to 64 K bytes
  Variable length: 1 to 2 G bytes
Memo/Note
Numeric
  Byte: 1 byte, 0 to 255
  Boolean: 2 bytes, True or False
  Integer: 2 bytes, -32,768 to 32,767 (no decimal points)
  Long: 4 bytes, -2,147,483,648 to 2,147,483,647 (no decimal points)
  Floating: 4 bytes, 1.401298E-45 to 3.402823E38
  Double: 8 bytes, 4.94065645841247E-324 to 1.79769313486232E308
  Currency: 8 bytes, -922,337,203,685,477.5808 to 922,337,203,685,477.5807
Date/Time: 8 bytes, Jan 1, 100 to Dec 31, 9999
Objects/Raw binary: Any type of data supported by the machine (pictures, sound, video . . .)

Data Type Sizes Access SQL Server Oracle Text fixed variable Unicode memo Memo char, varchar nchar, nvarchar text CHAR VARCHAR2 NVARCHAR2 LONG Number Byte (8 bits) Integer (16 bits) Long (32 bits) (64 bits) Fixed precision Float Double Currency Yes/No Byte Integer Long NA tinyint smallint int bigint decimal(p,s) real float money bit INTEGER NUMBER(38,0) NUMBER(p,s) NUMBER, FLOAT NUMBER NUMBER(38,4) Date/Time Interval datetime smalldatetime interval year … DATE INTERVAL YEAR … Image OLE Object image LONG RAW, BLOB AutoNumber Identity rowguidcol SEQUENCES ROWID

Computed Attributes Denote computed values with a preceding slash (/). Employee Name DateOfBirth /Age Phone … {Age = Today - DateOfBirth}
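A computed (derived) attribute is calculated when it is needed instead of being stored. The following Python sketch shows the idea for /Age; it assumes age is wanted in whole years, and the optional `today` parameter is an addition that makes the calculation repeatable.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Employee:
    name: str
    date_of_birth: date  # stored attribute

    def age(self, today: Optional[date] = None) -> int:
        """Derived attribute /Age, computed from DateOfBirth on request."""
        today = today or date.today()
        birthday_passed = (today.month, today.day) >= (
            self.date_of_birth.month, self.date_of_birth.day)
        # Subtract one year if this year's birthday has not happened yet.
        return today.year - self.date_of_birth.year - (0 if birthday_passed else 1)

# Hypothetical employee, for illustration only.
employee = Employee("Pat Doe", date(1990, 6, 15))
print(employee.age(date(2020, 6, 14)), employee.age(date(2020, 6, 15)))
```

Because Age is never stored, it cannot drift out of date with DateOfBirth.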

Event Examples Trigger ON (QuantityOnHand < 100) Business Event Item is sold. Decrease Inventory count. Data Event Inventory drops below preset level. Order more inventory. User Event User clicks on icon. Send purchase order to supplier. Trigger ON (QuantityOnHand < 100) THEN Notify Purchasing Manager

Event Triggers Property: Current Inventory. Object: Inventory Property: Current Inventory. Function: Update Inventory. Trigger: On Update, call Analyze function. Process: Analyze Inventory Function: Determine need to reorder. Trigger: Generate new order. Business Process: Ship Product Trigger: Inventory Change Executes function/trigger in Inventory object. Order … ShipOrder Inventory … Subtract Analyze Inventory … Subtract Analyze 1. Subtract(Prod, Qty sold) 1.1 Analyze (Product) low 1.1.1 Reorder (Product, quantity) Purchase … Reorder
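The data-event case (inventory falls below a preset level) can be sketched as a database trigger. The example assumes SQLite via Python's sqlite3 module; the ReorderNotice table, which stands in for notifying the purchasing manager, and the helper names are hypothetical, while the threshold of 100 comes from the trigger condition on the Event Examples slide.

```python
import sqlite3

def make_inventory_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Inventory (
            ItemID         INTEGER PRIMARY KEY,
            QuantityOnHand INTEGER NOT NULL
        );
        -- Stand-in for "Notify Purchasing Manager".
        CREATE TABLE ReorderNotice (
            ItemID         INTEGER,
            QuantityOnHand INTEGER
        );
        CREATE TRIGGER LowInventory
        AFTER UPDATE OF QuantityOnHand ON Inventory
        WHEN NEW.QuantityOnHand < 100
        BEGIN
            INSERT INTO ReorderNotice VALUES (NEW.ItemID, NEW.QuantityOnHand);
        END;
    """)
    return conn

def sell(conn, item_id, quantity):
    """Business event: an item is sold, so the inventory count decreases."""
    conn.execute(
        "UPDATE Inventory SET QuantityOnHand = QuantityOnHand - ? WHERE ItemID = ?",
        (quantity, item_id),
    )

def notices(conn):
    return conn.execute("SELECT ItemID, QuantityOnHand FROM ReorderNotice").fetchall()

conn = make_inventory_db()
conn.execute("INSERT INTO Inventory VALUES (887, 120)")
sell(conn, 887, 10)   # 110 left: trigger condition is false
sell(conn, 887, 15)   # 95 left: trigger fires
print(notices(conn))
```

The selling code never checks the stock level itself; the rule lives with the data, which mirrors the chain of events on the slide.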

Design Importance: Large Projects Design is harder on large projects. Communication with multiple users. Communication between IT workers. Need to divide project into pieces for teams. Finding data/components. Staff turnover--retraining. Need to monitor design process. Scheduling. Evaluation. Build systems that can be modified later. Documentation. Communication/underlying assumptions and model.

Large Projects Project Teams Standards Project planning software Divide the work Fit pieces together Evaluate progress Standards Design Templates Actions Events Objects Naming convention Properties Project planning software Schedules Gantt charts CASE tools Groupware tools Track changes Document work Track revisions

CASE Tools
Computer-Aided Software Engineering
Features: Diagrams (linked); Data Dictionary; Teamwork; Prototyping (Forms, Reports, Sample data); Code generation; Reverse Engineering
Examples: Rational Rose; Sterling COOL: Dat, COOL: Jex (UML); Oracle; IBM

Rolling Thunder: Top-Level Sales Bicycle Assembly Employee Location Purchasing

Rolling Thunder: Sales Customer Bicycle::Bicycle 1…1 CustomerID Phone FirstName LastName Address ZipCode CityID BalanceDue BicycleID … CustomerID StoreID 1…1 0…* 0…* Retail Store Customer Transaction StoreID StoreName Phone ContactFirstName ContactLastName Address ZipCode CityID 0…1 CustomerID TransactionDate EmployeeID Amount Description Reference 0…*

Rolling Thunder: Bicycle ModelType Bicycle 1…1 BicycleTubeUsed 1…1 ModelType Description SerialNumber CustomerID ModelType PaintID FrameSize OrderDate StartDate ShipDate ShipEmployee FrameAssembler Painter Construction WaterBottleBrazeOn CustomName LetterStyleID StoreID EmployeeID TopTube ChainStay … 1…* SerialNumber TubeID Quantity 1…1 0…* 0…* Paint 1…1 PaintID ColorName ColorStyle ColorList DateIntroduced DateDiscontinued BikeParts 0…* SerialNumber ComponentID SubstituteID Location Quantity DateInstalled EmployeeID 0…* LetterStyle 1…1 LetterStyleID Description

Rolling Thunder: Assembly Component 1…1 1…1 Bicycle::BikeParts ComponentID ManufacturerID ProductNumber Road Category Length Height Width Description ListPrice EstimatedCost QuantityOnHand GroupComponents 0…* SerialNumber ComponentID ... GroupID ComponentID 0…* 0…* 0…* Groupo GroupID GroupName BikeType 1…1 ComponentName Bicycle:: BicycleTubeUsed TubeMaterial 1…1 1…1 ComponentName AssemblyOrder Description TubeID Material Description Diameter … SerialNumber TubeID Quantity 0…*

Rolling Thunder: Purchasing PurchaseOrder Manufacturer 1…1 1…1 1…1 PurchaseID EmployeeID ManufacturerID TotalList ShippingCost Discount OrderDate ReceiveDate AmountDue ManufacturerID ManufacturerName ContactName Phone Address ZipCode CityID BalanceDue 1…1 0…* ManufacturerTrans 0…* ManufacturerID TransactionDate Reference EmployeeID Amount Description PurchaseItem Assembly:: Component 1…* PurchaseID ComponentID PricePaid Quantity QuantityReceived 1…1 0…* ComponentID ManufacturerID ProductNumber 0…*

Rolling Thunder: Location Sales:: Customer Employee:: Employee City 1…1 1…1 CustomerID … CityID EmployeeID … CityID CityID ZipCode City State AreaCode Population1990 Population1980 Country Latitude Longitude 0…* 1…1 1…1 1…1 Sales:: RetailStore Purchasing:: Manufacturer StoreID … CityID ManufacturerID … CityID 0…* 0…* StateTaxRate 0…1 State TaxRate

Rolling Thunder: Employee 1…1 0…* worker Bicycle:: Bicycle EmployeeID TaxpayerID LastName FirstName HomePhone Address ZipCode CityID DateHired DateReleased CurrentManager SalaryGrade Salary Title WorkArea 1…1 1…1 Purchasing:: PurchaseOrder SerialNumber … EmployeeID ShipEmployee FrameAssembler Painter 0…* PurchaseID … EmployeeID 0…* 0…* manages 0…* 0…* Bicycle:: BikeParts 0…1 manager SerialNumber ComponentID … EmployeeID 0…*

Rolling Thunder Combined CustomerID Phone FirstName LastName Address ZipCode CityID BalanceDue Customer TransDate EmployeeID Amount Description Reference CustomerTrans StoreID StoreName ContacFirstName ContactLastName Zipcode RetailStore State TaxRate StateTaxRate SerialNumber ModelType PaintID FrameSize OrderDate StartDate ShipDate ShipEmployee FrameAssembler Painter Construction WaterBottle CustomName LetterStyleID TopTube ChainStay HeadTubeAngle SeatTueAngle ListPrice SalePrice SalesTax SaleState ShipPrice FramePrice ComponentList Bicycle City AreaCode Population1990 Population1980 Country Latitude Longitude ComponentID Paint TaxpayerID HomePhone DateHired DateReleased CurrentManager SalaryGrade Salary Title WorkArea Employee TubeID Quantity BicycleTube MSize TotalLength GroundClearance SeatTubeAngle ModelSize LetterStyle PurchaseID ManufacturerID TotalList ShippingCost Discount ReceiveDate AmountDue PurchaseOrder TubeName Length BikeTubes SubstituteID Location DateInstalled BikeParts PricePaid QuantityReceived PurchaseItem ManufacturerName ContactName Manufacturer CompGroup GroupName BikeType Year EndYear Weight Groupo ProductNumber Road Category Height Width EstimatedCost QuantityOnHand Component TransactionDate ManufacturerTrans Material Diameter Thickness Roundness Stiffness TubeMaterial GroupID GroupCompon ComponentName AssemblyOrder ColorName ColorStyle ColorList DateIntroduced DateDiscontinued Rolling Thunder Combined

Rolling Thunder: Combined

Application Design Simple form based on one table (Animal). But also need lookup tables for Category and Breed.

Appendix: DB Design System http://time-post.com/dbdesign Students and instructors need only an Internet connection and a Java-enabled Web browser. Instructor can sign up free by sending email to: jpost@time-post.com Instructors set up the class and select assignments. Students create accounts and work on the assignments. The system provides immediate feedback in the form of comments and questions for each proposed table.

Appendix: Typical Customer Order

Appendix: DB Design Screen Column list Menu Title box (can be moved) Drawing area Scroll bars to display more of the drawing area Status line Feedback window

Appendix: Adding a Table and a Key Right click in the main drawing window and select the option to Add table. Right click the gray bar at the top of the table, select the Rename table option and enter “Customer” Drag the Generate Key item onto the new Customer table. Right click on the new column name, select the Rename option and enter “CustomerID” 1 2 3 4

Appendix: Two Tables The Customer table has a generated key of CustomerID Each column in the table represents data collected for each customer. Each column depends completely on the primary key. Each Order is identified by a unique OrderID generated by the database system. The CustomerID column is used because the customer number can be used to look up the corresponding data in the Customer table.

Appendix: Relationships—Linking Tables Drag the CustomerID column from the Customer table and drop it on the CustomerID column in the Orders table. For the Min value in Customer, select One instead of Optional. Click the OK button to accept the relationship definition.

Appendix: Creating Problems

Appendix: Detecting Problems (Grading) Double click a line to mark the errors.

Appendix: Testing a Change Attempted fix Make the relationship many-to-many Make OrderID a key But, the score went down!!!

Appendix: A Solution The intermediate table OrderItem converts the many-to-many relationship into two one-to-many relationships. Both OrderID and ItemID are keys, indicating that each order can have many items, and each item can be sold on many orders.

Appendix: Data Types Right click the column names and set the data type.

Database Management Systems Chapter 3 Data Normalization

Why Normalization? Need standardized data definition Advantages of DBMS require careful design Define data correctly and the rest is much easier It especially makes it easier to expand database later Method applies to most models and most DBMS Similar to Entity-Relationship Similar to Objects (without inheritance and methods) Goal: Define tables carefully Save space Minimize redundancy Protect data

Definitions Relational database: A collection of tables. Table: A collection of columns (attributes) describing an entity. Individual objects are stored as rows of data in the table. Property (attribute): A characteristic or descriptor of a class or entity. Every table has a primary key: the smallest set of columns that uniquely identifies any row. Primary keys can span more than one column (concatenated keys). We often create a primary key to ensure uniqueness (e.g., CustomerID, Product#, . . .), called a surrogate key.
Class: Employee (primary key: EmployeeID; the other columns are properties; each row is an object)
EmployeeID  TaxpayerID   LastName   FirstName  HomePhone       Address
12512       888-22-5552  Cartom     Abdul      (603) 323-9893  252 South Street
15293       222-55-3737  Venetiaan  Roland     (804) 888-6667  937 Paramaribo Lane
22343       293-87-4343  Johnson    John       (703) 222-9384  234 Main Street
29387       837-36-2933  Stenheim   Susan      (410) 330-9837  8934 W. Maple

Keys Primary key: Every table (object) must have a primary key. Uniquely identifies a row (one-to-one). Concatenated (or composite) key: Multiple columns needed for primary key. Identify repeating relationships (1 : M or M : N). Key columns are underlined. First step: Collect user documents. Identify possible keys: unique or repeating relationships.

Notation Table name and table columns: Customer(CustomerID, Phone, Name, Address, City, State, ZipCode) Primary key is underlined.
CustomerID  Phone         LastName    FirstName  Address             City               State  ZipCode
1           502-666-7777  Johnson     Martha     125 Main Street     Alvaton            KY     42122
2           502-888-6464  Smith       Jack       873 Elm Street      Bowling Green      KY     42101
3           502-777-7575  Washington  Elroy      95 Easy Street      Smith’s Grove      KY     42171
4           502-333-9494  Adams       Samuel     746 Brown Drive     Alvaton            KY     42122
5           502-474-4746  Rabitz      Victor     645 White Avenue    Bowling Green      KY     42102
6           615-373-4746  Steinmetz   Susan      15 Speedway Drive   Portland           TN     37148
7           615-888-4474  Lasater     Les        67 S. Ray Drive     Portland           TN     37148
8           615-452-1162  Jones       Charlie    867 Lakeside Drive  Castalian Springs  TN     37031
9           502-222-4351  Chavez      Juan       673 Industry Blvd.  Caneyville         KY     42721
10          502-444-2512  Rojo        Maria      88 Main Street      Cave City          KY     42127

Identifying Key Columns
Orders: Each order has only one customer. So Customer is not part of the key.
OrderID  Date    Customer
8367     5-5-04  6794
8368     5-6-04  9263
OrderItems: Each order has many items. Each item can appear on many orders. So OrderID and Item are both part of the key.
OrderID  Item  Quantity
8367     229   2
8367     253   4
8367     876   1
8368     555   4
8368     229   1
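A minimal sketch of the two tables above, assuming Python's built-in sqlite3 module as a stand-in DBMS (the slides do not name one for this example). The helper try_insert is hypothetical. The concatenated key on OrderItems allows many items per order and many orders per item, but rejects the same item listed twice on one order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (
    OrderID  INTEGER PRIMARY KEY,   -- one customer per order, so Customer is not keyed
    Date     TEXT,
    Customer INTEGER
);
CREATE TABLE OrderItems (
    OrderID  INTEGER,
    Item     INTEGER,
    Quantity INTEGER,
    PRIMARY KEY (OrderID, Item)     -- concatenated key over both columns
);
""")
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)",
                 [(8367, "5-5-04", 6794), (8368, "5-6-04", 9263)])
conn.executemany("INSERT INTO OrderItems VALUES (?, ?, ?)",
                 [(8367, 229, 2), (8367, 253, 4), (8367, 876, 1),
                  (8368, 555, 4), (8368, 229, 1)])


def try_insert(order_id, item, quantity):
    """Return True if the row is accepted, False if it violates the key."""
    try:
        conn.execute("INSERT INTO OrderItems VALUES (?, ?, ?)",
                     (order_id, item, quantity))
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_accepted = try_insert(8367, 229, 5)   # same order and item again
new_pair_accepted = try_insert(8368, 253, 2)    # item 253 on a different order
row_count = conn.execute("SELECT COUNT(*) FROM OrderItems").fetchone()[0]
```

Item 229 already appears on both orders in the sample data, which is exactly what the two-column key permits.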

Surrogate Keys Real world keys sometimes cause problems in a database. Example: Customer Avoid phone numbers: people may not notify you when numbers change. Avoid SSN (privacy and most businesses are not authorized to ask for verification, so you could end up with duplicate values) Often best to let the DBMS generate unique values Access: AutoNumber SQL Server: Identity Oracle: Sequences (but require additional programming) Drawback: Numbers are not related to any business data, so the application needs to hide them and provide other look up mechanisms.
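As a rough illustration of a DBMS-generated key, the sketch below uses SQLite's AUTOINCREMENT through Python's sqlite3 module. SQLite is an assumption here, standing in for the Access, SQL Server, and Oracle mechanisms named on the slide; add_customer and the replacement phone number are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Customer (
    CustomerID INTEGER PRIMARY KEY AUTOINCREMENT,  -- value generated by the DBMS
    Phone      TEXT,
    LastName   TEXT
)""")


def add_customer(phone, last_name):
    """Insert a customer and return the key value the DBMS generated."""
    cur = conn.execute(
        "INSERT INTO Customer (Phone, LastName) VALUES (?, ?)",
        (phone, last_name))
    return cur.lastrowid


first_id = add_customer("502-666-7777", "Johnson")
second_id = add_customer("502-888-6464", "Smith")

# A phone number can change without touching the key (made-up new number).
conn.execute("UPDATE Customer SET Phone = ? WHERE CustomerID = ?",
             ("502-000-0000", first_id))
current_phone = conn.execute(
    "SELECT Phone FROM Customer WHERE CustomerID = ?", (first_id,)).fetchone()[0]
```

Because the generated number carries no business meaning, an application would normally look customers up by name or phone and keep CustomerID out of sight.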

Common Order System Customer Salesperson 1 1 * Order * 1 * OrderItem * Customer(CustomerID, Name, Address, City, Phone) Salesperson(EmployeeID, Name, Commission, DateHired) Order(OrderID, OrderDate, CustomerID, EmployeeID) OrderItem(OrderID, ItemID, Quantity) Item(ItemID, Description, ListPrice)

Client Billing Example Client(ClientID, Name, Address, BusinessType) Partner(PartnerID, Name, Speciality, Office, Phone) PartnerAssignment(PartnerID, ClientID, DateAcquired) Billing(ClientID, PartnerID, Date/Time, Item, Description, Hours, AmountBilled) Each partner can be assigned many clients. Each client can be assigned to many partners.

Client Billing--Different Rules Client(ClientID, Name, Address, BusinessType) Partner(PartnerID, Name, Speciality, Office, Phone) PartnerAssignment(PartnerID, ClientID, DateAcquired) Billing(ClientID, PartnerID, Date/Time, Item, Description, Hours, AmountBilled) Each client is assigned to only one partner, so PartnerID cannot be part of the key in PartnerAssignment. Combine the Client and PartnerAssignment tables, since they then have the same key (ClientID).

Client Billing--New Assumptions
ClientID  PartnerID  Date/Time     Item  Description      Hours  AmountBilled
115       963        8-4-04 10:03  967   Stress analysis  2      $500
295       967        8-5-04 11:15  754   New Design       3      $750
115       963        8-8-04 09:30  967   Stress analysis  2.5    $650
More realistic assumptions for a large firm. Each partner may work with many clients. Each client may work with many partners. Each partner and client may work together many times. The identifying feature is the date/time of the service. What happens if you do not include Date/Time as a key?

Sample: Video Database Possible Keys Repeating section

Initial Objects Customers Videos RentalTransaction VideosRented Key: Assign a CustomerID Sample Properties Name Address Phone Videos Key: Assign a VideoID Title RentalPrice Rating Description RentalTransaction Event/Relationship Key: Assign TransactionID Sample Properties CustomerID Date VideosRented Event/Repeating list Keys: TransactionID + VideoID VideoCopy#

Initial Form Evaluation RentalForm(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode, (VideoID, Copy#, Title, Rent ) ) Collect forms from users Write down properties Find repeating groups ( . . .) Look for potential keys: key Identify computed values Notation makes it easier to identify and solve problems Results equivalent to diagrams, but will fit on one or two pages

Problems with Repeating Sections RentalForm(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode, (VideoID, Copy#, Title, Rent ) ) Storing data in this raw form would not work very well. For example, repeating sections will cause problems. Note the duplication of data. Also, what if a customer has not yet checked out a movie--where do we store that customer’s data? The repeating section (VideoID, Copy#, Title, Rent) causes duplication in the other columns:
TransID  RentDate  CustomerID  LastName    Phone         Address             VideoID  Copy#  Title                  Rent
1        4/18/04   3           Washington  502-777-7575  95 Easy Street      1        2      2001: A Space Odyssey  $1.50
1        4/18/04   3           Washington  502-777-7575  95 Easy Street      6        3      Clockwork Orange       $1.50
2        4/30/04   7           Lasater     615-888-4474  67 S. Ray Drive     8        1      Hopscotch              $1.50
2        4/30/04   7           Lasater     615-888-4474  67 S. Ray Drive     2        1      Apocalypse Now         $2.00
2        4/30/04   7           Lasater     615-888-4474  67 S. Ray Drive     6        1      Clockwork Orange       $1.50
3        4/18/04   8           Jones       615-452-1162  867 Lakeside Drive  9        1      Luggage Of The Gods    $2.50
3        4/18/04   8           Jones       615-452-1162  867 Lakeside Drive  15       1      Fabulous Baker Boys    $2.00
3        4/18/04   8           Jones       615-452-1162  867 Lakeside Drive  4        1      Boy And His Dog        $2.50
4        4/18/04   3           Washington  502-777-7575  95 Easy Street      3        1      Blues Brothers         $2.00
4        4/18/04   3           Washington  502-777-7575  95 Easy Street      8        1      Hopscotch              $1.50
4        4/18/04   3           Washington  502-777-7575  95 Easy Street      13       1      Surf Nazis Must Die    $2.50
4        4/18/04   3           Washington  502-777-7575  95 Easy Street      17       1      Witches of Eastwick    $2.00

Problems with Repeating Sections Name Phone Address City State ZipCode Customer Rentals Store repeating data Allocate space How much? Can’t be short Wasted space e.g., How many videos will be rented at one time? A better definition eliminates this problem. VideoID Copy# Title Rent 1. 6 1 Clockwork Orange 1.50 2. 8 2 Hopscotch 1.50 3. 4. 5. {Unused Space} Not in First Normal Form

First Normal Form Remove repeating sections RentalForm(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode, (VideoID, Copy#, Title, Rent ) ) RentalForm2(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode) RentalLine(TransID, VideoID, Copy#, Title, Rent ) Remove repeating sections Split into two tables Bring key from main and repeating section RentalLine(TransID, VideoID, Copy#, . . .) Each transaction can have many videos (key VideoID) Each video can be rented on many transactions (key TransID) For each TransID and VideoID, only one Copy# (no key on Copy#)
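The split into RentalForm2 and RentalLine can be sketched in a few lines of Python. This is only an illustration: the dictionary layout, the "videos" field name, and the function to_first_normal_form are assumptions, and the customer columns are abbreviated.

```python
# Each rental form is a dict; "videos" holds the repeating section.
forms = [
    {"TransID": 1, "RentDate": "4/18/04", "CustomerID": 3,
     "videos": [
         {"VideoID": 1, "Copy#": 2, "Title": "2001: A Space Odyssey", "Rent": 1.50},
         {"VideoID": 6, "Copy#": 3, "Title": "Clockwork Orange", "Rent": 1.50},
     ]},
    {"TransID": 2, "RentDate": "4/30/04", "CustomerID": 7,
     "videos": [
         {"VideoID": 8, "Copy#": 1, "Title": "Hopscotch", "Rent": 1.50},
         {"VideoID": 2, "Copy#": 1, "Title": "Apocalypse Now", "Rent": 2.00},
         {"VideoID": 6, "Copy#": 1, "Title": "Clockwork Orange", "Rent": 1.50},
     ]},
]


def to_first_normal_form(forms, repeating="videos", parent_key="TransID"):
    """Split each form into one main row plus one row per repeated entry.

    The parent key is copied into every repeated row so the two tables
    can be rejoined later.
    """
    main_rows, line_rows = [], []
    for form in forms:
        main_rows.append({k: v for k, v in form.items() if k != repeating})
        for line in form[repeating]:
            line_rows.append({parent_key: form[parent_key], **line})
    return main_rows, line_rows


rental_form2, rental_line = to_first_normal_form(forms)
```

Every row of rental_line now carries TransID next to VideoID, which together form the key of RentalLine.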

Nested Repeating Sections Table (Key1, . . . (Key2, . . . (Key3, . . .) ) ) Table1(Key1, . . .) TableA (Key1,Key2 . . .(Key3, . . .) ) Table2 (Key1, Key2 . . .) Table3 (Key1, Key2, Key3, . . .) Nested: Table (Key1, aaa. . . (Key2, bbb. . . (Key3, ccc. . .) ) ) First Normal Form (1NF) Table1(Key1, aaa . . .) Table2(Key1, Key2, bbb . .) Table3(Key1, Key2, Key3, ccc. . .)

First Normal Form Problems (Data)
TransID  RentDate  CustID  Phone         LastName    FirstName  Address             City               State  ZipCode
1        4/18/04   3       502-777-7575  Washington  Elroy      95 Easy Street      Smith's Grove      KY     42171
2        4/30/04   7       615-888-4474  Lasater     Les        67 S. Ray Drive     Portland           TN     37148
3        4/18/04   8       615-452-1162  Jones       Charlie    867 Lakeside Drive  Castalian Springs  TN     37031
4        4/18/04   3       502-777-7575  Washington  Elroy      95 Easy Street      Smith's Grove      KY     42171
1NF splits repeating groups. Still have problems: Replication. Hidden dependency: If a video has not been rented yet, then what is its title?
TransID  VideoID  Copy#  Title                  Rent
1        1        2      2001: A Space Odyssey  $1.50
1        6        3      Clockwork Orange       $1.50
2        8        1      Hopscotch              $1.50
2        2        1      Apocalypse Now         $2.00
2        6        1      Clockwork Orange       $1.50
3        9        1      Luggage Of The Gods    $2.50
3        15       1      Fabulous Baker Boys    $2.00
3        4        1      Boy And His Dog        $2.50
4        3        1      Blues Brothers         $2.00
4        8        1      Hopscotch              $1.50
4        13       1      Surf Nazis Must Die    $2.50
4        17       1      Witches of Eastwick    $2.00

Second Normal Form Definition Depends on both TransID and VideoID RentalLine(TransID, VideoID, Copy#, Title, Rent) Depend only on VideoID Each non-key column must depend on the entire key. Only applies to concatenated keys Some columns only depend on part of the key Split those into a new table. Dependence (definition) If given a value for the key you always know the value of the property in question, then that property is said to depend on the key. If you change part of a key and the questionable property does not change, then the table is not in 2NF.
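The dependence test can be tried on sample rows. The sketch below uses a hypothetical helper, depends_on, on a few RentalLine rows. Sample data can only refute a dependency; whether one really holds is a business rule, so a True result is a hint rather than proof.

```python
def depends_on(rows, determinant, column):
    """True if every combination of determinant values maps to a single
    value of `column` in the given sample rows."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in determinant)
        if seen.setdefault(key, row[column]) != row[column]:
            return False
    return True


columns = ("TransID", "VideoID", "Copy#", "Title", "Rent")
rental_line = [dict(zip(columns, r)) for r in [
    (1, 1, 2, "2001: A Space Odyssey", 1.50),
    (1, 6, 3, "Clockwork Orange", 1.50),
    (2, 8, 1, "Hopscotch", 1.50),
    (2, 2, 1, "Apocalypse Now", 2.00),
    (2, 6, 1, "Clockwork Orange", 1.50),
    (4, 8, 1, "Hopscotch", 1.50),
]]

# Title and Rent are fixed by VideoID alone: part of the key is enough,
# so RentalLine is not in 2NF.
title_on_video = depends_on(rental_line, ["VideoID"], "Title")
rent_on_video = depends_on(rental_line, ["VideoID"], "Rent")

# Copy# needs the whole key: neither half determines it in this sample.
copy_on_video = depends_on(rental_line, ["VideoID"], "Copy#")
copy_on_trans = depends_on(rental_line, ["TransID"], "Copy#")
copy_on_both = depends_on(rental_line, ["TransID", "VideoID"], "Copy#")
```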

Second Normal Form Example RentalLine(TransID, VideoID, Copy#, Title, Rent) VideosRented(TransID, VideoID, Copy#) Videos(VideoID, Title, Rent) Title depends only on VideoID Each VideoID can have only one title Rent depends on VideoID This statement is actually a business rule. It might be different at different stores. Some stores might charge a different rent for each video depending on the day (or time). Each non-key column depends on the whole key.

Second Normal Form Example (Data)
VideosRented(TransID, VideoID, Copy#)
TransID  VideoID  Copy#
1        1        2
1        6        3
2        2        1
2        6        1
2        8        1
3        4        1
3        9        1
3        15       1
4        3        1
4        8        1
4        13       1
4        17       1
Videos(VideoID, Title, Rent)
VideoID  Title                        Rent
1        2001: A Space Odyssey        $1.50
2        Apocalypse Now               $2.00
3        Blues Brothers               $2.00
4        Boy And His Dog              $2.50
5        Brother From Another Planet  $2.00
6        Clockwork Orange             $1.50
7        Gods Must Be Crazy           $2.00
8        Hopscotch                    $1.50
(Unchanged) RentalForm2(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode)
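One way to check the 2NF split is to build both tables from the RentalLine rows and confirm that joining them gives back the original data. The sketch assumes SQLite through Python's sqlite3 module, and renames Copy# to CopyNum because # is awkward in an unquoted SQL identifier.

```python
import sqlite3

rental_line = [
    (1, 1, 2, "2001: A Space Odyssey", 1.50), (1, 6, 3, "Clockwork Orange", 1.50),
    (2, 8, 1, "Hopscotch", 1.50), (2, 2, 1, "Apocalypse Now", 2.00),
    (2, 6, 1, "Clockwork Orange", 1.50), (3, 9, 1, "Luggage Of The Gods", 2.50),
    (3, 15, 1, "Fabulous Baker Boys", 2.00), (3, 4, 1, "Boy And His Dog", 2.50),
    (4, 3, 1, "Blues Brothers", 2.00), (4, 8, 1, "Hopscotch", 1.50),
    (4, 13, 1, "Surf Nazis Must Die", 2.50), (4, 17, 1, "Witches of Eastwick", 2.00),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE RentalLine (TransID, VideoID, CopyNum, Title, Rent)")
conn.executemany("INSERT INTO RentalLine VALUES (?, ?, ?, ?, ?)", rental_line)

# Columns that depend on the whole key stay in VideosRented;
# columns that depend only on VideoID move to Videos, one row per video.
conn.executescript("""
CREATE TABLE VideosRented AS
    SELECT TransID, VideoID, CopyNum FROM RentalLine;
CREATE TABLE Videos AS
    SELECT DISTINCT VideoID, Title, Rent FROM RentalLine;
""")

videos = conn.execute("SELECT * FROM Videos ORDER BY VideoID").fetchall()
rejoined = conn.execute("""
    SELECT r.TransID, r.VideoID, r.CopyNum, v.Title, v.Rent
    FROM VideosRented r JOIN Videos v ON v.VideoID = r.VideoID
""").fetchall()
```

Videos built this way holds only titles that have been rented at least once. A real Videos table would also list unrented titles such as VideoID 5 and 7 on the slide, which is the gap the split is meant to close.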

Second Normal Form Problems (Data) RentalForm2(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode)
TransID  RentDate  CustID  Phone         LastName    FirstName  Address             City               State  ZipCode
1        4/18/04   3       502-777-7575  Washington  Elroy      95 Easy Street      Smith's Grove      KY     42171
2        4/30/04   7       615-888-4474  Lasater     Les        67 S. Ray Drive     Portland           TN     37148
3        4/18/04   8       615-452-1162  Jones       Charlie    867 Lakeside Drive  Castalian Springs  TN     37031
4        4/18/04   3       502-777-7575  Washington  Elroy      95 Easy Street      Smith's Grove      KY     42171
Even in 2NF, problems remain: Replication. Hidden dependency: If a customer has not rented a video yet, where do we store their personal data? Solution: split table.

Third Normal Form Definition Depend on TransID RentalForm2(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode) Depend only on CustomerID Each non-key column must depend on nothing but the key. Some columns depend on columns that are not part of the key. Split those into a new table. Example: A customer's name does not change for every transaction. Dependence (definition) If given a value for the key you always know the value of the property in question, then that property is said to depend on the key. If you change the key and the questionable property does not change, then the table is not in 3NF.

Third Normal Form Example RentalForm2(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode) Rentals(TransID, RentDate, CustomerID ) Customers(CustomerID, Phone, Name, Address, City, State, ZipCode ) Customer attributes depend only on Customer ID Split them into new table (Customer) Remember to leave CustomerID in Rentals table. We need to be able to reconnect tables. 3NF is sometimes easier to see if you identify primary objects at the start--then you would recognize that Customer was a separate object.

Third Normal Form Example Data RentalForm2(TransID, RentDate, CustomerID, Phone, Name, Address, City, State, ZipCode)
Rentals(TransID, RentDate, CustomerID )
TransID  RentDate  CustomerID
1        4/18/04   3
2        4/30/04   7
3        4/18/04   8
4        4/18/04   3
Customers(CustomerID, Phone, Name, Address, City, State, ZipCode )
CustomerID  Phone         LastName    FirstName  Address             City               State  ZipCode
1           502-666-7777  Johnson     Martha     125 Main Street     Alvaton            KY     42122
2           502-888-6464  Smith       Jack       873 Elm Street      Bowling Green      KY     42101
3           502-777-7575  Washington  Elroy      95 Easy Street      Smith's Grove      KY     42171
4           502-333-9494  Adams       Samuel     746 Brown Drive     Alvaton            KY     42122
5           502-474-4746  Rabitz      Victor     645 White Avenue    Bowling Green      KY     42102
6           615-373-4746  Steinmetz   Susan      15 Speedway Drive   Portland           TN     37148
7           615-888-4474  Lasater     Les        67 S. Ray Drive     Portland           TN     37148
8           615-452-1162  Jones       Charlie    867 Lakeside Drive  Castalian Springs  TN     37031
9           502-222-4351  Chavez      Juan       673 Industry Blvd.  Caneyville         KY     42721
10          502-444-2512  Rojo        Maria      88 Main Street      Cave City          KY     42127
VideosRented(TransID, VideoID, Copy#) Videos(VideoID, Title, Rent)

Third Normal Form Tables (3NF) Rentals 1 TransID RentDate CustomerID * Customers VideosRented 1 * CustomerID Phone LastName FirstName Address City State ZipCode TransID VideoID Copy# * Videos 1 VideoID Title Rent Rentals(TransID, RentDate, CustomerID ) Customers(CustomerID, Phone, Name, Address, City, State, ZipCode ) VideosRented(TransID, VideoID, Copy#) Videos(VideoID, Title, Rent)
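The four 3NF tables can be declared and rejoined as a quick check. The sketch below assumes SQLite via Python's sqlite3 module, abbreviates the customer columns, and uses CopyNum for Copy#. It loads only part of the sample data, including one customer (Johnson) who has not rented anything.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (
    CustomerID INTEGER PRIMARY KEY, Phone TEXT, LastName TEXT);
CREATE TABLE Rentals (
    TransID INTEGER PRIMARY KEY, RentDate TEXT,
    CustomerID INTEGER REFERENCES Customers (CustomerID));
CREATE TABLE Videos (
    VideoID INTEGER PRIMARY KEY, Title TEXT, Rent REAL);
CREATE TABLE VideosRented (
    TransID INTEGER REFERENCES Rentals (TransID),
    VideoID INTEGER REFERENCES Videos (VideoID),
    CopyNum INTEGER,                       -- Copy# on the slides
    PRIMARY KEY (TransID, VideoID));

INSERT INTO Customers VALUES
    (1, '502-666-7777', 'Johnson'),
    (3, '502-777-7575', 'Washington'),
    (7, '615-888-4474', 'Lasater');
INSERT INTO Rentals VALUES (1, '4/18/04', 3), (2, '4/30/04', 7);
INSERT INTO Videos VALUES
    (1, '2001: A Space Odyssey', 1.50), (2, 'Apocalypse Now', 2.00),
    (6, 'Clockwork Orange', 1.50), (8, 'Hopscotch', 1.50);
INSERT INTO VideosRented VALUES
    (1, 1, 2), (1, 6, 3), (2, 8, 1), (2, 2, 1), (2, 6, 1);
""")

# Rejoin the tables to rebuild the rental form for transaction 1.
form_rows = conn.execute("""
    SELECT r.TransID, r.RentDate, c.LastName, v.VideoID, v.Title, v.Rent
    FROM Rentals r
    JOIN Customers c     ON c.CustomerID = r.CustomerID
    JOIN VideosRented vr ON vr.TransID = r.TransID
    JOIN Videos v        ON v.VideoID = vr.VideoID
    WHERE r.TransID = 1
    ORDER BY v.VideoID
""").fetchall()

# Customers with no rentals now have a place to live.
no_rentals = [name for (name,) in conn.execute("""
    SELECT c.LastName FROM Customers c
    LEFT JOIN Rentals r ON r.CustomerID = c.CustomerID
    WHERE r.TransID IS NULL
""")]
```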

3NF Rules/Procedure Split out repeating sections Be sure to include a key from the parent section in the new piece so the two parts can be recombined. Verify that the keys are correct Is each row uniquely identified by the primary key? Are one-to-many and many-to-many relationships correct? Check “many” for keyed columns and “one” for non-key columns. Make sure that each non-key column depends on the whole key and nothing but the key. No hidden dependencies.

Checking Your Work (Quality Control) Look for one-to-many relationships. Many side should be keyed (underlined). e.g., VideosRented(TransID, VideoID, . . .). Check each column and ask if it should be 1 : 1 or 1 : M. If you add a key, renormalize. Verify no repeating sections (1NF). Check 3NF: Check each column and ask: Does it depend on the whole key and nothing but the key? Verify that the tables can be reconnected (joined) to form the original tables (draw lines). Each table represents one object. Enter sample data--look for replication.

Boyce-Codd Normal Form (BCNF) Hidden dependency Example: Employee-Specialty(E#, Specialty, Manager) Is in 3NF now. Business rules. Employee may have many specialties. Each specialty has many managers. Each manager has only one specialty. Employee has only one manager for each specialty. Problem is hidden relationship between manager and specialty. Need separate table for manager. But then we don’t need to repeat specialty. In real life, probably accept the duplication (specialty listed in both tables). Employee-Specialty(E#, Specialty, Manager) Employee(E#, Manager) Manager(Manager, Specialty) Employee(E#, Specialty, Manager) Manager(Manager, Specialty) acceptable
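The hidden dependency can be seen on a handful of rows. Everything in this sketch is hypothetical: the names, the rows, and the helper determines. The rows are chosen to obey the stated business rules, and the check shows that Manager fixes Specialty even though Manager is not a key.

```python
def determines(rows, left, right):
    """True if the columns at indexes `left` fix the value at index `right`."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in left)
        if seen.setdefault(key, row[right]) != row[right]:
            return False
    return True


E, SPECIALTY, MANAGER = 0, 1, 2
employee_specialty = [      # (E#, Specialty, Manager), made-up sample rows
    (1, "Welding", "Ames"),
    (1, "Painting", "Baker"),
    (2, "Welding", "Ames"),
    (3, "Welding", "Cole"),
    (3, "Painting", "Baker"),
]

key_fixes_manager = determines(employee_specialty, [E, SPECIALTY], MANAGER)
manager_fixes_specialty = determines(employee_specialty, [MANAGER], SPECIALTY)
specialty_fixes_manager = determines(employee_specialty, [SPECIALTY], MANAGER)

# Decompose into Employee(E#, Manager) and Manager(Manager, Specialty).
employee = sorted({(e, m) for e, s, m in employee_specialty})
manager = sorted({(m, s) for e, s, m in employee_specialty})

# Joining on Manager rebuilds the original rows, so nothing is lost.
rejoined = sorted((e, s, m) for e, m in employee
                  for m2, s in manager if m == m2)
```

After the split, each manager's specialty is stored once, in the Manager table, instead of once per employee.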

Fourth Normal Form (Keys) EmployeeTasks(EID, Specialty, ToolID) Technically, if you keyed every column, any table would be in 3NF, which does not solve any problems. In some cases, there are hidden relationships between key properties. Example: EmployeeTasks(EID, Specialty, ToolID) In 3NF (BCNF) now. Business Rules Each employee has many specialties. Each employee has many tools. Tools and specialties are unrelated EmployeeSpecialty(EID, Specialty) EmployeeTools(EID, ToolID)
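When tools and specialties are unrelated, the combined table has to list every specialty with every tool for each employee. The sketch below uses made-up employees, specialties, and tool IDs to show that blow-up, and that the two smaller tables rejoin to exactly the same rows.

```python
from itertools import product

# Hypothetical data: what each employee can do and which tools each holds.
specialties = {1: ["Welding", "Painting"], 2: ["Welding"]}
tools = {1: ["T10", "T20", "T30"], 2: ["T10"]}

# EmployeeTasks(EID, Specialty, ToolID) with every column keyed:
# each employee needs specialties x tools rows.
employee_tasks = [(eid, s, t)
                  for eid in specialties
                  for s, t in product(specialties[eid], tools[eid])]

# 4NF split: one table per independent repeating fact.
employee_specialty = sorted({(e, s) for e, s, t in employee_tasks})
employee_tools = sorted({(e, t) for e, s, t in employee_tasks})

# Joining on EID recreates the combined table.
rejoined = sorted((e, s, t) for e, s in employee_specialty
                  for e2, t in employee_tools if e == e2)
```

Here seven combined rows shrink to three specialty rows plus four tool rows, and adding a tool for employee 1 becomes one insert instead of two.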

Domain-Key Normal Form (DKNF) DKNF is the ultimate goal: a table in DKNF will always be in 4NF, etc. Drawbacks: No mechanical method to get to DKNF. No guarantee a table can be converted to DKNF. Rules: Table => one topic. All business rules explicitly written as domain constraints and key relationships. No hidden relationships. Employee(EID, Name, Speciality) Business rule: An employee can have many specialties. So the example is not in DKNF, since EID is not unique.

DKNF Examples Employee(EID, Name, Speciality) Business rule: An employee can have many specialties. Example is not in DKNF: EID is not unique. Employee(EID, Name, Speciality) Business rule: An employee has one name. Example is not DKNF: hidden relationship between EID and name. Employee(EID, Name) EmployeeSpecialty(EID, Speciality)

DKNF Examples Student(SID, Name, Major, Advisor) Advisor(FID, Name, Office, Discipline) Business rules: A student can have many advisors, but only one for each major. Faculty can only be advisors for their discipline. Not in DKNF: Primary key and hidden rule. Student(SID, Name) Advisors(SID, Major, FID) Faculty(FID, Name, Office, Discipline) DKNF: Foreign key (Major <--> Discipline) makes advisor rule explicit.

No Hidden Dependencies The simple normalization rules: Remove repeating sections Each non-key column must depend on the whole key and nothing but the key. There must be no hidden dependencies. Solution: Split the table. Make sure you can rejoin the two pieces to recreate the original data relationships. For some hidden dependencies within keys, double-check the business assumption to be sure that it is realistic. Sometimes you are better off with a more flexible assumption.

Data Rules and Integrity
Simple business rules: Limits on data ranges: Price > 0, Salary < 100,000, DateHired > 1/12/1995. Choosing from a set: Gender = M, F, Unknown; Jurisdiction = City, County, State, Federal.
Referential Integrity: Foreign key values in one table must exist in the master table. Order(O#, Odate, C#,…): C# must exist in the customer table.
Order
O#    Odate   C#   …
1173  1-4-97  321
1174  1-5-97  938
1185  1-8-97  337
1190  1-9-97  321
1192  1-9-97  776  No data for this customer yet!
Customer
C#   Name     Phone  …
321  Jones    9983-
337  Sanchez  7738-
938  Carson   8738-
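Range and set rules like these are commonly written as CHECK constraints. The sketch below assumes SQLite via Python's sqlite3 module, and a hypothetical Employee table with only the two columns needed for the example. The date rule is left out because the slide's date format is ambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Employee (
    EID    INTEGER PRIMARY KEY,
    Salary NUMERIC CHECK (Salary < 100000),                  -- limit on a data range
    Gender TEXT    CHECK (Gender IN ('M', 'F', 'Unknown'))   -- choosing from a set
)""")


def accepted(eid, salary, gender):
    """Return True if the DBMS accepts the row, False if a rule rejects it."""
    try:
        conn.execute("INSERT INTO Employee VALUES (?, ?, ?)",
                     (eid, salary, gender))
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    accepted(1, 45000, "F"),    # satisfies both rules
    accepted(2, 250000, "M"),   # salary out of range
    accepted(3, 30000, "X"),    # gender not in the allowed set
]
```

Putting the rules in the table definition means every application that writes to the table is held to them.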

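SQL Foreign Key (Oracle, SQL Server)
CREATE TABLE Orders (
    OID   NUMBER(9) NOT NULL,
    Odate DATE,
    CID   NUMBER(9),
    CONSTRAINT pk_Order PRIMARY KEY (OID),
    CONSTRAINT fk_OrderCustomer FOREIGN KEY (CID)
        REFERENCES Customer (CID) ON DELETE CASCADE
)
The table is named Orders because ORDER is a reserved word in SQL and cannot be used as an unquoted table name. NUMBER is Oracle's numeric type; SQL Server would use NUMERIC or INT.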
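To watch the foreign key at work, here is a runnable variant using SQLite through Python's sqlite3 module. This is an assumption made for the sake of a self-contained example, not the Oracle or SQL Server syntax: the column types are changed to SQLite's, the table is called Orders because ORDER is a reserved word, and SQLite enforces foreign keys only after PRAGMA foreign_keys = ON. The data comes from the referential integrity example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
PRAGMA foreign_keys = ON;      -- SQLite leaves enforcement off by default

CREATE TABLE Customer (CID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Orders (
    OID   INTEGER NOT NULL,
    Odate TEXT,
    CID   INTEGER,
    CONSTRAINT pk_Order PRIMARY KEY (OID),
    CONSTRAINT fk_OrderCustomer FOREIGN KEY (CID)
        REFERENCES Customer (CID) ON DELETE CASCADE
);

INSERT INTO Customer VALUES (321, 'Jones'), (337, 'Sanchez'), (938, 'Carson');
INSERT INTO Orders VALUES
    (1173, '1-4-97', 321), (1174, '1-5-97', 938),
    (1185, '1-8-97', 337), (1190, '1-9-97', 321);
""")

# Order 1192 points at customer 776, who is not in the Customer table.
try:
    conn.execute("INSERT INTO Orders VALUES (1192, '1-9-97', 776)")
    unknown_customer_accepted = True
except sqlite3.IntegrityError:
    unknown_customer_accepted = False

# ON DELETE CASCADE: removing customer 321 also removes that customer's orders.
conn.execute("DELETE FROM Customer WHERE CID = 321")
remaining = conn.execute("SELECT OID, CID FROM Orders ORDER BY OID").fetchall()
```

Cascading deletes are convenient but drastic. Many designs omit the CASCADE clause so the DBMS refuses to delete a customer who still has orders.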

Effect of Business Rules Key business rules: A player can play on only one team. There is one referee per match.

Business Rules 1 There is one referee per match. A player can play on only one team. Match(MatchID, DatePlayed, Location, RefID) Score(MatchID, TeamID, Score) Referee(RefID, Phone, Address) Team(TeamID, Name, Sponsor) Player(PlayerID, Name, Phone, DoB, TeamID) PlayerStats(MatchID, PlayerID, Points, Penalties) RefID is not a key in the Match table and TeamID is not a key in the Player table, because each match has only one referee and each player has only one team.

Business Rules 2 There can be several referees per match. A player can play on several teams (substitute), but only on one team per match. Match(MatchID, DatePlayed, Location, RefID) Score(MatchID, TeamID, Score) Referee(RefID, Phone, Address) Team(TeamID, Name, Sponsor) Player(PlayerID, Name, Phone, DoB, TeamID) PlayerStats(MatchID, PlayerID, Points, Penalties) To handle the many-to-many relationship, we need to make RefID and TeamID keys. But if you leave them in the same tables, the tables are not in 3NF. DatePlayed does not depend on RefID. Player Name does not depend on TeamID.

Business Rules 2: Normalized There can be several referees per match. A player can play on several teams (substitute), but only on one team per match. Match(MatchID, DatePlayed, Location) RefereeMatch(MatchID, RefID) Score(MatchID, TeamID, Score) Referee(RefID, Phone, Address) Team(TeamID, Name, Sponsor) Player(PlayerID, Name, Phone, DoB) PlayerStats(MatchID, PlayerID, TeamID, Points, Penalties)
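The keys in the normalized tables are what enforce the two rules. The sketch below assumes SQLite via Python's sqlite3 module, and it takes MatchID + RefID as the key of RefereeMatch and MatchID + PlayerID as the key of PlayerStats, since the underlining is lost in this transcript. All ID values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE RefereeMatch (
    MatchID INTEGER, RefID INTEGER,
    PRIMARY KEY (MatchID, RefID));          -- several referees per match
CREATE TABLE PlayerStats (
    MatchID INTEGER, PlayerID INTEGER,
    TeamID INTEGER,                         -- not keyed: one team per player per match
    Points INTEGER, Penalties INTEGER,
    PRIMARY KEY (MatchID, PlayerID));
""")


def accepted(sql, params):
    """Return True if the insert succeeds, False on a key violation."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


ref_sql = "INSERT INTO RefereeMatch VALUES (?, ?)"
stat_sql = "INSERT INTO PlayerStats VALUES (?, ?, ?, ?, ?)"

first_referee = accepted(ref_sql, (1, 10))
second_referee = accepted(ref_sql, (1, 11))              # same match, another referee

plays_for_team_a = accepted(stat_sql, (1, 7, 100, 12, 0))
same_match_other_team = accepted(stat_sql, (1, 7, 200, 3, 1))   # second team, same match
other_match_other_team = accepted(stat_sql, (2, 7, 200, 8, 0))  # substitute in match 2
```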

Converting a Class Diagram to Normalized Tables Manager * 1 * 1 1 * Purchase Order Supplier Employee * * Item subtypes Raw Materials Assembled Components Office Supplies

One-to-Many Relationships 1 * * 1 Purchase Order Supplier Employee Supplier(SID, Name, Address, City, State, Zip, Phone) Employee(EID, Name, Salary, Address, …) PurchaseOrder(POID, Date, SID, EID) The many side becomes a key (underlined). Each PO has one supplier and employee. (Do not key SID or EID) Each supplier can receive many POs. (Key PO) Each employee can place many POs. (Key PO)

One-to-Many Sample Data Supplier Purchase Order POID Date SID EID 22234 9-9-2004 5676 221 22235 9-10-2004 554 22236 7831 22237 9-11-2004 8872 335 Employee

Many-to-Many Relationships Purchase Order Purchase Order PurchaseOrder(POID, Date, SID, EID) 1 1 * * * POItem(POID, ItemID, Quantity, PricePaid) POItem * * * 1 1 Item Item(ItemID, Description, ListPrice) Item Each POID can have many Items (key/underline ItemID). Each ItemID can be on many POIDs (key POID). Need the new intermediate table (POItem) because: You cannot put ItemID into PurchaseOrder because Date, SID, and EID do not depend on the ItemID. You cannot put POID into Item because Description and ListPrice do not depend on POID.

Many-to-Many Sample Data Purchase Order POID Date SID EID 22234 9-9-2004 5676 221 22235 9-10-2004 554 22236 7831 22237 9-11-2004 8872 335 POItem Item

N-ary Associations 1 * 1 * 1 * Employee Name ... Assembly Component CompID Type Name Product ProductID Type Name Assembly EmployeeID CompID ProductID

Composition Bicycle Bicycle Components Wheels Crank Stem Size Model Type … SerialNumber ModelType WheelID CrankID StemID … Components Wheels ComponentID Category Description Weight Cost Crank Stem

Generalization or Subtypes Item Raw Materials Assembled Components Office Supplies Item(ItemID, Description, ListPrice) RawMaterials(ItemID, Weight, StrengthRating) AssembledComponents(ItemID, Width, Height, Depth) OfficeSupplies(ItemID, BulkQuantity, Discount) Add new tables for each subtype. Use the same key as the generic type (ItemID)--one-to-one relationship. Add the attributes specific to each subtype.
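A small sketch of the subtype pattern, assuming SQLite via Python's sqlite3 module: each subtype table reuses ItemID as its own primary key and points back at Item. The item rows are invented for illustration, and only two of the three subtypes are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Item (
    ItemID INTEGER PRIMARY KEY, Description TEXT, ListPrice REAL);
CREATE TABLE RawMaterials (
    ItemID INTEGER PRIMARY KEY REFERENCES Item (ItemID),   -- same key as Item
    Weight REAL, StrengthRating INTEGER);
CREATE TABLE OfficeSupplies (
    ItemID INTEGER PRIMARY KEY REFERENCES Item (ItemID),
    BulkQuantity INTEGER, Discount REAL);

-- Made-up sample rows.
INSERT INTO Item VALUES (1, 'Steel tubing', 40.0), (2, 'Copier paper', 5.0);
INSERT INTO RawMaterials VALUES (1, 12.5, 8);
INSERT INTO OfficeSupplies VALUES (2, 10, 0.15);
""")

# Generic attributes come from Item; subtype attributes from the subtype table.
raw_materials = conn.execute("""
    SELECT i.ItemID, i.Description, i.ListPrice, r.Weight, r.StrengthRating
    FROM Item i JOIN RawMaterials r ON r.ItemID = i.ItemID
""").fetchall()

# The shared primary key keeps the relationship one-to-one.
try:
    conn.execute("INSERT INTO RawMaterials VALUES (1, 99.0, 1)")
    second_subtype_row_accepted = True
except sqlite3.IntegrityError:
    second_subtype_row_accepted = False
```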

Subtypes Sample Data Item RawMaterials AssembledComponents OfficeSupplies

Recursive Relationships Manager * 1 Employee Employee(EID, Name, Salary, Address, Manager) Employee Add a manager column that contains Employee IDs. An employee can have only one manager. (Manager is not a key.) A manager can supervise many employees. (EID is a key.)
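A recursive relationship is queried by joining the table to itself. The sketch assumes SQLite via Python's sqlite3 module and borrows employee IDs and last names from the Definitions slide; who manages whom is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (
    EID     INTEGER PRIMARY KEY,
    Name    TEXT,
    Manager INTEGER REFERENCES Employee (EID)   -- holds another employee's EID
);
-- Manager assignments below are hypothetical.
INSERT INTO Employee VALUES
    (12512, 'Cartom',    NULL),
    (15293, 'Venetiaan', 12512),
    (22343, 'Johnson',   12512),
    (29387, 'Stenheim',  15293);
""")

# Self-join: alias e is the employee, alias m is that employee's manager.
reports_to = conn.execute("""
    SELECT e.Name, m.Name
    FROM Employee e JOIN Employee m ON m.EID = e.Manager
    ORDER BY e.EID
""").fetchall()

# One manager can supervise many employees.
team_sizes = dict(conn.execute("""
    SELECT m.Name, COUNT(*)
    FROM Employee e JOIN Employee m ON m.EID = e.Manager
    GROUP BY m.Name
"""))
```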

Normalization Examples Possible topics Auto repair Auto sales Department store Hair stylist HRM department Law firm Manufacturing National Park Service Personal stock portfolio Pet shop Restaurant Social club Sports team

Multiple Views & View Integration Collect multiple views Documents Reports Input forms Create normalized tables from each view Combine the views into one complete model. Keep meta-data in a data dictionary Type of data Size Volume Usage Example Federal Emergency Management Agency (FEMA). Disaster planning and relief. Make business assumptions as necessary, but try to keep them simple.

The Pet Store: Sales Form Sales(SaleID, Date, CustomerID, Name, Address, City, State, Zip, EmployeeID, Name, (AnimalID, Name, Category, Breed, DateOfBirth, Gender, Registration, Color, ListPrice, SalePrice), (ItemID, Description, Category, ListPrice, SalePrice, Quantity))

The Pet Store: Purchase Animals AnimalOrder(OrderID, OrderDate, ReceiveDate, SupplierID, Name, Contact, Phone, Address, City, State, Zip, EmployeeID, Name, Phone, DateHired, (AnimalID, Name, Category, Breed, Gender, Registration, Cost), ShippingCost)

The Pet Store: Purchase Merchandise MerchandiseOrder(PONumber, OrderDate, ReceiveDate, SupplierID, Name, Contact, Phone, Address, City, State, Zip, EmployeeID, Name, HomePhone, (ItemID, Description, Category, Price, Quantity, QuantityOnHand), ShippingCost)

Pet Store Normalization Sale(SaleID, Date, CustomerID, EmployeeID) SaleAnimal(SaleID, AnimalID, SalePrice) SaleItem(SaleID, ItemID, SalePrice, Quantity) Customer(CustomerID, Name, Address, City, State, Zip) Employee(EmployeeID, Name) Animal(AnimalID, Name, Category, Breed, DateOfBirth, Gender, Registration, Color, ListPrice) Merchandise(ItemID, Description, Category, ListPrice) AnimalOrder(OrderID, OrderDate, ReceiveDate, SupplierID, EmpID, ShipCost) AnimalOrderItem(OrderID, AnimalID, Cost) Supplier(SupplierID, Name, Contact, Phone, Address, City, State, Zip) Employee(EmployeeID, Name, Phone, DateHired) Animal(AnimalID, Name, Category, Breed, Gender, Registration, Cost) MerchandiseOrder(PONumber, OrderDate, ReceiveDate, SID, EmpID, ShipCost) MerchandiseOrderItem(PONumber, ItemID, Quantity, Cost) Supplier(SupplierID, Name, Contact, Phone, Address, City, State, Zip) Employee(EmployeeID, Name, Phone) Merchandise(ItemID, Description, Category, QuantityOnHand)

Pet Store View Integration Sale(SaleID, Date, CustomerID, EmployeeID) SaleAnimal(SaleID, AnimalID, SalePrice) SaleItem(SaleID, ItemID, SalePrice, Quantity) Customer(CustomerID, Name, Address, City, State, Zip) Employee(EmployeeID, Name, Phone, DateHired) Animal(AnimalID, Name, Category, Breed, DateOfBirth, Gender, Registration, Color, ListPrice, Cost) Merchandise(ItemID, Description, Category, ListPrice, QuantityOnHand) AnimalOrder(OrderID, OrderDate, ReceiveDate, SupplierID, EmpID, ShipCost) AnimalOrderItem(OrderID, AnimalID, Cost) Supplier(SupplierID, Name, Contact, Phone, Address, City, State, Zip) Animal(AnimalID, Name, Category, Breed, Gender, Registration, Cost) MerchandiseOrder(PONumber, OrderDate, ReceiveDate, SID, EmpID, ShipCost) MerchandiseOrderItem(PONumber, ItemID, Quantity, Cost) Employee(EmployeeID, Name, Phone) Merchandise(ItemID, Description, Category, QuantityOnHand)

Pet Store Class Diagram

Rolling Thunder Integration Example Bicycle Assembly form. The main EmployeeID control is not stored directly, but the value is entered in the assembly column when the employee clicks the column.

Initial Tables for Bicycle Assembly SerialNumber, Model, Construction, FrameSize, TopTube, ChainStay, HeadTube, SeatTube, PaintID, PaintColor, ColorStyle, ColorList, CustomName, LetterStyle, EmpFrame, EmpPaint, BuildDate, ShipDate, (Tube, TubeType, TubeMaterial, TubeDescription), (CompCategory, ComponentID, SubstID, ProdNumber, EmpInstall, DateInstall, Quantity, QOH) ) Bicycle(SerialNumber, Model, Construction, FrameSize, TopTube, ChainStay, HeadTube, SeatTube, PaintID, ColorStyle, CustomName, LetterStyle, EmpFrame, EmpPaint, BuildDate, ShipDate) Paint(PaintID, ColorList) BikeTubes(SerialNumber, TubeID, Quantity) TubeMaterial(TubeID, Type, Material, Description) BikeParts(SerialNumber, ComponentID, SubstID, Quantity, DateInstalled, EmpInstalled) Component(ComponentID, ProdNumber, Category, QOH)

Rolling Thunder: Purchase Order

RT Purchase Order: Initial Tables PurchaseOrder(PurchaseID, PODate, EmployeeID, FirstName, LastName, ManufacturerID, MfgName, Address, Phone, CityID, CurrentBalance, ShipReceiveDate, (ComponentID, Category, ManufacturerID, ProductNumber, Description, PricePaid, Quantity, ReceiveQuantity, ExtendedValue, QOH, ExtendedReceived), ShippingCost, Discount) PurchaseOrder(PurchaseID, PODate, EmployeeID, ManufacturerID, ShipReceiveDate, ShippingCost, Discount) Employee(EmployeeID, FirstName, LastName) Manufacturer(ManufacturerID, Name, Address, Phone, CityID, CurrentBalance) City(CityID, Name, ZipCode) PurchaseItem(PurchaseID, ComponentID, Quantity, PricePaid, ReceivedQuantity) Component(ComponentID, Category, ManufacturerID, ProductNumber, Description, QOH)

Rolling Thunder: Transactions

RT Transactions: Initial Tables ManufacturerTransactions(ManufacturerID, Name, Phone, Contact, BalanceDue, (TransDate, Employee, Amount, Description) ) Manufacturer(ManufacturerID, Name, Phone, Contact, BalanceDue) ManufacturerTransaction(ManufacturerID, TransactionDate, EmployeeID, Amount, Description)

Rolling Thunder: Components

RT Components: Initial Tables ComponentForm(ComponentID, Product, BikeType, Category, Length, Height, Width, Weight, ListPrice,Description, QOH, ManufacturerID, Name, Phone, Contact, Address, ZipCode, CityID, City, State, AreaCode) Component(ComponentID, ProductNumber, BikeType, Category, Length, Height, Width,Weight, ListPrice, Description, QOH, ManufacturerID) Manufacturer(ManufacturerID, Name, Phone, Contact, Address, ZipCode, CityID) City(CityID, City, State, ZipCode, AreaCode)

RT: Integrating Tables Duplicate Manufacturer tables: PO Mfr(ManufacturerID, Name, Address, Phone, CityID, CurrentBalance) Mfg Mfr(ManufacturerID, Name, Phone, Contact, BalanceDue) Comp Mfr(ManufacturerID, Name, Phone, Contact, Address, ZipCode, CityID) Note that each form can lead to duplicate tables. Look for tables with the same keys, but do not expect them to be named exactly alike. Find all of the data and combine it into one table. Manufacturer(ManufacturerID, Name, Contact, Address, Phone, CityID, ZipCode, CurrentBalance)
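Combining the three Manufacturer views amounts to a union of their columns once synonyms are reconciled. In the sketch below the function integrate is hypothetical, and treating BalanceDue as the same column as CurrentBalance is an assumption suggested by the integrated table, which keeps only CurrentBalance.

```python
def integrate(views, synonyms=None):
    """Merge column lists that share a key into one list without duplicates.

    `synonyms` maps alternative column names onto the name to keep.
    Columns stay in the order they are first seen.
    """
    synonyms = synonyms or {}
    merged = []
    for columns in views:
        for column in columns:
            name = synonyms.get(column, column)
            if name not in merged:
                merged.append(name)
    return merged


po_mfr = ["ManufacturerID", "Name", "Address", "Phone", "CityID", "CurrentBalance"]
mfg_mfr = ["ManufacturerID", "Name", "Phone", "Contact", "BalanceDue"]
comp_mfr = ["ManufacturerID", "Name", "Phone", "Contact", "Address", "ZipCode", "CityID"]

manufacturer = integrate([po_mfr, mfg_mfr, comp_mfr],
                         synonyms={"BalanceDue": "CurrentBalance"})
```

The merged list contains the same eight columns as the integrated Manufacturer table, although in first-seen order rather than the order used on the slide. Deciding which names are true synonyms still takes a person who knows the forms.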

RT Example: Integrated Tables Bicycle(SerialNumber, Model, Construction, FrameSize, TopTube, ChainStay, HeadTube, SeatTube, PaintID, ColorStyle, CustomName, LetterStyle, EmpFrame, EmpPaint, BuildDate, ShipDate) Paint(PaintID, ColorList) BikeTubes(SerialNumber, TubeID, Quantity) TubeMaterial(TubeID, Type, Material, Description) BikeParts(SerialNumber, ComponentID, SubstID, Quantity, DateInstalled, EmpInstalled) Component(ComponentID, ProductNumber, BikeType, Category, Length, Height, Width, Weight, ListPrice, Description, QOH, ManufacturerID) PurchaseOrder(PurchaseID, PODate, EmployeeID, ManufacturerID, ShipReceiveDate, ShippingCost, Discount) PurchaseItem(PurchaseID, ComponentID, Quantity, PricePaid, ReceivedQuantity) Employee(EmployeeID, FirstName, LastName) Manufacturer(ManufacturerID, Name, Contact, Address, Phone, CityID, ZipCode, CurrentBalance) ManufacturerTransaction(ManufacturerID, TransactionDate, EmployeeID, Amount, Description, Reference) City(CityID, City, State, ZipCode, AreaCode)

Rolling Thunder Tables [Class diagram of the complete Rolling Thunder database. Tables shown include Customer, CustomerTrans, RetailStore, StateTaxRate, Bicycle, City, Paint, Employee, BicycleTube, ModelSize, LetterStyle, PurchaseOrder, BikeTubes, BikeParts, PurchaseItem, Manufacturer, Groupo, Component, ManufacturerTrans, TubeMaterial, GroupCompon.]

View Integration (FEMA Example 1) Team Roster Team# Date Formed Leader Home Base Name Fax Phone Response time (days) Address, C,S,Z Home phone Total Salary Team Members/Crew ID Name Home phone Specialty DoB SSN Salary This first form is kept for each team that can be called on to help in emergencies.

View Integration (FEMA Example 2) Disaster Name HQ Location On-Site Problem Report Local Agency Commander Political Contact Date Reported Assigned Problem# Severity Problem Description Reported By: Specialty Specialty Rating Verified By: Specialty Specialty Rating SubProblem Details Total Est. Cost Sub Prob# Category Description Action Est. Cost Major problems are reported to HQ to be prioritized and scheduled for correction.

View Integration (FEMA Example 3) Location Damage Analysis Date Evaluated LocationID, Address Team Leader Title Repair Priority Latitude, Longitude Cellular Phone Damage Description Item Loss Total Estimated Damage Total Room Damage Descrip. Damage% Item Value $Loss Item Value $Loss On-site teams examine buildings and file a report on damage at that location.

View Integration (FEMA Example 3a) Location Analysis(LocationID, MapLatitude, MapLongitude, Date, Address, Damage, PriorityRepair, Leader, LeaderPhone, LeaderTitle, (Room, Description, PercentDamage, (Item, Value, Loss)))

View Integration (FEMA Example 4) Task Completion Report Date Disaster Name Disaster Rating HQ Phone Problem# Supervisor Date Total Expenses SubProblem Team# Team Specialty CompletionStatus Comment Expenses SubProblem Team# Team Specialty CompletionStatus Comment Expenses Teams file task completion reports. If a task is not completed, the percentage accomplished is reported as the completion status.

View Integration (FEMA Example 4a) TasksCompleted(Date, DisasterName, DisasterRating, HQPhone, (Problem#, Supervisor, (SubProblem, Team#, CompletionStatus, Comments, Expenses)))

DBMS Table Definition Enter Tables Column Properties Relationships Columns Keys Data Types Text Memo Number Byte Integer, Long Single, Double Date/Time Currency AutoNumber (Long) Yes/No OLE Object Descriptions Column Properties Format Input Mask Caption Default Validation Rule Validation Text Required & Zero Length Indexed Relationships One-to-One One-to-Many Referential Integrity Cascade Update/Delete Define before entering data

Table Definition in Access Key Numeric Subtypes or text length

Graphical Table Definition in Oracle Key Minimal ability to change the table definition later.

Graphical Table Definition in SQL Server Keys

SQL Table Definition CREATE TABLE Animal ( AnimalID INTEGER, Name VARCHAR2(50), Category VARCHAR2(50), Breed VARCHAR2(50), DateBorn DATE, Gender VARCHAR2(50) CHECK (Gender='Male' Or Gender='Female' Or Gender='Unknown' Or Gender Is Null), Registered VARCHAR2(50), Color VARCHAR2(50), ListPrice NUMBER(38,4) DEFAULT 0, Photo LONG RAW, ImageFile VARCHAR2(250), ImageHeight INTEGER, ImageWidth INTEGER, CONSTRAINT pk_Animal PRIMARY KEY (AnimalID), CONSTRAINT fk_BreedAnimal FOREIGN KEY (Category,Breed) REFERENCES Breed(Category,Breed) ON DELETE CASCADE, CONSTRAINT fk_CategoryAnimal FOREIGN KEY (Category) REFERENCES Category(Category) ON DELETE CASCADE );

Oracle Databases For Oracle and SQL Server, it is best to create a text file that contains all of the SQL statements to create the table. It is usually easier to modify the text table definition. The text file can be used to recreate the tables for backup or transfer to another system. To make major modifications to the tables, you usually create a new table, then copy the data from the old table, then delete the old table and rename the new one. It is much easier to create the new table using the text file definition. Be sure to specify Primary Key and Foreign Key constraints. Be sure to create tables in the correct order—any table that appears in a Foreign Key constraint must first be created. For example, create Customer before creating Order. In Oracle, to substantially improve performance, issue the following command once all tables have been created: Analyze table Animal compute statistics;
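The script-based approach can be sketched as follows. This is a minimal illustration that uses Python's built-in sqlite3 module rather than Oracle or SQL Server, with a cut-down, hypothetical version of the Category, Breed, and Animal tables. The DDL lives in one text script, parent tables are listed before the tables that reference them, and the sample rows are made up. SQLite itself does not reject a child table created before its parent, so the ordering here mirrors the Oracle rule rather than proving it.

```python
import sqlite3

# DDL kept as one text script. Parent tables come first so that every
# table named in a FOREIGN KEY clause already exists when it is referenced.
SCHEMA = """
CREATE TABLE Category (
    Category TEXT PRIMARY KEY
);
CREATE TABLE Breed (
    Category TEXT NOT NULL,
    Breed    TEXT NOT NULL,
    CONSTRAINT pk_Breed PRIMARY KEY (Category, Breed),
    CONSTRAINT fk_CategoryBreed FOREIGN KEY (Category)
        REFERENCES Category(Category) ON DELETE CASCADE
);
CREATE TABLE Animal (
    AnimalID INTEGER,
    Name     TEXT,
    Category TEXT,
    Breed    TEXT,
    CONSTRAINT pk_Animal PRIMARY KEY (AnimalID),
    CONSTRAINT fk_BreedAnimal FOREIGN KEY (Category, Breed)
        REFERENCES Breed(Category, Breed) ON DELETE CASCADE
);
"""


def build_demo():
    """Create the tables from the script and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Category VALUES (?)", [("Cat",), ("Dog",)])
    conn.executemany(
        "INSERT INTO Breed VALUES (?, ?)",
        [("Cat", "Bombay"), ("Dog", "Dalmatian")],
    )
    conn.executemany(
        "INSERT INTO Animal VALUES (?, ?, ?, ?)",
        [(1, "Eugene", "Cat", "Bombay"), (2, "Gary", "Dog", "Dalmatian")],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_demo()
    # Deleting a category cascades through Breed down to Animal.
    conn.execute("DELETE FROM Category WHERE Category = 'Cat'")
    print(conn.execute("SELECT AnimalID, Name FROM Animal").fetchall())
```

Because the whole schema is one string, the same script can be rerun to rebuild the tables on another system.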

Data Volume Estimate the total size of the database. For each table. Current. Future growth. Guide for hardware and software purchases. For each table. Use data types to estimate the number of bytes used for each row. Multiply by the estimated number of rows. Add the value for each table to get the total size. For concatenated keys (and similar tables). OrderItems(O#, Item#, Qty) Hard to “know” the total number of items ordered. Start with the total number of orders. Multiply by the average number of items on a typical order. Need to know time frame or how long to keep data. Do we store all customer data forever? Do we keep all orders in the active database, or do we migrate older ones?

Data Volume Example
Customer(C#, Name, Address, City, State, Zip) Row: 4 + 15 + 25 + 20 + 2 + 10 = 76 bytes
Order(O#, C#, Odate) Row: 4 + 4 + 8 = 16 bytes
OrderItem(O#, P#, Quantity, SalePrice) Row: 4 + 4 + 4 + 8 = 20 bytes
Business rules: Three year retention. 1000 customers. Average 10 orders per customer per year. Average 5 items per order.
Table       Bytes/row   Rows      Total bytes
Customer    76          1,000     76,000
Order       16          30,000    480,000
OrderItem   20          150,000   3,000,000
Total                             3,556,000
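The same arithmetic can be written as a small calculator. This is a sketch only: the function and parameter names are invented for illustration, and the byte widths and business rules are the ones stated on the slide.

```python
# Bytes per row, taken from the column widths on the slide.
ROW_BYTES = {
    "Customer": 4 + 15 + 25 + 20 + 2 + 10,  # 76
    "Order": 4 + 4 + 8,                     # 16
    "OrderItem": 4 + 4 + 4 + 8,             # 20
}


def estimate_volume(customers=1000, orders_per_customer_per_year=10,
                    items_per_order=5, retention_years=3):
    """Return (bytes per table, total bytes) for the retention window."""
    orders = customers * orders_per_customer_per_year * retention_years
    order_items = orders * items_per_order
    row_counts = {"Customer": customers, "Order": orders, "OrderItem": order_items}
    sizes = {table: ROW_BYTES[table] * rows for table, rows in row_counts.items()}
    return sizes, sum(sizes.values())


if __name__ == "__main__":
    sizes, total = estimate_volume()
    for table, size in sizes.items():
        print(f"{table:10s} {size:>10,d}")
    print(f"{'Total':10s} {total:>10,d}")
```

Changing the retention period or the growth assumptions only requires changing the arguments.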

Appendix: Formal Definitions: Terms
Formal term: definition (informal equivalent)
Relation: A set of attributes with data that changes over time. Often denoted R. (Table)
Attribute: Characteristic with a real-world domain. Subsets of attributes are multiple columns, often denoted X or Y. (Column)
Tuple: The data values returned for specific attribute sets are often denoted t[X]. (Row of data)
Schema: Collection of tables and constraints/relationships.
Functional dependency: X → Y (Business rule dependency)

Appendix: Functional Dependency Derives from a real-world relationship/constraint. Denoted X → Y for sets of attributes X and Y. Holds when any rows of data that have identical values for the X attributes also have identical values for their Y attributes: If t1[X] = t2[X], then t1[Y] = t2[Y]. X is also known as a determinant if the dependency is non-trivial (Y is not a subset of X).
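The definition translates directly into a check over sample rows. The sketch below is illustrative: `fd_holds` is a hypothetical helper, rows are plain dictionaries, and the sample data is made up. A check on sample data can only refute a dependency; whether it really holds is a business rule, not something the data can prove.

```python
def fd_holds(rows, x_attrs, y_attrs):
    """Return True if rows with equal X values always have equal Y values."""
    seen = {}  # maps each X value to the Y value first seen with it
    for row in rows:
        x_value = tuple(row[a] for a in x_attrs)
        y_value = tuple(row[a] for a in y_attrs)
        if seen.setdefault(x_value, y_value) != y_value:
            return False  # same X, different Y: the dependency is violated
    return True


# Made-up sample rows for illustration.
ORDERS = [
    {"OrderID": 32, "CustomerID": 1, "Name": "Jones"},
    {"OrderID": 33, "CustomerID": 2, "Name": "Hong"},
    {"OrderID": 34, "CustomerID": 1, "Name": "Jones"},
]

if __name__ == "__main__":
    print(fd_holds(ORDERS, ["CustomerID"], ["Name"]))  # CustomerID -> Name
    print(fd_holds(ORDERS, ["Name"], ["OrderID"]))     # Name -> OrderID
```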

Appendix: Keys Keys are attributes that are ultimately used to identify rows of data. A key K (sometimes called candidate key) is a set of attributes with FD K → U, where U is all other attributes in the relation. If K’ is a proper subset of K, then there is no FD K’ → U. A set of key attributes functionally determines all other attributes in the relation, and it is the smallest set of attributes that will do so (there is no smaller subset of K that determines the columns).

Appendix: First Normal Form A relation is in first normal form (1NF) if and only if all attributes are atomic. Atomic attributes are single valued, and cannot be composite, multi-valued or nested relations. Example: Customer(CID, Name: First + Last, Phones, Address) CID Name: First + Last Phones Address 111 Joe Jones 111-2223 111-3393 112-4582 123 Main

Appendix: Second Normal Form A relation is in second normal form (2NF) if it is in 1NF and each non-key attribute is fully functionally dependent on the primary key: K → Ai for each non-key attribute Ai. That is, there is no proper subset K’ of K such that K’ → Ai. Example: OrderProduct(OrderID, ProductID, Quantity, Description). Here Description depends on ProductID alone, so the relation is not in 2NF. OrderID ProductID Quantity Description 32 15 1 Blue Hose 16 2 Pliers 33

Appendix: Transitive Dependency Given functional dependencies X → Y and Y → Z, the transitive dependency X → Z must also hold. Example: There is an FD between OrderID and CustomerID. Given the OrderID key attribute, you always know the CustomerID. There is an FD between CustomerID and the other customer data, because CustomerID is the primary key. Given the CustomerID, you always know the corresponding attributes for Name, Phone, and so on. Consequently, given the OrderID (X), you always know the corresponding customer data by transitivity.

Appendix: Third Normal Form A relation is in third normal form (3NF) if and only if it is in 2NF and no non-key attributes are transitively dependent on the primary key. That is, K → Ai for each attribute (2NF), and there is no subset of attributes X such that K → X → Ai. Example: Order(OrderID, OrderDate, CustomerID, Name, Phone) OrderID OrderDate CustomerID Name Phone 32 5/5/2004 1 Jones 222-3333 33 2 Hong 444-8888 34 5/6/2004

Appendix: Boyce-Codd Normal Form A relation is in Boyce-Codd Normal Form (BCNF) if and only if it is in 3NF and every determinant is a candidate key (or K is a superkey). That is, K → Ai for every attribute, and there is no subset X (key or nonkey) such that X → Ai where X is different from K. Example: Employees can have many specialties, and many employees can be within a specialty. Employees can have many managers, but a manager can have only one specialty: Mgr → Specialty. EmpSpecMgr(EID, Specialty, ManagerID) EID Specialty ManagerID 32 Drill 1 33 Weld 2 34 The determinant ManagerID in the FD ManagerID → Specialty is not a key, so the relation is not in BCNF.

Appendix: Multi-Valued Dependency A multi-valued dependency (MVD) exists when there are at least three attributes in a relation (A, B, and C; and they could be sets), and one attribute (A) determines the other two (B and C) but the other two are independent of each other. That is, A →→ B and A →→ C, but B and C have no FDs. Example: Employees have many specialties and many tools, but tools and specialties are not directly related.

Appendix: Fourth Normal Form A relation is in fourth normal form (4NF) if and only if it is in BCNF and there are no multi-valued dependencies. That is, if A →→ B, then all attributes of R are also functionally dependent on A: A → Ai for each attribute. Example: EmpSpecTools(EID, Specialty, ToolID) is split into EmpSpec(EID, Specialty) and EmpTools(EID, ToolID).
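The decomposition can be checked on a small example. The sketch below uses made-up rows and hypothetical helper names. When specialties and tools are independent for each employee, projecting EmpSpecTools onto the two smaller tables and joining them back on EID reproduces the original rows, while the smaller tables store each fact only once.

```python
def project(rows, attrs):
    """Project a list of dict rows onto attrs, removing duplicates."""
    return {tuple(row[a] for a in attrs) for row in rows}


def join_on_eid(emp_spec, emp_tools):
    """Natural join of EmpSpec(EID, Specialty) and EmpTools(EID, ToolID)."""
    return {
        (eid, specialty, tool_id)
        for (eid, specialty) in emp_spec
        for (other_eid, tool_id) in emp_tools
        if eid == other_eid
    }


# Made-up data: employee 32 has two specialties and two tools, so the
# combined table needs every specialty/tool pairing (four rows).
EMP_SPEC_TOOLS = [
    {"EID": 32, "Specialty": "Drill", "ToolID": 1},
    {"EID": 32, "Specialty": "Drill", "ToolID": 2},
    {"EID": 32, "Specialty": "Weld", "ToolID": 1},
    {"EID": 32, "Specialty": "Weld", "ToolID": 2},
    {"EID": 33, "Specialty": "Weld", "ToolID": 3},
]

if __name__ == "__main__":
    emp_spec = project(EMP_SPEC_TOOLS, ["EID", "Specialty"])
    emp_tools = project(EMP_SPEC_TOOLS, ["EID", "ToolID"])
    original = project(EMP_SPEC_TOOLS, ["EID", "Specialty", "ToolID"])
    print(sorted(emp_spec))
    print(sorted(emp_tools))
    print(join_on_eid(emp_spec, emp_tools) == original)
```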

Database Management Systems Chapter 4 Queries

Why do we Need Queries Natural languages (English) are too vague With complex questions, it can be hard to verify that the question was interpreted correctly, and that the answer we received is truly correct. Consider the question: Who are our best customers? We need a query system with more structure We need a standardized system so users and developers can learn one method that works on any (most) systems. Query By Example (QBE) SQL

Four Questions to Create a Query What output do you want to see? What do you already know (or what constraints are given)? What tables are involved? How are the tables joined together?

Tables

Organization Single table Constraints Computations Groups/Subtotals Multiple Tables

Sample Questions List all animals with yellow in their color. List all dogs with yellow in their color born after 6/1/04. List all merchandise for cats with a list price greater than $10. List all dogs who are male and registered or who were born before 6/1/04 and have white in their color. What is the average sale price of all animals? What is the total cost we paid for all animals? List the top 10 customers and total amount they spent. How many cats are in the animal list? Count the number of animals in each category. List the CustomerID of everyone who bought something between 4/1/04 and 5/31/04. List the first name and phone of every customer who bought something between 4/1/04 and 5/31/04. List the last name and phone of anyone who bought a registered white cat between 6/1/04 and 12/31/04. Which employee has sold the most items?

Query By Example & SQL What tables? SELECT AnimalID, Category, Breed, Color FROM Animal WHERE (Color LIKE ‘%Yellow%’); Animal AnimalID Name Category Breed DateBorn Gender What to see? What conditions? Field AnimalID Category Breed Color Table Animal Sort Criteria Like ‘%Yellow%’ Or List all animals with yellow in their color

Basic SQL SELECT SELECT columns What do you want to see? FROM tables What tables are involved? JOIN conditions How are the tables joined? WHERE criteria What are the constraints?

ORDER BY SELECT columns FROM tables JOIN join columns WHERE conditions ORDER BY columns (ASC DESC) Animal SELECT Name, Category, Breed FROM Animal ORDER BY Category, Breed; Name Category Breed Cathy Bird African Grey Bird Canary Debbie Bird Cockatiel Bird Cockatiel Terry Bird Lovebird Bird Other Charles Bird Parakeet Curtis Bird Parakeet Ruby Bird Parakeet Sandy Bird Parrot Hoyt Bird Parrot Bird Parrot AnimalID Name Category Breed DateBorn Gender Field Name Category Breed Table Animal Sort Ascending Criteria Or

DISTINCT SELECT Category FROM Animal; SELECT DISTINCT Category Fish Dog Cat . . . Category Bird Cat Dog Fish Mammal Reptile Spider

Constraints: And SELECT AnimalID, Category, DateBorn FROM Animal Query04_02 Animal SELECT AnimalID, Category, DateBorn FROM Animal WHERE ((Category=‘Dog’) AND (Color Like ‘%Yellow%’) AND (DateBorn>’01-Jun-2004’)); AnimalID Name Category Breed DateBorn Gender Field AnimalID Category DateBorn Color Table Animal Sort Criteria ‘Dog’ >’01-Jun-2004’ Like ‘%Yellow%’ Or List all dogs with yellow in their color born after 6/1/04.

Boolean Algebra
And: Both must be true. Or: Either one is true. Not: Reverse the value.
a = 3, b = -1, c = 2
(a > 4) And (b < 0): F And T = F
(a > 4) Or (b < 0): F Or T = T
NOT (b < 0): Not T = F

Boolean Algebra
The result is affected by the order of the operations. Parentheses indicate that an operation should be performed first. With no parentheses, standard SQL evaluates NOT first, then AND, then OR, so do not rely on left-to-right order.
a = 3, b = -1, c = 2
( (a > 4) AND (b < 0) ) OR (c > 1): (F AND T) OR T = F OR T = T
(a > 4) AND ( (b < 0) OR (c > 1) ): F AND (T OR T) = F AND T = F
Always use parentheses, so other people can read and understand your query.
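The effect of grouping can be tried out directly. The sketch below is an illustration using Python's built-in sqlite3 module, and `evaluate` is a hypothetical helper. It evaluates a Boolean expression with a = 3, b = -1, c = 2. The third expression has no parentheses around the AND; SQLite, like standard SQL, evaluates AND before OR there.

```python
import sqlite3


def evaluate(expr, a=3, b=-1, c=2):
    """Evaluate a SQL Boolean expression over the values a, b, c."""
    conn = sqlite3.connect(":memory:")
    try:
        # The inner SELECT exposes the three values as columns a, b, c.
        row = conn.execute(
            f"SELECT {expr} FROM (SELECT ? AS a, ? AS b, ? AS c)", (a, b, c)
        ).fetchone()
        return bool(row[0])
    finally:
        conn.close()


if __name__ == "__main__":
    for expr in (
        "((a > 4) AND (b < 0)) OR (c > 1)",
        "(a > 4) AND ((b < 0) OR (c > 1))",
        "(c > 1) OR (b < 0) AND (a > 4)",  # AND is applied before OR
    ):
        print(expr, "=>", evaluate(expr))
```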

DeMorgan’s Law Example Customer: "I want to look at a cat, but I don’t want any cats that are registered or that have red in their color." SELECT AnimalID, Category, Registered, Color FROM Animal WHERE (Category='Cat') AND NOT ((Registered IS NOT NULL) OR (Color LIKE '%Red%')); Animal AnimalID Name Category Breed DateBorn Gender Field AnimalID Category Registered Color Table Animal Sort Criteria 'Cat' Is Null Not Like '%Red%' Or

DeMorgan’s Law
Negation of clauses: Not (A And B) becomes Not A Or Not B. Not (A Or B) becomes Not A And Not B.
Example row: Registered=ASCF, Color=Black
NOT ((Registered is NOT NULL) OR (Color LIKE '%Red%')): NOT (T OR F) = NOT T = F
(Registered is NULL) AND NOT (Color LIKE '%Red%'): F AND (NOT F) = F AND T = F
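The equivalence can be confirmed by running both forms of the condition against the same rows. This is a sketch using Python's built-in sqlite3 module; the table is cut down to four columns, and the rows and the helper `matching_ids` are made up for illustration.

```python
import sqlite3

# Made-up rows: (AnimalID, Category, Registered, Color).
ROWS = [
    (1, "Cat", "ASCF", "Black"),    # registered, so excluded
    (2, "Cat", None, "Black"),      # unregistered, no red: the only match
    (3, "Cat", None, "Red/White"),  # has red, so excluded
    (4, "Dog", None, "Black"),      # not a cat
    (5, "Cat", "ASCF", "Red"),      # registered and red
]

ORIGINAL = (
    "SELECT AnimalID FROM Animal WHERE (Category = 'Cat') AND NOT "
    "((Registered IS NOT NULL) OR (Color LIKE '%Red%')) ORDER BY AnimalID"
)
REWRITTEN = (
    "SELECT AnimalID FROM Animal WHERE (Category = 'Cat') AND "
    "(Registered IS NULL) AND NOT (Color LIKE '%Red%') ORDER BY AnimalID"
)


def matching_ids(sql):
    """Run one query against a fresh in-memory copy of the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Animal (AnimalID INTEGER PRIMARY KEY, "
        "Category TEXT, Registered TEXT, Color TEXT)"
    )
    conn.executemany("INSERT INTO Animal VALUES (?, ?, ?, ?)", ROWS)
    ids = [row[0] for row in conn.execute(sql)]
    conn.close()
    return ids


if __name__ == "__main__":
    print(matching_ids(ORIGINAL))
    print(matching_ids(REWRITTEN))
```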

Conditions: AND, OR Query04_03 SELECT AnimalID, Category, Gender, Registered, DateBorn, Color FROM Animal WHERE (( Category=‘Dog’) AND ( ( (Gender=‘Male’) AND (Registered Is Not Null) ) OR ( (DateBorn<’01-Jun-2004’) AND (Color Like ‘%White%’) ) ) ); Animal AnimalID Name Category Breed DateBorn Gender Field AnimalID Category Gender Registered DateBorn Color Table Animal Sort Criteria ‘Dog’ ‘Male’ Is Not Null Or < ’01-Jun-2004’ Like ‘%White%’ List all dogs who are male and registered or who were born before 6/1/2004 and have white in their color.

Useful Where Conditions Comparisons Examples Operators <, =, >, <>, BETWEEN, LIKE, IN Numbers AccountBalance > 200 Text Simple Pattern match one Pattern match any Name > ‘Jones’ License LIKE ‘A_ _82_’ Name LIKE ‘J%’ Dates SaleDate BETWEEN ’15-Aug-2004’ AND ’31-Aug-2004’ Missing Data City IS NULL Negation Name IS NOT NULL Sets Category IN (‘Cat’, ‘Dog’, ‘Hamster’)

Oracle Views

Oracle Views and SQL CREATE VIEW Pets.Example AS SELECT Pets.Animal.AnimalID, Pets.Animal.Breed, Pets.Animal.Category, Pets.Animal.Color FROM Pets.Animal WHERE (Pets.Animal.Color LIKE ‘%Yellow%’) SQL version is created by the Oracle View Wizard. The CREATE VIEW command saves it with the specified name.

Oracle View Wizard

Oracle Schema Manager: Views

Oracle Content Viewer

SQL Server Views

Simple Computations SaleItem(OrderID, ItemID, SalePrice, Quantity) Select OrderID, ItemID, SalePrice, Quantity, SalePrice*Quantity As Extended From SaleItem; OrderID ItemID Price Quantity Extended 151 9764 19.50 2 39.00 151 7653 8.35 3 25.05 151 8673 6.89 2 13.78 Basic computations (+ - * /) can be performed on numeric data. The new display column should be given a meaningful name.

Computations: Aggregation--Avg Query04_04 SELECT Avg(SalePrice) AS AvgOfSalePrice FROM SaleAnimal; SaleAnimal Sum Avg Min Max Count StDev or StdDev Var SaleID AnimalID SalePrice Field SalePrice Table SaleAnimal Total Avg Sort Criteria Or What is the average sale price of all animals?

Computations (Math Operators) Query04_05 OrderItem SELECT Sum(Quantity*Cost) AS OrderTotal FROM OrderItem WHERE (PONumber=22); PONumber ItemID Quantity Cost Field PONumber OrderTotal: Quantity*Cost Table OrderItem Total Sort Criteria =22 Or OrderTotal 1798.28 What is the total value of the order for PONumber 22? Use any common math operators on numeric data. Operate on data in one row at a time.

SQL Differences

Subtotals (Where) How many cats are in the Animal list? Query04_06 Animal AnimalID Name Category Breed DateBorn Gender SELECT Count(AnimalID) AS CountOfAnimalID FROM Animal WHERE (Category = ‘Cat’); Field AnimalID Category Table Animal Total Count Where Sort Criteria ‘Cat’ Or How many cats are in the Animal list?

Groups and Subtotals Category CountOfAnimalID Dog 100 Cat 47 Bird 15 Query04_07 SELECT Category, Count(AnimalID) AS CountOfAnimalID FROM Animal GROUP BY Category ORDER BY Count(AnimalID) DESC; Animal AnimalID Name Category Breed DateBorn Gender Category CountOfAnimalID Dog 100 Cat 47 Bird 15 Fish 14 Reptile 6 Mammal 6 Spider 3 Field Category AnimalID Table Animal Total Group By Count Sort Descending Criteria Or Count the number of animals in each category. You could type in each WHERE clause, but that is slow. And you would have to know all of the Category values.

Conditions on Totals (Having) Query04_08 Animal SELECT Category, Count(AnimalID) AS CountOfAnimalID FROM Animal GROUP BY Category HAVING Count(AnimalID) > 10 ORDER BY Count(AnimalID) DESC; AnimalID Name Category Breed DateBorn Gender Field Category AnimalID Table Animal Total Group By Count Sort Descending Criteria >10 Or Category CountOfAnimalID Dog 100 Cat 47 Bird 15 Fish 14 Count number of Animals in each Category, but only list them if more than 10.

Where (Detail) v Having (Group) Query04_09 Animal SELECT Category, Count(AnimalID) AS CountOfAnimalID FROM Animal WHERE DateBorn > ’01-Jun-2004’ GROUP BY Category HAVING Count(AnimalID) > 10 ORDER BY Count(AnimalID) DESC; AnimalID Name Category Breed DateBorn Gender Field Category AnimalID DateBorn Table Animal Total Group By Count Where Sort Descending Criteria >10 >’01-Jun-2004’ Or Category CountOfAnimalID Dog 30 Cat 18 Count Animals born after 6/1/2004 in each Category, but only list Category if more than 10.
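The difference between the two clauses shows up clearly on a tiny table. The sketch below uses Python's built-in sqlite3 module with made-up rows and ISO-format dates; `category_counts` is a hypothetical helper. WHERE removes individual rows before they are counted, and HAVING removes whole groups after counting.

```python
import sqlite3

# Made-up rows: (AnimalID, Category, DateBorn as ISO text).
ANIMALS = [
    (1, "Dog", "2004-07-01"),
    (2, "Dog", "2004-08-15"),
    (3, "Dog", "2004-03-10"),
    (4, "Cat", "2004-09-01"),
    (5, "Cat", "2004-01-20"),
    (6, "Bird", "2004-02-02"),
]


def category_counts(rows, born_after=None, min_count=0):
    """Count animals per category; keep groups with more than min_count."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Animal (AnimalID INTEGER PRIMARY KEY, "
        "Category TEXT, DateBorn TEXT)"
    )
    conn.executemany("INSERT INTO Animal VALUES (?, ?, ?)", rows)
    sql = (
        "SELECT Category, COUNT(AnimalID) AS CountOfAnimalID "
        "FROM Animal "
        "WHERE (:born_after IS NULL OR DateBorn > :born_after) "  # row filter
        "GROUP BY Category "
        "HAVING COUNT(AnimalID) > :min_count "                     # group filter
        "ORDER BY COUNT(AnimalID) DESC, Category"
    )
    result = conn.execute(
        sql, {"born_after": born_after, "min_count": min_count}
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(category_counts(ANIMALS, min_count=1))
    print(category_counts(ANIMALS, born_after="2004-06-01", min_count=1))
```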

Multiple Tables (Intro & Distinct) Query04_10 SELECT DISTINCT CustomerID FROM Sale WHERE (SaleDate Between '01-Apr-2004' And '31-May-2004') ORDER BY CustomerID; Sale SaleID SaleDate EmployeeID CustomerID SalesTax Field CustomerID SaleDate Table Sale Sort Ascending Criteria Between '01-Apr-2004' And '31-May-2004' Or Result CustomerID: 6 8 14 19 22 24 28 36 37 38 39 42 50 57 58 63 74 80 90 List the CustomerID of everyone who bought something between 01-Apr-2004 and 31-May-2004.

Joining Tables Query04_11 SELECT DISTINCT Sale.CustomerID, Customer.LastName FROM Customer INNER JOIN Sale ON Customer.CustomerID = Sale.CustomerID WHERE (SaleDate Between ’01-Apr-2004’ And ’31-May-2004’) ORDER BY Customer.LastName; Sale Customer SaleID SaleDate EmployeeID CustomerID CustomerID Phone FirstName LastName CustomerID LastName 22 Adkins 57 Carter 38 Franklin 42 Froedge 63 Grimes 74 Hinton 36 Holland 6 Hopkins 50 Lee 58 McCain … Field CustomerID LastName SaleDate Table Sale Customer Sort Ascending Criteria Between ’01-Apr-2004’ And ’31-May-2004’ Or List LastNames of Customers who bought between 4/1/2004 and 5/31/2004.

SQL JOIN FROM table1 INNER JOIN table2 ON table1.column = table2.column SQL 92 syntax (Access and SQL Server) FROM table1, table2 WHERE table1.column = table2.column SQL 89 syntax (Oracle) FROM table1, table2 JOIN table1.column = table2.column Informal syntax
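Both valid forms return the same rows. The sketch below runs them side by side with Python's built-in sqlite3 module, which accepts either syntax. The tables are cut down to the columns the query needs, the Sale rows are made up, dates are stored as ISO text, and `customers_in_range` is a hypothetical helper.

```python
import sqlite3

SQL92 = (
    "SELECT DISTINCT Sale.CustomerID, Customer.LastName "
    "FROM Customer INNER JOIN Sale ON Customer.CustomerID = Sale.CustomerID "
    "WHERE (SaleDate BETWEEN '2004-04-01' AND '2004-05-31') "
    "ORDER BY Customer.LastName"
)
SQL89 = (
    "SELECT DISTINCT Sale.CustomerID, Customer.LastName "
    "FROM Customer, Sale "
    "WHERE (Customer.CustomerID = Sale.CustomerID) "  # join condition
    "AND (SaleDate BETWEEN '2004-04-01' AND '2004-05-31') "
    "ORDER BY Customer.LastName"
)


def customers_in_range(sql):
    """Run one form of the join against a small made-up data set."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, LastName TEXT)")
    conn.execute(
        "CREATE TABLE Sale (SaleID INTEGER PRIMARY KEY, SaleDate TEXT, "
        "CustomerID INTEGER REFERENCES Customer(CustomerID))"
    )
    conn.executemany(
        "INSERT INTO Customer VALUES (?, ?)",
        [(22, "Adkins"), (57, "Carter"), (99, "Zimmer")],
    )
    conn.executemany(
        "INSERT INTO Sale VALUES (?, ?, ?)",
        [(1, "2004-04-15", 22), (2, "2004-05-02", 57),
         (3, "2004-05-20", 22), (4, "2004-07-01", 99)],
    )
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(customers_in_range(SQL92))
    print(customers_in_range(SQL89))
```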

Syntax for Three Tables SQL ‘92 syntax to join three tables FROM Table1 INNER JOIN (Table2 INNER JOIN Table3 ON Table2.ColA = Table3.ColA) ON Table1.ColB = Table2.ColB Easier notation, but not correct syntax FROM Table1, Table2, Table3 JOIN Table1.ColB = Table2.ColB Table2.ColA = Table3.ColA

Multiple Tables (Many) Query04_12 SELECT DISTINCTROW Customer.LastName, Customer.Phone FROM Customer INNER JOIN (Sale INNER JOIN (Animal INNER JOIN SaleAnimal ON Animal.AnimalID = SaleAnimal.AnimalID) ON Sale.SaleID = SaleAnimal.SaleID) ON Customer.CustomerID = Sale.CustomerID WHERE ((Animal.Category=‘Cat’) AND (Animal.Registered Is Not Null) AND (Color Like ‘%White%’) AND (SaleDate Between ’01-Jun-2004’ And ’31-Dec-2004’)); Animal SaleAnimal Sale Customer AnimalID Name Category Breed SaleID AnimalID SalePrice SaleID SaleDate EmployeeID CustomerID CustomerID Phone FirstName LastName Field LastName Phone Category Registered Color SaleDate Table Customer Animal Sale Sort Ascending Criteria ‘Cat’ Is Not Null Like ‘%White%’ Between ’01-Jun-2004’ And ’31-Dec-2004’ Or List the Last Name and Phone of anyone who bought a registered White cat between 6/1/2004 and 12/31/2004.

Oracle select lastname, phone from customer inner join sale on customer.customerid = sale.customerid inner join saleanimal on sale.saleid = saleanimal.saleid inner join animal on saleanimal.animalid = animal.animalid where (category = 'Cat') and (Registered is not null) and (color like '%White%') AND (saledate between '01-Jun-2004' and '31-Dec-2004') ;

Building a Query List the Last Name and Phone of anyone who bought a registered White cat between 6/1/04 and 12/31/04. Identify the tables involved. Look at the columns you want to see. LastName, Phone: Customer. Look at the columns used in the constraints. Registered, Color, Category: Animal. Sale Date: Sale. Find connector tables. To connect Animal to Sale: SaleAnimal. Select the desired columns and test the query. Enter the constraints. Set Order By columns. Add Group By columns. Add summary computations to the SELECT statement.

Joining Tables (Hints) Build Relationships First Drag and drop From one side to many side Avoid multiple ties between tables SQL FROM Table1 INNER JOIN Table2 ON Table1.ColA = Table2.ColB Join columns are often keys, but they can be any columns, as long as the domains (types of data) match. Multiple Tables FROM (Table1 INNER JOIN Table2 ON Table1.ColA = Table2.ColB) INNER JOIN Table3 ON Table1.ColC = Table3.ColD Shorter Notation FROM T1, T2, T3 JOIN T1.ColA = T2.ColB T1.ColC = T3.ColD Shorter Notation is not correct syntax, but it is easier to write.

Tables with Multiple Joins Potential problem with three or more tables. Access uses predefined relationships to automatically determine JOINs. JOINS might loop. Most queries will not work with loops. A query with these four tables with four JOINS would only return rows where the Employee had the same ZipCode as the Supplier. If you only need the Supplier city, just delete the JOIN between Employee and ZipCode. If you want both cities, add the ZipCode table again as a fifth table.

Table Alias
Tables: City(CityID, ZipCode, City, State) Supplier(SupplierID, Address, ZipCode, CityID) AnimalOrder(OrderDate, SupplierID, ShippingCost, EmployeeID) Employee(EmployeeID, LastName, ZipCode, CityID) City2(CityID, ZipCode, City, State)
SELECT Supplier.SupplierID AS SID, Supplier.CityID, City.City, Employee.EmployeeID AS EID, Employee.LastName, Employee.CityID, City2.City FROM (City INNER JOIN Supplier ON City.CityID = Supplier.CityID) INNER JOIN ((City AS City2 INNER JOIN Employee ON City2.CityID = Employee.CityID) INNER JOIN AnimalOrder ON Employee.EmployeeID = AnimalOrder.EmployeeID) ON Supplier.SupplierID = AnimalOrder.SupplierID;
SID Supplier.CityID City.City EID LastName Employee.CityID City2.City
4 7972 Middlesboro 5 James 7083 Orlando
2 10896 Springfield 1 Reeves 9201 Lincoln
4 7972 Middlesboro 3 Reasoner 8313 Springfield
9 10740 Columbia 8 Carpenter 10592 Philadelphia
5 10893 Smyrna 3 Reasoner 8313 Springfield

Saved Query: Create View Save a query Faster: only enter once Faster: only analyze once Any SELECT statement Can use the View within other SQL queries. CREATE VIEW Kittens AS SELECT * FROM Animal WHERE (Category = ‘Cat’) AND (Today - DateBorn < 180); SELECT Avg(ListPrice) FROM Kittens WHERE (Color LIKE ‘%Black%’);
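A runnable version of the Kittens view can be sketched with Python's built-in sqlite3 module. Several things here are assumptions rather than part of the slide: the fixed date 2004-12-31 stands in for Today so the result is repeatable, `julianday` is SQLite's date function, the table is cut down to five columns, and the rows are made up.

```python
import sqlite3

AS_OF = "2004-12-31"  # assumed stand-in for the slide's Today

# Made-up rows: (AnimalID, Category, DateBorn, Color, ListPrice).
ANIMALS = [
    (1, "Cat", "2004-09-01", "Black", 100.0),        # kitten, black
    (2, "Cat", "2004-10-15", "Black/White", 200.0),  # kitten, black
    (3, "Cat", "2003-01-01", "Black", 500.0),        # too old
    (4, "Dog", "2004-11-01", "Black", 300.0),        # not a cat
    (5, "Cat", "2004-11-20", "White", 50.0),         # kitten, not black
]


def average_black_kitten_price():
    """Create the view once, then query it like a table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Animal (AnimalID INTEGER PRIMARY KEY, Category TEXT, "
        "DateBorn TEXT, Color TEXT, ListPrice REAL)"
    )
    conn.executemany("INSERT INTO Animal VALUES (?, ?, ?, ?, ?)", ANIMALS)
    conn.execute(
        "CREATE VIEW Kittens AS SELECT * FROM Animal "
        "WHERE (Category = 'Cat') "
        f"AND (julianday('{AS_OF}') - julianday(DateBorn) < 180)"
    )
    (avg_price,) = conn.execute(
        "SELECT AVG(ListPrice) FROM Kittens WHERE (Color LIKE '%Black%')"
    ).fetchone()
    conn.close()
    return avg_price


if __name__ == "__main__":
    print(average_black_kitten_price())
```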

Updateable Views OrderItem(OrderID, ItemID, Quantity) Item(ItemID, Description) OrderLine(OrderID, ItemID, Description, Quantity) To be updateable, a view must focus on one primary table. (OrderItem) Goal is to change data in only one table. (OrderItem) Data can be displayed from other tables. (Item) Never include or attempt to change primary keys from more than one table. (Item.ItemID)

Non Updateable View OrderItem(OrderID, ItemID, Quantity) Item(ItemID, Description) 121 57 3 121 82 2 122 57 1 57 Cat food 58 Dog food 59 Bird food OrderLine(OrderID, Item.ItemID, Description, Quantity) 121 57 Cat food 3 121 82 Bird feeder 2 122 57 Cat food 1 32 If you attempt to change the Item.ItemID in the OrderLineView: You will simply change the primary key value in the Item table. It will not add a new row to the OrderItem table.

SQL Syntax: ALTER TABLE ALTER TABLE table ADD COLUMN column datatype (size) DROP COLUMN column See also: CREATE TABLE, DROP TABLE

SQL Syntax: COMMIT COMMIT WORK See also: ROLLBACK

SQL Syntax: CREATE INDEX CREATE [UNIQUE] INDEX index ON table (column1, column2, … ) WITH {PRIMARY | DISALLOW NULL | IGNORE NULL} See also: CREATE TABLE

SQL Syntax: CREATE TABLE CREATE TABLE table ( column1 datatype (size) [NOT NULL] [index1] , column2 datatype (size) [NOT NULL] [index2], … , CONSTRAINT pkname PRIMARY KEY (column, …), CONSTRAINT fkname FOREIGN KEY (column) REFERENCES existing_table (key_column) ON DELETE CASCADE ) See also: ALTER TABLE, DROP TABLE

SQL Syntax: CREATE VIEW CREATE VIEW viewname AS SELECT … See also: SELECT

SQL Syntax: DELETE DELETE FROM table WHERE condition See also: DROP

SQL Syntax: DROP DROP INDEX index ON table DROP TABLE DROP VIEW See also: DELETE

SQL Syntax: INSERT INSERT INTO table (column1, column2, …) VALUES (value1, value2, … ) INSERT INTO newtable (column1, column2, …) SELECT … See also: SELECT

SQL Syntax: GRANT GRANT privilege ON object TO user | PUBLIC Privileges: ALL, ALTER, DELETE, INDEX, INSERT, SELECT, UPDATE See also: REVOKE

SQL Syntax: REVOKE REVOKE privilege ON object FROM user | PUBLIC Privileges: ALL, ALTER, DELETE, INDEX, INSERT, SELECT, UPDATE See also: GRANT

SQL Syntax: ROLLBACK SAVEPOINT savepoint {optional} ROLLBACK WORK TO savepoint See also: COMMIT

SQL Syntax: SELECT SELECT DISTINCT table.column {AS alias} , . . . FROM table/query INNER JOIN table/query ON T1.ColA = T2.ColB WHERE (condition) GROUP BY column HAVING (group condition) ORDER BY table.column { UNION, INTERSECT, EXCEPT … }

SQL Syntax: SELECT INTO SELECT column1, column2, … INTO newtable FROM tables WHERE condition See also: SELECT

SQL Syntax: UPDATE UPDATE table SET column1 = value1, column2 = value2, … WHERE condition See also: DELETE

Database Management Systems Chapter 5 Advanced Queries

Tables

Organization Harder Questions Subqueries Not In, LEFT JOIN UNION, Multiple JOIN columns, Recursive JOIN Other SQL Commands DDL: Data Definition Language DML: Data Manipulation Language OLAP Microsoft SQL Server Oracle Microsoft Access Crosstab

Harder Questions How many cats are “in-stock” on 10/1/04? Which cats sold for more than the average price? Which animals sold for more than the average price of animals in their category? Which animals have not been sold? Which customers (who bought something at least once) did not buy anything between 11/1/04 and 12/31/04? Which customers who bought Dogs also bought products for Cats (at any time)?

Sub-query for Calculation Which cats sold for more than the average sale price of cats? Assume we know the average price is $170. Usually we need to compute it first. SELECT SaleAnimal.AnimalID, Animal.Category, SaleAnimal.SalePrice FROM Animal INNER JOIN SaleAnimal ON Animal.AnimalID = SaleAnimal.AnimalID WHERE ((Animal.Category='Cat') AND (SaleAnimal.SalePrice>170)); With a subquery that computes the average: SELECT SaleAnimal.AnimalID, Animal.Category, SaleAnimal.SalePrice FROM Animal INNER JOIN SaleAnimal ON Animal.AnimalID = SaleAnimal.AnimalID WHERE ((Animal.Category='Cat') AND (SaleAnimal.SalePrice> ( SELECT AVG(SalePrice) FROM Animal INNER JOIN SaleAnimal ON Animal.AnimalID = SaleAnimal.AnimalID WHERE (Animal.Category='Cat') ) ) );

Query Sets (IN) Customer Sale SaleItem SELECT Customer.LastName, Customer.FirstName, SaleItem.ItemID FROM (Customer INNER JOIN Sale ON Customer.CustomerID = Sale.CustomerID) INNER JOIN SaleItem ON Sale.SaleID = SaleItem.SaleID WHERE (SaleItem.ItemID In (1,2,30,32,33)) ORDER BY Customer.LastName, Customer.FirstName; Customer Sale SaleItem CustomerID Phone FirstName LastName SaleID SaleDate EmployeeID CustomerID SaleID ItemID Quantity SalePrice Field LastName FirstName ItemID Table Customer SaleItem Sort Ascending Criteria In (1,2,30,32,33) Or List all customers (Name) who purchased one of the following items: 1, 2, 30, 32, 33.

Using IN with a Sub-query List all customers who bought items for cats. SELECT Customer.LastName, Customer.FirstName, SaleItem.ItemID FROM (Customer INNER JOIN Sale ON Customer.CustomerID = Sale.CustomerID) INNER JOIN SaleItem ON Sale.SaleID = SaleItem.SaleID WHERE (SaleItem.ItemID In (SELECT ItemID FROM Merchandise WHERE Category=‘Cat’) );

SubQuery (IN: Look up a Set) SELECT Customer.LastName, Customer.FirstName FROM Customer INNER JOIN Sale ON Customer.CustomerID = Sale.CustomerID WHERE ((Month([SaleDate])=3)) And Customer.CustomerID In (SELECT CustomerID FROM Sale WHERE (Month([SaleDate])=5) ); Customer Sale LastName First Adkins Inga McCain Sam Grimes Earl CustomerID Phone FirstName LastName SaleID SaleDate EmployeeID CustomerID Field LastName FirstName Month(SaleDate) CustomerID Table Customer Sale Sort Ascending Criteria 3 In (SELECT CustomerID FROM Sale WHERE (Month(SaleDate)=5)) Or List all of the customers who bought something in March and who bought something in May. (Two tests on the same data!)

SubQuery (ANY, ALL) Query04_15 SELECT Animal.AnimalID, Name, SalePrice, ListPrice FROM Animal INNER JOIN SaleAnimal ON Animal.AnimalID = SaleAnimal.AnimalID WHERE ((SalePrice > Any (SELECT 0.80*ListPrice FROM Animal WHERE Category = 'Cat')) AND (Category='Cat')); Any: value is compared to each item in the list. If it is True for any of the items, the statement is evaluated to True. All: value is compared to each item in the list. If it is True for every item in the list, the statement is evaluated to True (much more restrictive than Any).

SubQuery: NOT IN (Subtract) Animal SELECT Animal.AnimalID, Animal.Name, Animal.Category FROM Animal WHERE (Animal.AnimalID Not In (SELECT AnimalID From SaleAnimal)); AnimalID Name Category Breed Field AnimalID Name Category Table Animal Sort Criteria Not In (SELECT AnimalID FROM SaleAnimal) Or AnimalID Name Category 12 Leisha Dog 19 Gene Dog 25 Vivian Dog 34 Rhonda Dog 88 Brandy Dog 181 Fish Which animals have not been sold? Start with list of all animals. Subtract out list of those who were sold.

SubQuery: NOT IN (Data) Animal SaleAnimal ID Name Category Breed 2 Fish Angel 4 Gary Dog Dalmation 5 Fish Shark 6 Rosie Cat Oriental Shorthair 7 Eugene Cat Bombay 8 Miranda Dog Norfolk Terrier 9 Fish Guppy 10 Sherri Dog Siberian Huskie 11 Susan Dog Dalmation 12 Leisha Dog Rottweiler ID SaleID SalePrice 2 35 $10.80 4 80 $156.66 6 27 $173.99 7 25 $251.59 8 4 $183.38 10 18 $150.11 11 17 $148.47 Which animals have not been sold?

Left Outer Join Animal SaleAnimal AnimalID Name Category 12 Leisha Dog Query04_17 SELECT Animal.AnimalID, Animal.Name, Animal.Category FROM Animal LEFT JOIN SaleAnimal ON Animal.AnimalID = SaleAnimal.AnimalID WHERE (SaleAnimal.SaleID Is Null); Animal SaleAnimal AnimalID Name Category 12 Leisha Dog 19 Gene Dog 25 Vivian Dog 34 Rhonda Dog 88 Brandy Dog 181 Fish AnimalID Name Category Breed SaleID AnimalID SalePrice Field AnimalID SaleID Name Category Table Animal SaleAnimal Sort Criteria Is Null Or Which animals have not been sold? LEFT JOIN includes all rows from left table (Animal) But only those from right table (SaleAnimal) that match a row in Animal. Rows in Animal without matching data in Sale Animal will have Null.

Left Outer Join (Example)
Animal columns on the left, SaleAnimal columns on the right. Unsold animals (5, 9, 12) have Null in every SaleAnimal column.
    ID  Name     Category  Breed               ID    SaleID  SalePrice
    2            Fish      Angel               2     35      $10.80
    4   Gary     Dog       Dalmation           4     80      $156.66
    5            Fish      Shark               Null  Null    Null
    6   Rosie    Cat       Oriental Shorthair  6     27      $173.99
    7   Eugene   Cat       Bombay              7     25      $251.59
    8   Miranda  Dog       Norfolk Terrier     8     4       $183.38
    9            Fish      Guppy               Null  Null    Null
    10  Sherri   Dog       Siberian Huskie     10    18      $150.11
    11  Susan    Dog       Dalmation           11    17      $148.47
    12  Leisha   Dog       Rottweiler          Null  Null    Null
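The two ways of answering "which animals have not been sold?" can be checked against each other. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in database (not the Access or Oracle systems the slides use) and loading only the ten sample Animal rows and seven SaleAnimal rows shown on the data slides.

```python
import sqlite3

# In-memory database holding the sample rows from the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Animal (AnimalID INTEGER PRIMARY KEY, Name TEXT, Category TEXT, Breed TEXT);
CREATE TABLE SaleAnimal (SaleID INTEGER, AnimalID INTEGER, SalePrice REAL);
""")
animals = [
    (2, None, "Fish", "Angel"), (4, "Gary", "Dog", "Dalmation"),
    (5, None, "Fish", "Shark"), (6, "Rosie", "Cat", "Oriental Shorthair"),
    (7, "Eugene", "Cat", "Bombay"), (8, "Miranda", "Dog", "Norfolk Terrier"),
    (9, None, "Fish", "Guppy"), (10, "Sherri", "Dog", "Siberian Huskie"),
    (11, "Susan", "Dog", "Dalmation"), (12, "Leisha", "Dog", "Rottweiler"),
]
sales = [(35, 2, 10.80), (80, 4, 156.66), (27, 6, 173.99), (25, 7, 251.59),
         (4, 8, 183.38), (18, 10, 150.11), (17, 11, 148.47)]
conn.executemany("INSERT INTO Animal VALUES (?, ?, ?, ?)", animals)
conn.executemany("INSERT INTO SaleAnimal VALUES (?, ?, ?)", sales)

# Approach 1: subtract the sold animals with NOT IN.
not_in_sql = """
SELECT AnimalID FROM Animal
WHERE AnimalID NOT IN (SELECT AnimalID FROM SaleAnimal)
ORDER BY AnimalID"""

# Approach 2: LEFT JOIN and keep the rows with no matching sale.
left_join_sql = """
SELECT Animal.AnimalID
FROM Animal LEFT JOIN SaleAnimal ON Animal.AnimalID = SaleAnimal.AnimalID
WHERE SaleAnimal.SaleID IS NULL
ORDER BY Animal.AnimalID"""

unsold_not_in = [row[0] for row in conn.execute(not_in_sql)]
unsold_left_join = [row[0] for row in conn.execute(left_join_sql)]
print(unsold_not_in, unsold_left_join)
```

Both queries agree on this sample. One difference worth remembering: if SaleAnimal.AnimalID could contain Null, the NOT IN version would return no rows at all, while the LEFT JOIN version would be unaffected.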

Older Syntax for Left Join
Which animals have not been sold?
Old SQL Server syntax:
    SELECT *
    FROM Animal, SaleAnimal
    WHERE Animal.AnimalID *= SaleAnimal.AnimalID
      And SaleAnimal.SaleID Is Null;
Old Oracle syntax: note that the (+) symbol goes on the opposite side, next to the table whose rows may be missing (SaleAnimal).
    SELECT *
    FROM Animal, SaleAnimal
    WHERE Animal.AnimalID = SaleAnimal.AnimalID (+)
      And SaleAnimal.SaleID Is Null;

SubQuery for Computation
Don't know the average, so use a subquery to look it up. Watch parentheses.
    SELECT SaleAnimal.AnimalID, Animal.Category, SaleAnimal.SalePrice
    FROM Animal INNER JOIN SaleAnimal ON Animal.AnimalID = SaleAnimal.AnimalID
    WHERE ((Animal.Category = 'Cat')
      AND (SaleAnimal.SalePrice >
            (SELECT AVG(SalePrice)
             FROM Animal INNER JOIN SaleAnimal ON Animal.AnimalID = SaleAnimal.AnimalID
             WHERE (Animal.Category = 'Cat'))));
Query grid: tables Animal and SaleAnimal; fields AnimalID, Name, Category, SalePrice; sort SalePrice descending; criteria on SalePrice: > (SELECT Avg(SalePrice) FROM Animal INNER JOIN SaleAnimal ON Animal.AnimalID = SaleAnimal.AnimalID WHERE Animal.Category = 'Cat')

Correlated Subquery
List the Animals that have sold for a price higher than the average for animals in that Category.
    SELECT AnimalID, Name, Category, SalePrice
    FROM Animal INNER JOIN SaleAnimal ON Animal.AnimalID = SaleAnimal.AnimalID
    WHERE (SaleAnimal.SalePrice >
            (SELECT Avg(SaleAnimal.SalePrice)
             FROM Animal INNER JOIN SaleAnimal ON Animal.AnimalID = SaleAnimal.AnimalID
             WHERE (Animal.Category = Animal.Category)))
    ORDER BY SaleAnimal.SalePrice DESC;
The subquery needs to compute the average for a given category.
Problem: Which category? Answer: the category that matches the category from the main part of the query.
Problem: How do we refer to it? Both tables are called Animal, so the condition Animal.Category = Animal.Category compares the subquery's table with itself. This query will not work yet.

Correlated SubQuery (Avoid)
List the Animals that have sold for a price higher than the average for animals in that Category.
Match the category in the subquery with the top level. Rename tables (As).
    SELECT A1.AnimalID, A1.Name, A1.Category, SaleAnimal.SalePrice
    FROM Animal As A1 INNER JOIN SaleAnimal ON A1.AnimalID = SaleAnimal.AnimalID
    WHERE (SaleAnimal.SalePrice >
            (SELECT Avg(SaleAnimal.SalePrice)
             FROM Animal As A2 INNER JOIN SaleAnimal ON A2.AnimalID = SaleAnimal.AnimalID
             WHERE (A2.Category = A1.Category)))
    ORDER BY SaleAnimal.SalePrice DESC;
Correlated subquery: the subquery is recomputed for every row in the top level, which is slow! Better to compute and save the subquery, then use it in a join.

Correlated Subquery Problem
Animal + SaleAnimal
    Category  SalePrice
    Fish      $10.80     Compute Avg: $37.78
    Dog       $156.66    Compute Avg: $174.20
    Fish      $19.80     Compute Avg: $37.78
    Cat       $173.99    Compute Avg: $169.73
    Cat       $251.59    Compute Avg: $169.73
    Dog       $183.38
    Fish      $1.80
    Dog       $150.11
    Dog       $148.47
Recompute the average for every row in the main query!
Assume a small query: 100,000 rows, 5 categories of 20,000 rows each.
100,000 * 20,000 = 2 billion rows to read!

More Efficient Solution: 2 queries
Compute the averages once and save the query. JOIN the saved query to the main query: JOIN Animal.Category = Query1.Category
Animal + SaleAnimal:
    Category  SalePrice
    Fish      $10.80
    Dog       $156.66
    Fish      $19.80
    Cat       $173.99
    Cat       $251.59
    Dog       $183.38
    Fish      $1.80
    Dog       $150.11
    Dog       $148.47
Saved Query (Query1):
    Category  AvgOfSalePrice
    Bird      $176.57
    Cat       $169.73
    Dog       $174.20
    Fish      $37.78
    Mammal    $80.72
    Reptile   $181.83
    Spider    $118.16
Two passes through the table read 200,000 rows: 2 billion / 200,000 => 10,000 times fewer rows.
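The correlated version and the "compute the averages once, then join" version should return the same rows. A minimal sketch, assuming sqlite3 as the database and using only the nine sample sales listed on the slide; prices are stored as integer cents (an assumption made here to keep the comparison with the average exact), and the aliases S1, S2 and the inline Query1 are illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Animal (AnimalID INTEGER PRIMARY KEY, Category TEXT);
CREATE TABLE SaleAnimal (AnimalID INTEGER, SalePriceCents INTEGER);
""")
# The nine (Category, SalePrice) rows from the slide, in cents.
rows = [("Fish", 1080), ("Dog", 15666), ("Fish", 1980), ("Cat", 17399),
        ("Cat", 25159), ("Dog", 18338), ("Fish", 180), ("Dog", 15011),
        ("Dog", 14847)]
for animal_id, (category, cents) in enumerate(rows, start=1):
    conn.execute("INSERT INTO Animal VALUES (?, ?)", (animal_id, category))
    conn.execute("INSERT INTO SaleAnimal VALUES (?, ?)", (animal_id, cents))

# Correlated form: the inner average refers to A1 from the outer query.
correlated_sql = """
SELECT A1.Category, S1.SalePriceCents
FROM Animal AS A1 INNER JOIN SaleAnimal AS S1 ON A1.AnimalID = S1.AnimalID
WHERE S1.SalePriceCents > (
    SELECT AVG(S2.SalePriceCents)
    FROM Animal AS A2 INNER JOIN SaleAnimal AS S2 ON A2.AnimalID = S2.AnimalID
    WHERE A2.Category = A1.Category)
ORDER BY S1.SalePriceCents DESC"""

# Two-step form: averages per category are computed once (Query1), then joined.
joined_sql = """
SELECT A1.Category, S1.SalePriceCents
FROM Animal AS A1
INNER JOIN SaleAnimal AS S1 ON A1.AnimalID = S1.AnimalID
INNER JOIN (
    SELECT A2.Category AS Category, AVG(S2.SalePriceCents) AS AvgOfSalePrice
    FROM Animal AS A2 INNER JOIN SaleAnimal AS S2 ON A2.AnimalID = S2.AnimalID
    GROUP BY A2.Category) AS Query1 ON A1.Category = Query1.Category
WHERE S1.SalePriceCents > Query1.AvgOfSalePrice
ORDER BY S1.SalePriceCents DESC"""

correlated_result = conn.execute(correlated_sql).fetchall()
joined_result = conn.execute(joined_sql).fetchall()
print(joined_result)
```

The averages here come from the nine sample rows only, so they differ from the whole-database averages shown on the slide. Whether the correlated form is actually slower depends on the query optimizer; the sketch only shows that the two forms are equivalent.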

UNION Operator
Offices in Los Angeles and New York. Each has an Employee table (East and West). Need to search data from both tables. Columns in the two SELECT lines must match.
    SELECT EID, Name, Phone, Salary, 'East' AS Office
    FROM EmployeeEast
    UNION
    SELECT EID, Name, Phone, Salary, 'West' AS Office
    FROM EmployeeWest
Result:
    EID  Name    Phone  Salary  Office
    352  Jones   3352   45,000  East
    876  Inez    8736   47,000  East
    372  Stoiko  7632   38,000  East
    890  Smythe  9803   62,000  West
    361  Kim     7736   73,000  West

UNION, INTERSECT, EXCEPT
(Venn diagram: tables T1 and T2 with regions A, B, C.)
List the name of any employee who has worked for both the East and West regions.
    SELECT EID, Name
    FROM EmployeeEast
    INTERSECT
    SELECT EID, Name
    FROM EmployeeWest
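The three set operators can be compared on the same pair of tables. A minimal sketch, assuming sqlite3 and the East and West employee rows from the UNION slide; the extra row placing Inez (876) in EmployeeWest as well is a made-up addition so that INTERSECT has something to return.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeEast (EID INTEGER, Name TEXT);
CREATE TABLE EmployeeWest (EID INTEGER, Name TEXT);
INSERT INTO EmployeeEast VALUES (352, 'Jones'), (876, 'Inez'), (372, 'Stoiko');
INSERT INTO EmployeeWest VALUES (890, 'Smythe'), (361, 'Kim');
-- Hypothetical row (not on the slide): Inez has also worked in the West.
INSERT INTO EmployeeWest VALUES (876, 'Inez');
""")

def run(set_operator):
    """Combine the two SELECTs with the given set operator, sorted by EID."""
    sql = (f"SELECT EID, Name FROM EmployeeEast {set_operator} "
           "SELECT EID, Name FROM EmployeeWest ORDER BY EID")
    return conn.execute(sql).fetchall()

everyone = run("UNION")        # rows in either table, duplicates removed
both = run("INTERSECT")        # rows present in both tables
east_only = run("EXCEPT")      # rows in East that are not in West
print(everyone, both, east_only)
```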

Multiple JOIN Columns
Sometimes need to JOIN tables on more than one column. PetStore: Category and Breed.
Tables: Animal(AnimalID, Name, Category, Breed, DateBorn, Gender, . . .) and Breed(Category, Breed)
    SELECT *
    FROM Breed INNER JOIN Animal
      ON Breed.Category = Animal.Category
     AND Breed.Breed = Animal.Breed

Reflexive Join
Need to connect a table to itself. Common example: Employee(EID, Name, . . ., Manager). A manager is also an employee. Use a second copy of the table and an alias.
Employee:
    EID  Name     . . .  Manager
    115  Sanchez         765
    462  Miller          115
    523  Hawk            115
    765  Munoz           886
SQL:
    SELECT Employee.EID, Employee.Name, Employee.Manager, E2.Name
    FROM Employee INNER JOIN Employee AS E2
      ON Employee.Manager = E2.EID
Result:
    EID  Name     Manager  Name
    115  Sanchez  765      Munoz
    462  Miller   115      Sanchez
    523  Hawk     115      Sanchez

Recursive Joins (SQL 99 and 200x)
List all of the employees and list everyone who reports to them.
    WITH RECURSIVE EmployeeList (EmployeeID, Title, Salary) AS
    ( SELECT EmployeeID, Title, 0.00
      FROM Manages WHERE Title = 'CEO'   -- starting level
      UNION ALL
      SELECT Manages.EmployeeID, Manages.Title, Manages.Salary
      FROM EmployeeList INNER JOIN Manages
        ON EmployeeList.EmployeeID = Manages.ManagerID )
    SELECT EmployeeID, Count(Title), Sum(Salary)
    FROM EmployeeList
    GROUP BY EmployeeID;
Not supported by most vendors when these slides were written; most major systems have since added recursive queries of this kind. It provides tree spanning capabilities.
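A recursive query can be tried on the small Employee table from the reflexive join slide. This is a minimal sketch, assuming sqlite3 (which accepts WITH RECURSIVE); the CTE name Reports and the Level column are illustrative additions, and the query walks down the tree from one chosen manager rather than reproducing the Manages example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (EID INTEGER PRIMARY KEY, Name TEXT, Manager INTEGER);
INSERT INTO Employee VALUES
    (115, 'Sanchez', 765), (462, 'Miller', 115),
    (523, 'Hawk', 115), (765, 'Munoz', 886);
""")

# Start from one employee (level 0), then repeatedly add direct reports.
reports_sql = """
WITH RECURSIVE Reports (EID, Name, Level) AS (
    SELECT EID, Name, 0 FROM Employee WHERE EID = ?
    UNION ALL
    SELECT Employee.EID, Employee.Name, Reports.Level + 1
    FROM Reports INNER JOIN Employee ON Employee.Manager = Reports.EID
)
SELECT EID, Name, Level FROM Reports ORDER BY Level, EID"""

reports = conn.execute(reports_sql, (765,)).fetchall()
for eid, name, level in reports:
    print("  " * level + f"{eid} {name}")
```

A single reflexive join reaches only one level (Sanchez under Munoz); the recursive form keeps going until no more reports are found, so Miller and Hawk appear at level 2.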

CASE Function
Used to change data to a different context. Not available in Microsoft Access. It is in SQL Server and Oracle.
Example: Define age categories for the animals.
    Less than 3 months
    Between 3 months and 9 months
    Between 9 months and 1 year
    Over 1 year
    Select AnimalID,
      CASE
        WHEN Date()-DateBorn < 90 Then 'Baby'
        WHEN Date()-DateBorn >= 90 AND Date()-DateBorn < 270 Then 'Young'
        WHEN Date()-DateBorn >= 270 AND Date()-DateBorn < 365 Then 'Grown'
        ELSE 'Experienced'
      END
    FROM Animal;
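The age categories can be reproduced with a runnable CASE expression. A minimal sketch, assuming sqlite3, where date arithmetic is done with julianday() instead of Date()-DateBorn; the four birth dates and the fixed "today" value are invented test data chosen so that each category appears once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Animal (AnimalID INTEGER PRIMARY KEY, DateBorn TEXT)")
# Hypothetical birth dates: 31, 138, 305 and 731 days before 'today'.
conn.executemany("INSERT INTO Animal VALUES (?, ?)", [
    (1, "2004-05-01"), (2, "2004-01-15"), (3, "2003-08-01"), (4, "2002-06-01")])

today = "2004-06-01"  # fixed reference date so the result is repeatable

# WHEN branches are tested in order, so each one only needs its upper bound.
age_sql = """
SELECT AnimalID,
       CASE
           WHEN julianday(:today) - julianday(DateBorn) < 90  THEN 'Baby'
           WHEN julianday(:today) - julianday(DateBorn) < 270 THEN 'Young'
           WHEN julianday(:today) - julianday(DateBorn) < 365 THEN 'Grown'
           ELSE 'Experienced'
       END AS AgeCategory
FROM Animal
ORDER BY AnimalID"""

ages = conn.execute(age_sql, {"today": today}).fetchall()
print(ages)
```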

Inequality Join
AccountsReceivable: categorize by Days Late (30, 90, 120+). Three queries? Better: a new table for business rules.
    AR(TransactionID, CustomerID, Amount, DateDue)
    LateCategory(Category, MinDays, MaxDays, Charge, …)
LateCategory:
    Category  MinDays  MaxDays  Charge
    Month     30       90       3%
    Quarter   90       120      5%
    Overdue   120      9999     10%
    SELECT *
    FROM AR INNER JOIN LateCategory
      ON ((Date() - AR.DateDue) >= LateCategory.MinDays)
     AND ((Date() - AR.DateDue) < LateCategory.MaxDays)
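The business-rule table and the inequality join can be exercised together. A minimal sketch, assuming sqlite3 with julianday() for the date difference; the LateCategory rows are the ones on the slide, while the four AR transactions and the fixed "today" are invented so that the days late are 10, 45, 100 and 200.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AR (TransactionID INTEGER PRIMARY KEY, CustomerID INTEGER,
                 Amount REAL, DateDue TEXT);
CREATE TABLE LateCategory (Category TEXT, MinDays INTEGER, MaxDays INTEGER, Charge REAL);
INSERT INTO LateCategory VALUES
    ('Month', 30, 90, 0.03), ('Quarter', 90, 120, 0.05), ('Overdue', 120, 9999, 0.10);
-- Hypothetical transactions: 10, 45, 100 and 200 days late on 2004-06-01.
INSERT INTO AR VALUES
    (1, 10, 500.0, '2004-05-22'), (2, 11, 200.0, '2004-04-17'),
    (3, 12, 300.0, '2004-02-22'), (4, 13, 900.0, '2003-11-14');
""")

today = "2004-06-01"

# Join on a range test instead of equality: MinDays <= days late < MaxDays.
late_sql = """
SELECT AR.TransactionID, LateCategory.Category
FROM AR INNER JOIN LateCategory
  ON (julianday(:today) - julianday(AR.DateDue)) >= LateCategory.MinDays
 AND (julianday(:today) - julianday(AR.DateDue)) <  LateCategory.MaxDays
ORDER BY AR.TransactionID"""

late = conn.execute(late_sql, {"today": today}).fetchall()
print(late)
```

Transaction 1 is only 10 days late, matches no category, and so drops out of the INNER JOIN. Changing a threshold or charge then means editing a row in LateCategory rather than rewriting queries.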

SQL SELECT
    SELECT DISTINCT Table.Column {AS alias}, . . .
    FROM Table/Query
    INNER JOIN Table/Query ON T1.ColA = T2.ColB
    WHERE (condition)
    GROUP BY Column
    HAVING (group condition)
    ORDER BY Table.Column
    { UNION second select }

SQL Mnemonic
    Someone      SELECT
    From         FROM
    Ireland      INNER JOIN
    Will         WHERE
    Grow         GROUP BY
    Horseradish  HAVING
    and Onions   ORDER BY
SQL is picky about putting the commands in the proper sequence. If you have to memorize the sequence, this mnemonic may be helpful.

SQL Data Definition
    Create Schema Authorization dbName password
    Create Table TableName (Column Type, . . .)
    Alter Table Table {Add, Column, Constraint, Drop}
    Drop {Table Table | Index Index On table}
    Create Index IndexName ON Table (Column {ASC|DESC})

Syntax Examples CREATE TABLE Customer (CustomerID INTEGER NOT NULL, LastName CHAR (10), more columns ); ALTER TABLE Customer DROP COLUMN ZipCode; ALTER TABLE Customer ADD COLUMN CellPhone CHAR(15);

Queries with “Every” Need EXISTS List the employees who have sold animals from every category. By hand: List the employees and the categories. Go through the SaleAnimal list and check off the animals they have sold.

Query With EXISTS
List the Animal categories that have not been sold by an employee (#5).
    SELECT Category
    FROM Category
    WHERE (Category <> "Other") And Category NOT IN
      (SELECT Animal.Category
       FROM Animal INNER JOIN (Sale INNER JOIN SaleAnimal ON Sale.SaleID = SaleAnimal.SaleID)
         ON Animal.AnimalID = SaleAnimal.AnimalID
       WHERE Sale.EmployeeID = 5)
If this query returns any rows, then the employee has not sold animals from every category. So list all the employees for whom the above query returns no rows:
    SELECT EmployeeID, LastName
    FROM Employee
    WHERE NOT EXISTS (above query slightly modified.)

Query for Every
    SELECT Employee.EmployeeID, Employee.LastName
    FROM Employee
    WHERE Not Exists
      (SELECT Category
       FROM Category
       WHERE (Category <> "Other") And Category NOT IN
         (SELECT Animal.Category
          FROM Animal INNER JOIN (Sale INNER JOIN SaleAnimal ON Sale.SaleID = SaleAnimal.SaleID)
            ON Animal.AnimalID = SaleAnimal.AnimalID
          WHERE Sale.EmployeeID = Employee.EmployeeID)
      );
Result: 3 Reasoner
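The double-negative pattern (no category exists that the employee has not sold) is easier to trust after running it on a tiny data set. A minimal sketch, assuming sqlite3; the tables are cut down to the columns the query needs, the joins are written without nested parentheses, and every row is invented except the employee "3 Reasoner" from the slide (the name Smith for employee 5 is hypothetical).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Category (Category TEXT PRIMARY KEY);
CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY, LastName TEXT);
CREATE TABLE Animal (AnimalID INTEGER PRIMARY KEY, Category TEXT);
CREATE TABLE Sale (SaleID INTEGER PRIMARY KEY, EmployeeID INTEGER);
CREATE TABLE SaleAnimal (SaleID INTEGER, AnimalID INTEGER);
INSERT INTO Category VALUES ('Cat'), ('Dog'), ('Other');
INSERT INTO Employee VALUES (3, 'Reasoner'), (5, 'Smith');
INSERT INTO Animal VALUES (1, 'Cat'), (2, 'Dog'), (3, 'Dog');
-- Employee 3 sells a cat and a dog; employee 5 sells only a dog.
INSERT INTO Sale VALUES (100, 3), (101, 3), (102, 5);
INSERT INTO SaleAnimal VALUES (100, 1), (101, 2), (102, 3);
""")

every_sql = """
SELECT Employee.EmployeeID, Employee.LastName
FROM Employee
WHERE NOT EXISTS (
    SELECT Category.Category
    FROM Category
    WHERE Category.Category <> 'Other'
      AND Category.Category NOT IN (
          SELECT Animal.Category
          FROM Animal
          INNER JOIN SaleAnimal ON Animal.AnimalID = SaleAnimal.AnimalID
          INNER JOIN Sale ON Sale.SaleID = SaleAnimal.SaleID
          WHERE Sale.EmployeeID = Employee.EmployeeID))
ORDER BY Employee.EmployeeID"""

sold_every_category = conn.execute(every_sql).fetchall()
print(sold_every_category)
```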

Simpler Query for Every Sometimes it is easier to use Crosstab and the Count function. But some systems do not have Crosstab, and sometimes the lists would be too long. So you need to know both techniques.

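SQL: Foreign Key
    CREATE TABLE Order
    (OrderID INTEGER NOT NULL,
     OrderDate DATE,
     CustomerID INTEGER,
     CONSTRAINT pkorder PRIMARY KEY (OrderID),
     CONSTRAINT fkorder FOREIGN KEY (CustomerID) REFERENCES Customer (CustomerID)
    );
Relationship: Customer(CustomerID, LastName, FirstName, Address, …) to Order(OrderID, OrderDate, CustomerID); one customer can have many (*) orders.
Note: ORDER is a reserved word in SQL, so many systems require the table name to be quoted or renamed.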
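The effect of the foreign key constraint can be observed directly. A minimal sketch, assuming sqlite3, which enforces foreign keys only after PRAGMA foreign_keys = ON and needs the reserved table name quoted as "Order"; the Customer table is reduced to two columns and the sample rows are borrowed from the form example (order 1234, customer 17).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enforcement is off by default
conn.executescript("""
CREATE TABLE Customer (
    CustomerID INTEGER NOT NULL,
    LastName TEXT,
    CONSTRAINT pkcustomer PRIMARY KEY (CustomerID));
CREATE TABLE "Order" (
    OrderID INTEGER NOT NULL,
    OrderDate DATE,
    CustomerID INTEGER,
    CONSTRAINT pkorder PRIMARY KEY (OrderID),
    CONSTRAINT fkorder FOREIGN KEY (CustomerID) REFERENCES Customer (CustomerID));
INSERT INTO Customer VALUES (17, 'Embry');
""")

# Customer 17 exists, so this order is accepted.
conn.execute('INSERT INTO "Order" VALUES (1234, ?, 17)', ("2001-07-25",))

# Customer 99 does not exist, so the foreign key rejects the row.
try:
    conn.execute('INSERT INTO "Order" VALUES (1235, ?, 99)', ("2001-07-25",))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

order_count = conn.execute('SELECT COUNT(*) FROM "Order"').fetchone()[0]
print(rejected, order_count)
```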

SQL Data Manipulation Commands
    Insert Into target (column1 . . .) VALUES (value1 . . .)
    Insert Into target (column1 . . .) SELECT . . . FROM . . .
    Delete From table WHERE condition
    Update table SET Column=Value, . . . Where condition
Note the use of the Select and Where conditions. Syntax is the same, so you only learn it once. You can also use subqueries.

Copy Old Animal Data
    INSERT INTO OldAnimals
    SELECT *
    FROM Animal
    WHERE AnimalID IN
      (SELECT AnimalOrderItem.AnimalID
       FROM AnimalOrder INNER JOIN AnimalOrderItem
         ON AnimalOrder.OrderID = AnimalOrderItem.OrderID
       WHERE (AnimalOrder.OrderDate < '01-Jan-2004'));

Delete Old Animal Data
    DELETE FROM Animal
    WHERE AnimalID IN
      (SELECT AnimalOrderItem.AnimalID
       FROM AnimalOrder INNER JOIN AnimalOrderItem
         ON AnimalOrder.OrderID = AnimalOrderItem.OrderID
       WHERE (AnimalOrder.OrderDate < '01-Jan-2004'));

Update Example
Change the ListPrice of Animals at the PetStore. For cats, increase the ListPrice by 10%. For dogs, increase the ListPrice by 20%.
    UPDATE Animal
    SET ListPrice = ListPrice*1.10
    WHERE Category = 'Cat';
    UPDATE Animal
    SET ListPrice = ListPrice*1.20
    WHERE Category = 'Dog';
Typically use two similar UPDATE statements. With the CASE function, the statements can be combined.
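One way the two UPDATE statements might be combined with CASE is sketched below. It assumes sqlite3 and three invented Animal rows; the slides do not show the combined statement, so treat the exact form as an illustration rather than the author's version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Animal (AnimalID INTEGER PRIMARY KEY, Category TEXT, ListPrice REAL)")
# Hypothetical starting prices.
conn.executemany("INSERT INTO Animal VALUES (?, ?, ?)",
                 [(1, "Cat", 100.0), (2, "Dog", 200.0), (3, "Fish", 10.0)])

# One statement: the multiplier depends on the category of each row.
conn.execute("""
UPDATE Animal
SET ListPrice = ListPrice * CASE Category
                                WHEN 'Cat' THEN 1.10
                                WHEN 'Dog' THEN 1.20
                                ELSE 1.00
                            END
WHERE Category IN ('Cat', 'Dog')""")

prices = dict(conn.execute("SELECT Category, ListPrice FROM Animal"))
print(prices)
```

The WHERE clause keeps the statement from touching other categories; the ELSE branch is a safety net in case the WHERE clause is later removed.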

Quality: Building Queries
Break questions into smaller pieces.
Test each query: Check the SQL. Look at the data. Check computations.
Combine into subqueries: Use cut-and-paste to avoid errors. Check for correlated subqueries.
Test sample data: Identify different cases. Check final query and subqueries. Verify calculations.
Test SELECT queries before executing UPDATE queries.
Example: Which customers who bought Dogs also bought products for Cats (at any time)?
    Smaller pieces: Who bought dogs? Who bought cat products?
    Cases to test: Dogs and cat products on the same sale. Dogs and cat products at different times. Dogs and never any cat products. Cat products and never any Dogs.

Quality Queries: Example
Which customers who bought Dogs also bought products for Cats?
A. Which customers bought dogs?
B. Which customers bought cat products?
    SELECT DISTINCT Animal.Category, Sale.CustomerID
    FROM Sale INNER JOIN (Animal INNER JOIN SaleAnimal ON Animal.AnimalID = SaleAnimal.AnimalID)
      ON Sale.SaleID = SaleAnimal.SaleID
    WHERE (((Animal.Category)='Dog'))
      AND Sale.CustomerID IN
        (SELECT DISTINCT Sale.CustomerID
         FROM Sale INNER JOIN (Merchandise INNER JOIN SaleItem ON Merchandise.ItemID = SaleItem.ItemID)
           ON Sale.SaleID = SaleItem.SaleID
         WHERE (((Merchandise.Category)='Cat')));

Programming Review: Variables
    Type                 Size     Range
    Integer              2 bytes  -32768 to 32767
    Long                 4 bytes  +/- 2,147,483,648
    Single                        +/- 3.402823 E 38 (smallest: +/- 1.401298 E-45)
    Double               8 bytes  +/- 1.79769313486232 E 308 (smallest: +/- 4.94065645841247 E-324)
    Currency                      +/- 922,337,203,685,477.5808
    String & String*n
    Variant                       Any data type, Null
Declaration keywords: Global, Const, Static

Programming: Scope and Lifetime
Scope: Where is the variable, and which procedures can access it?
Lifetime: When is the variable created, and when is it destroyed?
Form with Button1 and Button2. Form module code:
    Sub Button1_Click()
        Dim i1 As Integer
        i1 = 3
    End Sub
    Sub Button2_Click()
        Dim i1 As Integer
        i1 = 7
    End Sub
Different procedures, different variables. Created and destroyed each time the button is clicked.

Programming: Global Variables
Wider scope and lifetime. Created at a higher level: Form or Public module. Accessible to any procedure in that form or module. Declare it Global to make it available to any procedure.
Form with Button1 and Button2. Form module code:
    Dim i2 As Integer
    Sub Button1_Click()
        i2 = 20
    End Sub
    Sub Button2_Click()
        i2 = i2 + 7
    End Sub
Variable is created when the form is opened. Clicking Button1 sets the initial value. Clicking Button2 modifies the value. What if the user clicks the buttons in a different order?

Programming: Computations
Standard Math
    + - * /
    \    Integer divide
    ^    Exponentiation (2^3 = 2*2*2 = 8)
    Mod  Remainder (15 Mod 4 = 3, because 12 + 3 = 15)
String
    &    Concatenation
    Left, Right, Mid; Trim, LTrim, RTrim; Chr, Asc; LCase, UCase; InStr; Len; StrComp; Format
Examples
    "Frank" & "Rose" => "FrankRose"
    Left("Jackson",5) => "Jacks"
    Trim(" Maria ") => "Maria"
    Len("Ramanujan") => 9
    String(5,"a") => "aaaaa"
    InStr("8764 Main"," ") => 5

Programming: Standard Functions
Numeric
    Exp, Log            x = Log(Exp(x)) (natural logarithm, base e)
    Atn, Cos, Sin, Tan  Trigonometric functions (triangle figure omitted)
    Sqr                 Sqr(2) => 1.414
    Abs                 Abs(-35) => 35
    Sgn                 Sgn(-35) => -1
    Int, Fix            Int(17.893) => 17
    Rnd, Randomize      Rnd() => 0.198474 (a random value)

Programming: Standard Functions: Date/Time
Date, Now, Time
DateAdd, DateDiff
    Interval codes: "y", "m", "q" . . .
    Firstweekday: 1=Sunday, . . .
    Can also be used to find the number of Fridays between two dates.
Example (timeline): today = 02/19/04, DateDue = 03/20/04
    DateDue = DateAdd("d", 30, Date())
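The date arithmetic can be checked outside of VBA. A minimal sketch in Python, assuming the standard datetime module as a stand-in for DateAdd and DateDiff; count_fridays is a hypothetical helper written for this illustration, not a VBA function. Because 2004 is a leap year, 30 days after 02/19/04 lands on 03/20/04.

```python
from datetime import date, timedelta

today = date(2004, 2, 19)

# Equivalent of DateAdd("d", 30, today): add 30 days.
date_due = today + timedelta(days=30)

# Equivalent of DateDiff("d", today, date_due): whole days between two dates.
days_between = (date_due - today).days

def count_fridays(start, end):
    """Count the Fridays from start to end, inclusive (weekday() == 4 is Friday)."""
    total_days = (end - start).days
    return sum(1 for offset in range(total_days + 1)
               if (start + timedelta(days=offset)).weekday() == 4)

fridays = count_fridays(today, date_due)
print(date_due, days_between, fridays)
```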

Programming: Standard Functions: Variant IsDate IsNumeric VarType IsEmpty IsNull

Programming: Debug
Stop
Ctrl-Break
F5: Go
F8: Step through
Shift-F8: Step over
Breakpoints
Immediate Window: ? or Print, any assignment, any code

Programming: Output: Message Box
MsgBox arguments: Message, Type, Title
Types (use constants): vbOKOnly, vbOKCancel, vbAbortRetryIgnore, vbYesNoCancel, vbYesNo, vbRetryCancel
Defaults: vbDefaultButton1, vbDefaultButton2, vbDefaultButton3
Icons: vbCritical (stop sign), vbQuestion (question mark), vbExclamation (warning), vbInformation (circle i)
Responses: vbOK, vbCancel, vbAbort, vbRetry, vbIgnore, vbYes, vbNo
    MsgBox "This is a message box", vbYesNoCancel + vbInformation, "Sample Box"

Programming: Input: InputBox
InputBox arguments: Prompt, Title, Default, X-Pos, Y-Pos
Cannot change box size.
Use Chr(13) & Chr(10) for blank lines.
Returns text or Variant. Cancel = zero-length string ""
Positions are in twips: a twip is a twentieth of a point; 72 points per inch; 1440 twips per inch.
    Dim str As String
    str = InputBox("Enter your name:", "Sample Input", , 5000, 5000)

Programming: Conditions
If:
    If (Condition) Then
        statements for true
    Else
        statements for false
    End If
IIF (Cond., True, False)
Select:
    Select Case (expr)
        Case value
            statements
        Case value2
        Case Else
    End Select
Conditions: <, <=, >, >=, =, <>; And, Or, Not, Xor; Eqv, Imp (logic)
Nested If:
    If (Condition1) Then
        statements for true
    Else
        statements for false
        If (Condition2) Then
        End If
    End If

Programming: Select Example
Message Box: could use repeated If statements, but it is better to use Select Case.
With If statements:
    response = MsgBox(…)
    If (response = vbYes) Then
        ' statements for Yes
    ElseIf (response = vbNo) Then
        ' statements for No
    Else
        ' statements for Cancel
    End If
With Select Case:
    response = MsgBox(…)
    Select Case response
        Case vbYes
            ' statements for Yes
        Case vbNo
            ' statements for No
        Case vbCancel
            ' statements for Cancel
    End Select

Programming: Loops
Parts of a loop: Initialize value, Statements, Change value, Test condition
Loop types: Do, For … Next, For Each
    Do Until (x > 10)
        ' Statements
        x = x + 1
    Loop
    Do While (x <= 10)
        ' Statements
        x = x + 1
    Loop
    Do
        ' Statements
        x = x + 1
    Loop Until (x > 10)
    For x = 1 to 10
        ' Statements
    Next x

Programming: Loops Again
Do
    Do {While | Until}
        Exit Do (optional)
    Loop
    Do
    Loop {While | Until}
For/Next
    For counter = start To end Step increment
        Exit For (optional)
    Next counter
For/Each (objects)
    For Each element In group
        [Exit For] (optional)
    Next element
With (objects)
    With object
    End With

Programming: Subroutines and Functions
    Sub name (var1 As . . ., var2, . . .)
    End Sub
    Function fname (var1 As . . .) As datatype
        fname = …   ' returns a specific value
    End Function
Variables are passed by reference: changes made to the parameters in the subroutine are passed back to the caller.
Unless you use ByVal: changes are made to a copy of the parameter, but are not returned to the calling program.

Programming: Example Subroutine
    Main program
        …
        StatusMessage "Trying to connect."
        StatusMessage "Verifying access."
    End main program
    Sub StatusMessage (Msg As String)
        ' Display Msg, location, color
    End Sub

Programming: Parameter Types
By Reference: changes to data in the subroutine are passed back.
    Main
        j = 3
        DoSum j
        …   ' j is now equal to 8
    Sub DoSum (j2 As Integer)
        j2 = 8
    End Sub
By Value: creates a copy of the variable, so changes are not returned.
    Main
        j = 3
        DoSum j
        …   ' j is still equal to 3
    Sub DoSum (ByVal j2 As Integer)
        j2 = 8
    End Sub

Programming: Arrays and User Types
Arrays
    Dim array(sub, . . .) As type
    Dim iSorts(10) As Integer
    Specifying bounds: (lower To upper, . . .)
    ReDim [Preserve] array . . .
    Option Base 0 | 1
    v 2.0: arrays less than 64KB
User defined types
    Type Tname
        ename1 As type
        ename2 As type
    End Type
    Dim var1 As Tname
    var1.ename1 = . . .
    var1.ename2 = . . .

Programming: Financial Functions
Fixed payments
    PV (rate, nper, pmt, fv, due)
    FV (rate, nper, pmt, pv, due)
    IPmt (rate, per, nper, pv, fv, due)
    NPer (rate, pmt, pv, fv, due)
    Pmt (rate, nper, pv, fv, due)
    PPmt (rate, per, nper, pv, fv, due)
    Rate (nper, pmt, pv, fv, due, guess)
Arguments
    rate  interest rate per period
    per   specific period number
    nper  # of periods
    pv    present value
    fv    future value
    due   0=due at end, 1=due at start
Arrays
    NPV (rate, array)
    IRR (array, guess)
    MIRR (array, finrate, re_rate)
Depreciation
    DDB (cost, salv, life, period)
    SLN (cost, salvage, life)
    SYD (cost, salv., life, period)
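To show what a function such as Pmt computes, here is a minimal sketch in Python using the standard annuity formula. The functions pmt and remaining_balance are illustrative re-implementations written for this note, not the VBA library code; the argument names follow the slide, and the sign convention (a payment made is negative) is assumed to match the usual spreadsheet convention.

```python
def pmt(rate, nper, pv, fv=0.0, due=0):
    """Fixed payment per period for a loan of pv at the given periodic rate.

    due=0 means payments at the end of each period, due=1 at the start.
    """
    if rate == 0:
        return -(pv + fv) / nper
    growth = (1 + rate) ** nper
    return -(pv * growth + fv) * rate / ((growth - 1) * (1 + rate * due))

def remaining_balance(rate, nper, pv, payment):
    """Simulate end-of-period payments and return the balance after nper periods."""
    balance = pv
    for _ in range(nper):
        balance = balance * (1 + rate) + payment  # interest accrues, then payment (negative) is applied
    return balance

# Example: borrow 1000 at 1% per month for 12 months.
payment = pmt(0.01, 12, 1000.0)
final_balance = remaining_balance(0.01, 12, 1000.0, payment)
print(round(payment, 2), round(final_balance, 6))
```

The simulation is the useful check: if the payment is right, the balance after the last period is zero (up to rounding).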

Programming: Text File Input/Output
Open filename As # file#
Close # file#, Reset
Print #, Put, Write
Spc, Tab
Get, Input #, Line Input #
EOF, LOF
Seek # file#, position
ChDir, ChDrive
Dir
Kill, (re)Name
Lock, Unlock
CurDir, MkDir, RmDir

OLE: Object Linking & Embedding
CreateObject (class), where class is "appname.objecttype"
GetObject (file, class)
Methods and syntax are defined by the software that exports the object.
Example
    Dim obj As Object
    Set obj = CreateObject("Word.Basic")
    obj.Bold
    obj.Insert "text"
    obj.SaveAs "file"

DDE: Dynamic Data Exchange
    Shell                           Application must be running
    DDEInitiate                     Start a conversation/topic
    DDEExecute                      Issue a command
    DDEPoke, DDESend (send data)    Place data
    DDE, DDERequest (request data)  Get data
    DDETerminate                    Close the session

Database Management Systems Chapter 6 Forms and Reports

Uses of Forms Collect Data Display Query Results Display Analysis and Computations Switchboard for other Forms and Reports Direct Manipulation of Objects Graphics Drag and Drop

Human Factors Design
User Control: Match user tasks. Application responds to user control & events. User customization.
Consistency: Layout, design & colors. Actions.
Clarity: Organization. Purpose. Terminology.
Aesthetics: Art to enhance, graphics. Sound.
Feedback
    Methods: Visual, Text, Audio
    Uses: Acceptance of input, Changes to data, Completion of tasks, Events / Activation
Forgiveness: Anticipation and correction of errors. Confirmation on delete and updates. Backup and recovery.

Windows Interface Standards The Windows Interface: An Application Design Guide (Microsoft) Navigation and Choices Mouse, Icons Keyboard, Short-cuts Menus Selections from a list Single Contiguous Multiple Disjoint Multiple Focus Outline box Cursor Manipulation Activation Drag and Drop Feedback Progress indicators and status gauges Flashing Tool tips Status bar 3-D controls Message boxes

Windows Interface Window components Frame (sizing) Title bar Control-menu box Buttons Minimize Maximize Close Scroll box (thumb) Scroll bar

Windows Menus
Menus: Drop-down, Pop-up (as needed)
Short Cut Keys
Mnemonic character

Message Box (A Simple Form) Message Boxes Title Message Simple buttons Icons Modal (required)

Interface / Accessibility Multiple Input Methods Keyboard Mouse Voice Multiple Output Visual Sound Color Some Suggestions: Beware of Red/Green. Avoid requiring rapid user responses. Avoid rapid flashing on the screen. Give users customization options. Volume Color Typefaces & Fonts

Form Layout
Types of Forms: Tabular, Single Row, Sub-forms (one-to-many), Switchboard
Controls
Form Properties
Form Events
Example: an Order form with an Items section listing rows such as 11 Dog 5; 7 Dog 1; 13 Cat 2.

Tabular Form Works best for single table. Designer can control data entry sequence. Probably include buttons for sorting.

Single Row (Columnar) Form Data for only one row. Designer can set optimal layout. Similar in appearance to paper forms. Can use color, graphics, and command buttons to make the form easier to use. Note the importance of the navigation buttons. Probably want a Find command. Useful to include subforms.

Sub-Forms Typically a one-to-many relationship. Subform contents are linked to the main form through a common column (not displayed on the subform.) Can have multiple subforms (Independent or Nested).

Switchboard Form Blank Form Graphics/Picture/Background Identify User Choose Task.

Menu Design
Hard to understand:
    Main Menu
    1. Setup Choices
    2. Data Input
    3. Print Reports
    4. DOS Utilities
    5. Backups
Organized by user tasks:
    Customer Information
    Daily Sales Reports
    Friday Sales Meeting
    Monthly Customer Letters
    Exit

Menus
Consistency: With operating environment. Within project.
Pull-down: Name, Action. Shortcut keys. Access keys (&File, File). Breaks/groups (-). Dimmed option. Check mark.
Submenus (>): Logical groupings. Tradeoff: length v depth.
Form indicator (…)
Pop-up: Miniature form. Tied to location/pointer. Right-mouse button. Attribute settings. Modal (keeps focus) or not.

Queries Queries are used to automatically look up data. e.g., Customer name e.g., Product description Be very careful when using queries. Each form should store data in only one table. For multiple tables, use a subform or separate forms. Usually Lock the look up data so it cannot be changed accidentally.

Form Query Example
Form: Customer Order with SaleID 1234, CustomerID 17, Date 7/25/01, and the displayed name Carly Embry.
Clerk enters a CustomerID. Stored in the Order table.
Query joins Sale and Customer. Automatically matches the CustomerID. Matching name is displayed on the form.
Do not include the join column (CustomerID) from the look up table (Customer).

Form Query: Underlying Tables
Form: Customer Order. Data entry: SaleID 1234, CustomerID 17, Date 7/25/01. Data display: Carly Embry.
Sale:
    SaleID  CustomerID  Sdate
    1232    23          7/24/01
    1233    74          7/24/01
    1234    17          7/25/01
Customer:
    CustomerID  First   Last
    15          Connie  Fisher
    16          Rosie   Wade
    17          Carly   Embry
The query joins Sale and Customer on CustomerID.

Form Properties (Some) Data Base Table / Query Filters Sort Integrity Edits Additions, Deletions Locks Other Pop-up menus Menu Bar Help Format Caption Scroll Bars Record Selectors Navigation Buttons Size and Centering Background/Pictures Colors Tab Order

Controls on Forms (Basic) Drop down list or combo box List box Label Text box Last Name Clothing Shoes Electronics Country Payment Method Options Credit Card Check Cash Sales x Gift wrap Gift card Monogram x Command button Check box Option button

Pictures Employee Name: Che Zhang ID: 3354 Phone: 222-111-1524 . . . Background pictures Unbound, unchanging. Stored with the form. Keep edit screen readable. Sizing (zoom, scale, clip). Pictures stored as data Bound to a data column. Define column as object. Tie to scanner or graphics package through OLE. Beware of data size Resolution Number of colors User machine capabilities. Employee Name: Che Zhang ID: 3354 Phone: 222-111-1524 . . . Photo:

Basic Controls Option Group (single response) Label Text Box Command Button Combo Box (click arrow to open) List Box (always open)

Combo & List Boxes User selects from a list Combo box can enter new data, or restrict to list. Two basic uses: Insert a value into a table Choose from a list of preset options, e.g. gender. Select from a different table, e.g., choose a customer. Find the data record in this form that matches the choice. Be careful! Many systems do not distinguish between the two uses (enter data and search). Example when you want to use data entry: On a sales form, use a combo box for customer. It takes a value from the Customer table and inserts the ID into the Sale table. Example when you want to use a search: On a Customer edit form, you might use a combo to search the Customer table. Be sure the combo is not bound to the table! Probably need to write code for search.

Combo Box Properties
    Name            CustomerID
    ControlSource   CustomerID
    Format
    DecimalPlaces   Auto
    InputMask
    RowSource/Type  Table/Query
    RowSource       SELECT . . .
    ColumnCount     4
    ColumnHeads     No
    ColumnWidths    . . .
    BoundColumn     1
ControlSource sets the column to receive the choice (in the Sale table).
RowSource generates the list of data to display. Uses standard SQL. Note 4 columns displayed. First column is the one to store in the data table.
    SELECT Customer.CustomerID, Customer.LastName, Customer.FirstName, Customer.Phone
    FROM Customer
    ORDER BY Customer.LastName;

Combo Box Sources Microsoft Access supports three methods: Fixed list. Query from a table. Defined function. With some systems (e.g., Visual Basic), you write code to generate each list entry. You might use a fixed list for simple lists like “male”, “female”, “unknown”. It is better to query from a table, even for simple lists. Use a one column table. Easier to add to a table than to change a combo box. Useful feature of list combo box. The Row Source property is a text string. This string can be generated by code. List entries can be changed in response to user actions. Programmed function. For straightforward cases, it is easier to use a fixed list and just change the text. More complex cases, you can write a subroutine that generates the list choices following a specific format.

Controls on Forms (Complex)
Common: Calendar, Tab, Grid, Gauge, Slider, Spin Box
Additional: Purchase, or create your own (C++)

Charts
(Chart figures: one chart each for Sale 1, Sale 2, and Sale 3 comparing Merchandise and Animal, plus a Total Sales chart.)
Build a query that generates the data to be graphed: Numeric data. Individual series. Aggregate data. Labels. Columns to link to form. Summary chart: unlinked.
Insert chart. Set chart type. Set up data and labels. Set chart properties. Verify size.

Multiple Forms
Sale form (sections: Animals Purchased, Merchandise) showing customer FirstName: Mary, LastName: Jones, Address: 123 Oaxaca Ave., with an Edit Customer button.
Edit Customer form: FirstName: Mary, LastName: Jones, Address: 123 Oaxaca Ave., City: Los Angeles, ZipCode: 90086, Gender: Female, Age: 20, AccountBalance: $150

Multiple Forms
Using data on other forms
    The forms object collection: Forms![FormName]![Control]
    Example: on the Sale form, AnimalID = AnimalID from the Animal form
Subtotals and subforms
    The form property: Forms![MainForm]![SubForm].Form![Control]
    ItemsSold subform: Subtotal = Sum(Price*Quantity)
    Sale form: Subtotal = Forms!Sale!ItemsSold.Form!Subtotal
    Tax = Subtotal*[TaxRate]
    OrderTotal = Subtotal+Tax
Multi-page v Separate forms: Same recordset. Screen size. Side-by-side.

Integrity Avoid relying on forms Set integrity conditions in table definitions Be sure to set referential integrity (relationships) Use forms to make it easy to enter quality data Combo/list boxes Menus Pop-up forms Ties to related forms Data transfer across forms Computations Error checking & trapping Controls Security rights Data formats Data entry Round-off Selectivity Visible Enabled & Locked Example: no production change after item is sold. User assistance Tool tips Status bar Menu Help--context sensitive

Large Projects
Design Standards: Templates. Colors, layout. Titles. Actions, common buttons.
Naming convention is crucial: Forms, Controls, Event procedures, Variables.
Team Coordination: Menu design. Within a form/standards. Across an application.
Event / action diagrams: State diagram. Scenario diagram/messages.
(Diagram: a Switchboard form linking the Customer, Order, Assembly, and Backorder forms.)
(Scenario diagram: the Order form passes Item# to the Assembly form; if the item is not available, a Backorder Notice & Form results. The Order form passes Customer# to Customer; a large customer receives a Discount.)

Objects
Scenario diagrams: Properties, Events, Messages
Messages are usually initiated by calling exposed functions in an object.
Data can be passed directly, or made available by exposing properties.
Example: the Customer Order object sends a message to the Market Pricing Object: compute discount using Customer ID & Order size. The Market Pricing Object replies with a message: discount pct.

International Attributes Language Character sets and punctuation marks Sorting Data formats Date Time Metric v English Currency symbol and format Separators (decimal, . . .) Phone numbers Separators International code prefix Postal codes National ID Numbers

Direct Manipulation of Objects Current Choices Kennel/ Orders Customer Bird Cat Tabby Dog Fish Mammal Reptile Spider Brown Lab A graphical approach. Minimize data entry. Drag and drop objects (blue arrows).

Creating a Graphical Approach Get the hardware. Images: Scanners Sound: Microphone and Sound card Video: Camera and capture card Lots of disk space. High speed processors. Add an object column to your table definition. Design the screens. Be creative. Get user input. Make the user’s job easier. Avoid using graphics just for show. Double-click Drag-and-drop Programming!

Oracle Forms Use List of Values (LOV) instead of select boxes.

Oracle Forms Designer

Oracle Forms: Sales Oracle provides minimal support for updateable queries, so several items are grayed out to indicate they cannot be changed here.

Oracle Forms: Sales Design Two new data blocks are used for the repeating sections.

Oracle Forms Design Hints
Displaying non-updateable data from other tables is tricky.
In Master/Sale set:
    DML Data Target Type = Table
    DML Data Target Name = Sale
    For SaleID, set PrimaryKey = Yes
Add the other tables:
    Query Data Source Type = Table
    Query Data Source Name (parentheses are critical!):
    (SELECT Sale.columns, Customer.Columns, Employee.Columns
     FROM Sale, Customer, Employee
     WHERE (Sale.CustomerID = Customer.CustomerID)
       AND (Sale.EmployeeID = Employee.EmployeeID))

Oracle Forms Hints
Add non-updateable columns by hand. Use aliases in the query to ensure all column names are unique. Then set properties:
    General: Name = cLastName
    Functional: Enabled = No (optional but clearer to the user)
    Database: Database Item = Yes
    Database: Column Name = cLastName
    Database: Query Only = Yes
    Database: Insert Allowed = No
    Database: Update Allowed = No

Report Design Report usage/user needs. Report layout choices. Tabular Columns/Subgroups Charts/graphs Paper sizes. Printer constraints. How often is it generated? Events that trigger report? How large is the report? Number of copies? Colors? Security controls Distribution list Unique numbering Concealed/non-printed data Secured printers Transmission limits Print queue controls Output concerns Typefaces Readability Size User disabilities OCR needs

Terminology
Page Layout: Landscape v. portrait. Margins. Gutter (binding space). Facing pages (portrait).
Typefaces: Serif (Times New Roman). Sans-serif (Arial). Ornamental. Fixed width.
Font size: common 10 - 12 point; 72 points approx. 1 inch; pica (1/6 inch) = 12 points.
Alignment marks for color separations.

Report Types: Tabular

Report Types: Labels

Report Types Column. Column with groups.

Report Layout
    Report Header
    Page Header
    Group Header1
    Group Header2
    . . .
    Detail
    . . .
    Group Footer2
    Group Footer1
    Page Footer
    Report Footer

Report Layout/Common Use Report Header Title pages that are printed one time for entire report. Page Header Title lines or page notes that are printed at the top of every page. Group Header Data for a group (e.g., Order) and headings for the detail section. Detail Innermost data. Group Footer Subtotals for the group. Page Footer Printed at the bottom of every page--page totals or page numbers and notes. Report Footer Printed one time at the end of the report. Summary notes, overall totals and graphs for entire data set.

Report Layout/Groups
Often use groups/breaks for one-to-many relationships. Use a query to join all necessary tables. Can include all columns. Use query to create computed columns (e.g., Extended:Price*Quantity). Avoid creating aggregates or subtotals in the query. Each one-to-many relationship becomes a new subgroup.
Tables: Customer(C#, Name, …) Order(O#, C#, Odate, …) OrderItem(O#, Item#, Qty, …)
Report of Orders
    Group1: Customer
        H1: Customer name, address, …
        Group2: Order
            H2: Order#, Odate, Salesperson.
            Detail: Item#, Qty, Extended
            F2: Order total: Sum(Extended)
        F1: Customer total orders:
    Rpt footer: graph orders by customer

Report Computations
Query: Same row computations. Extended=Price*Quantity
Report: Group subtotals. Page and report totals.
Mixed, e.g., commission = rate * total
Scope depends on location. Group footer: subtotal. Page footer: page total. Report footer: report total.

Report Graphs Graphs Separate query. Detail Subtotals and totals Locate in detail or group footer section. Avoid aggregation and groups in query. Include column that links to detail query in report. Subtotals and totals Typically located in report footer or header. Compare group totals Relies on Group By and aggregation. Be sure query groups match report groups.

Report Graph for Group

Oracle Report Writer: Preview

Oracle Report Writer: Design

Oracle Reports: Data View The data view can be used to create reports with complex data structures. It is primarily used for master/detail reports. In this example, each supplier can be sent many purchase orders, which each contain many items being ordered. The report can produce group breaks on all three sections.

Application Features Switchboard Application organization Menu Toolbar Customer Report Application organization Menu Toolbar Help Transactions Improving forms Customized reports Distributing Applications File Edit Help File Edit Help File Edit Help File Edit Help Sales Report Switchboard

Application Design
Bad design: Enter data twice. Poor design: Memorize data (ID) on one form to enter on second. Better design: Automatically transfer data across forms.
Example: the Customer Form shows 1592, Jane Doe, 333 Elm St.; the Order Form shows Customer: 1592, Jane Doe, 333 Elm St.

Application Importance User interface Make users’ jobs easier. Tie input forms and reports. Automate basic tasks Tie to external data collection devices. Help system. Ensure data integrity Validate data. Perform computations. Verify totals. Control user access. Maintain related transactions. Backup and recovery. Decision Support Monitoring of events. Analysis, Graphs, Reports. Statistical analysis and optimization. Forecasts and simulation. Linking to other software. Expert Systems & Intelligence Logic and forward chaining. Analysis and decisions in code. Databases of cases, situations and solutions.

Application Organization Organized by user needs. Identify user. Outline tasks. Organize forms and reports. Direct users to tasks. Potential drawbacks Too many layers makes it difficult for users to find anything. Poor organization confuses users and requires additional support and training. Build forms and reports. Start with a core concept. Identify most important features. Get them correct. Add features, forms and reports. Issue application updates--number and date! Use menu stubs for incomplete and future work. Make them invisible to the user with the Visible property. Be sure they are disabled.

Application Structure Forms and Reports Visual Basic Internet Oracle Forms Front end Middle Tier (Optional) Business logic Rules If x > 10,000 Then Else End If Database Oracle SQL Server DB2 Access Back end

User Orientation Database application is a model of the organization. Applications based on user jobs. Flexibility and user control. Application organization User tasks. User control over sequence. Forms Minimize user entry. Anticipation. Reports Easy access from forms. User selection of scope and conditions or filters.

Initial Menu / Switchboard Starting point for users. Identify the user. From network if possible. Separate log in if needed. Customized for users. Hide restricted options. Different forms as needed. Avoid cluttered screens. Use graphics and color to enhance the presentation. Limit the number of options.

Switchboard Uses Acts as a directory for the application. Identifies users. Contains startup and shutdown code. Can preload forms in background. Make them invisible. Speed up later usage. Can initiate transaction and security logs. Can establish network connections. Contains copyright and usage notes.

Sally’s Pet Store: Poor Organization
Forms: Order Merchandise Item, Receive Merchandise Item, Sell Merchandise Item, Get Customer Data.
What is wrong? Focus needs to be at higher level (Order, Receipt, Sale); not Item. You cannot go from Order to Receipt. You cannot go from Receipt to Sale. You need to get customer data before recording the sale.

Sally’s Pet Store: Better Organization Supplier Customer Orders Receipt special orders Sale Inventory Items More links--usually as buttons. Separate sales from orders, except for special orders.

Sally’s Pet Store: Initial VTOC Sales Sale Animal Sale Merchandise Animals Customers Animal Health Animal Genealogy Customer Receipts Suppliers Supplier Payments Purchase Animals Purchase Merchandise Inventory Sales Report Accounting Cash Flow Marketing Accounts Payable Employees Accounts Receivable

Menus
Why a custom menu? Limit user actions. Simplify user interface. Add custom actions.
Menus can be activated by keystrokes: Accessibility. Touch-typists and heads-down data entry.
Sometimes need different menus for each form.
Example menu bar: File, Edit, Help. Help menu: Contents, Search, About Rolling Thunder. Edit menu: Add Customer, Delete Customer (Ctrl+D), Modify Customer Data.

Creating Menus View | Toolbars | Customize Drag and Drop Multilevel menu. Sublevels/hierarchy. Each level is a separate menu with its own name. Menu choices Each entry has a name. Access key: & (e.g., &File). Status Bar Text Actions Submenu. Run any code.

Toolbars Switchboard Weekly Sales Analysis Build graphs Print reports Why toolbars? Single click for complex actions. Commands available across the application / shortcuts. Position and customization by user. Toolbar components Button Text Icon/graphic (bitmap) Tool Tip Status Bar description Action Identify report Ask for single or multiple pages. Preview or print. Switchboard Weekly Sales Analysis Build graphs Print reports Export data to spreadsheet

Menus and Toolbars

Creating Toolbars View | Toolbars | New Customizing Add new button. Select from DBMS list. Bring up query/form/report. Run code. Change icon. Modify existing icon. Replace icon. Create your own icon and paste it on the button. Place text label on button. Tool tips are vital. Status bar for description.

Icons 16 by 16 pixels 16 colors Outline in black Bright and shaded Dither to mix colors Outline in black

Activating Toolbars and Menus
Install a menu. Form: Attach a bar using the form properties. Code: On Activate, On Deactivate.
Modify from code: Add or remove options. Enable/Disable (dim).
    Set myBar = CommandBars("Custom1")
    If user = "Clerk" Then
        myBar.Visible = True
    Else
        CommandBars("Database").Reset
        myBar.Enabled = False
    End If
    With myBar
        .Controls.Add Type:=msoControlButton, Id:=3
        .Controls(1).Enabled = False
    End With
    myBar.Visible = True

Help
On-line help replaces manuals. Context sensitive: Pressing F1 key provides information on topic with current focus. Hypertext links to related topics. Sequential topics. Descriptions. Examples. Definitions / Glossary. Contents / overview. Index / keywords. Full-text search. Windows 95 & Win-NT.
Sally’s Pet Store--Contents: Copyright Notice. The Firm. Introduction. Processes. Entering Data. Sales. Animal Health. Breeds (and other terms).

HTML Help Get the Microsoft HTML Help Workshop: http://msdn.microsoft.com/library/tools/htmlhelp/ Create each of the following Help project files Use separate directory HTML topic files Standard HTML with some additions for keywords Topic Header and Text File Graphics and multimedia Avoid monster sizes Contents files Can auto-generate from heading tags (<H1>, <H2>, …) Index files Use Help workshop to set keywords within each topic

HTML Help Workshop

HTML Project Hints Project Options Project Title Default file (first page) Can create new files with File - New Be sure to Add/Remove Topic files to project list Edit – Compiler Information to add keywords to HTML file Concentrate on creating useful help content On large projects, hire/train someone to manage help Add useful features Keep content up to date Manage/organize all the files

Context-Sensitive Help Set the help file name in the form properties. Set the topic number (Context Id) for each form or control.

Context Sensitive HTML Help
Create a Topic file for pop-up topics. Create a header file to link the topic names to numbers. Use HTML API to set the filenames.
Topic file:
    .topic Filename1
    Description
    .topic Filename2
    …
Header file:
    #define Filename1 10000
    #define Filename2 20000
    …

Appendix: Oracle PL/SQL: Data Types Primary Data Types NUMBER(precision, scale) precision: Number of digits scale: Round-off point NUMBER(7,4): 123.4567 INTEGER Default: NUMBER(4) BOOLEAN Yes/No CHAR Fixed length string VARCHAR2 Variable length string LONG, LONG RAW Binary data DATE

Appendix: Oracle PL/SQL Structure
    CREATE OR REPLACE PACKAGE myPackage AS
        PROCEDURE myProcedure(oldProjectID IN NUMBER);
    END myPackage;
    CREATE OR REPLACE PACKAGE BODY myPackage AS
        myGlobalVar NUMBER;      -- package-level variable; no DECLARE keyword inside a package body
        PROCEDURE myProcedure(oldProjectID IN NUMBER) IS
            myLocalVar NUMBER;   -- local variables go between IS and BEGIN
        BEGIN
            myLocalVar := oldProjectID;
            IF … THEN … END IF;
            COMMIT;
        END myProcedure;
    END myPackage;

Appendix: PL/SQL Operators

Appendix: PL/SQL IF-THEN-ELSIF-ELSE
    DECLARE
        X NUMBER(10,2);
    BEGIN
        -- retrieve the balance
        IF (BALANCE > 0) THEN
            X := BALANCE*1.10;
        ELSE
            X := 0.0;
        END IF;
    END;
Use ELSIF (PL/SQL spells it without the second E) for case statements.
    IF (ACCOUNT = 'P') THEN
        -- do personal accounts
    ELSIF (ACCOUNT = 'C') THEN
        -- do corporate accounts
    ELSIF (ACCOUNT = 'S') THEN
        -- do small business
    ELSE
        -- handle error
    END IF;
Watch the semicolons! Assignment uses := rather than =.

Appendix: PL/SQL Loops
    (Start statement)
    LOOP
        …
        EXIT;
        EXIT WHEN (condition);
    END LOOP;
    WHILE (condition) LOOP … END LOOP;
    FOR (variable) IN low..high LOOP … END LOOP;

Appendix: Procedures or Subroutines
    PROCEDURE DropOldAccounts (CutDate DATE) IS
        -- local variables are defined here
    BEGIN
        -- First copy the data to a backup table
        INSERT INTO OldAccounts
        SELECT * FROM Account
        WHERE AccountID NOT IN
            (SELECT AccountID FROM Order WHERE Odate > CutDate);
        -- Copy additional tables…
        -- Delete from Account automatically cascades to others
        DELETE FROM Account
        WHERE AccountID NOT IN (…);   -- subquery cut off on the slide; presumably the same one as above
    END DropOldAccounts;

Appendix: SQL Cursors
    DECLARE
        CURSOR c1 IS
            SELECT Name, Salary, DateHired FROM Employee;
        varTotal Employee.Salary%TYPE;
    BEGIN
        varTotal := 0;
        -- A cursor FOR loop opens, fetches from, and closes c1 by itself,
        -- so explicit OPEN and CLOSE statements are not used with it.
        FOR recEmp IN c1 LOOP
            varTotal := varTotal + recEmp.Salary;
        END LOOP;
        -- Now do something with the varTotal
    END;

Appendix: Error Handling
    PROCEDURE myProc IS
        -- declare all local variables (no DECLARE keyword inside a procedure)
    BEGIN
        -- SQL statements here
    EXCEPTION
        WHEN OTHERS THEN
            -- you can specify a particular error
            -- but OTHERS captures all errors
            -- PL/SQL code to execute if an error arises
    END myProc;

Database Management Systems Chapter 7 Database Integrity and Transactions

Programming Environment DBMS Tables Create code (1) Within the query system (2) In forms and reports (3) Hosted in external programs Queries (1) If ( . . ) Then SELECT . . . Else . . . UPDATE . . . End If External Program Forms & Reports (3) C++ if (. . .) { // embed SQL SELECT … } (2) If (Click) Then MsgBox . . . End If

User-Defined Function
    CREATE FUNCTION EstimateCosts (ListPrice Currency, ItemCategory VarChar)
    RETURNS Currency
    BEGIN
        IF (ItemCategory = 'Clothing') THEN
            RETURN ListPrice * 0.5;
        ELSE
            RETURN ListPrice * 0.75;
        END IF;
    END

Function to Perform Conditional Update
    CREATE FUNCTION IncreaseSalary (EmpID INTEGER, Amt CURRENCY)
    RETURNS CURRENCY
    BEGIN
        IF (Amt > 50000) THEN
            RETURN -1;    -- error flag
        END IF;
        UPDATE Employee SET Salary = Salary + Amt
        WHERE EmployeeID = EmpID;
        RETURN Amt;
    END

Looking Up Data
    CREATE FUNCTION IncreaseSalary (EmpID INTEGER, Amt CURRENCY)
    RETURNS CURRENCY
    DECLARE MaxAmount CURRENCY;
    BEGIN
        SELECT MaxRaise INTO MaxAmount
        FROM CompanyLimits
        WHERE LimitName = 'Raise';
        IF (Amt > MaxAmount) THEN    -- compare against the looked-up limit, not a hard-coded 50000
            RETURN -1;    -- error flag
        END IF;
        UPDATE Employee SET Salary = Salary + Amt
        WHERE EmployeeID = EmpID;
        RETURN Amt;
    END

Data Trigger Events INSERT DELETE UPDATE BEFORE AFTER Oracle additions: Tables ALTER, CREATE, DROP User LOGOFF, LOGON Database SERVERERROR, SHUTDOWN, STARTUP

Statement v. Row Triggers
SQL: UPDATE Employee SET Salary = Salary + 10000 WHERE EmployeeID=442 OR EmployeeID=558
Order of events over time:
    1. Before Update On table (trigger for overall table)
    2. Before Update Row 442 (trigger for each row)
    3. Update Row 442
    4. After Update Row 442 (trigger for each row)
    5. … other rows
    6. After Update On table (trigger for overall table)

Data Trigger Example CREATE TRIGGER LogSalaryChanges AFTER UPDATE OF Salary ON Employee REFERENCING OLD ROW as oldrow NEW ROW AS newrow FOR EACH ROW INSERT INTO SalaryChanges (EmpID, ChangeDate, User, OldValue, NewValue) VALUES (newrow.EmployeeID, CURRENT_TIMESTAMP, CURRENT_USER, oldrow.Salary, newrow.Salary);
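The audit trigger above can be tried out with Python's built-in sqlite3 module. This is only a sketch under stated assumptions: SQLite refers to the rows as OLD and NEW instead of using a REFERENCING clause, it has no CURRENT_USER (so the User column is left out), and the starting salaries are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY, Salary REAL);
CREATE TABLE SalaryChanges (
    EmpID INTEGER, ChangeDate TEXT, OldValue REAL, NewValue REAL);

-- SQLite dialect of the slide's trigger: OLD/NEW replace oldrow/newrow.
CREATE TRIGGER LogSalaryChanges
AFTER UPDATE OF Salary ON Employee
FOR EACH ROW
BEGIN
    INSERT INTO SalaryChanges (EmpID, ChangeDate, OldValue, NewValue)
    VALUES (NEW.EmployeeID, CURRENT_TIMESTAMP, OLD.Salary, NEW.Salary);
END;

-- Hypothetical starting salaries.
INSERT INTO Employee VALUES (442, 50000), (558, 60000);
""")

# One statement changes two rows, so the row trigger fires twice.
conn.execute("UPDATE Employee SET Salary = Salary + 10000 "
             "WHERE EmployeeID = 442 OR EmployeeID = 558")
log = conn.execute(
    "SELECT EmpID, OldValue, NewValue FROM SalaryChanges ORDER BY EmpID"
).fetchall()
```

After the update, `log` holds one audit row per changed employee, each with the old and new salary.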

Canceling Data Changes in Triggers
    CREATE TRIGGER TestDeletePresident
    BEFORE DELETE ON Employee
    REFERENCING OLD ROW AS oldrow
    FOR EACH ROW
    WHEN (oldrow.Title = 'President')
        SIGNAL _CANNOT_DELETE_PRES;

Cascading Triggers
Sale(SaleID, SaleDate, …)
SaleItem(SaleID, ItemID, Quantity, …)
    AFTER INSERT: UPDATE Inventory SET QOH = QOH - newrow.Quantity
Inventory(ItemID, QOH, …)
    AFTER UPDATE: WHEN newrow.QOH < newrow.Reorder INSERT {new order} INSERT {new OrderItem}
Order(OrderID, OrderDate, …)
OrderItem(OrderID, ItemID, Quantity, …)

Trigger Loop Employee(EID, Salary) AFTER UPDATE IF newrow.Salary > 100000 THEN Add Bonus END BonusPaid(EID, BonusDate, Amount) AFTER UPDATE Or INSERT IF newrow.Bonus > 50000 THEN Reduce Bonus Add Options END StockOptions(EID, OptionDate, Amount, SalaryAdj) AFTER UPDATE Or INSERT IF newrow.Amount > 100000 THEN Reduce Salary END

Transactions
Some transactions result in multiple changes. These changes must all be completed successfully, or the group must fail. Protection for hardware and communication failures.
Example: a bank customer transfers money from a savings account to a checking account. Decrease savings balance. Increase checking balance. Problem if only one of the two changes is made before the machine crashes.
Possibly: give users a chance to reverse/undo a transaction. Performance gain by executing transactions as a block.
Savings Accounts, Inez: 5340.92 becomes 4340.92 ($1000 transferred). Checking Accounts, Inez: 1424.27.
Transaction: 1. Subtract $1000 from savings. (machine crashes) 2. Add $1000 to Checking. (money disappears)

Defining Transactions The computer needs to be told which changes must be grouped into a transaction. Turn on transaction processing. Signify a transaction start. Signify the end. Success: save all changes Failure: cancel all changes Must be set in module code Commit Rollback

SQL Transaction Code
    CREATE FUNCTION TransferMoney(Amount Currency, AccountFrom Number, AccountTo Number)
    RETURNS NUMBER
        curBalance Currency;
    BEGIN
        DECLARE HANDLER FOR SQLEXCEPTION
        BEGIN
            ROLLBACK;
            RETURN -2;    -- flag for completion error
        END;
        START TRANSACTION;    -- optional
        SELECT CurrentBalance INTO curBalance
        FROM Accounts
        WHERE (AccountID = AccountFrom);
        IF (curBalance < Amount) THEN
            RETURN -1;    -- flag for insufficient funds
        END IF;
        UPDATE Accounts
        SET CurrentBalance = CurrentBalance - Amount
        WHERE AccountID = AccountFrom;
        UPDATE Accounts
        SET CurrentBalance = CurrentBalance + Amount
        WHERE AccountID = AccountTo;
        COMMIT;
        RETURN 0;    -- flag for success
    END
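The same transfer logic can be sketched with Python's built-in sqlite3 module. The table layout, the account numbers 1 and 2, and the helper names `make_bank` and `transfer_money` are assumptions made for this illustration; the return codes follow the slide (0 success, -1 insufficient funds, -2 error with rollback).

```python
import sqlite3

def make_bank():
    """Create a hypothetical Accounts table holding the slide's two balances."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Accounts (AccountID INTEGER PRIMARY KEY, "
                 "CurrentBalance REAL NOT NULL CHECK (CurrentBalance >= 0))")
    conn.executemany("INSERT INTO Accounts VALUES (?, ?)",
                     [(1, 5340.92), (2, 1424.27)])
    conn.commit()
    return conn

def transfer_money(conn, amount, account_from, account_to):
    """Return 0 on success, -1 for insufficient funds, -2 on a database error."""
    try:
        row = conn.execute(
            "SELECT CurrentBalance FROM Accounts WHERE AccountID = ?",
            (account_from,)).fetchone()
        if row is None or row[0] < amount:
            return -1
        # sqlite3 starts a transaction implicitly before the first UPDATE.
        conn.execute("UPDATE Accounts SET CurrentBalance = CurrentBalance - ? "
                     "WHERE AccountID = ?", (amount, account_from))
        cur = conn.execute("UPDATE Accounts SET CurrentBalance = CurrentBalance + ? "
                           "WHERE AccountID = ?", (amount, account_to))
        if cur.rowcount != 1:
            # Treat a missing target account as a failure of the whole group.
            raise sqlite3.IntegrityError("unknown target account")
        conn.commit()       # success: save both changes
        return 0
    except sqlite3.Error:
        conn.rollback()     # failure: cancel both changes
        return -2
```

Calling `transfer_money(make_bank(), 1000, 1, 2)` moves the money; a transfer to a non-existent account returns -2 and leaves the source balance untouched, because the first UPDATE is rolled back.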

SAVEPOINT
Timeline: start, required elements, SAVEPOINT StartOptional, risky steps, commit. A partial rollback returns to the savepoint.
    START TRANSACTION;
    SELECT …
    UPDATE …
    SAVEPOINT StartOptional;
    If error THEN
        ROLLBACK TO SAVEPOINT StartOptional;
    END IF
    COMMIT;
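A partial rollback can be demonstrated with Python's sqlite3 module, which accepts SAVEPOINT and ROLLBACK TO SAVEPOINT. The Log table and the simulated failure are hypothetical; the connection is opened with `isolation_level=None` so the transaction statements can be issued by hand.

```python
import sqlite3

# isolation_level=None turns off implicit transactions so BEGIN/COMMIT are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Log (Step TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO Log VALUES ('required step')")
conn.execute("SAVEPOINT StartOptional")
try:
    conn.execute("INSERT INTO Log VALUES ('risky step')")
    raise RuntimeError("simulated failure in the optional part")
except RuntimeError:
    # Undo only the work done after the savepoint.
    conn.execute("ROLLBACK TO SAVEPOINT StartOptional")
conn.execute("COMMIT")

steps = [row[0] for row in conn.execute("SELECT Step FROM Log")]
```

Only the required step survives the commit; the risky insert is discarded without cancelling the whole transaction.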

Concurrent Access
Multiple users or processes changing the same data at the same time. Final data will be wrong!
Force sequential: Locking. Delayed, batch updates.
Two processes: Receive payment ($200). Place new order ($150). Initial balance $800. Result should be $800 - 200 + 150 = $750. Interference result is either $600 or $950.
Customers table: ID Jones, Balance $800, then $600, then $950.
Receive Payment: 1) Read balance 800. 2) Subtract pmt -200. 4) Save new bal. 600.
Place New Order: 3) Read balance 800. 5) Add order 150. 6) Write balance 950.
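The interference can be replayed deterministically in a few lines of Python. This is a plain simulation of the six numbered steps, not real concurrent code; the function names are made up for the illustration.

```python
def run_interleaved():
    """Replay the slide's six steps in their numbered order."""
    balance = {"Jones": 800}
    payment_view = balance["Jones"]        # 1) Receive Payment reads 800
    new_balance = payment_view - 200       # 2) subtract the payment
    order_view = balance["Jones"]          # 3) Place New Order also reads 800
    balance["Jones"] = new_balance         # 4) Receive Payment saves 600
    after_payment = balance["Jones"]
    order_total = order_view + 150         # 5) add the order to the stale copy
    balance["Jones"] = order_total         # 6) write 950, losing the payment
    return after_payment, balance["Jones"]

def run_sequential():
    """Run the two processes one after the other."""
    balance = 800
    balance -= 200   # receive payment
    balance += 150   # place new order
    return balance
```

`run_interleaved()` returns `(600, 950)`, the two wrong balances from the slide, while `run_sequential()` returns the correct 750.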

Pessimistic Locks: Serialization
One answer to concurrent access is to prevent it. When a transaction needs to alter data, it places a SERIALIZABLE lock on the data used, so no other transactions can even read the data until the first transaction is completed.
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE, READ WRITE
Customers table: ID Jones, Balance $800, then $600.
Receive Payment: 1) Read balance 800. 2) Subtract pmt -200. 4) Save new bal. 600.
Place New Order: 3) Read balance. Receive error message that it is locked.

SQL Pessimistic Lock
    CREATE FUNCTION ReceivePayment (AccountID NUMBER, Amount Currency)
    RETURNS NUMBER
    BEGIN
        DECLARE HANDLER FOR SQLEXCEPTION
        BEGIN
            ROLLBACK;
            RETURN -2;
        END
        SET TRANSACTION ISOLATION LEVEL SERIALIZABLE, READ WRITE;
        UPDATE Accounts
        SET AccountBalance = AccountBalance - Amount
        WHERE AccountNumber = AccountID;
        COMMIT;
        RETURN 0;
    END

Deadlock
Two (or more) processes have placed locks on data and are waiting for the other’s data.
Process 1: 1) Lock Data A. 3) Wait for Data B.
Process 2: 2) Lock Data B. 4) Wait for Data A.
Many solutions: Random wait time. Global lock manager. Two-phase commit - messages.

Lock Manager

Optimistic Locks Assume that collisions are rare Improved performance, fewer resources Allow all code to read any data (no locks) When code tries to write a new value Check to see if the existing value is different from the one you were given earlier If it is different, someone changed the database before you finished, so it is a collision--raise an error Reread the value and try again

Optimistic Locks for Simple Update (1) Read the balance (2) Add the new order value (3) Write the new balance (4) Check for errors (5) If there are errors, go back to step (1).

Optimistic Locks with SQL
    CREATE FUNCTION ReceivePayment (AccountID NUMBER, Amount Currency)
    RETURNS NUMBER
        oldAmount Currency;
        testEnd Boolean = FALSE;
    BEGIN
        DO UNTIL testEnd = TRUE
            SELECT AccountBalance INTO oldAmount
            FROM Accounts
            WHERE AccountNumber = AccountID;
            …
            UPDATE Accounts
            SET AccountBalance = AccountBalance - Amount
            WHERE AccountNumber = AccountID
              AND AccountBalance = oldAmount;    -- only write if nobody changed the balance
            COMMIT;
            IF SQLCODE = 0 and nrows > 0 THEN
                testEnd = TRUE;
                RETURN 0;
            END IF
            -- keep a counter to avoid infinite loops
        END
    END
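A runnable sketch of the read, write-if-unchanged, retry cycle, using Python's sqlite3 module. Several details are assumptions for the illustration: balances are stored as whole numbers to avoid floating-point equality problems in the WHERE clause, `before_write` is a hypothetical hook that lets another "user" sneak in a change on the first attempt, and the retry counter is the one the slide recommends.

```python
import sqlite3

def make_accounts():
    """Hypothetical one-row Accounts table with an initial balance of 800."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Accounts ("
                 "AccountNumber INTEGER PRIMARY KEY, AccountBalance INTEGER)")
    conn.execute("INSERT INTO Accounts VALUES (1, 800)")
    conn.commit()
    return conn

def receive_payment(conn, account_id, amount, max_tries=5, before_write=None):
    """Return 0 on success, -1 if the account is missing, -2 after too many collisions."""
    for attempt in range(max_tries):
        row = conn.execute(
            "SELECT AccountBalance FROM Accounts WHERE AccountNumber = ?",
            (account_id,)).fetchone()
        if row is None:
            return -1
        old_balance = row[0]
        if before_write is not None and attempt == 0:
            before_write(conn)          # simulate a competing change
        # The write succeeds only if the balance is still the value we read.
        cur = conn.execute(
            "UPDATE Accounts SET AccountBalance = ? "
            "WHERE AccountNumber = ? AND AccountBalance = ?",
            (old_balance - amount, account_id, old_balance))
        conn.commit()
        if cur.rowcount == 1:
            return 0
        # rowcount 0 means a collision: loop around, reread, and try again.
    return -2

def place_order(conn):
    """Competing process: adds a 150 order to account 1."""
    conn.execute("UPDATE Accounts SET AccountBalance = AccountBalance + 150 "
                 "WHERE AccountNumber = 1")
    conn.commit()
```

With `receive_payment(conn, 1, 200, before_write=place_order)` the first write is rejected, the second succeeds, and the balance ends at the correct 750.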

ACID Transactions Atomicity: all changes succeed or fail together. Consistency: all data remain internally consistent (when committed) and can be validated by application checks. Isolation: The system gives each transaction the perception that it is running in isolation. There are no concurrent access issues. Durability: When a transaction is committed, all changes are permanently saved even if there is a hardware or system failure.

SQL 99/200x Isolation Levels
READ UNCOMMITTED. Problem: might read dirty data that is rolled back. Restriction: not allowed to save any data.
READ COMMITTED. Problem: Second transaction might change or delete data. Restriction: Need optimistic concurrency handling.
REPEATABLE READ. Problem: Phantom rows.
SERIALIZABLE. Provides same level of control as if all transactions were run sequentially. But, still might encounter locks and deadlocks.

Phantom Rows ItemID QOH Price 111 5 15 113 6 7 117 12 30 118 4 119 22 SELECT SUM(QOH) FROM Inventory WHERE Price BETWEEN 10 and 20 ItemID QOH Price 111 5 15 113 6 7 117 12 30 118 4 119 22 120 8 17 Included in first query UPDATE Inventory SET Price = Price/2 WHERE … Additional rows will be included in the second query SELECT SUM(QOH) FROM Inventory WHERE Price BETWEEN 10 and 20

Generated Keys
Create an order for a new customer: (1) Create new key for CustomerID (2) INSERT row into Customer (3) Create key for new OrderID (4) INSERT row into Order
Customer Table: CustomerID, Name, …
Order Table: OrderID, CustomerID, …

Methods to Generate Keys The DBMS generates key values automatically whenever a row is inserted into a table. Drawback: it is tricky to get the generated value to use it in a second table. A separate key generator is called by a programmer to create a new key for a specified table. Drawback: programmers have to write code to generate a key for every table and each row insertion. Overall drawbacks: neither method is likely to be transportable. If you change the DBMS, you will have to rewrite the procedures to generate keys.

Auto-Generated Keys
Create an order for a new customer: (1) INSERT row into Customer (2) Get the key value that was generated (3) Verify the key value is correct. How? (4) INSERT row into Order
Major problem: Step 2 requires that the DBMS return the key value that was most recently generated. How do you know it is the right value? What happens if two transactions generate keys at almost the same time on the same table?
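As one concrete illustration of step 2, Python's sqlite3 module reports the generated key through the `lastrowid` attribute of the cursor that ran the INSERT. This is a sketch for SQLite only; other DBMSs expose the value differently, which is the portability drawback noted earlier. The table definitions and the customer name are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT);
-- "Order" is quoted because ORDER is a reserved word.
CREATE TABLE "Order" (OrderID INTEGER PRIMARY KEY AUTOINCREMENT,
                      CustomerID INTEGER REFERENCES Customer(CustomerID));
""")

# (1) INSERT row into Customer, (2) get the key that was generated.
cur = conn.execute("INSERT INTO Customer (Name) VALUES (?)", ("Jane Doe",))
customer_id = cur.lastrowid

# (4) INSERT row into Order using that key.
cur = conn.execute('INSERT INTO "Order" (CustomerID) VALUES (?)', (customer_id,))
order_id = cur.lastrowid
conn.commit()
```

Because the value is read from the cursor that performed the insert, it reflects this session's insert rather than a table-wide "latest key", which is how this particular driver sidesteps the two-transactions question.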

Key-Generation Routine Create an order for a new customer: Generate a key for CustomerID INSERT row into Customer Generate a key for OrderID INSERT row into Order This method ensures that unique keys are generated, and that you can use the keys in multiple tables because you know the value. But, none of it is automatic. It always requires procedures and sometimes data triggers.

Database Cursors
Purpose: Track through table or query one row at a time. Data cursor is a pointer to active row.
Why? Performance. SQL cannot do everything. Complex calculations. Compare multiple rows.
Year  Sales
1998  104,321
1999  145,998
2000  276,004
2001  362,736

Database Cursor Program Structure
    DECLARE cursor1 CURSOR FOR
        SELECT AccountBalance FROM Customer;
    sumAccount, balance Currency;
    SQLSTATE Char(5);
    BEGIN
        sumAccount = 0;
        OPEN cursor1;
        WHILE (SQLSTATE = '00000')
            FETCH cursor1 INTO balance;
            IF (SQLSTATE = '00000') THEN
                sumAccount = sumAccount + balance;
            END IF
        END
        CLOSE cursor1;
        -- display the sumAccount or do a calculation
    END

Cursor Positioning with FETCH DECLARE cursor2 SCROLL CURSOR FOR SELECT … OPEN cursor2; FETCH LAST FROM cursor2 INTO … Loop… FETCH PRIOR FROM cursor2 INTO … End loop CLOSE cursor2; FETCH positioning options: FETCH NEXT next row FETCH PRIOR prior row FETCH FIRST first row FETCH LAST last row FETCH ABSOLUTE 5 fifth row FETCH RELATIVE -3 back 3 rows

Problems with Multiple Users Original Data Modified Data Name Sales Alice 444,321 Carl 254,998 Donna 652,004 Ed 411,736 Name Sales Alice 444,321 Bob 333,229 Carl 254,998 Donna 652,004 Ed 411,736 New row is added--while code is running. The SQL standard can prevent this problem with the INSENSITIVE option: DECLARE cursor3 INSENSITIVE CURSOR FOR … But, this is an expensive approach, because the DBMS usually makes a copy of the data. Instead, avoid moving backwards.

Changing Data with Cursors
    DECLARE cursor1 CURSOR FOR
        SELECT Year, Sales, Gain FROM SalesTotal ORDER BY Year
        FOR UPDATE OF Gain;
    priorSales, curYear, curSales, curGain
    BEGIN
        priorSales = 0;
        OPEN cursor1;
        Loop:
            FETCH cursor1 INTO curYear, curSales, curGain
            UPDATE SalesTotal SET Gain = Sales - priorSales
            WHERE CURRENT OF cursor1;
            priorSales = curSales;
        Until end of rows
        CLOSE cursor1;
        COMMIT;
    END
Year  Sales    Gain
2000  151,039
2001  179,332
2002  195,453
2003  221,883
2004  223,748
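The row-by-row Gain calculation can be tried with Python's sqlite3 module. SQLite has no WHERE CURRENT OF, so this sketch reads the rows first and then updates each one by its Year key; that substitution is an assumption of the example, not the slide's method. The sales figures are the ones shown on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SalesTotal "
             "(Year INTEGER PRIMARY KEY, Sales INTEGER, Gain INTEGER)")
conn.executemany(
    "INSERT INTO SalesTotal (Year, Sales) VALUES (?, ?)",
    [(2000, 151039), (2001, 179332), (2002, 195453),
     (2003, 221883), (2004, 223748)])

prior_sales = 0
rows = conn.execute("SELECT Year, Sales FROM SalesTotal ORDER BY Year").fetchall()
for year, sales in rows:
    # Each row is compared with the row before it, which plain SQL makes awkward.
    conn.execute("UPDATE SalesTotal SET Gain = ? WHERE Year = ?",
                 (sales - prior_sales, year))
    prior_sales = sales
conn.commit()

gains = conn.execute("SELECT Year, Gain FROM SalesTotal ORDER BY Year").fetchall()
```

As on the slide, `priorSales` starts at 0, so the first year's Gain equals its Sales; each later Gain is the difference from the previous year.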

Dynamic Parameterized Cursor Queries DECLARE cursor2 CURSOR FOR SELECT ItemID, Description, Price FROM Inventory WHERE Price < :maxPrice; maxPrice Currency; BEGIN maxPrice = … -- from user or other query OPEN cursor2; -- runs query with current value Loop: -- Do something with the rows retrieved Until end of rows CLOSE cursor2; END Parameters enable you to control the rows retrieved dynamically from within the procedure code. The value is applied when the cursor is opened.
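A small sketch of the same idea with Python's sqlite3 module, which accepts named parameters written as `:maxPrice`. The Inventory rows and the helper name `items_below` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Inventory "
             "(ItemID INTEGER PRIMARY KEY, Description TEXT, Price REAL)")
# Hypothetical sample rows.
conn.executemany("INSERT INTO Inventory VALUES (?, ?, ?)",
                 [(111, "Leash", 15.0), (113, "Bowl", 7.0), (117, "Crate", 30.0)])

def items_below(conn, max_price):
    """Run the query with the current parameter value and return the ItemIDs."""
    cur = conn.execute(
        "SELECT ItemID, Description, Price FROM Inventory "
        "WHERE Price < :maxPrice ORDER BY ItemID",
        {"maxPrice": max_price})   # the value is bound when the query is executed
    return [row[0] for row in cur]
```

Calling `items_below(conn, 10)` and then `items_below(conn, 20)` reruns the same query text with different values, so the procedure code controls which rows come back.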

Sally’s Pet Store Inventory Inventory method 1: calculate the current quantity on hand by totaling all purchases and sales every time the total is needed. Drawback: performance Inventory method 2: keep a running balance in the inventory table and update it when an item is purchased or sold. Drawback: tricky code Also, you need an adjustment process for “inventory shrink”

Inventory QuantityOnHand Merchandise ItemID Description QuantityOnHand ListPrice Category Add items purchased Subtract items sold Adjust for shrink SaleItem SaleID ItemID Quantity SalePrice

Inventory Events Add a row. Delete a row. Update Quantity. For a new sale, a row is added to the SaleItem table. A sale or an item could be removed because of a clerical error or the customer changes his or her mind. A SaleItem row will be deleted. An item could be returned, or the quantity could be adjusted because of a counting error. The Quantity is updated in the SaleItem table. An item is entered incorrectly. ItemID is updated in the SaleItem table. SaleItem SaleID ItemID Quantity SalePrice Add a row. Delete a row. Update Quantity. Update ItemID.

New Sale: Insert SaleItem Row
    CREATE TRIGGER NewSaleItem
    AFTER INSERT ON SaleItem
    REFERENCING NEW ROW AS newrow
    FOR EACH ROW
        UPDATE Merchandise
        SET QuantityOnHand = QuantityOnHand - newrow.Quantity
        WHERE ItemID = newrow.ItemID;

Delete SaleItem Row CREATE TRIGGER DeleteSaleItem AFTER DELETE ON SaleItem REFERENCING OLD ROW AS oldrow FOR EACH ROW UPDATE Merchandise SET QuantityOnHand = QuantityOnHand + oldrow.Quantity WHERE ItemID = oldrow.ItemID;

Quantity Changed Event
    CREATE TRIGGER UpdateSaleItem
    AFTER UPDATE ON SaleItem
    REFERENCING OLD ROW AS oldrow NEW ROW AS newrow
    FOR EACH ROW
        UPDATE Merchandise
        SET QuantityOnHand = QuantityOnHand + oldrow.Quantity - newrow.Quantity
        WHERE ItemID = oldrow.ItemID;

ItemID or Quantity Changed Event
    CREATE TRIGGER UpdateSaleItem
    AFTER UPDATE ON SaleItem
    REFERENCING OLD ROW AS oldrow NEW ROW AS newrow
    FOR EACH ROW
    BEGIN
        -- put the old quantity back on the old item
        UPDATE Merchandise
        SET QuantityOnHand = QuantityOnHand + oldrow.Quantity
        WHERE ItemID = oldrow.ItemID;
        -- take the new quantity off the new item
        UPDATE Merchandise
        SET QuantityOnHand = QuantityOnHand - newrow.Quantity
        WHERE ItemID = newrow.ItemID;
        COMMIT;
    END
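The three inventory triggers can be exercised together with Python's sqlite3 module. This is a sketch in SQLite's trigger dialect (OLD and NEW instead of REFERENCING, no COMMIT inside the trigger); the starting stock of 10 units, the sale number 100, and the helper `on_hand` are assumptions for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Merchandise (ItemID INTEGER PRIMARY KEY, QuantityOnHand INTEGER);
CREATE TABLE SaleItem (SaleID INTEGER, ItemID INTEGER, Quantity INTEGER);

CREATE TRIGGER NewSaleItem AFTER INSERT ON SaleItem FOR EACH ROW
BEGIN
    UPDATE Merchandise SET QuantityOnHand = QuantityOnHand - NEW.Quantity
    WHERE ItemID = NEW.ItemID;
END;

CREATE TRIGGER DeleteSaleItem AFTER DELETE ON SaleItem FOR EACH ROW
BEGIN
    UPDATE Merchandise SET QuantityOnHand = QuantityOnHand + OLD.Quantity
    WHERE ItemID = OLD.ItemID;
END;

-- Handles a change of Quantity, of ItemID, or of both.
CREATE TRIGGER UpdateSaleItem AFTER UPDATE ON SaleItem FOR EACH ROW
BEGIN
    UPDATE Merchandise SET QuantityOnHand = QuantityOnHand + OLD.Quantity
    WHERE ItemID = OLD.ItemID;
    UPDATE Merchandise SET QuantityOnHand = QuantityOnHand - NEW.Quantity
    WHERE ItemID = NEW.ItemID;
END;

INSERT INTO Merchandise VALUES (1, 10), (2, 10);
""")

def on_hand():
    """Return (quantity of item 1, quantity of item 2)."""
    rows = conn.execute(
        "SELECT QuantityOnHand FROM Merchandise ORDER BY ItemID").fetchall()
    return tuple(r[0] for r in rows)

history = []
conn.execute("INSERT INTO SaleItem VALUES (100, 1, 3)")             # new sale
history.append(on_hand())
conn.execute("UPDATE SaleItem SET Quantity = 5 WHERE SaleID = 100")  # quantity fixed
history.append(on_hand())
conn.execute("UPDATE SaleItem SET ItemID = 2 WHERE SaleID = 100")    # wrong item fixed
history.append(on_hand())
conn.execute("DELETE FROM SaleItem WHERE SaleID = 100")              # sale cancelled
history.append(on_hand())
```

`history` traces the stock of both items through the four events; after the sale is cancelled, both items are back at their starting quantity.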

Database Management Systems Chapter 8 Data Warehouses and Data Mining

Sequential Storage and Indexes We picture tables as simple rows and columns, but they cannot be stored this way. It takes too many operations to find an item. Insertions require reading and rewriting the entire table. ID LastName FirstName DateHired 1 Reeves Keith 1/29/98 2 Gibson Bill 3/31/98 3 Reasoner Katy 2/17/98 4 Hopkins Alan 2/8/98 5 James Leisha 1/6/98 6 Eaton Anissa 8/23/98 7 Farris Dustin 3/28/98 8 Carpenter Carlos 12/29/98 9 O'Connor Jessica 7/23/98 10 Shields Howard 7/13/98

Operations on Sequential Tables
Read entire table: Easy and fast.
Sequential retrieval: Easy and fast for one order.
Random Read/Sequential: Very weak. Probability of any row = 1/N. 1,000,000 rows means 500,000 retrievals per lookup!
Delete: Easy.
Insert/Modify
Row  Prob.  # Reads
A    1/N    1
B    1/N    2
C    1/N    3
D    1/N    4
E    1/N    5
…    1/N    i

Insert into Sequential Table
Insert Inez: Find insert location. Copy top to new file. At insert location, add row. Copy rest of file.
Before:
ID  LastName   FirstName  DateHired
8   Carpenter  Carlos     12/29/98
6   Eaton      Anissa     8/23/98
7   Farris     Dustin     3/28/98
2   Gibson     Bill       3/31/98
4   Hopkins    Alan       2/8/98
5   James      Leisha     1/6/98
9   O'Connor   Jessica    7/23/98
3   Reasoner   Katy       2/17/98
1   Reeves     Keith      1/29/98
10  Shields    Howard     7/13/98
After:
ID  LastName   FirstName  DateHired
8   Carpenter  Carlos     12/29/98
6   Eaton      Anissa     8/23/98
7   Farris     Dustin     3/28/98
2   Gibson     Bill       3/31/98
4   Hopkins    Alan       2/8/98
11  Inez       Maria      1/15/99
5   James      Leisha     1/6/98
9   O'Connor   Jessica    7/23/98
3   Reasoner   Katy       2/17/98
1   Reeves     Keith      1/29/98
10  Shields    Howard     7/13/98

Binary Search
Given a sorted list of names, how do you find Jones?
Sequential search: Jones = 10 lookups. Average = 15/2 = 7.5 lookups. Min = 1, Max = 14.
Binary search: Find midpoint (14 / 2) = 7. (1) Jones > Goetz. (2) Jones < Kalida. (3) Jones > Inez. (4) Jones = Jones (4 lookups).
Max = log2 (N). N = 1000: Max = 10. N = 1,000,000: Max = 20.
List (14 entries): Adams, Brown, Cadiz, Dorfmann, Eaton, Farris, Goetz, Hanson, Inez, Jones, Kalida, Lomax, Miranda, Norman.
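The lookup counts on this slide can be checked with a short Python sketch. Both functions return the number of entries examined rather than the position; that return convention and the function names are choices made for this illustration.

```python
NAMES = ["Adams", "Brown", "Cadiz", "Dorfmann", "Eaton", "Farris", "Goetz",
         "Hanson", "Inez", "Jones", "Kalida", "Lomax", "Miranda", "Norman"]

def sequential_lookups(names, target):
    """Count entries examined by a top-to-bottom scan; None if not found."""
    for lookups, name in enumerate(names, start=1):
        if name == target:
            return lookups
    return None

def binary_lookups(names, target):
    """Count entries examined by binary search on a sorted list; None if not found."""
    lo, hi, lookups = 0, len(names) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2          # midpoint of the remaining range
        lookups += 1
        if names[mid] == target:
            return lookups
        if names[mid] < target:
            lo = mid + 1              # target sorts after the midpoint
        else:
            hi = mid - 1              # target sorts before the midpoint
    return None
```

For Jones the scan examines 10 entries, while the binary search examines Goetz, Kalida, Inez, and then Jones.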

Pointers When data is stored on drive (or RAM). Operating System allocates space with a function call. Provides location/address. Physical address Virtual address (VSAM) Imaginary drive values mapped to physical locations. Relative address Distance from start of file. Other reference point. Address Data Volume Track Cylinder/Sector Byte Offset Drive Head Key value Address / pointer

Indexed Sequential Storage
Common uses: Large tables. Need many sequential lists. Some random search--with one or two key columns. Mostly replaced by B+-Tree.
Data (indexed for ID and LastName):
Address  ID  LastName   FirstName  DateHired
A11      1   Reeves     Keith      1/29/98
A22      2   Gibson     Bill       3/31/98
A32      3   Reasoner   Katy       2/17/98
A42      4   Hopkins    Alan       2/8/98
A47      5   James      Leisha     1/6/98
A58      6   Eaton      Anissa     8/23/98
A63      7   Farris     Dustin     3/28/98
A67      8   Carpenter  Carlos     12/29/98
A78      9   O'Connor   Jessica    7/23/98
A83      10  Shields    Howard     7/13/98
ID index (ID, Pointer): 1 A11, 2 A22, 3 A32, 4 A42, 5 A47, 6 A58, 7 A63, 8 A67, 9 A78, 10 A83
LastName index (LastName, Pointer): Carpenter A67, Eaton A58, Farris A63, Gibson A22, Hopkins A42, James A47, O'Connor A78, Reasoner A32, Reeves A11, Shields A83
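The two indexes can be modelled in Python as sorted lists of (key, pointer) pairs searched with the standard `bisect` module. This is only a sketch of the idea: the "addresses" are dictionary keys rather than disk locations, and only the ID and LastName columns are kept to stay short.

```python
from bisect import bisect_left

# Rows stored by address, in arrival order (addresses taken from the slide).
storage = {
    "A11": (1, "Reeves"),    "A22": (2, "Gibson"),  "A32": (3, "Reasoner"),
    "A42": (4, "Hopkins"),   "A47": (5, "James"),   "A58": (6, "Eaton"),
    "A63": (7, "Farris"),    "A67": (8, "Carpenter"),
    "A78": (9, "O'Connor"),  "A83": (10, "Shields"),
}

# Each index is a sorted list of (key value, pointer) pairs.
id_index = sorted((row[0], addr) for addr, row in storage.items())
name_index = sorted((row[1], addr) for addr, row in storage.items())

def lookup(index, key):
    """Binary-search an index for key and follow the pointer to the row."""
    pos = bisect_left(index, (key,))     # (key,) sorts just before (key, pointer)
    if pos < len(index) and index[pos][0] == key:
        return storage[index[pos][1]]
    return None

def sequential_list(index):
    """Read the rows in index order without sorting the stored data."""
    return [storage[addr] for _, addr in index]
```

`lookup(name_index, "Gibson")` follows pointer A22, and `sequential_list(name_index)` yields the rows in LastName order while the data itself stays in its original order. Every insert would have to update both lists, which is the maintenance cost indexes carry.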

Linked List Separate each element/key. Pointers to next element. Pointers to data. Starting point. 8 Carpenter Carlos 12/29/98 A67 2 Gibson Bill 3/31/98 A22 6 Eaton Anissa 8/23/98 A58 7 Farris Dustin 3/28/98 A63 Carpenter B87 B29 A67 Gibson B38 00 A22 Eaton B29 B71 A58 Farris B71 B38 A63

B-Tree
Store key values. Utilize binary search (or better).
Trees: Nodes. Root. Leaf (node with no children). Levels / depth. Degree (maximum number of children per node).
Node layout: < | Key | Data | >=
Example tree, level by level: Hanson; Dorfmann, Kalida; Brown, Farris, Inez, Miranda; Adams, Cadiz, Eaton, Goetz, Inez, Jones, Lomax, Norman. Data pointers: A through N.

Index Options: Bitmaps and Statistics Bitmap index A compressed index designed for non-primary key columns. Bit-wise operations can be used to quickly match WHERE criteria. Analyze statistics By collecting statistics about the actual data within the index, the DBMS can optimize the search path. For example, if it knows that only a few rows match one of your search conditions in a table, it can apply that condition first, reducing the amount of work needed to join tables.

Problems with Indexes Each index must be updated when rows are inserted, deleted or modified. Changing one row of data in a table with many indexes can result in considerable time and resources to update all of the indexes. Steps to improve performance Index primary keys Index common join columns (usually primary keys) Index columns that are searched regularly Use a performance analyzer

Data Warehouse Predefined reports Interactive data analysis Operations Daily data transfer OLTP Database 3NF tables Data warehouse Star configuration Flat files

Data Warehouse Goals
Existing databases optimized for Online Transaction Processing (OLTP).
Online Analytical Processing (OLAP) requires fast retrievals, and only bulk writes.
Different goals require different storage, so build a separate data warehouse to use for queries.
Extraction, Transformation, Transportation (ETT).
Data analysis: Ad hoc queries. Statistical analysis. Data mining (specialized automated tools).

Extraction, Transformation, and Transportation (ETT) Customers Convert Client to Customer Apply standard product numbers Convert currencies Fix region codes Data warehouse: All data must be consistent. Transaction data from diverse systems.

OLTP v. OLAP

Multidimensional Cube Pet Store Item Sales Amount = Quantity*Sale Price Category Customer Location Time Sale Date

Sales Date: Time Hierarchy Year Roll-up To get higher-level totals Levels Quarter Month Drill-down To get lower-level details Week Day

Star Design Dimension Tables Products Sales Date Fact Table Sales Quantity Amount=SalePrice*Quantity Customer Location

Snowflake Design Dimension tables can join to other dimension tables. City(CityID, ZipCode, City, State) Merchandise(ItemID, Description, QuantityOnHand, ListPrice, Category) Sale(SaleID, SaleDate, EmployeeID, CustomerID, SalesTax) Customer(CustomerID, Phone, FirstName, LastName, Address, ZipCode, CityID) OLAPItems(SaleID, ItemID, Quantity, SalePrice, Amount)

OLAP Computation Issues Compute Quantity*Price in the base query, then add to get $23.00. If you use a Calculated Measure in the Cube, it will add first and multiply second to get $45.00, which is wrong.
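The difference between the two orders of calculation can be checked with a few lines of arithmetic. The two sale rows below are hypothetical values chosen so that the totals match the $23.00 and $45.00 on the slide; the slide itself does not list the underlying rows.

```python
# Hypothetical sale items as (quantity, price) pairs.
sale_items = [(3, 5.00), (2, 4.00)]

# Correct: multiply within each row (the base query), then add the row amounts.
multiply_then_add = sum(quantity * price for quantity, price in sale_items)

# Wrong: add each column first, then multiply the two totals.
add_then_multiply = (sum(quantity for quantity, _ in sale_items)
                     * sum(price for _, price in sale_items))

print(multiply_then_add, add_then_multiply)
```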

OLAP Data Browsing

Microsoft Pivot Table

OLAP in SQL 99 GROUP BY two columns Gives you totals for each month within each category. You do not get super-aggregate totals for the category, or the month, or the overall total. SELECT Category, Month(SaleDate) AS Month, Sum(Quantity*SalePrice) AS Amount FROM Sale INNER JOIN (Merchandise INNER JOIN SaleItem ON Merchandise.ItemID = SaleItem.ItemID) ON Sale.SaleID = SaleItem.SaleID GROUP BY Category, Month(SaleDate); Category Month Amount Bird 1 $135.00 2 $45.00 3 $202.50 6 $67.50 7 $90.00 9 Cat $396.00 $113.85 $443.70 4 $2.25

SQL ROLLUP SELECT Category, Month…, Sum … FROM … GROUP BY ROLLUP (Category, Month...) Category Month Amount Bird 1 135.00 Bird 2 45.00 … Bird (null) 607.50 Cat 1 396.00 Cat 2 113.85 Cat (null) 1293.30 (null) (null) 8451.79

Missing Values Cause Problems If there are missing values in the groups, it can be difficult to identify the super-aggregate rows. Category Month Amount Bird 1 135.00 Bird 2 45.00 … Bird (null) 32.00 Bird (null) 607.50 Cat 1 396.00 Cat 2 113.85 Cat (null) 1293.30 (null) (null) 8451.79 Missing date Super-aggregate

GROUPING Function SELECT Category, Month…, Sum …, GROUPING (Category) AS Gc, GROUPING (Month) AS Gm FROM … GROUP BY ROLLUP (Category, Month...) GROUPING returns 1 when the column has been aggregated away in that row (a super-aggregate null) and 0 otherwise. Category Month Amount Gc Gm Bird 1 135.00 0 0 Bird 2 45.00 0 0 … Bird (null) 32.00 0 0 Bird (null) 607.50 0 1 Cat 1 396.00 0 0 Cat 2 113.85 0 0 Cat (null) 1293.30 0 1 (null) (null) 8451.79 1 1

CUBE Option SELECT Category, Month, Sum, GROUPING (Category) AS Gc, GROUPING (Month) AS Gm FROM … GROUP BY CUBE (Category, Month...) Category Month Amount Gc Gm Bird 1 135.00 0 0 Bird 2 45.00 0 0 … Bird (null) 32.00 0 0 Bird (null) 607.50 0 1 Cat 1 396.00 0 0 Cat 2 113.85 0 0 Cat (null) 1293.30 0 1 (null) 1 1358.8 1 0 (null) 2 1508.94 1 0 (null) 3 2362.68 1 0 (null) (null) 8451.79 1 1

GROUPING SETS: Hiding Details SELECT Category, Month, Sum FROM … GROUP BY GROUPING SETS ( ROLLUP (Category), ROLLUP (Month), ( ) ) Category Month Amount Bird (null) 607.50 Cat (null) 1293.30 … (null) 1 1358.8 (null) 2 1508.94 (null) 3 2362.68 (null) (null) 8451.79

SQL OLAP Analytical Functions Variance: VAR_POP, VAR_SAMP. Standard deviation: STDDEV_POP, STDDEV_SAMP. Covariance: COVAR_POP, COVAR_SAMP. Correlation: CORR. Regression r-square: REGR_R2. Regression data (many): REGR_SLOPE, REGR_INTERCEPT.

SQL RANK Functions SELECT Employee, SalesValue, RANK() OVER (ORDER BY SalesValue DESC) AS rank, DENSE_RANK() OVER (ORDER BY SalesValue DESC) AS dense FROM Sales ORDER BY SalesValue DESC, Employee; Employee SalesValue rank dense Jones 18,000 1 1 Smith 16,000 2 2 Black 16,000 2 2 White 14,000 4 3 RANK leaves a gap after ties; DENSE_RANK does not skip numbers.
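The two ranking rules can be reproduced outside the DBMS. The sketch below is an illustration only: the function name `rank_rows` is hypothetical, and it uses the four employees from the slide.

```python
def rank_rows(rows):
    """Return (employee, value, rank, dense_rank) tuples ordered by value, descending."""
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)  # stable for ties
    result = []
    rank = dense = 0
    previous = None
    for position, (employee, value) in enumerate(ordered, start=1):
        if value != previous:
            rank = position    # RANK jumps to the row position, leaving gaps after ties
            dense += 1         # DENSE_RANK advances by exactly one per distinct value
            previous = value
        result.append((employee, value, rank, dense))
    return result


sales = [("Jones", 18000), ("Smith", 16000), ("Black", 16000), ("White", 14000)]
for row in rank_rows(sales):
    print(row)
```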

SQL OLAP Windows SELECT Category, SaleMonth, MonthAmount, AVG(MonthAmount) OVER (PARTITION BY Category ORDER BY SaleMonth ASC ROWS 2 PRECEDING) AS MA FROM qryOLAPSQL99 ORDER BY SaleMonth ASC; Category SaleMonth MonthAmount MA Bird 200101 1500.00 Bird 200102 1700.00 Bird 200103 2000.00 1600.00 Bird 200104 2500.00 1850.00 … Cat 200101 4000.00 Cat 200102 5000.00 Cat 200103 6000.00 4500.00 Cat 200104 7000.00 5500.00

Ranges: OVER SELECT SaleDate, Value, SUM(Value) OVER (ORDER BY SaleDate) AS running_sum, SUM(Value) OVER (ORDER BY SaleDate RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_sum2, SUM(Value) OVER (ORDER BY SaleDate RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS remaining_sum FROM … Sum1 (running_sum) computes the total from the beginning through the current row. Sum2 (running_sum2) does the same thing, but lists the range explicitly. Sum3 (remaining_sum) computes the total from the current row through the end of the query.
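As a small illustration of what the running and remaining sums contain, the same totals can be computed with `itertools.accumulate`. The three daily values are hypothetical, and the sketch assumes one row per SaleDate (with duplicate dates, a RANGE window would also include the peer rows).

```python
from itertools import accumulate

# Hypothetical daily values, already ordered by SaleDate (one row per date).
values = [1000, 1500, 2000]

# UNBOUNDED PRECEDING .. CURRENT ROW: total from the first row through each row.
running_sum = list(accumulate(values))

# CURRENT ROW .. UNBOUNDED FOLLOWING: total from each row through the last row.
remaining_sum = list(accumulate(reversed(values)))[::-1]

print(running_sum)
print(remaining_sum)
```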

LAG and LEAD Functions LAG or LEAD: (Column, # rows, default) SELECT SaleDate, Value, LAG (Value, 1, 0) OVER (ORDER BY SaleDate) AS prior_day, LEAD (Value, 1, 0) OVER (ORDER BY SaleDate) AS next_day FROM … ORDER BY SaleDate Prior is 0 from default value SaleDate Value prior_day next_day 1/1/2003 1000 0 1500 1/2/2003 1500 1000 2000 1/3/2003 2000 1500 2300 … 1/31/2003 3500 3200 0 Not part of standard yet? But are in SQL Server and Oracle.
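A short sketch of the (Column, # rows, default) behavior, written as two hypothetical helper functions. The input uses the first four values from the slide's table; the default fills positions that fall off either end of the ordered rows.

```python
def lag(values, offset=1, default=0):
    """Value from `offset` rows earlier, or `default` when no such row exists."""
    return [values[i - offset] if i - offset >= 0 else default
            for i in range(len(values))]


def lead(values, offset=1, default=0):
    """Value from `offset` rows later, or `default` when no such row exists."""
    return [values[i + offset] if i + offset < len(values) else default
            for i in range(len(values))]


daily_values = [1000, 1500, 2000, 2300]   # ordered by SaleDate
print(lag(daily_values))    # prior_day column
print(lead(daily_values))   # next_day column
```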

Data Mining Goal: To discover unknown relationships in the data that can be used to make better decisions. Transactions and operations Reports Queries Specific ad hoc questions Aggregate, compare, drill down OLAP Databases Unknown relationships Data Mining

Exploratory Analysis Data Mining usually works autonomously. Supervised/directed Unsupervised Often called a bottom-up approach that scans the data to find relationships Some statistical routines, but they are not sufficient Statistics relies on averages Sometimes the important data lies in more detailed pairs

Common Techniques Classification/Prediction/Regression Association Rules/Market Basket Analysis Clustering Data points Hierarchies Neural Networks Deviation Detection Sequential Analysis Time series events Websites Textual Analysis Spatial/Geographic Analysis

Classification Examples Which borrowers/loans are most likely to be successful? Which customers are most likely to want a new item? Which companies are likely to file bankruptcy? Which workers are likely to quit in the next six months? Which startup companies are likely to succeed? Which tax returns are fraudulent?

Classification Process Clearly identify the outcome/dependent variable. Identify potential variables that might affect the outcome. Supervised (modeler chooses) Unsupervised (system scans all/most) Use sample data to test and validate the model. System creates weights that link independent variables to outcome. Income Married Credit History Job Stability Success 50000 Yes Good 25000 Bad No 75000

Classification Techniques Regression Bayesian Networks Decision Trees (hierarchical) Neural Networks Genetic Algorithms Complications Some methods require categorical data Data size is still a problem

Association/Market Basket Examples What items are customers likely to buy together? What Web pages are closely related? Others? Classic (early) example: Analysis of convenience store data showed customers often buy diapers and beer together. Importance: Consider putting the two together to increase cross-selling.

Association Details (two items) Rule evaluation (A implies B) Support for the rule is measured by the percentage of all transactions containing both items: P(A ∩ B). Confidence of the rule is measured by the share of transactions with A that also contain B: P(B | A) = P(A ∩ B) / P(A). Lift is the potential gain attributed to the rule: the effect compared to other baskets without the effect. If it is greater than 1, the effect is positive: Lift = P(A ∩ B) / ( P(A) P(B) ) = P(B | A) / P(B). Example: Diapers implies Beer. Support: P(D ∩ B) = .6, P(D) = .7, P(B) = .5. Confidence: P(B|D) = P(D ∩ B)/P(D) = .6/.7 = .857. Lift: P(B|D) / P(B) = .857 / .5 = 1.714.
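The three measures can be computed directly from basket counts. The sketch below is illustrative: the function `rule_metrics` and the ten baskets are hypothetical and are not the data behind the slide's example.

```python
def rule_metrics(baskets, a, b):
    """Return (support, confidence, lift) for the rule 'a implies b'."""
    total = len(baskets)
    count_a = sum(1 for basket in baskets if a in basket)
    count_b = sum(1 for basket in baskets if b in basket)
    count_both = sum(1 for basket in baskets if a in basket and b in basket)
    support = count_both / total            # P(A and B)
    confidence = count_both / count_a       # P(B | A)
    lift = confidence / (count_b / total)   # P(B | A) / P(B)
    return support, confidence, lift


# Hypothetical baskets: 4 with both items, 2 diapers only, 1 beer only, 3 neither.
baskets = (
    [{"diapers", "beer"}] * 4
    + [{"diapers"}] * 2
    + [{"beer"}]
    + [{"milk"}] * 3
)

support, confidence, lift = rule_metrics(baskets, "diapers", "beer")
print(round(support, 3), round(confidence, 3), round(lift, 3))
```

A lift above 1 in this sample means beer appears more often in baskets with diapers than in baskets overall.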

Association Challenges If an item is rarely purchased, any other item bought with it seems important. So combine items into categories. Some relationships are obvious. Burger and fries. Some relationships are meaningless. Hardware store found that toilet rings sell well only when a new store first opens. But what does it mean? Item Freq. 1" nails 2% 2" nails 1% 3" nails 4" nails Lumber 50% Item Freq. Hardware 15% Dim. Lumber 20% Plywood Finish lumber

Cluster Analysis Examples Are there groups of customers? (If so, we can cross-sell.) Do the locations for our stores have elements in common? (So we can search for similar clusters for new locations.) Do our employees (by department?) have common characteristics? (So we can hire similar, or dissimilar, people.) Problem: Many dimensions and large datasets Large intercluster distance Small intracluster distance

Geographic/Location Examples Customer location and sales comparisons Factory sites and cost Environmental effects Challenge: Map data, multiple overlays

Database Management Systems Chapter 9 Database Administration

Data Administration Data and information are valuable assets. There are many databases and applications in an organization. Someone has to be responsible for organizing, controlling, and sharing data. Data Administrator (DA)

Data Administrator (DA) Provide centralized control over the data. Data definition. Format Naming convention Data integration. Selection of DBMS. Act as data and database advocate. Application ideas. Decision support. Strategic uses. Coordinate data integrity, security, privacy, and control.

Database Administrator (DBA) Install and upgrade DBMS. Create user accounts and monitor security. In charge of backup and recovery of the database. Monitor and tune the database performance. Coordinate with DBMS vendor and plan for changes. Maintain DBMS-specific information for developers.

Database Structure Database The schema is a namespace often assigned to users so that table names do not have to be unique across the entire database. The catalog is a container with the goal of making it easier to find schema, but is probably not supported by any DBMS yet. Users and Permissions Catalog: (very rare) Schema Table Columns Data types Constraints Views Triggers Routines and Modules …

Metadata Data about data Example: a system table that contains a list of user tables. SQL standard uses the information_schema views that retrieve data from the definition_schema Information_Schema Examples (61 total views) Schemata Tables Domains Views Table_Privileges Referential_Constraints Check_Constraints Triggers Trigger_Table_Usage Parameters Routines SELECT Table_Name, Table_Type FROM Information_Schema.Tables WHERE table_name LIKE 'Emp%'

Database Administration Planning Determine hardware and software needs. Design Estimate space requirements, estimate performance. Implementation Install software, create databases, transfer data. Operation Monitor performance, backup and recovery. Growth and Change Monitor and forecast storage needs. Security Create user accounts, monitor changes.

Database Planning Estimation Data storage requirements Time to develop Cost to develop Operations costs

Managing Database Design Teamwork Data standards Data repository Reusable objects CASE tools Networks / communication Subdividing projects Delivering in stages User needs / priorities Version upgrades Normalization by user views Distribute individual sections Combine sections Assign forms and reports

Database Implementation Standards for application programming. User interface. Programming standards. Layout and techniques. Variable & object definition. Test procedures. Data access and ownership. Loading databases. Backup and recovery plans. User and operator training.

Database Operation and Maintenance Monitoring usage Size and growth Performance / delays Security logs User problems Backup and recovery User support Help desk Training classes

Database Growth and Change Detect need for change Size and speed Structures / design Requests for additional data. Difficulties with queries. Usage patterns Forecasts Delays in implementing changes Time to recognize needs. Time to get agreement and approval. Time to install new hardware. Time to create / modify software.

Backup and Recovery Backups are crucial! Offsite storage! Scheduled backup. Regular intervals. Record time. Track backups. Journals / logs Checkpoint Rollback / Roll forward Changes OrdID Odate Amount ... 192 2/2/01 252.35 … 193 2/2/01 998.34 … OrdID Odate Amount ... 192 2/2/01 252.35 … 193 2/2/01 998.34 … 194 2/2/01 77.23 ... Snapshot OrdID Odate Amount ... 192 2/2/01 252.35 … 193 2/2/01 998.34 … 194 2/2/01 77.23 … 195 2/2/01 101.52 … Journal/Log

Database Security and Privacy Physical security Protecting hardware Protecting software and data. Logical security Unauthorized disclosure Unauthorized modification Unauthorized withholding Security Threats Employees / Insiders Disgruntled employees “Terminated” employees Dial-up / home access Programmers Time bombs Trap doors Visitors Consultants Business partnerships Strategic sharing EDI Hackers--Internet

Data Privacy Who owns data? Customer rights. International complications. Do not release data to others. Do not read data unnecessarily. Report all infractions and problems. Privacy tradeoffs Marketing needs Government requests Employee management

Physical Security Hardware Data and software Disaster planning Preventing problems Fire prevention Site considerations Building design Hardware backup facilities Continuous backup (mirror sites) Hot sites Shell sites “Sister” agreements Telecommunication systems Personal computers Data and software Backups Off-site backups Personal computers Policies and procedures Network backup Disaster planning Write it down Train all new employees Test it once a year Telecommunications Allowable time between disaster and business survival limits.

Physical Security Provisions Backup data. Backup hardware. Disaster planning and testing. Prevention. Location. Fire monitoring and control. Control physical access.

Managerial Controls “Insiders” Consultants and Business alliances Hiring Termination Monitoring Job segmentation Physical access limitations Locks Guards and video monitoring Badges and tracking Consultants and Business alliances Limited data access Limited physical access Paired with employees

Logical Security Unauthorized disclosure. Unauthorized modification. Unauthorized withholding. Disclosure example Letting a competitor see the strategic marketing plans. Modification example Letting employees change their salary numbers. Withholding example Preventing a finance officer from retrieving data needed to get a bank loan.

User Identification User identification Accounts Passwords Individual Groups Passwords Do not use “real” words. Do not use personal (or pet) names. Include non-alphabetic characters. Use at least 6 (8) characters. Change it often. Too many passwords! Alternative identification Finger / hand print readers Voice Retina (blood vessel) scans DNA typing Hardware passwords The one-minute password. Card matched to computer. Best method for open networks / Internet.

Basic Security Ideas Limit access to hardware: Physical locks. Video monitoring. Fire and environment monitors. Employee logs / cards. Dial-back modems. Monitor usage: Hardware logs. Access from network nodes. Software and data usage. Background checks: Employees, Consultants. Dialback modem: 1. User calls modem. 2. Modem gets name, password. 3. Modem hangs up phone. 4. Modem calls back user. 5. Machine gets final password. [Figure: user, phone company, and dial-back modem with a callback list: Jones 1111, Smith 2222, Olsen 3333, Araha 4444.]

Access Controls Operating system DBMS access controls Access to directories Read View / File scan Write Create Delete Access to files Edit DBMS usually needs most of these Assign by user or group. DBMS access controls Read Data Update Data Insert Data Delete Data Open / Run Read Design Modify Design Administer Owners and administrator Need separate user identification / login to DBMS.

SQL Security Commands GRANT privileges REVOKE privileges Privileges include: SELECT, DELETE, INSERT, UPDATE. Objects include: Table, Table columns (SQL 92+), Query. Users include: Name/Group, PUBLIC. Examples: GRANT INSERT ON Bicycle TO OrderClerks REVOKE DELETE ON Customer FROM Assemblers

WITH GRANT OPTION GRANT SELECT ON Bicycle TO MarketingChair WITH GRANT OPTION Enables the recipient to also grant the specified privilege to other users. It passes on part of your authority.

Roles Assign permissions to the role. Role: SalesClerk Items: SELECT Customers: SELECT, UPDATE Sales: SELECT, UPDATE, INSERT New hire: Add role to person Items(ItemID, Description, Price, QOH): 111 Dog Food 0.95 53; 222 Cat Food 1.23 82; 333 Bird Food 3.75 18. Customers(CustomerID, LastName, FirstName, Phone): 1111 Wilson Peta 2222; 1112 Pollock Jackson 3333; 1113 Locke Jennifer 4444. Sales(SalesID, SaleDate, CustomerID): 111 03-May- 1112; 112 04-May-; 113 05-May- 1113.

Using Queries for Control Permissions apply to entire table or query. Use query to grant access to part of a table. Example Employee table Give all employees read access to name and phone (phonebook). Give managers read access to salary. SQL Grant Revoke Employee(ID, Name, Phone, Salary) Query: Phonebook SELECT Name, Phone FROM Employee Security Grant Read access to Phonebook for group of Employees. Grant Read access to Employee for group of Managers. Revoke all access to Employee for everyone else (except Admin).

Separation of Duties Supplier Purchasing manager can add new suppliers, but cannot add new orders. SupplierID Name … 673 Acme Supply 772 Basic Tools 983 Common X Referential integrity PurchaseOrder Clerk must use SupplierID from the Supplier table, and cannot add a new supplier. OrderID SupplierID 8882 772 8893 673 8895 009
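The referential-integrity half of this control can be tried in SQLite through Python's `sqlite3` module. This is a sketch under stated assumptions: the column types and the `add_order` helper are hypothetical, and SQLite has no GRANT or REVOKE, so the permission split between the purchasing manager and the clerk is not shown. The supplier and order numbers are those on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE Supplier (SupplierID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute(
    "CREATE TABLE PurchaseOrder ("
    " OrderID INTEGER PRIMARY KEY,"
    " SupplierID INTEGER NOT NULL REFERENCES Supplier(SupplierID))"
)
conn.executemany(
    "INSERT INTO Supplier VALUES (?, ?)",
    [(673, "Acme Supply"), (772, "Basic Tools"), (983, "Common X")],
)
conn.commit()


def add_order(order_id, supplier_id):
    """Clerk-side insert: succeeds only if the supplier already exists."""
    try:
        with conn:   # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO PurchaseOrder VALUES (?, ?)", (order_id, supplier_id)
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [add_order(8882, 772), add_order(8893, 673), add_order(8895, 9)]
print(results)   # the order for unknown supplier 009 is rejected
```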

Securing an Access Database Set up a secure workgroup Create a new Admin user. Enable security by setting a password Remove the original Admin user. Run the Security Wizard in the database to be secured. Assign user and group access privileges in the new database. Encrypt the new database. Save it as an MDE file.

Encryption Protection for open transmissions: Networks, The Internet, Weak operating systems. Single key (AES). Dual key: Protection, Authentication. Trap doors / escrow keys. U.S. export limits: 64 bit key limit. Breakable by brute force. Typical hardware: 2 weeks. Special hardware: minutes. Single key, e.g., AES: Plain text message, AES Key: 9837362, Encrypted text; then Encrypted text, AES Key: 9837362, Plain text message.

Dual Key Encryption Alice Bob Message Transmission Message Encrypt+T+M Encrypt+M Encrypt+T Private Key 13 Bob Use Alice’s Private key Public Keys Alice 29 Bob 17 Private Key 37 Use Bob’s Private key Use Bob’s Public key Use Alice’s Public key Using Bob’s private key ensures it came from him. Using Alice’s public key means only she can read it.
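A toy sketch of the sign-then-encrypt sequence, using textbook RSA with tiny numbers. Everything here is an assumption made for illustration: the key values are hypothetical (they are not the key numbers on the slide), the helper `apply_key` is invented, and numbers this small offer no security. Real systems use vetted cryptographic libraries with padding.

```python
# Toy key pairs (modulus n, public exponent, private exponent). NOT secure.
ALICE = {"n": 3233, "public": 17, "private": 2753}   # 3233 = 61 * 53
BOB = {"n": 143, "public": 7, "private": 103}        # 143 = 11 * 13


def apply_key(value, exponent, modulus):
    """Textbook RSA step: raise value to the key exponent, modulo n."""
    return pow(value, exponent, modulus)


message = 42   # must be smaller than Bob's modulus in this toy setup

# Bob: his private key proves the message came from him (authentication).
signed = apply_key(message, BOB["private"], BOB["n"])
# Bob: Alice's public key means only she can read it (protection).
sent = apply_key(signed, ALICE["public"], ALICE["n"])

# Alice: undo the protection with her private key ...
unlocked = apply_key(sent, ALICE["private"], ALICE["n"])
# ... then check the origin with Bob's public key.
received = apply_key(unlocked, BOB["public"], BOB["n"])

print(received == message)
```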

Sally’s Pet Store: Security Management Sally/CEO Sales Staff Store manager Sales people Business Alliances Accountant Attorney Suppliers Customers Products Sales Purchases Receive products Animals Animal Healthcare Employees Hiring/Release Hours Pay checks Accounts Payments Receipts Management Reports Operations Users

Sally’s Pet Store: Purchases *Basic Supplier data: ID, Name, Address, Phone, ZipCode, CityID R: Read W: Write A: Add

Database Management Systems Chapter 10 Distributed Databases

Distributed Databases Definition Advantages / Uses Problems / Complications Client-Server / SQL Server Microsoft Access SELECT Sales FROM Britain.Sales UNION SELECT Sales FROM France.Sales UNION SELECT Sales FROM Italy.Sales Germany Britain France Italy

Distributed Database Definition Multiple independent databases Each DBMS is a complete DBMS (engine, queries, locking, transactions, etc.) Usually on different machines. Usually in different locations. Connected by a network. Might be different environments Hardware Operating System DBMS Software Database Apollo Database Zeus England France Database Athena United States

Distributed Database Rules C.J. Date Rule 0: Transparency: the user should not know or care that the database is distributed. Local autonomy. No reliance on a central site. Continuous operation. Location independence. Fragmentation independence (physical storage). Replication independence. Distributed query processing. Distributed transaction management. Hardware independence. Operating system independence. Network independence. DBMS independence.

Distributed Features Each database can continue to run even if portion fails. Data and hardware can be moved without affecting operations or users. Expanding operations. Performance issues. System expansion and upgrades. Add new section without affecting others. Upgrade hardware, network and DBMS.

Advantages and Applications local transactions Business operations are often distributed Work and data are segmented by department. Work and data are segmented by geographical location. Improved performance Most updates and queries are performed locally. Maintain local control and responsibility over data. Can still combine data across the system. Scalability and expansion Add on, not replacement. future expansion

Creating a Distributed Database Design administration plan. Choose hardware and DBMS vendor, and network. Set up network and DBMS connections. Choose locations for data. Choose replication strategy. Create backup plan and strategy. Create local views and synonyms. Perform stress test: loads and failures.

Distributed Query Processing Networks are slow Drives: 20 - 60 MB per sec. LANs: 1-10 MB per sec (10-100 Mbps). WANs: 0.01 - 5 MB per sec. Faster is possible but expensive! SANs: 10-100 MB per sec. Goal is to minimize transmissions. Each system must be capable of evaluating queries--preferably SQL. Results depend heavily on how the system joins tables. WAN 0.1 - 5 MB 10-100 MB LAN 10 - 20 MB Disk drive

Distributed Query Processing Example NY: Customers: 1 M rows LA: Production: 10 M rows Chicago: Sales: 20 M rows Query: List customers who bought blue products on March 1 Bad idea #1 Transfer all rows to Chicago Then JOIN and select. Better idea #2 (probably) Transfer blue products from LA to Chicago Better idea #3 Get sale items on March 1 Get blue products from LA Send C# to NY NY Customers(C#, …) 1,000,000 C# list from desired P# Chicago Matching Customer data Sales(S#, C#, Sdate) 20,000,000 SaleItem(S#, P#,…) 50,000,000 P# sold on March 1 Blue P# sold on March 1 LA Products(P#, Color…) 10,000,000

Data Replication Goals Problems Decision support systems. Market research & data corrections. Goals Minimize transmissions Improve performance Support heavy multiuser access. Problems Updating copies Bulk transmissions Site unavailable Concurrency Easier for two people to change the same data at the same time. Decision support systems. Data warehouse. Britain Britain: Customers & Sales France: Customers & Sales Spain: Customers & Sales Periodic updates Spain Britain: Customers & Sales France: Customers & Sales Spain: Customers & Sales Update data.

Concurrency and Locks Each DBMS must maintain lock facility. To update, each DBMS must utilize and recognize other lock mechanisms and return codes. Each DBMS must have a deadlock resolution protocol that recognizes the distributed databases. Random wait. Optimistic updates. Two-phase commit. DBMS #1 Accounts Jones 8898 Transaction A Locked Waiting Transaction B Waiting Locked DBMS #2 Accounts Jones 3561

Transactions & Two-Phase Commit Two (or more) separate lock managers. DBMS initiating update serves as the coordinator. Two phases Coordinator sends message and data to all machines to “get ready.” Local machines save data in logs, verify update status and return message. If all locals report OK, then coordinator writes log and instructs others to proceed. If any fail, it sends Rollback message. Database 1 Initiate Transaction 1. Prepare to commit. All agree? 2. Commit Database 2 Lock tables. Save log. Database 3 Update all tables.
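The two phases can be sketched as a small in-memory simulation. The `Participant` class, its `can_commit` flag, and the `two_phase_commit` function are hypothetical names for illustration; a real coordinator also has to handle timeouts, lost messages, and recovery from its own log.

```python
class Participant:
    """Hypothetical local DBMS that votes in phase 1 and obeys the coordinator in phase 2."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit   # simulates whether the local update can succeed
        self.log = []
        self.state = "idle"

    def prepare(self, update):
        # Phase 1: save the data in the local log and report readiness.
        self.log.append(("prepare", update))
        self.state = "ready" if self.can_commit else "failed"
        return self.can_commit

    def commit(self):
        self.log.append(("commit",))
        self.state = "committed"

    def rollback(self):
        self.log.append(("rollback",))
        self.state = "rolled back"


def two_phase_commit(participants, update):
    """Coordinator: commit everywhere only if every participant reports OK."""
    votes = [p.prepare(update) for p in participants]   # phase 1: get ready
    if all(votes):
        for p in participants:                          # phase 2: proceed
            p.commit()
        return "committed"
    for p in participants:                              # any failure: roll back all
        p.rollback()
    return "rolled back"


healthy = [Participant("Database 2"), Participant("Database 3")]
mixed = [Participant("Database 2"), Participant("Database 3", can_commit=False)]
print(two_phase_commit(healthy, "update Jones"))
print(two_phase_commit(mixed, "update Jones"))
```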

Distributed Transaction Managers Resource Manager Resource Manager DBMS DBMS Transaction Manager Transaction Processing Monitor Resource Manager DBMS The distributed transaction coordinator/transaction processing monitor handles the transaction decisions and coordinates across the participating systems.

Distributed Design Questions

Distributed Databases In Oracle Schema.Table@Location Scott.Emp@hq.acme.com Database Links Full database names. CONNECT command. Linking through synonyms. CREATE SYNONYM … Central control over permissions. Linking through Views/queries. CREATE VIEW AS … Can assign local permissions. Linking through stored procedures. DELETE … Strong control over actions. Server database Synonym: Employee Procedure: DELETE FROM Employee WHERE ... View User can only run procedure. No other access. user permissions

Client-Server Server Server Shared Database Front-end User Interface

LAN File Server Not a distributed database. Performance improvements. Data file stored on server. Server is passive, appears as giant disk drive to PC. PC processes all data. Retrieves all needed data across the network. Performance improvements. Indexes are crucial. Store some data on each PC (replication). Store applications on PC (graphics & forms). Convert to SQL-Server DBMS data file Application Shared Data All data from all tables are read by PC, which performs JOIN and WHERE test. If available, reads index first. SELECT Name, SaleDate FROM Customer INNER JOIN Sales ON Customer.C# = Sales.C# WHERE SaleDate BETWEEN #1-Mar-97# AND #9-Mar-97#;

LAN File Server: Slow File Server MyFile.mdb CustID Name … Forms CustID Name … 115 Jenkins … 125 Juarez ... Order ... Application and query transferred. DBMS software transferred. One row at a time transferred, until all rows are examined. SELECT * FROM Customer WHERE City = “Sandy”

Client-Server Databases File Server One machine is dominant (server) and handles data for many clients. Client machines handle front-end tasks and small data tables that are not shared. DBMS SQL Server Shared Data Return matching data. Send SQL statement. SELECT . . . application

ADO and Direct Connections Server Computer Database Server The database vendor provides its own data transport (e.g., Oracle or SQL Server) installed on the server and the client. ADO provides a driver that connects your application to the transport services. ODBC can serve as the data transport if nothing else is available. DBMS transport SELECT … Results Client Computer DBMS transport ADO Visual Basic application

Three-Tier Client-Server Database Servers Databases. Transactions. Legacy applications. Server Databases Client front-end Middle Locate databases Business rules Program code Database links. Business rules. Program code. Middleware Application. Front-end. User Interface. Client

Database Independence on the Client Original DBMS New DBMS ADO ADO Application

Database Independence with Queries Independent Application Query: works with any DBMS SELECT SaleID, SaleDate, CustomerID, CustomerName FROM SaleCustomer Saved Oracle Query SELECT SaleID, SaleDate, CustomerID, LastName || ', ' || FirstName AS CustomerName FROM Sale, Customer WHERE Sale.CustomerID=Customer.CustomerID Saved SQL Server Query SELECT SaleID, SaleDate, CustomerID, LastName + ', ' + FirstName AS CustomerName FROM Sale INNER JOIN Customer ON Sale.CustomerID = Customer.CustomerID

The Internet as Client-Server information Internet Router Router Server request Client Browser Web Server http://server.location/page HTML pages Forms Graphics

HTML Limited Clients <HTML> <HEAD> <TITLE>My main page</TITLE></HEAD> <BODY BACKGROUND="graphics/back0.jpg"> <P>My text goes in paragraphs.</P> <P>Additional tags set <B>boldface</B> and <I>Italic</I>.</P> <P>Tables are more complicated and use a set of tags for rows and columns.</P> <TABLE BORDER=1> <TR><TD>First cell</TD><TD>Second cell</TD></TR> <TR><TD>Next row</TD><TD>Second column</TD></TR> </TABLE> <P>There are form tags to create input forms for collecting data. But you need CGI program code to convert and use the input data.</P> </BODY> </HTML>

HTML Output

Web Server Database Fundamentals Request Server/Form.html 3 Client/Browser Database Page = Template + Result Data 2 1 2 3 CGI String DBMS HTML Form Result 1 Query Form Web Server Result Page 1 2 HTML form Query Template + Code Form.html Program code

Database Example: Client Side Request Server/Form.html Server Initial form 1 Results Call ASP page 3 2

Client-Server Data Transfer Order Form Order ID 1015 Customer Jones, Martha Order Date 12-Aug What if there are 10,000 customers? How much time to load the combo box? How do you refresh/reload the combo box? Alternatives?

Latency Server Generate form Receive form data Transmission delay time Form received Client User delay

XML: Transferring Data Order: OrderID, OrderDate, ShippingCost, Comment Item: ItemID, Description, Quantity, Cost Item: ItemID, Description, Quantity, Cost Item: ItemID, Description, Quantity, Cost Many XML files contain hierarchical data.

XML: Schema Definition xsd <?xml version="1.0" encoding="utf-8"?> <xs:schema id="OrderList" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata"> <xs:element name="OrderList" msdata:IsDataSet="true"> <xs:complexType> <xs:choice maxOccurs="unbounded"> <xs:element name="Order"> <xs:sequence> <xs:element name="OrderID" type="xs:string" minOccurs="0" /> <xs:element name="OrderDate" type="xs:date" minOccurs="0" /> <xs:element name="ShippingCost" type="xs:string" minOccurs="0" /> <xs:element name="Comment" type="xs:string" minOccurs="0" /> <xs:element name="Items" minOccurs="0" maxOccurs="unbounded"> <xs:element name="ItemID" nillable="true" minOccurs="0" maxOccurs="unbounded"> <xs:simpleContent msdata:ColumnName="ItemID_Text" msdata:Ordinal="0"> <xs:extension base="xs:string"> </xs:extension> </xs:simpleContent> </xs:complexType> </xs:element> <xs:element name="Description" nillable="true" minOccurs="0" maxOccurs="unbounded"> <xs:simpleContent msdata:ColumnName="Description_Text" msdata:Ordinal="0"> Partial file, generated by .NET xsd.exe

XML Data Example XML: Extensible Markup Language <?xml version="1.0"?> <!DOCTYPE OrderList SYSTEM "orderlist.dtd"> <OrderList> <Order> <OrderID>1</OrderID> <OrderDate>3/6/2004</OrderDate> <ShippingCost>$33.54</ShippingCost> <Comment>Need immediately.</Comment> <Items> <ItemID>30</ItemID> <Description>Flea Collar-Dog-Medium</Description> <Quantity>208</Quantity> <Cost>$4.42</Cost> <ItemID>27</ItemID> <Description>Aquarium Filter &amp; Pump</Description> <Quantity>8</Quantity> <Cost>$24.65</Cost> </Items> </Order> </OrderList>
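A minimal sketch of reading this order data with Python's standard `xml.etree.ElementTree` parser. The function name `read_orders` is hypothetical, the DOCTYPE line is left out so the sketch needs no external DTD file, and the ampersand in the description is written as `&amp;` because a parser rejects a bare `&`. Items are paired by position, since the sample repeats ItemID, Description, Quantity, and Cost without a wrapper element per item.

```python
import xml.etree.ElementTree as ET

ORDER_XML = """<?xml version="1.0"?>
<OrderList>
  <Order>
    <OrderID>1</OrderID>
    <OrderDate>3/6/2004</OrderDate>
    <ShippingCost>$33.54</ShippingCost>
    <Comment>Need immediately.</Comment>
    <Items>
      <ItemID>30</ItemID>
      <Description>Flea Collar-Dog-Medium</Description>
      <Quantity>208</Quantity>
      <Cost>$4.42</Cost>
      <ItemID>27</ItemID>
      <Description>Aquarium Filter &amp; Pump</Description>
      <Quantity>8</Quantity>
      <Cost>$24.65</Cost>
    </Items>
  </Order>
</OrderList>"""


def read_orders(xml_text):
    """Return a list of (order_id, [(item_id, description, quantity, cost), ...])."""
    root = ET.fromstring(xml_text)
    orders = []
    for order in root.findall("Order"):
        items = order.find("Items")
        # Collect each repeated tag as a column, then zip the columns into rows.
        columns = [
            [element.text for element in items.findall(tag)]
            for tag in ("ItemID", "Description", "Quantity", "Cost")
        ]
        orders.append((order.findtext("OrderID"), list(zip(*columns))))
    return orders


orders = read_orders(ORDER_XML)
for order_id, item_rows in orders:
    print(order_id, item_rows)
```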

XML Example in Explorer

Java and JDBC
// Requires: import java.sql.*;
// The driver URL, login, and password are placeholders.
Connection con = DriverManager.getConnection(
    "jdbc:myDriver:myDBName", "myLogin", "myPassword");
Statement smt = con.createStatement();
ResultSet rst = smt.executeQuery(
    "SELECT AnimalID, Name, Category, Breed FROM Animal");
// Step through the result set one row at a time.
while (rst.next()) {
    int iAnimal = rst.getInt("AnimalID");
    String sName = rst.getString("Name");
    String sCategory = rst.getString("Category");
    String sBreed = rst.getString("Breed");
    // Now do something with these four variables
}