Data Modeling and Database Design INF1343, Winter 2012 Yuri Takhteyev


Data Modeling and Database Design INF1343, Winter 2012 Yuri Takhteyev Faculty of Information University of Toronto This presentation is licensed under Creative Commons Attribution License, v. 3.0. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/. This presentation incorporates images from the Crystal Clear icon collection by Everaldo Coelho, available under LGPL from http://everaldo.com/crystal/.

Week 12 Storage, Structure, and Performance

The Final Project 1. Congratulations! 2. Moving Forward - https://github.com/yuri/padawan - Try Python’s MySQLdb module - Python web frameworks - Object-relational mappers (e.g., with Django)

The Final Exam Time & Place: Room 538 (5th floor of Bissell, requires code) Scope - Drawing ER diagrams - Implementing ER design as SQL - SQL queries - including joins and grouping - Miscellaneous - see website for sample questions

Storage - Persistent - Easy to read - Easy to update - Cost-effective

image sources http://en.wikipedia.org/wiki/File:Hard_disk_dismantled.jpg http://en.wikipedia.org/wiki/File:Memory_module_DDRAM_20-03-2006.jpg http://en.wikipedia.org/wiki/File:Largetape.jpg

caching

“hierarchical storage management”

Structure Finding a “storable” representation - Not losing information - Allowing for easy retrieval - Allowing for easy update - Minimizing storage size

CSV simple! relatively small but what to do with NULLs? and how would it perform?

CSV 657807938,559562982,23.47 755280276,889208095,590.32 934625720,459538801,44.66 852067113,660539228,1684.77 + another billion rows

CSV 657807938,559562982,23.47⏎ 755280276,889208095,590.32⏎ 934625720,459538801,44.66⏎ 852067113,660539228,1684.77⏎ + another billion rows

CSV 657807938,559562982,23. 47⏎755280276,889208095 ,590.32⏎934625720,4595 38801,44.66⏎852067113, 660539228,1684.77⏎.....

Fixed-Length 657807938559562982002347 755280276889208095059032 934625720459538801004466 852067113660539228168477 + another billion rows start of Nth row = N * length of row
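The offset rule on this slide can be tried out in a few lines of Python. This is only a sketch under assumptions that are not on the slide: rows are stored back to back with no separator, each row is 24 ASCII digits (9 + 9 + 6), the last six digits are an amount in cents, and rows are counted from 0. The name read_row is made up for the illustration.

```python
import io

ROW_LENGTH = 24  # assumed layout: 9 + 9 + 6 digits per row, no separators

def read_row(f, n):
    """Return the n-th row (counting from 0) of a fixed-length file."""
    f.seek(n * ROW_LENGTH)  # start of Nth row = N * length of row
    raw = f.read(ROW_LENGTH).decode("ascii")
    # Cut the 24 characters back into the three fields.
    return int(raw[0:9]), int(raw[9:18]), int(raw[18:24]) / 100

# An in-memory stand-in for the file, holding the four rows from the slide.
data = io.BytesIO(
    b"657807938559562982002347"
    b"755280276889208095059032"
    b"934625720459538801004466"
    b"852067113660539228168477"
)

print(read_row(data, 2))
```

No scanning for line breaks is needed: the position of any row follows from its number, which is what CSV cannot offer.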

Fixed-Length 657807938559562982002347 755280276889208095059032 934625720459538801004466 852067113660539228168477 + another billion rows How do we add a row? How do we delete a row? How do we find a row?

Sort it? 755280270029178799023934 755280276889208095059032 755280279890089234000020 755290123939191721000129 + another billion rows Finding is easier! Inserting is harder!

Complexity Analysis “Big-O” analysis Best sorting methods: O(n log(n)) Searching a list: O(n) Searching a sorted list: O(log (n)) Inserting into a sorted list: O(n)
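To make the difference between O(n) and O(log(n)) concrete, the sketch below counts the steps each kind of search takes on one million sorted keys. The keys and the helper names linear_search and binary_search are invented for the illustration; bisect is Python's standard module for sorted lists.

```python
import bisect

def linear_search(items, target):
    """O(n): look at each item in turn; return (index, steps taken)."""
    for steps, item in enumerate(items, start=1):
        if item == target:
            return steps - 1, steps
    return -1, len(items)

def binary_search(sorted_items, target):
    """O(log n): halve the range at every step; return (index, steps taken)."""
    lo, hi, steps = 0, len(sorted_items), 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid, steps
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return -1, steps

keys = list(range(0, 2_000_000, 2))  # one million sorted (made-up) keys
target = keys[-1]                    # worst case for the linear search

linear_index, linear_steps = linear_search(keys, target)
binary_index, binary_steps = binary_search(keys, target)
print(linear_steps, binary_steps)

# Inserting into the sorted list: finding the slot is O(log n),
# but shifting everything after it is O(n).
bisect.insort(keys, 1_000_001)
```

The linear search needs a million steps here, while the binary search needs at most about twenty, since log2 of one million is just under 20.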

And the second field? 755280270029178799023934 755280276889208095059032 755280279890089234000020 755290123939191721000129 + another billion rows Sort by the 2nd field? But avoid duplication! (Essentially an index.)
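One way to picture "sort by the 2nd field without duplication" is a small side list that holds only the second field and the position of the row it came from. The sketch below uses the four rows from the CSV slide, kept sorted by the first field; the names index and find_by_second_field are made up, and a real database engine stores its indexes quite differently.

```python
import bisect

# Rows kept sorted by the first field (values from the CSV slide).
rows = [
    (657807938, 559562982, 23.47),
    (755280276, 889208095, 590.32),
    (852067113, 660539228, 1684.77),
    (934625720, 459538801, 44.66),
]

# The index holds (second field, row position) pairs sorted by second field.
# The rows themselves are not copied.
index = sorted((row[1], pos) for pos, row in enumerate(rows))
index_keys = [key for key, _ in index]

def find_by_second_field(value):
    """Binary-search the index, then jump to the row it points at."""
    i = bisect.bisect_left(index_keys, value)
    if i < len(index_keys) and index_keys[i] == value:
        return rows[index[i][1]]
    return None

print(find_by_second_field(660539228))
```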

Trees [figure: a tree that branches on the digits 1-9 at each of two levels; the branch 7, then 5 leads to a block holding 755280276,889208095,590.32 and other items that start with 75 (does not need to be ordered)]

Fixed vs Balanced Fixed: e.g., ISAM Balanced: e.g., B+ Trees A: 4% M: 9.5% Q: 0

Balanced Tree Antonov Aaronson Zhang O'Neil Zhukov Thomson Silva Peters Kim

Balanced Tree Antonov O'Neil Thomson Aaronson Silva Peters Kim Zhang Zhukov

Balanced Tree Antonov Kim O'Neil Aaronson Peters Silva Thomson Zhang Zhukov

Balanced Tree Antonov Aaronson Silva Kim Thomson O'Neil Zhang Peters Zhukov

Balanced Tree Silva Antonov Aaronson Kim Thomson O'Neil Zhang Peters Zhukov

Balanced Tree Silva Antonov Zhang Aaronson Kim Thomson Zhukov O'Neil Peters

Storage Engines CSV - yes, it's an option MyISAM - the default before MySQL 5.5 - relatively simple InnoDB - more features - the new default (since 5.5)

The Extra Features Foreign Key Constraints not in MyISAM! Transactions start transaction; update table1 ...; update table2 ...; commit; Downside: complexity
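The all-or-nothing behaviour of a transaction can be tried without a MySQL server. The sketch below uses Python's built-in sqlite3 module as a stand-in, so the syntax differs slightly from the slide (SQLite says "begin" where MySQL says "start transaction"). The account table, its check constraint, and the helper names transfer and balances are invented for the illustration.

```python
import sqlite3

# isolation_level=None: we issue begin/commit/rollback ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "create table account ("
    " id integer primary key,"
    " balance integer check (balance >= 0))"
)
conn.execute("insert into account values (1, 100), (2, 0)")

def transfer(amount):
    """Move money from account 1 to account 2, all or nothing."""
    try:
        conn.execute("begin")
        conn.execute(
            "update account set balance = balance + ? where id = 2", (amount,))
        conn.execute(
            "update account set balance = balance - ? where id = 1", (amount,))
        conn.execute("commit")
        return True
    except sqlite3.IntegrityError:
        # The second update failed, so undo the first one as well.
        conn.execute("rollback")
        return False

def balances():
    """Return the balances of accounts 1 and 2, in that order."""
    return [b for (b,) in conn.execute(
        "select balance from account order by id")]

print(transfer(30), balances())
print(transfer(500), balances())
```

The second transfer would overdraw account 1, so the check constraint rejects it and the rollback also undoes the credit to account 2 that had already been applied.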

More Engines Memory - no hard drive → faster Blackhole - doesn't actually store anything Federated - allows remote tables Etc.

Picking an Engine create table superhero ( id integer, primary key (id), name char(100) ) engine=MyISAM;

Creating an Index Easy retrieval by field - usually automatic for PK - optional for other fields - downsides: additional space, harder inserts create index username_index on user(username);
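To see what the index changes, one can ask the database how it plans to run a query before and after creating it. The sketch below does this with Python's built-in sqlite3 module rather than MySQL, so the details are an assumption: "explain query plan" is SQLite's command (MySQL has its own "explain"), the user table is filled with made-up usernames, and plan is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table user (id integer primary key, username char(100))")
conn.executemany(
    "insert into user (username) values (?)",
    [(f"user{i}",) for i in range(1000)],
)

def plan(query):
    """Return SQLite's description of how it would run the query."""
    # The fourth column of each plan row is the human-readable detail.
    return " ".join(row[3] for row in conn.execute("explain query plan " + query))

query = "select id from user where username = 'user500'"

plan_before = plan(query)  # without an index: a scan of the whole table
conn.execute("create index username_index on user(username)")
plan_after = plan(query)   # with the index: a search using username_index

print(plan_before)
print(plan_after)
```

The exact wording of the plan varies between SQLite versions, but the first should report a scan of the table and the second should name username_index.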

Distributed Performance Revisited Distributing the work: one big machine vs. many small ones Some things are easier to split Distributing RDBMS is hard (hence the interest in “NoSQL”)

Questions? (Anything from this semester.)