Published by August Bridges. Modified over 6 years ago.
1
Data Modeling and Database Design
INF1343, Winter 2012
Yuri Takhteyev
Faculty of Information, University of Toronto
This presentation is licensed under Creative Commons Attribution License, v
To view a copy of this license, visit
This presentation incorporates images from the Crystal Clear icon collection by Everaldo Coelho, available under LGPL from
2
Week 12 Storage, Structure, and Performance
3
The Final Project
1. Congratulations!
2. Moving Forward
- Try Python’s MySQLdb module
- Python web frameworks
- Object-relational mappers (e.g., with Django)
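A minimal sketch of what talking to a database from Python looks like. It uses the standard-library sqlite3 module as a stand-in so that it runs without a server; MySQLdb follows the same DB-API pattern (connect, cursor, execute, fetch), but its placeholders are written %s rather than ?. The table and the row are made up for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a MySQL connection.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table and row, for illustration only.
cur.execute("create table superhero (id integer primary key, name char(100))")
cur.execute("insert into superhero (id, name) values (?, ?)", (1, "Hypothetical Hero"))
conn.commit()

# Parameters are passed separately from the SQL text, never pasted into it.
cur.execute("select id, name from superhero where id = ?", (1,))
heroes = cur.fetchall()
print(heroes)
```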
4
The Final Exam
Time & Place: Room 538 (5th floor of Bissell, requires code)
Scope
- Drawing ER diagrams
- Implementing ER design as SQL
- SQL queries
- including joins and grouping
- Miscellaneous
- see website for sample questions
5
Storage
- Persistent
- Easy to read
- Easy to update
- Cost-effective
7
image sources
http://en.wikipedia.org/wiki/File:Hard_disk_dismantled
/File:Memory_module_DDRAM_ jpg
8
caching
9
“hierarchical storage management”
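One way to picture caching between a fast tier and a slow tier is a small read-through cache. This is only a sketch: the class name CachedStore, the least-recently-used eviction rule, and the dictionary standing in for the slow tier are all assumptions made for illustration.

```python
from collections import OrderedDict


class CachedStore:
    """Small fast tier (the cache) in front of a large slow tier."""

    def __init__(self, slow_store, capacity):
        self.slow_store = slow_store   # stands in for the disk
        self.capacity = capacity       # how many items fit in the fast tier
        self.cache = OrderedDict()     # oldest entry first
        self.slow_reads = 0            # how often we had to go to the slow tier

    def get(self, key):
        if key in self.cache:
            # Cache hit: mark the key as recently used.
            self.cache.move_to_end(key)
            return self.cache[key]
        # Cache miss: read from the slow tier and remember the value.
        self.slow_reads += 1
        value = self.slow_store[key]
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            # Evict the least recently used entry.
            self.cache.popitem(last=False)
        return value
```

Repeated reads of the same key reach the slow tier only once, as long as the key has not been evicted in between.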
10
Structure
Finding a “storable” representation
- Not losing information
- Allowing for easy retrieval
- Allowing for easy update
- Minimizing storage size
11
CSV
- simple!
- relatively small
- but what to do with NULLs?
- and how would it perform?
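The NULL question can be seen with Python's csv module. In this sketch (the rows are made up), a NULL and an empty string are written the same way, so the difference is lost when the file is read back.

```python
import csv
import io

# Hypothetical rows: the second has a NULL amount, the third an empty string.
rows = [(1, 2, "23.47"), (3, 4, None), (5, 6, "")]

# Write the rows as CSV text into an in-memory buffer.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)

# Read the text back; every field comes back as a string.
buffer.seek(0)
read_back = list(csv.reader(buffer))

# The NULL and the empty string are now indistinguishable.
print(read_back[1][2] == read_back[2][2])
```

A real format would need a convention (for example, a reserved marker for NULL), which is exactly the kind of decision the slide is pointing at.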
12
CSV
, ,23.47
, ,590.32
, ,44.66
, ,
+ another billion rows
13
CSV
, ,23.47⏎
, ,590.32⏎
, ,44.66⏎
, , ⏎
+ another billion rows
14
CSV , ,23.47⏎ , ,590.32⏎ , ,44.66⏎ , , ⏎.....
15
Fixed-Length
+ another billion rows
start of Nth row = N * length of row (counting rows from 0)
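The offset formula can be tried out with fixed-size binary records. A sketch, assuming each row is two unsigned 32-bit integers followed by a 64-bit float (16 bytes in total); the layout and all rows except the 590.32 one are assumptions for illustration.

```python
import io
import struct

# Assumed record layout: two unsigned 32-bit ints and one 64-bit float,
# little-endian, no padding: 4 + 4 + 8 = 16 bytes per row.
RECORD = struct.Struct("<IId")


def write_rows(f, rows):
    """Append each row to the file as one fixed-length record."""
    for row in rows:
        f.write(RECORD.pack(*row))


def read_row(f, n):
    """Jump straight to row n (counting from 0) without scanning earlier rows."""
    f.seek(n * RECORD.size)
    return RECORD.unpack(f.read(RECORD.size))


# An in-memory file standing in for a file on disk.
f = io.BytesIO()
write_rows(f, [
    (10, 20, 23.47),                  # hypothetical row
    (755280276, 889208095, 590.32),   # the row shown on the Trees slide
    (30, 40, 44.66),                  # hypothetical row
])

print(read_row(f, 1))
```

With CSV, finding row N means reading everything before it to count line breaks; here it is one multiplication and one seek.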
16
Fixed-Length
+ another billion rows
- How do we add a row?
- How do we delete a row?
- How do we find a row?
17
Sort it?
+ another billion rows
- Finding is easier!
- Inserting is harder!
18
Complexity Analysis
“Big-O” analysis
- Best (comparison-based) sorting methods: O(n log(n))
- Searching a list: O(n)
- Searching a sorted list: O(log(n))
- Inserting into a sorted list: O(n)
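These costs can be illustrated with Python's bisect module. A sketch with made-up keys: the linear search looks at items one by one, the binary search halves the range at each step, and inserting into a sorted list still has to shift the later items.

```python
import bisect


def linear_search(items, target):
    """O(n): check items one by one until the target turns up."""
    for i, item in enumerate(items):
        if item == target:
            return i
    return -1


def binary_search(sorted_items, target):
    """O(log n): halve the candidate range at every step."""
    i = bisect.bisect_left(sorted_items, target)
    if i < len(sorted_items) and sorted_items[i] == target:
        return i
    return -1


# Hypothetical keys.
keys = [44, 7, 23, 590, 75]
sorted_keys = sorted(keys)        # O(n log n)

# Finding the insertion point is O(log n), but the items after it
# must be shifted over, so the insert as a whole is O(n).
bisect.insort(sorted_keys, 100)
```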
19
And the second field?
+ another billion rows
- Sort by the 2nd field?
- But avoid duplication!
- (Essentially an index.)
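One way to sort by the second field without copying the rows is to keep a separate sorted list of (value, row number) pairs. A sketch with made-up rows; the name find_by_second_field is hypothetical.

```python
import bisect

# Hypothetical rows, stored sorted by the first field.
table = [
    (101, 905, 23.47),
    (205, 310, 590.32),
    (330, 770, 44.66),
]

# The index holds only (second-field value, row number) pairs, sorted by
# value. The rows themselves are not duplicated.
index = sorted((row[1], n) for n, row in enumerate(table))


def find_by_second_field(value):
    """Binary-search the index, then fetch the matching rows by number."""
    i = bisect.bisect_left(index, (value, -1))
    matches = []
    while i < len(index) and index[i][0] == value:
        matches.append(table[index[i][1]])
        i += 1
    return matches


print(find_by_second_field(310))
```

The cost is the one the later slides mention: extra space for the index, and every insert into the table must update the index as well.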
20
Trees
[Figure: a tree that branches on the leading digits, with branches labelled 1 to 9 at each of two levels]
755280276,889208095,590.32 and other items that start with 75 (does not need to be ordered)
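The digit tree can be sketched with nested dictionaries: the first digit picks a branch, the second digit picks a branch below it, and the leaf is an unordered bucket of rows. The function names and all rows other than the one from the slide are assumptions; keys are assumed to have at least two digits.

```python
def build_tree(rows):
    """Group rows into buckets keyed by the first two digits of the first field."""
    tree = {}
    for row in rows:
        key = str(row[0])
        first, second = key[0], key[1]
        tree.setdefault(first, {}).setdefault(second, []).append(row)
    return tree


def find(tree, key):
    """Follow two digits down the tree, then scan only the small bucket."""
    digits = str(key)
    bucket = tree.get(digits[0], {}).get(digits[1], [])
    return [row for row in bucket if row[0] == key]


rows = [
    (755280276, 889208095, 590.32),   # the row from the slide
    (751000000, 1, 1.0),              # hypothetical, also starts with 75
    (123456789, 2, 2.0),              # hypothetical
]
tree = build_tree(rows)

print(find(tree, 755280276))
```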
21
Fixed vs Balanced
Fixed: e.g., ISAM
Balanced: e.g., B+ Trees
A: 4% M: 9.5% Q: 0
22
Balanced Tree
[Figure sequence, slides 22 to 31: tree diagrams over the keys Aaronson, Antonov, Kim, O'Neil, Peters, Silva, Thomson, Zhang, Zhukov]
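Why balance matters can be shown with the nine names from the Balanced Tree slides. This sketch uses a plain binary search tree rather than a B+ tree, and it builds the balanced version in one go from a sorted list instead of rebalancing after each insert; both simplifications are assumptions made to keep the example short.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Plain binary-search-tree insert with no rebalancing."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                break
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                break
            node = node.right
    return root


def build_balanced(sorted_keys):
    """Build a balanced tree by always putting the middle key at the root."""
    if not sorted_keys:
        return None
    mid = len(sorted_keys) // 2
    node = Node(sorted_keys[mid])
    node.left = build_balanced(sorted_keys[:mid])
    node.right = build_balanced(sorted_keys[mid + 1:])
    return node


def height(node):
    """Number of nodes on the longest path from this node down to a leaf."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


names = sorted(["Antonov", "Aaronson", "Zhang", "O'Neil", "Zhukov",
                "Thomson", "Silva", "Peters", "Kim"])

# Inserting already-sorted keys without rebalancing gives a chain.
chain = None
for name in names:
    chain = insert(chain, name)

balanced = build_balanced(names)
print(height(chain), height(balanced))
```

The unbalanced tree degenerates into a list, so a search can take as many steps as there are names; the balanced one needs at most four.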
32
Storage Engines
CSV
- yes, it's an option
MyISAM
- the default before MySQL 5.5
- relatively simple
InnoDB
- more features
- the new default (since 5.5)
33
The Extra Features
Foreign Key Constraints
- not in MyISAM!
Transactions
start transaction;
update table1 ...;
update table2 ...;
commit;
Downside: complexity
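The all-or-nothing behaviour of a transaction can be tried with Python's sqlite3 module, used here as a stand-in for MySQL with InnoDB. The account table, its check constraint, and the transfer function are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the check constraint forbids negative balances.
conn.execute(
    "create table account ("
    "id integer primary key, "
    "balance integer check (balance >= 0))"
)
conn.executemany("insert into account values (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts: both updates happen, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "update account set balance = balance + ? where id = ?",
                (amount, dst))
            conn.execute(
                "update account set balance = balance - ? where id = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return [row[0] for row in
            conn.execute("select balance from account order by id")]


ok = transfer(conn, 1, 2, 30)        # succeeds: both updates are committed
failed = transfer(conn, 1, 2, 500)   # second update violates the check
print(ok, failed, balances(conn))
```

In the failed transfer the first update had already credited account 2, but the rollback undoes it, so no money appears out of nowhere.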
34
More Engines
Memory
- no hard drive → faster
Blackhole
- doesn't actually store anything
Federated
- allows remote tables
Etc.
35
Picking an Engine
create table superhero (
  id integer,
  name char(100),
  primary key (id)
) engine=MyISAM;
36
Creating an Index
Easy retrieval by field
- usually automatic for PK
- optional for other fields
- downsides:
  - additional space
  - harder inserts
create index username_index on user(username);
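Whether an index is actually used can be checked by asking the database for its query plan. A sketch using Python's sqlite3 module as a stand-in for MySQL; the user table contents are made up, and the exact wording of the plan varies between SQLite versions, so the example only looks for the index name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table user (id integer primary key, username text)")
# Hypothetical data: 1000 made-up usernames.
conn.executemany(
    "insert into user (username) values (?)",
    [("user%d" % i,) for i in range(1000)],
)


def query_plan(conn):
    """Return SQLite's description of how it would run the lookup."""
    rows = conn.execute(
        "explain query plan select * from user where username = ?",
        ("user500",),
    ).fetchall()
    # The last column of each row holds the human-readable detail.
    return " ".join(row[-1] for row in rows)


plan_before = query_plan(conn)   # a full scan of the table
conn.execute("create index username_index on user(username)")
plan_after = query_plan(conn)    # a search that uses username_index

print(plan_before)
print(plan_after)
```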
37
Distributed Performance Revisited
Distributing the work: one big machine vs. many small ones
- Some things are easier to split
- Distributing an RDBMS is hard (hence the interest in “NoSQL”)
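A sum is an example of something that splits easily: each machine adds up its own rows and only the partial totals travel. A sketch with made-up rows (amounts in cents) and a simple modulo rule for choosing the machine; both are assumptions for illustration.

```python
def shard_for(key, shard_count):
    """Pick a machine for a row from its key (simple modulo rule)."""
    return key % shard_count


def partition(rows, shard_count):
    """Split the rows into one list per machine."""
    shards = [[] for _ in range(shard_count)]
    for row in rows:
        shards[shard_for(row[0], shard_count)].append(row)
    return shards


# Hypothetical rows: (key, amount in cents).
rows = [(101, 2347), (205, 59032), (330, 4466), (412, 1000)]

shards = partition(rows, 3)

# Each machine computes its own partial sum independently...
partial_sums = [sum(amount for _, amount in shard) for shard in shards]

# ...and combining the partial results is trivial.
total = sum(partial_sums)
print(partial_sums, total)
```

A join between rows that live on different machines does not split this neatly, which is one reason distributing a relational database is harder.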
38
Questions? (Anything from this semester.)