Presentation is loading. Please wait.

Presentation is loading. Please wait.

Data Modeling and Database Design INF1343, Winter 2012 Yuri Takhteyev

Similar presentations


Presentation on theme: "Data Modeling and Database Design INF1343, Winter 2012 Yuri Takhteyev"— Presentation transcript:

1 Data Modeling and Database Design INF1343, Winter 2012 Yuri Takhteyev
Faculty of Information University of Toronto This presentation is licensed under Creative Commons Attribution License, v To view a copy of this license, visit This presentation incorporates images from the Crystal Clear icon collection by Everaldo Coelho, available under LGPL from

2 Week 12 Storage, Structure, and Performance

3 The Final Project 1. Congratulation! 2. Moving Forward
- - Try Python’s MySQLdb module - Python web frameworks - Object-relational mappers (e.g., with Django)

4 The Final Exam Time & Place: Room 538 (5th floor of Bissel, requires code) Scope - Drawing ER diagrams - Implementing ER design as SQL - SQL queries - including joins and grouping - Miscellaneous - see website for sample questions

5 Persistent Easy to read Easy to update Cost-effective
Storage Persistent Easy to read Easy to update Cost-effective

6

7 image sources http://en. wikipedia. org/wiki/File:Hard_disk_dismantled
image sources /File:Memory_module_DDRAM_ jpg

8 caching

9 “hierarchical storage management”

10 Structure Finding a “storable” representation - Not losing information
- Allowing for easy retrieval - Allowing for easy update - Minimizing storage size

11 CSV simple! relatively small but what to do with NULLs?
and how would it perform?

12 CSV , ,23.47 , ,590.32 , ,44.66 , , + another billion rows

13 CSV , ,23.47⏎ , ,590.32⏎ , ,44.66⏎ , , ⏎ + another billion rows

14 CSV , ,23. 47⏎ , ,590.32⏎ , ,44.66⏎ , , ⏎.....

15 Fixed-Length + another billion rows start of Nth row = N * length of row

16 Fixed-Length + another billion rows How do we add a row? How do we delete a row? How do we find a row?

17 Sort it? + another billion rows Finding is easier! Inserting is harder!

18 Complexity Analysis “Big-O” analysis Best sorting methods: O(n log(n))
Searching a list: O(n) Searching a sorted list: O(log (n)) Inserting into a sorted list: O(n)

19 And the second field? + another billion rows Sort by the 2nd field? But avoid duplication! (Essentially an index.)

20 Trees 755280276,889208095,590.32 and other items that start with 75
1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 , ,590.32 and other items that start with 75 (does not need to be ordered)

21 Fixed: E.g., ISAM Balanced: E.g., B+ Trees
Fixed vs Balanced Fixed: E.g., ISAM Balanced: E.g., B+ Trees A: 4% M: 9.5% Q: 0

22 Balanced Tree Antonov Aaronson Zhang O'Neil Zhukov Thomson Silva
Peters Kim

23 Balanced Tree Antonov O'Neil Thomson Aaronson Silva Peters Kim Zhang
Zhukov

24 Balanced Tree Antonov Kim O'Neil Aaronson Peters Silva Thomson Zhang
Zhukov

25 Balanced Tree Antonov Kim O'Neil Aaronson Peters Silva Thomson Zhang
Zhukov

26 Balanced Tree Antonov Kim O'Neil Aaronson Peters Silva Thomson Zhang
Zhukov

27 Balanced Tree Antonov Aaronson Silva Kim Thomson O'Neil Zhang Peters
Zhukov

28 Balanced Tree Silva Antonov Aaronson Kim Thomson O'Neil Zhang Peters
Zhukov

29 Balanced Tree Silva Antonov Aaronson Kim Thomson O'Neil Zhang Peters
Zhukov

30 Balanced Tree Silva Antonov Aaronson Kim Thomson O'Neil Zhang Peters
Zhukov

31 Balanced Tree Silva Antonov Zhang Aaronson Kim Thomson Zhukov O'Neil
Peters

32 Storage Engines CVS yes, it's an option MyISAM
the default before MySQL 5.5 relatively simple InnoDB more features the new default (since 5.5)

33 The Extra Features Foreign Key Constraints not in MyISAM! Transactions
start transaction; update table1 ...; update table2 ...; commit; Downside: complexity

34 More Engines Memory no harddrive → faster Blackhole
doesn't actually store anything Federated allows remote tables Etc.

35 Picking an Engine create table superhero ( id integer,
primary key (id), name char(100) ) engine=MyISAM;

36 Creating an Index Easy retrieval by field create index username_index
- usually automatic for PK - optional for other fields - downsides: additional space harder inserts create index username_index on user(username);

37 Distributed Performance Revisited
Distributing the work: one big machine vs. many small ones Some things are easier to split Distributing RDBMS is hard (hence the interest in “NoSQL”)

38 (Anything from this semester.)
Questions? (Anything from this semester.)


Download ppt "Data Modeling and Database Design INF1343, Winter 2012 Yuri Takhteyev"

Similar presentations


Ads by Google