Instructor: Craig Duckett
Lecture 06: Thursday, October 15th, 2015
Indexes, Aliases, Distinct, SQL 1
BIT275: Database Design (Fall 2015)


2 Assignment 1 is DUE TONIGHT for those who started the class late! Upload it to StudentTracker by MIDNIGHT!
MID-TERM EXAM is LECTURE 10, Tuesday, November 3rd.
Assignment 2 is due LECTURE 11, Thursday, November 5th, in StudentTracker by MIDNIGHT.

3 Tuesday (LECTURE 5): Database Design for Mere Mortals, Chapter 4
Thursday (LECTURE 6): The Language of SQL, Chapter 5: Sorting Data; Chapter 6: Column-Based Logic

4 Indexes, Aliases, Distinct

5 Indexes

6 As your database starts to grow, one part of the design becomes more and more important: creating and using indexes. An index in a database is like the index at the back of a textbook; it helps you find things in that book. Say I have a 500-page book on database management and I want the content dealing just with date columns. I can look up "date" in the index at the back, see that it's on page 124, and turn directly to that page. Indexes are all about speed of access. I could have found that content by going through the book page by page, so the index isn't adding any content. It's just helping me find it.

7 Indexes Here's the most essential idea: the data in your tables is inherently unordered. Sure, all the data in one row is kept together, but there's no particular enforced order to those rows unless you say otherwise. And yes, when we write queries, we can use the ORDER BY clause to impose a sequence on our results, but that's not the issue here. The issue is getting directly to certain content, like going straight to one specific row in a large table. You might think, well, hang on a second, didn't we take care of that by adding a primary key to our table? Well, yes and no. The primary key does let us uniquely identify a row, but that's a very different question from whether it lets us find that row really fast when it's somewhere in the middle of 500,000 other rows. That's the problem indexes fix.

8 Indexes The most basic index, the primary index on a table, is what's called the clustered index. You pick a column as the clustered index, and the database orders the data in that table based on that column, meaning that on the physical disk itself, where these rows are stored as bits and bytes, they're actually sequenced that way. A clustered index is like a residential phonebook, or White Pages, where the clustered index would be last name: everyone is listed in alphabetical order by last name. The actual data is structured and presented that way. You don't have to look in the back of the book to find out where the names beginning with D are. You know where they are: towards the front, between the Cs and the Es, because the data is ordered that way. The most common clustered index in any database table is the primary key column, whether that's an automatically generated integer, a unique string, or anything else. In fact, many database management systems, such as SQL Server and MySQL with the InnoDB storage engine, automatically make the primary key the clustered index when you define a new table. (MySQL's older MyISAM engine, used in the SQL file later in this lecture, does not cluster rows at all.)

9 Indexes That is usually the best option, because we already chose the primary key as a significant way to get to individual rows. But if you found yourself accessing the data primarily through another column, you might make that column the clustered index instead, in systems that allow it (SQL Server does; InnoDB always clusters on the primary key). Each table can have only one clustered index, the same way the names in a phonebook can be sorted only one way. What you can have more of is non-clustered (unclustered) indexes.
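To make the clustered-index idea concrete, here is a small sketch. It assumes SQLite (Python's built-in sqlite3 module) as a stand-in for MySQL: in SQLite, a column declared INTEGER PRIMARY KEY becomes the table's rowid, and the rows themselves live in a B-tree ordered by that key, which is roughly the role the slides give the clustered index. The Customers table and its rows are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL. An INTEGER PRIMARY KEY column is the
# table's rowid, and the table's rows are stored in a B-tree ordered by it,
# which plays the part of the clustered index described above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers ("
    " CustomerID INTEGER PRIMARY KEY,"
    " FirstName TEXT,"
    " LastName TEXT)"
)
# Hypothetical rows, deliberately inserted out of key order.
conn.executemany(
    "INSERT INTO Customers (CustomerID, FirstName, LastName) VALUES (?, ?, ?)",
    [(3, "Ana", "Smith"), (1, "Raj", "Patel"), (2, "Brenda", "Daniels")],
)

# A plain scan walks the key-ordered B-tree. Without ORDER BY this order is an
# implementation detail, not something SQL promises.
stored_order = [row[0] for row in conn.execute("SELECT CustomerID FROM Customers")]
print(stored_order)  # [1, 2, 3]

# Ask the query planner how it would find one row by primary key.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Customers WHERE CustomerID = 2"
).fetchall()
pk_detail = plan[0][3]  # column 3 of each plan row is the readable 'detail' text
print(pk_detail)  # exact wording varies by SQLite version; it mentions INTEGER PRIMARY KEY
```

The planner reports a direct key lookup rather than a scan, which is the "turn straight to the D section of the phonebook" behavior.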

10 Indexes An index in SQL is created on an existing table so that rows can be retrieved quickly. When there are thousands of records in a table, retrieving information without an index can take a long time, so indexes are created on columns that are accessed frequently. Indexes can be created on a single column or on a group of columns. Once an index exists on FirstName or LastName, your database management system's query optimizer can use it automatically in later SELECT queries that search on those columns; you never name the index in the query itself.
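A minimal sketch of creating a single-column and a composite index, again assuming SQLite as a stand-in for MySQL (the CREATE INDEX syntax shown is the same in both). The table name and the index names idx_customers_firstname and idx_customers_name are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; table and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers ("
    " CustomerID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT)"
)

# An index on a single column...
conn.execute("CREATE INDEX idx_customers_firstname ON Customers (FirstName)")
# ...and a composite index on a group of columns (LastName first, then FirstName).
conn.execute("CREATE INDEX idx_customers_name ON Customers (LastName, FirstName)")

# PRAGMA index_list returns one row per index on the table; column 1 is its name.
index_names = sorted(row[1] for row in conn.execute("PRAGMA index_list('Customers')"))
print(index_names)  # ['idx_customers_firstname', 'idx_customers_name']

# The SELECT never mentions an index; the planner decides whether one helps.
detail = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM Customers WHERE LastName = 'Smith' AND FirstName = 'Ana'"
).fetchall()[0][3]
print(detail)  # names one of the idx_customers_* indexes
```

Column order matters for a composite index: one on (LastName, FirstName) generally helps queries that filter on LastName, but usually not queries that filter on FirstName alone.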

11 Indexes Why not just run a query against LastName without an index? Because the database then has to do a full table scan, checking every row, and that's a very inefficient way to get to your data; the more rows you have, the worse it gets. It wouldn't be bad with a dozen or a few hundred rows, but once you get to thousands, tens of thousands, or hundreds of thousands, it's highly inefficient. If I have to run this kind of query on an unindexed column a lot, it's going to be a problem. Here's one way to fix it: create a secondary index, a non-clustered index. This is more like the index at the back of a textbook. It's created as a separate structure with its own meaningful order. In this case, imagine the index sorted by LastName ascending, which we can't do in the table itself, because the table is already sorted by CustomerID. We're basically creating a map to LastName. To find Smith, it's much easier to look in the index, because last names are ordered alphabetically there. The index gives us the location of that row in the table, and we can jump directly to that place. It's not quite as quick as using the clustered index, because we still have to read from one place and then jump to another, but it's much quicker than a full table scan.
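The before-and-after described above can be observed with the query planner. This sketch assumes SQLite as a stand-in for MySQL (MySQL's own tool for this is EXPLAIN), fills a hypothetical Customers table with 10,000 synthetic rows, and asks for the plan of the same LastName query with and without a non-clustered index. The helper plan_for is hypothetical.

```python
import sqlite3

def plan_for(conn, sql):
    """Join the 'detail' text of every EXPLAIN QUERY PLAN row for sql."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Assumption: SQLite stands in for MySQL; the data is synthetic.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers ("
    " CustomerID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT)"
)
# 10,000 rows; CustomerID is omitted so SQLite generates it.
conn.executemany(
    "INSERT INTO Customers (FirstName, LastName) VALUES (?, ?)",
    [(f"First{i}", f"Last{i % 1000}") for i in range(10_000)],
)

query = "SELECT * FROM Customers WHERE LastName = 'Last42'"

before = plan_for(conn, query)  # no index on LastName yet: full table scan
conn.execute("CREATE INDEX idx_customers_lastname ON Customers (LastName)")
after = plan_for(conn, query)   # same query, now answered through the secondary index

matches = conn.execute(query).fetchall()
print(before)        # e.g. SCAN Customers
print(after)         # e.g. SEARCH Customers USING INDEX idx_customers_lastname (LastName=?)
print(len(matches))  # 10
```

The query text never changes; only the plan does, from scanning every row to searching the index and then jumping to the matching rows.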

12 Indexes Why don't we just index everything? Why not add a non-clustered index to every column so that whatever query we write is nice and fast? Because every index has a cost. Indexes are a benefit when reading data, but a detriment when writing or changing it, because they must be maintained. Let's say I have a pretty typical employee table with two non-clustered indexes. The clustered index, the way the data is actually sorted, is based on EmployeeID. I also have a non-clustered index on LastName and another on FirstName. That means when we do a SELECT, I can go by EmployeeID, by FirstName, or by LastName, and all of those will be quite fast.

13 Indexes But when we write to this table, when we insert a row, we're not only writing data into the table itself. If the clustered index is on an automatically incrementing number, the new row just goes at the end, which is easy, but every insert also requires a write into every non-clustered index. If I add a new row for Brenda Daniels, I add it to the table, then I need to insert Daniels into the non-clustered index for LastName, pointing to her EmployeeID (her location in the table), and then do another insert to put Brenda into the non-clustered index for FirstName. If you have five non-clustered indexes as well as your clustered one, one insert that started as a single write operation is now six write operations, and you have made things significantly less efficient. So, Rule of Thumb: use non-clustered indexes sparingly and wisely.
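The write arithmetic above (one write for the table plus one per non-clustered index) can be sketched with a toy model. This is a hypothetical illustration, not how any real engine works: it uses a dict for the table and sorted Python lists for the indexes, and it ignores page-level details and write buffering that real databases use. The class name ToyTable is made up.

```python
from bisect import insort

class ToyTable:
    """Hypothetical model: rows keyed by EmployeeID (standing in for the
    clustered data) plus one sorted list per non-clustered index.
    Counts one logical write per structure an INSERT has to touch."""

    def __init__(self, indexed_columns):
        self.rows = {}
        # column name -> sorted list of (value, EmployeeID) pairs
        self.indexes = {col: [] for col in indexed_columns}
        self.writes = 0

    def insert(self, employee_id, row):
        self.rows[employee_id] = row
        self.writes += 1  # the write into the table itself
        for col, entries in self.indexes.items():
            # keep each index sorted, with an entry pointing back at the row's key
            insort(entries, (row[col], employee_id))
            self.writes += 1  # one extra write per non-clustered index


brenda = {"FirstName": "Brenda", "LastName": "Daniels", "Email": "bdaniels@example.com",
          "Phone": "555-0100", "HireDate": "2015-10-15"}

two_indexes = ToyTable(["LastName", "FirstName"])
two_indexes.insert(7, brenda)
print(two_indexes.writes)   # 3

five_indexes = ToyTable(["LastName", "FirstName", "Email", "Phone", "HireDate"])
five_indexes.insert(7, brenda)
print(five_indexes.writes)  # 6
```

With five non-clustered indexes, the single logical insert touches six structures, matching the count on the slide.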

14 Indexes


16 Aliases

17 You can give a table or a column another name by using an alias. This is useful when you have very long or complex table or column names. An alias can be anything, but it is usually short.
SQL Alias Syntax for Tables:
SELECT column_name(s) FROM table_name AS alias_name;
-or- (the AS keyword is optional)
SELECT t.column_name FROM table_name t;
SQL Alias Syntax for Columns:
SELECT column_name AS alias_name FROM table_name;
http://www.w3schools.com/sql/sql_alias.asp
http://beginner-sql-tutorial.com/sql-aliases.htm
http://www.tutorialspoint.com/sql/sql-alias-syntax.htm
http://www.1keydata.com/sql/sqlalias.html
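Table aliases pay off most in joins, where every column reference needs a table qualifier. A short sketch, assuming SQLite as a stand-in for MySQL; the Authors and Books tables and their rows are hypothetical. It runs the same join with and without the AS keyword.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; tables and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Authors (AuthorID INTEGER PRIMARY KEY, LastName TEXT, StateName TEXT);
CREATE TABLE Books (BookID INTEGER PRIMARY KEY, AuthorID INTEGER, Title TEXT);
INSERT INTO Authors VALUES (1, 'Adams', 'WA'), (2, 'Baker', 'OR');
INSERT INTO Books VALUES (10, 1, 'Learning SQL'), (11, 2, 'Designing Tables');
""")

# Table aliases with AS: a and b are short names for Authors and Books.
with_as = conn.execute(
    "SELECT a.LastName, b.Title "
    "FROM Authors AS a JOIN Books AS b ON b.AuthorID = a.AuthorID "
    "ORDER BY b.BookID"
).fetchall()

# The same query with AS omitted.
without_as = conn.execute(
    "SELECT a.LastName, b.Title "
    "FROM Authors a JOIN Books b ON b.AuthorID = a.AuthorID "
    "ORDER BY b.BookID"
).fetchall()

print(with_as)  # [('Adams', 'Learning SQL'), ('Baker', 'Designing Tables')]
```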

18 Aliases Example of aliases for columns:
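A sketch of column aliases, assuming SQLite as a stand-in for MySQL and a hypothetical Authors table. The alias becomes the column heading in the result set, which the sketch reads back from the cursor's description. An alias containing a space must be quoted; SQLite uses double quotes for that, and MySQL also accepts backticks.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the table and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Authors (AuthorID INTEGER PRIMARY KEY, LastName TEXT, StateName TEXT);
INSERT INTO Authors VALUES (1, 'Adams', 'WA'), (2, 'Baker', 'OR'), (3, 'Chen', 'WA');
""")

# Column aliases rename the result columns; quote an alias that contains a space.
cur = conn.execute(
    'SELECT LastName AS Author, StateName AS "Home State" FROM Authors ORDER BY AuthorID'
)
headers = [col[0] for col in cur.description]  # first field of each entry is the name
rows = cur.fetchall()
print(headers)  # ['Author', 'Home State']

# Aliases are especially handy for computed columns such as aggregates.
count_cur = conn.execute("SELECT COUNT(*) AS AuthorCount FROM Authors")
count_headers = [col[0] for col in count_cur.description]
count_row = count_cur.fetchone()
print(count_headers, count_row)  # ['AuthorCount'] (3,)
```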

19 Distinct

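20 You can eliminate duplicate values from multiple rows using the DISTINCT keyword. Here are two examples:
SELECT DISTINCT StateName FROM Authors;
SELECT COUNT(DISTINCT StateName) FROM Authors;
The first returns each state name once; the second counts how many different (non-NULL) state names there are. Write COUNT with no space before the opening parenthesis: by default, MySQL reports a syntax error for COUNT (...) with a space.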
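Both DISTINCT queries can be tried on a small table. This sketch assumes SQLite as a stand-in for MySQL and uses hypothetical Authors rows, including one author with no state (NULL) to show that DISTINCT keeps NULL as a value of its own while COUNT(DISTINCT ...) skips it.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Authors (AuthorID INTEGER PRIMARY KEY, LastName TEXT, StateName TEXT)")
# WA appears twice, and one author has no state recorded (NULL).
conn.executemany(
    "INSERT INTO Authors (LastName, StateName) VALUES (?, ?)",
    [("Adams", "WA"), ("Baker", "WA"), ("Chen", "OR"), ("Diaz", "CA"), ("Evans", None)],
)

# DISTINCT collapses duplicates; NULL appears once and sorts first in ascending order.
states = [row[0] for row in conn.execute(
    "SELECT DISTINCT StateName FROM Authors ORDER BY StateName")]
print(states)  # [None, 'CA', 'OR', 'WA']

# COUNT(DISTINCT col) counts distinct non-NULL values only.
(distinct_count,) = conn.execute("SELECT COUNT(DISTINCT StateName) FROM Authors").fetchone()
(total_rows,) = conn.execute("SELECT COUNT(*) FROM Authors").fetchone()
print(distinct_count, total_rows)  # 3 5
```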

21 A Look at SQL File http://faculty.cascadia.edu/cduckett/bit275/lecture_03/world-mysql.sql.txt

22 A Look at SQL File: Comments http://dev.mysql.com/doc/refman/5.7/en/comments.html
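A quick sketch of SQL comment styles in a script. It assumes SQLite as a stand-in for MySQL, so it shows only the two styles both systems accept: `--` to end of line (MySQL requires a space after the two dashes) and `/* ... */` blocks. MySQL also accepts `#` to end of line, which SQLite rejects, so it is left out. The Country table here is a hypothetical fragment.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; only comment styles both accept are used.
script = """
-- A single-line comment: '--' plus a space, running to the end of the line
/* A multi-line comment
   can span several lines */
CREATE TABLE Country (Code TEXT PRIMARY KEY, Name TEXT); -- trailing comment
"""
conn = sqlite3.connect(":memory:")
conn.executescript(script)  # comments are ignored; only the CREATE TABLE runs

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)  # ['Country']
```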

23 A Look at SQL File (continued)
DROP DATABASE (optional): cleans out an old copy of the database, if it exists, before it is re-created
CREATE DATABASE
USE
http://dev.mysql.com/doc/refman/5.7/en/drop-database.html
http://dev.mysql.com/doc/refman/5.7/en/create-database.html
http://dev.mysql.com/doc/refman/5.7/en/charset-database.html
https://dev.mysql.com/doc/refman/5.7/en/use.html

24 A Look at SQL File (continued)
DROP TABLE
CREATE TABLE
ENGINE=MyISAM: "My" + ISAM (Indexed Sequential Access Method), the default storage engine before MySQL 5.5
CHARSET=utf8: UTF-8 (Universal Coded Character Set Transformation Format, 8-bit)
http://dev.mysql.com/doc/refman/5.7/en/drop-table.html
http://dev.mysql.com/doc/refman/5.7/en/create-table.html
https://dev.mysql.com/doc/refman/5.7/en/myisam-storage-engine.html
http://www.rackspace.com/knowledge_center/article/mysql-engines-myisam-vs-innodb
http://en.wikipedia.org/wiki/InnoDB
http://www.w3schools.com/charsets/ref_html_utf8.asp
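The DROP TABLE / CREATE TABLE pairing is what makes a setup script safe to run more than once. A sketch, assuming SQLite as a stand-in for MySQL, which means the MySQL-only table options ENGINE=MyISAM and DEFAULT CHARSET=utf8 are omitted. The columns are modeled on the world database's City table, but the exact types here are assumptions.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL, so ENGINE=... and CHARSET=... are
# omitted. Columns are modeled on the world database's City table.
SETUP = """
DROP TABLE IF EXISTS City;
CREATE TABLE City (
  ID INTEGER PRIMARY KEY,
  Name TEXT NOT NULL DEFAULT '',
  CountryCode TEXT NOT NULL DEFAULT '',
  District TEXT NOT NULL DEFAULT '',
  Population INTEGER NOT NULL DEFAULT 0
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
conn.executescript(SETUP)  # re-running is safe: the old table is dropped first

# Without the DROP, creating the same table a second time fails.
try:
    conn.execute("CREATE TABLE City (ID INTEGER PRIMARY KEY)")
    rerun_failed = False
except sqlite3.OperationalError:  # "table City already exists"
    rerun_failed = True
print(rerun_failed)  # True

# PRAGMA table_info lists the columns; field 1 of each row is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(City)")]
print(columns)  # ['ID', 'Name', 'CountryCode', 'District', 'Population']
```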

25 A Look at SQL File (continued)
INSERT ... VALUES http://dev.mysql.com/doc/refman/5.7/en/insert.html
Could have used NULL instead of hard-coding the primary key, provided the key column is declared AUTO_INCREMENT, so that MySQL generates the value
https://dev.mysql.com/doc/refman/5.0/en/string-types.html
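A sketch of the "use NULL instead of hard-coding the key" tip, assuming SQLite as a stand-in for MySQL. In SQLite, inserting NULL into an INTEGER PRIMARY KEY column makes the database choose the next key, much as NULL into an AUTO_INCREMENT column does in MySQL. The City table is a trimmed, hypothetical version and the rows are sample values.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL. NULL into an INTEGER PRIMARY KEY column
# lets SQLite pick the next key, similar to NULL into a MySQL AUTO_INCREMENT column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE City (ID INTEGER PRIMARY KEY, Name TEXT, CountryCode TEXT)")

conn.execute("INSERT INTO City VALUES (1, 'Kabul', 'AFG')")              # key hard-coded
cur = conn.execute("INSERT INTO City VALUES (NULL, 'Qandahar', 'AFG')")  # key generated
generated_id = cur.lastrowid
print(generated_id)  # 2

all_rows = conn.execute("SELECT * FROM City ORDER BY ID").fetchall()
print(all_rows)  # [(1, 'Kabul', 'AFG'), (2, 'Qandahar', 'AFG')]
```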

26 ICE 06
Join together tables in the World database (from world-mysql.sql.txt) to create SQL queries that will generate the following result sets:
The list of all of the languages spoken in each country
A count of all the languages that might be spoken in each country
A count of all the languages that might be spoken within each city
Each region and a count of the different countries in each region
Each region and total population of the different countries in each region
Each region and total number of languages spoken in each region
Do the same for continents instead of regions.
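As a starting point for the second ICE item, here is a sketch of the join-and-count pattern. It assumes SQLite as a stand-in for MySQL and assumes the column names Country.Code, Country.Name, CountryLanguage.CountryCode, and CountryLanguage.Language from the MySQL world sample; the rows are hypothetical placeholders, not real world-database data.

```python
import sqlite3

# Assumptions: SQLite stands in for MySQL; column names follow the MySQL world
# sample; the rows are hypothetical placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Country (Code TEXT PRIMARY KEY, Name TEXT, Region TEXT);
CREATE TABLE CountryLanguage (CountryCode TEXT, Language TEXT,
                              PRIMARY KEY (CountryCode, Language));
INSERT INTO Country VALUES ('AAA', 'Country A', 'Region 1'), ('BBB', 'Country B', 'Region 1');
INSERT INTO CountryLanguage VALUES ('AAA', 'Language X'), ('AAA', 'Language Y'),
                                   ('BBB', 'Language X');
""")

# Join each language row to its country, then count the language rows per country.
language_counts = conn.execute("""
    SELECT c.Name, COUNT(cl.Language) AS LanguageCount
    FROM Country AS c
    JOIN CountryLanguage AS cl ON cl.CountryCode = c.Code
    GROUP BY c.Code, c.Name
    ORDER BY c.Name
""").fetchall()
print(language_counts)  # [('Country A', 2), ('Country B', 1)]
```

With an inner JOIN, countries that have no CountryLanguage rows drop out of the result; a LEFT JOIN would keep them with a count of 0.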

