Programming Techniques :: Searching Data

Presentation on theme: "Programming Techniques :: Searching Data"— Presentation transcript:

1 Programming Techniques :: Searching Data
Last modified: 25th June 2019

2 www.drfrostmaths.com
Everything is completely free. Why not register? Registering on the DrFrostMaths platform allows you to save all the code and progress in the various Computer Science mini-tasks; your code on any mini-task will be preserved. It also gives you access to the maths platform, allowing you to practise GCSE and A Level questions from Edexcel, OCR and AQA.
Note: The Tiffin/DFM Computer Science course uses JavaScript as its core language. Most code examples are therefore in JavaScript.
Using these slides: Green question boxes can be clicked while in Presentation mode to reveal their answers. Slides are intentionally designed to double up as revision notes for students, while being optimised for classroom usage. The Mini-Tasks on the DFM platform are purposely ordered to correspond to these slides, giving you flexibility over your lesson structure.

3 Databases
DrFrostMaths runs off a database. A database consists of a number of tables. Each table consists of a number of fields of various types. For example, sid is the school identifying number (an integer), the name is a string, and so on.
Each row represents an entry in the table, in this case an individual school. We previously saw that this is an example of a record.

4 SQL
SQL (Structured Query Language) is a language to search, edit, insert and delete data from databases. It is the global standard for querying databases.
Note: SQL is just the language to query the database: it is not the database itself or the software used to store/manage the data, which is known as a Relational Database Management System (RDBMS). There are many different well-known RDBMSs, including MySQL (which is what DrFrostMaths uses), SQL Server and Oracle.
The most basic syntax for a SQL query is as follows:
SELECT fields FROM table
For example:
SELECT sid, name, domain, phase, totalpoints FROM drfrostmaths_school
SELECT means we’re retrieving data. We’d use UPDATE for modifying data and DELETE for deleting data. A NULL value means there is no value for this field for this row.

5 Special Fields SELECT * FROM drfrostmaths_school
* is a special ‘wildcard’ character, which matches all fields.
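
To try the two basic forms of SELECT without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. The three rows are made up for illustration and the table is only a cut-down stand-in for the real drfrostmaths_school; ORDER BY (covered on a later slide) is used purely to make the row order predictable.

```python
import sqlite3

# Hypothetical stand-in for drfrostmaths_school: SQLite in memory, made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE drfrostmaths_school "
    "(sid INTEGER, name TEXT, phase TEXT, totalpoints INTEGER)"
)
conn.executemany(
    "INSERT INTO drfrostmaths_school VALUES (?, ?, ?, ?)",
    [
        (1, "Hogwarts", "Secondary", 12000),
        (2, "Tiffin School", "Secondary", 30000),
        (3, "Evil High", None, 500),  # Python's None is stored as NULL
    ],
)

# SELECT fields FROM table: only the named fields come back.
named_fields = conn.execute(
    "SELECT sid, name FROM drfrostmaths_school ORDER BY sid"
).fetchall()

# SELECT *: the wildcard matches every field.
all_fields = conn.execute(
    "SELECT * FROM drfrostmaths_school ORDER BY sid"
).fetchall()

print(named_fields)
print(all_fields[2])  # the NULL phase comes back as None
```

Each row is returned as a tuple, with one entry per selected field.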

6 Aggregate Functions
(technically not in the GCSE syllabus, but super important)
SELECT COUNT(sid) FROM drfrostmaths_school
COUNT(sid) gives the count of the number of values (not necessarily distinct) in the sid column. Aggregate functions, such as COUNT, produce a table of just one row, in this case containing your count.
Note that COUNT(field) doesn’t include NULL values. The phase field had some NULL values (see previous slide), so COUNT(phase) gives a smaller count than COUNT(sid). COUNT(*) gives a count of all the rows in the table, whether or not they contain NULL values.
SELECT SUM(totalpoints), AVG(totalpoints) FROM drfrostmaths_school
Other aggregate functions include SUM (the total), AVG (the mean) and MAX (the maximum). The totalpoints field is the total number of points each school has, so SUM(totalpoints) gives the total points of all schools.
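
The way COUNT treats NULL values is easiest to see on a tiny table. This is a sketch only, assuming three made-up rows in an in-memory SQLite database (via Python's sqlite3 module) rather than the real MySQL table.

```python
import sqlite3

# Made-up rows; the second school has no phase recorded (NULL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE drfrostmaths_school (sid INTEGER, phase TEXT, totalpoints INTEGER)"
)
conn.executemany(
    "INSERT INTO drfrostmaths_school VALUES (?, ?, ?)",
    [(1, "Primary", 100), (2, None, 200), (3, "Secondary", 300)],
)

# Each aggregate query produces exactly one row, so fetchone() is enough.
count_sid, count_phase, count_all = conn.execute(
    "SELECT COUNT(sid), COUNT(phase), COUNT(*) FROM drfrostmaths_school"
).fetchone()

total, mean, highest = conn.execute(
    "SELECT SUM(totalpoints), AVG(totalpoints), MAX(totalpoints) "
    "FROM drfrostmaths_school"
).fetchone()

print(count_sid, count_phase, count_all)  # COUNT(phase) skips the NULL
print(total, mean, highest)
```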

7 Aggregate Functions – Advanced Notes
(This really absolutely is not in the GCSE syllabus!)
SELECT SUM(totalpoints) FROM drfrostmaths_school
SUM(totalpoints) isn’t one of the original fields of our table, but a field we’ve created (based on an existing field), so it doesn’t have a field name.
SELECT SUM(totalpoints) AS tot FROM drfrostmaths_school
We can use the AS keyword to give new fields (usually aggregate functions) a name, known as an alias. This might seem useless on its own, but is useful if we want to use our table as a subquery within a more complicated query, and need to refer to this field.

8 Aggregate Functions – Advanced Notes
(This really absolutely is not in the GCSE syllabus!)
You may wonder what happens if you combine an aggregate function with a normal field (here sid). The sid field has one value per row of the original table (-10, 1, 2, 3, 4 and so on), whereas SUM(totalpoints) is a single value. The fields do not have the same number of rows, so what would happen if we selected sid and SUM(totalpoints)?
SELECT SUM(totalpoints), sid FROM drfrostmaths_school
The answer is: nothing interesting. Due to the inconsistent number of rows, the overall table is restricted to one row, and sid is just restricted to a single value for that field (here the value in the first row). This is MySQL’s behaviour in its more permissive mode; with its ONLY_FULL_GROUP_BY setting enabled, and in many other database systems, the query is rejected with an error instead.
You shouldn’t in general combine aggregate functions with normal fields. This wouldn’t matter so much if all the sid values were the same value, and this indeed will happen when we look at the GROUP BY keywords later.

9 Distinct Values
(again, not in GCSE syllabus)
SELECT phase FROM drfrostmaths_school
Selecting a field gives the values in all rows for that field, including any duplicate values!
SELECT DISTINCT phase FROM drfrostmaths_school
Putting DISTINCT before a field name restricts it to only distinct (i.e. non-duplicated) values.
Note: DISTINCT applies to the whole row. If you select more than one field, a row is only removed when it duplicates another row in every selected field, so the values for each field in each row stay lined up.
SELECT COUNT(DISTINCT country) FROM drfrostmaths_school
We can combine DISTINCT with aggregate functions. The above query counts the number of distinct values in the country field.
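
A quick way to check what DISTINCT does to a single field is the sketch below. It assumes four made-up schools in an in-memory SQLite table (Python's sqlite3 module), not the real data.

```python
import sqlite3

# Made-up rows: the phases and countries deliberately repeat.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE drfrostmaths_school (sid INTEGER, phase TEXT, country TEXT)"
)
conn.executemany(
    "INSERT INTO drfrostmaths_school VALUES (?, ?, ?)",
    [
        (1, "Primary", "England"),
        (2, "Secondary", "England"),
        (3, "Secondary", "France"),
        (4, "Secondary", "England"),
    ],
)

# Without DISTINCT we get one value per row, duplicates included.
all_phases = conn.execute(
    "SELECT phase FROM drfrostmaths_school ORDER BY sid"
).fetchall()

# With DISTINCT each phase appears once.
distinct_phases = conn.execute(
    "SELECT DISTINCT phase FROM drfrostmaths_school ORDER BY phase"
).fetchall()

# DISTINCT inside an aggregate: how many different countries are there?
(country_count,) = conn.execute(
    "SELECT COUNT(DISTINCT country) FROM drfrostmaths_school"
).fetchone()

print(len(all_phases), distinct_phases, country_count)
```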

10 Conditions using WHERE
SELECT name, totalpoints FROM drfrostmaths_school WHERE totalpoints > 10000 Using the WHERE keyword, followed by a condition, allows us to restrict our new table to rows which satisfy that condition.

11 Multiple conditions
SELECT name, totalpoints FROM drfrostmaths_school WHERE totalpoints > … AND phase = 'Primary'
We can have multiple conditions which are combined with AND or OR, in exactly the same manner we would combine Boolean conditions in normal programming languages.
SELECT SUM(totalpoints), COUNT(*) FROM drfrostmaths_school WHERE country != 'England' AND totalpoints > 0
Use != or <> for “not equal to”. We could also use: NOT(country = 'England')
We can combine aggregate functions like SUM and COUNT with conditions. This query gets the total points earned by, and the number of, schools outside of England which have scored more than 0 points.
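
The following sketch runs both kinds of query against four made-up schools, using Python's sqlite3 module in place of MySQL. The threshold of 10000 in the first query and the school "Little Acorns" are assumptions chosen for the example.

```python
import sqlite3

# Made-up rows for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE drfrostmaths_school "
    "(name TEXT, country TEXT, phase TEXT, totalpoints INTEGER)"
)
conn.executemany(
    "INSERT INTO drfrostmaths_school VALUES (?, ?, ?, ?)",
    [
        ("Tiffin School", "England", "Secondary", 30000),
        ("Little Acorns", "England", "Primary", 15000),
        ("Beauxbatons Academy", "France", "Secondary", 4000),
        ("Durmstrang", "Bulgaria", "Secondary", 0),
    ],
)

# Both conditions must hold for a row to be kept.
big_primaries = conn.execute(
    "SELECT name, totalpoints FROM drfrostmaths_school "
    "WHERE totalpoints > 10000 AND phase = 'Primary'"
).fetchall()

# Conditions filter the rows first; the aggregates then run on what is left.
points_abroad, schools_abroad = conn.execute(
    "SELECT SUM(totalpoints), COUNT(*) FROM drfrostmaths_school "
    "WHERE country != 'England' AND totalpoints > 0"
).fetchone()

print(big_primaries)
print(points_abroad, schools_abroad)
```

Durmstrang is outside England but has 0 points, so it fails the second condition and is left out of both the sum and the count.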

12 Sorting Results (weirdly, ordering results is not in the GCSE syllabus) SELECT name, totalpoints FROM drfrostmaths_school ORDER BY totalpoints DESC We specify the field we want to sort on, and whether to put in ascending order (ASC) or descending order (DESC).
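
As a small sketch of both sort directions, assuming three made-up schools in an in-memory SQLite table (Python's sqlite3 module):

```python
import sqlite3

# Made-up rows, inserted in no particular order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drfrostmaths_school (name TEXT, totalpoints INTEGER)")
conn.executemany(
    "INSERT INTO drfrostmaths_school VALUES (?, ?)",
    [("Hogwarts", 12000), ("Tiffin School", 30000), ("Evil High", 500)],
)

# Highest scorers first.
by_points_desc = conn.execute(
    "SELECT name, totalpoints FROM drfrostmaths_school ORDER BY totalpoints DESC"
).fetchall()

# Alphabetical by name.
by_name_asc = conn.execute(
    "SELECT name FROM drfrostmaths_school ORDER BY name ASC"
).fetchall()

print(by_points_desc)
print(by_name_asc)
```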

13 Test Your Understanding So Far
(The slide lists all fields in drfrostmaths_school.) Write queries to get results for the following:
1 The postcodes of all schools in Wales.
SELECT postcode FROM drfrostmaths_school WHERE country = 'Wales'
2 The average age of students leaving the school.
SELECT AVG(maxage) FROM drfrostmaths_school
3 All fields for all schools scoring between 1000 and 2000 points.
SELECT * FROM drfrostmaths_school WHERE totalpoints >= 1000 AND totalpoints <= 2000
4 The number of all-girl primary schools. There are just 3, vs … mixed!
SELECT COUNT(*) FROM drfrostmaths_school WHERE gender = 'Girls' AND phase = 'Primary'

14 Using % within WHERE (This *is* in the GCSE syllabus)
Sometimes you might want to match rows where the value ‘starts with’ a string or ‘ends with’ a string, or contains a string somewhere within it. For example, when registering on DrFrostMaths, there is an ‘autocomplete’ facility when entering the name of your school:
SELECT name, sid, thumb FROM drfrostmaths_school WHERE name LIKE '%Tiff%'
% indicates “0 or more of any character”. So %Tiff% means ‘any value where ‘Tiff’ is potentially surrounded by some characters’. In other words, any value containing “Tiff”!
If you use any wildcard symbols such as %, you need to use the keyword LIKE. In MySQL (with its default collation), matching with LIKE is case-insensitive. For example “tiffin” would match.
Advanced: You can use the ‘_’ symbol for “any single character” (whereas % means ‘0 or more’).
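
The wildcard patterns can be tried out with the sketch below, which assumes four made-up school names in an in-memory SQLite table (Python's sqlite3 module). SQLite's LIKE happens to be case-insensitive for ordinary English letters by default; other database systems may behave differently.

```python
import sqlite3

# Made-up school names for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drfrostmaths_school (name TEXT)")
conn.executemany(
    "INSERT INTO drfrostmaths_school VALUES (?)",
    [("Tiffin School",), ("Hogwarts",), ("Kingston Academy",), ("Grange Hill",)],
)

def names_like(pattern):
    """Return the names matching a LIKE pattern, in alphabetical order."""
    rows = conn.execute(
        "SELECT name FROM drfrostmaths_school WHERE name LIKE ? ORDER BY name",
        (pattern,),
    )
    return [name for (name,) in rows]

contains_tiff = names_like("%Tiff%")      # 'Tiff' anywhere in the name
lowercase_tiff = names_like("%tiff%")     # same match here: case is ignored
ends_academy = names_like("% Academy")    # ends with the word 'Academy'
second_letter_i = names_like("_i%")       # any one character, then 'i'

print(contains_tiff, lowercase_tiff, ends_academy, second_letter_i)
```

Passing the pattern as a parameter (the ? placeholder) rather than pasting it into the query string is the usual way to handle text typed in by a user, such as an autocomplete box.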

15 Test Your Understanding
Describe what the following queries will do:
1 SELECT name FROM drfrostmaths_school WHERE name LIKE 'T%'
All schools whose name begins with the letter T.
2 ... WHERE name LIKE 'A%a'
All schools whose name starts and ends with the letter A.
3 ... WHERE name LIKE '% Academy'
All schools whose name ends with the word ‘Academy’.
N ... WHERE name LIKE '_i%'
All schools whose second letter is ‘i’.
N ... WHERE name LIKE '_%_'
All schools whose name is at least 2 characters long.

16 Advanced :: GROUP BY (Not in GCSE syllabus)
Sometimes you might want to use aggregate functions like SUM and COUNT, but where the rows are first grouped in some way. For example, we might want to find the number of schools by country.
SELECT country, COUNT(*) FROM drfrostmaths_school GROUP BY country
Original table:
Name | Country
Hogwarts | England
Tiffin School | England
Beauxbatons Academy | France
La Baguette Ecole | France
British School of Paris | France
Durmstrang | Bulgaria
The table is first split by whatever field we’re grouping by. The fields are then selected (including any aggregate functions) for each table:
Name | Country
Hogwarts | England
Tiffin School | England
gives
Country | COUNT(*)
England | 2
Name | Country
Beauxbatons Academy | France
La Baguette Ecole | France
British School of Paris | France
gives
Country | COUNT(*)
France | 3
Name | Country
Durmstrang | Bulgaria
gives
Country | COUNT(*)
Bulgaria | 1
The tables are then put back together:
Country | COUNT(*)
England | 2
France | 3
Bulgaria | 1
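
The six schools from this slide can be grouped for real with the sketch below. It uses Python's sqlite3 module as a stand-in for MySQL, and adds an ORDER BY (an assumption, not part of the slide's query) so that the groups come back in a predictable order.

```python
import sqlite3

# The six example schools from the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drfrostmaths_school (name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO drfrostmaths_school VALUES (?, ?)",
    [
        ("Hogwarts", "England"),
        ("Tiffin School", "England"),
        ("Beauxbatons Academy", "France"),
        ("La Baguette Ecole", "France"),
        ("British School of Paris", "France"),
        ("Durmstrang", "Bulgaria"),
    ],
)

# One output row per country, with COUNT(*) computed inside each group.
schools_per_country = conn.execute(
    "SELECT country, COUNT(*) FROM drfrostmaths_school "
    "GROUP BY country ORDER BY country"
).fetchall()

print(schools_per_country)
```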

17 Further Examples
Q “Get the total number of points for schools by gender”
SELECT gender, SUM(totalpoints) FROM drfrostmaths_school GROUP BY gender
Q “Get the most points earned for a school by town, restricted to English towns beginning with the letter ‘K’”
SELECT town, MAX(totalpoints), COUNT(*) FROM drfrostmaths_school WHERE country = 'England' AND town LIKE 'K%' GROUP BY town

18 Advanced :: Joins (Not in GCSE syllabus)
Sometimes two database tables might have a field in common. For example drfrostmaths_user, containing all user accounts, has a field sid with the id number of the student’s school. drfrostmaths_school also has this field!
drfrostmaths_user:
uid | firstname | surname | sid
3011 | Jamie | Oliver | 204
3012 | Jamie | Theakston | 203
3013 | Freddie | Kruger | 37
drfrostmaths_school:
sid | name
37 | Evil High
203 | Grange Hill
204 | Hogwarts
“Produce a table of rows with the name of each student and the name of the school they attend.”
SELECT firstname, surname, name FROM drfrostmaths_school s INNER JOIN drfrostmaths_user u ON u.sid = s.sid
firstname | surname | name
Jamie | Oliver | Hogwarts
Jamie | Theakston | Grange Hill
Freddie | Kruger | Evil High
INNER JOIN joins the first table to the second, and merges the rows using sid as the common column. Note we’ve named the tables s and u (known as aliases), so that we can be explicit about which table we’re using the sid field from.
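
The join on this slide can be reproduced with the example rows given above. This is a sketch using Python's sqlite3 module rather than MySQL; the ORDER BY u.uid is an addition so that the rows come back in a fixed order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE drfrostmaths_user "
    "(uid INTEGER, firstname TEXT, surname TEXT, sid INTEGER)"
)
conn.execute("CREATE TABLE drfrostmaths_school (sid INTEGER, name TEXT)")

# Example rows taken from the slide.
conn.executemany(
    "INSERT INTO drfrostmaths_user VALUES (?, ?, ?, ?)",
    [
        (3011, "Jamie", "Oliver", 204),
        (3012, "Jamie", "Theakston", 203),
        (3013, "Freddie", "Kruger", 37),
    ],
)
conn.executemany(
    "INSERT INTO drfrostmaths_school VALUES (?, ?)",
    [(37, "Evil High"), (203, "Grange Hill"), (204, "Hogwarts")],
)

# Each user row is merged with the school row that has the same sid.
students_with_schools = conn.execute(
    "SELECT firstname, surname, name "
    "FROM drfrostmaths_school s "
    "INNER JOIN drfrostmaths_user u ON u.sid = s.sid "
    "ORDER BY u.uid"
).fetchall()

print(students_with_schools)
```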

19 Advanced :: Joins (Not in GCSE syllabus)
“Find the names of any schools which have a student called Jamie.”
SELECT DISTINCT s.name FROM drfrostmaths_school s INNER JOIN drfrostmaths_user u ON u.sid = s.sid WHERE u.firstname = 'Jamie'
Note that without the DISTINCT keyword, if Abingdon School for example had 5 students called Jamie, “Abingdon School” would appear 5 times in our table. DISTINCT ensures each name only appears once.

20 Advanced :: Joins (Not in GCSE syllabus)
Table A INNER JOIN Table B ON A.b = B.b
Table A:
a | b
5 | ‘cat’
7 | ‘dog’
11 | ‘rabbit’
Table B:
b | c
‘cat’ | 0
‘cat’ | 1
‘rabbit’ | 2
‘rabbit’ | 3
‘rabbit’ | 4
‘dog’ | 5
Result:
a | b | c
5 | ‘cat’ | 0
5 | ‘cat’ | 1
7 | ‘dog’ | 5
11 | ‘rabbit’ | 2
11 | ‘rabbit’ | 3
11 | ‘rabbit’ | 4
INNER JOIN works by giving all possible combinations of rows in table A merged with rows in table B, subject to them having the same value for the field we’re joining on, as per this example. While for example the 3rd row in table A can ‘join’ with three different rows in table B, in practice we’d usually only expect one thing to match in the second table.
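
To confirm the ‘all matching combinations’ behaviour, the sketch below loads tables A and B from this slide into an in-memory SQLite database (Python's sqlite3 module) and joins them. The ORDER BY is an addition to fix the row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (a INTEGER, b TEXT)")
conn.execute("CREATE TABLE B (b TEXT, c INTEGER)")

# Rows from the slide.
conn.executemany(
    "INSERT INTO A VALUES (?, ?)", [(5, "cat"), (7, "dog"), (11, "rabbit")]
)
conn.executemany(
    "INSERT INTO B VALUES (?, ?)",
    [("cat", 0), ("cat", 1), ("rabbit", 2), ("rabbit", 3), ("rabbit", 4), ("dog", 5)],
)

# Every row of A is paired with every row of B that shares its b value.
joined = conn.execute(
    "SELECT A.a, A.b, B.c FROM A INNER JOIN B ON A.b = B.b ORDER BY A.a, B.c"
).fetchall()

for row in joined:
    print(row)
```

The ‘rabbit’ row of A appears three times in the result because three rows of B match it.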

21 Super Advanced :: Nested Queries
“Find the names of all schools with more than 1000 registered users.”
Initially, let’s group users in the main user table by school, and get the count within each school:
SELECT sid, COUNT(*) FROM drfrostmaths_user GROUP BY sid
We can use a join to also get the names of these schools:
SELECT u.sid, COUNT(*) AS cnt, s.name FROM drfrostmaths_user u INNER JOIN drfrostmaths_school s ON u.sid = s.sid GROUP BY u.sid
Unfortunately, because SQL evaluates the WHERE clause before the SELECT, we can’t just plop “WHERE cnt > 1000” near the end of our query (as the alias cnt won’t have been declared yet). We instead have to nest the table outputted by the above query within another SELECT statement!
SELECT name, cnt FROM (SELECT u.sid, COUNT(*) AS cnt, s.name FROM drfrostmaths_user u INNER JOIN drfrostmaths_school s ON u.sid = s.sid GROUP BY u.sid) s2 WHERE s2.cnt > 1000
* Note that any nested table must be given an alias, in this case s2.
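
The nested query has the same shape on a tiny data set. The sketch below assumes six made-up users spread over three schools, uses Python's sqlite3 module instead of MySQL, and lowers the threshold from 1000 to 2 so that the example stays small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drfrostmaths_school (sid INTEGER, name TEXT)")
conn.execute("CREATE TABLE drfrostmaths_user (uid INTEGER, sid INTEGER)")
conn.executemany(
    "INSERT INTO drfrostmaths_school VALUES (?, ?)",
    [(37, "Evil High"), (203, "Grange Hill"), (204, "Hogwarts")],
)
# Made-up users: three at Hogwarts, two at Grange Hill, one at Evil High.
conn.executemany(
    "INSERT INTO drfrostmaths_user VALUES (?, ?)",
    [(1, 204), (2, 204), (3, 204), (4, 203), (5, 203), (6, 37)],
)

# The inner query builds one row per school with its user count (alias cnt);
# the outer query can then filter on cnt because the inner table already exists.
popular_schools = conn.execute(
    "SELECT name, cnt FROM ("
    "  SELECT u.sid, COUNT(*) AS cnt, s.name"
    "  FROM drfrostmaths_user u"
    "  INNER JOIN drfrostmaths_school s ON u.sid = s.sid"
    "  GROUP BY u.sid"
    ") s2 WHERE s2.cnt > 2"
).fetchall()

print(popular_schools)
```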

22 Advanced :: Database Efficiency
Databases can get very large. The table on DrFrostMaths which stores students’ answers for example is 27 million rows! Even simple queries like:
SELECT * FROM drfrostmaths_useranswer WHERE uid = '19483'
(to find all the answers by that student) would take a long time if we had to check all 27 million rows…
The solution is to create ‘indexes’ on particular fields of a table. These are separate (usually tree-like) data structures which efficiently store the values for that field, allowing algorithms such as binary search (which we will cover separately).
(Diagram: a tree-like index on uid, with node values such as 200, 400 and 600 leading down to individual rows, e.g. Row 274 and Row 276.)
It’s possible to have an index of multiple columns. For example, I often have to make points calculations for a particular school in a particular time range. There’s therefore an index on the school id (sid) combined with the date the question was answered (dateanswered).
(Screenshot of the table’s indexes.) This tells us the indexes use BTREEs as their data structure.
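
One way to see an index being picked up is to ask the database how it plans to run a query. The sketch below is an illustration only: it uses SQLite (via Python's sqlite3 module) and its EXPLAIN QUERY PLAN statement rather than MySQL, the table is empty, the column list is invented, and idx_useranswer_uid is a hypothetical index name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down version of the answers table.
conn.execute(
    "CREATE TABLE drfrostmaths_useranswer "
    "(uid INTEGER, sid INTEGER, dateanswered TEXT, answer TEXT)"
)

query = "SELECT * FROM drfrostmaths_useranswer WHERE uid = 19483"

def plan(sql):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row is a human-readable description.
    return " ".join(row[-1] for row in rows)

plan_without_index = plan(query)

# Create an index on uid, then ask again.
conn.execute("CREATE INDEX idx_useranswer_uid ON drfrostmaths_useranswer (uid)")
plan_with_index = plan(query)

print(plan_without_index)
print(plan_with_index)
```

Before the index exists the plan has to look through the whole table; afterwards the plan mentions idx_useranswer_uid, meaning only the matching part of the index needs to be searched.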

23 Review
What is the difference between the wildcards * and %?
‘*’ represents ‘all fields’, e.g. SELECT * FROM mytable. ‘%’ is used within a string to match any text, e.g. SELECT * FROM mytable WHERE fullname LIKE 'J%' would match anyone whose name starts with J.
How would we select people from a table (people) where either their age is above 40 or they earn more than £100,000?
SELECT * FROM people WHERE age > 40 OR earnings > 100000
How would we then sort these people by name alphabetically ascending?
SELECT * FROM people WHERE age > 40 OR earnings > 100000 ORDER BY name ASC
How would we determine the total earnings of people aged under 40?
SELECT SUM(earnings) FROM people WHERE age < 40
How could we work out the average earnings by country? (Not in GCSE syllabus)
SELECT country, AVG(earnings) FROM people GROUP BY country

24 Coding Mini-Tasks Return to the DrFrostMaths site to complete the various mini-coding tasks on searching data.

