Tarik Booker California State University, Los Angeles October 21, 2014 CS 122 Week 5: Functions Tarik Booker California State University, Los Angeles October 21, 2014
What We Will Cover… Functions (interspersed in Chapter 3, Chapter 5) What is a function? String Functions p 117 - 120 Date and Time Functions p 120 - 123 Control Flow Functions p 125 Comparison Functions p 125-126 Numerical Functions (not in book) Aggregate Functions Chapter 5
Definition of a Function What is a function? Relation between a set of inputs, and outputs Each input is related to exactly one output Only one answer! Typically an output f corresponds to an input x f(x) (say “f of x”) Seen this before? Ex: f(x) = x2 What’s the answer? Inputs are sometimes referred to as argument(s) of the function
SQL Functions? Built-in code in MySQL used to perform a specific task. Consists of: Zero or more input values A named function identifier One output value With SQL Functions, we can: Format and extract data Use mathematical computing Convert data
Function Syntax How do I write a function in MySQL? Function_name(argument_list) Argument_list = [arg1, arg2, arg3, …] Note: (For MySQL in Windows, at least) function_name is case insensitive Ex: SELECT LENGTH(‘Hello’); The answer is 5. Why? SELECT NOW(); What is the answer? Why?
Function Syntax (2) Tips: Don’t try to memorize every function! Different functions are used by different DBMS! Vendor lock-in Look them up (in book or internet) Check your parentheses!!! Keep careful watch on parentheses, or: Type out parentheses first, then add arguments Nested Functions Can type each function separately in Notepad++ Substitute functions back in
Function Syntax (3) You can use functions in most SQL clauses: Yes: SELECT, ON, WHERE, GROUP BY (later), HAVING (later), ORDER BY NO: FROM
Strings and String Functions In SQL, strings start with an INDEX OF 1, NOT ZERO!!!!! THIS IS NOT LIKE C, C++, C#, Java!!! (201-203 people, take heed!) Ex: ‘hello’ in SQL Starts with index value 1 ‘hello’ in C,C++,Java starts with index value of 0 SQL: ‘hello’ = ‘1, 2, 3, 4, 5’ SUBSTRING(‘hello’, 2, 5) What is the answer? ‘ello’
String Functions Common String Functions in MySQL: CHAR_LENGTH(str) Character Length LENGTH(str) String Length CONCAT (str1, str2, …) Concatenate CONCAT_WS(separator, str1, str2, …) Concatenate with Separator (a string) FORMAT (number, decimal_places) Numerical Formatting LEFT(str, len), RIGHT(str, len) Take the L or R of a String of Length len LOCATE(substr, str), LOCATE (substr, str, pos) Return the position of a substring (0 if not matching) LOWER(str), UPPER(str) Convert to lowercase or uppercase
String Functions (2) SUBSTRING(str, pos, len) Get the Substring at that Position and Length TRIM(str), LTRIM(str), RTRIM(str) Remove spaces (both or on a side) Can find more at: http://dev.mysql.com/doc/refman/5.6/en/string-functions.html
Date and Time Functions CURDATE() Current Date DATEDIFF(date1, date2) Difference between 2 dates (in days) DATE_ADD(date, INTERVAL value type) Add an INTERVAL to a date (as date) DATE_SUB(date, INTERVAL value type) Subtract an INTERVAL from a date DATE_FORMAT(date, format_string) Select a format for a date SELECT date_format(‘2014-01-31’, ‘%m%d%y’) DAY(date) Returns the day (number) WEEK(date) Returns the week (52 total) MONTH(date) Returns the month (number) YEAR(date) Returns the year (number)
Date and Time Functions (2) DAYOFMONTH(date) DAYOFWEEK(date) DAYOFYEAR(date) NOW() Date and time TO_DAYS(date) What happens if you: SELECT TO_DAYS(‘2014-10-21’); From Year 0. What is that? There are many more date/time functions!!! http://dev.mysql.com/doc/refman/5.6/en/functions.html http://dev.mysql.com/doc/refman/5.6/en/date-and-time-functions.html
The INTERVAL Keyword You can perform math on dates using the INTERVAL keyword! Ex: Using the ADDDATE function: SELECT EntryDate, DATE_ADD(entrydate, INTERVAL 10 year) from Artists SELECT EntryDate, DATE_SUB(entrydate, INTERVAL 10 year) from Artists Can use DAY, MONTH, and YEAR in your intervals SELECT artistname, DATEDIFF(‘2007-12-31 23:59:59’, entrydate) from Artists;
Control Flow Functions IF(condition, true_expression, false_expression) SELECT IF(M.Lastname = ‘Sanders’, ‘yes’, ‘no’) FROM Members M; IFNULL(expr1, expr2) If expr1 is NOT NULL, IFNULL() returns expr1. Otherwise, expr2. SELECT IFNULL(webaddress, “NONE”) FROM Artists;
Comparison Functions GREATEST(comma separated list of values) LEAST(comma separated list of values) What happens if any of the values are null?
Numerical Functions ABS (value) Absolute Value CEIL(value), CEILING(value), FLOOR(value) Ceiling/Floor SIN(radians), COS(radians), TAN(radians) Trigonometric Functions PI() pi POW(X,Y) Exponent (XY) SQRT(X) Square Root ROUND(value), ROUND(value, decimal_places) TRUNCATE(value, decimal_places)
Numerical Functions (2) Formatting vs. Truncating Formatting Rounds the Value Truncating Cuts off the Value (at the position) SELECT FORMAT (3.14159, 4); Ans: 3.1416 SELECT TRUNCATE (3.14159, 4); Ans: 3.1415 Note: FORMAT is a string function and returns a string!! TRUNCATE is a mathematical function, and returns a number!! http://dev.mysql.com/doc/refman/5.6/en/numeric-functions.html
Datatype Conversion MySQL is not very strict about data types! SELECT T.LengthSeconds/60 FROM Tracks T; Lengthseconds is actually an int! This is not C++/Java!
Nested Functions Nested Functions are Quite Common in SQL Nested Functions = Functions within Functions! For Strings: SELECT CONCAT(LEFT(M.Firstname, 1), LEFT(M.LastName, 1)) FROM Members M Dates: SELECT DAY (Curdate()); Control Flow Functions: IF(Grade>90, ‘A’, IF(Grade>80, ‘B’, IF(Grade >70, ‘C’,’NC’))); SELECT Lastname, Birthday, IF(YEAR(Birthday) <1965, ‘old’, IF(YEAR(Birthday)<1975,’medium’, ‘young’)) FROM Members;
Functions and Column Aliases You can use column aliases in any of these clauses: GROUP BY (later) HAVING (MySQL yes, PostgresSQL no) ORDER BY Note: You can’t refer to column aliases in the following clauses: SELECT FROM WHERE
Functions and Column Aliases (2) Note: If you have a long function expression and want to use it in the WHERE clause as well, a column alias won’t work! This will NOT WORK: SELECT ArtistName, YEAR(EntryDate) AS Y FROM Artists WHERE Y > 2000; You have to manually duplicate (copy and paste) the expression in the WHERE clause: This WILL WORK: WHERE YEAR(EntryDate) > 2000;
Aggregate Functions Aggregate – to combine (many parts added together) Aggregate Functions / Non-aggregate Functions Similarities: Both take (some type) of input Both perform operations on the input Both have a single output Differences: Input to aggregate function is a set of data
Aggregate Functions (2) Column Non-aggregate function: SELECT UPPER(LastName) FROM Members; Runs once for each row Aggregate Function: SELECT COUNT(*) All rows are part of input for one function! Row Row Row Column Row Row Row
Aggregate Functions (3) Five Main Aggregate Functions: COUNT(*) COUNT(expression) AVG(expression) MIN(expression) MAX(expression) SUM(expression) Others, but we won’t cover in class (you are not responsible for them)
Aggregate Arguments Every aggregate function takes a single argument May be another function, but not another aggregate function! Argument also may be from a CASE Statement
Aggregate Arguments (2) Ex: SELECT SUM(lengthseconds BETWEEN 60 AND 120) FROM Tracks; This actually produces the same results as: SELECT COUNT(TrackNum) FROM Tracks WHERE Lengthseconds BETWEEN 60 AND 120; Why? What does the first query do? The second? Why do they give the same result?
COUNT COUNT(*) COUNT (expressions) Counts the number of rows in a table Excludes NULLS (doesn’t count them) SELECT COUNT(*) AS ‘Number of Titles’ FROM Titles; Returns 6 COUNT (expressions) Counts the number of rows in a table (same) SELECT COUNT(titleID) AS ‘Number of Titles’ FROM Titles;
More COUNT Examples SELECT COUNT(*) FROM Artists; SELECT COUNT(artistID) FROM Artists; SELECT COUNT(WebAddress) FROM Artists; SELECT COUNT(*) FROM Artists WHERE !(WebAddress is NULL); WHERE WebAddress is NULL; Note : Why does this give you 0 (zero)? SELECT COUNT(WebAddress) FROM Artists WHERE Webaddress IS NULL;
Warnings with COUNT! Be Careful what you Aggregate! Ex: Show the number of artists who have recorded in the alternative genre: Wrong: SELECT COUNT(A.ArtistID) FROM Artists A JOIN Titles T ON (A.ArtistID = T.ArtistID) WHERE T.Genre = ‘Alternative’; Why? This counts duplicates! Correct: SELECT COUNT(DISTINCT A.ArtistID)
SUM (Aggregate Function) SUM(expression) Sums all data under expression Excludes NULLs (doesn’t count NULL as 0) Ex: Take a total of all runtimes: SELECT SUM(Lengthseconds) AS ‘Total Length’ FROM Tracks; Ex (2): Figure out how long the studio “MakeTrax” has been recording music(for tax purposes). Present the output in minutes – call it “MakeTraxMinutes” SELECT SUM(Lengthseconds/60) AS MakeTraxMinutes FROM Tracks JOIN Titles USING(TitleID) JOIN Studios USING(StudioID) WHERE StudioName = 'maketrax';
SUM (Aggregate Function) Steps Beginning Steps: Where the studios at? Studios Where the track lengths at? Tracks What's between? Titles So join Studios, titles, tracks. What do I join on? StudioID TitleID Then? My results. Lengthseconds. Do I want it in seconds? No. Minutes: Lengthseconds / 60 What's my filter condition? The studio name must be Maketrax. How do I check for studio name? There's a column called 'studioname'. Therefore Must check studioname = 'maketrax' Query Construction: What's my order? SELECT, FROM, WHERE What do I select? Results: Lengthseconds / 60. What do I call it? MakeTraxMinutes What do I join from? Tracks JOIN Titles USING(TitleID) JOIN Studios USING(StudioID) What do I check my filter condition with? WHERE clause. What? studio name How? WHERE studioname = 'maketrax' I also need to sum my results: Results are SELECT Lengthseconds/60 - So SELECT SUM(Lengthseconds/60) Final Query: SELECT SUM(Lengthseconds / 60) AS MakeTraxMinutes FROM Tracks JOIN Titles USING(TitleID) JOIN Studios USING(StudioID) WHERE Studioname = 'Maketrax';
AVG (Aggregate Function) AVG (expression) Takes the mean (average) Averages all data under expression Excludes NULLs (doesn’t count NULL as 0) Ex: Average all track lengths that are listed: SELECT AVG(LengthSeconds) FROM Tracks; Ex2: What’s the average length of track produced by studio “MakeTrax”?
AVG (Aggregate Function) (2) Find the average age in years of all members: SELECT AVG(DATEDIFF(CURDATE(), Birthday))/365.25 FROM Members; Why? DATEDIFF – Years between now and birthday Values are returned in days Take the AVG of that Divide by 365.25 365.25? What happens every 4 years?
Excluding Data From Aggregation To include or exclude items from being aggregated, use the WHERE clause Ex: Count the number of male members: SELECT COUNT(*) FROM Members WHERE Gender = ‘M’; Ex: Count the number of female members: WHERE Gender = ‘F’;
AVG Example create table AvgDemo( id int primary key, name varchar(30) not null, sex char, age int); insert into AvgDemo values(1, "Sue", 'f', null); insert into AvgDemo values(2, "Fred", 'm', 20); insert into AvgDemo values(3, "Mary", 'f', 30); insert into AvgDemo values(4, "Marion", null, 40); insert into AvgDemo values(5, "Bob", 'm', 50); insert into AvgDemo values(6, "Jack", 'm', 60); // in a real data base, we wouldn’t use an “age” field // -why not?
AVG Example (2) Lets run: SELECT AVG(age) FROM avgdemo; Do we need to run: SELECT AVG(Age) FROM AvgDemo WHERE Age is NOT NULL; WHERE Sex=‘F’; WHERE Sex=‘M’;
MIN and MAX MIN(expression) MAX(expression) Returns the minimum value under the expression Ex: Return the minimum runtime SELECT MIN(LengthSeconds) AS ‘Shortest Track’ FROM Tracks; MAX(expression) Returns the maximum value under the expression Ex: Return the maximum runtime SELECT MAX(LengthSeconds) AS ‘Longest Track’
More Aggregate Functions There are many statistical aggregate functions: STDDEV_POP(expression) STDDEV_SAMP(expression) Ex: SELECT STDDEV_POP(Age) FROM AvgDemo; VAR_POP(expression) VAR_SAMP(expression) There are many, many more…
Aggregate Function Usage You can use aggregate functions directly in these clauses: SELECT clause HAVING clause (later) ORDER BY clause
Aggregates and the WHERE Clause The WHERE clause filters data from being aggregated Filters the table produced by the FROM clause The WHERE clause filters data BEFORE it is aggregated (counted, summed, …) So, YOU MAY NOT USE AGGREGATE FUNCTIONS DIRECTLY IN THE WHERE CLAUSE We will account for this later…
Aggregates and the WHERE Clause WRONG!!!!! SELECT TrackTitle FROM Tracks WHERE LengthSeconds > AVG(LengthSeconds); This actually tries to filter out something it hasn’t calculated yet (AVG(LengthSeconds)) DO NOT DO THIS!!!!!
How Aggregate Queries Work: It may be helpful to think of Aggregate Queries as a two-step process: Step 1: Filter out things you don’t want to aggregate. Step 2: Aggregate (Count, sum, avg, min, max, etc.) Because the WHERE Clause operates in the FIRST STEP, you can’t use it in the second step. We will address this later…