Lecture 9 Using Structured Query Language (SQL) Jeffery S. Horsburgh Hydroinformatics Fall 2012 This work was funded by National Science Foundation Grant EPS
2 of 34 Objectives Retrieve and use data from data models used in Hydrology such as the Observations Data Model (ODM) Introduce the syntax of Structured Query Language (SQL) for common query types Construct SQL queries to retrieve data
3 of 34 What is SQL? Special-purpose programming language for managing data in relational database management systems (RDBMS) Adopted by the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO) as the standard data access language Set of standard commands + proprietary extensions – “SELECT” – “INSERT” – “UPDATE” – “DELETE” – … Mostly human readable
4 of 34 Ways to Execute SQL Commands Through a database client application like SQL Server Management Studio Via code (e.g., Visual Basic, C#, Java, etc.) that sends a query to a database and returns results
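Executing SQL via code can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for the SQL Server instance used in this lecture, and the table contents are made-up toy rows rather than Little Bear River data.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for SQL Server;
# the site codes and names below are hypothetical toy data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sites (SiteID INTEGER PRIMARY KEY, SiteCode TEXT, SiteName TEXT)"
)
conn.executemany(
    "INSERT INTO Sites (SiteID, SiteCode, SiteName) VALUES (?, ?, ?)",
    [(1, "SITE-A", "Example Site A"), (2, "SITE-B", "Example Site B")],
)

# Send a query to the database and collect the results as a list of tuples.
rows = conn.execute(
    "SELECT SiteID, SiteCode, SiteName FROM Sites ORDER BY SiteID"
).fetchall()
for row in rows:
    print(row)

conn.close()
```

The same pattern (open a connection, send a SQL string, read back rows) applies in Visual Basic, C#, or Java; only the connection library changes.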
5 of 34 Microsoft SQL Server and SQL Server Management Studio
6 of 34 Little Bear River ODM Connection Info ServerName: hydroserver.uwrl.usu.edu Database: LittleBearRiverODM UserName: Hydroinformatics Password: F4ll2012!!
7 of 34 Observations Data Model
8 of 34 Selecting Data “SELECT” is used to query the database and retrieve data that match specified criteria SELECT is the core of SQL and covers the vast majority of queries SELECT statement syntax: SELECT Field_1, Field_2, Field_n FROM TableName
9 of 34 Example Select Queries Select all fields from a table: SELECT * FROM Sites – The “*” means – give me all of the fields Retrieve only selected fields from a table: SELECT SiteID, SiteCode, SiteName FROM Sites
10 of 34 Adding Criteria to SELECT Queries The “WHERE” clause specifies which data values or records will be returned based on criteria Conditional operators used with the WHERE clause:
= Equal
> Greater than
< Less than
<= Less than or equal
>= Greater than or equal
<> Not equal to
LIKE Match a substring, with “%” as a wildcard character
IN/NOT IN Supply a list of items to test
BETWEEN Test between two given values
…
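A minimal sketch of several of these operators in action follows. It assumes an in-memory SQLite database (via Python's sqlite3 module) in place of SQL Server, and the site names and latitudes are invented for illustration. The helper site_ids is hypothetical and exists only to keep the example short.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server; rows are toy data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sites (SiteID INTEGER, SiteName TEXT, Latitude REAL)")
conn.executemany(
    "INSERT INTO Sites VALUES (?, ?, ?)",
    [
        (1, "River at Road A", 41.5),
        (2, "River at Road B", 41.7),
        (3, "Creek at Bridge", 41.9),
        (4, "Canal at Weir", 42.1),
    ],
)

def site_ids(where_clause):
    """Return the SiteIDs matching a WHERE clause, in ascending order.

    The clause is pasted into the SQL text, which is acceptable here only
    because every clause is a fixed literal written in this script.
    """
    sql = f"SELECT SiteID FROM Sites WHERE {where_clause} ORDER BY SiteID"
    return [row[0] for row in conn.execute(sql)]

like_ids = site_ids("SiteName LIKE 'River%'")             # names starting with "River"
in_ids = site_ids("SiteID IN (1, 3)")                     # membership in a list
not_in_ids = site_ids("SiteID NOT IN (1, 3)")             # everything else
between_ids = site_ids("Latitude BETWEEN 41.6 AND 42.0")  # inclusive range
not_equal_ids = site_ids("SiteID <> 4")                   # not equal to

print(like_ids, in_ids, not_in_ids, between_ids, not_equal_ids)
conn.close()
```

Note that BETWEEN includes both endpoints, so it is equivalent to combining >= and <= with AND.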
11 of 34 Adding Criteria to SELECT Queries Syntax for adding criteria to a SELECT query: SELECT Field_1, Field_2, Field_n FROM TableName WHERE Field_1 = SomeCondition AND/OR Field_2 = AnotherCondition
12 of 34 Adding Criteria to SELECT Queries Example: “Which sites in the database are north of a given latitude (in degrees)?” SELECT * FROM Sites WHERE Latitude > ?
13 of 34 Multiple Criteria and Boolean Operators AND – both sides must be true OR – either side can be true SELECT * FROM Sites WHERE SiteID = 1 AND SiteID = 2 Returns no results (0 records) SELECT * FROM Sites WHERE SiteID = 1 OR SiteID = 2 Returns 2 records
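The AND/OR behavior described above can be checked with a small sketch. It assumes an in-memory SQLite database (Python's sqlite3 module) rather than SQL Server, with three hypothetical sites.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server; rows are toy data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sites (SiteID INTEGER, SiteName TEXT)")
conn.executemany(
    "INSERT INTO Sites VALUES (?, ?)",
    [(1, "Example Site A"), (2, "Example Site B"), (3, "Example Site C")],
)

# No single record can have SiteID equal to both 1 and 2, so AND matches nothing.
and_rows = conn.execute(
    "SELECT * FROM Sites WHERE SiteID = 1 AND SiteID = 2"
).fetchall()

# OR matches any record satisfying either condition.
or_rows = conn.execute(
    "SELECT * FROM Sites WHERE SiteID = 1 OR SiteID = 2 ORDER BY SiteID"
).fetchall()

print(len(and_rows), len(or_rows))
conn.close()
```

Each condition is tested against one record at a time, which is why "SiteID = 1 AND SiteID = 2" can never be true.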
14 of 34 Aggregate Functions Compute against a column of numeric data MIN – Returns the smallest value in a given selection MAX – Returns the largest value in a given selection SUM – Returns the sum of numeric values in a given selection AVG – Returns the average of numeric values in a given selection COUNT – Returns the number of non-NULL values in a given selection COUNT(*) – Returns the number of records in a given selection, including records with NULL values
15 of 34 Aggregation Example 1 Example: “Give me the average quality controlled (QualityControlLevelID = 1) turbidity (VariableID = 6) value in the Little Bear River at Mendon Road (SiteID = 1).” SELECT AVG(DataValue) FROM DataValues WHERE SiteID = 1 AND VariableID = 6 AND QualityControlLevelID = 1
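The aggregate functions can be tried together in one query. The sketch below assumes an in-memory SQLite database (Python's sqlite3 module) in place of SQL Server, and the data values are invented. One value is deliberately NULL to show how COUNT on a column differs from COUNT(*).

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server; values are toy data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DataValues (ValueID INTEGER, SiteID INTEGER, VariableID INTEGER, "
    "QualityControlLevelID INTEGER, DataValue REAL)"
)
conn.executemany(
    "INSERT INTO DataValues VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 6, 1, 2.0),
        (2, 1, 6, 1, 4.0),
        (3, 1, 6, 1, 6.0),
        (4, 1, 6, 1, None),   # a missing value stored as NULL
        (5, 2, 6, 1, 100.0),  # different site, excluded by the WHERE clause
    ],
)

stats = conn.execute(
    "SELECT MIN(DataValue), MAX(DataValue), SUM(DataValue), AVG(DataValue), "
    "COUNT(DataValue), COUNT(*) "
    "FROM DataValues "
    "WHERE SiteID = 1 AND VariableID = 6 AND QualityControlLevelID = 1"
).fetchone()

# Order of results: minimum, maximum, sum, average, non-NULL count, record count.
print(stats)
conn.close()
```

The NULL record is skipped by MIN, MAX, SUM, AVG, and COUNT(DataValue), but it is still counted by COUNT(*).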
16 of 34 GROUP BY Clause Used with Aggregate functions Groups records into sets for which the aggregate function should be evaluated When using aggregate functions, every selected field must be part of either an aggregate function or a “GROUP BY” clause
17 of 34 Aggregation Example with GROUP BY
DataValues table columns: ValueID, SiteID, VariableID, DateTime, DataValue
SELECT SiteID, VariableID, AVG(DataValue) AS AvgDataValue FROM DataValues GROUP BY SiteID, VariableID
Result columns: SiteID, VariableID, AvgDataValue
18 of 34 Example GROUP BY Clause “Give me the average value of quality controlled (QualityControlLevelID=1) turbidity (VariableID=6) for each Site.” SELECT SiteID, AVG(DataValue) FROM DataValues WHERE VariableID = 6 AND QualityControlLevelID = 1 GROUP BY SiteID The “GROUP BY” clause ensures that the query calculates an average value for each unique SiteID.
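A small sketch of the per-site average follows, assuming an in-memory SQLite database (Python's sqlite3 module) in place of SQL Server and made-up data values. An ORDER BY is added here, which the slide's query does not have, only so that the printed order is predictable.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server; values are toy data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DataValues (ValueID INTEGER, SiteID INTEGER, VariableID INTEGER, "
    "QualityControlLevelID INTEGER, DataValue REAL)"
)
conn.executemany(
    "INSERT INTO DataValues VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 6, 1, 10.0),
        (2, 1, 6, 1, 20.0),
        (3, 2, 6, 1, 30.0),
        (4, 2, 6, 1, 50.0),
        (5, 2, 36, 1, 99.0),  # different variable, excluded by the WHERE clause
    ],
)

# One average per unique SiteID; SiteID may be selected because it is grouped on.
site_averages = conn.execute(
    "SELECT SiteID, AVG(DataValue) AS AvgDataValue "
    "FROM DataValues "
    "WHERE VariableID = 6 AND QualityControlLevelID = 1 "
    "GROUP BY SiteID "
    "ORDER BY SiteID"
).fetchall()

print(site_averages)
conn.close()
```

The WHERE clause filters records before grouping, so the row for VariableID 36 never reaches the average for SiteID 2.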
19 of 34 Sorting Results Sort query results based on one or more fields Example: “Give me quality controlled (QualityControlLevelID = 1) water temperature observations (VariableID = 36) for SiteIDs 1 and 2 in 2008, order the results by LocalDateTime in ascending order.” SELECT * FROM DataValues WHERE (SiteID = 1 OR SiteID = 2) AND VariableID = 36 AND QualityControlLevelID = 1 AND LocalDateTime >= '1/1/2008' AND LocalDateTime < '1/1/2009' ORDER BY SiteID, LocalDateTime ASC
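The sorting example can be sketched as follows. This assumes an in-memory SQLite database (Python's sqlite3 module) in place of SQL Server. SQLite has no dedicated date type, so as a further assumption the timestamps are stored as ISO-formatted text ('YYYY-MM-DD HH:MM:SS'), which compares and sorts correctly as plain strings. The table is trimmed to three columns and the values are invented.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server; timestamps are
# ISO-formatted text and all rows are toy data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DataValues (SiteID INTEGER, LocalDateTime TEXT, DataValue REAL)"
)
conn.executemany(
    "INSERT INTO DataValues VALUES (?, ?, ?)",
    [
        (2, "2008-03-01 00:00:00", 5.0),
        (1, "2008-06-01 00:00:00", 12.0),
        (1, "2008-01-15 00:00:00", 1.5),
        (1, "2009-01-01 00:00:00", 0.5),  # outside 2008, filtered out
        (3, "2008-02-01 00:00:00", 2.0),  # other site, filtered out
    ],
)

# Parentheses make the OR apply to the site test only, before the ANDs.
sorted_rows = conn.execute(
    "SELECT SiteID, LocalDateTime, DataValue FROM DataValues "
    "WHERE (SiteID = 1 OR SiteID = 2) "
    "AND LocalDateTime >= '2008-01-01' AND LocalDateTime < '2009-01-01' "
    "ORDER BY SiteID, LocalDateTime ASC"
).fetchall()

for row in sorted_rows:
    print(row)
conn.close()
```

Results are sorted by SiteID first, then by LocalDateTime within each site. ASC is the default direction; DESC reverses it.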
20 of 34 Selecting from More than One Table The “JOIN” statement makes queries relational Joins allow you to select information from more than one table using one SELECT statement JOIN syntax: SELECT LeftTable.Field1, LeftTable.Field2, RightTable.Field1, RightTable.Field2 FROM LeftTable Join_Type RightTable ON JoinCondition
21 of 34 Join Example
Sites (LeftTable) columns: SiteID, SiteName
1, Little Bear River
2, Logan River
Observations (RightTable) columns: ValueID, SiteID, VariableID, DataValue
SELECT Sites.SiteID, Sites.SiteName, Observations.VariableID, Observations.DataValue FROM Sites INNER JOIN Observations ON Sites.SiteID = Observations.SiteID
Result columns: SiteID, SiteName, VariableID, DataValue
22 of 34 Types of Joins INNER JOIN: Takes every record in the LeftTable and looks for 1 or more matches in the RightTable based on the JoinCondition. All matched records are added to the result. OUTER JOIN: Brings two tables together but includes data even if the JoinCondition does not find matching records – 3 Variations: LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN
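The difference between an INNER JOIN and a LEFT OUTER JOIN can be seen in a short sketch. It assumes an in-memory SQLite database (Python's sqlite3 module) in place of SQL Server. The third site and all data values are hypothetical, added so that one site has no matching observations.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server; "Unmeasured Site"
# and the data values are hypothetical toy data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sites (SiteID INTEGER, SiteName TEXT)")
conn.execute(
    "CREATE TABLE Observations (ValueID INTEGER, SiteID INTEGER, "
    "VariableID INTEGER, DataValue REAL)"
)
conn.executemany(
    "INSERT INTO Sites VALUES (?, ?)",
    [(1, "Little Bear River"), (2, "Logan River"), (3, "Unmeasured Site")],
)
conn.executemany(
    "INSERT INTO Observations VALUES (?, ?, ?, ?)",
    [(1, 1, 6, 10.0), (2, 2, 6, 20.0)],
)

# The same query text is reused with two different join types.
query = (
    "SELECT Sites.SiteID, Sites.SiteName, Observations.DataValue "
    "FROM Sites {join_type} Observations "
    "ON Sites.SiteID = Observations.SiteID "
    "ORDER BY Sites.SiteID"
)

# INNER JOIN keeps only sites that have at least one matching observation.
inner_rows = conn.execute(query.format(join_type="INNER JOIN")).fetchall()

# LEFT OUTER JOIN keeps every site; unmatched columns come back as NULL (None).
left_rows = conn.execute(query.format(join_type="LEFT OUTER JOIN")).fetchall()

print(inner_rows)
print(left_rows)
conn.close()
```

The site without observations disappears from the inner join but survives the left outer join with an empty DataValue.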
23 of 34 Example Using Joins “What are the names of the variables that have been measured in the Little Bear River at Mendon Road?” SELECT DISTINCT Sites.SiteCode, Sites.SiteName, Variables.VariableName FROM Sites INNER JOIN DataValues ON Sites.SiteID = DataValues.SiteID INNER JOIN Variables ON DataValues.VariableID = Variables.VariableID WHERE Sites.SiteCode = 'USU-LBR-Mendon' ORDER BY VariableName ASC “DISTINCT” ensures that I only get unique combinations A nice discussion on Joins -
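The effect of DISTINCT in a multi-table join can be sketched as follows, assuming an in-memory SQLite database (Python's sqlite3 module) in place of SQL Server. The tables are reduced to the columns the query needs, and the variable names, second site, and data values are invented for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server; variable names,
# the second site, and all data values are toy data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sites (SiteID INTEGER, SiteCode TEXT, SiteName TEXT)")
conn.execute("CREATE TABLE Variables (VariableID INTEGER, VariableName TEXT)")
conn.execute(
    "CREATE TABLE DataValues (ValueID INTEGER, SiteID INTEGER, "
    "VariableID INTEGER, DataValue REAL)"
)
conn.executemany(
    "INSERT INTO Sites VALUES (?, ?, ?)",
    [
        (1, "USU-LBR-Mendon", "Little Bear River at Mendon Road"),
        (2, "SITE-B", "Hypothetical Site B"),
    ],
)
conn.executemany(
    "INSERT INTO Variables VALUES (?, ?)",
    [(6, "Turbidity"), (13, "Gage height"), (36, "Temperature")],
)
conn.executemany(
    "INSERT INTO DataValues VALUES (?, ?, ?, ?)",
    [(1, 1, 6, 5.0), (2, 1, 6, 7.0), (3, 1, 36, 12.0), (4, 2, 13, 0.4)],
)

query = (
    "SELECT {distinct} Sites.SiteCode, Sites.SiteName, Variables.VariableName "
    "FROM Sites "
    "INNER JOIN DataValues ON Sites.SiteID = DataValues.SiteID "
    "INNER JOIN Variables ON DataValues.VariableID = Variables.VariableID "
    "WHERE Sites.SiteCode = 'USU-LBR-Mendon' "
    "ORDER BY VariableName ASC"
)

# Without DISTINCT there is one output row per matching data value.
all_rows = conn.execute(query.format(distinct="")).fetchall()
# With DISTINCT, repeated combinations collapse to a single row each.
distinct_rows = conn.execute(query.format(distinct="DISTINCT")).fetchall()
variable_names = [row[2] for row in distinct_rows]

print(len(all_rows), variable_names)
conn.close()
```

DataValues acts as the bridge between Sites and Variables here, which is why two joins are needed even though no DataValues column appears in the output.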
24 of 34 Quick Summary: Formulating a SQL Statement 1. Identify the field(s) containing the source data SELECT Field_1, Field_2, Field_n 2. Identify the table(s) where the fields are located FROM Table_1 3. Specify criteria to narrow the results WHERE Field_1 = SomeCriteria 4. Determine the order to present records in the results ORDER BY Field_1 ASC
25 of 34 Challenge Questions “How many observations of quality controlled (QualityControlLevelID = 1) water temperature (VariableID = 36) are there in the Little Bear River at Mendon Road (SiteID = 1)?” “What are the maximum and minimum values of quality controlled water temperature in the Little Bear River at Mendon Road?” “Given that we have continuous raw data up to the present time in the database, how far behind is Jeff’s team in creating quality controlled water temperature data for this site?”
26 of 34 Solutions
SELECT COUNT(*) FROM DataValues WHERE SiteID = 1 AND VariableID = 36 AND QualityControlLevelID = 1
SELECT MAX(DataValue) AS MaxTemp, MIN(DataValue) AS MinTemp FROM DataValues WHERE SiteID = 1 AND VariableID = 36 AND QualityControlLevelID = 1 AND DataValue <>
SELECT MAX(LocalDateTime) AS LastDateTime FROM DataValues WHERE SiteID = 1 AND VariableID = 36 AND QualityControlLevelID = 1
Advanced SQL Functionality
28 of 34 Advanced SQL Functions (1) Arithmetic functions Example: “Add a constant value to water level measurements to convert from gage height to water surface elevation.” SELECT LocalDateTime, DataValue AS Elevation FROM DataValues WHERE SiteID = 1 AND VariableID = 13 AND QualityControlLevelID = 1 ORDER BY LocalDateTime ASC
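Arithmetic in the SELECT list can be sketched as follows. The example assumes an in-memory SQLite database (Python's sqlite3 module) in place of SQL Server. The offset of 1350.0 is a purely hypothetical datum elevation chosen for illustration, not a value from the lecture, and the gage heights are invented.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server; rows are toy data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DataValues (SiteID INTEGER, VariableID INTEGER, "
    "QualityControlLevelID INTEGER, LocalDateTime TEXT, DataValue REAL)"
)
conn.executemany(
    "INSERT INTO DataValues VALUES (?, ?, ?, ?, ?)",
    [
        (1, 13, 1, "2008-01-01 00:00:00", 0.5),
        (1, 13, 1, "2008-01-01 00:30:00", 0.75),
    ],
)

# Hypothetical constant: elevation of the gage datum, in the data's units.
gage_datum_offset = 1350.0

# The addition is evaluated for every returned record; stored data is unchanged.
elevations = conn.execute(
    "SELECT LocalDateTime, DataValue + ? AS Elevation "
    "FROM DataValues "
    "WHERE SiteID = 1 AND VariableID = 13 AND QualityControlLevelID = 1 "
    "ORDER BY LocalDateTime ASC",
    (gage_datum_offset,),
).fetchall()

print(elevations)
conn.close()
```

Passing the constant as a query parameter (the "?" placeholder) keeps the number out of the SQL text, so the same query can be reused with a different offset.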
29 of 34 Advanced SQL Functions (2) Mathematical Functions: ABS, ACOS, ASIN, ATAN, ATN2, CEILING, COS, COT, DEGREES, EXP, FLOOR, LOG, LOG10, PI, POWER, RADIANS, RAND, ROUND, SIGN, SIN, SQRT, SQUARE, TAN
30 of 34 Advanced SQL Functions (3) Date/time functions Example: “Give me the average water temperature (VariableID = 36) in the Little Bear River at Mendon Road (SiteID = 1) for each day of the year.” SELECT DATEPART(mm,LocalDateTime) AS theMonth, DATEPART(dd,LocalDateTime) AS theDay, AVG(DataValue) AS AvgTemp FROM DataValues WHERE SiteID = 1 AND VariableID = 36 AND QualityControlLevelID = 1 AND DataValue <> GROUP BY DATEPART(mm,LocalDateTime), DATEPART(dd,LocalDateTime) ORDER BY theMonth, theDay
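The day-of-year average can be sketched outside SQL Server as well, with one substitution that should be stated plainly: DATEPART is a SQL Server function and does not exist in SQLite, so the example below uses SQLite's strftime function instead. It assumes an in-memory SQLite database (Python's sqlite3 module), ISO-formatted text timestamps, and invented temperatures. The no-data filter from the slide is left out because the toy data contains no such values.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for SQL Server; strftime replaces
# DATEPART, timestamps are ISO-formatted text, and values are toy data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DataValues (SiteID INTEGER, VariableID INTEGER, "
    "QualityControlLevelID INTEGER, LocalDateTime TEXT, DataValue REAL)"
)
conn.executemany(
    "INSERT INTO DataValues VALUES (?, ?, ?, ?, ?)",
    [
        (1, 36, 1, "2007-01-01 12:00:00", 2.0),
        (1, 36, 1, "2008-01-01 12:00:00", 4.0),  # same calendar day, later year
        (1, 36, 1, "2008-01-02 12:00:00", 6.0),
    ],
)

# strftime('%m', ...) and strftime('%d', ...) return text such as '01',
# so CAST converts them to integers comparable to DATEPART output.
daily_averages = conn.execute(
    "SELECT CAST(strftime('%m', LocalDateTime) AS INTEGER) AS theMonth, "
    "CAST(strftime('%d', LocalDateTime) AS INTEGER) AS theDay, "
    "AVG(DataValue) AS AvgTemp "
    "FROM DataValues "
    "WHERE SiteID = 1 AND VariableID = 36 AND QualityControlLevelID = 1 "
    "GROUP BY CAST(strftime('%m', LocalDateTime) AS INTEGER), "
    "CAST(strftime('%d', LocalDateTime) AS INTEGER) "
    "ORDER BY theMonth, theDay"
).fetchall()

print(daily_averages)
conn.close()
```

Grouping on month and day, but not year, is what pools January 1 of different years into a single average.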
31 of 34 Other Things You can do with SQL Create databases Create tables Insert data into tables Update existing records Delete records, tables, databases Create users and permissions …
32 of 34 SQL Queries Made Easier Using the SQL Server Query Designer
33 of 34 Summary SQL provides a very powerful standard language for querying table-based data SQL enables you to quickly isolate subsets of data SQL is mostly standardized, with some vendor-specific extensions Most database functions can be automated using SQL
34 of 34 Resources for Learning SQL Microsoft Developer Network (MSDN) SQL Reference – us/library/bb510741%28v=sql.105%29.aspx Google Various books – but may want to start with one that is specific to the RDBMS you plan to use