Using SQL to Prepare Data for Analysis


Using SQL to Prepare Data for Analysis Dr. John Delano

Agenda: Administrative Items; What is SQL?; Select/From/Where; Aggregate Functions; Group By; Joining Multiple Tables. Admin Items: Attendance; AIS Data Analysis Challenge.

What IS SQL?

What is SQL? Structured Query Language (SQL) was developed by the IBM Corporation in the 1970s. SQL was first endorsed as a U.S. national standard by the American National Standards Institute (ANSI) in 1986, and the widely used SQL-92 revision followed in 1992. Newer versions exist, and they incorporate XML and some object-oriented concepts.

What is SQL? SQL is not a full-featured programming language like C, C#, or Java. SQL is a data sublanguage for creating and processing database data and metadata. SQL is ubiquitous in enterprise-class DBMS products. SQL programming is a critical skill.

What is SQL? SQL statements can be divided into three categories: data definition language (DDL) statements, used for creating tables, relationships, etc.; data manipulation language (DML) statements, used for queries and data modification; and SQL/Persistent Stored Modules (SQL/PSM) statements, which add procedural programming capabilities.

Where is SQL Used? SQL is used in two places: in ETL, to pull data from operational databases and other data sources into a data warehouse, and in BI, for reporting. The focus tonight is on ETL, so we’ll look at how we extract data from an operational database and clean data from other data sources in preparation for analysis. It’s all about laying the foundation.

SQL Tools: SQL Server Management Studio; MySQL Workbench; Oracle Database Client; Northwind Database SQL

SQL Server Management Studio Server name: itmdb.cedarville.edu Login: datascience Password: Analytics

SQL Server Management Studio Click New Query

SQL Server Management Studio Change this to Northwind

SQL Server Management Studio This is where you will enter your SQL Queries

SQL Server Management Studio To run a query, click Execute

SELECT…FROM…WHERE

SELECT..FROM The fundamental framework for an SQL query is the SQL SELECT statement:
SELECT {ColumnName(s)}
FROM {TableName(s)}
WHERE {Condition(s)};
All SQL statements end with a semicolon (;).

Columns From One Table What you see here is a snippet of the results from this query on the Northwind database, which you can download into a SQL Server instance.

Column Order Note how the column order in the results follows the order in which the columns are listed on the SELECT line.

Specifying All Columns All columns are retrieved (but cut off on the screen)
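
The queries behind these three slides are not in the transcript, so the following is only a sketch of what they might look like. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the two-row Customers table is a hypothetical miniature of the Northwind table, not the real data.

```python
import sqlite3

# Assumption: an in-memory SQLite table standing in for Northwind's Customers.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (CustomerID TEXT, CompanyName TEXT, City TEXT, Country TEXT)"
)
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?, ?)",
    [
        ("ALFKI", "Alfreds Futterkiste", "Berlin", "Germany"),
        ("AROUT", "Around the Horn", "London", "UK"),
    ],
)

# Two columns from one table, in the order listed on the SELECT line.
name_city = conn.execute("SELECT CompanyName, City FROM Customers;").fetchall()

# Swapping the names on the SELECT line swaps the columns in the result.
city_name = conn.execute("SELECT City, CompanyName FROM Customers;").fetchall()

# An asterisk retrieves every column of the table.
cursor = conn.execute("SELECT * FROM Customers;")
all_columns = [col[0] for col in cursor.description]
all_rows = cursor.fetchall()

print(name_city)
print(city_name)
print(all_columns)
```

The same SELECT statements should run unchanged in SQL Server Management Studio against a table with these column names.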

Filtering Rows in One Table Note that we need to mark text-based criteria values in single quotes, but numeric values are used without the quotes.

Filtering Rows -- AND

Filtering Rows -- OR
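
As a rough illustration of the filtering slides, the sketch below runs four WHERE conditions: a text value in single quotes, a numeric value without quotes, and two conditions combined with AND and with OR. It assumes SQLite (through Python's sqlite3 module) in place of SQL Server, and the Products rows and the `product_names` helper are made up for the example.

```python
import sqlite3

# Assumption: a tiny hypothetical Products table, loosely modelled on Northwind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductName TEXT, CategoryID INTEGER, UnitPrice REAL)")
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [("Chai", 1, 18.0), ("Chang", 1, 19.0), ("Aniseed Syrup", 2, 10.0), ("Ikura", 8, 31.0)],
)

def product_names(where_clause):
    """Return the sorted product names that satisfy a WHERE condition."""
    sql = f"SELECT ProductName FROM Products WHERE {where_clause};"
    return sorted(row[0] for row in conn.execute(sql))

by_text = product_names("ProductName = 'Chai'")      # text value: single quotes
by_number = product_names("CategoryID = 1")          # numeric value: no quotes
both_true = product_names("CategoryID = 1 AND UnitPrice > 18")   # both must hold
either_true = product_names("CategoryID = 2 OR UnitPrice > 30")  # at least one holds

print(by_text, by_number, both_true, either_true)
```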

Filtering Rows -- BETWEEN BETWEEN is inclusive of the end values.
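
A small check of the inclusive behaviour, assuming SQLite (via Python's sqlite3 module) as a stand-in for SQL Server and a hypothetical one-column Products table. BETWEEN 10 AND 20 should return the same rows as the explicit pair of comparisons, including the rows priced at exactly 10 and 20.

```python
import sqlite3

# Assumption: hypothetical prices chosen so two rows sit exactly on the end values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (UnitPrice REAL)")
conn.executemany(
    "INSERT INTO Products VALUES (?)",
    [(9.0,), (10.0,), (18.0,), (20.0,), (31.0,)],
)

between_prices = sorted(
    row[0]
    for row in conn.execute("SELECT UnitPrice FROM Products WHERE UnitPrice BETWEEN 10 AND 20;")
)

# The equivalent condition written with explicit inclusive comparisons.
explicit_prices = sorted(
    row[0]
    for row in conn.execute(
        "SELECT UnitPrice FROM Products WHERE UnitPrice >= 10 AND UnitPrice <= 20;"
    )
)

print(between_prices)
```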

Filtering Rows -- LIKE Note that this retrieves company names that start with A (case-insensitive, because that is the collation sequence used for my database!). % is a wildcard that matches any sequence of zero or more characters. An underscore (_) matches exactly one character.

Filtering Rows -- LIKE Wildcard used on both sides means find a value that has x in it somewhere.
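
The LIKE patterns can be tried out with the sketch below. It assumes SQLite through Python's sqlite3 module, whose LIKE is case-insensitive for ASCII letters by default; on SQL Server the case behaviour depends on the database collation, as noted above. The company names and the `matching` helper are illustrative only.

```python
import sqlite3

# Assumption: a hypothetical Customers table with a handful of company names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CompanyName TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?)",
    [("Alfreds Futterkiste",), ("Around the Horn",), ("Bon app",), ("Ernst Handel",)],
)

def matching(pattern):
    """Return the sorted company names matching a LIKE pattern."""
    rows = conn.execute(
        "SELECT CompanyName FROM Customers WHERE CompanyName LIKE ?;", (pattern,)
    )
    return sorted(row[0] for row in rows)

starts_with_a = matching("A%")       # % after the letter: names that start with A
lowercase_a = matching("a%")         # same rows here, because SQLite ignores ASCII case
contains_an = matching("%an%")       # % on both sides: "an" anywhere in the name
one_character = matching("B_n app")  # _ stands for exactly one character

print(starts_with_a, contains_an, one_character)
```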

Sorting Rows in One Table
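
The sorting queries themselves are not in the transcript. A plausible sketch, assuming they used ORDER BY, is shown here with SQLite (Python's sqlite3 module) standing in for SQL Server and a hypothetical Products table.

```python
import sqlite3

# Assumption: hypothetical product rows, inserted in no particular order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductName TEXT, UnitPrice REAL)")
conn.executemany(
    "INSERT INTO Products VALUES (?, ?)",
    [("Chang", 19.0), ("Ikura", 31.0), ("Aniseed Syrup", 10.0)],
)

# Ascending order is the default; ASC just makes it explicit.
cheapest_first = [
    row[0]
    for row in conn.execute("SELECT ProductName FROM Products ORDER BY UnitPrice ASC;")
]

# DESC reverses the sort.
priciest_first = [
    row[0]
    for row in conn.execute("SELECT ProductName FROM Products ORDER BY UnitPrice DESC;")
]

print(cheapest_first)
print(priciest_first)
```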

Aggregate Functions

Aggregate Functions: COUNT, SUM, AVG, MIN, MAX

Using Aggregate Functions in SQL

Using Aggregate Functions in SQL Note that we pass * to COUNT because we are counting entire rows, not the values in a single column.

Calculated Columns in SQL
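
To make the aggregate and calculated-column slides concrete, here is a minimal sketch. It assumes SQLite (via Python's sqlite3 module) rather than SQL Server; the Products rows and the `InventoryValue` alias are invented for the example.

```python
import sqlite3

# Assumption: three hypothetical products with round numbers for easy checking.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductName TEXT, UnitPrice REAL, UnitsInStock INTEGER)")
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [("Chai", 10.0, 5), ("Chang", 20.0, 10), ("Ikura", 30.0, 15)],
)

# All five aggregate functions in one query; each collapses the table to one value.
summary = conn.execute(
    """
    SELECT COUNT(*), SUM(UnitsInStock), AVG(UnitPrice), MIN(UnitPrice), MAX(UnitPrice)
    FROM Products;
    """
).fetchone()

# A calculated column is computed per row from other columns and named with AS.
inventory = conn.execute(
    """
    SELECT ProductName, UnitPrice * UnitsInStock AS InventoryValue
    FROM Products
    ORDER BY ProductName;
    """
).fetchall()

print(summary)
print(inventory)
```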

Group By

GROUP BY Note that you typically specify an aggregate column on the SELECT line and a non-aggregate column that is also included on the GROUP BY line. All non-aggregate fields on the SELECT line MUST be in the GROUP BY clause.
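
A short sketch of the rule, assuming SQLite (Python's sqlite3 module) in place of SQL Server and a hypothetical Customers table: Country is the non-aggregate column, so it appears on both the SELECT line and the GROUP BY line, while COUNT(*) is the aggregate.

```python
import sqlite3

# Assumption: hypothetical customers spread over three countries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CompanyName TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [
        ("Alfreds Futterkiste", "Germany"),
        ("Blauer See Delikatessen", "Germany"),
        ("Bon app", "France"),
        ("Around the Horn", "UK"),
    ],
)

# One result row per country, with the number of customers in each.
customers_per_country = conn.execute(
    """
    SELECT Country, COUNT(*) AS NumCustomers
    FROM Customers
    GROUP BY Country
    ORDER BY Country;
    """
).fetchall()

print(customers_per_country)
```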

JOINING TABLES

Why Join? Consider this diagram. What if I want to know which region has the highest number of employees?

JOIN What if we tried this? It doesn’t work, because RegionDescription is not in the Employees table.

JOIN Syntax Joins are used in the FROM clause to connect two or more tables together, based on their common “keys”:
FROM Table1 JOIN Table2 ON Table1.PrimaryKey = Table2.ForeignKey
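
The join pattern can be sketched on two small tables. This assumes SQLite (via Python's sqlite3 module) rather than SQL Server, and the Categories and Products rows are hypothetical. Categories.CategoryID plays the role of the primary key and Products.CategoryID the foreign key.

```python
import sqlite3

# Assumption: two hypothetical tables linked by CategoryID.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Categories (CategoryID INTEGER PRIMARY KEY, CategoryName TEXT)")
conn.execute("CREATE TABLE Products (ProductName TEXT, CategoryID INTEGER)")
conn.executemany("INSERT INTO Categories VALUES (?, ?)", [(1, "Beverages"), (2, "Condiments")])
conn.executemany("INSERT INTO Products VALUES (?, ?)", [("Chai", 1), ("Aniseed Syrup", 2)])

# Each product row is matched with the category row that shares its CategoryID.
products_with_category = conn.execute(
    """
    SELECT Products.ProductName, Categories.CategoryName
    FROM Categories
    JOIN Products ON Categories.CategoryID = Products.CategoryID
    ORDER BY Products.ProductName;
    """
).fetchall()

print(products_with_category)
```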

JOIN It is tempting to use MAX here instead of COUNT. What we are really looking for is the maximum count, so COUNT has to come first: count the employees in each region, then pick the largest count.
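
One way to answer the region question is sketched below; it is not necessarily the query shown on the slide. It assumes SQLite (Python's sqlite3 module) instead of SQL Server, and the table and column names are an assumption about how Northwind links employees to regions (through EmployeeTerritories and Territories). The rows are made up.

```python
import sqlite3

# Assumption: simplified, hypothetical versions of three Northwind tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Region (RegionID INTEGER, RegionDescription TEXT)")
conn.execute("CREATE TABLE Territories (TerritoryID TEXT, RegionID INTEGER)")
conn.execute("CREATE TABLE EmployeeTerritories (EmployeeID INTEGER, TerritoryID TEXT)")
conn.executemany("INSERT INTO Region VALUES (?, ?)", [(1, "Eastern"), (2, "Western")])
conn.executemany(
    "INSERT INTO Territories VALUES (?, ?)", [("01581", 1), ("01730", 1), ("98004", 2)]
)
conn.executemany(
    "INSERT INTO EmployeeTerritories VALUES (?, ?)",
    [(1, "01581"), (2, "01730"), (2, "01581"), (3, "98004")],
)

# COUNT comes first: count the distinct employees in each region,
# then sort so that the largest count is the first row.
employees_per_region = conn.execute(
    """
    SELECT r.RegionDescription, COUNT(DISTINCT et.EmployeeID) AS NumEmployees
    FROM Region AS r
    JOIN Territories AS t ON r.RegionID = t.RegionID
    JOIN EmployeeTerritories AS et ON t.TerritoryID = et.TerritoryID
    GROUP BY r.RegionDescription
    ORDER BY NumEmployees DESC;
    """
).fetchall()

top_region = employees_per_region[0]
print(top_region)
```

DISTINCT is used because an employee can cover several territories in the same region. In SQL Server, `SELECT TOP 1 ...` with the same ORDER BY would return only the first row.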

Preparing for Data Analysis

Preparing for Data Analysis First, find out how much data you have (run a select count query on each table). Next, look for any “dirty” or missing data (run a group by/count query on any description fields). Finally, learn how the tables are related (look for primary keys/foreign keys).
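
The first two checks might look like the following sketch. It assumes SQLite (via Python's sqlite3 module) in place of SQL Server, and the tables contain deliberately messy, made-up values so that the group by/count query has something to reveal.

```python
import sqlite3

# Assumption: hypothetical tables with inconsistent and missing Country values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER, Country TEXT)")
conn.execute("CREATE TABLE Orders (OrderID INTEGER, CustomerID INTEGER)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [(1, "USA"), (2, "USA"), (3, "usa"), (4, "U.S.A."), (5, None)],
)
conn.executemany("INSERT INTO Orders VALUES (?, ?)", [(100, 1), (101, 3)])

# Step 1: how much data is there? Run a count on each table.
table_names = ["Customers", "Orders"]  # fixed list, so the f-string below is safe
row_counts = {
    name: conn.execute(f"SELECT COUNT(*) FROM {name};").fetchone()[0]
    for name in table_names
}

# Step 2: group by/count on a description field to expose dirty or missing values.
country_counts = dict(
    conn.execute("SELECT Country, COUNT(*) FROM Customers GROUP BY Country;").fetchall()
)

print(row_counts)
print(country_counts)
```

Here the grouped counts show three spellings of the same country plus a missing value (None in Python, NULL in SQL), which is the kind of problem to clean up before analysis.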

For Further Study

For Further Study SQL allows us to use the HAVING clause to specify criteria. How is this different from WHERE? You can create your own aggregate function to use in SQL, using C#. For example, can you figure out how to create a concatenation function to join all the string values in a database column into a comma-separated list?

For Further Study There are two types of joins: Inner and Outer. What is the difference? SQL Server also provides the ability to write Common Table Expression queries. What are these, and how might you use them?