A Refresher and How-To: Profiling Data Using SQL

SQL Query Review
A Refresher and How-To: Profiling Data Using SQL

Goals of the Activity
- Learn to connect to our IST722 server and use its databases.
- Data profiling: "getting to know your data"
  - Why is it important?
  - How do you use SQL to do it?
  - Why use SQL to do this?
- Review of SQL
  - Important to the course
  - Mastering SELECT and JOINs
- Understand the need for data warehousing

Connecting to the IST722 SQL Server in the Labs
- Server name: ist-cs-dw1.ad.syr.edu
- Credentials: Windows Authentication
- NOTE: Windows Authentication uses the identity of the currently logged-on user, so you must connect from a lab or Remote Lab computer!

Connecting: Remote Lab
- Remote desktop access to the iSchool labs.
- Easy to use. Works from anywhere!
- For when you need our software to complete your work for this course but cannot get to the computer labs.
- https://remotelab.ischool.syr.edu

Connecting: Your Own Device
IMPORTANT: These instructions are for advanced users. No support will be given to students using this option. Instructions are provided as-is.
Steps:
1. Install SQL Server Developer Edition. NOTE: It must be this edition, as SSAS and SSIS are required.
2. Make an off-domain shortcut: https://answers.syr.edu/display/ischool/Connecting+to+Microsoft+SQL+Server+-+OFF+domain

IST722 Databases on the Server
(Figure: the server's databases, annotated with these labels, in order: Data Warehouse DB; OLTP Source for Sample Data; Sources we use in our Project; Sample OLTP Retail DB; Your workspace for DW data; Your workspace for Stage data; Netflix movie / DVD rental data; Sample Retail data for Labs)

What is Data Profiling?
The analysis of data sources to be used in the data warehouse. A.k.a. "Getting Intimate With Your Data."
Goals:
- Understand the structure, content, relationships, and quality of your data and metadata (schema).
- Recognize the features and limitations of your data source.
Checklist, per table:
- What does a single row in this data set mean?
- What makes each row unique? (Business key)
- What are the relationships among the data?
- Do you understand the schema? (Column definitions)
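Parts of the per-table checklist can be answered with a few queries. The following is a minimal sketch, not the course's own method: the `logins` table, its columns, and its rows are hypothetical, and Python's built-in `sqlite3` stands in for the course SQL Server so that the example runs anywhere. The SELECT statements themselves are standard SQL.

```python
import sqlite3

# Hypothetical sample data; the real lab tables and columns will differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE logins (login_id INTEGER, student_id TEXT, login_time TEXT);
    INSERT INTO logins VALUES
        (1, 's01', '2014-11-03 09:15'),
        (2, 's02', '2014-11-03 10:40'),
        (2, 's02', '2014-11-03 10:40'),
        (3, NULL,  '2014-12-01 14:05');
""")

# What makes each row unique? Compare the row count with the number of
# distinct values of the candidate business key. If they differ, login_id
# alone does not identify a row.
total_rows, distinct_keys = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT login_id) FROM logins"
).fetchone()

# Which key values are repeated, and how often?
duplicate_keys = conn.execute("""
    SELECT login_id, COUNT(*) AS copies
    FROM logins
    GROUP BY login_id
    HAVING COUNT(*) > 1
""").fetchall()

# Content quality: how many rows are missing a student?
missing_students = conn.execute(
    "SELECT COUNT(*) FROM logins WHERE student_id IS NULL"
).fetchone()[0]

print(total_rows, distinct_keys, duplicate_keys, missing_students)
```

Here the row count and the distinct key count disagree, which is exactly the kind of limitation profiling is meant to surface before the data is loaded into a warehouse.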

Data warehousing is about empowering business users to make intelligent decisions with their data. This is difficult because our data is typically in a format that is less conducive to this goal.

Business Questions
Remote Lab data set questions:
- When was the most recent login?
- On which days was the Remote Lab full?
- What's the GPA of the last 10 students who logged in?
- What are the majors of non-iSchool students who logged in during the last 2 months?
- How many logins were there in November 2014?
- How many freshmen used Remote Lab last semester?
- How many different (unique) sophomores logged on in December 2014?
- How many students did not log in to Remote Lab?
- What was the busiest time of day? Day of week?
- Which days of the week are busier than the average?
How do we go about answering these questions?

SQL SELECT: Reads Data
  SELECT col1, col2, ...    columns to display
  FROM table                table to use
  WHERE condition           only return rows matching this condition
  ORDER BY columns          sort row output by data in these columns
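To make the four clauses concrete, here is a small sketch that runs one SELECT against an in-memory table. The `logins` table, its columns, and its rows are invented for illustration, and Python's built-in `sqlite3` is used only as a convenient stand-in for SQL Server.

```python
import sqlite3

# Hypothetical stand-in for a Remote Lab login table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE logins (student_id TEXT, computer TEXT, login_time TEXT);
    INSERT INTO logins VALUES
        ('s01', 'lab-01', '2014-11-03 09:15'),
        ('s02', 'lab-02', '2014-11-20 10:40'),
        ('s03', 'lab-01', '2014-12-01 14:05');
""")

november_logins = conn.execute("""
    SELECT student_id, login_time          -- columns to display
    FROM logins                            -- table to use
    WHERE login_time >= '2014-11-01'
      AND login_time <  '2014-12-01'       -- only rows matching this condition
    ORDER BY login_time DESC               -- sort the output, newest first
""").fetchall()

for row in november_logins:
    print(row)
```

Because the rows are sorted newest first, the first row returned also answers a question like "when was the most recent login in November?"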

SQL SELECT Statement
  How we "say" it            How it is processed
  1. SELECT (projection)     1. FROM
  2. FROM                    2. WHERE
  3. WHERE                   3. SELECT (projection)
  4. ORDER BY                4. ORDER BY

Examples:
- On which dates was the Remote Lab full?
- When was the most recent login?
Before you begin, you've got to know your data:
- What does one row in the table mean?
- What makes each row unique?
- What do the columns mean?

JOINS
- Joins let you combine data from more than one table in your query output.
- Most of the time you join on PK-FK pairs.
- Any columns of the same type can be joined.
- The most common join is an inner join:
  SELECT * FROM tablea JOIN tableb ON acol = bcol
(Diagram: tablea join tableb)
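An inner join on a PK-FK pair might look like the sketch below. The `students` and `logins` tables and their contents are hypothetical, and `sqlite3` is again only a stand-in for the course server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (student_id TEXT PRIMARY KEY, major TEXT, gpa REAL);
    CREATE TABLE logins (
        login_id   INTEGER PRIMARY KEY,
        student_id TEXT REFERENCES students (student_id),
        login_time TEXT
    );
    INSERT INTO students VALUES ('s01', 'IM', 3.4), ('s02', 'LIS', 3.8);
    INSERT INTO logins VALUES
        (1, 's01', '2014-11-03 09:15'),
        (2, 's02', '2014-11-20 10:40'),
        (3, 's01', '2014-12-01 14:05');
""")

# Inner join on the PK-FK pair: each login row is paired with its student,
# so columns from both tables are available in one result.
login_details = conn.execute("""
    SELECT l.login_time, s.student_id, s.major, s.gpa
    FROM logins AS l
    JOIN students AS s ON l.student_id = s.student_id
    ORDER BY l.login_time DESC
""").fetchall()

for row in login_details:
    print(row)
```

Note that the result has one row per login, not one per student: a student who logged in twice appears twice.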

Outer Joins
For situations where you need to include rows from one or both tables even when they have no match under the join criteria.
In the diagram, assume A = Customers and B = Orders.
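Using the Customers and Orders roles from the diagram, a left outer join can be sketched as follows. The column names and rows are made up for illustration, and `sqlite3` stands in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER,
        total       REAL
    );
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# LEFT OUTER JOIN keeps every customer, even those with no matching order.
# For such rows the order columns come back as NULL (None in Python).
all_customers = conn.execute("""
    SELECT c.name, o.order_id
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id, o.order_id
""").fetchall()

# Filtering on the NULL side finds the customers who never ordered.
no_orders = conn.execute("""
    SELECT c.name
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    WHERE o.order_id IS NULL
""").fetchall()

print(all_customers)
print(no_orders)
```

The second query is the usual pattern for "who is in table A but not in table B" questions, such as finding Remote Lab users who are missing from the student table.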

Examples:
- What's the GPA of the last 10 students who logged in?
- What are the majors of non-iSchool students who logged in during the last 2 months?
- Is there anyone who used Remote Lab but is not in the student table?

Aggregates
- They summarize your data: you no longer get individual rows back, but a summary of rows from the table.
- Aggregate operators: COUNT, COUNT DISTINCT, SUM, MIN, MAX, AVG
- GROUP BY: the columns by which the aggregate operator summarizes.
- HAVING: like WHERE, but it filters after the aggregate has been computed.
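The difference between COUNT and COUNT DISTINCT, and between WHERE and HAVING, is easiest to see on a tiny table. This is a sketch under assumed names: the `logins` table and its `class_year` column are hypothetical, and `sqlite3` is used in place of SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE logins (student_id TEXT, class_year TEXT, login_time TEXT);
    INSERT INTO logins VALUES
        ('s01', 'Sophomore', '2014-12-01 09:00'),
        ('s01', 'Sophomore', '2014-12-02 10:00'),
        ('s02', 'Sophomore', '2014-12-02 11:00'),
        ('s03', 'Freshman',  '2014-12-03 12:00');
""")

# One summary row per class year: total logins vs. unique students.
per_class = conn.execute("""
    SELECT class_year,
           COUNT(*)                   AS logins,
           COUNT(DISTINCT student_id) AS unique_students
    FROM logins
    GROUP BY class_year
    ORDER BY class_year
""").fetchall()

# HAVING filters the groups after aggregation; WHERE could not do this,
# because the counts do not exist until the rows have been grouped.
busy_classes = conn.execute("""
    SELECT class_year, COUNT(*) AS logins
    FROM logins
    GROUP BY class_year
    HAVING COUNT(*) > 1
""").fetchall()

print(per_class)
print(busy_classes)
```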

Full SQL SELECT Statement
  How we "say" it            How it is processed
  1. SELECT (projection)     1. FROM
  2. TOP / DISTINCT          2. WHERE
  3. FROM                    3. GROUP BY
  4. WHERE                   4. HAVING
  5. GROUP BY                5. SELECT (projection)
  6. HAVING                  6. ORDER BY
  7. ORDER BY                7. TOP / DISTINCT

Examples:
- How many logins were there in November 2014?
- How many undergrads (freshmen / sophomores / juniors / seniors) used Remote Lab last semester?
- How many different (unique) sophomores logged on in December 2014?
- How many students did not log in to Remote Lab?
- What was the busiest time of day? Day of week?

Sub Selects
The full power of the SELECT statement is that you can use it as a table, column, or condition in another SELECT statement.
- In FROM:
  SELECT x.* FROM (SELECT * FROM table1) x
- In the projection:
  SELECT (SELECT TOP 1 col1 FROM table1) col1 FROM table2 y
- In WHERE:
  SELECT x.* FROM table1 x WHERE x.col1 IN (SELECT col1 FROM table2)
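The three subselect positions can be tried out with the sketch below. The tables and data are hypothetical, and because the example runs on SQLite (via Python's `sqlite3`) rather than SQL Server, it uses `MAX` in the projection subselect instead of the T-SQL `TOP 1` shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (student_id TEXT PRIMARY KEY, major TEXT);
    CREATE TABLE logins (student_id TEXT, login_time TEXT);
    INSERT INTO students VALUES ('s01', 'IM'), ('s02', 'LIS'), ('s03', 'TNM');
    INSERT INTO logins VALUES
        ('s01', '2014-11-03 09:15'),
        ('s01', '2014-11-04 10:00'),
        ('s02', '2014-11-20 10:40');
""")

# In WHERE: students who never logged in.
never_logged_in = conn.execute("""
    SELECT s.student_id
    FROM students AS s
    WHERE s.student_id NOT IN (SELECT student_id FROM logins)
""").fetchall()

# In FROM: treat a summary query as a table, then filter it.
repeat_users = conn.execute("""
    SELECT x.student_id, x.login_count
    FROM (SELECT student_id, COUNT(*) AS login_count
          FROM logins
          GROUP BY student_id) AS x
    WHERE x.login_count > 1
""").fetchall()

# In the projection: a correlated subselect computes one value per outer row.
last_login = conn.execute("""
    SELECT s.student_id,
           (SELECT MAX(l.login_time)
            FROM logins AS l
            WHERE l.student_id = s.student_id) AS last_login
    FROM students AS s
    ORDER BY s.student_id
""").fetchall()

print(never_logged_in, repeat_users, last_login)
```

One caution with the WHERE form: `NOT IN` returns no rows if the subselect produces a NULL, so it is only safe when the inner column has no missing values, as is the case in this sample data.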

Examples:
- Which days of the week are busier than the average (from a count of logins)?
- For last semester's logins by iSchool grad students only, list the program, total logins per program, total logins for all grads, and each program's percentage of the total. Example:

  Program  Lgns  Total  PctOfTot
  LIS      100   500    20%
  IM       250   500    50%
  TNM      150   500    30%
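One way to produce the percent-of-total report is a scalar subselect for the grand total. The sketch below reproduces the sample numbers; the `grad_logins` table is hypothetical (one row per login), and `sqlite3` stands in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grad_logins (program TEXT)")

# One row per login, matching the counts in the sample output.
rows = [("LIS",)] * 100 + [("IM",)] * 250 + [("TNM",)] * 150
conn.executemany("INSERT INTO grad_logins VALUES (?)", rows)

# The subselect computes the total over all rows, independent of GROUP BY,
# so every program row can be compared with the same grand total.
report = conn.execute("""
    SELECT program,
           COUNT(*)                                   AS lgns,
           (SELECT COUNT(*) FROM grad_logins)         AS total,
           100.0 * COUNT(*)
                 / (SELECT COUNT(*) FROM grad_logins) AS pct_of_tot
    FROM grad_logins
    GROUP BY program
    ORDER BY program
""").fetchall()

for program, lgns, total, pct in report:
    print(f"{program:<5}{lgns:>5}{total:>6}{pct:>6.0f}%")
```

Multiplying by `100.0` rather than `100` forces floating-point division; with integer operands the percentage would be truncated.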

Handling Slow Query Processing
Sometimes your source is not responsive enough for data exploration.
Fix:
- Copy the source data into your Operational Data Store:
  SELECT * INTO newtable FROM …
  or
  INSERT INTO table SELECT * FROM …
- Set your business keys as the primary keys of the table.
- If performance still lags, index as required / suggested.
This is a temporary solution, just for profiling.
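The copy-then-index steps can be sketched as follows. All table, column, and index names are hypothetical. The example uses the `INSERT INTO ... SELECT` form because SQLite, used here through Python's `sqlite3` as a stand-in, does not support the T-SQL `SELECT ... INTO` form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for the slow source table (hypothetical name and columns).
    CREATE TABLE source_logins (login_id INTEGER, student_id TEXT, login_time TEXT);
    INSERT INTO source_logins VALUES
        (1, 's01', '2014-11-03 09:15'),
        (2, 's02', '2014-11-20 10:40');

    -- Local profiling copy, with the business key declared as the primary key.
    CREATE TABLE profile_logins (
        login_id   INTEGER PRIMARY KEY,
        student_id TEXT,
        login_time TEXT
    );
    INSERT INTO profile_logins SELECT * FROM source_logins;

    -- Index a column that the profiling queries filter or join on.
    CREATE INDEX ix_profile_logins_student ON profile_logins (student_id);
""")

copied = conn.execute("SELECT COUNT(*) FROM profile_logins").fetchone()[0]

# PRAGMA index_list is SQLite-specific; the second column is the index name.
index_names = [row[1] for row in conn.execute("PRAGMA index_list('profile_logins')")]

print(copied, index_names)
```

Declaring the primary key on the copy has a useful side effect for profiling: the insert fails if the assumed business key is not actually unique in the source.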

Activity Summary
- Data warehousing is about empowering business users to make intelligent decisions with their data.
- So, how would a business user get these questions answered? This is hard work, and you're technically savvy.
- It's not practical to write an SQL statement for every business question we need answered. That does not scale!
- We need to find a better way to reorganize this data so that we can accomplish the end goal of empowering business users.
- That's the rationale behind data warehousing and the essence of what you'll learn in this course.
