Types of Joins Farrokh Alemi, Ph.D.

Presentation transcript:

Types of Joins Farrokh Alemi, Ph.D. In this section we discuss how different types of joins work in SQL. This brief presentation was organized by Dr. Alemi.

Link Data across 2 or More Tables The purpose of join commands is to link data across two or more tables. If the data are in more than one table then the tables must be joined before the data are available to the analyst.

Cross, Full, Left/Right, Inner
There are four different ways that two tables can be joined. The smallest join is the inner join. A left or right join increases the size of the resulting table. A full join increases the size further, and a cross join creates the largest resulting table. Joins have a large impact on which records are included in the final table. Each join acts like a complicated WHERE clause that filters the data in a particular manner.

SELECT column_name(s) FROM table1 INNER JOIN table2 ON table1.column_name = table2.column_name;
This slide provides the syntax for the inner join. It is the most common join in SQL code.

The SELECT portion of the code specifies the column names across the two tables. Column names should be unique across the two tables or addressed by prefacing them with the table name.

The FROM portion of the code specifies the two tables that should be joined.

The reserved words “INNER JOIN” should appear between the two table names.

This is followed by the ON clause, which specifies one field from each table; the two fields must be equal before the contents of the tables are joined together.

Inner Join
Inner join requires that the join fields in the two tables have exactly the same values. This means that inner join selects the intersection of the two tables. When the join field is unique in at least one of the tables, the result has no more rows than the other table, so the inner join does not increase the table size; rows without a match are dropped.

Inner Join

Dx Codes Table
Code ID  Code    Description
1        410.05  Acute myocardial infarction of anterolateral wall
2        250.00  Diabetes mellitus without mention of complication
3        250.01
4                Acute MI of anterolateral wall
5                Diabetes mellitus w/out mention of complication
7        410.09  Acute myocardial infarction of unspecified source

Encounters Table
Patient ID  Provider ID  Diagnosis ID  Date
1001        12           1             1/12/2020
123         240          5             8/13/2012
150         2555         6             9/12/2021

For example, consider the two tables in this slide: one contains descriptions of diagnosis codes and the other reports encounters that refer to diagnoses. The description table includes text describing the nature of the diagnosis. The encounter table includes no text, just IDs and codes that can be used to connect to the description table. A join can select the text from the “Dx Codes” table and combine it with the data in the encounter table. An inner join lists all claims in which the diagnostic code has corresponding text in the diagnosis table.

Inner Join
SELECT d.*, e.* FROM [Dx Codes] d inner join [Encounter] e ON d.[Code ID] = e.[Diagnosis ID]
This is an example of the code that can join these two tables.

Alias
SELECT d.*, e.* FROM [Dx Codes] d inner join [Encounter] e ON d.[Code ID] = e.[Diagnosis ID]
Since table names are often long, one can introduce aliases in join statements to avoid repeating the full table name as a prefix for each field. In this statement, the letters d and e are aliases for the Diagnosis Codes and Encounters tables.

Inner Join
SELECT d.*, e.* FROM [Dx Codes] d inner join [Encounter] e ON d.[Code ID] = e.[Diagnosis ID]
Joining the [Dx Codes] and [Encounters] tables allows us to see a description for each diagnosis. For example, for patient 1001, we read from the Encounters table that the diagnosis ID is 1. Then from the Diagnosis Codes table we read that the corresponding description is acute myocardial infarction of anterolateral wall. Diagnosis ID 1 appears in both tables.

Inner Join: No Matching 6
SELECT d.*, e.* FROM [Dx Codes] d inner join [Encounter] e ON d.[Code ID] = e.[Diagnosis ID]
The situation is not the same for diagnosis ID 6. There is no “Code ID” 6 in the [Dx Codes] table, so the encounter row will not be included in the combined table.
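
The inner join described above can be tried out with a minimal sketch. It assumes an in-memory SQLite database driven from Python (chosen only because it ships with Python; the slides do not name a database engine), stores the cells that are blank in the slide tables as NULL, and selects an illustrative subset of columns rather than d.*, e.*.

```python
import sqlite3

# In-memory database; SQLite accepts the bracketed names used on the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE [Dx Codes] ([Code ID] INTEGER, Code TEXT, Description TEXT);
    CREATE TABLE [Encounter] ([Patient ID] INTEGER, [Provider ID] INTEGER,
                              [Diagnosis ID] INTEGER, [Date] TEXT);
""")
# Rows copied from the slide tables; blank cells are stored as NULL (None).
conn.executemany("INSERT INTO [Dx Codes] VALUES (?, ?, ?)", [
    (1, "410.05", "Acute myocardial infarction of anterolateral wall"),
    (2, "250.00", "Diabetes mellitus without mention of complication"),
    (3, "250.01", None),
    (4, None, "Acute MI of anterolateral wall"),
    (5, None, "Diabetes mellitus w/out mention of complication"),
    (7, "410.09", "Acute myocardial infarction of unspecified source"),
])
conn.executemany("INSERT INTO [Encounter] VALUES (?, ?, ?, ?)", [
    (1001, 12, 1, "1/12/2020"),
    (123, 240, 5, "8/13/2012"),
    (150, 2555, 6, "9/12/2021"),
])

# Inner join: only encounters whose Diagnosis ID matches a Code ID survive.
rows = conn.execute("""
    SELECT e.[Patient ID], e.[Diagnosis ID], d.Description
    FROM [Dx Codes] d INNER JOIN [Encounter] e
      ON d.[Code ID] = e.[Diagnosis ID]
    ORDER BY e.[Diagnosis ID]
""").fetchall()

for row in rows:
    print(row)
```

Only the encounters with diagnosis IDs 1 and 5 come back; the encounter of patient 150 (diagnosis ID 6) is absent from the result.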

Inner Join: No Match, Entire Record Gone. Check Total Rows in Combined & Component Tables
SELECT d.*, e.* FROM [Dx Codes] d inner join [Encounter] e ON d.[Code ID] = e.[Diagnosis ID]
Since the description of the diagnosis code is missing, all corresponding claims are also dropped from the combined table. Of course this does not make sense. A whole lot of data can be lost because the diagnosis has no description. Imagine what will happen if we are trying to send a bill for the encounter. To generate the bill we need the description of the diagnosis. We will not have the description of the diagnosis in the combined table. Even worse, the entire record of the visit is gone. We won’t even know that the patient has had a visit. Poof: no description, no data, no bill. Whenever inner joins are used, the analyst must be careful not to inadvertently drop data. Always check the total number of records in the combined table against the records in the component tables.
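
One way to carry out the row-count check recommended above is sketched here. It again assumes an in-memory SQLite database in Python; only the key column of [Dx Codes] is loaded because counting needs nothing else, and the NOT EXISTS query for listing the lost encounters is an illustrative choice, not something shown on the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE [Dx Codes] ([Code ID] INTEGER);
    CREATE TABLE [Encounter] ([Patient ID] INTEGER, [Provider ID] INTEGER,
                              [Diagnosis ID] INTEGER, [Date] TEXT);
""")
# Code IDs from the slide; note that there is no Code ID 6.
conn.executemany("INSERT INTO [Dx Codes] VALUES (?)",
                 [(1,), (2,), (3,), (4,), (5,), (7,)])
conn.executemany("INSERT INTO [Encounter] VALUES (?, ?, ?, ?)", [
    (1001, 12, 1, "1/12/2020"),
    (123, 240, 5, "8/13/2012"),
    (150, 2555, 6, "9/12/2021"),
])

# Row count of the component table versus the combined table.
n_encounters = conn.execute("SELECT COUNT(*) FROM [Encounter]").fetchone()[0]
n_combined = conn.execute("""
    SELECT COUNT(*)
    FROM [Dx Codes] d INNER JOIN [Encounter] e
      ON d.[Code ID] = e.[Diagnosis ID]
""").fetchone()[0]

# Encounters that the inner join silently dropped.
lost = conn.execute("""
    SELECT e.[Patient ID], e.[Diagnosis ID]
    FROM [Encounter] e
    WHERE NOT EXISTS (SELECT 1 FROM [Dx Codes] d
                      WHERE d.[Code ID] = e.[Diagnosis ID])
""").fetchall()

print(n_encounters, n_combined, lost)
```

A combined count smaller than the encounter count signals that records were lost, and the last query names them.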

Left/Right Join: All of Left Table Records Listed
The left and right joins always include the records of one table and include the fields from the other table only when the IDs match. When the two IDs do not match, the record is still kept, but null values appear in place of the missing fields.

Left/Right Join: All of Right Table Records Listed
If the right join is used, then all of the records in the right table are included. Where a record has a match in the left table, that content is included; where it does not, null values are included.

Left/Right Join
SELECT d.*, e.* FROM [Dx Codes] d right join [Encounter] e ON d.[Code ID] = e.[Diagnosis ID]
Continuing the previous example, a right join displays all claims from the [Encounters] table and their corresponding text from the [Dx Codes] table.

All of the Encounters table records are included. For diagnoses 1 and 5 the description is included from the [Dx Codes] table. For record 6 a null value is included for the description and for the code. All claims data are still there, but the description of the diagnosis is null when the description is not available.

Combined Table (right join)
From Encounters Table                             From Dx Codes Table
Patient ID  Provider ID  Diagnosis ID  Date       Code    Description
1001        12           1             1/12/2020  410.05  Acute myocardial infarction of anterolateral wall
123         240          5             8/13/2012  250     Diabetes mellitus w/out mention of complication
150         2555         6             9/12/2021  Null    Null

The right join lists all the records in the Encounters table. Note that diagnosis ID 6 is listed even though its description is left null.
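
The right join above can be reproduced with a short sketch, again assuming an in-memory SQLite database in Python. Because older SQLite builds do not accept RIGHT JOIN, the sketch swaps the table order and uses LEFT JOIN, which returns the same rows as [Dx Codes] d right join [Encounter] e. Blank slide cells are stored as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE [Dx Codes] ([Code ID] INTEGER, Code TEXT, Description TEXT);
    CREATE TABLE [Encounter] ([Patient ID] INTEGER, [Provider ID] INTEGER,
                              [Diagnosis ID] INTEGER, [Date] TEXT);
""")
conn.executemany("INSERT INTO [Dx Codes] VALUES (?, ?, ?)", [
    (1, "410.05", "Acute myocardial infarction of anterolateral wall"),
    (2, "250.00", "Diabetes mellitus without mention of complication"),
    (3, "250.01", None),
    (4, None, "Acute MI of anterolateral wall"),
    (5, None, "Diabetes mellitus w/out mention of complication"),
    (7, "410.09", "Acute myocardial infarction of unspecified source"),
])
conn.executemany("INSERT INTO [Encounter] VALUES (?, ?, ?, ?)", [
    (1001, 12, 1, "1/12/2020"),
    (123, 240, 5, "8/13/2012"),
    (150, 2555, 6, "9/12/2021"),
])

# Equivalent of "[Dx Codes] d RIGHT JOIN [Encounter] e": every encounter is
# kept, and the Dx Codes columns are NULL when no Code ID matches.
rows = conn.execute("""
    SELECT e.[Patient ID], e.[Diagnosis ID], d.Code, d.Description
    FROM [Encounter] e LEFT JOIN [Dx Codes] d
      ON d.[Code ID] = e.[Diagnosis ID]
    ORDER BY e.[Diagnosis ID]
""").fetchall()

for row in rows:
    print(row)
```

All three encounters are returned, and the row for diagnosis ID 6 carries None in both the code and the description.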

Left/Right Join
SELECT d.*, e.* FROM [Dx Codes] d left join [Encounter] e ON d.[Code ID] = e.[Diagnosis ID]
In the left join, all records from the [Dx Codes] table are included. Diagnoses that do not have an encounter are also included, with the missing encounter fields set to null.

The combined table lists all 6 diagnoses. For diagnoses that have encounters, the encounters are listed; for diagnoses that do not have encounters, null values are listed.

Combined Table (left join)
From Encounters Table                             From Dx Codes Table
Patient ID  Provider ID  Diagnosis ID  Date       Code    Description
1001        12           1             1/12/2020  410.05  Acute myocardial infarction of anterolateral wall
Null        Null         Null          Null       250     Diabetes mellitus without mention of complication
Null        Null         Null          Null       250.01
Null        Null         Null          Null               Acute MI of anterolateral wall
123         240          5             8/13/2012          Diabetes mellitus w/out mention of complication
Null        Null         Null          Null       410.09  Acute myocardial infarction of unspecified source

The combined table now has 6 rows; in 4 of them the encounter fields are left as null.
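
The left join can be checked with the following sketch, which assumes the same in-memory SQLite setup in Python as an illustration. Blank slide cells are stored as NULL and only a few columns are selected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE [Dx Codes] ([Code ID] INTEGER, Code TEXT, Description TEXT);
    CREATE TABLE [Encounter] ([Patient ID] INTEGER, [Provider ID] INTEGER,
                              [Diagnosis ID] INTEGER, [Date] TEXT);
""")
conn.executemany("INSERT INTO [Dx Codes] VALUES (?, ?, ?)", [
    (1, "410.05", "Acute myocardial infarction of anterolateral wall"),
    (2, "250.00", "Diabetes mellitus without mention of complication"),
    (3, "250.01", None),
    (4, None, "Acute MI of anterolateral wall"),
    (5, None, "Diabetes mellitus w/out mention of complication"),
    (7, "410.09", "Acute myocardial infarction of unspecified source"),
])
conn.executemany("INSERT INTO [Encounter] VALUES (?, ?, ?, ?)", [
    (1001, 12, 1, "1/12/2020"),
    (123, 240, 5, "8/13/2012"),
    (150, 2555, 6, "9/12/2021"),
])

# Left join: every Dx Codes row is kept; encounter columns are NULL
# for codes that no encounter refers to.
rows = conn.execute("""
    SELECT d.[Code ID], d.Description, e.[Patient ID]
    FROM [Dx Codes] d LEFT JOIN [Encounter] e
      ON d.[Code ID] = e.[Diagnosis ID]
    ORDER BY d.[Code ID]
""").fetchall()

# Code IDs that have no encounter.
unmatched = [code_id for code_id, _, patient in rows if patient is None]
print(len(rows), unmatched)
```

The result has one row per diagnosis code, and the codes without an encounter are 2, 3, 4, and 7. The encounter with diagnosis ID 6 does not appear because it has no row in the left table.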

Full Join: All Records Listed
If the full join is used, then all of the records in both tables are included. Where both tables match, the information is listed; where one table is missing a match, null values are inserted. Full join includes all the records in both left and right joins.

Full Join
SELECT d.*, e.* FROM [Dx Codes] d full join [Encounter] e ON d.[Code ID] = e.[Diagnosis ID]
For code IDs 1 and 5, the encounters of patient 1001 and patient 123 are listed.

Code IDs 2, 3, 4, and 7 are included but no encounter information is listed for these codes. Null values are provided.

For diagnosis ID 6, the encounter information is listed but the description is left null.

Combined Table (full join)
From Encounters Table                             From Dx Codes Table
Patient ID  Provider ID  Diagnosis ID  Date       Code    Description
1001        12           1             1/12/2020  410.05  Acute myocardial infarction of anterolateral wall
Null        Null         Null          Null       250     Diabetes mellitus without mention of complication
Null        Null         Null          Null       250.01
Null        Null         Null          Null               Acute MI of anterolateral wall
123         240          5             8/13/2012          Diabetes mellitus w/out mention of complication
Null        Null         Null          Null       410.09  Acute myocardial infarction of unspecified source
150         2555         6             9/12/2021  Null    Null

Now the combined table includes null values in both descriptions and encounters.
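
The full join can be sketched as follows, under the same assumption of an in-memory SQLite database in Python. Older SQLite builds do not accept FULL JOIN, so the sketch emulates it: a left join from [Dx Codes] is combined by UNION ALL with the encounters that have no matching code. Only the key and description columns of [Dx Codes] are loaded to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE [Dx Codes] ([Code ID] INTEGER, Description TEXT);
    CREATE TABLE [Encounter] ([Patient ID] INTEGER, [Provider ID] INTEGER,
                              [Diagnosis ID] INTEGER, [Date] TEXT);
""")
conn.executemany("INSERT INTO [Dx Codes] VALUES (?, ?)", [
    (1, "Acute myocardial infarction of anterolateral wall"),
    (2, "Diabetes mellitus without mention of complication"),
    (3, None),
    (4, "Acute MI of anterolateral wall"),
    (5, "Diabetes mellitus w/out mention of complication"),
    (7, "Acute myocardial infarction of unspecified source"),
])
conn.executemany("INSERT INTO [Encounter] VALUES (?, ?, ?, ?)", [
    (1001, 12, 1, "1/12/2020"),
    (123, 240, 5, "8/13/2012"),
    (150, 2555, 6, "9/12/2021"),
])

# Emulated full join: all Dx Codes rows (matched or not), plus the
# encounters whose Diagnosis ID has no matching Code ID.
rows = conn.execute("""
    SELECT d.[Code ID], e.[Patient ID]
    FROM [Dx Codes] d LEFT JOIN [Encounter] e
      ON d.[Code ID] = e.[Diagnosis ID]
    UNION ALL
    SELECT d.[Code ID], e.[Patient ID]
    FROM [Encounter] e LEFT JOIN [Dx Codes] d
      ON d.[Code ID] = e.[Diagnosis ID]
    WHERE d.[Code ID] IS NULL
""").fetchall()

codes_without_encounter = sum(1 for code_id, patient in rows if patient is None)
encounters_without_code = sum(1 for code_id, patient in rows if code_id is None)
print(len(rows), codes_without_encounter, encounters_without_code)
```

The combined table has one row for each of the six codes plus one more for the encounter with diagnosis ID 6, with nulls on whichever side lacks a match.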

Cross Join: All Possible Combinations Without any Restrictions
In a cross join, all records of one table are repeated for each record of the other table.

Cross Join
SELECT d.*, e.* FROM [Dx Codes] d cross join [Encounter] e
Note that a cross join does not specify that any fields should match across the two tables, so it has no ON clause.

Cross Join
SELECT d.*, e.* FROM [Dx Codes] d cross join [Encounter] e
[Combined table for the 1st record of the Encounters table: patient 1001, provider 12, diagnosis ID 1, 1/12/2020, paired with each of the 6 rows of the Dx Codes table]
The combined table for just the first record of the encounter table will include all 6 descriptions.

[Combined table for the 2nd encounter: patient 123, provider 240, diagnosis ID 5, 8/13/2012, paired with each of the 6 rows of the Dx Codes table]
The combined table for the second record of the encounter table will also include all 6 descriptions.

[Combined table for the 3rd encounter (diagnosis ID 6), paired with each of the 6 rows of the Dx Codes table]
The combined table for the 3rd encounter will also include 6 records, each having a different description.

Cross Join: All Possible Combinations, Lots of Data
Cross join increases the data size considerably. In our example of 3 encounters and 6 descriptions, the cross join created a combined table of 3 times 6, or 18, records. On massive data, unrestricted cross joins are rarely used because they are computationally expensive. On smaller data, one might do a cross join but aggressively eliminate some combinations using a WHERE clause.
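
The 3 times 6 arithmetic, and the effect of restricting a cross join with WHERE, can be confirmed with a small sketch. It assumes an in-memory SQLite database in Python and loads only the key columns, since only row counts are of interest; the particular WHERE condition is an illustrative choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE [Dx Codes] ([Code ID] INTEGER);
    CREATE TABLE [Encounter] ([Patient ID] INTEGER, [Diagnosis ID] INTEGER);
""")
conn.executemany("INSERT INTO [Dx Codes] VALUES (?)",
                 [(1,), (2,), (3,), (4,), (5,), (7,)])
conn.executemany("INSERT INTO [Encounter] VALUES (?, ?)",
                 [(1001, 1), (123, 5), (150, 6)])

# Unrestricted cross join: every code paired with every encounter.
n_all = conn.execute("""
    SELECT COUNT(*) FROM [Dx Codes] d CROSS JOIN [Encounter] e
""").fetchone()[0]

# The same cross join, reduced with a WHERE clause to matching IDs only.
n_filtered = conn.execute("""
    SELECT COUNT(*) FROM [Dx Codes] d CROSS JOIN [Encounter] e
    WHERE d.[Code ID] = e.[Diagnosis ID]
""").fetchone()[0]

print(n_all, n_filtered)
```

With this particular WHERE condition the filtered cross join keeps the same rows as the inner join shown earlier.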

Join Command Connects Two or More Tables
This presentation was about how the join command connects multiple tables.