Alex Ropelewski Pittsburgh Supercomputing Center National Resource for Biomedical Supercomputing Bienvenido Vélez


Alex Ropelewski
Pittsburgh Supercomputing Center
National Resource for Biomedical Supercomputing

Bienvenido Vélez
University of Puerto Rico at Mayaguez
Department of Electrical and Computer Engineering

A Short Introduction to Analyzing Biological Data Using Relational Databases
Part IV: Using SQL to Summarize Data Across Multiple Tables

The following material is the result of a curriculum development effort to provide a set of courses to support bioinformatics efforts involving students from the biological sciences, computer science, and mathematics departments. They have been developed as part of the NIH-funded project “Assisting Bioinformatics Efforts at Minority Schools” (2T36 GM008789).

The people involved with the curriculum development effort include:
– Dr. Hugh B. Nicholas, Dr. Troy Wymore, Mr. Alexander Ropelewski and Dr. David Deerfield II, National Resource for Biomedical Supercomputing, Pittsburgh Supercomputing Center, Carnegie Mellon University
– Dr. Ricardo González Méndez, University of Puerto Rico Medical Sciences Campus
– Dr. Alade Tokuta, North Carolina Central University
– Dr. Jaime Seguel and Dr. Bienvenido Vélez, University of Puerto Rico at Mayagüez
– Dr. Satish Bhalla, Johnson C. Smith University

Unless otherwise specified, all the information contained within is copyrighted © by Carnegie Mellon University. Permission is granted to use, modify, and reproduce these materials for teaching purposes. Most recent versions of these presentations can be found at

Learning Objectives
– Using the JOIN clause to join tables together
– Using aggregate functions to analyze data
– Using the GROUP BY clause to group and summarize data
– Using the HAVING clause to select which groups to show in a result

SQL
The language of relational databases
  – Data definition/schema creation
  – Implements relational algebra operations
  – Data manipulation
      Insertion
      Manipulation
      Updates
      Removals
  – A standard (ISO) since

Tables

Runs
RunNum  Date     Matrix
1       7/21/07  Pam70
2       7/20/07  Blosum80

Sequences
Accession  Description                  Species
P14555     Group IIA Phospholipase A2   Human
P81479     Phospholipase A2 isozyme IV  Indian Green Tree Viper
P00623     Phospholipase A2             Eastern Diamondback Rattlesnake

Matches
Accession  RunNum  eValue
P14555     1       E-32
P81479     1       E-52
P14555     2       E-33
P81479     2       E-54
P00623     2       E-08

Desired Result

Accession  Description                  Species                          Matrix    eValue  Date
P14555     Group IIA Phospholipase A2   Human                            Pam70     E-32    7/21/07
P81479     Phospholipase A2 isozyme IV  Indian Green Tree Viper          Pam70     E-52    7/21/07
P14555     Group IIA Phospholipase A2   Human                            Blosum80  E-33    7/20/07
P81479     Phospholipase A2 isozyme IV  Indian Green Tree Viper          Blosum80  E-54    7/20/07
P00623     Phospholipase A2             Eastern Diamondback Rattlesnake  Blosum80  E-08    7/20/07

SQL: Joining Tables
Tables can be joined together based on common attributes:

    SELECT Matches.Accession, Description, Species, Matrix, eValue, Date
    FROM Matches
    INNER JOIN Runs ON Matches.RunNum=Runs.RunNum
    INNER JOIN Sequences ON Sequences.Accession=Matches.Accession
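To try the join on a real database engine, the three tables can be loaded into an in-memory SQLite database from Python. This is only a sketch under stated assumptions: the choice of SQLite and Python's sqlite3 module is ours, not the slides', the column types are guesses, and the eValue mantissas are placeholders (only 1.20 E-54 is given in the slides). An ORDER BY is added so the output order is predictable.

```python
import sqlite3

# In-memory database holding the three tables from the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Runs (RunNum INTEGER, Date TEXT, Matrix TEXT);
CREATE TABLE Sequences (Accession TEXT, Description TEXT, Species TEXT);
CREATE TABLE Matches (Accession TEXT, RunNum INTEGER, eValue REAL);
""")
conn.executemany("INSERT INTO Runs VALUES (?, ?, ?)", [
    (1, "7/21/07", "Pam70"),
    (2, "7/20/07", "Blosum80"),
])
conn.executemany("INSERT INTO Sequences VALUES (?, ?, ?)", [
    ("P14555", "Group IIA Phospholipase A2", "Human"),
    ("P81479", "Phospholipase A2 isozyme IV", "Indian Green Tree Viper"),
    ("P00623", "Phospholipase A2", "Eastern Diamondback Rattlesnake"),
])
# ASSUMPTION: the mantissas below are illustrative placeholders;
# only 1.20E-54 appears in the slides.
conn.executemany("INSERT INTO Matches VALUES (?, ?, ?)", [
    ("P14555", 1, 1.0e-32),
    ("P81479", 1, 1.0e-52),
    ("P14555", 2, 1.0e-33),
    ("P81479", 2, 1.20e-54),
    ("P00623", 2, 1.0e-08),
])

# The join from the slide, with an ORDER BY added for a stable row order.
rows = conn.execute("""
    SELECT Matches.Accession, Description, Species, Matrix, eValue, Date
    FROM Matches
    INNER JOIN Runs ON Matches.RunNum = Runs.RunNum
    INNER JOIN Sequences ON Sequences.Accession = Matches.Accession
    ORDER BY Runs.RunNum, Matches.Accession
""").fetchall()

for row in rows:
    print(row)
```

Each of the five Matches rows picks up its matrix and date from Runs and its description and species from Sequences, which is the shape of the desired result table.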

SQL: JOIN Clause
Used to merge two tables together
Basic types of joins:
  – INNER: returns tuples where the value of the joined attribute exists in both tables
  – OUTER: does not require the value of the joined attribute to exist in both tables
      LEFT OUTER: returns all tuples from the table listed first, even if there is no match in the second table
      RIGHT OUTER: returns all tuples from the table listed second, even if there is no match in the first table
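The difference between INNER and LEFT OUTER only shows up when one table has a row with no partner in the other. The sketch below assumes SQLite via Python's sqlite3 module and adds a made-up accession, X00000, that never appears in Matches; it is hypothetical and not part of the slides' data. RIGHT OUTER is left out because older SQLite releases do not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sequences (Accession TEXT, Description TEXT)")
conn.execute("CREATE TABLE Matches (Accession TEXT, RunNum INTEGER)")
conn.executemany("INSERT INTO Sequences VALUES (?, ?)", [
    ("P00623", "Phospholipase A2"),
    # HYPOTHETICAL row: a sequence that no database search matched.
    ("X00000", "Hypothetical sequence with no matches"),
])
conn.executemany("INSERT INTO Matches VALUES (?, ?)", [("P00623", 2)])

# INNER JOIN keeps only accessions present in both tables.
inner = conn.execute("""
    SELECT Sequences.Accession, RunNum
    FROM Sequences
    INNER JOIN Matches ON Sequences.Accession = Matches.Accession
    ORDER BY Sequences.Accession
""").fetchall()

# LEFT OUTER JOIN keeps every Sequences row; missing matches become NULL (None).
left = conn.execute("""
    SELECT Sequences.Accession, RunNum
    FROM Sequences
    LEFT OUTER JOIN Matches ON Sequences.Accession = Matches.Accession
    ORDER BY Sequences.Accession
""").fetchall()

print(inner)  # [('P00623', 2)]
print(left)   # [('P00623', 2), ('X00000', None)]
```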

SQL: MIN aggregate function
Used to collapse attribute values, reporting the minimum value

Matches
Accession  RunNum  eValue
P14555     1       E-32
P81479     1       E-52
P14555     2       E-33
P81479     2       E-54
P00623     2       E-08

Select
    SELECT MIN(eValue) FROM Matches

Result
eValue
1.20 E-54

SQL: COUNT aggregate function
Used to collapse attribute values, reporting the number of values

Matches
Accession  RunNum  eValue
P14555     1       E-32
P81479     1       E-52
P14555     2       E-33
P81479     2       E-54
P00623     2       E-08

Select
    SELECT COUNT(eValue) FROM Matches

Result
COUNT(eValue)
5

SQL: Aggregate Functions
Used to collapse attribute values:
  – COUNT(attribute)
  – MIN(attribute)
  – MAX(attribute)
  – AVG(attribute)
  – FIRST(attribute)
  – LAST(attribute)
  – SUM(attribute)
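Several of these aggregates can be computed in a single SELECT. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module; the eValue mantissas are placeholders (only 1.20 E-54 comes from the slides), and FIRST and LAST are omitted because SQLite does not provide them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Matches (Accession TEXT, RunNum INTEGER, eValue REAL)")
# ASSUMPTION: placeholder mantissas, except 1.20E-54 which the slides report.
conn.executemany("INSERT INTO Matches VALUES (?, ?, ?)", [
    ("P14555", 1, 1.0e-32),
    ("P81479", 1, 1.0e-52),
    ("P14555", 2, 1.0e-33),
    ("P81479", 2, 1.20e-54),
    ("P00623", 2, 1.0e-08),
])

# Each aggregate collapses the whole eValue column into one value,
# so the query returns a single row.
count, lowest, highest, total, mean = conn.execute("""
    SELECT COUNT(eValue), MIN(eValue), MAX(eValue), SUM(eValue), AVG(eValue)
    FROM Matches
""").fetchone()

print("COUNT:", count)
print("MIN:  ", lowest)
print("MAX:  ", highest)
print("SUM:  ", total)
print("AVG:  ", mean)
```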

Analyzing Bioinformatics Data
How many times was a sequence found from the database searches?

    SELECT Accession, COUNT(Accession)
    FROM Matches
    GROUP BY Accession

Results
Accession  Count(Accession)
P00623     1
P14555     2
P81479     2

Analyzing Bioinformatics Data
How many times was a sequence found from the database searches? Report accessions, sequence descriptions, and number of times found.

    SELECT Matches.Accession, Sequences.Description, COUNT(Matches.Accession)
    FROM Matches
    INNER JOIN Sequences ON Sequences.Accession=Matches.Accession
    GROUP BY Matches.Accession

Results
Accession  Description                  Count(Accession)
P00623     Phospholipase A2             1
P14555     Group IIA Phospholipase A2   2
P81479     Phospholipase A2 isozyme IV  2
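The grouped count with descriptions can be checked with a short Python script. This sketch assumes SQLite via the sqlite3 module and drops the eValue column from Matches because the query does not use it; the ORDER BY is our addition for a stable row order. The query selects Description without grouping by it, which SQLite accepts; some other database systems may require Description in the GROUP BY list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sequences (Accession TEXT, Description TEXT)")
# Simplified Matches table: eValue is omitted because it is not needed here.
conn.execute("CREATE TABLE Matches (Accession TEXT, RunNum INTEGER)")
conn.executemany("INSERT INTO Sequences VALUES (?, ?)", [
    ("P14555", "Group IIA Phospholipase A2"),
    ("P81479", "Phospholipase A2 isozyme IV"),
    ("P00623", "Phospholipase A2"),
])
conn.executemany("INSERT INTO Matches VALUES (?, ?)", [
    ("P14555", 1), ("P81479", 1),
    ("P14555", 2), ("P81479", 2), ("P00623", 2),
])

# GROUP BY forms one group per accession; COUNT is evaluated per group.
rows = conn.execute("""
    SELECT Matches.Accession, Sequences.Description, COUNT(Matches.Accession)
    FROM Matches
    INNER JOIN Sequences ON Sequences.Accession = Matches.Accession
    GROUP BY Matches.Accession
    ORDER BY Matches.Accession
""").fetchall()

for accession, description, times_found in rows:
    print(accession, description, times_found)
```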

Analyzing Bioinformatics Data
What sequences were found in only one database search?

    SELECT Accession, COUNT(Accession) AS total
    FROM Matches
    GROUP BY Accession
    HAVING total=1

Results
Accession  total
P00623     1
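HAVING filters groups after the aggregate has been computed. The sketch below, assuming SQLite through Python's sqlite3 module, runs the query twice: once with the column alias as on the slide, and once repeating the aggregate in the HAVING clause. The second form is shown because some database systems do not accept a column alias in HAVING; in SQLite both return the same row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified Matches table: eValue is omitted because it is not needed here.
conn.execute("CREATE TABLE Matches (Accession TEXT, RunNum INTEGER)")
conn.executemany("INSERT INTO Matches VALUES (?, ?)", [
    ("P14555", 1), ("P81479", 1),
    ("P14555", 2), ("P81479", 2), ("P00623", 2),
])

# Form used on the slide: filter groups through the alias "total".
by_alias = conn.execute("""
    SELECT Accession, COUNT(Accession) AS total
    FROM Matches
    GROUP BY Accession
    HAVING total = 1
""").fetchall()

# More portable form: repeat the aggregate expression in HAVING.
explicit = conn.execute("""
    SELECT Accession, COUNT(Accession) AS total
    FROM Matches
    GROUP BY Accession
    HAVING COUNT(Accession) = 1
""").fetchall()

print(by_alias)  # [('P00623', 1)]
print(explicit)  # [('P00623', 1)]
```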

Analyzing Bioinformatics Data
What sequences were found in only one database search? Report accessions, sequence descriptions, and number of times found.

    SELECT Matches.Accession, Sequences.Description, COUNT(Matches.Accession) AS total
    FROM Matches
    INNER JOIN Sequences ON Sequences.Accession=Matches.Accession
    GROUP BY Matches.Accession
    HAVING total=1

Results
Accession  Description       Total
P00623     Phospholipase A2  1

Analyzing Bioinformatics Data
What sequences were found in only one database search? Report accessions, sequence descriptions and matrix.

    SELECT Matches.Accession, Sequences.Description, Runs.Matrix
    FROM Matches
    INNER JOIN Sequences ON Sequences.Accession=Matches.Accession
    JOIN Runs ON Matches.RunNum=Runs.RunNum
    GROUP BY Matches.Accession
    HAVING COUNT(Matches.Accession)=1

Results
Accession  Description       Matrix
P00623     Phospholipase A2  Blosum80
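This last query combines every clause covered so far: two joins, GROUP BY, and HAVING. A minimal sketch follows, again assuming SQLite via Python's sqlite3 module, with the Date and eValue columns left out because the query never reads them. Selecting Matrix without grouping by it works here because each surviving group holds exactly one row; other database systems may reject the ungrouped columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified tables: Date and eValue are omitted because the query ignores them.
conn.executescript("""
CREATE TABLE Runs (RunNum INTEGER, Matrix TEXT);
CREATE TABLE Sequences (Accession TEXT, Description TEXT);
CREATE TABLE Matches (Accession TEXT, RunNum INTEGER);
""")
conn.executemany("INSERT INTO Runs VALUES (?, ?)", [
    (1, "Pam70"), (2, "Blosum80"),
])
conn.executemany("INSERT INTO Sequences VALUES (?, ?)", [
    ("P14555", "Group IIA Phospholipase A2"),
    ("P81479", "Phospholipase A2 isozyme IV"),
    ("P00623", "Phospholipase A2"),
])
conn.executemany("INSERT INTO Matches VALUES (?, ?)", [
    ("P14555", 1), ("P81479", 1),
    ("P14555", 2), ("P81479", 2), ("P00623", 2),
])

# Join all three tables, group by accession, keep groups with a single match.
rows = conn.execute("""
    SELECT Matches.Accession, Sequences.Description, Runs.Matrix
    FROM Matches
    INNER JOIN Sequences ON Sequences.Accession = Matches.Accession
    JOIN Runs ON Matches.RunNum = Runs.RunNum
    GROUP BY Matches.Accession
    HAVING COUNT(Matches.Accession) = 1
""").fetchall()

print(rows)  # [('P00623', 'Phospholipase A2', 'Blosum80')]
```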

Key Concepts
– The JOIN clause can be used to combine two or more tables into a single result table
– Aggregate functions can be used to collapse attribute values, including values drawn from joined tables, into summary results for data analysis
– The GROUP BY clause can be used to form groups of rows and compute attributes for those groups
– The HAVING clause can be used to select which groups to show in the result