Computational Skills Course week 1 Mike Gilchrist NIMR May-July 2011.

Main topics
- The command line
- Data and text files
- Simple command line tools
- Third party bioinformatics tools
- Databases and SQL
- 3GLs: industrial strength computing
- Scripts and pipelines
- What will not appear and why...

Axioms
- Google it
- You cannot break your computer
- Familiarity breeds content
- Bugs are always your fault (and can always be found, eventually)
- There are many ways of doing any given task (but one of them may be much better)
- Computational biologists tend to obfuscate
- GNU = GNU's Not Unix

WEEK ONE Basic concepts of computer based data, the command line, and some essential command line utilities.

The terminal window

At the command line
D:\projects\current> program -U jackyO -n 25 -i E:\data\experiment-1.txt > output.txt
program = program.exe
or
-rwxr-xr-x 1 migil sysbio 18K Mar 12 16:16 program
gcc]$ hts_utils

PATH
A path is the full location of a file:
F:\data\projects\khokha\exon-capture\blast\data-12may11.txt
There is also the PATH environment variable, which tells the computer where to look for programs.
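A minimal sketch of inspecting the PATH variable, assuming a Unix-like shell (Linux, macOS, or a Unix-style shell on Windows). On a Unix-like system the directories are separated by ':'; at the Windows command prompt the equivalent is `echo %PATH%`, with ';' as the separator.

```shell
# Print each directory on the search path, one per line.
echo "$PATH" | tr ':' '\n'

# Ask the shell where it would find a program (here: grep).
grep_location=$(command -v grep)
echo "grep is at: $grep_location"
```

If a program is "not found" even though it is installed, the usual cause is that its folder is not listed in PATH.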

Computer data
00100011 = a ‘byte’ (8 bits) = 35 (base 10)
2^8 = 256 possible characters in the simplest form of computer data

The ASCII ‘alphabet’
00000000 = 0    \0 (the null terminator)
00001001 = 9    \t (TAB)
00001010 = 10   \n (line feed)
00001101 = 13   \r (return)
00100000 = 32   ‘ ’ (space)
00110000 = 48   ‘0’
:
00111001 = 57   ‘9’
:
01000001 = 65   ‘A’
:
01011010 = 90   ‘Z’
:
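The byte values above can be checked at the command line. This is a small sketch assuming a Unix-like shell with the standard `od` (octal dump) utility; the test string is made up for illustration.

```shell
# Write the characters A, TAB, 0, space, newline and show each byte as a decimal number.
bytes=$(printf 'A\t0 \n' | od -An -tu1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//')
echo "$bytes"   # prints: 65 9 48 32 10
```

The same trick is useful for finding invisible characters (tabs, returns) in a data file that refuses to parse.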

A typical text data file (FASTQ sequence reads)
@HWUSI-EAS582_0299:3:1:87:82/1
GGCGACGATACATTCGGATGTCTGCCCTATCAACTTTCGAT
+HWUSI-EAS582_0299:3:1:87:82/1
CGGTTCAGCAGGAATGCCGAGACCGATATCGTATGCCGTCT
+HWUSI-EAS582_0299:3:1:87:1949/1
CGGGCATCTAAGGGCATCACAGACCTGTTATTGCTCGATCT
+HWUSI-EAS582_0299:3:1:87:1688/1
CGCTTGTTTCCTGATGTCACATGACAACACAAGATCGATAA
+HWUSI-EAS582_0299:3:1:87:1773/1
ATTGTGCAAGTCTCCCAATGTCGATTTAATGAAATCCCTAC
+HWUSI-EAS582_0299:3:1:87:1000/1
GGAGTATGGTTGCAAAGCTGAAACTTAAAGGAATTGACGGA
+HWUSI-EAS582_0299:3:1:87:842/1
TTCGGACACACTGGGCCCAGATGGCTTCTTGGATTTAGGGG
+HWUSI-EAS582_0299:3:1:88:1826/1
cWbgggcfgfc`g^gc^ddgg\ggbcd\`OfdZW`d`dJdg

The line ending/platform issue
DOS (Windows): each line ends with return + line feed
582_0299:3:1:87:82/1\r\n
GATGTCTGCCCTATCAACTTTCGAT\r\n
582_0299:3:1:87:82/1\r\n
fbee_M_f]UV^JYRXbSdf`bfff\r\n
Unix/Linux (newer Macs): each line ends with a line feed only
582_0299:3:1:87:82/1\n
GATGTCTGCCCTATCAACTTTCGAT\n
582_0299:3:1:87:82/1\n
fbee_M_f]UV^JYRXbSdf`bfff\n
Programs which read text in a line at a time may give the wrong result if the text file is from an incompatible platform...
If you are moving a file from Linux -> Windows run unix2dos first.
>unix2dos
582_0299:3:1:87:82/1\r\nGATGTCTGCCCTATCAACTTTCGAT\r\n582_0299:3:1:87.. (really)
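A hedged sketch of how to tell which kind of line ending a file has, and how to strip the returns when unix2dos/dos2unix are not installed. It assumes a Unix-like shell; the file names and contents are made up for illustration.

```shell
tmp=$(mktemp -d)
# Build the same two-line file with DOS and with Unix line endings.
printf 'read-1\r\nGATGTC\r\n' > "$tmp/dos.txt"
printf 'read-1\nGATGTC\n' > "$tmp/unix.txt"

# Count lines that end in a return character (\r).
cr=$(printf '\r')
dos_count=$(grep -c "$cr\$" "$tmp/dos.txt" || true)
unix_count=$(grep -c "$cr\$" "$tmp/unix.txt" || true)
echo "lines ending in return: dos=$dos_count unix=$unix_count"

# Remove every return character: a stand-in for dos2unix.
tr -d '\r' < "$tmp/dos.txt" > "$tmp/converted.txt"
same=$(cmp -s "$tmp/converted.txt" "$tmp/unix.txt" && echo yes || echo no)
echo "converted file matches unix file: $same"
rm -rf "$tmp"
```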

Command line utilities for manipulating data files
>grep finds lines matching ‘patterns’
>sed swaps one pattern for another
>awk operates on ‘fields’ in a record
>cat
>tail
>more/less
>unix2dos/dos2unix
AND THEN THERE ARE PIPES... ‘|’
AND REDIRECTS... ‘>’

grep
A typical fasta sequence file (Xenopus tropicalis mRNAs)
:
ACAAGCGATCTTGTAGAGCAATTCCAGCAACACTTAAAGGGACTTCTGTCTGTACTTACC
AAACTGACAAAAAAGGCCAATCTACTGACAAACTCTTACAAAAAGCAGATTGGCATTGGT
GCTCCGAGCAGT
>ENSXETG |ENSXETT |nodal2
ATGGCAGCCCTAGGAGCCCTCTTTTTATTTGCCATGGCCTCCCTTGTGCACGGGAAGCCC
ATTCATTCAGACAGAAAAGGAGCTAAAATCCCTCTGGCAGGATCTAACCTGGGATACAAG
AAATCCAGCAATTCATATGGTTCCAGACTGTCGCAGGGTATGAGATACCCCCCTTCCATG
ATGCAGTTATACCAGACTCTGATTTTGGGGAATGATACGGATCTGTCAATCCTGGAATAT
CCCATCCTGCAGGAATCTGATGCCGTTCTAAGCCTCATTGCAAAAAGTTGCGATGTAGTG
GGCAATCGATGGACATTGTCCTTTGACATGTCTTCTATATCGAGCAGCAATGAGCTGAAA
TTGGCCGAGTTGAGGATCCGCCTCCCTTCCTTTGAAAGGTCCCAGGAT
>ENSXETG |ENSXETT |neurod4
AGCCCCGGACTCATTGATATTGGGCACAGCGAGTCTGCCTGGGAGCTGTCCAGCACTCCA
TGCTCCTGAAATAACTTGGGCAACAAGTCCGATCTGCCCGCTACTCTGTGCCTCCAGCTC
AGGCCCGGGGAGAGGGACCCTGCTGAGCAGGACTCAGGACACTGTTTGAAGATCACATCA
AATTCTGCTAATATGTCGGAGATAGTCAGTGTGCATGGGTGGATGGAGGAAGCCCTTAGT
TCCCAGGATGAGATGGAGAGGAATCAGCGGCAATCTGCCTATGATATCATTTCAGGTCTG
AGTCACGAGGAAAGGTGTAGCATAGATGGAGAAGATGATGATGAAGAAGAAGAGGATGGA
GAGAAACCAAAAAAGAGGGGACCCAAAAAAAAGAAGATGACCAAGGCTAGACTGGAGAGG
TTTCGTGTGCGCAGAGTAAAAGCCAATGCCAGGGAGCGCACCAGAATGCATGGACTTAAT
GATGCCCTAGAAAATTTAAGGAGGGTCATGCCTTGCTATTCCAAAACACAAAA
>ENSXETG |ENSXETT |camk2g
AGACAAGAAACAGTGGAATGTTTGAGAAAGTTCAATGCACGGAGAAAGCTTAAAGGTGCA
ATTCTCACAACAATGTTGGTTTCTCGGAATTTTTCTGGAATTGCATTTGGATGCCGAAAA
GCTGCATCCACTGTCCCGTGTACCTCTTCAACGGGGGACACTATAACTGGTGTTGGCAGG
CAGACCTCCGCCCCTGTTGTGGCCGCCACCAGTGCTGCCAACTTAGTCGAGCAAGCTGCC
AAGAGTTTGTTGAACAAGAAGACAGATGGTGTCAAGCCACAGACCAACAACAAAAACAGC
ATAATAAGCCCTGCAAAAGAAAACCCCCCATTGCAGACATCAATGGAACCTCAAACAACT
GTTGTCCACAATGCAACTGATGGGATAAAAGGATCAACAGAGAGTTGTAACACCACCACT
:

grep
grep reads the input file line by line and reports each line containing the pattern.
F:\dev\sql>grep ACGTACGT my-sequence-file.fasta
GAGAAGAACGTACGTGAGTGCAACCCATTCCTGGACCCGGAGATGGTGCGATTCCTCTGG
TATTAAGAAAGAAAAGTTACGTACGTTGATAGACCTTGTAAGTGAAGAGAAGATGTTAGA
TTCAACCCAACTTACTATGTTACTATTGCTTCATTCCTTTTCACGTACGTCTGGTCTCAA
GGAATTAATAACCAGGATTTTGAAGGGGATTGCTACGTACGTCGCAGGTTATCCGGTGGA
:
F:\dev\sql>grep -c ACGTACGT my-sequence-file.fasta
76
F:\dev\sql>grep nodal my-sequence-file.fasta
>ENSXETG00000025789|ENSXETT00000019729|nodal2
>ENSXETG00000009009|ENSXETT00000019730|nodal3
>ENSXETG00000025789|ENSXETT00000019728|nodal2
>ENSXETG00000009008|ENSXETT00000019726|nodal1
>ENSXETG00000017442|ENSXETT00000037932|nodal5.2
>ENSXETG00000016779|ENSXETT00000036596|nodal5
>ENSXETG00000016778|ENSXETT00000036593|nodal6
>ENSXETG00000023748|ENSXETT00000051228|nodal
F:\dev\sql>grep -c ">" my-sequence-file.fasta
F:\dev\sql>grep ">" my-sequence-file.fasta > my-sequence-file-def-lines.txt
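The grep commands above can be tried on a tiny test file first. This is a sketch assuming a Unix-like shell; the three-sequence FASTA file and its identifiers are made up for illustration.

```shell
tmp=$(mktemp -d)
fasta="$tmp/example.fasta"
# A tiny made-up FASTA file: definition lines start with '>'.
printf '>seq1|nodal1\nGGAACGTACGTTT\n>seq2|camk2g\nTTTTGGGGCCCC\n>seq3|nodal2\nACGTACGTACGT\n' > "$fasta"

# Count the sequences by counting definition lines.
n_seqs=$(grep -c '>' "$fasta")
# Count the lines containing the motif (lines, not occurrences).
n_motif=$(grep -c 'ACGTACGT' "$fasta")
# List the definition lines that mention nodal.
nodal=$(grep 'nodal' "$fasta")

echo "sequences: $n_seqs, lines with motif: $n_motif"
echo "$nodal"
rm -rf "$tmp"
```

Two things to keep in mind: `grep -c` counts matching lines, so a line containing the motif twice is counted once, and a motif that straddles a line break in a wrapped sequence will be missed. Always quote the '>' pattern, otherwise the shell treats it as a redirect.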

sed
sed = stream editor: reads the input one line at a time and operates on each line as requested, typically to make a substitution, e.g. replacing runs of spaces with a TAB.
F:\dev\sql>grep nodal my-sequence-file.fasta > tmp.txt
F:\dev\sql>more tmp.txt
>ENSXETG00000025789|ENSXETT00000019729|nodal2
>ENSXETG00000009009|ENSXETT00000019730|nodal3
>ENSXETG00000025789|ENSXETT00000019728|nodal2
>ENSXETG00000009008|ENSXETT00000019726|nodal1
>ENSXETG00000017442|ENSXETT00000037932|nodal5.2
>ENSXETG00000016779|ENSXETT00000036596|nodal5
>ENSXETG00000016778|ENSXETT00000036593|nodal6
>ENSXETG00000023748|ENSXETT00000051228|nodal
F:\dev\sql>sed "s/nodal/xnr/" tmp.txt
>ENSXETG00000025789|ENSXETT00000019729|xnr2
>ENSXETG00000009009|ENSXETT00000019730|xnr3
>ENSXETG00000025789|ENSXETT00000019728|xnr2
>ENSXETG00000009008|ENSXETT00000019726|xnr1
>ENSXETG00000017442|ENSXETT00000037932|xnr5.2
>ENSXETG00000016779|ENSXETT00000036596|xnr5
>ENSXETG00000016778|ENSXETT00000036593|xnr6
>ENSXETG00000023748|ENSXETT00000051228|xnr

sed
F:\dev\sql>more tmp.txt
>ENSXETG00000025789|ENSXETT00000019729|nodal2
>ENSXETG00000009009|ENSXETT00000019730|nodal3
>ENSXETG00000025789|ENSXETT00000019728|nodal2
>ENSXETG00000009008|ENSXETT00000019726|nodal1
>ENSXETG00000017442|ENSXETT00000037932|nodal5.2
>ENSXETG00000016779|ENSXETT00000036596|nodal5
>ENSXETG00000016778|ENSXETT00000036593|nodal6
>ENSXETG00000023748|ENSXETT00000051228|nodal
Without a flag, only the first match on each line is replaced:
F:\dev\sql>sed "s/0/_/" tmp.txt
>ENSXETG_0000025789|ENSXETT00000019729|nodal2
>ENSXETG_0000009009|ENSXETT00000019730|nodal3
>ENSXETG_0000025789|ENSXETT00000019728|nodal2
>ENSXETG_0000009008|ENSXETT00000019726|nodal1
>ENSXETG_0000017442|ENSXETT00000037932|nodal5.2
>ENSXETG_0000016779|ENSXETT00000036596|nodal5
>ENSXETG_0000016778|ENSXETT00000036593|nodal6
>ENSXETG_0000023748|ENSXETT00000051228|nodal
With the g (global) flag, every match on each line is replaced:
F:\dev\sql>sed "s/0/_/g" tmp.txt
>ENSXETG______25789|ENSXETT______19729|nodal2
>ENSXETG_______9__9|ENSXETT______1973_|nodal3
>ENSXETG______25789|ENSXETT______19728|nodal2
>ENSXETG_______9__8|ENSXETT______19726|nodal1
>ENSXETG______17442|ENSXETT______37932|nodal5.2
>ENSXETG______16779|ENSXETT______36596|nodal5
>ENSXETG______16778|ENSXETT______36593|nodal6
>ENSXETG______23748|ENSXETT______51228|nodal
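The difference between substituting the first match and every match can be seen on a single line. A small sketch, assuming a Unix-like shell; the definition line is written in the style of the slide output.

```shell
line='>ENSXETG00000025789|ENSXETT00000019729|nodal2'

# No flag: only the first 0 on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/0/_/')
# g flag: every 0 on the line is replaced.
every=$(printf '%s\n' "$line" | sed 's/0/_/g')

echo "$first_only"
echo "$every"
```

A pattern as short as a single character will usually match in places you did not intend, so it pays to make the pattern as specific as possible (for example `s/|nodal/|xnr/`).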

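The pipe
Two steps, with a temporary file:
F:\dev\sql>grep ">" my-sequence-file.fasta > tmp.txt
F:\dev\sql>sed "s/nodal/xnr/" tmp.txt
>ENSXETG00000025789|ENSXETT00000019729|xnr2
>ENSXETG00000009009|ENSXETT00000019730|xnr3
>ENSXETG00000025789|ENSXETT00000019728|xnr2
>ENSXETG00000009008|ENSXETT00000019726|xnr1
>ENSXETG00000017442|ENSXETT00000037932|xnr5.2
>ENSXETG00000016779|ENSXETT00000036596|xnr5
>ENSXETG00000016778|ENSXETT00000036593|xnr6
>ENSXETG00000023748|ENSXETT00000051228|xnr
One step, with a pipe: the output of grep becomes the input of sed
F:\dev\sql>grep ">" my-sequence-file.fasta | sed "s/nodal/xnr/"
>ENSXETG00000025789|ENSXETT00000019729|xnr2
>ENSXETG00000009009|ENSXETT00000019730|xnr3
>ENSXETG00000025789|ENSXETT00000019728|xnr2
>ENSXETG00000009008|ENSXETT00000019726|xnr1
>ENSXETG00000017442|ENSXETT00000037932|xnr5.2
>ENSXETG00000016779|ENSXETT00000036596|xnr5
>ENSXETG00000016778|ENSXETT00000036593|xnr6
>ENSXETG00000023748|ENSXETT00000051228|xnr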
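A sketch showing that the two-step and piped versions give the same result, and that pipes can be chained further. It assumes a Unix-like shell; the FASTA content is made up for illustration.

```shell
tmp=$(mktemp -d)
printf '>g1|nodal1\nACGT\n>g2|camk2g\nGGCC\n>g3|nodal2\nTTAA\n' > "$tmp/example.fasta"

# Two steps, with a temporary file in between.
grep '>' "$tmp/example.fasta" > "$tmp/tmp.txt"
two_step=$(sed 's/nodal/xnr/' "$tmp/tmp.txt")

# One step: grep's output is piped straight into sed.
piped=$(grep '>' "$tmp/example.fasta" | sed 's/nodal/xnr/')

# Pipes chain: add a third command to count the renamed lines.
n_xnr=$(grep '>' "$tmp/example.fasta" | sed 's/nodal/xnr/' | grep -c 'xnr')

echo "$piped"
echo "renamed definition lines: $n_xnr"
rm -rf "$tmp"
```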

(n)awk
Awk manipulates data in a structured, tabular format: it reads input one line at a time, but operates on fields.
Neuro Pachnis
Biol Smith
Biol Goldstein
Biol Mohun
Biol Logan
Biol Smith
Neurobiol Briscoe
Biol Smith
F:\dev\sql> nawk -F\t "{ print $4, $7; }" I:\transfer\workshop.txt
Ashleigh
Leona
Eric
Guilherme
Mary
:
F:\dev\sql> nawk -F\t "{ print $4, $7; }" I:\transfer\workshop.txt | sed "s/@/ AT /"
Ashleigh ahowes AT nimr.mrc.ac.uk
Leona lgabrys AT nimr.mrc.ac.uk
Eric edang AT nimr.mrc.ac.uk
Guilherme gneves AT nimr.mrc.ac.uk
Mary mwu AT nimr.mrc.ac.uk
:
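A minimal sketch of the same field-picking idea, assuming a Unix-like shell with a standard awk. The file, the names and the example.org addresses are made up; the field numbers differ from the slide because the made-up file has only three columns.

```shell
tmp=$(mktemp -d)
# A made-up tab-separated file: first name, last name, email.
printf 'Ann\tSmith\tann@example.org\nBob\tJones\tbob@example.org\n' > "$tmp/people.txt"

# -F sets the field separator (here a TAB); $1 and $3 are the first and third fields.
out=$(awk -F'\t' '{ print $1, $3 }' "$tmp/people.txt" | sed 's/@/ AT /')

echo "$out"
rm -rf "$tmp"
```

Note the single quotes around the awk program: in a Unix shell, double quotes would let the shell replace $1 and $3 before awk ever sees them.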

Now let’s go and get some data... Download the set of gene locus coordinates for your model organism from BioMart (Ensembl).

Catches and other platform specific stuff
Paths: folder dividers are ‘/’ on Mac/Linux and ‘\’ on Windows.
Quote marks: you may have to experiment with double and single quotes, and sometimes even the backquote. With sed you may or may not have to quote the command string, i.e.
>sed 's/nodal/xnr/'
OR
>sed "s/nodal/xnr/"
or no quotes at all.
Unix/Linux is case sensitive: Hello != hello
^C (control-C) is your get out of jail free card!

Next week Download and install MySQL…

WEEK TWO Relational databases and the structured query language (SQL), and an introduction to MySQL.

What is a database?
A collection of data tables, or other structures, managed by a computer program and queryable by some sort of (usually) standard language.
We will look at relational databases, where the data are held in strict column based tables and queried with SQL, the structured query language. This is a standard across a wide range of products: MySQL, Oracle, Sybase, MS SQL Server, PostgreSQL, etc.
Database server: the computer program that manages everything
Database: a defined area which contains a set of related data tables and other entities: rules, datatypes, etc.
Table: a single defined object with a fixed (by you) number of columns and any number of rows
Datatypes: the building blocks out of which data structures are made, i.e. you must define the ‘type’ of your data: integer, float, varchar(), text, etc.

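A table: course_applicants
date_applied | first_name | last_name | status | email | division | lab | role_id
... | ... | ... | ... | ... | Neuro. | Pachnis | 4
... | ... | ... | ... | ... | Biol | Smith | 2
... | ... | ... | ... | ... | Biol | Goldstein | 5
... | ... | ... | ... | ... | Biol | Mohun | 6
... | ... | ... | ... | ... | Biol | Logan | 2
... | ... | ... | ... | ... | Biol | Smith | 1
... | ... | ... | ... | ... | Neurobiol | Briscoe | 8
... | ... | ... | ... | ... | Biol | Smith | 1
... | ... | ... | ... | ... | ... | ... | 8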

Another table: research_roles
role_id | role
1 | PhD Student
2 | Post Doc
3 | Visiting Worker
4 | Research Assistant
5 | Research Technician
6 | Investigator Scientist
7 | Senior Investigator Scientist
8 | Principal Investigator

Creating a table [this is SQL!]
create table course_applicants
(
    date_applied datetime,
    first_name   varchar(32),
    last_name    varchar(32),
    status       varchar(4) null,
    email        varchar(32),
    division     varchar(32),
    lab          varchar(32) null,
    role_id      integer
)

Creating a table (with a table-specific prefix on every column name)
create table course_applicants
(
    cap_date_applied datetime,
    cap_first_name   varchar(32),
    cap_last_name    varchar(32),
    cap_status       varchar(4) null,
    cap_email        varchar(32),
    cap_division     varchar(32),
    cap_lab          varchar(32) null,
    cap_role_id      integer
)

Creating a table: at the prompt
F:\dev\sql>mysql --user=myId --password=myPass
>use mydb;
>create table course_applicants ( date_applied datetime, etc… );
>(will now create table in the database)

Creating a table: in a script (better)
table_course_applicants.sql:
use mydb;
create table course_applicants
(
    date_applied datetime,
    first_name   varchar(32),
    last_name    varchar(32),
    status       varchar(4) null,
    email        varchar(32),
    division     varchar(32),
    lab          varchar(32) null,
    role_id      integer
);
F:\dev\sql>mysql --user=myId --password=myPass < table_course_applicants.sql
F:\dev\sql>(table is now created)

Entering data into a table
insert_applicants.sql:
use mydb;
insert course_applicants ( date_applied, first_name, last_name, email, division, role_id )
select now(), 'Mike', 'Gilchrist', '[email address]', 'Sys Bio', 8;
insert course_applicants ( date_applied, first_name, last_name, email, division, role_id )
select now(), 'Mustafa', 'Khokha', '[email address]', 'YALE', 8;
F:\dev\sql>mysql --user=myId --password=myPass < insert_applicants.sql
F:\dev\sql>(data is now added to table)

Loading a table from a text data file
computing-course-applicants.txt (tab separated, one row per line):
... Neuro. Pachnis 4
... Biol Smith 2
... Biol Goldstein 5
... Biol Mohun 6
... Biol Logan 2
... Biol Smith 1
... Neurobiol Briscoe 8
... Biol Smith 1
... 8
F:\dev\sql>mysql --user=myId --password=myPass
>use mydb;
>LOAD DATA LOCAL INFILE 'computing-course-applicants.txt'
>INTO TABLE course_applicants;
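Before loading a text file into a table it is worth checking that every line has the number of fields the table expects. This is a hedged sketch using awk, assuming a Unix-like shell; the file name, its contents and the expected count of 3 are made up (the course_applicants table would need 8).

```shell
tmp=$(mktemp -d)
# A made-up tab-separated file in which line 2 is missing a field.
printf 'a\tb\tc\nd\te\nf\tg\th\n' > "$tmp/data.txt"

# NF is the number of fields on the current line, NR is the line number.
# Print the number of every line that does not have exactly n fields.
bad=$(awk -F'\t' -v n=3 'NF != n { print NR }' "$tmp/data.txt")

echo "lines with the wrong number of fields: $bad"
rm -rf "$tmp"
```

An empty result means every line has the expected number of fields. Line ending problems show up here too: a stray return character ends up inside the last field.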

Using SQL to query/analyse/alter the data
F:\dev\sql>mysql --user=myId --password=myPass
>use mydb;
>select * from course_applicants;
>(outputs data to screen)

select * from course_applicants order by date_applied;

select count(*) 'applicants', count(distinct division) 'divisions'
from course_applicants;

select lab, count(*)
from course_applicants
group by lab
order by count(*) desc;

select lab, count(*)
from course_applicants
group by lab
having count(*) > 1
order by lab;

select * from course_applicants where first_name like 'J%';

select * from course_applicants where status != 'OK';
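A "group by ... having count(*) > 1" result can be cross-checked on the original text file with awk. This is a sketch assuming a Unix-like shell; it uses only the division and lab values visible on the table slide, written to a made-up two-column file.

```shell
tmp=$(mktemp -d)
# Division and lab, tab separated, as listed on the course_applicants slide.
printf 'Neuro.\tPachnis\nBiol\tSmith\nBiol\tGoldstein\nBiol\tMohun\nBiol\tLogan\nBiol\tSmith\nNeurobiol\tBriscoe\nBiol\tSmith\n' > "$tmp/labs.txt"

# Count rows per lab (field 2), then keep only labs with more than one row.
multi=$(awk -F'\t' '{ n[$2]++ } END { for (lab in n) if (n[lab] > 1) print lab, n[lab] }' "$tmp/labs.txt" | LC_ALL=C sort)

echo "$multi"   # prints: Smith 3
rm -rf "$tmp"
```

If the command line and the database disagree, one of the two has a bug, which is a cheap way of catching a query that gives the wrong answer in a plausible way.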

Joining tables for more complex queries
select first_name, last_name, role
from course_applicants, research_roles
where course_applicants.role_id = research_roles.role_id;   /* JOIN */

/* the same query with table aliases */
select first_name, last_name, role
from course_applicants a, research_roles r
where a.role_id = r.role_id;   /* JOIN */

/* with prefixed column names no table names or aliases are needed */
select cap_first_name, cap_last_name, rro_role
from course_applicants, research_roles
where cap_role_id = rro_role_id;   /* JOIN */

select rro_role 'role', count(*) 'on course'
from course_applicants, research_roles
where cap_role_id = rro_role_id   /* JOIN */
group by rro_role
order by rro_role;

delete and update statements
/* CHANGE THE ROLE DESCRIPTION */
update research_roles set role = 'CDF (Post Doc)' where role = 'Post Doc';
/* or, using the id */
update research_roles set role = 'CDF (Post Doc)' where role_id = 2;

/* REMOVE PIs */
delete from course_applicants where role_id = 8;
/* or, using a join to look up the role by name */
delete course_applicants
from course_applicants, research_roles
where course_applicants.role_id = research_roles.role_id
and research_roles.role = 'Principal Investigator';
/* or, as the last condition, a pattern match: */
/* and research_roles.role like 'P% I%' */

Catches
Never use sub-queries:
select ...
from tableA a, tableB b
where a.id = b.id
and a.id in (select id from tableC where id like 'NIMR%')
SQL can give you the wrong answer in a plausible way...
If your query is taking forever you have probably missed out a join condition.
You can always add rows of data easily, but adding columns is hard work, and means you will have to re-write code...
Long text data (e.g. sequences) can be tough to handle.

Exercises
Design a table, or tables, to hold personal details (names, etc.) and contact information (email, phone numbers, etc.) for people in an organisation.
Download the set of gene loci and exon data for your organism from BioMart, design some tables for this data, load the data from your data files, and then query the data for the number of multi-exon vs single-exon genes, find the gene with the largest number of exons, etc.

Next week
Download and install BLAST from NCBI
Prepare a two minute verbal presentation of your project