Presentation is loading. Please wait.

Presentation is loading. Please wait.

Computational Skills Course week 1 Mike Gilchrist NIMR May-July 2011.

Similar presentations


Presentation on theme: "Computational Skills Course week 1 Mike Gilchrist NIMR May-July 2011."— Presentation transcript:

1 Computational Skills Course week 1 Mike Gilchrist NIMR May-July 2011

2 Main topics The command line Data and text files Simple command line tools Third party bioinformatics tools Databases and SQL 3GL’s – industrial strength computing Scripts and pipelines What will not appear and why...

3 Axioms Google it You cannot break your computer Familiarity breeds content Bugs are always your fault (and can always be found - eventually) There are many ways of doing any give task (but one of them may be much better) Computational biologists tend to obfuscate GNU = Gnu is Not Unix

4 WEEK ONE Basic concepts of computer based data, the command line, and some essential command line utilities.

5 The terminal window

6 At the command line D:\projects\current> program -U jackyO -n 25 -i E:\data\experiment-1.txt > output.txt program = program.exe or -rwxr-xr-x 1 migil sysbio 18K Mar 12 16:16 program [migil@INFO-LINUX2 gcc]$ hts_utils

7 PATH F:\data\projects\khokha\exon-capture\blast\data-12may11.txt Also the PATH environment variable which tells the computer where to look for programs

8 Computer data 10001011010010110010001110110101011010100101000010100101... a ‘byte’ 128 64 32 16 8 4 2 1 = 35 (base 10) 00000000= 0 11111111= 255 256 possible characters in the simplest form of computer data

9 The ascii ‘alphabet’ 00000000= 0 \0(the null terminator) 00001001= 9 \tTAB 00001010= 10 \n(line feed) 00001011= 13 \r(return) 00100000= 32 ‘ ‘(space) 00110000= 48 ‘0’ 00111001= 57 ‘9’ : : 01000001= 65 ‘A’ 01011010= 90 ‘Z’ : :

10 @HWUSI-EAS582_0299:3:1:87:82/1 GGCGACGATACATTCGGATGTCTGCCCTATCAACTTTCGAT +HWUSI-EAS582_0299:3:1:87:82/1 ffbffffff^UYbb\ffbee_M_f]UV^JYRXbSdf`bfff @HWUSI-EAS582_0299:3:1:87:1949/1 CGGTTCAGCAGGAATGCCGAGACCGATATCGTATGCCGTCT +HWUSI-EAS582_0299:3:1:87:1949/1 _ebhhghghgedgceg\fggfghggg_]eZ\WWcacMb_ce @HWUSI-EAS582_0299:3:1:87:1688/1 CGGGCATCTAAGGGCATCACAGACCTGTTATTGCTCGATCT +HWUSI-EAS582_0299:3:1:87:1688/1 ffbdd`bfffbf`ZfZddddcbfZfff]ffaddYdeffedd @HWUSI-EAS582_0299:3:1:87:1773/1 CGCTTGTTTCCTGATGTCACATGACAACACAAGATCGATAA +HWUSI-EAS582_0299:3:1:87:1773/1 gfgggggfggggfgggZggfddggggfag\fggafgdgggg @HWUSI-EAS582_0299:3:1:87:1000/1 ATTGTGCAAGTCTCCCAATGTCGATTTAATGAAATCCCTAC +HWUSI-EAS582_0299:3:1:87:1000/1 ggeggegeggcgggfdaaffdgggggfggggggggggdggd @HWUSI-EAS582_0299:3:1:87:842/1 GGAGTATGGTTGCAAAGCTGAAACTTAAAGGAATTGACGGA +HWUSI-EAS582_0299:3:1:87:842/1 gggggggggggggggggggggggfgggggggfggggggggg @HWUSI-EAS582_0299:3:1:88:1826/1 TTCGGACACACTGGGCCCAGATGGCTTCTTGGATTTAGGGG +HWUSI-EAS582_0299:3:1:88:1826/1 cWbgggcfgfc`g^gc^ddgg\ggbcd\`OfdZW`d`dJdg A typical text data file

11 The line ending/platform issue 582_0299:3:1:87:82/1\n\r GATGTCTGCCCTATCAACTTTCGAT\n\r 582_0299:3:1:87:82/1\n\r fbee_M_f]UV^JYRXbSdf`bfff\n\r 582_0299:3:1:87:82/1\n GATGTCTGCCCTATCAACTTTCGAT\n 582_0299:3:1:87:82/1\n fbee_M_f]UV^JYRXbSdf`bfff\n DOS (Windows)Unix/Linux (newer Macs) Programs which read text in a line at a time may give the wrong result if the text file is from an incompatible platform... If you are moving a file from Linux -> Windows run unix2dos first. >unix2dos 582_0299:3:1:87:82/1\n\rGATGTCTGCCCTATCAACTTTCGAT\n\r582_0299:3:1:87.. (really)

12 Command line utilities for manipulating data files http://sourceforge.net/projects/gnuwin32/files/ >grep finds lines matching ‘patterns’ >sed swaps one pattern for another >awk operates on ‘fields’ in a record >cat >tail >more/less >unix2dos/dos2unix AND THEN THERE ARE PIPES... ‘|’ AND REDIRECTS... ‘>’

13 grep A typical fasta sequence file (Xenopus tropicalis mRNAs) : ACAAGCGATCTTGTAGAGCAATTCCAGCAACACTTAAAGGGACTTCTGTCTGTACTTACC AAACTGACAAAAAAGGCCAATCTACTGACAAACTCTTACAAAAAGCAGATTGGCATTGGT GCTCCGAGCAGT >ENSXETG00000025789|ENSXETT00000019729|nodal2 ATGGCAGCCCTAGGAGCCCTCTTTTTATTTGCCATGGCCTCCCTTGTGCACGGGAAGCCC ATTCATTCAGACAGAAAAGGAGCTAAAATCCCTCTGGCAGGATCTAACCTGGGATACAAG AAATCCAGCAATTCATATGGTTCCAGACTGTCGCAGGGTATGAGATACCCCCCTTCCATG ATGCAGTTATACCAGACTCTGATTTTGGGGAATGATACGGATCTGTCAATCCTGGAATAT CCCATCCTGCAGGAATCTGATGCCGTTCTAAGCCTCATTGCAAAAAGTTGCGATGTAGTG GGCAATCGATGGACATTGTCCTTTGACATGTCTTCTATATCGAGCAGCAATGAGCTGAAA TTGGCCGAGTTGAGGATCCGCCTCCCTTCCTTTGAAAGGTCCCAGGAT >ENSXETG00000004295|ENSXETT00000014760|neurod4 AGCCCCGGACTCATTGATATTGGGCACAGCGAGTCTGCCTGGGAGCTGTCCAGCACTCCA TGCTCCTGAAATAACTTGGGCAACAAGTCCGATCTGCCCGCTACTCTGTGCCTCCAGCTC AGGCCCGGGGAGAGGGACCCTGCTGAGCAGGACTCAGGACACTGTTTGAAGATCACATCA AATTCTGCTAATATGTCGGAGATAGTCAGTGTGCATGGGTGGATGGAGGAAGCCCTTAGT TCCCAGGATGAGATGGAGAGGAATCAGCGGCAATCTGCCTATGATATCATTTCAGGTCTG AGTCACGAGGAAAGGTGTAGCATAGATGGAGAAGATGATGATGAAGAAGAAGAGGATGGA GAGAAACCAAAAAAGAGGGGACCCAAAAAAAAGAAGATGACCAAGGCTAGACTGGAGAGG TTTCGTGTGCGCAGAGTAAAAGCCAATGCCAGGGAGCGCACCAGAATGCATGGACTTAAT GATGCCCTAGAAAATTTAAGGAGGGTCATGCCTTGCTATTCCAAAACACAAAA >ENSXETG00000006455|ENSXETT00000014172|camk2g AGACAAGAAACAGTGGAATGTTTGAGAAAGTTCAATGCACGGAGAAAGCTTAAAGGTGCA ATTCTCACAACAATGTTGGTTTCTCGGAATTTTTCTGGAATTGCATTTGGATGCCGAAAA GCTGCATCCACTGTCCCGTGTACCTCTTCAACGGGGGACACTATAACTGGTGTTGGCAGG CAGACCTCCGCCCCTGTTGTGGCCGCCACCAGTGCTGCCAACTTAGTCGAGCAAGCTGCC AAGAGTTTGTTGAACAAGAAGACAGATGGTGTCAAGCCACAGACCAACAACAAAAACAGC ATAATAAGCCCTGCAAAAGAAAACCCCCCATTGCAGACATCAATGGAACCTCAAACAACT GTTGTCCACAATGCAACTGATGGGATAAAAGGATCAACAGAGAGTTGTAACACCACCACT :

14 grep F:dev\sql>grep ACGTACGT my-sequence-file.fasta GAGAAGAACGTACGTGAGTGCAACCCATTCCTGGACCCGGAGATGGTGCGATTCCTCTGG TATTAAGAAAGAAAAGTTACGTACGTTGATAGACCTTGTAAGTGAAGAGAAGATGTTAGA TTCAACCCAACTTACTATGTTACTATTGCTTCATTCCTTTTCACGTACGTCTGGTCTCAA GGAATTAATAACCAGGATTTTGAAGGGGATTGCTACGTACGTCGCAGGTTATCCGGTGGA : F:dev\sql>grep –c ACGTACGT my-sequence-file.fasta 76 F:dev\sql>grep nodal my-sequence-file.fasta >ENSXETG00000025789|ENSXETT00000019729|nodal2 >ENSXETG00000009009|ENSXETT00000019730|nodal3 >ENSXETG00000025789|ENSXETT00000019728|nodal2 >ENSXETG00000009008|ENSXETT00000019726|nodal1 >ENSXETG00000017442|ENSXETT00000037932|nodal5.2 >ENSXETG00000016779|ENSXETT00000036596|nodal5 >ENSXETG00000016778|ENSXETT00000036593|nodal6 >ENSXETG00000023748|ENSXETT00000051228|nodal F:dev\sql>grep –c “>” my-sequence-file.fasta 28937 F:dev\sql>grep “>” my-sequence-file.fasta > my-sequence-file-def-lines.txt grep reads the input file line by line and reports on each line containing the pattern

15 sed F:dev\sql>grep nodal my-sequence-file.fasta > tmp.txt F:dev\sql>more tmp.txt >ENSXETG00000025789|ENSXETT00000019729|nodal2 >ENSXETG00000009009|ENSXETT00000019730|nodal3 >ENSXETG00000025789|ENSXETT00000019728|nodal2 >ENSXETG00000009008|ENSXETT00000019726|nodal1 >ENSXETG00000017442|ENSXETT00000037932|nodal5.2 >ENSXETG00000016779|ENSXETT00000036596|nodal5 >ENSXETG00000016778|ENSXETT00000036593|nodal6 >ENSXETG00000023748|ENSXETT00000051228|nodal F:\dev\sql>sed "s/nodal/xnr/" tmp.txt >ENSXETG00000025789|ENSXETT00000019729|xnr2 >ENSXETG00000009009|ENSXETT00000019730|xnr3 >ENSXETG00000025789|ENSXETT00000019728|xnr2 >ENSXETG00000009008|ENSXETT00000019726|xnr1 >ENSXETG00000017442|ENSXETT00000037932|xnr5.2 >ENSXETG00000016779|ENSXETT00000036596|xnr5 >ENSXETG00000016778|ENSXETT00000036593|xnr6 >ENSXETG00000023748|ENSXETT00000051228|xnr stream editor: reads input line at a time, and operates on the line as requested – typically to make a substitution. E.g. Replace groups of space with a TAB, etc.

16 sed F:dev\sql>more tmp.txt >ENSXETG00000025789|ENSXETT00000019729|nodal2 >ENSXETG00000009009|ENSXETT00000019730|nodal3 >ENSXETG00000025789|ENSXETT00000019728|nodal2 >ENSXETG00000009008|ENSXETT00000019726|nodal1 >ENSXETG00000017442|ENSXETT00000037932|nodal5.2 >ENSXETG00000016779|ENSXETT00000036596|nodal5 >ENSXETG00000016778|ENSXETT00000036593|nodal6 >ENSXETG00000023748|ENSXETT00000051228|nodal F:\dev\sql>sed "s/0/_/" tmp.txt >ENSXETG_0000025789|ENSXETT00000019729|nodal2 >ENSXETG_0000009009|ENSXETT00000019730|nodal3 >ENSXETG_0000025789|ENSXETT00000019728|nodal2 >ENSXETG_0000009008|ENSXETT00000019726|nodal1 >ENSXETG_0000017442|ENSXETT00000037932|nodal5.2 >ENSXETG_0000016779|ENSXETT00000036596|nodal5 >ENSXETG_0000016778|ENSXETT00000036593|nodal6 >ENSXETG_0000023748|ENSXETT00000051228|nodal F:\dev\sql>sed "s/0/_/g" tmp.txt >ENSXETG______25789|ENSXETT______19729|nodal2 >ENSXETG_______9__9|ENSXETT______1973_|nodal3 >ENSXETG______25789|ENSXETT______19728|nodal2 >ENSXETG_______9__8|ENSXETT______19726|nodal1 >ENSXETG______17442|ENSXETT______37932|nodal5.2 >ENSXETG______16779|ENSXETT______36596|nodal5 >ENSXETG______16778|ENSXETT______36593|nodal6 >ENSXETG______23748|ENSXETT______51228|nodal

17 The pipe F:dev\sql>grep “>” my-sequence-file.fasta > tmp.txt F:\dev\sql>sed "s/nodal/xnr/" tmp.txt >ENSXETG00000025789|ENSXETT00000019729|xnr2 >ENSXETG00000009009|ENSXETT00000019730|xnr3 >ENSXETG00000025789|ENSXETT00000019728|xnr2 >ENSXETG00000009008|ENSXETT00000019726|xnr1 >ENSXETG00000017442|ENSXETT00000037932|xnr5.2 >ENSXETG00000016779|ENSXETT00000036596|xnr5 >ENSXETG00000016778|ENSXETT00000036593|xnr6 >ENSXETG00000023748|ENSXETT00000051228|xnr F:dev\sql>grep “>” my-sequence-file.fasta | sed "s/nodal/xnr/" >ENSXETG00000025789|ENSXETT00000019729|xnr2 >ENSXETG00000009009|ENSXETT00000019730|xnr3 >ENSXETG00000025789|ENSXETT00000019728|xnr2 >ENSXETG00000009008|ENSXETT00000019726|xnr1 >ENSXETG00000017442|ENSXETT00000037932|xnr5.2 >ENSXETG00000016779|ENSXETT00000036596|xnr5 >ENSXETG00000016778|ENSXETT00000036593|xnr6 >ENSXETG00000023748|ENSXETT00000051228|xnr

18 (n)awk 306/04/201106/04/2011AshleighHowesOKahowes@nimr.mrc.ac.ukImmunoregO'Garra 206/04/201106/04/2011LeonaGabrysovaOKlgabrys@nimr.mrc.ac.ukImmunoregO'Garra 106/04/201106/04/2011EricDang(ok)edang@nimr.mrc.ac.ukImmunoregO'Garra 07/04/201107/04/2011GuilhermeNevesOKgneves@nimr.mrc.ac.ukMolecular NeuroPachnis 508/04/201108/04/2011MaryWuOKmwu@nimr.mrc.ac.ukSystems BiolSmith 2408/04/201108/04/2011JoseSaldanhaOKjsaldan@nimr.mrc.ac.ukMath BiolGoldstein 808/04/201108/04/2011LaurentDupaysOKldupays@nimr.mrc.ac.ukDev BiolMohun 708/04/201108/04/2011VeroniqueDubocOKvduboc@nimr.mrc.ac.ukDev BiolLogan 2209/04/201109/04/2011MadhuKadekoppalaOKmkadeko@nimr.mrc.ac.ukParasitologyHolder 2109/04/201109/04/2011GeorgeGentschOKggentsc@nimr.mrc.ac.ukSystems BiolSmith 1310/04/201111/04/2011SiggiSatoOKssato@nimr.mrc.ac.ukParasitologyHolder 10/04/201110/04/2011JamesBriscoeOKjbrisco@nimr.mrc.ac.ukDev NeurobiolBriscoe 2610/04/201117/04/2011AlexWatsonOKawatson@nimr.mrc.ac.ukSystems BiolSmith 10/04/2011MustafaKhokhaOKmustafa.khokha@yale.eduYALE Awk: manipulates data in structured, tabular, format - reads input one line at a time, but operates on fields. F:dev\sql> nawk -F\t "{ print $4, $7; }" I:\transfer\workshop.txt Ashleigh ahowes@nimr.mrc.ac.uk Leona lgabrys@nimr.mrc.ac.uk Eric edang@nimr.mrc.ac.uk Guilherme gneves@nimr.mrc.ac.uk Mary mwu@nimr.mrc.ac.ukmwu@nimr.mrc.ac.uk : F:dev\sql> nawk -F\t "{ print $4, $7; }" I:\transfer\workshop.txt | sed "s/@/ AT /“ Ashleigh ahowes AT nimr.mrc.ac.uk Leona lgabrys AT nimr.mrc.ac.uk Eric edang AT nimr.mrc.ac.uk Guilherme gneves AT nimr.mrc.ac.uk Mary mwu AT nimr.mrc.ac.uk :

19 Now let’s go and get some data... Download the set of gene locus coordinates for your model organism from BioMart (Ensembl).

20 Catches and other platform specific stuff Paths: folder dividers ‘/’ in mac/linux ‘\’ in windows Quotes marks: you may have to experiment with double and single quotes, and even sometimes the backquote. With sed you may or may not have to quote the command string, i.e. >sed ‘s/nodal/xnr/’ OR >sed “s/nodal/xnr/” or no quotes at all. Unix/Linux is case sensitive: Hello != hello ^C (control-C) is your get out of jail free card!

21 Next week Download and install MySQL…

22 WEEK TWO Relational databases and the structured query language (SQL), and an introduction to MySQL.

23 What is a database? A collection of data tables, or other structures, managed by a computer program and queryable by some sort of (usually) standard language. We will look at relational databases where the data are help in strict column based tables, and queried with SQL – the structured query langauge. This is a standard across a wide range of products: MySQL, Oracle, Sybase, MS-SQLServer PostGres, etc. Database server: the computer program that manages everything Database: a defined area which contains a set of related data tables and other entities: rules, datatypes, etc. Table: a single defined object with a fixed (by you) number of columns and any number of rows Datatypes: out of which data structures are built, i.e. you must define the ‘type’ of your data: integer, float, varchar(), text, etc.

24 A table date_appliedfirst_namelast_namestatusemaildivisonlabrole_id 06/04/2011AshleighHowesOKahowes@nimr.mrc.ac.ukImmunoregO'Garra1 06/04/2011LeonaGabrysovaOKlgabrys@nimr.mrc.ac.ukImmunoregO'Garra2 06/04/2011EricDang(ok)edang@nimr.mrc.ac.ukImmunoregO'Garra3 07/04/2011GuilhermeNevesOKgneves@nimr.mrc.ac.ukMolecular Neuro.Pachnis4 08/04/2011MaryWuOKmwu@nimr.mrc.ac.ukSystems BiolSmith2 08/04/2011JoseSaldanhaOKjsaldan@nimr.mrc.ac.ukMath BiolGoldstein5 08/04/2011LaurentDupaysOKldupays@nimr.mrc.ac.ukDev BiolMohun6 08/04/2011VeroniqueDubocOKvduboc@nimr.mrc.ac.ukDev BiolLogan2 09/04/2011MadhuKadekoppalaOKmkadeko@nimr.mrc.ac.ukParasitologyHolder7 09/04/2011GeorgeGentschOKggentsc@nimr.mrc.ac.ukSystems BiolSmith1 11/04/2011SiggiSatoOKssato@nimr.mrc.ac.ukParasitologyHolder7 10/04/2011JamesBriscoeOKjbrisco@nimr.mrc.ac.ukDev NeurobiolBriscoe8 17/04/2011AlexWatsonOKawatson@nimr.mrc.ac.ukSystems BiolSmith1 10/04/2011MustafaKhokhaOKmustafa.khokha@yale.eduYALE 8 course_applicants

25 Another table role_idrole 1PhD Student 2Post Doc 3Visiting Worker 4Research Assistant 5Reseach Technician 6Investigator Scientist 7Senior Investigator Scientist 8Principle Investigator research_roles

26 Creating a table [this is SQL!] create table course_applicants ( date_applied datetime, first_namevarchar(32), last_namevarchar(32), statusvarchar(4) null, emailvarchar(32), divisonvarchar(32), labvarchar(32) null, role_idinteger )

27 Creating a table create table course_applicants ( cap_date_applied datetime, cap_first_namevarchar(32), cap_last_namevarchar(32), cap_statusvarchar(4) null, cap_emailvarchar(32), cap_divisonvarchar(32), cap_labvarchar(32) null, cap_role_idinteger )

28 Creating a table: at the prompt F:\dev\sql>MySQL –U myId –P myPass >use mydb >go >create table course_applicants ( date_applied datetime, etc… >go >(will now create table in the database)

29 Creating a table: in a script (better) use mydb go create table course_applicants ( date_applied datetime, first_namevarchar(32), last_namevarchar(32), statusvarchar(4) null, emailvarchar(32), divisonvarchar(32), labvarchar(32) null, role_idinteger ) go table_course_applicants.sql F:\dev\sql>MySQL –U myId –P myPass < table_course_applicants.sql F:\dev\sql>(table is now created)

30 Entering data into a table use mydb go insert course_applicants ( date_applied, first_name, last_name, email, divison, role_id ) select getdate(), ‘Mike’, ‘Gilchrist’, ‘mgilchr@nimr.mrc.ac.uk’, ‘Sys Bio’, 8 insert course_applicants ( date_applied, first_name, last_name, email, divison, role_id ) select getdate(), ‘Mustafa’, ‘Khokha’, ‘mustafa.khokha@yale.edu’, ‘YALE’, 8 go insert_applicants.sql F:\dev\sql>MySQL –U myId –P myPass < insert_applicants.sql F:\dev\sql>(data is now added to table)

31 Loading a table from a text data file 06/04/2011AshleighHowesOKahowes@nimr.mrc.ac.ukImmunoregO'Garra1 06/04/2011LeonaGabrysovaOKlgabrys@nimr.mrc.ac.ukImmunoregO'Garra2 06/04/2011EricDang(ok)edang@nimr.mrc.ac.ukImmunoregO'Garra3 07/04/2011GuilhermeNevesOKgneves@nimr.mrc.ac.ukMolecular Neuro.Pachnis4 08/04/2011MaryWuOKmwu@nimr.mrc.ac.ukSystems BiolSmith2 08/04/2011JoseSaldanhaOKjsaldan@nimr.mrc.ac.ukMath BiolGoldstein5 08/04/2011LaurentDupaysOKldupays@nimr.mrc.ac.ukDev BiolMohun6 08/04/2011VeroniqueDubocOKvduboc@nimr.mrc.ac.ukDev BiolLogan2 09/04/2011MadhuKadekoppalaOKmkadeko@nimr.mrc.ac.ukParasitologyHolder7 09/04/2011GeorgeGentschOKggentsc@nimr.mrc.ac.ukSystems BiolSmith1 11/04/2011SiggiSatoOKssato@nimr.mrc.ac.ukParasitologyHolder7 10/04/2011JamesBriscoeOKjbrisco@nimr.mrc.ac.ukDev NeurobiolBriscoe8 17/04/2011AlexWatsonOKawatson@nimr.mrc.ac.ukSystems BiolSmith1 10/04/2011MustafaKhokhaOKmustafa.khokha@yale.eduYALE 8 computing-course-applicants.txt F:\dev\sql>MySQL –U myId –P myPass >use mydb >go >LOAD DATA LOCAL FILE computing-course-applicants.txt >INTO TABLE course_applicants >go

32 Using SQL to query/analyse/alter the data F:\dev\sql>MySQL –U myId –P myPass >use mydb >go >select * from course_applicants >go >(outputs data to screen) select * from course_applicants order by date_applied select count(*) ‘applicants’, count(distinct division) ‘divisions’ from course_applicants select lab, count(*) from course_applicants group by lab order by count(*) desc select lab, count(*) from course_applicants group by lab having count(*) > 1 order by lab select * from course_applicants where first_name like ‘J%’ select * from course_applicants where status != ‘OK’

33 Joining tables for more complex queries select first_name, last_name, role from course_applicants, research_roles where course_applicants.role_id = research_roles.role_id(JOIN) select first_name, last_name, role from course_applicants a, research_roles r (table aliases) where a.role_id = r.role_id(JOIN) select cap_first_name, cap_last_name, rro_role from course_applicants, research_roles where cap_role_id = rro_role_id(JOIN) select rro_role ‘role’, count(*) ‘on course’ from course_applicants, research_roles where cap_role_id = rro_role_id(JOIN) group by rro_role order by rro_role

34 delete and update statements /* CHANGE THE ROLE DESCRIPTION */ update research_roles set role = ‘CDF (Post Doc)’ where role = ‘Post Doc’ update research_roles set role = ‘CDF (Post Doc)’ where role_id = 2 /* REMOVE PIs */ delete course_applicants where role_id = 8 delete course_applicants from research_roles where course_applicants.role_id = research_roles.role_id and research_roles.role= ‘Principal Investigator’ : and research_roles.rolelike ‘P% I%’

35 Catches Never use sub-queries: select... from tableA a, tableB b where a.id = b.id and a.id in (select id from tableC where id like ‘NIMR%’ ) SQL can give you the wrong answer in a plausible way... If your query is taking forever you have probably missed out a join condition. You can always add rows of data easily, but adding columns is hard work, and means you will have to re-write code... Long text data (e.g. sequences) can be tough to handle.

36 Exercises Design a table, or tables, to hold personal details (names, etc) and contact information (email, phone numbers, etc.) for people in an organisation. Download the set of gene loci and exon data for your organism from BioMart, design some tables for this data, load the data from your data files, and then query the data for the number of multi-exon vs single exon genes, and find the gene with the most number of exons, etc.

37 Next week Download and install BLAST from NCBI Prepare a two minute verbal presentation of you project


Download ppt "Computational Skills Course week 1 Mike Gilchrist NIMR May-July 2011."

Similar presentations


Ads by Google