Download presentation
Presentation is loading. Please wait.
Published byGrant Malone Modified over 9 years ago
1
1 SQL, Databases, and Ensembl Modules
2
2 Please look for next lecture Ensembl API Tutorial: http://www.ensembl.org/info/software/core/core_tutorial.html
3
3 Survey of Databases (if time) Ensembl web interface Ensembl Biomart
4
4 RDBMS Oracle and Sybase (many others) –industry standard, commercial products –development and management tools PostgresSQL –full-featured relational DBMS –open source –found in most linux distributions –handles unusual datatypes well which adds flexibility for future extensions MySQL –open-source relational DBMS –easy to setup and use –Linux/Windows/Mac Each has variations –SQL, datatypes, functions, features
5
5 Structured Query Language (SQL) SQL is the standard language used to create, modify, maintain, and query relational databases. SQL commands are issued within the context of the DBMS interface SQL commands can be passed to the DBMS by other programs (such as perl, PHP, Java, manually, etc.) Many venders have extensions (Sybase-SQL) SQL Example: select FIELD from TABLE where CONDITION select TABLE.FIELD from TABLE where CONDITION select program from software where program like 'blast%';
6
6 SQL datatypes datatype -- describes the data stored in a particular column of a table typically is either numeric or character strings SQL defines subtypes that set different upper limits on the size of text or numerical data Also -- special types such as DATE, MONEY
7
7 SQL datatypes INT FLOAT REAL (larger float) CHAR -- fixed length text string TEXT -- variable length text BLOB -- variable length binary field DECIMAL -- real number stored as character string DATE TIMESTAMP -- value updates every time the record is modified ENUM -- limited set of options (numeric or named) SET -- value is one of a limited set
8
8 Creating a new table in the database CREATE TABLE tablename (columnane type [modifiers], columanme type [modifiers]) create table software_package ( packid INT NOT NULL PRIMARY KEY AUTO_INCREMENT, packname VARCHAR(100), packurl VARCHAR(255), function TEXT, keyword ENUM, os SET, format SET, archfile VARCHAR(255) );
9
9 Inserting data INSERT INTO table(colname1, colname2, columname3) VALUES ('value1','value2','value3')
10
10 Accessing Data -- SELECT SELECT [fields] FROM [table] where [condition] select * from [table] where [condition] select packname, packurl from software select packname, packurl from software where keyword = "sequence alignment"; select packname, packurl from software where packname like "B%";
11
11 Joining multiple tables SELECT can "join" two related tables relationships between tables are created by replicating information ("primary key") from one table as a "foreign key" in another table select attribute from table1, table2 where primarykey1 = foreignkey2 SELECT packname, packurl, reference_date FROM software, reference WHERE software.package_id = reference.package_id AND reference_date >= 1998;
12
12 Example select Clients.Last, Clients.First, Clients.Phone from Clients where Clients.City = “El Paso” order by Clients.Last, Clients.First select Last, First, Phone from Clients where City = "El Paso" order by Last, First
13
13 SQL Basics select * from table_name select count(*) from table_name Find start times of Entertainers where Jameson is the Client Nested query (may not always be supported) –select StartTime from Engagements where ClientID = (select ClientID from Clients where Last = “Jameson”) Join –select Engagements.StartTime, Clients.ClientID from Engagements, Clients where Engagements.ClientID = Clients.ClientID and Client.Last = “Jameson” –select Engagements.StartTime from Engagements,Clients where Engagements.ClientID = Clients.ClientID and Client.Last = “Jameson” insert into TABLE values (val1, val2, …) insert into TABLE (FIELDa, FIELDb, …FIELDn) values (value1, value2, …) insert into Clients values (“Terry”, “Braun”, 11/10/03,555-6666)
14
14 …as a spreadsheet Agen ts Client s Enter taine rs Enga geme nts Agen ts FirstLastHireD ate Phon e Client ID Agen tID FirstLastPhon e CityEnter taine rID Agen tID FirstLastClient ID Enter taine rID Enga geme ntDat e Start Time Stop Time 100MikeHern ande z 5/17/ 99 553- 3992 9001100Stew art Jame son 553- 3999 El Paso 3000100JohnSlade900130025/3/0 0 3:00 PM 6:00 PM 101GregPierc y 10/1 6/99 790- 3992 9002101Shan non McLai n 555- 5555 Atlan ta 3001101MarkJebav y 900230004/14/ 00 9:00 PM 1:30 AM 102Kath y Ehrlic h 3/2/0 0 351- 4993 9003102Estel a Pund t 541- 4993 El Paso 3002102Teres a Weis s 900330014/2/0 0 1:00 PM 3:30 PM Flattened table structure, and lined up the rows. Appears to be okay…. Except… The relationships in this trivial example are more complicated than a simple flat structure such as a spreadsheet. Example) The first Engagement is between Client 9001, and Entertainer 3002
15
15 SQL Basics delete TABLE where FIELD = value update TABLE set FIELD = value drop DATABASE
16
16 exon transcript_id exon_num sequence_start sequence_stop intron transcript_id intron_num sequence_start sequence_stop primer_pair id transcript_id left_primer_id right_primer_id transcript id sequence_id source source_id sequence id target_id type sequence chr_name strand genomic_start genomic_stop source source_id refresh target id date gene_name description accession status project id name description date set_table id project_id name date description set_target_join set_id target_id rank cas_rank cas_options
17
17 exon transcript_id exon_num = 3 sequence_start sequence_stop intron transcript_id intron_num = 3 sequence_start sequence_stop primer_pair id transcript_id left_primer_id right_primer_id transcript id sequence_id source = Ensembl source_id sequence id target_id type = nucleotide sequence = ATG… chr_name = 15 strand = 1 genomic_start = 15,123,120 genomic_stop = 16,378,131 source source_id refresh target id date gene_name = BBS4 description accession status project id name = pro1 description date set_table id project_id name = testset date description target_set_info set_id target_id rank = 5 cas_rank cas_options Sample Data
18
18 Extracting Data from a DB This simple figure provides me with enough understanding of the database structure to extract data
19
19 Local SQL Example: We have gene name (BBS4), and would like to find project name, and the exon number, start/stops select id from target where gene_name = "BBS4" –11 select set_table.id from set_table, target_set_info where target_set_info.set_id = set_table.id and target_set_info.target_id = TARGET_ID –3 select project.name from project, set_table where set_table.project_id = project.id and set_table.id = SET_ID –project333 select id from sequence where target_id = TARGET_ID –33 select id from transcript where sequence_id = SEQUENCE_ID –68 select exon_num, sequence_start, sequence_stop from exon where transcript_id = TRANSCRIPT_ID Whew.
20
20 MySQL demo (gscr) mysql -h gscr -u tabraun –p use trapss;
21
21 TrAPSS Database Show tables; Show columns from table TABLE_NAME;
22
22 exon transcript_id exon_num = 3 sequence_start sequence_stop intron transcript_id intron_num = 3 sequence_start sequence_stop primer_pair id transcript_id left_primer_id right_primer_id transcript id sequence_id source = Ensembl source_id sequence id target_id type = nucleotide sequence = ATG… chr_name = 15 strand = 1 genomic_start = 15,123,120 genomic_stop = 16,378,131 source source_id refresh target id date gene_name = BBS4 description accession status project id name = pro1 description date set_table id project_id name =testset date description target_set_info set_id target_id rank = 5 cas_rank cas_options select id from target where gene_name = "BBS4";
23
23 exon transcript_id exon_num = 3 sequence_start sequence_stop intron transcript_id intron_num = 3 sequence_start sequence_stop primer_pair id transcript_id left_primer_id right_primer_id transcript id sequence_id source = Ensembl source_id sequence id target_id type = nucleotide sequence = ATG… chr_name = 15 strand = 1 genomic_start = 15,123,120 genomic_stop = 16,378,131 source source_id refresh target id = 23 date gene_name = BBS4 description accession status project id name = pro1 description date set_table id project_id name = testset date description target_set_info set_id target_id = 23 rank = 5 cas_rank cas_options select set_table.id from set_table, target_set_info where target_set_info.set_id = set_table.id and target_set_info.target_id = 23; (3 rows ==> in 3 sets)
24
24 exon transcript_id exon_num = 3 sequence_start sequence_stop intron transcript_id intron_num = 3 sequence_start sequence_stop primer_pair id transcript_id left_primer_id right_primer_id transcript id sequence_id source = Ensembl source_id sequence id target_id type = nucleotide sequence = ATG… chr_name = 15 strand = 1 genomic_start = 15,123,120 genomic_stop = 16,378,131 source source_id refresh target id date gene_name = BBS4 description accession status project id name = pro1 description date set_table id = 7 project_id name =testset date description target_set_info set_id = target_id rank = 5 cas_rank cas_options select project.name from project, set_table where set_table.project_id = project.id and set_table.id = 7;
25
25 exon transcript_id exon_num = 3 sequence_start sequence_stop intron transcript_id intron_num = 3 sequence_start sequence_stop primer_pair id transcript_id left_primer_id right_primer_id transcript id sequence_id source = Ensembl source_id sequence id target_id = 23 type = nucleotide sequence = ATG… chr_name = 15 strand = 1 genomic_start = 15,123,120 genomic_stop = 16,378,131 source source_id refresh target id date gene_name = BBS4 description accession status project id name = pro1 description date set_table id project_id name = testset date description target_set_info set_id target_id rank = 5 cas_rank cas_options select id from sequence where target_id = 23;
26
26 exon transcript_id exon_num = 3 sequence_start sequence_stop intron transcript_id intron_num = 3 sequence_start sequence_stop primer_pair id transcript_id left_primer_id right_primer_id transcript id sequence_id = 2216 source = Ensembl source_id sequence id target_id type = nucleotide sequence = ATG… chr_name = 15 strand = 1 genomic_start = 15,123,120 genomic_stop = 16,378,131 source source_id refresh target id date gene_name = BBS4 description accession status project id name = pro1 description date set_table id project_id name = testset date description target_set_info set_id target_id rank = 5 cas_rank cas_options select id from transcript where sequence_id = 2216;
27
27 exon transcript_id = 5053 exon_num = 3 sequence_start sequence_stop intron transcript_id intron_num = 3 sequence_start sequence_stop primer_pair id transcript_id left_primer_id right_primer_id transcript id= 5053 sequence_id source = Ensembl source_id sequence id target_id type = nucleotide sequence = ATG… chr_name = 15 strand = 1 genomic_start = 15,123,120 genomic_stop = 16,378,131 source source_id refresh target id date gene_name = BBS4 description accession status project id name = pro1 description date set_table id project_id name = testset date description target_set_info set_id target_id rank = 5 cas_rank cas_options select exon_num, sequence_start, sequence_stop from exon where transcript_id = 5053;
28
28 SQL Examples – One Large Query select distinct exon_num, project.name, sequence_start, sequence_stop from exon, transcript, sequence, target, target_set_info, set_table, project where target.gene_name = "BBS4" and project.name = "pro1" and set_table.name = "testset" and set_table.project_id = project.id and target_set_info.set_id = set_table.id and target_set_info.target_id = target.id and sequence.target_id = target.id and transcript.sequence_id = sequence.id and exon.transcript_id = transcript.id order by exon_num;
29
29 Executing SQL commands from a text file http://www.mysql.org/doc/refman/5.0/en/batch-commands.html However, it is also possible to put your SQL statements in a file and then tell mysql to read its input from that file. To do so, create a text file text_file that contains the statements you wish to execute. Then invoke mysql as shown here: shell> mysql db_name < text_file If you place a USE db_name statement as the first statement in the file, it is unnecessary to specify the database name on the command line: shell> mysql < text_file If you are already running mysql, you can execute an SQL script file using the source or \. command: mysql> source file_name mysql> \. file_name
30
30 Remotely mysql -h gscr.eng.uiowa.edu -u tabraun -p
31
31 Setting up MySQL server -- database application that is running on the machine where the data is stored –runs as a "daemon" on Unix machines –a process that is always on, listening for and responding to requests (from clients) –mysqld -- server client -- programs that connect to the server and request data –can be on same machine as server, or –located on other machines and connected by network –mysql -- client –mysqladmin -- administration tool
32
32 MySLQ help http://dev.mysql.com/ Show tables; Show columns from table TABLE_NAME; Show TrAPSS (time permitting)
33
33 Survey of Databases (if time) Ensembl web interface Ensembl Ensmart
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.