Bioinformatics Course Day 3 MySQL. Topics ● Databases ● MySQL ● SQL ● Permissions ● Usage ● Examples.


1 Bioinformatics Course Day 3 MySQL

2 Topics ● Databases ● MySQL ● SQL ● Permissions ● Usage ● Examples

3 What are databases? ● DBMS (database management systems) ● Data storage and provision ● Software running on servers ● Designed for high-capacity, high-availability usage ● Data models: relational, object-oriented, hierarchical, network ● Examples: Oracle, PostgreSQL, MySQL, Sybase, DB2, dBASE, Microsoft SQL Server

4 What do they do? ● Record storing ● Indexing for quick access ● Data organization ● Data processing

5 Application areas ● BioPharma ● E-Commerce ● Education ● Energy ● Finance ● Government ● Media ● Retail ● Telecom ● Transport
Anywhere with large data volumes!

6 MySQL Customers ● Bayer ● Sanger ● Ensembl ● Google ● Yahoo ● Ticketmaster ● Deutsche Post ● State of New York ● UNICEF ● Yamaha ● Wikipedia ● BT ● Nokia ● Lufthansa

7 Why MySQL? ● World's most popular open source database (8 million active installations and 50,000 downloads per day) ● High-performance ● Reliable ● Easy to use ● Free!

8 What is SQL? ● Structured Query Language ● Create, modify, retrieve and manipulate data ● 1970s, IBM: Structured English Query Language ("SEQUEL"), later SQL ● Simple command set ● Intuitive

9 SQL Examples
● Most important: data selection
SELECT name,sequence FROM swissprot WHERE name = 'TLR4_HUMAN';
● Data update:
UPDATE installs SET version = 8 WHERE db = 'uniprot';
● Data insertion:
INSERT INTO blast VALUES('TLR4_HUMAN', 'TLR4_PANPA', 1e-104);
● Data deletion:
DELETE FROM blast WHERE expect > 1e-20;
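
The four statement types on this slide can be tried without a MySQL server. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the three-column layout of the blast table (query, hit, expect) as well as the WEAK_HIT row are assumptions made up for the example, since the slide shows only the inserted values.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a MySQL server (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical column layout; the slide does not show the table definition.
cur.execute("CREATE TABLE blast (query TEXT, hit TEXT, expect REAL)")

# Data insertion (the second row is made-up test data).
cur.execute("INSERT INTO blast VALUES ('TLR4_HUMAN', 'TLR4_PANPA', 1e-104)")
cur.execute("INSERT INTO blast VALUES ('TLR4_HUMAN', 'WEAK_HIT', 1e-5)")

# Data selection.
cur.execute("SELECT hit, expect FROM blast WHERE query = 'TLR4_HUMAN' ORDER BY expect")
rows_before = cur.fetchall()

# Data update.
cur.execute("UPDATE blast SET expect = 1e-6 WHERE hit = 'WEAK_HIT'")

# Data deletion: remove every hit whose expect value is above the cutoff.
cur.execute("DELETE FROM blast WHERE expect > 1e-20")
cur.execute("SELECT hit FROM blast")
rows_after = cur.fetchall()

print(rows_before)
print(rows_after)
```

Only the strong hit survives the DELETE, because the weak hit's expect value is still above 1e-20 after the update.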

10 MySQL setup ● One MySQL server ● Multiple MySQL clients ● Local and remote access

11 MySQL accounts ● Administrator: root ● Users: kahokamp, guest1 (not necessarily the same as login names) ● Passwords: ******* (not necessarily the same as login passwords)

12 Permissions ● Assigned by administrator ● Multiple levels: – Access – Database usage – Select, Insert, Update, Delete, Drop,... ● May depend on host

13 Connection Command line access:
$ mysql -h localhost -u guest -p

14 Connection
$ mysql -h localhost -u guest -p
● mysql: MySQL client program ● -h localhost: server host ● -u guest: user name ● -p: password (prompted)

15 Connection
$ mysql -h localhost -u guest -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 427 to server version: 5.0.18-standard-log
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql>

16 Connection Remote command line access (the last argument preselects a database):
$ mysql -h bioinf.gen.tcd.ie -u guest -p uniprot

17 Connection Using Perl:
use strict;
use warnings;
use DBI;
my $user = 'guest';
my $host = 'bioinf.gen.tcd.ie';
my $password = '';
my $db = 'uniprot';
my $dbh = DBI->connect("DBI:mysql:database=$db;host=$host", $user, $password)
    or die "Connection failed: $DBI::errstr";
my $statement = "SELECT sequence FROM swissprot WHERE name = 'TLR4_HUMAN'";
my $sth = $dbh->prepare($statement);
my $rv = $sth->execute;
unless ($rv >= 1) { die "No match!"; }
my ($sequence) = $sth->fetchrow_array;
print "$sequence\n";

18 Connection Using Perl:
use strict;
use warnings;
# database connection module
use DBI;
# access details
my $user = 'guest';
my $host = 'bioinf.gen.tcd.ie';
my $password = '';
my $db = 'uniprot';
# connection
my $dbh = DBI->connect("DBI:mysql:database=$db;host=$host", $user, $password)
    or die "Connection failed: $DBI::errstr";
# query
my $statement = "SELECT sequence FROM swissprot WHERE name = 'TLR4_HUMAN'";
my $sth = $dbh->prepare($statement);
my $rv = $sth->execute;
unless ($rv >= 1) { die "No match!"; }
# data retrieval
my ($sequence) = $sth->fetchrow_array;
print "$sequence\n";
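
The same connect, query, retrieve pattern exists in most languages. As a rough sketch, here it is with Python's built-in sqlite3 module, used as a stand-in because it needs no server. The helper name fetch_sequence and the placeholder sequence ACDEFGHIK are invented for the example. The entry name is passed as a bound parameter (the ? placeholder) rather than pasted into the SQL string.

```python
import sqlite3


def fetch_sequence(conn, name):
    """Return the sequence stored under `name`, or None if there is no match."""
    cur = conn.execute("SELECT sequence FROM swissprot WHERE name = ?", (name,))
    row = cur.fetchone()
    return row[0] if row else None


# Connection: an in-memory database instead of a remote MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE swissprot (name TEXT PRIMARY KEY, sequence TEXT)")
# Placeholder sequence, not the real TLR4_HUMAN entry.
conn.execute("INSERT INTO swissprot VALUES (?, ?)", ("TLR4_HUMAN", "ACDEFGHIK"))

# Query and data retrieval.
sequence = fetch_sequence(conn, "TLR4_HUMAN")
missing = fetch_sequence(conn, "NO_SUCH_ENTRY")

if sequence is None:
    raise SystemExit("No match!")
print(sequence)
```

Returning None for a missing entry leaves the decision to stop (as the Perl script does with die) to the caller.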

19 Connection Using the Web (phpMyAdmin):

20 Orientation Show what's available:
mysql> SHOW TABLES;
+-------------------+
| Tables_in_uniprot |
+-------------------+
| swissprot         |
+-------------------+
1 row in set (0.00 sec)
mysql>

21 Orientation What other databases are there?
mysql> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| test               |
| uniprot            |
| uniprotKB8         |
+--------------------+
4 rows in set (0.00 sec)
mysql>

22 Orientation What other databases are there?
mysql> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| test               |
| uniprot            |
| uniprotKB8         |
+--------------------+
4 rows in set (0.00 sec)
mysql> USE TEST;
Database changed
mysql>

23 Organization ● MySQL Server ● Databases: uniprot, test ● Tables: swissprot (in uniprot); test1, test2, test3, test4 (in test)

24 Permissions ● Creation of databases: – Normally only by administrator (root) ● Creation of tables: – All users with the appropriate permissions ● Special database 'test': – Normally accessible by all users ● Special user 'guest': – Limited access – Empty password

25 Work flow Create database → Create table(s) → Insert data → Query database

26 Table creation Table columns need to be defined! Column types: – Text – Numbers – Dates – Binary data – Sets
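
As an illustration of column definitions, the sketch below creates a table with one column per type listed above and reads the declared types back. It runs on Python's built-in sqlite3 module as a stand-in for MySQL. The table name entry and its columns are hypothetical. SQLite has no SET type and treats declared types more loosely than MySQL does, so this shows the shape of a definition and not MySQL's exact type behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: every column gets a name and a declared type.
conn.execute("""
    CREATE TABLE entry (
        name     VARCHAR(20),   -- text
        length   INTEGER,       -- number
        created  DATE,          -- date
        raw      BLOB           -- binary data
    )
""")

# PRAGMA table_info yields one row per column: (cid, name, type, notnull, default, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(entry)")]
for column_name, column_type in columns:
    print(column_name, column_type)
```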

27 SQL Examples
SELECT name,length FROM swissprot;
+-------------+--------+
| name        | length |
+-------------+--------+
| 104K_THEAN  |    893 |
| 104K_THEPA  |    924 |
| 108_LYCES   |    102 |
| 10KD_VIGUN  |     75 |
...
| ZYX_CHICK   |    542 |
| ZYX_HUMAN   |    572 |
| ZYX_MOUSE   |    564 |
+-------------+--------+
222289 rows in set (0.89 sec)

28 SQL Examples
SELECT name,length FROM swissprot LIMIT 10;
+-------------+--------+
| name        | length |
+-------------+--------+
| 104K_THEAN  |    893 |
| 104K_THEPA  |    924 |
| 108_LYCES   |    102 |
| 10KD_VIGUN  |     75 |
| 110KD_PLAKN |    296 |
| 11S2_SESIN  |    459 |
| 11S3_HELAN  |    493 |
| 11SB_CUCMA  |    480 |
| 128UP_DROME |    368 |
| 12AH_CLOS4  |     29 |
+-------------+--------+
10 rows in set (0.00 sec)

29 SQL Examples
SELECT name,length FROM swissprot LIMIT 222279,10;
+-------------+--------+
| name        | length |
+-------------+--------+
| ZYG12_CAEEL |    774 |
| ZYG1_CAEBR  |    709 |
| ZYG1_CAEEL  |    706 |
| ZYGBL_HUMAN |    766 |
| ZYGBL_MOUSE |    779 |
| ZYGBL_PONPY |    766 |
| ZYS3_CHLRE  |    371 |
| ZYX_CHICK   |    542 |
| ZYX_HUMAN   |    572 |
| ZYX_MOUSE   |    564 |
+-------------+--------+
10 rows in set (0.60 sec)
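
LIMIT with two numbers reads as "skip this many rows, then return this many". A small sketch of that paging idea follows, run on Python's built-in sqlite3 module as a stand-in for MySQL (SQLite accepts the same offset,count form). The helper name page is made up, and the six rows are taken from the LIMIT 10 output above. The sketch adds ORDER BY name because, without an explicit ordering, a database is free to return rows in any order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE swissprot (name TEXT, length INTEGER)")
rows = [
    ("104K_THEAN", 893), ("104K_THEPA", 924), ("108_LYCES", 102),
    ("10KD_VIGUN", 75), ("110KD_PLAKN", 296), ("11S2_SESIN", 459),
]
conn.executemany("INSERT INTO swissprot VALUES (?, ?)", rows)


def page(conn, offset, count):
    """Return `count` rows after skipping `offset` rows, ordered by name."""
    sql = "SELECT name, length FROM swissprot ORDER BY name LIMIT ?, ?"
    return conn.execute(sql, (offset, count)).fetchall()


first_two = page(conn, 0, 2)   # same as LIMIT 2
last_two = page(conn, 4, 2)    # skip 4 rows, return the next 2
print(first_two)
print(last_two)
```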

30 SQL Examples
SELECT name,length FROM swissprot ORDER BY length LIMIT 10;
+------------+--------+
| name       | length |
+------------+--------+
| GWA_SEPOF  |      2 |
| ACI_TRIGI  |      3 |
| GRWM_HUMAN |      3 |
| LUXE_VIBFI |      3 |
| TRH_BOMOR  |      3 |
| TRH_NOTVI  |      3 |
| TRH_PIG    |      3 |
| TRH_SHEEP  |      3 |
| ACH1_ACHFU |      4 |
| DCML_PSECH |      4 |
+------------+--------+
10 rows in set (0.00 sec)

31 SQL Examples
SELECT * FROM swissprot WHERE length = 2;
| name | accession | version | dataset | created | modified | prot_name | component | type | lineage | tax_id | organism | checksum | length | seq_version | mass | seq_date | sequence | keyword |
| GWA_SEPOF | P83570 | 15 | Swiss-Prot | 2004-01-16 | 2006-02-07 | Neuropeptide GWa | | | Eukaryota; Metazoa; Mollusca; Cephalopoda; Coleoidea; Neocoleoidea; Decapodiformes; Sepioidea; Sepiidae; Sepia | 6610 | Sepia officinalis (Common cuttlefish) | 7378100000000000 | 2 | 1 | 261 | 2003-06-01 | GW | Amidation; Direct protein sequencing; Neuropeptide |
1 row in set (0.00 sec)

32 SQL Examples
SELECT name,sequence,organism,prot_name FROM swissprot WHERE length = 2;
+-----------+----------+---------------------------------------+------------------+
| name      | sequence | organism                              | prot_name        |
+-----------+----------+---------------------------------------+------------------+
| GWA_SEPOF | GW       | Sepia officinalis (Common cuttlefish) | Neuropeptide GWa |
+-----------+----------+---------------------------------------+------------------+
1 row in set (0.00 sec)

33 SQL Examples
SELECT * FROM swissprot WHERE length = 2 \G
*************************** 1. row ***************************
       name: GWA_SEPOF
  accession: P83570
    version: 15
    dataset: Swiss-Prot
    created: 2004-01-16
   modified: 2006-02-07
  prot_name: Neuropeptide GWa
  component:
       type:
    lineage: Eukaryota; Metazoa; Mollusca; Cephalopoda;...
     tax_id: 6610
   organism: Sepia officinalis (Common cuttlefish)
   checksum: 7378100000000000
     length: 2
seq_version: 1
       mass: 261
   seq_date: 2003-06-01
   sequence: GW
    keyword: Amidation; Direct protein sequencing; Neuropeptide
1 row in set (0.01 sec)

34 SQL Examples
SELECT name,length FROM swissprot ORDER BY length DESC LIMIT 10;
+-------------+--------+
| name        | length |
+-------------+--------+
| DIG1_CAEEL  |  13100 |
| SYNE1_HUMAN |   8797 |
| ANC1_CAEEL  |   8545 |
| UNC89_CAEEL |   8081 |
| OBSCN_HUMAN |   7968 |
| LGRC_BREPA  |   7756 |
| BPA1_MOUSE  |   7389 |
| R1AB_CVMJH  |   7180 |
| R1AB_CVMA5  |   7176 |
| R1AB_CVM2   |   7124 |
+-------------+--------+
10 rows in set (0.00 sec)

35 SQL Examples
SELECT name,length FROM swissprot WHERE length < 10;
+-------------+--------+
| name        | length |
+-------------+--------+
| GWA_SEPOF   |      2 |
| ACI_TRIGI   |      3 |
| GRWM_HUMAN  |      3 |
| LUXE_VIBFI  |      3 |
...
| UPA7_HUMAN  |      9 |
| XYLA_STRS8  |      9 |
| YBFR_AZOVI  |      9 |
+-------------+--------+
365 rows in set (0.01 sec)

36 SQL Examples
SELECT COUNT(*) FROM swissprot WHERE length < 10;
+----------+
| count(*) |
+----------+
|      365 |
+----------+
1 row in set (0.00 sec)

37 SQL Examples
SELECT DISTINCT length FROM swissprot WHERE length < 10;
+--------+
| length |
+--------+
|      2 |
|      3 |
|      4 |
|      5 |
|      6 |
|      7 |
|      8 |
|      9 |
+--------+
8 rows in set (0.00 sec)

38 SQL Examples
SELECT length, COUNT(length) FROM swissprot WHERE length < 10 GROUP BY length;
+--------+---------------+
| length | COUNT(length) |
+--------+---------------+
|      2 |             1 |
|      3 |             7 |
|      4 |            22 |
|      5 |            30 |
|      6 |            18 |
|      7 |            50 |
|      8 |           103 |
|      9 |           134 |
+--------+---------------+
8 rows in set (0.00 sec)
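
COUNT(*), DISTINCT and GROUP BY answer three related questions: how many rows, which values occur, and how many rows per value. The sketch below shows all three on a six-row sample, using Python's built-in sqlite3 module as a stand-in for MySQL. The rows are the shortest entries from the ORDER BY length output earlier, so the counts differ from the full-table results on the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE swissprot (name TEXT, length INTEGER)")
conn.executemany("INSERT INTO swissprot VALUES (?, ?)", [
    ("GWA_SEPOF", 2), ("ACI_TRIGI", 3), ("GRWM_HUMAN", 3),
    ("LUXE_VIBFI", 3), ("ACH1_ACHFU", 4), ("DCML_PSECH", 4),
])

# How many rows match the condition?
total = conn.execute(
    "SELECT COUNT(*) FROM swissprot WHERE length < 4"
).fetchone()[0]

# Which lengths occur at all?
distinct = [row[0] for row in conn.execute(
    "SELECT DISTINCT length FROM swissprot ORDER BY length"
)]

# How many rows are there for each length?
per_length = conn.execute(
    "SELECT length, COUNT(length) FROM swissprot GROUP BY length ORDER BY length"
).fetchall()

print(total, distinct, per_length)
```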

39 SQL Examples
CREATE TABLE test.splen SELECT length, COUNT(length) FROM swissprot GROUP BY length;
Query OK, 2717 rows affected (0.30 sec)
Records: 2717 Duplicates: 0 Warnings: 0

40 SQL Examples
SELECT * FROM test.splen ORDER BY `COUNT(length)` DESC LIMIT 10;
+--------+---------------+
| length | COUNT(length) |
+--------+---------------+
|    379 |          1004 |
|    146 |           921 |
|    141 |           749 |
|    156 |           694 |
|    148 |           633 |
|    207 |           591 |
|    155 |           590 |
|    152 |           579 |
|    215 |           573 |
|    119 |           570 |
+--------+---------------+
10 rows in set (0.01 sec)
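
Storing a query result as a new table and then querying that table can be sketched as follows. This again runs on Python's built-in sqlite3 module and not on MySQL; SQLite requires the keyword AS before the SELECT, which MySQL lets you omit. Giving the count an alias (n here, a name chosen for the example) avoids having to quote a column called COUNT(length) with backticks later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE swissprot (name TEXT, length INTEGER)")
conn.executemany("INSERT INTO swissprot VALUES (?, ?)", [
    ("GWA_SEPOF", 2), ("ACI_TRIGI", 3), ("GRWM_HUMAN", 3),
    ("LUXE_VIBFI", 3), ("ACH1_ACHFU", 4), ("DCML_PSECH", 4),
])

# Materialise the length histogram as its own table.
conn.execute("""
    CREATE TABLE splen AS
    SELECT length, COUNT(length) AS n
    FROM swissprot
    GROUP BY length
""")

# The most frequent length comes first when sorting by the count, descending.
most_common = conn.execute(
    "SELECT length, n FROM splen ORDER BY n DESC LIMIT 1"
).fetchall()
print(most_common)
```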

41 SQL Examples
SELECT name,organism FROM swissprot WHERE NAME LIKE 'TLR4\_PA%';
+------------+---------------------------------+
| name       | organism                        |
+------------+---------------------------------+
| TLR4_PANPA | Pan paniscus (Pygmy chimpanzee) |
| TLR4_PAPAN | Papio anubis (Olive baboon)     |
+------------+---------------------------------+
2 rows in set (0.00 sec)
Wild-cards: _ (exactly one character), % (any number of characters, including none). Escape with backslash (\) to match a literal _ or %!

42 SQL Examples
SELECT name,organism FROM swissprot WHERE NAME LIKE 'tlr4\_PA%';
+------------+---------------------------------+
| name       | organism                        |
+------------+---------------------------------+
| TLR4_PANPA | Pan paniscus (Pygmy chimpanzee) |
| TLR4_PAPAN | Papio anubis (Olive baboon)     |
+------------+---------------------------------+
2 rows in set (0.00 sec)
Case-insensitive (unless binary format)!

43 SQL Examples
SELECT name,organism FROM swissprot WHERE NAME = 'TLR__PANPA';
Empty set (0.00 sec)

44 SQL Examples
SELECT name,organism FROM swissprot WHERE NAME LIKE 'TLR__PANPA';
+------------+---------------------------------+
| name       | organism                        |
+------------+---------------------------------+
| TLR4_PANPA | Pan paniscus (Pygmy chimpanzee) |
+------------+---------------------------------+
1 row in set (0.00 sec)
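
The difference between = and LIKE, and between an escaped and an unescaped underscore, can be checked with a short sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, with one important difference: SQLite has no default escape character, so the query has to say ESCAPE '\' explicitly, whereas MySQL treats the backslash as the escape character by default. The helper name like and the entry TLR4XPANPA are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE swissprot (name TEXT)")
conn.executemany("INSERT INTO swissprot VALUES (?)", [
    ("TLR4_PANPA",), ("TLR4_PAPAN",), ("TLR4_HUMAN",),
    ("TLR4XPANPA",),  # made-up name: has 'X' where the others have '_'
])


def like(conn, pattern):
    """Return all names matching a LIKE pattern, with backslash as escape character."""
    sql = r"SELECT name FROM swissprot WHERE name LIKE ? ESCAPE '\' ORDER BY name"
    return [row[0] for row in conn.execute(sql, (pattern,))]


escaped = like(conn, r"TLR4\_PA%")     # literal underscore only
unescaped = like(conn, "TLR4_PA%")     # '_' matches any single character
two_wild = like(conn, "TLR__PANPA")    # two single-character wild-cards
lowercase = like(conn, r"tlr4\_pa%")   # LIKE ignores ASCII case by default
exact = conn.execute(
    "SELECT name FROM swissprot WHERE name = 'TLR__PANPA'"
).fetchall()                           # '=' treats '_' as an ordinary character

print(escaped, unescaped, two_wild, lowercase, exact)
```

The unescaped pattern also picks up the made-up TLR4XPANPA entry, which is why the slides escape the underscore when a literal _ is meant.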

45 SQL Examples
SELECT name,length FROM swissprot WHERE NAME REGEXP '^TLR[4-9]\_HUMAN';
+------------+--------+
| name       | length |
+------------+--------+
| TLR4_HUMAN |    839 |
| TLR5_HUMAN |    858 |
| TLR6_HUMAN |    796 |
| TLR7_HUMAN |   1049 |
| TLR8_HUMAN |   1041 |
| TLR9_HUMAN |   1032 |
+------------+--------+
6 rows in set (0.00 sec)
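
To experiment with the regular expression outside MySQL, the pattern can be tried with Python's built-in sqlite3 and re modules, as sketched below. SQLite has no REGEXP implementation of its own, so the example registers one using Python's re.search; the regular expression dialect is therefore Python's and not MySQL's, although this simple pattern behaves the same way in both. The function name regexp is an assumption of the sketch, and the names TLR3_HUMAN and TLR4_MOUSE are added only as non-matching contrast cases.

```python
import re
import sqlite3


def regexp(pattern, value):
    """Backs the SQL operator `value REGEXP pattern` (SQLite passes the pattern first)."""
    return value is not None and re.search(pattern, value) is not None


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE swissprot (name TEXT)")
conn.executemany("INSERT INTO swissprot VALUES (?)", [
    ("TLR3_HUMAN",), ("TLR4_HUMAN",), ("TLR9_HUMAN",), ("TLR4_MOUSE",),
])

# ^ anchors at the start of the name, [4-9] accepts a single digit from 4 to 9.
matches = [row[0] for row in conn.execute(
    "SELECT name FROM swissprot WHERE name REGEXP ? ORDER BY name",
    (r"^TLR[4-9]_HUMAN",),
)]
print(matches)
```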

46 Normalization ● Optimize database design ● Avoid duplication of data ● Least redundancy in tables

47 Normalization Bad design! Repetition of entries, difficult to index and awkward to search

48 Normalization Alternative design, not optimal either: different number of keywords for each entry; still very repetitive

49 Normalization Normalized version:
SELECT name,sequence FROM table1, table2, table3 WHERE keyword = 'Glycoprotein' AND ID = ID1 AND ID2 = name;

50 Normalization Normalized version:
SELECT name,sequence FROM table1, table2, table3 WHERE keyword = 'Glycoprotein' AND ID = ID1 AND ID2 = name;
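
A possible layout behind the normalized query is sketched below, on Python's built-in sqlite3 module as a stand-in for MySQL. The transcript does not show the table definitions, so the split is an assumption inferred from the column names in the query: table1 holds each keyword once (ID, keyword), table3 holds each protein once (name, sequence), and table2 links the two (ID1, ID2). DUMMY_PROT and its sequence are made-up data, and the helper name proteins_with is hypothetical. The sketch writes the joins explicitly with JOIN ... ON, which is equivalent to listing the tables and join conditions in WHERE as the slide does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ID INTEGER PRIMARY KEY, keyword TEXT);   -- one row per keyword
    CREATE TABLE table2 (ID1 INTEGER, ID2 TEXT);                  -- keyword-to-protein links
    CREATE TABLE table3 (name TEXT PRIMARY KEY, sequence TEXT);   -- one row per protein

    INSERT INTO table1 VALUES (1, 'Amidation'), (2, 'Neuropeptide'), (3, 'Glycoprotein');
    INSERT INTO table3 VALUES ('GWA_SEPOF', 'GW'), ('DUMMY_PROT', 'ACDEFGHIK');
    INSERT INTO table2 VALUES (1, 'GWA_SEPOF'), (2, 'GWA_SEPOF'), (3, 'DUMMY_PROT');
""")


def proteins_with(conn, keyword):
    """Return (name, sequence) for every protein linked to `keyword`."""
    sql = """
        SELECT name, sequence
        FROM table1
        JOIN table2 ON table1.ID = table2.ID1
        JOIN table3 ON table2.ID2 = table3.name
        WHERE keyword = ?
        ORDER BY name
    """
    return conn.execute(sql, (keyword,)).fetchall()


glyco = proteins_with(conn, "Glycoprotein")
neuro = proteins_with(conn, "Neuropeptide")
print(glyco)
print(neuro)
```

Each keyword and each sequence is stored exactly once; an entry with many keywords only adds rows to the small link table.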

51 More Info ● MySQL tutorials on the web ● Learning MySQL (O'Reilly) ● http://dev.mysql.com/doc/ (searchable and browsable online)

