96-Summer Bioinformatics Programming Practicum (II): Bioinformatics with Perl. 8/13~8/22 蘇中才, 8/24~8/29 張天豪, 8/31 曾宇鳯.


1 1 96-Summer Bioinformatics Programming Practicum (II): Bioinformatics with Perl. 8/13~8/22 蘇中才, 8/24~8/29 張天豪, 8/31 曾宇鳯

2 2 Schedule
Date       Time          Subject                                            Speaker
8/13 Mon   13:30~17:30   Perl Basics                                        蘇中才
8/15 Wed   13:30~17:30   Programming Basics                                 蘇中才
8/17 Fri   13:30~17:30   Regular expression                                 蘇中才
8/20 Mon   13:30~17:30   Retrieving Data from Protein Sequence Database     蘇中才
8/22 Wed   13:30~17:30   Combining Perl with GenBank and BLAST              蘇中才
8/24 Fri   13:30~17:30   PDB database and structure files                   張天豪
8/27 Mon   8:30~12:30    Extracting ATOM information                        張天豪
8/27 Mon   13:30~17:30   Mapping of Protein Sequence IDs and Structure IDs  張天豪
8/31 Fri   13:30~17:30   Final and Examination                              曾宇鳳

3 3 Reference Books
- Learning Perl (Chinese edition: Perl 學習手冊)
- Beginning Perl for Bioinformatics
- Bioinformatics, Biocomputing and Perl: An Introduction to Bioinformatics Computing Skills and Practice


5 5 Learning Perl

6 6 Perl
- Practical Extraction and Report Language
- Created by Larry Wall in the mid-1980s
- Suitable for "quick-and-dirty" programs
- Suitable for string handling
- Powerful regular expressions

7 7 Preparation
Download putty.exe / pietty.exe
Get the materials for this course:
- http://gene.csie.ntu.edu.tw/~sbb/summer-course/
Server:
- ssh 140.112.28.186
- Id: course1 ~ course20
- Password:

8 8 Installing Perl on Windows
Download the package from:
- http://www.activestate.com/
- http://downloads.activestate.com/ActivePerl/Windows/5.8/ActivePerl-5.8.8.820-MSWin32-x86-274739.msi
Versions of Perl:
- Unix, Linux, Windows (ActivePerl), Mac (MacPerl)
- http://www.perl.com/

9 9 Text Editors
A convenient (text) editor for programming:
- Ultraedit: good for me
- Notepad: just an editor
- Vim: for UNIX/Linux lovers
  - http://lpi.indicator-online.net/vim.html
  - http://homepage.ttu.edu.tw/u9106240/page_main/vim_menu.html
- Joe: easy to use for Unix beginners

10 10 Finding Help
Online resources are the best place to look:
- http://www.perl.com/
- http://www.perl.org/
- http://www.cpan.org/
HTML Help in ActivePerl
Command line (highly recommended):
- perldoc -f FUNCTION   # look up a built-in function
- perldoc -q KEYWORD    # search the FAQ
- perldoc MODULE        # read a module's documentation
- perldoc perldoc

11 11 Perl Basic Starting

12 12 Program: run thyself!
    $ vi welcome
    #! /usr/bin/perl -w
    print "Hello, world\n";
    $ chmod +x welcome
    $ ./welcome
    Hello, world
    $ perl welcome
    Hello, world
    [sbb@gene perl]$ ls -al
    -rw-rw-r-- 1 sbb sbb 20 Jul 2 15:27 welcome
    [sbb@gene perl]$ chmod +x welcome
    [sbb@gene perl]$ ls -al
    -rwxrwxr-x 1 sbb sbb 20 Jul 2 15:27 welcome

13 13 Using the Perl while construct
    #! /usr/bin/perl -w
    # The 'forever' program - a (Perl) program,
    # which does not stop until someone presses Ctrl-C.
    use constant TRUE  => 1;
    use constant FALSE => 0;
    while ( TRUE )
    {
        print "Welcome to the Wonderful World of Bioinformatics!\n";
        sleep 1;
    }

14 14 Running forever...
    $ chmod +x forever
    $ ./forever
    Welcome to the Wonderful World of Bioinformatics!
    ...

15 15 Perl Basic Variables

16 16 Variables
Scalar ($)
- Number: 1; 1.23; 12e34
- String: "abc"; 'ABC'; "Hello, world!";
Array / List (@)
Hash (%)

17 17 Introducing variable containers
The simplest type of variable container is the scalar. In Perl, a scalar can hold, for example, a number, a word, a sentence or a disk-file.
    $name
    $_address
    $programming_101
    $z
    $abc
    $swissprot_to_interpro_mapping
    $SwissProt2InterProMapping
Variable naming is an art!

18 18 scalar
    #!/usr/bin/perl -w
    # lower case for user-defined names; upper case for system defaults
    my $ARGV = "example.pl";    # legal, but hides Perl's built-in $ARGV
    my $number = 1.2;
    my $string = "Hello, world!";
    # my $123 = 123;            # error: a name cannot start with a digit
    my $abc = "123";
    my $_123 = '123';
    my $O000OoO00 = 1;          # legal, but hard to read
    my $OO00Oo000 = 2;
    my $OO00OoOOO = 3;
    $abc = $O000OoO00 * $OO00Oo000 - $OO00OoOOO;    # 1 * 2 - 3 = -1
    print $abc x 4 . "\n";      # prints -1-1-1-1
    print 5 x 4 . "\n";         # prints 5555
    print 5 * 4 . "\n";         # prints 20

19 19 Number
Format (Perl normally stores numbers as double-precision floats, so 1e-100 ~ 1e100 is well within range)
- 2000
- 1.25
- -6.5e45 (-6.5*10^45)
- 123456789
- 123_456_789
Other formats
- 0377 # octal (decimal 255)
- 0xFF # hexadecimal
- 0b11111111 # binary

20 20 number
    $integer = 12;
    $real = 12.34;
    $oct = 0377;
    $bin = 0b11111111;
    $hex = 0xff;
    $long = 123456789;
    $long_ = 123_456_789;
    $large = 1E100;     #1E200
    $small = 1E-100;    #1E-200
    print "integer : $integer\n";
    print "real : $real\n";
    print "oct=$oct bin=$bin hex=$hex\n";    # all three print as decimal 255
    #printf("oct=0%o bin=0b%b hex=0x%x\n",$oct,$bin,$hex);

21 21 parameters of printf (ref : number)
specifier  Output                                                      Example
c          Character                                                   a
d or i     Signed decimal integer                                      392
e          Scientific notation (mantissa/exponent) using e character   3.9265e+2
E          Scientific notation (mantissa/exponent) using E character   3.9265E+2
f          Decimal floating point                                      392.65
g          Use the shorter of %e or %f                                 392.65
G          Use the shorter of %E or %f                                 392.65
o          Signed octal                                                610
s          String of characters                                        sample
u          Unsigned decimal integer                                    7235
x          Unsigned hexadecimal integer                                7fa
X          Unsigned hexadecimal integer (capital letters)              7FA
p          Pointer address                                             B800:0000
n          Nothing printed. The argument must be a pointer to a signed int, where the number of characters written so far is stored.
%          A % followed by another % character will write % to stdout.
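A minimal sketch of a few of these specifiers in Perl. It uses sprintf, which follows the same format rules as printf but returns the string instead of printing it. The variable names are illustrative only, and %b (binary) is a Perl specifier that the table above does not list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# sprintf formats like printf, but returns the result as a string.
my $dec = sprintf "%d",   392.65;        # integer part only
my $flt = sprintf "%.2f", 392.65;        # fixed point, two decimals
my $sci = sprintf "%.4e", 392.65;        # scientific notation
my $oct = sprintf "0%o",  0377;          # octal digits
my $hex = sprintf "0x%x", 0xff;          # hexadecimal digits
my $bin = sprintf "0b%b", 0b11111111;    # binary digits

print "dec=$dec flt=$flt sci=$sci\n";
print "oct=$oct hex=$hex bin=$bin\n";
```

The number of exponent digits printed by %e can differ between platforms, so the sketch does not rely on it.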

22 22 operator
    2 + 3        # 5
    5.1 - 2.4    # 2.7
    3 * 12       # 36
    14 / 2       # 7
    10.2 / 0.3   # 34
    10 / 3       # 3.333...
    10 % 3       # 1

23 23
Operator  Function
+         Addition
-         Subtraction, Negative Numbers, Unary Negation
*         Multiplication
/         Division
%         Modulus
**        Exponent
Operator  Function
=         Normal Assignment
+=        Add and Assign
-=        Subtract and Assign
*=        Multiply and Assign
/=        Divide and Assign
%=        Modulus and Assign
**=       Exponent and Assign
    $number = $number + 100;
    $number += 100;

24 24 Take a break ...
modulus
- 10.5 % 3.2 = ?
exponentiation
- 2^3 = ?
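One way to check the two questions is to let Perl answer them. The sketch below is an illustration, not part of the original slides: it shows that % works on the integer parts of its operands, that ^ is bitwise XOR rather than a power operator, and that ** is the exponent operator. fmod comes from the standard POSIX module and is included only to contrast with %.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(fmod);

my $int_mod   = 10.5 % 3.2;         # operands are truncated: 10 % 3
my $float_mod = fmod( 10.5, 3.2 );  # floating-point remainder, about 0.9
my $xor       = 2 ^ 3;              # bitwise XOR of 2 and 3
my $power     = 2 ** 3;             # 2 to the power of 3

print "10.5 % 3.2       = $int_mod\n";
print "fmod(10.5, 3.2)  = $float_mod\n";
print "2 ^ 3            = $xor\n";
print "2 ** 3           = $power\n";
```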

25 25 string
Format
- Single quotes: 'hello'  'hello\nhello'  'hello,$name'
- Double quotes: "hello"  "hello\nhello"  "hello,$name"
Exceptions (escaping a quote or a backslash)
- '\'\\'
- "\"\\"
    #!/usr/bin/perl -w
    print 'hello';
    print "hello";

26 26 Backslash escapes
Escape sequence  Description or character
\b               Backspace
\e               Escape
\f               Form feed
\n               New line
\r               Carriage return
\t               Tab
\v               Vertical tab (not recognized in Perl double-quoted strings; use \x0B)
\$               Dollar sign
\@               At sign
\0nnn            Any octal byte
\xnn             Any hexadecimal byte
\cn              Any control character
\l               Change the next character to lowercase
\u               Change the next character to uppercase
\\               Backslash

27 27 conversion between String and number
    $answer = "Hello " . " " . " world\n";    # string concatenation
    $answer = "12" . "3";                     # "123"
    $answer = "12" * "3";                     # 36
    $answer = "12Hello34" * "3";              # 36, with a warning !!!
    $answer = "A" . 3*5;                      # "A15"
    $answer = "A" x (3*5);                    # "A" repeated 15 times
    $answer = "12" x "3";                     # "121212"

28 28 Variable containers and loops
    #! /usr/bin/perl -w
    # The 'tentimes' program - a (Perl) program,
    # which stops after ten iterations.
    use constant HOWMANY => 10;
    $count = 0;
    while ( $count < HOWMANY )
    {
        print "Welcome to the Wonderful World of Bioinformatics!\n";
        $count++;
    }

29 29 Running tentimes...
    $ chmod +x tentimes
    $ ./tentimes
    Welcome to the Wonderful World of Bioinformatics!
    ... (the same line is printed ten times in total)

30 30 Using the Perl if construct
    #! /usr/bin/perl -w
    # The 'fivetimes' program - a (Perl) program,
    # which stops after five iterations.
    use constant TRUE    => 1;
    use constant FALSE   => 0;
    use constant HOWMANY => 5;
    $count = 0;
    while ( TRUE )
    {
        $count++;
        print "Welcome to the Wonderful World of Bioinformatics!\n";
        if ( $count == HOWMANY )
        {
            last;    # leave the loop
        }
    }

31 31 The oddeven program
    #! /usr/bin/perl -w
    # The 'oddeven' program.
    use constant HOWMANY => 4;
    $count = 0;
    while ( $count < HOWMANY )
    {
        $count++;
        if ( $count % 2 == 0 )
        {
            print "$count : even\n";
        }
        else    # $count % 2 is not zero.
        {
            print "$count : odd\n";
        }
    }

32 32 Comparison operator
Comparison                           Number  String
Equal                                ==      eq
Not equal                            !=      ne
Less than                            <       lt
Greater than                         >       gt
Less than or equal                   <=      le
Greater than or equal                >=      ge
Comparison (returns -1, 0 or 1)      <=>     cmp
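The difference between the two columns matters whenever a value can be read either as a number or as a string. The following sketch (variable names are illustrative) compares the same values both ways; each test is reduced to 1 or 0 so the results print cleanly.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $num_equal = ( 10 == 10.0 )     ? 1 : 0;    # same number
my $str_equal = ( "10" eq "10.0" ) ? 1 : 0;    # different strings
my $num_less  = ( 9 < 10 )         ? 1 : 0;    # 9 is smaller than 10
my $str_less  = ( "9" lt "10" )    ? 1 : 0;    # "9" sorts after "1"
my $num_cmp   = 2 <=> 10;                      # numeric three-way comparison
my $str_cmp   = "2" cmp "10";                  # string three-way comparison

print "10 == 10.0     : $num_equal\n";
print "'10' eq '10.0' : $str_equal\n";
print "9 < 10         : $num_less\n";
print "'9' lt '10'    : $str_less\n";
print "2 <=> 10       : $num_cmp\n";
print "'2' cmp '10'   : $str_cmp\n";
```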

33 33 Variable Interpolation
    #! /usr/bin/perl -w
    # The 'interpolation' program, which interpolates variables into strings.
    $language = "Perl";
    $string = "I love $language";
    print $string."\n";             # I love Perl
    $string = 'I love $language';
    print $string."\n";             # I love $language
    $string = 'I love '.$language;
    print $string."\n";             # I love Perl
    $string = "I love \$language";
    print $string."\n";             # I love $language
    $string = "I love $languages";  # $languages is a different, undefined variable
    print $string."\n";             # write "${language}s" to get "Perls"

34 34 Arrays: Associating Data With Numbers
    @list_of_sequences
    @totals
    @protein_structures
    ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' )
    @list_of_sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' );

35 35 The @list_of_sequences Array

36 36 Working with array elements
    print "$list_of_sequences[1]\n";
Output:
    GCTCAGTTCT
    $list_of_sequences[1] = 'CTATGCGGTA';
    $list_of_sequences[3] = 'GGTCCATGAA';

37 37 The Grown @list_of_sequences Array

38 38 How big is the array?
    print "The array size is: ", $#list_of_sequences+1, ".\n";
    print "The array size is: ", scalar @list_of_sequences, ".\n";
Output:
    The array size is: 4.

39 39 Adding elements to an array
    @sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' );
    @sequences = ( @sequences, 'CTATGCGGTA' );
    print "@sequences\n";
Output:
    TTATTATGTT GCTCAGTTCT GACCTCTTAA CTATGCGGTA
    @sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' );
    @sequences = ( 'CTATGCGGTA' );
    print "@sequences\n";
Output:
    CTATGCGGTA

40 40 Adding more elements to an array
    @sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' );
    @sequences = ( @sequences, ( 'CTATGCGGTA', 'CTATTATGTC' ) );
    print "@sequences\n";
Output:
    TTATTATGTT GCTCAGTTCT GACCTCTTAA CTATGCGGTA CTATTATGTC
    @sequence_1 = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' );
    @sequence_2 = ( 'GCTCAGTTCT', 'GACCTCTTAA' );
    @combined_sequences = ( @sequence_1, @sequence_2 );
    print "@combined_sequences\n";
Output:
    TTATTATGTT GCTCAGTTCT GACCTCTTAA GCTCAGTTCT GACCTCTTAA

41 41 Removing elements from an array
    @sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA', 'TTATTATGTT' );
    @removed_elements = splice @sequences, 1, 2;
    print "@removed_elements\n";
    print "@sequences\n";
Output:
    GCTCAGTTCT GACCTCTTAA
    TTATTATGTT TTATTATGTT
    # remove all elements of an array
    @sequences = ();

42 42 The slices program
    #! /usr/bin/perl -w
    # The 'slices' program - slicing arrays.
    @sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA',
                   'CTATGCGGTA', 'ATCTGACCTC' );
    print "@sequences\n\n";
    @seq_slice = @sequences[ 1 .. 3 ];
    print "@seq_slice\n";
    print "@sequences\n\n";
    @removed = splice @sequences, 1, 3;
    print "@sequences\n";
    print "@removed\n";

43 43 Results from slices...
    TTATTATGTT GCTCAGTTCT GACCTCTTAA CTATGCGGTA ATCTGACCTC
    GCTCAGTTCT GACCTCTTAA CTATGCGGTA
    TTATTATGTT GCTCAGTTCT GACCTCTTAA CTATGCGGTA ATCTGACCTC
    TTATTATGTT ATCTGACCTC
    GCTCAGTTCT GACCTCTTAA CTATGCGGTA

44 44 Processing every element in an array
    #! /usr/bin/perl -w
    # The 'iterateW' program - iterate over an entire array
    # with 'while'.
    @sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA',
                   'CTATGCGGTA', 'ATCTGACCTC' );
    $index = 0;
    $last_index = $#sequences;
    while ( $index <= $last_index )
    {
        print "$sequences[ $index ]\n";
        ++$index;
    }

45 45 Results from iterateW...
    TTATTATGTT
    GCTCAGTTCT
    GACCTCTTAA
    CTATGCGGTA
    ATCTGACCTC

46 46 The iterateF program
    #! /usr/bin/perl -w
    # The 'iterateF' program - iterate over an entire array
    # with 'foreach'.
    @sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA',
                   'CTATGCGGTA', 'ATCTGACCTC' );
    foreach $value ( @sequences )
    {
        print "$value\n";
    }

47 47 Making lists easier to work with
    @sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA', 'CTATGCGGTA', 'ATCTGACCTC' );
    @sequences = ( TTATTATGTT, GCTCAGTTCT, GACCTCTTAA, CTATGCGGTA, ATCTGACCTC );
    @sequences = qw( TTATTATGTT GCTCAGTTCT GACCTCTTAA CTATGCGGTA ATCTGACCTC );

48 48 Quoted words
    #!/usr/bin/perl -w
    # The 'quoted_words' program
    @list_of_sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' );
    @list_of_sequences = qw/TTATTATGTT GCTCAGTTCT GACCTCTTAA/;
    @list_of_sequences = qw{TTATTATGTT GCTCAGTTCT GACCTCTTAA};
    @list_of_sequences = qw!TTATTATGTT GCTCAGTTCT GACCTCTTAA!;
    @list_of_sequences = qw[TTATTATGTT GCTCAGTTCT GACCTCTTAA];
    @list_of_sequences = qw<TTATTATGTT GCTCAGTTCT GACCTCTTAA>;
    @list_of_sequences = qw#TTATTATGTT GCTCAGTTCT GACCTCTTAA#;
    print "@list_of_sequences\n";
    print "The array size is: ", $#list_of_sequences+1, ".\n";

49 49 pop/push/shift/unshift
    #!/usr/bin/perl -w
    # The 'array_operator' program
    @array = 5..9;
    print "array = [@array]\n";
    $item = pop @array;         # remove the last element
    print "item = [$item]\n";
    print "array = [@array]\n";
    push @array, 9;             # append to the end
    print "array = [@array]\n";
    $item = shift @array;       # remove the first element
    print "item = [$item]\n";
    print "array = [@array]\n";
    unshift @array, 1..5;       # prepend to the front
    print "array = [@array]\n";

50 50 pop/push/shift/unshift
    array = [5 6 7 8 9]
    ==========pop==========
    item = [9]
    array = [5 6 7 8]
    ==========push 9==========
    array = [5 6 7 8 9]
    ==========shift==========
    item = [5]
    array = [6 7 8 9]
    ==========unshift 1..5==========
    array = [1 2 3 4 5 6 7 8 9]

51 51 reverse / sort
    #!/usr/bin/perl -w
    # The 'array_operator1' program
    @array = qw/5 4 9 8 1 3 6 2 7 10/;
    print "array = [@array]\n";
    @array_reverse = reverse @array;
    print "reverse array = [@array_reverse]\n";
    @array_sorted = sort @array;
    print "sort array = [@array_sorted]\n";
    @array_reversesorted = reverse sort @array;
    print "reverse sort array = [@array_reversesorted]\n";
    @array_sortedreverse = sort reverse @array;
    print "sort reverse array = [@array_sortedreverse]\n";

52 52 reverse / sort
    array = [5 4 9 8 1 3 6 2 7 10]
    ========================================
    reverse array = [10 7 2 6 3 1 8 9 4 5]
    ========================================
    sort array = [1 10 2 3 4 5 6 7 8 9]
    ========================================
    reverse sort array = [9 8 7 6 5 4 3 2 10 1]
    ========================================
    sort reverse array = [1 10 2 3 4 5 6 7 8 9]
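In the results above, 10 lands between 1 and 2 because sort compares its items as strings by default. A minimal sketch of a numeric sort, using a comparison block with <=> (the array names are illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @array = qw/5 4 9 8 1 3 6 2 7 10/;

# $a and $b are the two items sort is currently comparing.
my @ascending  = sort { $a <=> $b } @array;    # smallest first
my @descending = sort { $b <=> $a } @array;    # largest first

print "ascending  = [@ascending]\n";
print "descending = [@descending]\n";
```

Swapping $a and $b reverses the order without a separate call to reverse.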

53 53 split/join
    #!/usr/bin/perl -w
    # The 'array_operator2' program - join / split
    $string = "5 4 9 8 1 3 6 2 7 10";
    @array = split / /, $string;
    print "array = [@array]\n";
    $string = join ",", @array;
    print "array = [$string]\n";
Output:
    array = [5 4 9 8 1 3 6 2 7 10]
    array = [5,4,9,8,1,3,6,2,7,10]

54 54 How to map between IP and domain name?
IP               Domain name
140.112.28.186   gene.csie.ntu.edu.tw
140.112.28.191   biominer.csie.ntu.edu.tw
140.112.28.190   knn.csie.ntu.edu.tw

55 55 Use two arrays to map between IP and domain name?
Index  @IP              @Domain_name
[0]    140.112.28.186   gene.csie.ntu.edu.tw
[1]    140.112.28.191   biominer.csie.ntu.edu.tw
[2]    140.112.28.190   knn.csie.ntu.edu.tw

56 56 How to search for a certain IP or domain name?
Index  @IP              @Domain_name
[0]    140.112.28.186   gene.csie.ntu.edu.tw
[1]    140.112.28.191   biominer.csie.ntu.edu.tw
[2]    140.112.28.190   knn.csie.ntu.edu.tw

57 57 Why Hash?
%Domain_name
Key                Value
[140.112.28.186]   gene.csie.ntu.edu.tw
[140.112.28.191]   biominer.csie.ntu.edu.tw
[140.112.28.190]   knn.csie.ntu.edu.tw

58 58 How to get a certain domain name?
%Domain_name
Key                Value
[140.112.28.186]   gene.csie.ntu.edu.tw
[140.112.28.191]   biominer.csie.ntu.edu.tw
[140.112.28.190]   knn.csie.ntu.edu.tw
    $Domain_name{"140.112.28.186"}
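The contrast between the two approaches can be sketched as follows. With two parallel arrays the program has to scan for the matching index; with a hash the IP is the key and the lookup is direct. The sub name lookup_by_scan and the lowercase variable names are hypothetical, chosen only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @ip     = qw( 140.112.28.186 140.112.28.191 140.112.28.190 );
my @domain = qw( gene.csie.ntu.edu.tw biominer.csie.ntu.edu.tw knn.csie.ntu.edu.tw );

# Two parallel arrays: walk the indices until the IP matches.
sub lookup_by_scan {
    my ($wanted) = @_;
    for my $i ( 0 .. $#ip ) {
        return $domain[$i] if $ip[$i] eq $wanted;
    }
    return;    # nothing found
}

# Hash: pair every IP with its domain name once, then look up by key.
my %domain_name;
@domain_name{@ip} = @domain;

print lookup_by_scan('140.112.28.191'), "\n";
print $domain_name{'140.112.28.191'}, "\n";
```

Both lines print the same domain name; the hash version needs no loop at lookup time.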

59 59 Examples of Hash

60 60 Hashes: Associating Data With Words
    %nucleotide_bases
    %nucleotide_bases = ( A, Adenine, T, Thymine );
    %nucleotide_bases = ( A => Adenine, T => Thymine );    # key => value

61 61 Working with hash entries
    print "The expanded name for 'A' is $nucleotide_bases{ 'A' }\n";
Output:
    The expanded name for 'A' is Adenine

62 62 How big is the hash?
    %nucleotide_bases = ( A, Adenine, T, Thymine );
    @hash_names = keys %nucleotide_bases;
    print "The names in the %nucleotide_bases hash are: @hash_names\n";
Output:
    The names in the %nucleotide_bases hash are: A T
    %nucleotide_bases = ( A, Adenine, T, Thymine );
    $hash_size = keys %nucleotide_bases;
    print "The size of the %nucleotide_bases hash is: $hash_size\n";
Output:
    The size of the %nucleotide_bases hash is: 2

63 63 Adding entries to a hash
    $nucleotide_bases{ 'G' } = 'Guanine';
    $nucleotide_bases{ 'C' } = 'Cytosine';
    %nucleotide_bases = ( A => Adenine, T => Thymine,
                          G => Guanine, C => Cytosine );

64 64 The Grown %nucleotide_bases Hash

65 65 Removing entries from a hash
    delete $nucleotide_bases{ 'C' };      # removes the key and its value
    $nucleotide_bases{ 'C' } = undef;     # keeps the key; only the value becomes undefined
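The two statements are not equivalent, which can be checked with exists and defined. The sketch below applies delete to one key and an undef assignment to another so that both effects are visible side by side; the result variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %nucleotide_bases = ( A => 'Adenine', T => 'Thymine',
                         G => 'Guanine', C => 'Cytosine' );

delete $nucleotide_bases{'C'};      # the entry is gone
$nucleotide_bases{'G'} = undef;     # the entry stays, without a value

my $c_exists  = exists  $nucleotide_bases{'C'} ? 1 : 0;
my $g_exists  = exists  $nucleotide_bases{'G'} ? 1 : 0;
my $g_defined = defined $nucleotide_bases{'G'} ? 1 : 0;
my $size      = keys %nucleotide_bases;    # number of keys left

print "C exists: $c_exists\n";
print "G exists: $g_exists, G defined: $g_defined\n";
print "hash size: $size\n";
```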

66 66 Slicing hashes
    #! /usr/bin/perl -w
    # The 'slicing_hashes' program - extract a subset of the values in a hash
    %gene_counts = ( Human                  => 31000,
                     'Thale cress'          => 26000,
                     'Nematode worm'        => 18000,
                     'Fruit fly'            => 13000,
                     Yeast                  => 6000,
                     'Tuberculosis microbe' => 4000 );
    @counts = @gene_counts{ Human, "Fruit fly", 'Tuberculosis microbe' };
    print "@counts\n";
Output:
    31000 13000 4000

67 67 Working with hash entries: a complete example
    #! /usr/bin/perl -w
    # The 'bases' program - a hash of the nucleotide bases.
    %nucleotide_bases = ( A => Adenine, T => Thymine,
                          G => Guanine, C => Cytosine );
    $sequence = 'CTATGCGGTA';
    print "\nThe sequence is $sequence, which expands to:\n\n";
    while ( $sequence =~ /(.)/g )    # match one character at a time
    {
        print "\t$nucleotide_bases{ $1 }\n";
    }

68 68 Results from bases...
    The sequence is CTATGCGGTA, which expands to:
        Cytosine
        Thymine
        Adenine
        Thymine
        Guanine
        Cytosine
        Guanine
        Guanine
        Thymine
        Adenine

69 69 Processing every entry in a hash
    #! /usr/bin/perl -w
    # The 'genes' program - a hash of gene counts.
    use constant LINE_LENGTH => 60;
    %gene_counts = ( Human                  => 31000,
                     'Thale cress'          => 26000,
                     'Nematode worm'        => 18000,
                     'Fruit fly'            => 13000,
                     Yeast                  => 6000,
                     'Tuberculosis microbe' => 4000 );

70 70 The genes program, cont.
    print '-' x LINE_LENGTH, "\n";
    while ( ( $genome, $count ) = each %gene_counts )
    {
        print "`$genome' has a gene count of $count\n";
    }
    print '-' x LINE_LENGTH, "\n";
    foreach $genome ( sort keys %gene_counts )
    {
        print "`$genome' has a gene count of $gene_counts{ $genome }\n";
    }
    print '-' x LINE_LENGTH, "\n";

71 71 Results from genes...
    ------------------------------------------------------------
    `Human' has a gene count of 31000
    `Tuberculosis microbe' has a gene count of 4000
    `Fruit fly' has a gene count of 13000
    `Nematode worm' has a gene count of 18000
    `Yeast' has a gene count of 6000
    `Thale cress' has a gene count of 26000
    ------------------------------------------------------------
    `Fruit fly' has a gene count of 13000
    `Human' has a gene count of 31000
    `Nematode worm' has a gene count of 18000
    `Thale cress' has a gene count of 26000
    `Tuberculosis microbe' has a gene count of 4000
    `Yeast' has a gene count of 6000
    ------------------------------------------------------------

72 72 How to sort by the values ?
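One possible answer, offered as a sketch rather than the course's official solution: sort the keys, but compare the values they map to inside the sort block. The array name @by_count is illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %gene_counts = ( Human                  => 31000,
                    'Thale cress'          => 26000,
                    'Nematode worm'        => 18000,
                    'Fruit fly'            => 13000,
                    Yeast                  => 6000,
                    'Tuberculosis microbe' => 4000 );

# Order the keys by their values, largest count first.
my @by_count = sort { $gene_counts{$b} <=> $gene_counts{$a} } keys %gene_counts;

foreach my $genome (@by_count) {
    print "$genome has a gene count of $gene_counts{$genome}\n";
}
```

Using $a before $b in the block would list the smallest count first instead.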

73 73 Exercise Protein sequences

74 74 FASTA format
    >P53_HUMAN (P04637) Cellular tumor antigen p53 (Tumor suppressor p53) (Phosphoprotein p53) (Antigen NY-CO-13) - Homo sapiens (Human).
    MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGP
    DEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAK
    SVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHE
    RCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNS
    SCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKGEPHHELP
    PGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMFRELNEALELKDAQAGKEPG
    GSRAHSSHLKSKKGQSTSRHKKLMFKTEGPDSD

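75 75 Read a FASTA file
    #!/usr/bin/perl -w
    my ( $line, $queryname, $queryseq );
    $queryseq = '';                        # avoids an "uninitialized value" warning
    while ( $line = <> ) {
        if ( $line =~ />(.+?)\s.+/ ) {     # header line: keep the first word as the name
            $queryname = $1;
        }
        else {                             # sequence line: append it
            chomp $line;
            $queryseq = $queryseq . $line;
        }
    }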
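The program above keeps only one name and one sequence. As a hedged sketch of how the same loop could handle several records, the version below stores each sequence in a hash keyed by the name on its header line. The sub name read_fasta, the hash name %sequence_of, and the two short records are made up for this illustration; the records are read from an in-memory string so the example runs without a data file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two made-up records stand in for a real FASTA file.
my $fasta_text = <<'END_FASTA';
>seq1 first example protein
MEEPQSDPSV
EPPLSQETFS
>seq2 second example protein
MKTAYIAK
END_FASTA

sub read_fasta {
    my ($fh) = @_;
    my ( %sequence_of, $name );
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^>(\S+)/ ) {    # header line: start a new record
            $name = $1;
            $sequence_of{$name} = '';
        }
        elsif ( defined $name ) {      # sequence line: append to the current record
            $sequence_of{$name} .= $line;
        }
    }
    return %sequence_of;
}

# Open the string as if it were a file.
open my $fh, '<', \$fasta_text or die "Cannot open in-memory FASTA: $!";
my %sequence_of = read_fasta($fh);
close $fh;

foreach my $name ( sort keys %sequence_of ) {
    printf "%s\t%d residues\n", $name, length $sequence_of{$name};
}
```

To read a real file, the open line would take a file name instead of a reference to a string.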

76 76 Exercise
Read more than one sequence
- Store the protein names and sequences from disorder.fa in two arrays
- Show all of the protein names and sequences
- Show the number of proteins and residues ($len = length $seq;)

77 77 Exercise
Read more than one sequence
- Store the protein names and sequences from disorder.fa in a hash
- Show the protein names and sequences, sorted by protein name
- Find the longest sequence

