Published by Kristin Johnston. Modified over 9 years ago.
1 96-Summer Bioinformatics Programming Practicum (II): Bioinformatics with Perl
- 8/13~8/22 蘇中才
- 8/24~8/29 張天豪
- 8/31 曾宇鳳
2 Schedule
Date      Time          Subject                                             Speaker
8/13 Mon  13:30~17:30   Perl Basics                                         蘇中才
8/15 Wed  13:30~17:30   Programming Basics                                  蘇中才
8/17 Fri  13:30~17:30   Regular expression                                  蘇中才
8/20 Mon  13:30~17:30   Retrieving Data from Protein Sequence Database      蘇中才
8/22 Wed  13:30~17:30   Perl combines with Genbank, BLAST                   蘇中才
8/24 Fri  13:30~17:30   PDB database and structure files                    張天豪
8/27 Mon  8:30~12:30    Extracting ATOM information                         張天豪
8/27 Mon  13:30~17:30   Mapping of Protein Sequence IDs and Structure IDs   張天豪
8/31 Fri  13:30~17:30   Final and Examination                               曾宇鳳
3 Reference Books
- Learning Perl (Perl 學習手冊)
- Beginning Perl for Bioinformatics
- Bioinformatics, Biocomputing and Perl: An Introduction to Bioinformatics Computing Skills and Practice
5 Learning Perl
6 Perl
- Practical Extraction and Report Language
- Created by Larry Wall in the mid-1980s
- Suitable for "quick-and-dirty" programs
- Suitable for string handling
- Powerful regular expressions
7 Preparation
- Download putty.exe / pietty.exe
- Get the materials for this course: http://gene.csie.ntu.edu.tw/~sbb/summer-course/
- Server: ssh 140.112.28.186
- Id: course1 ~ course20
- Password:
8 Installing Perl on Windows
- Download the package from http://www.activestate.com/
  http://downloads.activestate.com/ActivePerl/Windows/5.8/ActivePerl-5.8.8.820-MSWin32-x86-274739.msi
- Versions of Perl: Unix, Linux, Windows (ActivePerl), Mac (MacPerl)
  http://www.perl.com/
9 Text Editors
A convenient (text) editor for programming:
- Ultraedit: good for me
- Notepad: just an editor
- Vim: for UNIX/Linux lovers
  http://lpi.indicator-online.net/vim.html
  http://homepage.ttu.edu.tw/u9106240/page_main/vim_menu.html
- Joe: easy to use for Unix beginners
10 Finding Help
- Best resource-finding tool: on-line resources, e.g. http://www.perl.com/ http://www.perl.org/ http://www.cpan.org/
- HTML Help in ActivePerl
- Command line (highly recommended):
    perldoc -f <function>    # search function
    perldoc -q <keyword>     # search FAQ
    perldoc <module>         # search module
    perldoc perldoc
11 Perl Basic Starting
12 Program: run thyself!
    $ vi welcome
    #! /usr/bin/perl -w
    print "Hello, world\n";

    $ chmod +x welcome
    $ ./welcome
    Hello, world
    $ perl welcome
    Hello, world

    [sbb@gene perl]$ ls -al
    -rw-rw-r-- 1 sbb sbb 20 Jul 2 15:27 welcome
    [sbb@gene perl]$ chmod +x welcome
    [sbb@gene perl]$ ls -al
    -rwxrwxr-x 1 sbb sbb 20 Jul 2 15:27 welcome
13 Using the Perl while construct
    #! /usr/bin/perl -w
    # The 'forever' program - a (Perl) program,
    # which does not stop until someone presses Ctrl-C.

    use constant TRUE  => 1;
    use constant FALSE => 0;

    while ( TRUE )
    {
        print "Welcome to the Wonderful World of Bioinformatics!\n";
        sleep 1;
    }
14 Running forever...
    $ chmod +x forever
    $ ./forever
    Welcome to the Wonderful World of Bioinformatics!
(The line is printed once per second until someone presses Ctrl-C.)
15 Perl Basic Variables
16 Variables
- Scalar ($)
  - Number: 1; 1.23; 12e34
  - String: "abc"; 'ABC'; "Hello, world!"
- Array / List (@)
- Hash (%)
17 Introducing variable containers
The simplest type of variable container is the scalar. In Perl, a scalar can hold, for example, a number, a word, a sentence or a disk-file.
    $name
    $_address
    $programming_101
    $z
    $abc
    $swissprot_to_interpro_mapping
    $SwissProt2InterProMapping
Variable naming is an art!
18 scalar
    #!/usr/bin/perl -w
    # Convention: lower case for user-defined names; upper case for system defaults.
    my $ARGV   = "example.pl";
    my $number = 1.2;
    my $string = "Hello, world!";
    # my $123 = 123;    # error: names starting with a digit cannot be declared with my
    my $abc    = "123";
    my $_123   = '123';
    my $O000OoO00 = 1;
    my $OO00Oo000 = 2;
    my $OO00OoOOO = 3;
    $abc = $O000OoO00 * $OO00Oo000 - $OO00OoOOO;    # 1 * 2 - 3 = -1
    print $abc x 4 . "\n";    # string repetition: -1-1-1-1
    print 5 x 4 . "\n";       # string repetition: 5555
    print 5 * 4 . "\n";       # multiplication: 20
19 Number
Format (range: 1e-100 ~ 1e100 ?)
- 2000
- 1.25
- -6.5e45 (-6.5*10^45)
- 123456789
- 123_456_789
Other formats
- 0377          # octal (decimal 255)
- 0xFF          # hexadecimal
- 0b11111111    # binary
20 number
    $integer = 12;
    $real    = 12.34;
    $oct     = 0377;
    $bin     = 0b11111111;
    $hex     = 0xff;
    $long    = 123456789;
    $long_   = 123_456_789;
    $large   = 1E100;     #1E200
    $small   = 1E-100;    #1E-200
    print "integer : $integer\n";
    print "real : $real\n";
    print "oct=$oct bin=$bin hex=$hex\n";
    #printf("oct=0%o bin=0b%b hex=0x%x\n",$oct,$bin,$hex);
21 Parameters of printf (ref: number)
Specifier   Output                                                          Example
c           Character                                                       a
d or i      Signed decimal integer                                          392
e           Scientific notation (mantissa/exponent) using e character       3.9265e+2
E           Scientific notation (mantissa/exponent) using E character       3.9265E+2
f           Decimal floating point                                          392.65
g           Use the shorter of %e or %f                                     392.65
G           Use the shorter of %E or %f                                     392.65
o           Signed octal                                                    610
s           String of characters                                            sample
u           Unsigned decimal integer                                        7235
x           Unsigned hexadecimal integer                                    7fa
X           Unsigned hexadecimal integer (capital letters)                  7FA
p           Pointer address                                                 B800:0000
n           Nothing printed. The argument must be a pointer to a signed int, where the number of characters written so far is stored.
%           A % followed by another % character will write % to stdout.
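A few of these specifiers can be tried with `sprintf`, which formats exactly like `printf` but returns the string instead of printing it. This is only a small sketch; the variable names are made up for illustration, and `%b` (binary) is a Perl addition that is not in the table above.

```perl
use strict;
use warnings;

my $value = 392.65;
my $n     = 255;

my $decimal    = sprintf "%d",     $value;   # integer part only: 392
my $fixed      = sprintf "%.2f",   $value;   # two digits after the point: 392.65
my $scientific = sprintf "%.4e",   $value;   # scientific notation with 4 decimals
my $octal      = sprintf "0%o",    $n;       # 0377
my $hex        = sprintf "0x%x",   $n;       # 0xff
my $binary     = sprintf "0b%b",   $n;       # 0b11111111
my $padded     = sprintf "%-10s|", "Perl";   # left-justified in 10 columns

print "$decimal $fixed $scientific $octal $hex $binary $padded\n";
```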
22 operator
    2 + 3         # 5
    5.1 - 2.4     # 2.7
    3 * 12        # 36
    14 / 2        # 7
    10.2 / 0.3    # 34
    10 / 3        # 3.333...
    10 % 3        # 1
23 Operators
Operator   Function
+          Addition
-          Subtraction, Negative Numbers, Unary Negation
*          Multiplication
/          Division
%          Modulus
**         Exponent

Operator   Function
=          Normal Assignment
+=         Add and Assign
-=         Subtract and Assign
*=         Multiply and Assign
/=         Divide and Assign
%=         Modulus and Assign
**=        Exponent and Assign

    $number = $number + 100;    # same effect as:
    $number += 100;
24 Take a break ...
- modulus: 10.5 % 3.2 = ?
- exponentiation: 2^3 = ?
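Both questions can be checked by running them. The sketch below assumes the point of the slide is that Perl's `%` works on integers (operands are truncated first) and that `^` is bitwise XOR, not exponentiation; `fmod` from the core POSIX module is shown as one way to get a floating-point remainder.

```perl
use strict;
use warnings;
use POSIX qw(fmod);

my $int_mod   = 10.5 % 3.2;         # operands truncated to 10 and 3, so this is 1
my $float_mod = fmod(10.5, 3.2);    # floating-point remainder, roughly 0.9
my $power     = 2 ** 3;             # exponentiation: 8
my $xor       = 2 ^ 3;              # bitwise XOR of 10 and 11 (binary): 1

print  "10.5 % 3.2      = $int_mod\n";
printf "fmod(10.5, 3.2) = %.1f\n", $float_mod;
print  "2 ** 3          = $power\n";
print  "2 ^ 3           = $xor\n";
```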
25 string
Format
- Single quotes: 'hello'   'hello\nhello'   'hello,$name'
- Double quotes: "hello"   "hello\nhello"   "hello,$name"
- Exceptions: '\'\\'   "\"\\"
    #!/usr/bin/perl -w
    print 'hello';
    print "hello";
26 Backslash escapes
Escape sequence   Description or character
\b                Backspace
\e                Escape
\f                Form feed
\n                New line
\r                Carriage return
\t                Tab
\v                Vertical tab (as listed on the slide; Perl string literals do not recognize \v, use \x0B)
\$                Dollar sign
\@                At sign
\0nnn             Any octal byte
\xnn              Any hexadecimal byte
\cn               Any control character
\l                Change the next character to lowercase
\u                Change the next character to uppercase
\\                Backslash
27 Conversion between string and number
    $answer = "Hello " . " " . " world\n";
    $answer = "12" . "3";
    $answer = "12" * "3";
    $answer = "12Hello34" * "3";    # warning !!!
    $answer = "A" . 3*5;
    $answer = "A" x (3*5);
    $answer = "12" x "3";
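The results of these expressions are easy to misjudge, so here is a small sketch that stores each one in its own (made-up) variable. The `no warnings 'numeric'` block is only there to keep the demonstration quiet; in real code the warning is worth keeping.

```perl
use strict;
use warnings;

my $concat  = "12" . "3";       # concatenation: "123"
my $product = "12" * "3";       # numeric context: 36
my $repeat  = "12" x "3";       # repetition: "121212"
my $mixed   = "A" . 3 * 5;      # '*' binds tighter than '.': "A15"
my $run     = "A" x (3 * 5);    # fifteen copies of "A"

my $partial;
{
    no warnings 'numeric';           # silence "isn't numeric" for this demo only
    $partial = "12Hello34" * "3";    # only the leading "12" is used: 36
}

print "$concat $product $repeat $mixed $run $partial\n";
```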
28 Variable containers and loops
    #! /usr/bin/perl -w
    # The 'tentimes' program - a (Perl) program,
    # which stops after ten iterations.

    use constant HOWMANY => 10;

    $count = 0;
    while ( $count < HOWMANY )
    {
        print "Welcome to the Wonderful World of Bioinformatics!\n";
        $count++;
    }
29 Running tentimes...
    $ chmod +x tentimes
    $ ./tentimes
    Welcome to the Wonderful World of Bioinformatics!
(The line is printed ten times.)
30 Using the Perl if construct
    #! /usr/bin/perl -w
    # The 'fivetimes' program - a (Perl) program,
    # which stops after five iterations.

    use constant TRUE    => 1;
    use constant FALSE   => 0;
    use constant HOWMANY => 5;

    $count = 0;
    while ( TRUE )
    {
        $count++;
        print "Welcome to the Wonderful World of Bioinformatics!\n";
        if ( $count == HOWMANY )
        {
            last;    # leave the loop after the fifth line
        }
    }
31 The oddeven program
    #! /usr/bin/perl -w
    # The 'oddeven' program.

    use constant HOWMANY => 4;

    $count = 0;
    while ( $count < HOWMANY )
    {
        $count++;
        if ( $count % 2 == 0 )
        {
            print "$count : even\n";
        }
        else    # $count % 2 is not zero.
        {
            print "$count : odd\n";
        }
    }
32 Comparison operator
Comparison              Number   String
Equal                   ==       eq
Not equal               !=       ne
Less than               <        lt
Greater than            >        gt
Less than or equal      <=       le
Greater than or equal   >=       ge
Comparison              <=>      cmp
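Choosing the wrong column of this table is a common source of bugs, because the same two values can compare differently as numbers and as strings. A short sketch (variable names are illustrative only):

```perl
use strict;
use warnings;

my $numeric_equal = ( "1.0" == 1 )   ? 1 : 0;    # compared as numbers: equal
my $string_equal  = ( "1.0" eq "1" ) ? 1 : 0;    # compared as strings: different
my $numeric_less  = ( 10 < 9 )       ? 1 : 0;    # 10 is not less than 9
my $string_less   = ( "10" lt "9" )  ? 1 : 0;    # "1" sorts before "9"

my $cmp_numbers = 10 <=> 9;        # 1  (left side is larger)
my $cmp_strings = "10" cmp "9";    # -1 (left side sorts first)

print "$numeric_equal $string_equal $numeric_less $string_less\n";
print "$cmp_numbers $cmp_strings\n";
```

The three-way operators `<=>` and `cmp` return -1, 0 or 1, which is what `sort` expects from a comparison block.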
33 Variable Interpolation
    #! /usr/bin/perl -w
    # The 'interpolation' program, which interpolates variables into strings.
    $language = "Perl";
    $string = "I love $language";
    print $string . "\n";
    $string = 'I love $language';
    print $string . "\n";
    $string = 'I love ' . $language;
    print $string . "\n";
    $string = "I love \$language";
    print $string . "\n";
    $string = "I love $languages";    # looks up $languages, not $language; write ${language}s
    print $string . "\n";
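The four quoting cases can be compared side by side. This is a minimal sketch with made-up variable names; it uses `${language}s` for the plural so that it runs cleanly under `use strict`.

```perl
use strict;
use warnings;

my $language = "Perl";

my $double  = "I love $language";       # interpolated: I love Perl
my $single  = 'I love $language';       # literal text, no interpolation
my $escaped = "I love \$language";      # backslash keeps the dollar sign literal
my $plural  = "I love ${language}s";    # braces mark where the name ends

print "$double\n$single\n$escaped\n$plural\n";
```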
34 Arrays: Associating Data With Numbers
    @list_of_sequences
    @totals
    @protein_structures

    ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' )

    @list_of_sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' );
35 The @list_of_sequences Array
36 Working with array elements
    print "$list_of_sequences[1]\n";
Output:
    GCTCAGTTCT

    $list_of_sequences[1] = 'CTATGCGGTA';
    $list_of_sequences[3] = 'GGTCCATGAA';
37 The Grown @list_of_sequences Array
38 How big is the array?
    print "The array size is: ", $#list_of_sequences+1, ".\n";
    print "The array size is: ", scalar @list_of_sequences, ".\n";
Output:
    The array size is: 4.
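The relationship between the last index and the size can be seen in a few lines. The sketch reuses the array from the earlier slides; the variable names `$last_index`, `$size` and `$last_item` are made up here.

```perl
use strict;
use warnings;

my @list_of_sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' );
$list_of_sequences[3] = 'GGTCCATGAA';          # assigning past the end grows the array

my $last_index = $#list_of_sequences;          # index of the last element: 3
my $size       = scalar @list_of_sequences;    # number of elements: 4
my $last_item  = $list_of_sequences[-1];       # a negative index counts from the end

print "last index = $last_index, size = $size, last item = $last_item\n";
```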
39 Adding elements to an array
    @sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' );
    @sequences = ( @sequences, 'CTATGCGGTA' );
    print "@sequences\n";
Output:
    TTATTATGTT GCTCAGTTCT GACCTCTTAA CTATGCGGTA

    @sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' );
    @sequences = ( 'CTATGCGGTA' );
    print "@sequences\n";
Output:
    CTATGCGGTA
40 Adding more elements to an array
    @sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' );
    @sequences = ( @sequences, ( 'CTATGCGGTA', 'CTATTATGTC' ) );
    print "@sequences\n";
Output:
    TTATTATGTT GCTCAGTTCT GACCTCTTAA CTATGCGGTA CTATTATGTC

    @sequence_1 = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' );
    @sequence_2 = ( 'GCTCAGTTCT', 'GACCTCTTAA' );
    @combined_sequences = ( @sequence_1, @sequence_2 );
    print "@combined_sequences\n";
Output:
    TTATTATGTT GCTCAGTTCT GACCTCTTAA GCTCAGTTCT GACCTCTTAA
41 Removing elements from an array
    @sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA', 'TTATTATGTT' );
    @removed_elements = splice @sequences, 1, 2;
    print "@removed_elements\n";
    print "@sequences\n";
Output:
    GCTCAGTTCT GACCTCTTAA
    TTATTATGTT TTATTATGTT

    # clean all elements of an array
    @sequences = ();
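`splice` can also insert and replace elements, not only remove them. The following is a small sketch of the three uses; the replacement strings are placeholders invented for the example.

```perl
use strict;
use warnings;

my @sequences = qw( TTATTATGTT GCTCAGTTCT GACCTCTTAA TTATTATGTT );

# Remove 2 elements starting at index 1; splice returns what it removed.
my @removed = splice @sequences, 1, 2;
my $after_remove = "@sequences";     # TTATTATGTT TTATTATGTT

# Insert at index 1 without removing anything (length 0).
splice @sequences, 1, 0, 'CTATGCGGTA';
my $after_insert = "@sequences";     # TTATTATGTT CTATGCGGTA TTATTATGTT

# Replace 1 element at index 0 with two new ones.
splice @sequences, 0, 1, 'AAAAAAAAAA', 'CCCCCCCCCC';
my $after_replace = "@sequences";    # AAAAAAAAAA CCCCCCCCCC CTATGCGGTA TTATTATGTT

print "$after_remove\n$after_insert\n$after_replace\n";
```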
42 The slices program
    #! /usr/bin/perl -w
    # The 'slices' program - slicing arrays.

    @sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA',
                   'CTATGCGGTA', 'ATCTGACCTC' );
    print "@sequences\n\n";

    @seq_slice = @sequences[ 1 .. 3 ];
    print "@seq_slice\n";
    print "@sequences\n\n";

    @removed = splice @sequences, 1, 3;
    print "@sequences\n";
    print "@removed\n";
43 Results from slices...
    TTATTATGTT GCTCAGTTCT GACCTCTTAA CTATGCGGTA ATCTGACCTC

    GCTCAGTTCT GACCTCTTAA CTATGCGGTA
    TTATTATGTT GCTCAGTTCT GACCTCTTAA CTATGCGGTA ATCTGACCTC

    TTATTATGTT ATCTGACCTC
    GCTCAGTTCT GACCTCTTAA CTATGCGGTA
44 Processing every element in an array
    #! /usr/bin/perl -w
    # The 'iterateW' program - iterate over an entire array
    # with 'while'.

    @sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA',
                   'CTATGCGGTA', 'ATCTGACCTC' );

    $index = 0;
    $last_index = $#sequences;
    while ( $index <= $last_index )
    {
        print "$sequences[ $index ]\n";
        ++$index;
    }
45 Results from iterateW...
    TTATTATGTT
    GCTCAGTTCT
    GACCTCTTAA
    CTATGCGGTA
    ATCTGACCTC
46 The iterateF program
    #! /usr/bin/perl -w
    # The 'iterateF' program - iterate over an entire array
    # with 'foreach'.

    @sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA',
                   'CTATGCGGTA', 'ATCTGACCTC' );

    foreach $value ( @sequences )
    {
        print "$value\n";
    }
47 Making lists easier to work with
    @sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA', 'CTATGCGGTA', 'ATCTGACCTC' );
    @sequences = ( TTATTATGTT, GCTCAGTTCT, GACCTCTTAA, CTATGCGGTA, ATCTGACCTC );
    @sequences = qw( TTATTATGTT GCTCAGTTCT GACCTCTTAA CTATGCGGTA ATCTGACCTC );
48 Quoted words
    #!/usr/bin/perl -w
    # The 'quoted_words' program
    @list_of_sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' );
    @list_of_sequences = qw/TTATTATGTT GCTCAGTTCT GACCTCTTAA/;
    @list_of_sequences = qw{TTATTATGTT GCTCAGTTCT GACCTCTTAA};
    @list_of_sequences = qw!TTATTATGTT GCTCAGTTCT GACCTCTTAA!;
    @list_of_sequences = qw[TTATTATGTT GCTCAGTTCT GACCTCTTAA];
    @list_of_sequences = qw<TTATTATGTT GCTCAGTTCT GACCTCTTAA>;
    @list_of_sequences = qw#TTATTATGTT GCTCAGTTCT GACCTCTTAA#;
    print "@list_of_sequences\n";
    print "The array size is: ", $#list_of_sequences+1, ".\n";
49 pop/push/shift/unshift
    #!/usr/bin/perl -w
    # The "array_operator" program
    @array = 5..9;
    print "array = [@array]\n";
    $item = pop @array;
    print "item = [$item]\n";
    print "array = [@array]\n";
    push @array, 9;
    print "array = [@array]\n";
    $item = shift @array;
    print "item = [$item]\n";
    print "array = [@array]\n";
    unshift @array, 1..5;
    print "array = [@array]\n";
50 pop/push/shift/unshift
    array = [5 6 7 8 9]
    ==========pop==========
    item = [9]
    array = [5 6 7 8]
    ==========push 9==========
    array = [5 6 7 8 9]
    ==========shift==========
    item = [5]
    array = [6 7 8 9]
    ==========unshift 1..5==========
    array = [1 2 3 4 5 6 7 8 9]
51 reverse / sort
    #!/usr/bin/perl -w
    # The "array_operator1" program
    @array = qw/5 4 9 8 1 3 6 2 7 10/;
    print "array = [@array]\n";
    @array_reverse = reverse @array;
    print "reverse array = [@array_reverse]\n";
    @array_sorted = sort @array;
    print "sort array = [@array_sorted]\n";
    @array_reversesorted = reverse sort @array;
    print "reverse sort array = [@array_reversesorted]\n";
    @array_sortedreverse = sort reverse @array;
    print "sort reverse array = [@array_sortedreverse]\n";
52 reverse / sort
    array = [5 4 9 8 1 3 6 2 7 10]
    ========================================
    reverse array = [10 7 2 6 3 1 8 9 4 5]
    ========================================
    sort array = [1 10 2 3 4 5 6 7 8 9]
    ========================================
    reverse sort array = [9 8 7 6 5 4 3 2 10 1]
    ========================================
    sort reverse array = [1 10 2 3 4 5 6 7 8 9]
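In the output above, 10 lands between 1 and 2 because `sort` compares elements as strings by default. A comparison block with `<=>` sorts them as numbers instead. A minimal sketch (array names other than `@array` are made up):

```perl
use strict;
use warnings;

my @array = qw( 5 4 9 8 1 3 6 2 7 10 );

my @as_strings = sort @array;                  # default string order: 1 10 2 3 ...
my @ascending  = sort { $a <=> $b } @array;    # numeric order: 1 2 3 ... 10
my @descending = sort { $b <=> $a } @array;    # numeric order, largest first

print "string sort     = [@as_strings]\n";
print "numeric sort    = [@ascending]\n";
print "numeric reverse = [@descending]\n";
```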
53 split/join
    #!/usr/bin/perl -w
    # The "array_operator2" program - join / split
    $string = "5 4 9 8 1 3 6 2 7 10";
    @array = split / /, $string;
    print "array = [@array]\n";
    $string = join ",", @array;
    print "array = [$string]\n";
Output:
    array = [5 4 9 8 1 3 6 2 7 10]
    array = [5,4,9,8,1,3,6,2,7,10]
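Splitting on a single space works for tidy input, but real data often has leading or repeated spaces. The sketch below contrasts the pattern `/ /` with the special form `split ' '`, which skips leading whitespace and treats runs of whitespace as one separator. The input string `$messy` is invented for the example.

```perl
use strict;
use warnings;

my $messy = "  5 4   9  8 ";

my @by_space = split / /, $messy;    # one field per single space, empty fields included
my @by_blank = split ' ', $messy;    # whitespace runs collapsed: 5 4 9 8
my $joined   = join ",", @by_blank;  # 5,4,9,8

# An optional third argument limits the number of fields.
my @first_two = split / /, "5 4 9 8 1", 2;    # ("5", "4 9 8 1")

print scalar(@by_space), " fields with / /, ", scalar(@by_blank), " with ' '\n";
print "$joined\n";
```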
54 How to map between IP and domain name?
IP               Domain name
140.112.28.186   gene.csie.ntu.edu.tw
140.112.28.191   biominer.csie.ntu.edu.tw
140.112.28.190   knn.csie.ntu.edu.tw
55 Use 2 arrays to map between IP and domain name?
Index   @IP              @Domain_name
[0]     140.112.28.186   gene.csie.ntu.edu.tw
[1]     140.112.28.191   biominer.csie.ntu.edu.tw
[2]     140.112.28.190   knn.csie.ntu.edu.tw
56 How to search for a certain IP or domain name?
Index   @IP              @Domain_name
[0]     140.112.28.186   gene.csie.ntu.edu.tw
[1]     140.112.28.191   biominer.csie.ntu.edu.tw
[2]     140.112.28.190   knn.csie.ntu.edu.tw
57 Why Hash?
%Domain_name
Key                Value
[140.112.28.186]   gene.csie.ntu.edu.tw
[140.112.28.191]   biominer.csie.ntu.edu.tw
[140.112.28.190]   knn.csie.ntu.edu.tw
58 How to get a certain domain name?
%Domain_name
Key                Value
[140.112.28.186]   gene.csie.ntu.edu.tw
[140.112.28.191]   biominer.csie.ntu.edu.tw
[140.112.28.190]   knn.csie.ntu.edu.tw

    $Domain_name{"140.112.28.186"}
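The table above translates directly into a hash, which answers the lookup question in one step instead of a search through two parallel arrays. The sketch below also builds the opposite mapping with `reverse`, which is safe here only because every domain name is unique. The names `%ip_of`, `$name`, `$ip` and `$known` are made up for the example.

```perl
use strict;
use warnings;

my %Domain_name = (
    '140.112.28.186' => 'gene.csie.ntu.edu.tw',
    '140.112.28.191' => 'biominer.csie.ntu.edu.tw',
    '140.112.28.190' => 'knn.csie.ntu.edu.tw',
);

# IP -> domain name: a direct lookup by key.
my $name = $Domain_name{'140.112.28.186'};

# Domain name -> IP: swap keys and values (assumes the values are unique).
my %ip_of = reverse %Domain_name;
my $ip    = $ip_of{'knn.csie.ntu.edu.tw'};

# exists tells whether a key is present at all.
my $known = exists $Domain_name{'140.112.28.1'} ? 'yes' : 'no';

print "$name\n$ip\nknown: $known\n";
```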
59 Examples of Hash
60 Hashes: Associating Data With Words
    %nucleotide_bases
    %nucleotide_bases = ( 'A', 'Adenine', 'T', 'Thymine' );
    %nucleotide_bases = ( A => 'Adenine', T => 'Thymine' );    # key => value
61 Working with hash entries
    print "The expanded name for 'A' is $nucleotide_bases{ 'A' }\n";
Output:
    The expanded name for 'A' is Adenine
62 How big is the hash?
    %nucleotide_bases = ( 'A', 'Adenine', 'T', 'Thymine' );
    @hash_names = keys %nucleotide_bases;
    print "The names in the %nucleotide_bases hash are: @hash_names\n";
Output:
    The names in the %nucleotide_bases hash are: A T

    %nucleotide_bases = ( 'A', 'Adenine', 'T', 'Thymine' );
    $hash_size = keys %nucleotide_bases;
    print "The size of the %nucleotide_bases hash is: $hash_size\n";
Output:
    The size of the %nucleotide_bases hash is: 2
63 Adding entries to a hash
    $nucleotide_bases{ 'G' } = 'Guanine';
    $nucleotide_bases{ 'C' } = 'Cytosine';

    %nucleotide_bases = ( A => 'Adenine', T => 'Thymine',
                          G => 'Guanine', C => 'Cytosine' );
64 The Grown %nucleotide_bases Hash
65 Removing entries from a hash
    delete $nucleotide_bases{ 'C' };
    $nucleotide_bases{ 'C' } = undef;
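The two statements are not equivalent: assigning `undef` keeps the key in the hash with an undefined value, while `delete` removes the entry altogether. A short sketch that records the difference (the result variable names are made up):

```perl
use strict;
use warnings;

my %nucleotide_bases = ( A => 'Adenine', T => 'Thymine',
                         G => 'Guanine', C => 'Cytosine' );

# Assigning undef: the key stays, only the value is gone.
$nucleotide_bases{C} = undef;
my $exists_after_undef  = exists  $nucleotide_bases{C} ? 1 : 0;    # 1
my $defined_after_undef = defined $nucleotide_bases{C} ? 1 : 0;    # 0
my $size_after_undef    = keys %nucleotide_bases;                  # 4

# delete: the whole entry is removed.
delete $nucleotide_bases{C};
my $exists_after_delete = exists $nucleotide_bases{C} ? 1 : 0;     # 0
my $size_after_delete   = keys %nucleotide_bases;                  # 3

print "after undef:  exists=$exists_after_undef size=$size_after_undef\n";
print "after delete: exists=$exists_after_delete size=$size_after_delete\n";
```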
66 Slicing hashes
    #! /usr/bin/perl -w
    # The 'slicing_hashes' program - extract a certain subset of a hash
    %gene_counts = ( Human                  => 31000,
                     'Thale cress'          => 26000,
                     'Nematode worm'        => 18000,
                     'Fruit fly'            => 13000,
                     Yeast                  => 6000,
                     'Tuberculosis microbe' => 4000 );
    @counts = @gene_counts{ 'Human', 'Fruit fly', 'Tuberculosis microbe' };
    print "@counts\n";
Output:
    31000 13000 4000
67 Working with hash entries: a complete example
    #! /usr/bin/perl -w
    # The 'bases' program - a hash of the nucleotide bases.

    %nucleotide_bases = ( A => 'Adenine', T => 'Thymine',
                          G => 'Guanine', C => 'Cytosine' );

    $sequence = 'CTATGCGGTA';
    print "\nThe sequence is $sequence, which expands to:\n\n";
    while ( $sequence =~ /(.)/g )
    {
        print "\t$nucleotide_bases{ $1 }\n";
    }
68 Results from bases...
    The sequence is CTATGCGGTA, which expands to:

        Cytosine
        Thymine
        Adenine
        Thymine
        Guanine
        Cytosine
        Guanine
        Guanine
        Thymine
        Adenine
69 Processing every entry in a hash
    #! /usr/bin/perl -w
    # The 'genes' program - a hash of gene counts.

    use constant LINE_LENGTH => 60;

    %gene_counts = ( Human                  => 31000,
                     'Thale cress'          => 26000,
                     'Nematode worm'        => 18000,
                     'Fruit fly'            => 13000,
                     Yeast                  => 6000,
                     'Tuberculosis microbe' => 4000 );
70 The genes program, cont.
    print '-' x LINE_LENGTH, "\n";
    while ( ( $genome, $count ) = each %gene_counts )
    {
        print "'$genome' has a gene count of $count\n";
    }
    print '-' x LINE_LENGTH, "\n";
    foreach $genome ( sort keys %gene_counts )
    {
        print "'$genome' has a gene count of $gene_counts{ $genome }\n";
    }
    print '-' x LINE_LENGTH, "\n";
71 Results from genes...
    ------------------------------------------------------------
    'Human' has a gene count of 31000
    'Tuberculosis microbe' has a gene count of 4000
    'Fruit fly' has a gene count of 13000
    'Nematode worm' has a gene count of 18000
    'Yeast' has a gene count of 6000
    'Thale cress' has a gene count of 26000
    ------------------------------------------------------------
    'Fruit fly' has a gene count of 13000
    'Human' has a gene count of 31000
    'Nematode worm' has a gene count of 18000
    'Thale cress' has a gene count of 26000
    'Tuberculosis microbe' has a gene count of 4000
    'Yeast' has a gene count of 6000
    ------------------------------------------------------------
(The order of the first block, produced by each, is not predictable and may differ from run to run; the second block is sorted by key.)
72 How to sort by the values?
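One common answer is to sort the keys, but compare the values they point to inside the sort block. The sketch below reuses the gene counts from the genes program; `@by_count` is a made-up name, and the descending order is just one possible choice.

```perl
use strict;
use warnings;

my %gene_counts = ( Human                  => 31000,
                    'Thale cress'          => 26000,
                    'Nematode worm'        => 18000,
                    'Fruit fly'            => 13000,
                    Yeast                  => 6000,
                    'Tuberculosis microbe' => 4000 );

# Sort the keys by comparing their values numerically, largest first.
my @by_count = sort { $gene_counts{$b} <=> $gene_counts{$a} } keys %gene_counts;

foreach my $genome ( @by_count )
{
    printf "%-22s %6d\n", $genome, $gene_counts{$genome};
}
```

Swapping `$a` and `$b` in the block gives ascending order instead.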
73 Exercise: Protein sequences
74 FASTA format
    >P53_HUMAN (P04637) Cellular tumor antigen p53 (Tumor suppressor p53) (Phosphoprotein p53) (Antigen NY-CO-13) - Homo sapiens (Human).
    MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGP
    DEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAK
    SVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHE
    RCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNS
    SCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKGEPHHELP
    PGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMFRELNEALELKDAQAGKEPG
    GSRAHSSHLKSKKGQSTSRHKKLMFKTEGPDSD
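75 Read a FASTA file
    #!/usr/bin/perl -w
    my ( $line, $queryname, $queryseq );
    $queryseq = '';                           # start empty so appending does not warn under -w
    while ( $line = <> )
    {
        if ( $line =~ /^>(.+?)\s.+/ )         # header line: capture the name after '>'
        {
            $queryname = $1;
        }
        else
        {
            chomp $line;
            $queryseq = $queryseq . $line;    # append this line of the sequence
        }
    }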
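The program above keeps a single name and a single sequence. One possible way to extend the same loop to a file with several records is to start a new hash entry at every header line. This is only a sketch under stated assumptions: the sub name `read_fasta`, the hash `%sequence_of` and the two toy records are invented for illustration (they are not real proteins), and the text is read from an in-memory string so the example runs without an input file.

```perl
use strict;
use warnings;

# Read FASTA text from an open filehandle; return a hash of name => sequence.
sub read_fasta {
    my ($fh) = @_;
    my ( %sequence_of, $name );
    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line =~ /^\s*$/;             # skip blank lines
        if ( $line =~ /^>(\S+)/ ) {           # header: name is the first word after '>'
            $name = $1;
            $sequence_of{$name} = '';
        }
        elsif ( defined $name ) {
            $sequence_of{$name} .= $line;     # append wrapped sequence lines
        }
    }
    return %sequence_of;
}

# Toy input, made up for this example.
my $fasta_text = <<'END';
>seq1 first toy record
MEEPQSDPSV
EPPLSQ
>seq2 second toy record
MKT
END

open my $fh, '<', \$fasta_text or die "Cannot open in-memory file: $!";
my %sequence_of = read_fasta($fh);
close $fh;

foreach my $name ( sort keys %sequence_of ) {
    printf "%s\t%d residues\n", $name, length $sequence_of{$name};
}
```

To read a real file, the in-memory handle could be replaced with `open my $fh, '<', $filename`.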
76 Exercise
- Read more than one sequence.
- Store the protein names and sequences from disorder.fa in two arrays.
- Show all of the protein names and sequences.
- Show the number of proteins and residues. ($len = length $seq;)
77 Exercise
- Read more than one sequence.
- Store the protein names and sequences from disorder.fa in a hash.
- Show the protein names and sequences sorted by protein name.
- Find the longest sequence.