96-Summer Bioinformatics Programming Practicum (II): Bioinformatics with Perl
8/13~8/22 蘇中才
8/24~8/29 張天豪
8/31 曾宇鳳
Schedule
Date        Time         Subject                                            Speaker
8/13 (Mon)  13:30~17:30  Perl Basics                                        蘇中才
8/15 (Wed)  13:30~17:30  Programming Basics                                 蘇中才
8/17 (Fri)  13:30~17:30  Regular expression                                 蘇中才
8/20 (Mon)  13:30~17:30  Retrieving Data from Protein Sequence Database     蘇中才
8/22 (Wed)  13:30~17:30  Perl combines with BLAST and ClustalW              蘇中才
8/24 (Fri)  13:30~17:30  PDB database and structure files                   張天豪
8/27 (Mon)  8:30~12:30   Extracting ATOM information                        張天豪
8/27 (Mon)  13:30~17:30  Mapping of Protein Sequence IDs and Structure IDs  張天豪
8/31 (Fri)  13:30~17:30  Final and Examination                              曾宇鳳
Process Management Introduction
system    system("date");
` `       `date`;
exec      exec("date");
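The three forms above differ in what comes back to the script. The following is a minimal sketch, assuming a Unix-like system where `echo` is available; the variable names are illustrative only. `system` returns a status, backticks return the captured output, and `exec` replaces the running process, so it is run here in a forked child.

```perl
#!/usr/bin/perl
use strict;
use warnings;

$| = 1;    # flush our own prints immediately so they interleave with child output

# system: the command prints directly to the terminal; we get its wait status.
my $status = system("echo", "hello from system");
print "system status = $status\n";

# backticks: the command's STDOUT is captured and returned as a string.
my $captured = `echo hello from backticks`;
chomp $captured;
print "captured = [$captured]\n";

# exec: the current process is replaced by the command and never returns
# on success, so we run it in a child process and wait for it.
my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    exec("echo", "hello from exec") or die "exec failed: $!";
}
waitpid($pid, 0);
my $exec_status = $?;
print "child exit code = ", $exec_status >> 8, "\n";
```

Anything written after a successful `exec` in the same process would not run, which is why the child/parent split is used in this sketch.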
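Introduction - system
system("...");
Example
system "date";
system 'ls', "-al", '/home/course1/';
system 'for i in *; do echo == $i ==; cat $i; done';
system "~/course5/output.pl";
Returns 0 on success. The return value is the command's wait status, so the exit code is the return value shifted right by 8 bits.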
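The return value of `system` is also stored in `$?`. A small sketch of how it might be decoded follows; `describe_status` is a hypothetical helper name, and `$^X` (the path of the running perl) is used only as a command that is sure to exist.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a raw wait status into a readable description.
sub describe_status {
    my ($raw) = @_;
    return "could not start command" if $raw == -1;
    return "killed by signal " . ($raw & 127) if $raw & 127;
    return "exit code " . ($raw >> 8);
}

# List form: no shell is involved; each argument is passed to the program as-is.
system($^X, '-e', 'exit 3');
my $list_result = describe_status($?);
print "list form:   $list_result\n";

# Single string containing shell metacharacters: handed to the shell.
system("echo hidden > /dev/null");
my $string_result = describe_status($?);
print "string form: $string_result\n";
```

The list form is usually the safer choice when arguments come from user input, because the shell never sees them.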
Execute 'date' by system
#!/usr/bin/perl -w
# date1.pl : execute a shell command - date
my $ret = system "date";
print "return = $ret\n";
Execute 'date' by system without message
#!/usr/bin/perl -w
# date2.pl : execute a shell command - date
my $ret = system "date > /dev/null";
print "return = $ret\n";
Introduction - ` `
`...`
Example
$now = `date`;
$result = `ls -al /home/course1/`;
$result = `for i in *; do echo == $i ==; cat $i; done`;
`~/course5/output.pl`;
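What backticks return depends on context. The sketch below assumes a Unix-like shell with `echo`; the command string is made up for illustration. In scalar context the whole output comes back as one string, in list context as one element per line.

```perl
use strict;
use warnings;

my $cmd = 'echo one; echo two; echo three';

# Scalar context: one string, newlines included.
my $all = `$cmd`;

# List context: one element per output line, each still ending in "\n".
my @lines = `$cmd`;
chomp @lines;

print "scalar: [$all]\n";
print "list:   ", scalar(@lines), " lines, second is '$lines[1]'\n";
```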
Execute 'date' by ` `
#!/usr/bin/perl -w
# date3.pl : execute a shell command - date
my $ret = `date`;
print "return = [$ret]\n";
Print message to STDOUT and STDERR
#!/usr/bin/perl -w
# output.pl : output a message to STDOUT and STDERR
print STDOUT "print to STDOUT\n";
print STDERR "print to STDERR\n";
Print message to STDOUT and STDERR
[course5]$ ./output.pl
print to STDOUT
print to STDERR
[course5]$ ./output.pl > log
print to STDERR
[course5]$ cat log
print to STDOUT
[course5]$ (./output.pl 2>&1) > log
[course5]$ cat log
print to STDOUT
print to STDERR
Redirect STDOUT and STDERR with system
#!/usr/bin/perl -w
# redirect.pl : STDOUT and STDERR
my $ret = system "./output.pl";
print "redirect nothing ($ret)\n";
$ret = system "./output.pl 1>/dev/null";
print "redirect STDOUT to /dev/null ($ret)\n";
$ret = system "./output.pl 1>/dev/null 2>&1";
print "redirect STDOUT and STDERR to /dev/null ($ret)\n";
Capture STDOUT with ` `
#!/usr/bin/perl -w
# exec_output1.pl : execute output.pl
my $ret = `./output.pl`;
chomp($ret);
print "return = [$ret]\n";
Capture STDOUT and STDERR with ` `
#!/usr/bin/perl -w
# exec_output2.pl : execute output.pl
my $ret = `./output.pl 2>&1`;
chomp($ret);
print "return = [$ret]\n";
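Backticks return the output, but the exit status is still available in `$?`. The following sketch wraps the `2>&1` idiom from exec_output2.pl in a helper; `run_merged` is a hypothetical name, and the test command is made up so the example does not depend on output.pl being present.

```perl
use strict;
use warnings;

# Hypothetical helper: run a shell command with STDERR merged into STDOUT,
# and return both the captured text and the exit code.
sub run_merged {
    my ($cmd) = @_;
    my $out  = `( $cmd ) 2>&1`;
    my $code = $? == -1 ? -1 : $? >> 8;
    return ($out, $code);
}

my ($out, $code) = run_merged('echo to-stdout; echo to-stderr 1>&2; exit 4');
print "exit code = $code\n";
print "captured:\n$out";
```

Merging the streams loses the distinction between normal output and error messages, so this fits best when only a combined log is needed.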
%ENV
Shell command
env
Example
$ENV{'PATH'}
$ENV{'HOME'}
$ENV{'HOSTNAME'}
$ENV{'USER'}
%ENV
#!/usr/bin/perl -w
# env.pl : execute a shell command - env
my @env = `env`;
foreach (@env) {
    chomp;
    if (/PATH/) {
        print "$_\n";
    }
}
print "PATH=$ENV{'PATH'}\n";
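The same information is available from `%ENV` without starting a shell. A short sketch follows, with two assumptions: `env_matching` is a hypothetical helper that matches on variable names only (env.pl matches the whole line), and `COURSE_DEMO` is a made-up variable name. It also shows that values set in `%ENV` are inherited by commands started afterwards.

```perl
use strict;
use warnings;

# Hypothetical helper: "NAME=value" strings for every variable whose name matches.
sub env_matching {
    my ($pattern) = @_;
    return map { "$_=$ENV{$_}" } sort grep { /$pattern/ } keys %ENV;
}

print "$_\n" for env_matching(qr/PATH/);

# A value stored in %ENV is visible to child processes started later.
$ENV{COURSE_DEMO} = 'perl';        # made-up variable name
my $seen = `echo \$COURSE_DEMO`;   # \$ keeps Perl from interpolating; the shell expands it
chomp $seen;
print "child saw COURSE_DEMO=$seen\n";
```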
Process Management Arguments, Here-document
Arguments parsing
use Getopt::Std;
my %opt;
getopts( "hvf:", \%opt ) or usage();
usage() if $opt{h};
usage() if !defined $opt{f};
sub usage {
    print STDERR <<"EOF";
usage: $0 [-hv] [-f file]
 -h      : this (help) message
 -v      : verbose output
 -f file : file containing usernames, one per line
example: $0 -v -f file
EOF
    exit;
}
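To see what `getopts` leaves in the hash, the option string from the slide can be tried on a fixed argument list. This is a sketch only: `parse_args` is a hypothetical wrapper, and `users.txt` and `extra` are made-up arguments.

```perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical wrapper: parse the given arguments with the option string "hvf:".
# Returns a hash reference, or nothing if an unknown option was given.
sub parse_args {
    my @args = @_;
    local @ARGV = @args;          # getopts works on @ARGV
    my %opt;
    getopts('hvf:', \%opt) or return;
    return { %opt, rest => [@ARGV] };   # whatever is left after the options
}

my $parsed = parse_args('-v', '-f', 'users.txt', 'extra');
print "verbose = $parsed->{v}\n";
print "file    = $parsed->{f}\n";
print "rest    = @{ $parsed->{rest} }\n";
```

Flags without a colon in the option string (`h`, `v`) are set to 1 when present; `f:` takes the following argument as its value.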
Here-document
print <<EOF;
print me!!!!
print you!!!!
print us!!!!
EOF
print <<"EOF" x 3;
print me!!!!
EOF
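How the terminator is quoted decides whether variables inside a here-document are interpolated. A minimal sketch, with variable names chosen only for illustration:

```perl
use strict;
use warnings;

my $name = 'Perl';

# Double-quoted terminator (same as a bare one): variables are interpolated.
my $interp = <<"EOF";
Hello, $name!
EOF

# Single-quoted terminator: the text is taken literally.
my $literal = <<'EOF';
Hello, $name!
EOF

# A here-document is an ordinary string, so the repetition operator x applies.
my $repeated = <<"EOF" x 3;
print me!!!!
EOF

print $interp, $literal, $repeated;
```

The terminator line must match exactly, with nothing before or after it on the line.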
Exercise system and ` `
Quiz - system & ` `
Do both of these work?
system 'for i in *; do echo == $i ==; cat $i; done';
$result = `for i in *; do echo == $i ==; cat $i; done`;
Why?
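One way to investigate the question is to look at the command string the shell actually receives. The sketch below uses a shortened, made-up loop over the fixed words `a b` instead of `*`, so the result does not depend on the files in the current directory; the Perl variable `$i` is set on purpose to make any interpolation visible.

```perl
use strict;
use warnings;

my $i = 'PERL_VALUE';   # a Perl variable that shares its name with the shell loop variable

# Single quotes: Perl leaves $i alone, so the shell sees its own variable.
my $single = 'for i in a b; do echo == $i ==; done';

# Double-quote-like context, which is what plain backticks use:
# Perl substitutes its own $i before the shell runs.
my $double = "for i in a b; do echo == $i ==; done";

print "single-quoted:  $single\n";
print "interpolated:   $double\n";

# qx'...' is the backtick operator with interpolation switched off.
my $out = qx'for i in a b; do echo == $i ==; done';
print $out;
```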
Quiz - sleep10.pl
#!/usr/bin/perl -w
# sleep10.pl : sleep 10 seconds
foreach (1..10) {
    print "$_\n";
    sleep 1;
}
Quiz - system & ` `
Do both of these run in the background?
system './sleep10.pl &';
$result = `./sleep10.pl &`;
Why?
Project BLAST, ClustalW
Project1 - BLAST
Todo
Get the result from BLAST
Extract the homologs (E-value <= 10^-1)
Input
A protein sequence (FASTA format)
Output
All homologous sequences of the query sequence
BLAST
Get BLAST packages
ftp://ftp.ncbi.nih.gov/blast/
Get nr database
ftp://ftp.ncbi.nih.gov/blast/db/
Command
~/tools/PSSM/BLAST/blastall -p blastp -i P53_HUMAN.fa -o output.txt -d /home/sbb/tools/PSSM/BLAST/db/SwissProt.v50.fa -m 9
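With `-m 9`, blastall writes tab-separated hit lines preceded by `#` comment lines; the subject ID is the 2nd field and the E-value the 11th. The following is a sketch of the filtering step for Project 1 under that assumption. `filter_hits` is a hypothetical helper, and the sample lines (including the `SUBJ_*` IDs and all numbers) are made up for illustration, not real BLAST output.

```perl
use strict;
use warnings;

# Hypothetical helper: return [subject_id, evalue] pairs whose E-value is
# at most $cutoff, keeping only the first hit for each subject.
sub filter_hits {
    my ($cutoff, @lines) = @_;
    my (@hits, %seen);
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^#/ or $line !~ /\S/;   # skip comments and blank lines
        my @f = split /\t/, $line;
        next unless @f >= 12;                     # expect the 12 tabular fields
        my ($subject, $evalue) = @f[1, 10];
        next if $evalue > $cutoff;
        next if $seen{$subject}++;
        push @hits, [$subject, $evalue];
    }
    return @hits;
}

# Made-up sample lines in the tabular layout.
my @sample = (
    "# BLASTP (sample header)\n",
    join("\t", qw(P53_HUMAN SUBJ_A 100.00 393 0 0 1 393 1 393 0.0 800)) . "\n",
    join("\t", qw(P53_HUMAN SUBJ_B 45.00 200 100 5 10 210 12 215 2e-30 120)) . "\n",
    join("\t", qw(P53_HUMAN SUBJ_C 25.00 60 40 2 5 64 7 70 3.5 30)) . "\n",
);

my @hits = filter_hits(0.1, @sample);
print "$_->[0]\t$_->[1]\n" for @hits;
```

In the project itself, the lines would come from output.txt (or from running the command with backticks) instead of the sample array.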
Project2 - ClustalW
Todo
Do multiple sequence alignment by ClustalW
Input
A protein sequence with its homologs (FASTA format)
Output
The conservation score of each residue in the query sequence
ClustalW
Get ClustalW package
ftp://ftp.ebi.ac.uk/pub/software/unix/clustalw/
Command
./clustalW
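The slides do not define the conservation score for Project 2. As one possible reading, the sketch below scores each query residue by the fraction of the other aligned sequences that have the same residue in that column. This definition, the helper name `conservation_scores`, and the toy alignment are all assumptions for illustration; it is not ClustalW's own scoring, and it expects the aligned sequences to have already been read from the alignment file as equal-length strings.

```perl
use strict;
use warnings;

# Hypothetical helper: for each non-gap residue of the aligned query, return
# [residue, fraction of the other sequences with the same residue in that column].
sub conservation_scores {
    my ($query, @others) = @_;
    my @scores;
    for my $col (0 .. length($query) - 1) {
        my $res = substr($query, $col, 1);
        next if $res eq '-';                          # gap in the query: no residue to score
        my $same = grep { substr($_, $col, 1) eq $res } @others;
        push @scores, [$res, @others ? $same / @others : 0];
    }
    return @scores;
}

# Toy alignment, made up for illustration.
my @scores = conservation_scores('MK-LV', 'MKALV', 'MR-LI', 'MKGLV');
printf "%s %.2f\n", @$_ for @scores;
```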