96-Summer Bioinformatics Programming Practicum (II): Bioinformatics with Perl. 8/13~8/22 蘇中才, 8/24~8/29 張天豪, 8/31 曾宇鳯.



Before class
- Course web page
- Setup steps
  - Download PuTTY / PieTTY
  - Connect to the course server
  - wget http://gene.csie.ntu.edu.tw/~sbb/summer-course/doc/course1.tgz
  - tar zxvf course1.tgz

No.  Name    Account
1    許郁彬  course1
2    杜羿樞  course2
3    黃裕雄  course3
4    王建智  course4
5    陳士杰  course5
6    莊智傑  course6
7    朱柏威  course7
8    洪文峯  course8
9    吳耿豪  course9
10   張雯琪  course10
11   王悅    course11
12   張嘉芸  course12
13   林義峰  course13
14   游棨元  course14
15   許育堂  course15
16   陳建瑋  course16
17   黃國鑫  course17
18   翁小涵  course18
19   郭建鴻  course19
20   曾意儒  course20

Appendix Scalar, Array, Hash

Variable reset (1/2)
    $scalar = undef;
    $scalar = "";
    $scalar = …;
    @array = ();
    %hash = ();

Variable reset (2/2)
    @array = undef;    # not a reset: the array now holds one element, undef
    print …;
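A minimal sketch of why assigning undef to an array is not a reset, while assigning the empty list is. The variable names are illustrative and are not taken from the slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @array = (1, 2, 3);

# Assigning undef stores a one-element list whose only element is undef.
@array = undef;
my $after_undef = scalar @array;

# Assigning the empty list really empties the array.
@array = ();
my $after_empty = scalar @array;

print "after \@array = undef: $after_undef element(s)\n";
print "after \@array = ():    $after_empty element(s)\n";
```

For scalars, `$scalar = undef;` is fine; the pitfall only concerns arrays and hashes.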

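Array
    my @number = ("one", "two", "three");
    my $number = ("one", "two", "three");
    print …;
    print …;
    print $#number."\n";    # 2, the last index of @number
    print $number."\n";     # three: a comma list in scalar context yields its last element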

= qw" "; print

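Array – sort by number
    #! /usr/bin/perl -w
    my @numbers = (5, 4, 22, 9);                  # array names assumed; the originals were lost in extraction
    my @sorted  = sort { $a <=> $b } @numbers;    # <=> compares numerically
    print join("\n", @sorted), "\n\n";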
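Numeric sorting needs an explicit comparison block because Perl's default sort compares strings. The following sketch contrasts the two on the numbers from the slide; the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @numbers = (5, 4, 22, 9);

my @as_strings = sort @numbers;                  # default: string comparison
my @ascending  = sort { $a <=> $b } @numbers;    # numeric, smallest first
my @descending = sort { $b <=> $a } @numbers;    # numeric, largest first

print "string order:       @as_strings\n";    # 22 4 5 9
print "numeric ascending:  @ascending\n";     # 4 5 9 22
print "numeric descending: @descending\n";    # 22 9 5 4
```

In string order "22" sorts before "4" because the character "2" precedes "4".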

Hash – show all elements
    #! /usr/bin/perl -w
    %nucleotide_bases = (
        A => 'Adenine',
        T => 'Thymine',
        G => 'Guanine',
        C => 'Cytosine',
    );
    # each returns one (key, value) pair per call
    while (($key, $value) = each %nucleotide_bases) {
        print "$key ====> $value\n";
    }
    # keys returns every key; look up each value
    foreach $key (keys %nucleotide_bases) {
        print "$key ====> $nucleotide_bases{$key}\n";
    }

Hash – reverse with identical values
    %nucleotide_bases = (
        A => 'Adenine',
        T => 'Thymine',
        G => 'Adenine',
        C => 'Cytosine',
    );
    while (($key, $value) = each %nucleotide_bases) {
        print "$key ====> $value\n";
    }
    # reverse swaps keys and values; A and G share the value Adenine,
    # so only one of them survives in %reverse
    %reverse = reverse %nucleotide_bases;
    while (($key, $value) = each %reverse) {
        print "$key ====> $value\n";
    }
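When several keys share a value, one way to invert a hash without losing keys is to collect them in an array per value. This is a sketch of that idea, not something shown on the slides; `%keys_for` is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %nucleotide_bases = (
    A => 'Adenine',
    T => 'Thymine',
    G => 'Adenine',
    C => 'Cytosine',
);

# Map each value to an array reference holding every key that had it.
my %keys_for;
for my $key (sort keys %nucleotide_bases) {
    push @{ $keys_for{ $nucleotide_bases{$key} } }, $key;
}

for my $value (sort keys %keys_for) {
    print "$value ====> @{ $keys_for{$value} }\n";
}
```

Sorting the keys first makes the order inside each array predictable, since hash order itself is not.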

Hash – the number of elements
How to know the number of elements in a hash?
Ex:
    my %hash = ('a' => 1, 'b' => 2);
    print scalar(keys(%hash))."\n";    # 2

Comment
    # This is a comment
    =This is a comment, too
    =This is a comment, three
    =cut
    print "Really ?\n";

Appendix STDIN, <>, our/my

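$_ – extract data from <STDIN>
    while (<STDIN>) { print; }    # each line read is assigned to $_ automatically
    if (<STDIN>) { print; }       # a line is read but not assigned to $_

<>
    <>;
    $line = <>;

    #! /usr/bin/perl -w
    while ( $line = <> ) {
        print $line;
    }

Processing Data Files (like the UNIX command cat)
    #! /usr/bin/perl -w
    while (<>) {
        print;
    }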

Others …
    while (defined($_ = <>)) { print; }
    while ($_ = <>) { print; }
    while (<>) { print; }
    for (;<>;) { print; }
    print while defined($_ = <>);
    print while ($_ = <>);
    print while <>;

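our/my
    my $var;
    $var = 1;
    {
        my $var;              # a new variable, visible only inside this block
        $var = 2;
        print $var,"\n";      # 2
    }
    print $var, "\n";         # 1

    our $var;
    $var = 1;
    {
        our $var;             # the same package variable as outside
        $var = 2;
        print $var,"\n";      # 2
    }
    print $var, "\n";         # 2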

Appendix Regular expression

Reserved word
    open log, ">test.txt" or die "…";    # log is also the name of a built-in function, so it is a poor filehandle name
    print log "test\n";
    close log;

Magic diamond - <>
    print "$_" while (<>);
    print "$_" while (<STDIN>);

Get the list of files in the current directory
    @files = <*.pl>;          # array name and pattern assumed; the originals were lost in extraction
    @files = glob("*.pl");

Greedy matching
    my $string = "course1:x:509:510::/home/course1:/bin/bash";
    if ($string =~ /(.*):/) {
        print "matched string = [$1]\n";    # [course1:x:509:510::/home/course1]
    }
    # How to match the first column ?

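Greedy matching
    my $string = "course1:x:509:510::/home/course1:/bin/bash";
    # \S* is still greedy, so it runs to the last colon
    if ($string =~ /^([\S]*):/) {
        print "matched string = [$1]\n";    # [course1:x:509:510::/home/course1]
    }
    # *? is non-greedy, so it stops at the first colon
    if ($string =~ /^([\S]*?):/) {
        print "matched string = [$1]\n";    # [course1]
    }
    # [^:]* cannot cross a colon at all
    if ($string =~ /([^:]*):/) {
        print "matched string = [$1]\n";    # [course1]
    }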
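The three behaviours can be compared side by side, together with split, which is often simpler for colon-separated records like this one. A small sketch with illustrative variable names:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "course1:x:509:510::/home/course1:/bin/bash";

# In list context a match returns its captured groups.
my ($greedy)     = $string =~ /^(.*):/;       # runs to the last colon
my ($non_greedy) = $string =~ /^(.*?):/;      # stops at the first colon
my ($negated)    = $string =~ /^([^:]*):/;    # cannot cross a colon

# split returns every column at once; empty columns are kept in the middle.
my @fields = split /:/, $string;

print "greedy:     [$greedy]\n";
print "non-greedy: [$non_greedy]\n";
print "negated:    [$negated]\n";
print "columns:    ", scalar(@fields), ", first = [$fields[0]]\n";
```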

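Substitution – remove all x
    $_ = "China xxxxxx Taiwan";
    s/x*//;    # How to rewrite ?
    print;
Output:
    China xxxxxx Taiwan
The string is unchanged because x* can match the empty string at the very start, so that empty match is what gets replaced.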
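One possible answer to the rewrite question is to require at least one x and to apply the substitution globally. The sketch below compares the variants; the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "China xxxxxx Taiwan";

# x* matches the empty string at position 0, so nothing is removed.
(my $unchanged = $text) =~ s/x*//;

# x+ needs at least one x; /g repeats the substitution along the string.
(my $removed = $text) =~ s/x+//g;

# Also collapse the spaces that surrounded the run of x characters.
(my $tidy = $text) =~ s/\s*x+\s*/ /g;

print "[$unchanged]\n";
print "[$removed]\n";
print "[$tidy]\n";
```

Note that these patterns remove every x, including any inside ordinary words, which is fine for this example string but may not be for real text.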

Quoted syntax
    Symbol    General    Description           Interpolated
    ' '       q/ /       String                No
    " "       qq/ /      String                Yes
    ` `       qx/ /      Execution             Yes
    ( )       qw/ /      List of words         No
    / /       m/ /       Pattern matching      Yes
              s/ / /     Substitution          Yes
    y/ / /    tr/ / /    Transliteration       No
    " "       qr/ /      Regular expression    Yes
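A short sketch exercising several of the quoting operators from the table. The strings and variable names are made up for illustration, and qx is left out because it runs an external command.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $name = "Perl";

my $single = q/Hello $name/;     # no interpolation: $name stays literal
my $double = qq/Hello $name/;    # interpolation: $name is expanded
my @words  = qw/A C G T/;        # list of words, no quotes or commas needed

my $pattern = qr/^h(\w+)/i;      # precompiled, case-insensitive pattern
my ($rest)  = "Hello" =~ $pattern;

(my $rna = "GATTACA") =~ tr/T/U/;    # transliteration, character by character

print "$single\n$double\n";
print scalar(@words), " words\n";
print "captured: $rest\n";
print "RNA: $rna\n";
```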

Appendix Useful techniques

Shell command – file/directory
    mkdir("doc", 0744);    # permission modes are octal (leading 0)
    chdir("doc");
    rmdir("doc");
    unlink("log.txt");
    chmod(0700, "log1.txt", "log2.txt", "log3.txt");
    rename("old_name", "new_name");
    chown($uid, $gid, "log1.txt", "log2.txt", "log3.txt");    # $uid, $gid: numeric user and group IDs
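Permission modes for mkdir and chmod are conventionally written in octal, so a hexadecimal literal such as 0x744 denotes a different number than 0744. A minimal sketch of the difference, with illustrative variable names:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $hex_mode   = 0x744;           # hexadecimal literal: 1860 in decimal
my $octal_mode = 0744;            # octal literal: 484 in decimal (rwxr--r--)
my $from_text  = oct("0744");     # oct() converts an octal string, e.g. from a config file

printf "0x744 is %d decimal, %o octal\n", $hex_mode,   $hex_mode;
printf "0744  is %d decimal, %o octal\n", $octal_mode, $octal_mode;
printf "oct(\"0744\") is %d decimal\n",   $from_text;
```

Printing a mode with `%o` is a quick way to check that it is the intended one.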

Perl
Usage: perl [switches] [--] [programfile] [arguments]
    -c              check syntax only (runs BEGIN and CHECK blocks)
    -d[:debugger]   run program under debugger
    -e program      one line of program (several -e's allowed, omit programfile)
    -i[extension]   edit <> files in place (makes backup if extension supplied)
    -n              assume "while (<>) {... }" loop around program
    -p              assume loop like -n but print line also, like sed
    -u              dump core after parsing program
    -v              print version, subversion
    -w              enable many useful warnings (RECOMMENDED)
    -W              enable all warnings
    -X              disable all warnings

Removal of ^M perl -pi.bak -e 's/\r//g;' index.html

File Copy
    #! /usr/bin/perl
    use File::Copy;
    copy("file1", "file2") or die "Copy failed: $!";

Reserved word for debug __FILE__ __LINE__ Ex: print "FILE:".__FILE__." LINE:".__LINE__."\n";

Debug perl -d "program name"

Debug $ perlcc -d test.pl

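Special variable
    $_            the default variable (target of the last implicit assignment)
    $!            error message
    $$            current process ID
    $?            the status returned when the previous child process ended
    $"            the separator used when a list is interpolated into a string
    $/            the input record separator
    $`, $&, $'    string matching: text before, inside and after the match
    $+            the last backreference
    @_            arguments of a subroutine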
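A small sketch that exercises several of these special variables. The sample sequence, the nonexistent path and the variable names are chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @bases = qw(A C G T);

# $" is the separator used when an array is interpolated into a string.
my $joined;
{
    local $" = '-';
    $joined = "@bases";
}

# $`, $& and $' hold the text before, inside and after the last match;
# $+ holds the last parenthesised group that matched.
my ($before, $matched, $after, $last_group);
if ("GHEGVGKVV" =~ /(G)(V)/) {
    ($before, $matched, $after, $last_group) = ($`, $&, $', $+);
}

# @_ holds the arguments passed to a subroutine.
sub count_arguments { return scalar @_; }

# $! holds the error text of the last failed system call
# (the path below is hypothetical and assumed not to exist).
my $error = open(my $fh, '<', '/no/such/dir/no_such_file') ? '' : "$!";

print "joined:     $joined\n";
print "match:      [$before] [$matched] [$after], last group [$last_group]\n";
print "arguments:  ", count_arguments(1, 2, 3), "\n";
print "open error: $error\n";
print "process ID: ", $$, "\n";
```

Using `local` restores the previous value of `$"` when the block ends, which keeps the change from leaking into other code.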

Bytecode generator $ perlcc -B -o test test3.pl

CPAN perl -MCPAN -e "install GD"

BioPerl

PSI-BLAST
Position-Specific Iterative BLAST (PSI-BLAST) constructs a multiple sequence alignment from the BLAST hits, then creates a position-specific scoring matrix (PSSM) from that alignment.
[Figure: query sequence -> BLAST against sequence database -> homologous proteins -> multiple sequence alignment -> PSSM -> BLAST -> new homologous proteins]

PSSM (1/4)
Alignment (query sequence and homologous proteins):
    GHEGVGKVVKLGAGA
    GHEKKGYFEDRGPSA
    GHEGYGGRSRGGGYS
    GHEFEGPKGCGALYI
    GHELRGTTFMPALEC
Residues: A C D E F G H I K L M N P Q R S T V W Y
Frequency:
    Column 1:  $f_{A,1} = 0/5$, $f_{C,1} = 0/5$, …, $f_{G,1} = 5/5$, …
    Column 2:  $f_{A,2} = 0/5$, $f_{C,2} = 0/5$, …, $f_{H,2} = 5/5$, …
    …
    Column 15: $f_{A,15} = 2/5$, $f_{C,15} = 1/5$, …, $f_{S,15} = 1/5$, …

PSSM (2/4)
The original data:
    Column 1:  $f_{A,1} = 0/5$, $f_{C,1} = 0/5$, …, $f_{G,1} = 5/5$, …
    Column 2:  $f_{A,2} = 0/5$, $f_{C,2} = 0/5$, …, $f_{H,2} = 5/5$, …
    …
    Column 15: $f_{A,15} = 2/5$, $f_{C,15} = 1/5$, …, $f_{S,15} = 1/5$, …
Set a pseudo-count of 1 (one extra count for each of the 20 residues, so the denominator becomes 5+20):
    Column 1:  $f'_{A,1} = (0+1)/(5+20)$, $f'_{C,1} = (0+1)/(5+20)$, …, $f'_{G,1} = (5+1)/(5+20)$, …
    Column 2:  $f'_{A,2} = (0+1)/(5+20)$, $f'_{C,2} = (0+1)/(5+20)$, …, $f'_{H,2} = (5+1)/(5+20)$, …
    …
    Column 15: $f'_{A,15} = (2+1)/(5+20)$, $f'_{C,15} = (1+1)/(5+20)$, …, $f'_{S,15} = (1+1)/(5+20)$, …
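The pseudo-count frequencies can be computed directly from the five aligned sequences. The sketch below is one way to do it; the subroutine name `column_frequencies` and the variable names are hypothetical, and columns are numbered from 1 as on the slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @alignment = qw(
    GHEGVGKVVKLGAGA
    GHEKKGYFEDRGPSA
    GHEGYGGRSRGGGYS
    GHEFEGPKGCGALYI
    GHELRGTTFMPALEC
);
my @residues = split //, 'ACDEFGHIKLMNPQRSTVWY';

# Return residue => frequency for one column, adding $pseudocount to every
# residue count: f' = (count + pseudocount) / (sequences + pseudocount * 20).
sub column_frequencies {
    my ($column, $pseudocount) = @_;
    my %count = map { ($_ => 0) } @residues;
    $count{ substr($_, $column - 1, 1) }++ for @alignment;
    my $total = scalar(@alignment) + $pseudocount * scalar(@residues);
    return map { ($_ => ($count{$_} + $pseudocount) / $total) } @residues;
}

my %column1  = column_frequencies(1, 1);
my %column15 = column_frequencies(15, 1);

printf "f'(G,1)  = %.2f\n", $column1{G};     # (5+1)/(5+20)
printf "f'(A,1)  = %.2f\n", $column1{A};     # (0+1)/(5+20)
printf "f'(A,15) = %.2f\n", $column15{A};    # (2+1)/(5+20)
printf "f'(S,15) = %.2f\n", $column15{S};    # (1+1)/(5+20)
```

Calling `column_frequencies($column, 0)` gives the raw frequencies without pseudo-counts.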

PSSM (3/4)
The score is derived from the ratio of the observed to the expected frequencies. More precisely, the logarithm of this ratio is taken and referred to as the log-likelihood ratio:
    $\mathrm{Score}_{i,j} = \log\left( f'_{i,j} / q_i \right)$
where $\mathrm{Score}_{i,j}$ is the score for residue $i$ at position $j$, $f'_{i,j}$ is the relative frequency of residue $i$ at position $j$, and $q_i$ is the expected relative frequency of residue $i$ in a random sequence.
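As a worked example, the log-likelihood score can be computed for a few entries. The slides do not state the base of the logarithm or the background frequencies, so this sketch assumes base 2 and a uniform background of 1/20 for every residue; both are assumptions, and `log_odds` is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed: uniform background frequency for all 20 residues.
my $background = 1 / 20;

# Log-likelihood ratio in base 2 (assumed); Perl's log() is the natural log.
sub log_odds {
    my ($frequency, $expected) = @_;
    return log($frequency / $expected) / log(2);
}

# Pseudo-count frequencies taken from the alignment columns on the slides.
my $score_G_1  = log_odds(6 / 25, $background);    # G, column 1:  (5+1)/(5+20)
my $score_A_1  = log_odds(1 / 25, $background);    # A, column 1:  (0+1)/(5+20)
my $score_I_15 = log_odds(2 / 25, $background);    # I, column 15: (1+1)/(5+20)

printf "Score(G,1)  = %.2f\n", $score_G_1;
printf "Score(A,1)  = %.2f\n", $score_A_1;
printf "Score(I,15) = %.2f\n", $score_I_15;
```

A positive score means the residue is seen more often at that position than the background predicts; a negative score means less often.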

PSSM (4/4)
[Figure: PSSM table indexed by residue (A C D E F G H I K L M N P Q R S T V W Y); the only value preserved is 0.7 for residue I.]