Your Home Directory When you login to the server, you always start in your Home directory. Create sub-directories to store specific projects or groups.


Your Home Directory
When you log in to the server, you always start in your Home directory. Create sub-directories to store specific projects or groups of information, just as you would place folders in a filing cabinet. Do not accumulate thousands of files with cryptic names in your Home directory.

File & Directory Commands
This is a minimal list of Linux commands that you must know for file management:
    ls (list)
    mkdir (make directory)
    cd (change directory)
    pwd (print working directory)
    cp (copy)
    rm (remove)
    mv (move)
    more (view by page)
    cat (view entire file)
    man (help)
All of these commands can be modified with many options. Learn to use the Linux 'man' pages for more information.

Navigation
pwd (print working directory) shows the name and location of the directory where you are currently working:
    > pwd
    /home/jtang
This is a "pathname"; the slashes indicate sub-directories. The initial slash is the "root" of the whole filesystem.
ls (list) gives you a list of the files in the current directory:
    > ls
    assembin4.fasta  Misc  test2.txt  bin  temp  testfile
Use the ls -l (long) option to get more information about each file:
    > ls -l
    total 1768
    drwxr-x--- 2 browns02 users 8192 Aug 28 18:26 Opioid
    -rw-r browns02 users 6205 May af gb_in2
    -rw-r browns02 users May af fasta

Sub-directories
cd (change directory) moves you to another directory:
    > cd Misc
    > pwd
    /u/browns02/Misc
mkdir (make directory) creates a new sub-directory inside the current directory:
    > ls
    assembler  phrap  space
    > mkdir subdir
    > ls
    assembler  phrap  space  subdir
rmdir (remove directory) deletes a sub-directory, but the sub-directory must be empty:
    > rmdir subdir
    > ls
    assembler  phrap  space

Shortcuts
There are some important shortcuts in Linux for specifying directories.
    .  (dot) means "the current directory"
    .. (dot dot) means "the parent directory", the directory one level above the current directory, so cd .. will move you up one level
    ~  (tilde) means your Home directory, so cd ~ will move you back to your Home
Just typing a plain cd will also bring you back to your Home directory.

Create new files
Text editors: pico, nano, vi/vim, emacs

Linux File Protections
File protection (also known as permissions) enables the user to set up a file so that only specific people can read (r), write (w), and execute (x) it. Write permission is as powerful as delete permission, since it allows someone to overwrite a file with different contents. (Strictly speaking, removing a file requires write permission on the directory that contains it.)

File Owners and Groups
Linux file permissions are defined according to ownership. The person who creates a file is its owner. You are the owner of files in your Home directory and all its sub-directories.
In addition, there is a concept known as a Group. Members of a group have privileges to see each other's files. We create groups as the members of a single lab: the students, technicians, postdocs, visitors, etc. who work for a given PI.

View File Permissions
Use the ls -l command to see the permissions for all files in a directory:
    $ ls -l
    total 2
    -rw-r--r-- 1 jtang None 56 Feb 29 11:21 data.txt
    -rwxr-xr-x 1 jtang None 33 Feb 29 11:21 test.pl
The username of the owner is shown in the third column. (The owner of the files listed above is jtang.) The owner's group is shown in the fourth column (here, "None").
The access rights for these files are shown in the first column. This column consists of 10 characters known as the attributes of the file: the first character is the file type, and the remaining nine are three groups of r, w, x, or - for the owner, the group, and the world.
    r indicates read permission
    w indicates write (and delete) permission
    x indicates execute (run) permission
    - indicates no permission for that operation

Change Protections
Only the owner of a file can change its protections. To change the protections on a file use the chmod (change mode) command. [Beware, this is a confusing command.] Taken all together, it looks like this:
    > chmod 644 data.txt
This gives the owner read and write permission, and gives the group and the world read permission. Other common modes: 600, 755, 700.
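The three digits of a chmod mode are easier to read once each one is expanded into its r, w and x bits (r = 4, w = 2, x = 1). The following is a minimal sketch in Perl, the language used later in these slides; the helper name mode_to_string is made up for illustration and is not part of any standard tool.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a numeric mode such as 0644 into the
# nine permission characters that ls -l shows after the file type.
sub mode_to_string {
    my ($mode) = @_;
    my @letters = ('r', 'w', 'x');
    my $out = '';
    # One octal digit each for owner (shift 6), group (shift 3), world (shift 0).
    for my $shift (6, 3, 0) {
        my $digit = ($mode >> $shift) & 7;
        for my $i (0 .. 2) {
            # Bit values inside a digit: r = 4, w = 2, x = 1.
            $out .= ($digit & (4 >> $i)) ? $letters[$i] : '-';
        }
    }
    return $out;
}

# oct() converts the text "644" into the number 0644.
for my $octal ('644', '600', '755', '700') {
    printf "%s  %s\n", $octal, mode_to_string(oct($octal));
}
```

Running it prints one line per mode, for example 644 next to rw-r--r--, which matches the data.txt line in the ls -l listing.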

Commands for Files
Files are used to store information, for example, data or the results of some analysis. You will mostly deal with text files. Files on the RCR Alpha are automatically backed up to tape every night.
cat dumps the entire contents of a file onto the screen. For a long file this can be annoying, but it can also be helpful if you want to copy and paste (use the buffer of your telnet program).

more
Use the command more to view the contents of a file one screen at a time:
    > more t27054_cel.pep
    !!AA_SEQUENCE 1.0
    P1;T hypothetical protein Y49E Caenorhabditis elegans
    Length: 534 May 30, :49 Type: P Check:
    MLKKAPCLFG SAIILGLLLA AAGVLLLIGI PIDRIVNRQV IDQDFLGYTR
    51 DENGTEVPNA MTKSWLKPLY AMQLNIWMFN VTNVDGILKR HEKPNLHEIG
    101 PFVFDEVQEK VYHRFADNDT RVFYKNQKLY HFNKNASCPT CHLDMKVTIP
    t27054_cel.pep (87%)
Hit the spacebar to page down through the file. Ctrl-U moves back up a page. At the bottom of the screen, more shows how much of the file has been displayed.
Similar command: less

Copy & Move
cp lets you copy a file from any directory to any other directory, or create a copy of a file with a new name in one directory:
    cp filename.ext newfilename.ext
    cp filename.ext subdir/newname.ext
    cp /u/jdoe01/filename.ext ./subdir/newfilename.ext
mv allows you to move files to other directories, but it is also used to rename files. Filename and directory syntax for mv is exactly the same as for the cp command:
    mv filename.ext subdir/newfilename.ext
NOTE: When you use mv to move a file into another directory, the file no longer exists in its original location.

Delete
Use the command rm (remove) to delete files. There is no way to undo this command!!!
We have set the server to ask if you really want to remove each file before it is deleted. You must answer "Y" or else the file is not deleted.
You can skip the prompt with -f. rm -rf removes a directory and everything inside it without asking, so use it with great care.

Some More Advanced Linux Commands
    grep: searches a file for a specific text pattern
    cut: copies one or more columns from a tab-delimited text file
    wc: word count
    | : the pipe, which sends the output of one command as input to the next
    > : redirects output to a file
    sed: stream editor, which changes text inside a file

Perl

Basic Concepts Variables and Assignment Conditions Loop Input/Output (I/O) Procedures/functions

Strings
Text is handled in Perl as a string. This basically means that you have to put quotes around any piece of text that is not an actual Perl instruction.
Perl has two kinds of quotes, single ' and double ", and they behave differently: text in single quotes is printed exactly as written, while variables and escapes such as \n are interpreted only inside double quotes.

Print
Perl uses the term "print" to create output. Without a print statement, you won't know what your program has done.
You need to tell Perl to end a printed line: use the "\n" (newline) character, and include the double quotes.
The "\" character is called an escape. Perl uses it a lot.

    #!/usr/bin/perl
    $DNA = 'ACGT';
    # Next, we print the DNA onto the screen
    print $DNA, "\n";
    print '$DNA\n';    # single quotes: prints the literal text $DNA\n
    print "$DNA\n";    # double quotes: prints ACGT followed by a newline
    exit;

Do the Math (your 2nd Perl program)
    #!/usr/bin/perl
    print "4+5\n";
    print 4+5, "\n";
    print "4+5=", 4+5, "\n";
[Note: use commas to separate multiple items in a print statement; whitespace outside the quotes is ignored]

Variables
To be useful at all, a program needs to be able to store information from one line to the next. Perl stores information in variables.
A variable name starts with the "$" symbol, and it can store strings or numbers.
Variables are case sensitive. Give them sensible names.
Use the "=" sign to assign values to variables:
    $one_hundred = 100;
    $my_sequence = "ttattagcc";

Array
    @array = (1, 2, 3);
shift, unshift, push, pop
Index using []

Hash
    %TABLE = (A => 'CGT', B => 'CCC',);

Hash
Initialize:
    my %hash = ();
Add key/value pair:
    $hash{$key} = $value;
Add more keys:
    %hash = ( 'key1', 'value1', 'key2', 'value2' );
    %hash = ( key1 => 'value1', key2 => 'value2', );
Delete:
    delete $hash{$key};

String Operations
Strings (text) in variables can be used for some math-like operations.
Concatenate (join): use the dot (.) operator
    $seq1 = "ACTG";
    $seq2 = "GGCTA";
    $seq3 = $seq1 . $seq2;
    print $seq3;
    ACTGGGCTA

    #!/usr/bin/perl -w
    $DNA1 = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
    $DNA2 = 'ATAGTGCCGTGAGAGTGATGTAGTA';
    $DNA3 = "$DNA1$DNA2";      # join by interpolating both into one string
    $DNA4 = $DNA1 . $DNA2;     # join with the dot operator
    exit;

    #!/usr/bin/perl -w
    $DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
    print "Here is the starting DNA:\n\n";
    print "$DNA\n\n";
    # Transcribe the DNA to RNA by substituting all T's with U's.
    $RNA = $DNA;
    $RNA =~ s/T/U/g;
    # Print the RNA onto the screen
    print "Here is the result of transcribing the DNA to RNA:\n\n";
    print "$RNA\n";
    # Exit the program.
    exit;

    #!/usr/bin/perl -w
    # The filename of the file containing the protein sequence data
    $proteinfilename = 'NM_021964fragment.pep';
    # First we have to "open" the file
    open(PROTEINFILE, $proteinfilename);
    # Read the first line of the file into a scalar variable
    $protein = <PROTEINFILE>;
    # Now that we've got our data, we can close the file.
    close PROTEINFILE;
    # Print the protein onto the screen
    print "Here is the protein:\n\n";
    print $protein;
    exit;

    #!/usr/bin/perl -w
    # The filename of the file containing the protein sequence data
    $proteinfilename = 'NM_021964fragment.pep';
    # First we have to "open" the file
    open(PROTEINFILE, $proteinfilename);
    # Read the protein sequence data from the file, and store it
    # into the array variable @protein (one line per element)
    @protein = <PROTEINFILE>;
    # Print the protein onto the screen
    print @protein;
    # Close the file.
    close PROTEINFILE;
    exit;

    #!/usr/bin/perl -w
    # array
    @bases = ('A', 'C', 'G', 'T');
    print "@bases\n";          # the whole array, elements separated by spaces
    print $bases[0], "\n";     # indexing starts at 0
    print $bases[1], "\n";
    print $bases[2], "\n";
    print $bases[3], "\n";
    exit;

String functions Chomp Length of a string Substring

    #!/usr/bin/perl -w
    $proteinfilename = 'NM_021964fragment.pep';
    open(PROTEINFILE, $proteinfilename);
    $protein = <PROTEINFILE>;
    close PROTEINFILE;
    chomp $protein;            # remove the trailing newline
    $len = length $protein;
    print $len, "\n";
    exit;

    #!/usr/bin/perl -w
    $name = "PALLAPP";
    $st1 = substr($name, 3);       # from offset 3 to the end: "LAPP"
    $st2 = substr($name, 1, 2);    # 2 characters starting at offset 1: "AL"
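Because substr takes an offset and a length, it can step through a sequence a fixed number of characters at a time. The sketch below cuts a DNA string into three-letter codons; the helper name split_codons and the test sequence are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: cut a DNA string into consecutive three-letter codons.
sub split_codons {
    my ($dna) = @_;
    my @codons;
    # Stop before a trailing fragment shorter than three bases.
    for (my $pos = 0; $pos + 3 <= length($dna); $pos += 3) {
        push @codons, substr($dna, $pos, 3);
    }
    return @codons;
}

my $name = "PALLAPP";
print substr($name, 3), "\n";       # offset 3 to the end
print substr($name, 1, 2), "\n";    # two characters starting at offset 1
print join(' ', split_codons('ATGGCGTTAGC')), "\n";
```

The last line prints ATG GCG TTA; the two leftover bases at the end are ignored.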

Comparison
String comparison (are they the same, > or <):
    eq (equal)
    ne (not equal)
    ge (greater or equal)
    gt (greater than)
    lt (less than)
    le (less or equal)

    #!/usr/bin/perl -w
    $word = 'MNIDDKL';
    if ($word eq 'QSTVSGE') {
        print "QSTVSGE\n";
    } elsif ($word eq 'MRQQDMISHDEL') {
        print "MRQQDMISHDEL\n";
    } elsif ($word eq 'MNIDDKL') {
        print "MNIDDKL-the magic word!\n";
    } else {
        print "Is \"$word\" a peptide?\n";
    }
    exit;

    $x = 10;
    $y = -20;
    if ($x <= 10) { print "1st true\n"; }
    if ($x > 10) { print "2nd true\n"; }
    if ($x <= 10 || $y > -21) { print "3rd true\n"; }
    if ($x > 5 && $y < 0) { print "4th true\n"; }
    if (($x > 5 && $y < 0) || $y > 5) { print "5th true\n"; }

But
Use ==, <, <=, >, >=, != to compare numbers.
Use eq, lt, le, gt, ge, ne to compare strings.
The logical operators || and && (and their low-precedence forms or and and) combine either kind of test.

    $x = 10;
    $y = -20;
    if ($x le 10) { print "1st true\n"; }
    if ($x gt 5) { print "2nd true\n"; }
    if ($x le 10 || $y gt -21) { print "3rd true\n"; }
    if ($x gt 5 && $y lt 0) { print "4th true\n"; }
    if (($x gt 5 && $y lt 0) || $y gt 5) { print "5th true\n"; }
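The string operators compare text character by character, so they can give a different answer from the numeric operators on the same values. A short sketch of the difference, using the same $x and $y as the slides (the helper tf is made up for printing):

```perl
use strict;
use warnings;

# Hypothetical helper: show a test result as the word true or false.
sub tf { return $_[0] ? 'true' : 'false' }

my ($x, $y) = (10, -20);

print '$x >  5   : ', tf($x >  5),   "\n";   # numeric: 10 is greater than 5
print '$x gt 5   : ', tf($x gt 5),   "\n";   # string: "1" sorts before "5"
print '$y >  -21 : ', tf($y >  -21), "\n";   # numeric: -20 is greater than -21
print '$y gt -21 : ', tf($y gt -21), "\n";   # string: "-20" sorts before "-21"
```

The numeric tests print true and the string tests print false, which is why the numeric and string versions of the five if statements do not print the same lines.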

    #!/usr/bin/perl -w
    $num = 1234;
    $str = '1234';
    print $num, " ", $str, "\n";
    $num_or_str = $num + $str;     # numeric addition: 2468
    print $num_or_str, "\n";
    $num_or_str = $num . $str;     # string concatenation: 12341234
    print $num_or_str, "\n";
    exit;

More Arithmetic
    +, -, *, **, /, %
    +=, -=, *=, **=, /=, %=
    ++, --

    $x = 10;
    $x = $x*1.5;
    print $x*=3, "\n";
    print $x++, "\n";
    print $x, "\n";
    print ++$x, "\n";
    print $x, "\n";
    print $x % 3, "\n";
    print $x**2, "\n";

    #!/usr/bin/perl -w
    print "Please type the filename of the DNA sequence data: ";
    $dna_filename = <STDIN>;
    chomp $dna_filename;
    open(DNAFILE, $dna_filename);
    $name = <DNAFILE>;         # first line: the sequence name
    @DNA = <DNAFILE>;          # remaining lines: the sequence
    close DNAFILE;
    $DNA = join('', @DNA);
    $DNA =~ s/\s//g;           # remove whitespace, including newlines
    $count_of_CG = 0;
    $position = 0;
    while ($position < length $DNA) {
        $base = substr($DNA, $position, 1);
        if ($base eq 'C' or $base eq 'G') {
            ++$count_of_CG;
        }
        $position++;
    }
    print "CG content is ", $count_of_CG/(length $DNA)*100, "%\n";

    #!/usr/bin/perl -w
    print "Please type the filename of the DNA sequence data: ";
    $dna_filename = <STDIN>;
    chomp $dna_filename;
    open(DNAFILE, $dna_filename);
    $name = <DNAFILE>;         # first line: the sequence name
    @DNA = <DNAFILE>;          # remaining lines: the sequence
    close DNAFILE;
    $DNA = join('', @DNA);
    $DNA =~ s/\s//g;           # remove whitespace, including newlines
    $count_of_CG = 0;
    for ($position = 0; $position < length $DNA; ++$position) {
        $base = substr($DNA, $position, 1);
        if ($base eq 'C' or $base eq 'G') {
            ++$count_of_CG;
        }
    }
    print "CG content is ", $count_of_CG/(length $DNA)*100, "%\n";

    $DNA = "ACCTAAACCCGGGAGAATTCCCACCAATTCTACGTAAC";
    $s = "";
    for ($i = 0, $j = 5; $i < $j; $i += 2, $j++) {
        $s .= substr($DNA, $i, $j);
    }
    print $s, "\n";

    sub extract_sequence_from_fasta_data {
        my (@fasta_file_data) = @_;
        my $sequence = '';
        foreach my $line (@fasta_file_data) {
            if ($line =~ /^\s*$/) {          # discard blank lines
                next;
            } elsif ($line =~ /^\s*#/) {     # discard comment lines
                next;
            } elsif ($line =~ /^>/) {        # discard the FASTA header line
                next;
            } else {
                $sequence .= $line;
            }
        }
        # remove non-sequence data (in this case, whitespace) from $sequence string
        $sequence =~ s/\s//g;
        return $sequence;
    }

Subroutine
Some code needs to be reused. A subroutine is a good way to organize code. It is called a "function" in some languages.
A subroutine has: a name, a return value, parameters.
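The three parts (name, parameters, return value) can be seen in a very small subroutine. This is only a sketch: the subroutine name revcom and the test sequence are made up for illustration.

```perl
use strict;
use warnings;

# Name: revcom. Parameter: a DNA string. Return value: its reverse complement.
sub revcom {
    my ($dna) = @_;                   # parameters arrive in the array @_
    my $rc = reverse $dna;            # scalar assignment, so the string is reversed
    $rc =~ tr/ACGTacgt/TGCAtgca/;     # complement each base
    return $rc;
}

print revcom('ATGC'), "\n";
```

The same subroutine can now be called from anywhere in the program instead of repeating the reverse and tr lines each time.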

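    #!/usr/bin/perl -w
    print "Please type the filename: ";
    $dna_filename = <STDIN>;
    chomp $dna_filename;
    open(DNAFILE, $dna_filename);
    $name = <DNAFILE>;         # first line: the sequence name
    @DNA = <DNAFILE>;          # remaining lines: the sequence
    close DNAFILE;
    $DNA = join('', @DNA);
    $DNA =~ s/\s//g;
    $count_of_G = countG($DNA);
    print $count_of_G;

    sub countG {
        my($dna) = @_;
        my($count) = 0;
        $count = ($dna =~ tr/Gg//);    # tr returns the number of matching characters
        return $count;
    }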

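    #!/usr/bin/perl -w
    print "Please type the filename: ";
    $dna_filename = <STDIN>;
    chomp $dna_filename;
    open(DNAFILE, $dna_filename);
    $name = <DNAFILE>;         # first line: the sequence name
    @DNA = <DNAFILE>;          # remaining lines: the sequence
    close DNAFILE;
    $DNA = join('', @DNA);
    $DNA =~ s/\s//g;
    $count_of_G = count($DNA, 'Gg');
    print $count_of_G;

    sub count {
        my($dna, $pattern) = @_;
        my($count) = 0;
        # tr does not interpolate variables, so build the statement with eval.
        # The backslash keeps $dna as a variable; only $pattern is filled in.
        $count = ( eval("\$dna =~ tr/$pattern//") );
        return $count;
    }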

Scope my provides lexical scoping; a variable declared with my is visible only within the block in which it is declared. Blocks of code are hunks within curly braces {}; files are blocks. Use use vars qw([list of var names]) or our ([var_names]) to create package globals.

    #!/usr/bin/perl -w
    use Bio;
    use strict;
    use warnings;
    my $DNA = fasta_read();
    # Three reading frames on the forward strand
    print "First ", dna2peptide($DNA), "\n";
    print "Second ", dna2peptide(substr($DNA, 1)), "\n";
    print "Third ", dna2peptide(substr($DNA, 2)), "\n";
    # Reverse complement, then three more reading frames
    $DNA = reverse $DNA;
    $DNA =~ tr/ACGTacgt/TGCAtgca/;
    print "Fourth ", dna2peptide($DNA), "\n";
    print "Fifth ", dna2peptide(substr($DNA, 1)), "\n";
    print "Sixth ", dna2peptide(substr($DNA, 2)), "\n";

    my $x = 10;
    for (my $x = 0; $x < 5; $x++) {
        Scope();
        print $x, "\n";
    }
    print $x, "\n";

    sub Scope {
        my $x = 0;
    }

    sub IUB_to_regexp {
        my($iub) = @_;
        my $regular_expression = '';
        my %iub2character_class = (
            A => 'A',
            C => 'C',
            G => 'G',
            T => 'T',
            R => '[GA]',
            Y => '[CT]',
            M => '[AC]',
            K => '[GT]',
            S => '[GC]',
            W => '[AT]',
            B => '[CGT]',
            D => '[AGT]',
            H => '[ACT]',
            V => '[ACG]',
            N => '[ACGT]',
        );
        # Remove the ^ signs from the recognition sites
        $iub =~ s/\^//g;
        for ( my $i = 0 ; $i < length($iub) ; ++$i ) {
            $regular_expression .= $iub2character_class{substr($iub, $i, 1)};
        }
        return $regular_expression;
    }

    sub codon2aa {
        my($codon) = @_;
        $codon = uc $codon;
        my %genetic_code = (
            'TCA' => 'S',    # Serine
            'TCC' => 'S',    # Serine
            'TCG' => 'S',    # Serine
            'TCT' => 'S',    # Serine
            'TTC' => 'F',    # Phenylalanine
            'TTT' => 'F',    # Phenylalanine
            'TTA' => 'L',    # Leucine
            'TTG' => 'L',    # Leucine
            # Many more
        );
        if (exists $genetic_code{$codon}) {
            return $genetic_code{$codon};
        } else {
            print STDERR "Bad codon \"$codon\"!!\n";
            exit;
        }
    }

    sub parseREBASE {
        my($rebasefile) = @_;
        my %rebase_hash = ( );
        my $name;
        my $site;
        my $regexp;
        open(my $rebase_filehandle, $rebasefile) or die "Cannot open file\n";
        while (<$rebase_filehandle>) {
            # Discard header lines
            ( 1 .. /Rich Roberts/ ) and next;
            # Discard blank lines
            /^\s*$/ and next;
            # Split the two (or three if includes parenthesized name) fields
            my @fields = split( " ", $_);
            # Keep the first field (name) and the last field (site)
            $name = shift @fields;
            $site = pop @fields;
            # Translate the recognition sites to regular expressions
            $regexp = IUB_to_regexp($site);
            # Store the data into the hash
            $rebase_hash{$name} = "$site $regexp";
        }
        # Return the hash containing the reformatted REBASE data
        return %rebase_hash;
    }

Range
    ( 1 .. /Rich Roberts/ ) and next
The range is true from the first line until a line containing Rich Roberts.
If the range is true, Perl goes on to evaluate the statement after "and" (here, next).
If the range is not true, Perl does not evaluate the statement after "and".
    open(...) or die
If the file can be opened, the expression is already true, so there is no need to evaluate the statement after "or".
If the file cannot be opened, open returns false, so Perl evaluates the statement after "or" (here, die).
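Both idioms can be tried without a real REBASE file by reading from a string. This is a sketch only: the input text and the helper name data_lines are made up, and the in-memory filehandle stands in for the real file.

```perl
use strict;
use warnings;

# Made-up input: two header lines, a blank line, then two data lines.
my $text = "REBASE header\nRich Roberts\n\nAatII GACGT^C\nAccI GT^MKAC\n";

# Hypothetical helper: return only the data lines of the text.
sub data_lines {
    my ($input) = @_;
    my @kept;
    # "or die" runs only if open fails.
    open(my $fh, '<', \$input) or die "Cannot open in-memory file: $!\n";
    while (<$fh>) {
        ( 1 .. /Rich Roberts/ ) and next;    # skip line 1 through the matching line
        /^\s*$/ and next;                    # skip blank lines
        chomp;
        push @kept, $_;
    }
    close $fh;
    return @kept;
}

print "$_\n" for data_lines($text);
```

Only the two enzyme lines are printed; the header lines and the blank line are skipped by the two "and next" tests.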

    @fred = (1,2,3);
    $fred[3] = "hi";
    $fred[6] = "ho";
@fred is now (1,2,3,"hi",undef,undef,"ho")

Array operators
push and pop work on the right-most end of an array.
shift and unshift work on the left-most end of an array.
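A short sketch of the four operators on one array. The array contents and the helper rotate_left are made up for illustration.

```perl
use strict;
use warnings;

my @list = (1, 2, 3);
push @list, 4, 5;            # add to the right end:      (1,2,3,4,5)
my $last = pop @list;        # remove from the right end: 5
unshift @list, 0;            # add to the left end:       (0,1,2,3,4)
my $first = shift @list;     # remove from the left end:  0
print "last=$last first=$first list=@list\n";

# Hypothetical helper: move the left-most element to the right end.
sub rotate_left {
    my @items = @_;
    push @items, shift @items if @items;
    return @items;
}

print join(' ', rotate_left('A', 'C', 'G', 'T')), "\n";
```

Using push with pop treats the array as a stack; using push with shift treats it as a queue.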

Print to file
Open a file to print (either form works):
    open FILE, ">filename.txt";
    open (FILE, ">filename.txt");
Print to the file:
    print FILE $str;
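A slightly more defensive variant of the same idea is sketched below. It uses the three-argument form of open with a lexical filehandle and checks for failure; the helper name write_text is made up, and a temporary file from the core File::Temp module is used so that nothing real is overwritten.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write one string to a file, replacing its contents.
sub write_text {
    my ($path, $text) = @_;
    open(my $out, '>', $path) or die "Cannot open $path for writing: $!\n";
    print {$out} $text;
    close($out) or die "Cannot close $path: $!\n";
    return;
}

# Create a temporary file name; the file is removed when the program exits.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
close $tmp_fh;

write_text($tmp_name, "ACGT\n");

# Read the file back to confirm what was written.
open(my $in, '<', $tmp_name) or die "Cannot open $tmp_name: $!\n";
print <$in>;
close $in;
```

Opening with ">" replaces the file; ">>" would append to it instead.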

    #!/usr/bin/perl
    print "My name is $0 \n";
    print "First arg is: $ARGV[0] \n";
    print "Second arg is: $ARGV[1] \n";
    print "Third arg is: $ARGV[2] \n";
    $num = $#ARGV + 1;
    print "How many args? $num \n";
    print "The full argument string: @ARGV \n";

Regular Expression
    ^      beginning of string
    $      end of string
    .      any character except newline
    *      match 0 or more times
    +      match 1 or more times
    ?      match 0 or 1 times
    |      alternative
    ( )    grouping; "storing"
    [ ]    set of characters
    { }    repetition modifier
    \      quote or special

Repeats
    a*        zero or more a's
    a+        one or more a's
    a?        zero or one a's (i.e., optional a)
    a{m}      exactly m a's
    a{m,}     at least m a's
    a{m,n}    at least m but at most n a's

\

[]

Perl tr/// function
tr means transliterate: it replaces a character with another character.
    $dna =~ tr/a/c/        replaces all "a" with "c" in $dna
It also works on a range:
    $dna =~ tr/a-z/A-Z/    replaces all lower case letters with upper case
tr also counts:
    $count = ($string =~ tr/A//)
(you might think this also deletes all "A" from the string, but it doesn't)
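The counting form of tr gives a compact way to compute GC content. A minimal sketch, with a made-up helper name gc_percent and a made-up test sequence:

```perl
use strict;
use warnings;

# Hypothetical helper: percentage of G and C bases in a DNA string.
sub gc_percent {
    my ($dna) = @_;
    return 0 if length($dna) == 0;       # avoid dividing by zero
    # With an empty replacement list, tr only counts; $dna is not changed.
    my $gc = ($dna =~ tr/GCgc//);
    return 100 * $gc / length($dna);
}

my $dna = 'ACGTGGCC';
printf "%.1f%%\n", gc_percent($dna);
```

For this eight-base sequence, six bases are G or C, so the sketch prints 75.0%.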

Wildcards
Perl has a set of wildcard characters for regular expressions that are completely different from the ones used by the Unix shell:
    .     (dot) matches any character except newline
    \d    matches any digit (0-9)
    \w    matches any word character (a letter, digit, or underscore; not punctuation or space)
    \s    matches one whitespace character (use \s+ for any amount)
    ^     matches the beginning of a line
    $     matches the end of a line
(Yes, this is very confusing!)

Repeat for a count
Use curly brackets to show that a character repeats a specific number (or range) of times.
Find an EcoRI fragment of 100 to 500 bp length (two EcoRI sites with any other sequence between):
    if ($ecofrag =~ /GAATTC[GATC]{100,500}GAATTC/)
The + sign is used to indicate an unlimited number of repeats (occurs 1 or more times).

    my $mystring;
    $mystring = "Hello world!";
    if ($mystring =~ m/World/) { print "Yes"; }     # no match: the pattern is case sensitive
    if ($mystring =~ m/World/i) { print "Yes"; }    # match: /i ignores case

Grabbing parts of a string
Regular expressions can do more than just ask "if" questions. They can be used to extract parts of a line of text into variables. Check this out:
    /^>(\w+)\s(.+)$/;
Complete gibberish, right? It means:
    - look for the > sign at the beginning of a FASTA formatted sequence file
    - dump the first word (\w+) into variable $1 (the sequence ID)
    - after a space, dump the rest of the line (.+), until you reach the end of line $, into variable $2 (the description)
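The pattern can be wrapped in a small subroutine and tried on a sample line. This is a sketch: the helper name parse_fasta_header and the header line are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split a FASTA header into (ID, description).
# Returns an empty list if the line does not look like a header.
sub parse_fasta_header {
    my ($line) = @_;
    chomp $line;
    if ($line =~ /^>(\w+)\s(.+)$/) {
        return ($1, $2);     # $1 and $2 hold the two parenthesized parts
    }
    return;
}

# Made-up header line for illustration.
my ($id, $desc) = parse_fasta_header(">seq1 hypothetical protein\n");
print "ID: $id\n";
print "Description: $desc\n";
```

Because \w does not match punctuation, an ID containing characters such as | or . would need a different pattern, for example (\S+).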

    $mystring = "[2004/04/13] The date of this article.";
    if ($mystring =~ m/(\d)/) { print "The first digit is $1."; }
    if ($mystring =~ m/(\d+)/) { print "The first number is $1."; }
    if ($mystring =~ m/(\d+)\/(\d+)\/(\d+)/) { print "The date is $1-$2-$3"; }
    # With /g, each pass through the loop finds the next number
    while ($mystring =~ m/(\d+)/g) {
        print "Found number $1.";
    }
    # In list context, /g returns all the matches at once
    @myarray = ($mystring =~ m/(\d+)/g);
    print join(",", @myarray);

Download and install programs
Unzip or untar:
    unzip
    If the file is file.tar.gz: tar xvfz file.tar.gz
Go to the directory and run ./configure
Then run make

System subroutine
    system("ls -ltr");

    print "Please input file name:\n";
    my $fname = <STDIN>;
    chomp $fname;              # remove the newline typed after the name
    my @dnas = ReadFasta($fname);
    my $len = $#dnas + 1;
    # Try every combination of three sequences
    for (my $i = 0; $i < $len; $i++) {
        for (my $j = $i+1; $j < $len; $j++) {
            for (my $k = $j+1; $k < $len; $k++) {
                $fname = "$i\_$j\_$k";
                print $fname;
                open(OUT, ">$fname");
                print OUT $dnas[$i];
                print OUT $dnas[$j];
                print OUT $dnas[$k];
                close OUT;
                system("./clustalw2 $i\_$j\_$k");
            }
        }
    }

Perl debugger
    perl -d program arguments
    n: next line
    s: step in
    r: run until the end of the current sub
    Enter: repeat the last n or s
    c: continue to the next breakpoint

Check source
    l                 List next several lines
    l 8-10            List lines 8-10
    l 100             List line 100
    l subname         List subroutine subname
    f restrict.pl     Switch to view restrict.pl

Breakpoint
    b 100        Add a breakpoint at line 100 of the current file
    b subname    Add a breakpoint at this subroutine
    B            Remove a breakpoint
    B 100        Remove the breakpoint at line 100
    B *          Remove all breakpoints

See variable
    p $var    Print the value of the variable
    y var     Display my variable
    V         Display variables
    V var
    w $var    Watch this variable; stop when its value changes

Finding Regulatory Motifs in DNA Sequences

Motifs and Transcriptional Start Sites
    gene ATCCCG
    gene TTCCGG
    gene ATCCCG
    gene ATGCCG
    gene ATGCCC

Motif Logo
Motifs can mutate on non-important bases. The five motifs in five different genes have mutations in positions 3 and 5. Representations called motif logos illustrate the conserved and variable regions of a motif.
    TGGGGGA
    TGAGAGA
    TGGGGGA
    TGAGAGA
    TGAGGGA
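The conserved and variable positions can be found by counting the bases in each column of the five motifs. The sketch below does only that counting; it does not draw a logo, and the subroutine names column_counts and consensus are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: count each base at each position of equal-length motifs.
# Returns one hash reference (base => count) per position.
sub column_counts {
    my @motifs = @_;
    my @counts;
    for my $motif (@motifs) {
        for my $i (0 .. length($motif) - 1) {
            $counts[$i]{ substr($motif, $i, 1) }++;
        }
    }
    return @counts;
}

# Hypothetical helper: most frequent base per column.
# Ties go to the alphabetically first base.
sub consensus {
    my @counts = column_counts(@_);
    my $result = '';
    for my $col (@counts) {
        my ($best) = sort { $col->{$b} <=> $col->{$a} or $a cmp $b } keys %$col;
        $result .= $best;
    }
    return $result;
}

my @motifs = qw(TGGGGGA TGAGAGA TGGGGGA TGAGAGA TGAGGGA);
my @counts = column_counts(@motifs);
for my $i (0 .. $#counts) {
    my $col = $counts[$i];
    print $i + 1, ': ', join(' ', map { "$_=$col->{$_}" } sort keys %$col), "\n";
}
print "Consensus: ", consensus(@motifs), "\n";
```

Positions 3 and 5 are the only columns with more than one base, which agrees with the slide; a logo would draw those two columns with shorter, mixed letters.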

meme.sdsc.edu/meme/cgi-bin/meme.cgi

Sequence Alignment

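Global Alignment
MSCS 230: Bioinformatics I - Pairwise Sequence Alignment
Each cell F(i,j) is computed from three neighbouring cells:
    from F(i-1,j-1), adding s(x_i, y_j): move ahead in both (x_i aligned to y_j)
    from F(i-1,j), subtracting d: x_i aligned to gap
    from F(i,j-1), subtracting d: y_j aligned to gap
$$F(i,j) = \max \begin{cases} F(i-1,j-1) + s(x_i, y_j) \\ F(i-1,j) - d \\ F(i,j-1) - d \end{cases}$$
While building the table, keep track of where the optimal score came from (reverse arrows).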
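The table-filling step can be sketched in a few lines of Perl. This computes only the optimal score, not the alignment itself (no arrows are stored). The scoring is an assumption chosen to keep the sketch short: +1 for a match, -1 for a mismatch, and a gap penalty d passed in by the caller. The slides' own example uses a substitution matrix instead.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Sketch of the global alignment score.
# Assumed scoring: +1 match, -1 mismatch, gap penalty $d (a positive number).
sub global_score {
    my ($x, $y, $d) = @_;
    my ($n, $m) = (length($x), length($y));
    my @F;
    # Boundary: a prefix aligned entirely to gaps costs d per position.
    $F[$_][0] = -$d * $_ for 0 .. $n;
    $F[0][$_] = -$d * $_ for 0 .. $m;
    for my $i (1 .. $n) {
        for my $j (1 .. $m) {
            my $s = substr($x, $i - 1, 1) eq substr($y, $j - 1, 1) ? 1 : -1;
            $F[$i][$j] = max(
                $F[$i - 1][$j - 1] + $s,    # x_i aligned to y_j
                $F[$i - 1][$j] - $d,        # x_i aligned to a gap
                $F[$i][$j - 1] - $d,        # y_j aligned to a gap
            );
        }
    }
    return $F[$n][$m];
}

print global_score('GATTACA', 'GATTACA', 2), "\n";
print global_score('AC', 'AGC', 2), "\n";
```

To recover the alignment as well, each cell would also record which of the three cases gave the maximum, and the path would be followed backwards from F(n,m).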

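Example
[Table: alignment of HEAGAWGHEE (columns) against PAWHEAE (rows). The recoverable first-column values are A -16, W -24, H -32, E -40, A -48, E -56. A substitution score table for the residues A, E, G, H, P, W accompanies it.]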

Completed Table
[Table: completed alignment table for HEAGAWGHEE (columns) against PAWHEAE (rows).]

A variant of the basic algorithm:
Maybe it is OK to have an unlimited number of gaps at the beginning and end:
    CTATCACCTGACCTCCAGGCCGATGCCCCTTCCGGC
    GCGAGTTCATCTATCAC--GACCGC--GGTCG
Then, we don't want to penalize gaps at the ends.

The Local Alignment Recurrence
The largest value of s_{i,j} over the whole edit graph is the score of the best local alignment.
The recurrence:
$$s_{i,j} = \max \begin{cases} 0 \\ s_{i-1,j-1} + \delta(v_i, w_j) \\ s_{i-1,j} + \delta(v_i, -) \\ s_{i,j-1} + \delta(-, w_j) \end{cases}$$
Notice that the only change from the original recurrence of a Global Alignment is the extra 0 term.
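A sketch of this recurrence in Perl, returning only the best local score. The scoring values are assumptions for illustration: delta is +1 for a match, -1 for a mismatch, and minus the gap penalty for aligning a character to a gap.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Sketch of the local alignment score.
# Assumed scoring: +1 match, -1 mismatch, -$gap for a character against a gap.
sub local_score {
    my ($v, $w, $gap) = @_;
    my @s;
    my $best = 0;
    for my $i (0 .. length($v)) {
        for my $j (0 .. length($w)) {
            # First row and column are 0: an alignment may start anywhere.
            if ($i == 0 || $j == 0) {
                $s[$i][$j] = 0;
                next;
            }
            my $delta = substr($v, $i - 1, 1) eq substr($w, $j - 1, 1) ? 1 : -1;
            $s[$i][$j] = max(
                0,                              # start a new local alignment here
                $s[$i - 1][$j - 1] + $delta,    # v_i aligned to w_j
                $s[$i - 1][$j] - $gap,          # v_i aligned to a gap
                $s[$i][$j - 1] - $gap,          # w_j aligned to a gap
            );
            # The answer is the largest value anywhere in the table.
            $best = $s[$i][$j] if $s[$i][$j] > $best;
        }
    }
    return $best;
}

print local_score('TTTACGTTT', 'GGGACGGGG', 2), "\n";
```

The two test strings share only the stretch ACG, so the best local alignment under this scoring is those three matches.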

Affine Gap Penalties
In nature, a series of k indels often comes as a single event rather than a series of k single nucleotide events. Normal scoring would give the same score for both alignments.
One gap of length k: this is more likely.
k separate single-nucleotide gaps: this is less likely.
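A common affine form charges d for opening a gap and e for each further position, so a gap of length n costs d + (n - 1) e. The sketch below compares one long gap with several single gaps under that form; the values d = 8 and e = 1 and the name affine_gap_cost are assumptions for illustration.

```perl
use strict;
use warnings;

# Sketch: cost of one gap of length $n with open penalty $d and extend penalty $e.
sub affine_gap_cost {
    my ($n, $d, $e) = @_;
    return 0 if $n <= 0;             # no gap, no cost
    return $d + ($n - 1) * $e;
}

my ($d, $e) = (8, 1);                # assumed penalties
my $k = 3;
my $one_long = affine_gap_cost($k, $d, $e);        # one event of length k
my $k_single = $k * affine_gap_cost(1, $d, $e);    # k separate events
print "one gap of length $k costs $one_long\n";
print "$k gaps of length 1 cost $k_single\n";
```

With a linear penalty both cases would cost the same; with the affine form the single long gap is cheaper, which matches the biological intuition on the slide.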

Affine gaps
$$\gamma(n) = d + (n - 1) \times e$$
where d is the gap open penalty and e is the gap extend penalty.
To compute optimal alignment,