Published by Doris Crawford. Modified over 9 years ago.
1
Introduction to PERL
Genetics 875, 2012

PERL is a language that is easy to use and was designed to do certain tasks (like reading, writing, and moving text and sequences) very well.

Advantages of PERL:
- it's intuitive
- it's easy to get started: you don't need to know everything initially
- it's very good at reading and manipulating files, sequences, and text
- there is usually more than one way to accomplish a task

Disadvantages of PERL:
Perl programs differ from compiled programs in that the program you write is "run" by another program, which interprets your code. This interpreter is itself called perl, so your programs will be run by the perl interpreter. Because of this, your code is one level removed from the actual computer, and Perl programs generally run slower than equivalent programs written in languages like C, C++, and Java. Thus, Perl is not used much for tasks that require heavy computation.
2
A great way to learn PERL: "Learning Perl"
http://www.oreilly.com/catalog/lperl3/

Also, some great online resources:
http://www.perl.com/
A short PERL tutorial: http://archive.ncsa.uiuc.edu/General/Training/PerlIntro/

And lots of other help on the web.
3
Like any language, programming languages have structure

A book has words, sentences, paragraphs, chapters, and punctuation linking them all together.

ENGLISH      PERL
Noun         Scalar, variable
Verb         Function, command
Phrase       Statement, expression
Paragraph    Loop
Chapter      Subroutines, packages, modules
4
A variable is a container that can hold information that has the potential to vary

Variables can be singular (scalars), in which case they are identified by a "$" in front of the variable name:
eg) $x   $File1   $StudentName

A scalar can hold a number, a letter (called a "character" in perl), or a string of numbers or letters. Just remember that whatever it is, it is considered a single item.
eg) $x = 5;
    $motif = "GATTAC";
    $StudentName = "Rutabega";

Variables can also be plural, and those come in different forms: Arrays and Hashes
5
Arrays are a list of single variables

An array is a container that holds a list of separate, single variables in a specific order. An array is denoted by a @ in front of its name.

Eg) @StudentNames = ("Caligula", "Randolph", "Imelda");

"Caligula" is stored in the first 'cell' of the array, which is the "0" cell
"Randolph" is stored in the second 'cell' of the array, which is the "1" cell
"Imelda" is stored in the third 'cell' of the array, which is the "2" cell

** Note that in Perl, as in most programming languages, you start counting at "0" instead of at "1"

Position in array:               0          1          2
Value stored at that position:   Caligula   Randolph   Imelda
6
Arrays are a list of single variables (continued)

You can 'call' a specific cell (which, remember, is a singular variable identified by $):

$StudentNames[0] is "Caligula"
$StudentNames[1] is "Randolph"

The $ tells perl that you want a singular variable. The square brackets tell perl that you are looking at a single cell in an array. Between these two parts of the name, perl knows this is a cell of an array.
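The cell-by-cell access described on this slide can be tried out with a short script. This is only a sketch that reuses the slide's names; `scalar(@StudentNames)` and `$#StudentNames` are standard Perl features that the slides do not cover, and the variable names `$first`, `$third`, `$count`, and `$last_index` are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The array from the slide: three names stored in a fixed order.
my @StudentNames = ("Caligula", "Randolph", "Imelda");

# A single cell is a scalar, so it is read with $ and square brackets.
my $first = $StudentNames[0];   # cell 0 holds the first name
my $third = $StudentNames[2];   # cell 2 holds the third name

# scalar(@array) gives the number of cells; $#array gives the last index.
my $count      = scalar(@StudentNames);
my $last_index = $#StudentNames;

print "First: $first, third: $third\n";
print "The array has $count cells; the last index is $last_index\n";
```

Because counting starts at 0, the last index is always one less than the number of cells.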
7
Exercise 0: Write your first perl program!

We will start by creating a simple perl program where you will print a string to the screen. A few things about writing perl programs:

-- The first few lines of the program (which you'll write in a .pl text file) will contain information for the computer about how to run the program.

-- In order for the perl interpreter to understand your code, you must use the right syntax. Like in English, each phrase must have an obvious start and stop point. The most common punctuation in perl is ";" which acts like a period does in English. A statement begins after the ";" from the previous statement, and ends at the next ";".

There are other kinds of punctuation that define statements and items, like ( ... ), { ... }, and " ... ", and we'll get to these in a bit.

One useful punctuation mark is #, which means "ignore the rest of this line". It is useful because you can type in notes to yourself ("comments") that aren't part of the code.
8
Exercise 0: Write your first perl program!

1. Open the terminal on your computer and go to the desktop. Use the unix command "cd" to change directories (type each command shown on its own line):
    cd Desktop

2. We will use the text editor "emacs" to create and write your file:
    emacs FirstProgram.pl
   This should open a blank file, since you just created it.

3. Type this in the first line of your .pl file:
    #!/usr/bin/perl
   This special line tells the computer to use the perl interpreter to read and execute your program.

4. We will use a special mode of perl called "strict". To do that, type this on the second line of your .pl file:
    use strict;

5. Save your file using the emacs command "Ctrl x Ctrl s" (ie, hold down the Ctrl key and hit x, then s).

You are now ready to start writing your own code!
9
Exercise 0: Write your first perl program!

6. Print a sentence to the screen using the built-in perl "print" function:
    print "Hello. Welcome to your first perl program \n";
   By default, the print function prints to the screen from where you ran the program. We will learn later how you can print to a file.
   Note the "\n" at the end of this print statement. "\n" stands for "new-line character". It adds a 'return' to the end of your statement to end the line.

7. Save your file using the emacs command "Ctrl x Ctrl s" (ie, hold down the Ctrl key and hit x, then s).

You've now written your first perl program. To run your program, open another terminal window. You will call the perl interpreter and then feed it your program file name:
    perl FirstProgram.pl

You will either see your sentence on the screen, or you will get some kind of error ...
10
#!/usr/bin/perl
use strict;

print "Hello. Welcome to your first perl program \n";
11
Exercise 1: Modify your first perl program

We will create and define a string variable and an array.

6. You will add code to your existing program. Make a variable called Name:
    my $Name;
   * Since we are using "strict" mode, you must declare a variable before you use it. You do that by typing "my" in front of the variable, only when you create the variable (ie, the first time you ever type it).

7. Define the variable $Name to be your own name:
    $Name = "Audrey";

8. Create an array called FavoriteHolidays:
    my @FavoriteHolidays;

9. Define the array as your top 3 favorite holidays, exactly as below:
    @FavoriteHolidays = ("Halloween", "Christmas", "Arbor Day");
12
Exercise 1: Modify your first perl program

10. Print the variables you just defined to the screen using the built-in perl "print" function:
    print "The top favorite holiday for $Name is $FavoriteHolidays[0]\n";

11. Save your program by typing Ctrl x Ctrl s

12. Exit emacs by typing Ctrl x Ctrl c

To run your program, open another terminal window. You will call the perl interpreter and then feed it your program file:
    perl FirstProgram.pl

You will either see your name and holiday, or you will get some kind of error ...
13
#!/usr/bin/perl
use strict;

print "Hello. Welcome to your first perl program \n";

my $Name;
$Name = "Audrey";

my @FavoriteHolidays;
@FavoriteHolidays = ("Halloween", "Christmas", "Arbor Day");

print "The top favorite holiday for $Name is $FavoriteHolidays[0]\n";
14
Hashes are fancy containers for single variables

Whereas an array indexes variables by their position in the list:

Position in array:               0          1          2
Value stored at that position:   Caligula   Randolph   Imelda

a hash indexes one variable by another (known as a 'key'). For example, name and hometown:

Key in hash:                     Caligula   Randolph   Imelda
Value stored with that key:      Rome       Berlin     Manila

A hash is denoted by %. To call the individual values contained in the hash, you need the key name:

    my %HomeTowns;
    $HomeTowns{"Caligula"} = "Rome";

The $ is for calling a single variable; the curly brackets tell you it's a hash.
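A small sketch of the hometown hash from this slide may make key-based lookup more concrete. The `exists` check and the `sort keys` loop are standard Perl but are additions not shown on the slide, and the key "Nero" is a made-up example of a name that is not in the hash.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the hash from the slide: name (key) => hometown (value).
my %HomeTowns;
$HomeTowns{"Caligula"} = "Rome";
$HomeTowns{"Randolph"} = "Berlin";
$HomeTowns{"Imelda"}   = "Manila";

# Look up one value by its key.
my $town = $HomeTowns{"Imelda"};
print "Imelda is from $town\n";

# exists() checks whether a key is present without creating it.
my $known = exists $HomeTowns{"Nero"} ? "yes" : "no";
print "Is Nero in the hash? $known\n";

# Hash keys have no guaranteed order, so sort them for stable output.
foreach my $name (sort keys %HomeTowns) {
    print "$name => $HomeTowns{$name}\n";
}
```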
15
Exercise 2: Create and use a Hash

1. You will add code to your existing program. Make a hash called HolidayMonth:
    my %HolidayMonth;

2. Define the hash, with the key = holiday and the stored value = the month:
    $HolidayMonth{"Halloween"} = "October";
    $HolidayMonth{"Christmas"} = "December";
    $HolidayMonth{"Arbor Day"} = "April";

3. Print the month of the top holiday:
    print "The top favorite holiday for $Name is $FavoriteHolidays[0] in $HolidayMonth{Halloween} \n";
16
#!/usr/bin/perl
use strict;

print "Hello. Welcome to your first perl program \n";

my $Name;
$Name = "Audrey";

my @FavoriteHolidays;
@FavoriteHolidays = ("Halloween", "Christmas", "Arbor Day");

my %HolidayMonth;
$HolidayMonth{"Halloween"} = "October";
$HolidayMonth{"Christmas"} = "December";
$HolidayMonth{"Arbor Day"} = "April";

print "The top favorite holiday for $Name is $FavoriteHolidays[0] in $HolidayMonth{Halloween} \n";
17
Perl has a lot of built in functions and 'operators'

These operators work on numbers. In the examples, $x = 2; $y = 3;

+    means add                  $x + 5          is 7
-    means subtract             $y - 3          is 0
*    means multiply             $x * 3          is 6
/    means divide               ($x * 3) / 2    is 3
++   means increase by 1        $y++            makes $y 4
=    is the assignment operator (sets a variable equal to something)
==   evaluates numeric equality

There are different operators for strings. In the examples, $x = 123; $y = 456; $z = 3;

.    means concatenate two strings    $x . $y    is 123456
x    means replicate a string         $z x 4     is 3333
eq   evaluates string equality
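The operator examples on this slide can be checked by running them. The sketch below uses the slide's values; the variable names (`$sum`, `$joined`, and so on) and the final `==` versus `eq` comparison are illustrative additions, not part of the original slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Numeric operators, using the slide's values.
my $x = 2;
my $y = 3;
my $sum        = $x + 5;          # 7
my $difference = $y - 3;          # 0
my $product    = $x * 3;          # 6
my $quotient   = ($x * 3) / 2;    # 3
$y++;                             # $y is now 4

# String operators, using the slide's values.
my $s1 = "123";
my $s2 = "456";
my $s3 = "3";
my $joined   = $s1 . $s2;         # "123456"
my $repeated = $s3 x 4;           # "3333"

# == compares as numbers, eq compares as strings, so they can disagree.
my $as_numbers = (1.0 == 1)     ? "equal" : "different";
my $as_strings = ("1.0" eq "1") ? "equal" : "different";

print "sum=$sum difference=$difference product=$product quotient=$quotient y=$y\n";
print "joined=$joined repeated=$repeated\n";
print "1.0 vs 1 as numbers: $as_numbers; as strings: $as_strings\n";
```

Picking `==` or `eq` is up to you: Perl converts between numbers and strings as needed, so the operator decides how the two values are compared.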
18
Conditional statement

Often you only want to do something if a certain condition is true. This is a case for if/unless/else statements.

"If $x is equal to 5, then do something" translates to:

    if ($x == 5) {
        something ...
    }
19
Conditional statement

    if ($x == 5) {
        something ...
    }

-- Parentheses define the start and stop of the condition.
-- == tests whether $x is numerically equal to 5. If you type if ($x = 5), it will instead set $x to 5, and the condition will automatically be true.
-- Curly brackets define what to do if the conditional statement is true.
20
Conditional statement

Can also use if/else statements:

    if ($x == 5) {
        something ...
    } else {
        do something different ...
    }

OR if/elsif statements:

    if ($x == 5) {
        something ...
    } elsif ($x < 10) {
        do something different ...
    }

The program will evaluate the condition in ( ... ). If it is true, it will do what's in { ... }. If it is false, it will SKIP what's in { ... } and move on to the next elsif or else (if there is one), or otherwise resume on the line after that section.
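To see which branch runs for different values, the if/elsif/else pattern can be wrapped in a small subroutine. This is a sketch only: the subroutine name `describe` and the returned messages are hypothetical, and subroutines themselves are not introduced in these slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a message saying which branch matched for a given number.
sub describe {
    my ($x) = @_;
    if ($x == 5) {
        return "exactly 5";
    } elsif ($x < 10) {
        return "less than 10, but not 5";
    } else {
        return "10 or more";
    }
}

# Try one value for each branch.
foreach my $value (5, 7, 12) {
    print "$value is ", describe($value), "\n";
}
```

Only the first branch whose condition is true runs, so 5 never reaches the `elsif` even though 5 is also less than 10.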
21
Conditional statement

The 'while' statement is useful: do something while (some condition is true).

    my $count = 0;
    while ($count < 100) {
        do some function ...
        $count++;
    }

Remember that ++ is the "increment by one" operator, so each time you go through the loop, $count increases by one. If you forget to increase $count and it stays at 0, you will be in an infinite loop.

Note that a while statement is a kind of loop. It turns out to be very useful for reading in files ...
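A runnable version of this counting loop is sketched below. The running total `$total` and the `+=` operator are additions for illustration; the slide's loop body was left as a placeholder.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $count = 0;
my $total = 0;

# Keep looping while the condition is true.
while ($count < 100) {
    $total += $count;   # add the current count to the running total
    $count++;           # without this line the loop would never end
}

print "Looped $count times; the total of 0 through 99 is $total\n";
```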
22
Repeating actions: Loops

Very often, you want to repeat the same function many times (often on different variables). For example:
-- open a file of microarray data
-- read in each line of the file
-- divide the 3rd cell of data by some constant
-- save the file

    for (my $i = 0; $i < 10; $i++) {
        do something ...
    }

There are 3 components of a "for loop". Here $i acts as a counter.
23
Repeating actions: Loops

The 3 components of the "for loop" for (my $i = 0; $i < 10; $i++) are:
-- my $i = 0   creates a new variable to use as a counter (you usually start that counter off at 0)
-- $i < 10     do whatever is in { ... } as long as $i < 10
-- $i++        after each loop, increment $i by one (using the ++ operator)
24
Repeating actions: Loops

An important concept: scope. If you create a variable inside a loop, it is a "local" variable: it only exists while you're in the loop (in this case, $i is a local variable). If you want a variable that is "global", ie, one that exists for the duration of the program, be sure to declare it outside of any loops.
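The scope rule can be illustrated with a short sketch. The names `$sum` and `$square` are made up for this example: `$sum` is declared outside the loop and survives it, while `$i` and `$square` exist only inside the loop.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sum = 0;   # declared outside the loop, so it is still visible afterwards

for (my $i = 0; $i < 10; $i++) {
    my $square = $i * $i;   # declared inside the loop: a fresh variable on every pass
    $sum += $square;
}

print "Sum of the squares of 0 through 9 is $sum\n";

# Using $i or $square here would be a compile-time error under "use strict",
# because neither variable exists outside the loop.
```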
25
Exercise 3: using loops

#!/usr/bin/perl
use strict;

print "Hello. Welcome to your first perl program \n";

my $Name;
$Name = "Audrey";

my @FavoriteHolidays;
@FavoriteHolidays = ("Halloween", "Christmas", "Arbor Day");

my %HolidayMonth;
$HolidayMonth{"Halloween"} = "October";
$HolidayMonth{"Christmas"} = "December";
$HolidayMonth{"Arbor Day"} = "April";

# Step through the array by position: $i takes the values 0, 1, 2
for (my $i = 0; $i < 3; $i++) {
    print "Number $i favorite holiday for $Name is $FavoriteHolidays[$i]\n";
}
26
File Handling: talking to the outside world

You can open existing files to read in data, and you can create new files to write to, using "open":

    open (HANDLE, "FileName.txt");

HANDLE is a shorthand file handle, and "FileName.txt" is the actual file name. The default is to open the file read-only.
27
File Handling: talking to the outside world

    open (HANDLE, ">FileName.txt");

HANDLE is a shorthand file handle, and "FileName.txt" is the actual file name. The ">" means it's a writable file; if the file already exists, its old contents are replaced.

Create a new file and print to it to save your data:

    open (SF, ">SaveFile.txt");
    print SF "$x";
28
Exercise 4: print results to a file

#!/usr/bin/perl
use strict;

print "Hello. Welcome to your first perl program \n";

my $Name;
$Name = "Audrey";

my @FavoriteHolidays;
@FavoriteHolidays = ("Halloween", "Christmas", "Arbor Day");

my %HolidayMonth;
$HolidayMonth{"Halloween"} = "October";
$HolidayMonth{"Christmas"} = "December";
$HolidayMonth{"Arbor Day"} = "April";

# Open the output file once, before the loop
open (SF, ">SaveFile.txt");

for (my $i = 0; $i < 3; $i++) {
    print SF "Number $i favorite holiday for $Name is $FavoriteHolidays[$i]\n";
}

Notice that SF is opened outside the loop, so that the file is globally accessible. Opening it inside the loop with ">" would start the file over on every pass.
29
Reading in a file: combining file handling and the while statement

    open (FILE, "FileName.txt");
    while (my $line = <FILE>) {
        print "$line\n";
    }
30
Reading in a file: combining file handling and the while statement

    open (FILE, "FileName.txt");
    while (my $line = <FILE>) {
        print "$line\n";
    }

-- my $line creates a variable to hold each line of the file
-- <FILE> is the line input operator: it reads one line of the file at a time, and the loop runs while there are more lines in FILE

Another useful thing: <STDIN> is the standard way of getting information from the user, and it tells the program to wait until the user enters some information. Here's an example:

    print "Hello user. What is your favorite color: ";
    my $answer = <STDIN>;
    chomp($answer);

When the user enters the data, a \n (return) character will be stuck onto the end of what perl takes as input. Usually, you don't want that, so you can use the 'chomp' function, which removes a trailing newline from a string. You would probably want to do this on $line in the example above as well.
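Writing a file and reading it back can be combined in one self-contained sketch. Note the differences from the slides, which are choices made here rather than the original method: it uses the three-argument form of `open` with a variable as the file handle, and it adds `or die` so a failed open is reported. The file name `example_lines.txt` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $file = "example_lines.txt";   # hypothetical file name for this sketch

# Write two lines so the example has something to read.
open(my $out, ">", $file) or die "Cannot write $file: $!";
print $out "first line\n";
print $out "second line\n";
close($out);

# Read the file back one line at a time.
my @lines;
open(my $in, "<", $file) or die "Cannot read $file: $!";
while (my $line = <$in>) {
    chomp($line);            # remove the trailing newline, if there is one
    push(@lines, $line);     # store the cleaned line at the end of the array
}
close($in);

print "Read ", scalar(@lines), " lines; the first is '$lines[0]'\n";

unlink($file);               # remove the example file again
```

Without the `chomp`, each stored line would still end in "\n", and printing it with an extra "\n" would produce blank lines in the output.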
31
Regular expressions: comparing sequences

These are some of the most useful functions in PERL. They allow you to easily scan your sequence, search for substrings, substitute, transliterate, etc.

=~ is the operator that binds a string to a regular expression.

=~ m is the match operator, used to search for a match to some sequence:

    $sequence = "CCATATAGAGATGAGCCTATA";
    if ($sequence =~ m/GATGAG/) {
        print "sequence contains GATGAG\n";
    }
32
Regular expressions: comparing sequences

=~ s is the substitution ("swap") operator, used to swap one word for another:

    $sequence = "CCATATAGAGATGAGCCTATA";
    $sequence =~ s/GATGAG/nnnnnn/;

This will convert the sequence CCATATAGAGATGAGCCTATA to CCATATAGAnnnnnnCCTATA
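Matching and substitution can be tried side by side on the slide's sequence. In this sketch, the built-in `index` function and the copy held in `$masked` are additions not shown on the slides; copying first is one way to keep the original sequence unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $sequence = "CCATATAGAGATGAGCCTATA";

# m// only tests for a match; it does not change $sequence.
my $has_motif = ($sequence =~ m/GATGAG/) ? "yes" : "no";

# index() reports where the first occurrence starts, counting from 0.
my $position = index($sequence, "GATGAG");

# Copy the sequence first, then substitute in the copy,
# so the original stays intact.
my $masked = $sequence;
$masked =~ s/GATGAG/nnnnnn/;

print "Contains GATGAG? $has_motif (first match at position $position)\n";
print "Original: $sequence\n";
print "Masked:   $masked\n";
```

By default s/// replaces only the first match; adding the standard `g` modifier, as in s/GATGAG/nnnnnn/g, replaces every match.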
33
Regular expressions: comparing sequences

=~ tr is the transliteration operator, used to change one character into another:

    $sequence =~ tr/GATC/CTAG/;

This function is useful in conjunction with the built-in "reverse" function:

    my $sequence = "GGATCCAA";
    my $newsequence = reverse($sequence);   # $newsequence is now AACCTAGG
    $newsequence =~ tr/GATC/CTAG/;          # $newsequence is now TTGGATCC

$newsequence is now the reverse complement of $sequence
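The reverse-then-transliterate recipe packages naturally into a reusable subroutine. This is a minimal sketch: the subroutine name `reverse_complement` is hypothetical, and the handling of lower-case letters is an addition to the slide's tr/GATC/CTAG/.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the reverse complement of a DNA sequence.
sub reverse_complement {
    my ($sequence) = @_;
    # Assigning to a scalar makes reverse() reverse the characters of the string.
    my $revcomp = reverse($sequence);
    # Swap each base for its complement; lower-case bases are handled too.
    $revcomp =~ tr/GATCgatc/CTAGctag/;
    return $revcomp;
}

my $sequence = "GGATCCAA";
my $result   = reverse_complement($sequence);
print "Reverse complement of $sequence is $result\n";
```

Characters that are not listed in the tr pattern, such as N, pass through unchanged.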
34
Exercise 4: open and read a Fasta file

1. Create a new file called ReadFasta.pl
    emacs ReadFasta.pl

2. Type the usual stuff at the top of the file
    #!/usr/bin/perl
    use strict;

3. Open the file upstreams.fasta and read in the data using the 'while' statement
    open (FILE, "upstreams.fasta");
    while (my $line = <FILE>) {
        print "line = $line\n";
    }

4. Save the file: Ctrl x Ctrl s

5. Run the file: perl ReadFasta.pl
35
#!/usr/bin/perl
use strict;

open (FILE, "upstreams.fasta");

# Read the file one line at a time and print each line
while (my $line = <FILE>) {
    print "line is $line\n";
}
36
Exercise 4: open and read a Fasta file

6. You will store the fasta sequence data in a hash. Go back into your program and create a hash to hold the FASTA sequence. Then create a scalar $gene to hold the gene name:
    my %Fasta;
    my $gene;

7. In the while statement, evaluate each line to see if it is a name or sequence. A fasta file has >NAME\n followed by sequence:
    if ($line =~ m/>/) {
        $gene = $line;
    }

8. Now you know that the subsequent lines must be sequence. Store that in the hash:
    else {
        $Fasta{$gene} = $Fasta{$gene} . $line;
    }

Note what we are doing: we expect >NAME to come before sequence, but the sequence could extend for multiple lines in the file. Therefore, we need to concatenate sequence from multiple lines, hence the "." operator to concatenate strings.
37
>YNL313C
TATGTATATGCTTAAACTAGCCTGTTCTAGATAGTCGCTATCGATTTTGCCACATTA
CCACCTTAAGTTGATATAATATTGCTTATTATAAAGGAAAGAACGCGTTTCCTAAC
TTCGTATATGGCGATAATTATCTAAGAAACTTCGCATCGTGAAAAAAAAGATGAAA
AAAATGGAAGCTCATCGAGGCCAAAGGAATTGCTAAAAAGAAGCTATCAGACCAGG
AAGTAAACTAGTGGTTGCAAAATT

For Line 1: $line will contain >, so $gene gets set to >YNL313C (and remains this until $gene is reset).

For Line 2: $line will NOT contain > and is therefore assumed to be sequence, so $Fasta{$gene} = $line at Line 2.

For Line 3: $line will NOT contain >, BUT if we set $Fasta{$gene} = $line at Line 3, we will LOSE the previous sequence. So, concatenate with the previous sequence:
    $Fasta{$gene} = $Fasta{$gene} . $line;
(remember that the right side gets evaluated FIRST, then the left side gets set equal to it)
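The line-by-line logic walked through above can be tested without a file by feeding it a few lines from an array. This is a sketch under stated assumptions: the gene names `geneA` and `geneB` and their sequences are invented, and it departs from the slides in three ways (it calls `chomp` on each line, strips the leading ">" from the name, and uses the `.=` shorthand for concatenation).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Invented FASTA-style lines standing in for the contents of a file.
my @lines = (
    ">geneA\n",
    "ACGT\n",
    "TTGA\n",
    ">geneB\n",
    "GATGCA\n",
);

my %Fasta;
my $gene;

foreach my $line (@lines) {
    chomp($line);                  # drop the trailing newline
    if ($line =~ m/^>(.*)/) {
        $gene = $1;                # the name is everything after the ">"
        $Fasta{$gene} = "";        # start with an empty sequence
    } else {
        $Fasta{$gene} .= $line;    # same as $Fasta{$gene} = $Fasta{$gene} . $line
    }
}

foreach my $g (sort keys %Fasta) {
    print "gene is $g and sequence is $Fasta{$g}\n";
}
```

Because each line is chomped before it is stored, a sequence that spans several lines ends up as one continuous string with no newlines inside it.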
38
Exercise 4: open and read a Fasta file

Next, you'll search through the upstream sequence of each gene for a consensus sequence.

9. We need a way to search through all of the sequences, indexed by genes. We will use the "foreach" method of looping. Because the elements of a hash are not stored in any special order, we will step through each 'key' in the hash:
    foreach my $g (keys %Fasta) {
        print "gene is $g and sequence is $Fasta{$g}\n";
    }
39
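#!/usr/bin/perl
use strict;

open (FILE, "upstreams.fasta");

my %Fasta;
my $gene;

while (my $line = <FILE>) {
    if ($line =~ m/>/) {
        # A line containing > is a gene name
        $gene = $line;
    }
    else {
        # Any other line is sequence: append it to the current gene
        $Fasta{$gene} = $Fasta{$gene} . $line;
    }
}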
40
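#!/usr/bin/perl
use strict;

open (FILE, "upstreams.fasta");

my %Fasta;
my $gene;

while (my $line = <FILE>) {
    if ($line =~ m/>/) {
        # A line containing > is a gene name
        $gene = $line;
    }
    else {
        # Any other line is sequence: append it to the current gene
        $Fasta{$gene} = $Fasta{$gene} . $line;
    }
}

# Step through every gene stored in the hash
foreach my $g (keys %Fasta) {
    print "gene is $g and sequence is $Fasta{$g}\n";
}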
41
Exercise 4: open and read a Fasta file

Next, you'll search through the upstream sequence of each gene for a consensus sequence. You will make a new hash to store the sequence matches.

10. First create the new hash, %Matches:
    my %Matches;

Next, within your loop, search each upstream sequence for the motif GATGC.

11. If there is a match, set the value to GATGC; else set the value to "no match":
    foreach my $g (keys %Fasta) {
        if ($Fasta{$g} =~ m/GATGC/i) {
            $Matches{$g} = "GATGC";
        }
        else {
            $Matches{$g} = "no match";
        }
    }

The little i after the pattern means do a case-insensitive search.
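The motif search can be tried on a tiny hand-made hash before running it on the real file. In this sketch the gene names and sequences are invented for illustration, and the keys are sorted (an addition) so the output order is the same on every run.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Invented sequences: geneA contains the motif in lower case, geneB does not.
my %Fasta = (
    "geneA" => "ttgatgcaa",
    "geneB" => "ACGTACGT",
);

my %Matches;

foreach my $g (sort keys %Fasta) {
    if ($Fasta{$g} =~ m/GATGC/i) {     # the i makes the search case-insensitive
        $Matches{$g} = "GATGC";
    }
    else {
        $Matches{$g} = "no match";
    }
    print "$g\t$Matches{$g}\n";
}
```

Without the `i` modifier, the lower-case sequence for geneA would be reported as "no match".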
42
#!/usr/bin/perl
use strict;

open (FILE, "upstreams.fasta");

my %Fasta;
my $gene;

while (my $line = <FILE>) {
    if ($line =~ m/>/) {
        $gene = $line;
    }
    else {
        $Fasta{$gene} = $Fasta{$gene} . $line;
    }
}

my %Matches;

# Record, for every gene, whether its sequence contains the motif
foreach my $g (keys %Fasta) {
    if ($Fasta{$g} =~ m/GATGC/i) {
        $Matches{$g} = "GATGC";
        print "$g contains GATGC\n";
    }
    else {
        $Matches{$g} = "no matches";
    }
}
43
#!/usr/bin/perl
use strict;

open (FILE, "upstreams.fasta");

my %Fasta;
my $gene;

while (my $line = <FILE>) {
    if ($line =~ m/>/) {
        $gene = $line;
    }
    else {
        $Fasta{$gene} = $Fasta{$gene} . $line;
    }
}

open (SAVEFILE, ">YGR136W_output.txt");

my %Matches;

foreach my $g (keys %Fasta) {
    if ($Fasta{$g} =~ m/GATGC/i) {
        $Matches{$g} = "GATGC";
        print "$g contains GATGC\n";
        print SAVEFILE "$g\tGATGC\n";   # \t means tab, \n means return
    }
    else {
        $Matches{$g} = "no matches";
    }
}
44
Exercise 4: open and read a Fasta file

Finally, save the results in the Matches hash to a new file.

Create and open a savefile, Matches.txt:
    open (SF, ">Matches.txt");

14. Step through the hash and print the gene and match information to the file:
    foreach my $g (keys %Matches) {
        print SF "$g … $Matches{$g}\n";
    }

Save the file: Ctrl x Ctrl s

16. Run the program from the command line:
    perl ReadFasta.pl