Subroutines and Files Bioinformatics Ellen Walker Hiram College.

Why Subroutines?
Save typing
Avoid potential copy/paste errors
Collect a common algorithm in one place for reuse

Built-In Subroutines
Provide common useful functions, e.g.
  –index
  –length
  –substr
Call with arguments,
  –index($string, $pat)  # $string and $pat are arguments
Different arguments produce different results
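
As a rough illustration of calling built-in subroutines with arguments, the sketch below applies index, length, and substr to a made-up sequence. The variable names and values are assumptions chosen only for this example.

```perl
use strict;
use warnings;

my $string = "ATGGCCTAA";   # made-up sequence for illustration
my $pat    = "GCC";

my $len   = length($string);         # number of characters in $string
my $pos   = index($string, $pat);    # zero-based position of $pat in $string
my $none  = index($string, "TTT");   # index returns -1 when nothing is found
my $codon = substr($string, 0, 3);   # 3 characters starting at position 0

print "length: $len\n";
print "position of $pat: $pos\n";
print "position of TTT: $none\n";
print "first codon: $codon\n";
```

Changing the arguments (a different $pat, or a different start position for substr) changes the result without any change to the subroutines themselves.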

Finding Predefined Subroutines
Textbooks (Safari Online has several)
Google (include "Perl" in your search string)
Online documentation – http:// is nicely searchable

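How a Subroutine Works
Calling code:
  my $code = "ACA";
  print length($code);
  print "goodbye\n";
Subroutine:
  sub length {
      my $string = shift @_;
      my $length = 0;
      …code to count …
      return $length;
  }
The argument "ACA" is passed in to the subroutine; the value 3 is returned to the caller.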

Key Components
sub name
  –Declares this as a subroutine and names it
shift
  –Pulls the arguments out of the list (in parentheses), one at a time, left to right
  –Example: somesub("ACT", 1)
  –$a = shift; ($a is "ACT")
  –$b = shift; ($b is 1)
return value
  –Ends the subroutine and gives it a value
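
The three components can be seen together in a small sketch. The body of somesub here is hypothetical (the slide only shows how it is called); it simply repeats a sequence a given number of times.

```perl
use strict;
use warnings;

# Hypothetical subroutine body; only the call somesub("ACT", 1)
# appears in the slide.
sub somesub {
    my $seq   = shift @_;   # first argument, e.g. "ACT"
    my $times = shift @_;   # second argument, e.g. 1
    return $seq x $times;   # ends the subroutine and gives it a value
}

my $result = somesub("ACT", 2);
print "$result\n";
```

The sketch uses $seq and $times rather than $a and $b because Perl reserves $a and $b for use by sort.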

Example (p. 122)
  # find all GC-rich 4-7mers and determine their complements
  my $GCmatch;
  while ($someDNA =~ m/([GC]{4,7})/g) {
      $GCmatch = $1;
      print "5' $GCmatch 3'\n\n";
      my $compl = complement($GCmatch);
      print "3' $compl 5'\n";
  }

Subroutine (p. 123)
  # book version has good documentation
  sub complement {
      my $dna = shift @_;   # get first arg
      my $anti = $dna;
      $anti =~ tr/ACGTacgt/TGCAtgca/;
      return $anti;
  }
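
Putting the calling loop and the subroutine into one file gives a runnable sketch. The value of $someDNA is a made-up sample, and the @complements array is an addition for this example so the results can be inspected afterwards.

```perl
use strict;
use warnings;

# Complement each base (no reversal), preserving upper/lower case
sub complement {
    my $dna  = shift @_;
    my $anti = $dna;
    $anti =~ tr/ACGTacgt/TGCAtgca/;
    return $anti;
}

my $someDNA = "ATGCGCGCATTAGGGGCCTT";   # assumed sample sequence
my @complements;                         # added here to collect results

# find all GC-rich 4-7mers and determine their complements
while ($someDNA =~ m/([GC]{4,7})/g) {
    my $GCmatch = $1;
    my $compl   = complement($GCmatch);
    push @complements, $compl;
    print "5' $GCmatch 3'\n";
    print "3' $compl 5'\n\n";
}
```

With this sample, the pattern matches the runs GCGCGC and GGGGCC, and each is passed to complement in turn.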

Download These (Ch. 7)
Counting nucleotides
  –countNucleotides($str, "C");
  –countNucleotides($str, "[CG]");
Printing sequences with fixed line width
  –printSequence($str, 80);
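
The book's versions of these subroutines are not reproduced in the slides. As a minimal sketch of what subroutines with these call signatures might look like (the bodies below are assumptions, not the textbook code):

```perl
use strict;
use warnings;

# Sketch only: counts how many times $pattern matches in $seq
sub countNucleotides {
    my ($seq, $pattern) = @_;
    my $count = 0;
    $count++ while $seq =~ /$pattern/g;
    return $count;
}

# Sketch only: prints $seq in lines of at most $width characters
sub printSequence {
    my ($seq, $width) = @_;
    for (my $pos = 0; $pos < length($seq); $pos += $width) {
        print substr($seq, $pos, $width), "\n";
    }
}

my $str = "ACGTCC";   # made-up sequence
print countNucleotides($str, "C"), "\n";
print countNucleotides($str, "[CG]"), "\n";
printSequence($str, 4);
```

Because the second argument is used as a pattern, a character class such as "[CG]" counts either base in a single call.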

Variable Scope
Variables exist from when they are declared ("my") until the end of the enclosing block (closing brace).
Variables declared in a subroutine exist only during the subroutine call.
Each call to a subroutine re-initializes its variables.
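
A small sketch of these scope rules, using hypothetical names (count_calls, $n, $inner) invented for this example:

```perl
use strict;
use warnings;

sub count_calls {
    my $n = 0;    # a fresh $n is created on every call
    $n++;
    return $n;
}

my $first  = count_calls();
my $second = count_calls();
print "first call: $first, second call: $second\n";

{
    my $inner = "only visible inside this block";
    print "$inner\n";
}
# Using $inner here would be a compile-time error under "use strict",
# because the variable ended at the closing brace above.
```

Both calls return the same value because $n does not survive from one call to the next.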

Files and Programs
Files are stored on the computer's hard drive and maintained by the operating system.
Programs are connected to files via special subroutines
  –"open" creates a file handle
  –"close" releases the file (important!)

Basic File Manipulation
Open a file and read
  –my $HANDLE;
  –open($HANDLE, '<', $filename);
  –$line = <$HANDLE>;
Open a file and write
  –my $HANDLE;
  –open($HANDLE, '>', $filename);
  –print $HANDLE "Hello world!";
Close a file
  –close($HANDLE);

Allowing for Errors
If you try to read a file that doesn't exist, or write a file in a location where you lack permission, the open() command will return false. (Opening an existing file with '>' does not fail; it overwrites the file.)
The rest of your program won't work.
To fix this, add: or die("some message $file: $!") to the end of the command ($! contains the system error message)
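
To see the false return value and the contents of $! without stopping the program, the sketch below checks the result of open() directly instead of calling die. The file name is hypothetical and is assumed not to exist in the current directory.

```perl
use strict;
use warnings;

my $file = "no_such_file_for_this_demo.txt";   # assumed not to exist

my $HANDLE;
my $opened = open($HANDLE, '<', $file);   # false when the open fails

if ($opened) {
    print "Opened $file\n";
    close($HANDLE);
} else {
    # $! holds the operating system's reason for the failure
    print "Cannot open file $file: $!\n";
}
```

Replacing the if/else with "or die(...)" would print the same kind of message and then end the program immediately.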

Complete Open Examples
  open($HANDLE, '<', $filename) or die("Cannot open file: $filename: $!");
  open($HANDLE, '>', $filename) or die("Cannot write file: $filename: $!");
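
The two open calls can be combined into a write-then-read round trip. This is a sketch under one assumption: it uses the core File::Temp module to obtain a scratch file name so that no existing file is overwritten.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Get a scratch file name for the demo, then release the handle
my ($TMP, $filename) = tempfile();
close($TMP);

my $HANDLE;

# Open for writing, write one line, and close
open($HANDLE, '>', $filename) or die("Cannot write file: $filename: $!");
print $HANDLE "Hello world!\n";
close($HANDLE);

# Open the same file for reading and fetch the first line
open($HANDLE, '<', $filename) or die("Cannot open file: $filename: $!");
my $line = <$HANDLE>;
close($HANDLE);

chomp $line;
print "$line\n";

unlink($filename);   # remove the scratch file
```

Closing the handle after writing matters here: it ensures the text has actually reached the file before it is reopened for reading.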

Reading lines
Subroutine chomp removes the '\n' character at the end of a line
$line = <$HANDLE>; puts the next line in $line
When there are no more lines, the result is false
Example: put the whole file in one sequence
  while ($line = <$HANDLE>) {
      chomp $line;
      $seq = $seq . $line;
  }
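
A runnable version of this loop is sketched below. To keep it self-contained it reads from a string through an in-memory file handle (a standard Perl feature) rather than from a file on disk; the sequence text is made up.

```perl
use strict;
use warnings;

# Made-up file contents; an in-memory handle stands in for a real file
my $text = "ACGT\nTTGA\nCC\n";
open(my $HANDLE, '<', \$text) or die("Cannot open in-memory file: $!");

my $seq = "";
while (my $line = <$HANDLE>) {   # false when there are no more lines
    chomp $line;                 # remove the trailing "\n"
    $seq = $seq . $line;         # append this line to the sequence
}
close($HANDLE);

print "$seq\n";
```

Without the chomp, the newlines would remain embedded in $seq and would be counted as part of its length.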

Printing to a file
The print commands (print and printf) can optionally be followed by a file handle before the string to print
Examples:
  –print $HANDLE "Hello\n";
  –printf $HANDLE "GC percent is %.1f\n", $GCcount * 100 / $total;
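
The printf example can be tried with the sketch below. The sequence is made up, counting with tr is one possible way to obtain $GCcount (an assumption, not necessarily the textbook's method), and the output goes to an in-memory file handle so the result can be shown without creating a file.

```perl
use strict;
use warnings;

my $seq     = "ATGCGCAT";              # made-up sequence
my $GCcount = ($seq =~ tr/GCgc//);     # tr returns the number of matches
my $total   = length($seq);

# Write the report into a string through an in-memory file handle
my $report = "";
open(my $HANDLE, '>', \$report) or die("Cannot open in-memory file: $!");
printf $HANDLE "GC percent is %.1f\n", $GCcount * 100 / $total;
close($HANDLE);

print $report;
```

Note that there is no comma between the file handle and the format string; a comma there would make Perl treat the handle as something to print.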

ReadInDNA
Subroutine to read FASTA formatted file (p. 141)
Returns sequence as one long string
Removes whitespace, lines that begin with # (comments), and all digits

FASTA File Format
One header line, begins with >
Many lines of text, sometimes capitalized, sometimes with spaces after every n characters
(ReadInDNA handles these variations)
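
The textbook's ReadInDNA is not shown in the slides. The sketch below is a hypothetical stand-in (named read_fasta_sketch to avoid confusion with the book's code) that follows the behavior described above: it skips the header and comment lines and strips whitespace and digits. The FASTA text is made up and is read through an in-memory file handle.

```perl
use strict;
use warnings;

# Hypothetical helper, not the book's ReadInDNA: reads from an
# already-open file handle and returns the sequence as one string.
sub read_fasta_sketch {
    my $HANDLE = shift @_;
    my $seq = "";
    while (my $line = <$HANDLE>) {
        chomp $line;
        next if $line =~ /^>/;   # skip the header line
        next if $line =~ /^#/;   # skip comment lines
        $line =~ s/[\s\d]//g;    # drop whitespace and digits
        $seq .= $line;
    }
    return $seq;
}

# Made-up FASTA text with position numbers and spaced blocks
my $fasta = ">demo sequence\n# a comment\n1 acgt acgt\n9 ggcc\n";
open(my $IN, '<', \$fasta) or die("Cannot open in-memory file: $!");
my $dna = read_fasta_sketch($IN);
close($IN);

print "$dna\n";
```

For a file on disk, the same subroutine could be given a handle opened with open($IN, '<', $filename) or die(...).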

Getting a FASTA File
Go to NCBI
Search for what you want and download the file to your current machine
Send the file to your directory on cs.hiram.edu
(Demo to be provided)

Assignment
Using subroutines from your text, determine the GC content of the given genomes.
(Examples to be provided)