Functions Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein.



A quick review
Dictionaries:
- key:value pairs
- a.k.a. hash tables, lookup tables
- Examples: word and definition; name and phone number; gene name and score; username and password
- Dictionaries are useful when you want to look up some data (value) based on a key
- Each key can appear only once

Note: dictionary and list access times
- Accessing a list by index is very fast!
- Accessing a dictionary by key is very fast!
- Accessing a list by value (e.g. list.index(myVal) or list.count(myVal)) can be SLOW.
[Figure: a list with positions 0 … max holding val1 … last_val. By value: is myVal == val1? is myVal == val2? … is myVal == last_val? By index: the index (e.g. 4) points directly to the position in memory.]
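The three kinds of access in the note above can be compared side by side. This is only a sketch: the gene names and scores are made up for illustration, and it is written in Python 3 syntax (print as a function) rather than the Python 2 used on the slides.

```python
# Hypothetical data: 100,000 made-up gene names, each with an integer score.
names = ["gene%d" % i for i in range(100000)]
scores = list(range(100000))

# Build a dictionary once; afterwards every lookup by key is direct.
score_by_name = dict(zip(names, scores))

target = "gene99999"

# By value: list.index() compares target with the elements one at a time,
# so finding the last element means 100,000 comparisons.
position = names.index(target)
score_from_list = scores[position]

# By index: goes directly to the position in the list.
score_from_index = scores[99999]

# By key: the dictionary finds the value without scanning.
score_from_dict = score_by_name[target]

print(score_from_list, score_from_index, score_from_dict)
```

All three lookups return the same score; the difference is how much work Python does to get there, which starts to matter when the lookup sits inside a loop.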

Take a deep breath … and think how much you've learned! 4 weeks ago, this would have been gibberish:

    import sys
    matrixFile = open(sys.argv[1], "r")
    matrix = []                                  # initialize empty matrix
    line = matrixFile.readline().strip()         # read first line stripped
    while len(line) > 0:                         # until end of file
        fields = line.split("\t")                # split line on tabs, giving a list of strings
        intList = []                             # create an int list to fill
        for field in fields:                     # for each field in current line
            intList.append(int(field))           # append the int value of field to intList
        matrix.append(intList)                   # after intList is filled, append it to matrix
        line = matrixFile.readline().strip()     # read next line and repeat loop
    matrixFile.close()
    for row in matrix:                           # go through the matrix row by row
        for val in row:                          # go through each value in the row
            print val,                           # print each value without line break
        print ""                                 # add a line break after each row

In theory, what you know so far allows you to solve any computational task (“universality”) So … why don’t we stop here?

Most real-life tasks will be (very) painful to solve using only what you know so far ...

What are we missing?
- A way to generalize procedures …
- A way to store and handle complex data …
- A way to organize our code …
- Better design and coding practices …

TIP OF THE DAY: Code like a pro …
How to approach a computational task: Think, Design, Code, Debug, Improve, have a beer.
Pseudo-code, debug prints, design principles, variable naming, assessing efficiency, readability, commenting, code recycling, "dry runs", modules, Hungarian notation, incremental coding.

Functions

Why functions?
- Reusable piece of code: write once, use many times
- Within your program; across several programs
- Helps simplify and organize your program
- Helps avoid duplication of code

What does a function do?
- Takes defined inputs (arguments) and may produce a defined output (return): stuff goes in (arguments), things happen, other stuff comes out (return).
- Other than the arguments and the return, everything else inside the function is invisible outside the function (variables assigned, etc.). Black box!
- The function doesn't need to have a return.
- Spoiler: the arguments can be changed, and the changes are visible outside the function.
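The black-box behaviour and the "spoiler" can both be seen in a small sketch. The function names add_pseudocount and report and the sample counts are hypothetical, chosen only for this illustration; the code uses Python 3 syntax.

```python
def add_pseudocount(counts, pseudocount):
    """Add pseudocount to every entry of counts and return the new total."""
    total = 0                                  # local variable, invisible outside
    for i in range(len(counts)):
        counts[i] = counts[i] + pseudocount    # changes the caller's list in place
        total = total + counts[i]
    return total


def report(counts):
    """A function without a return statement; calling it yields None."""
    print("counts are now:", counts)


base_counts = [3, 0, 5, 2]
new_total = add_pseudocount(base_counts, 1)    # the list argument is modified
nothing = report(base_counts)                  # no return, so nothing is None

# The local variable total does not exist outside the function.
try:
    total
    total_visible = True
except NameError:
    total_visible = False

print(new_total, base_counts, nothing, total_visible)
```

The list passed in is changed because a list is a mutable object and the function works on the same object, not a copy. A number or string argument could not be changed this way.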

Defining a function

General form: def <function_name>(<arguments>):

    import math
    def jc_dist(rawdist):                          # define the function and argument(s) names
        if rawdist < 0.75 and rawdist > 0.0:       # do something
            newdist = (-3.0/4.0) * math.log(1.0 - (4.0/3.0) * rawdist)
            return newdist                         # return a computed value
        elif rawdist >= 0.75:
            return                                 # (value not shown)
        else:
            return 0.0

Using (calling) a function

    import sys
    dist = float(sys.argv[1])        # command-line arguments are strings; convert before calling
    correctedDist = jc_dist(dist)

Once you've written the function, you can forget about it and just use it!

Using (calling) a function

    import sys
    dist = float(sys.argv[1])        # command-line arguments are strings; convert before calling
    correctedDist = jc_dist(dist)
    AnotherDist =                    # (value not shown)
    AnotherCorrectedDist = jc_dist(AnotherDist)
    OneMoreCorrectedDist = jc_dist(0.63)
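A self-contained version that can be run as is might look like the sketch below. The value returned for raw distances of 0.75 or more is not shown in this transcript, so float("inf") is used here purely as an assumed placeholder; the variable names in the calls are also illustrative.

```python
import math


def jc_dist(rawdist):
    """Jukes-Cantor correction of a raw distance."""
    if rawdist < 0.75 and rawdist > 0.0:
        return (-3.0 / 4.0) * math.log(1.0 - (4.0 / 3.0) * rawdist)
    elif rawdist >= 0.75:
        # ASSUMPTION: the slide's return value for this case is unknown;
        # infinity is a stand-in, not the author's choice.
        return float("inf")
    else:
        return 0.0


# Write once, call many times, with variables or with literal values.
someDist = 0.3
correctedDist = jc_dist(someDist)
oneMoreCorrectedDist = jc_dist(0.63)
zeroCase = jc_dist(0.0)

print(correctedDist, oneMoreCorrectedDist, zeroCase)
```

The caller never needs to look inside jc_dist again: it passes a float and gets a float back.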

From "in-code" to function

Jukes-Cantor distance correction written directly in the program:

    import sys
    import math
    rawdist = float(sys.argv[1])
    if rawdist < 0.75 and rawdist > 0.0:
        newdist = (-3.0/4.0) * math.log(1.0 - (4.0/3.0) * rawdist)
        print newdist
    elif rawdist >= 0.75:
        print                                      # (value not shown)
    else:
        print 0.0

Jukes-Cantor distance correction written as a function:

    import sys
    import math
    def jc_dist(rawdist):                          # add a function definition
        # rawdist = float(sys.argv[1])             # delete - use function argument instead of argv
        if rawdist < 0.75 and rawdist > 0.0:
            newdist = (-3.0/4.0) * math.log(1.0 - (4.0/3.0) * rawdist)
            return newdist                         # return value rather than printing it
        elif rawdist >= 0.75:
            return                                 # (value not shown)
        else:
            return 0.0

We've used lots of functions before!

    math.log(value)
    readline(), readlines(), read()
    sort()
    split(), replace(), lower()

- These functions are part of the Python programming environment (in other words they are already written for you).
- Note - some of these are functions attached to objects (and called object "methods") rather than stand-alone functions. We'll cover this later.
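The difference between a stand-alone function and a method shows up in how each is called. A brief sketch with made-up values:

```python
import math

# Stand-alone function: the value is passed in as an argument.
logValue = math.log(1.0)

# Methods: called on an object with a dot, and they act on that object.
fields = "seq1\t0.3".split("\t")     # string method, returns a new list
lowered = "ACGT".lower()             # string method, returns a new string
replaced = "ACGT".replace("T", "U")  # string method, returns a new string

scores = [3, 1, 2]
sortResult = scores.sort()           # list method, sorts in place and returns None

print(logValue, fields, lowered, replaced, scores, sortResult)
```

Note that sort() changes the list itself and returns None, while the string methods leave the original string untouched and return a new one.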

Function names, access, and usage
- Giving a function an informative name is very important! Long names are fine if needed:

    def makeDictFromTwoLists(keyList, valueList):
    def translateDNA(dna_seq):
    def getFastaSequences(fileName):

- For now, your function will have to be defined within your program and before you use it. Later you'll learn how to save a function in a module so that you can load your module and use the function just the way we do for Python modules.
- Usually, potentially reusable parts of your code should be written as functions.
- Your program (outside of functions) will often be very short - largely reading arguments and making output.
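The "defined before you use it" rule can be checked directly. In this sketch countGC is a hypothetical function invented for the illustration; calling it before its def has run raises a NameError.

```python
# Calling the function before its definition has been executed fails.
try:
    early = countGC("ACGT")
except NameError:
    early = None          # countGC does not exist yet at this point


def countGC(dna_seq):
    """Count the G and C characters in a DNA string."""
    return dna_seq.count("G") + dna_seq.count("C")


# After the def statement has run, the same kind of call works.
late = countGC("ACGGT")

print(early, late)
```

This is why function definitions usually sit at the top of a program, with the short "main" part that reads arguments and prints output at the bottom.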

Sample problem #1

Below is part of the program from a sample problem last class. It reads key-value pairs from a tab-delimited file and makes them into a dictionary. Rewrite it so that there is a function called makeDict that takes a file name as an argument and returns the dictionary.

    import sys
    myFile = open(sys.argv[1], "r")
    # make an empty dictionary
    scoreDict = {}
    for line in myFile:
        fields = line.strip().split("\t")
        # record each value with name as key
        scoreDict[fields[0]] = float(fields[1])
    myFile.close()

Use: scoreDict = makeDict(myFileName)

Here's what the file contents look like (one name and one numeric value per line, separated by a tab):

    seq…    …
    seq…    …
    etc.

Solution #1

    import sys

    def makeDict(fileName):
        myFile = open(fileName, "r")
        myDict = {}
        for line in myFile:
            fields = line.strip().split("\t")
            myDict[fields[0]] = float(fields[1])
        myFile.close()
        return myDict

    myFileName = sys.argv[1]
    scoreDict = makeDict(myFileName)

Solution #1: two things to notice here
- You can use any file name (string) when you call the function: fileName is the name used inside the function, myFileName is the name used to call it.
- You can assign any name to the function return: myDict is the name used inside the function, and scoreDict = makeDict(myFileName) assigns the return value to scoreDict.
(In programming jargon, the function lives in its own namespace.)
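To try makeDict without preparing a file by hand, the sketch below writes a tiny made-up sample file first. The file contents (seqA, seqB and their values) are invented for the example, and the function uses a with block, which closes the file automatically; this is a stylistic variation on the solution, not a correction of it.

```python
import os
import tempfile


def makeDict(fileName):
    """Read tab-delimited name/value lines into a dictionary of floats."""
    myDict = {}
    with open(fileName, "r") as myFile:      # closed automatically at block end
        for line in myFile:
            fields = line.strip().split("\t")
            myDict[fields[0]] = float(fields[1])
    return myDict


# Made-up sample data, written to a temporary file for the demonstration.
handle, samplePath = tempfile.mkstemp(suffix=".txt")
with os.fdopen(handle, "w") as sampleFile:
    sampleFile.write("seqA\t1.5\nseqB\t2.0\n")

# The names used at the call site need not match the names inside the function.
anyNameILike = makeDict(samplePath)
os.remove(samplePath)

print(anyNameILike)
```

The caller picks its own variable names for both the argument and the result; fileName and myDict exist only inside the function.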

Sample problem #2

Write a function that mimics the .readlines() method. Your function will have a file object as the argument and will return a list of strings (in exactly the format of readlines()). Use your new function in a program that reads the contents of a file and prints it to the screen. You can use other file methods within your function, and specifically, the method read() - just don't use the .readlines() method directly.

Note: This isn't a useful function, since Python developers already did it for you, but the point is that the functions you write are just like the ones we've already been using. BTW you will learn how to attach functions to objects a bit later (things like the split function of strings, as in myString.split()).

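Solution #2

    import sys

    def readlines(file):
        text = file.read()
        tempLines = text.split("\n")
        lines = []
        # every piece except the last was followed by a newline in the file
        for tempLine in tempLines[:-1]:
            lines.append(tempLine + "\n")
        # the last piece is empty if the file ended with a newline;
        # otherwise it is a final line without one
        if tempLines[-1] != "":
            lines.append(tempLines[-1])
        return lines

    myFile = open(sys.argv[1], "r")
    lines = readlines(myFile)
    myFile.close()
    for line in lines:
        print line.strip()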
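One way to check that a hand-written version really returns "exactly the format of readlines()" is to compare the two on the same text. The sketch below uses io.StringIO as a stand-in for an open file, so no file on disk is needed; the name myReadlines and the sample strings are illustrative, and the code is in Python 3 syntax.

```python
import io


def myReadlines(fileObj):
    """Mimic fileObj.readlines() using only fileObj.read()."""
    pieces = fileObj.read().split("\n")
    lines = []
    for piece in pieces[:-1]:          # these pieces were followed by "\n"
        lines.append(piece + "\n")
    if pieces[-1] != "":               # final line without a trailing newline
        lines.append(pieces[-1])
    return lines


# Made-up test texts: with a trailing newline, without one, and empty.
samples = ["first\nsecond\n", "first\nsecond", ""]
results = []
for sample in samples:
    mine = myReadlines(io.StringIO(sample))
    builtin = io.StringIO(sample).readlines()
    results.append(mine == builtin)

print(results)
```

Splitting on "\n" always leaves one extra piece after the last newline, which is why the last piece is handled separately.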

Challenge problem

Write a program that reads a file containing a tab-delimited matrix of pairwise distances and puts them into a 2-dimensional list of distances (floats). Have the program accept two additional arguments, which are the names of 2 sequences from the matrix, and print their distance. Make the matrix reading a function.

Here's what the file contents look like:

    names   seq1    seq2    seq3
    seq…    …
    seq…    …
    seq…    …
    Etc. …

Be sure it works with ANY matrix file with this format! The file will always be a square matrix of size (N+1) x (N+1): N for each distance and 1 row and column for names.

    >python dist.py matrixFile seq2 seq3
    0.3

Hints: use the first line to make a dictionary of names to list indices; your function should return a 2-dimensional list of floats.

Challenge solution

    import sys

    def makeMatrix(fileName):
        myFile = open(fileName, "r")
        myMatrix = []
        lines = myFile.readlines()
        for rowIndex in range(1, len(lines)):
            fields = lines[rowIndex].strip().split("\t")
            matRow = []
            for colIndex in range(1, len(fields)):
                matRow.append(float(fields[colIndex]))
            myMatrix.append(matRow)
        myFile.close()
        return myMatrix

    def makeNameMap(fileName):
        myFile = open(fileName, "r")
        line = myFile.readline()
        myFile.close()
        nameMap = {}
        fields = line.strip().split("\t")
        for index in range(1, len(fields)):
            nameMap[fields[index]] = index - 1
        return nameMap

    distMatrix = makeMatrix(sys.argv[1])
    nameMap = makeNameMap(sys.argv[1])
    print distMatrix[nameMap[sys.argv[2]]][nameMap[sys.argv[3]]]

I wrote both complex parts as functions; this makes the point that once these are written and debugged, the program is simple and easy to read (the last three lines).

nameMap[sys.argv[2]] looks up the argument string as the key in nameMap, which returns the index of the name in the 2-dimensional list of distance values (this could be done more efficiently - this way you open the file twice).
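The closing remark about opening the file twice suggests reading the name map and the matrix in a single pass. The sketch below is one possible way to do that, not the author's solution: the function name readDistanceMatrix, the choice to return two values, and the sample matrix are all assumptions made for illustration, and the code is in Python 3 syntax.

```python
import os
import tempfile


def readDistanceMatrix(fileName):
    """Open the file once; return (nameMap, matrix)."""
    with open(fileName, "r") as myFile:
        lines = myFile.readlines()

    # First line: a label followed by the sequence names.
    names = lines[0].strip().split("\t")[1:]
    nameMap = {}
    for index in range(len(names)):
        nameMap[names[index]] = index

    # Remaining lines: a row name followed by the distances.
    matrix = []
    for line in lines[1:]:
        if line.strip() == "":
            continue                               # skip blank lines
        fields = line.strip().split("\t")
        matrix.append([float(field) for field in fields[1:]])
    return nameMap, matrix


# Made-up 3 x 3 distance matrix, written to a temporary file for the demo.
handle, samplePath = tempfile.mkstemp(suffix=".txt")
with os.fdopen(handle, "w") as sampleFile:
    sampleFile.write("names\tseq1\tseq2\tseq3\n")
    sampleFile.write("seq1\t0.0\t0.1\t0.2\n")
    sampleFile.write("seq2\t0.1\t0.0\t0.3\n")
    sampleFile.write("seq3\t0.2\t0.3\t0.0\n")

nameMap, distMatrix = readDistanceMatrix(samplePath)
os.remove(samplePath)

distance = distMatrix[nameMap["seq2"]][nameMap["seq3"]]
print(distance)
```

Returning both values from one function keeps the file handling in a single place; the trade-off is that the function now does two jobs, whereas the original solution keeps each function focused on one.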