MARC: Developing Bioinformatics Programs July 2009 Alex Ropelewski PSC-NRBSC Bienvenido Vélez UPR Mayaguez Reference: How to Think Like a Computer Scientist:

Slides:



Advertisements
Similar presentations
Python Programming Chapter 5: Fruitful Functions Saad Bani Mohammad Department of Computer Science Al al-Bayt University 1 st 2011/2012.
Advertisements

Introduction to Programming Lesson 1. Objectives Skills/ConceptsMTA Exam Objectives Understanding Computer Programming Understand computer storage and.
MARC: Developing Bioinformatics Programs July 2009 Alex Ropelewski PSC-NRBSC Bienvenido Vélez UPR Mayaguez Reference: How to Think Like a Computer Scientist:
5-1 Flow of Control Recitation-01/25/2008  CS 180  Department of Computer Science  Purdue University.
C Lecture Notes 1 Program Control (Cont...). C Lecture Notes 2 4.8The do / while Repetition Structure The do / while repetition structure –Similar to.
Structured programming
1 Essential Computing for Bioinformatics Bienvenido Vélez UPR Mayaguez Lecture 4 High-level Programming with Python Part I: Controlling the flow of your.
Intro to Robots Conditionals and Recursion. Intro to Robots Modulus Two integer division operators - / and %. When dividing an integer by an integer we.
INTRODUCTION TO PYTHON PART 3 - LOOPS AND CONDITIONAL LOGIC CSC482 Introduction to Text Analytics Thomas Tiahrt, MA, PhD.
COMPE 111 Introduction to Computer Engineering Programming in Python Atılım University
Python – Part 4 Conditionals and Recursion. Modulus Operator Yields the remainder when first operand is divided by the second. >>>remainder=7%3 >>>print.
 The following material is the result of a curriculum development effort to provide a set of courses to support bioinformatics efforts involving students.
CIS Computer Programming Logic
MARC: Developing Bioinformatics Programs July 2009 Alex Ropelewski PSC-NRBSC Bienvenido Vélez UPR Mayaguez Reference: How to Think Like a Computer Scientist:
Introduction. 2COMPSCI Computer Science Fundamentals.
 The following material is the result of a curriculum development effort to provide a set of courses to support bioinformatics efforts involving students.
Analysis of Algorithms These slides are a modified version of the slides used by Prof. Eltabakh in his offering of CS2223 in D term 2013.
 Expression Tree and Objects 1. Elements of Python  Literals, Strings, Tuples, Lists, …  The order of file reading  The order of execution 2.
Fall Week 4 CSCI-141 Scott C. Johnson.  Computers can process text as well as numbers ◦ Example: a news agency might want to find all the articles.
MARC: Developing Bioinformatics Programs July 2009 Alex Ropelewski PSC-NRBSC Bienvenido Vélez UPR Mayaguez Reference: How to Think Like a Computer Scientist:
1 Analysis of Algorithms CS 105 Introduction to Data Structures and Algorithms.
Functions, Procedures, and Abstraction Dr. José M. Reyes Álamo.
CMPSC 16 Problem Solving with Computers I Spring 2014 Instructor: Lucas Bang Lecture 5: Introduction to C: More Control Flow.
Algorithm Design.
Python uses boolean variables to evaluate conditions. The boolean values True and False are returned when an expression is compared or evaluated.
9/14/2015BCHB Edwards Introduction to Python BCHB Lecture 4.
 The following material is the result of a curriculum development effort to provide a set of courses to support bioinformatics efforts involving students.
MARC: Developing Bioinformatics Programs July 2009 Alex Ropelewski PSC-NRBSC Bienvenido Vélez UPR Mayaguez 1 Essential Computing for Bioinformatics Lecture.
 The following material is the result of a curriculum development effort to provide a set of courses to support bioinformatics efforts involving students.
8-1 Compilers Compiler A program that translates a high-level language program into machine code High-level languages provide a richer set of instructions.
 2000 Deitel & Associates, Inc. All rights reserved. Chapter 10 - JavaScript/JScript: Control Structures II Outline 10.1Introduction 10.2Essentials of.
Algorithms Java Methods A & AB Object-Oriented Programming and Data Structures Maria Litvin ● Gary Litvin Copyright © 2006 by Maria Litvin, Gary Litvin,
Quiz 3 is due Friday September 18 th Lab 6 is going to be lab practical hursSept_10/exampleLabFinal/
Loops Robin Burke IT 130. Outline Announcement: Homework #6 Conditionals (review) Iteration while loop while with counter for loops.
1/33 Basic Scheme February 8, 2007 Compound expressions Rules of evaluation Creating procedures by capturing common patterns.
September 7, 2004ICP: Chapter 3: Control Structures1 Introduction to Computer Programming Chapter 3: Control Structures Michael Scherger Department of.
Python Basics  Functions  Loops  Recursion. Built-in functions >>> type (32) >>> int(‘32’) 32  From math >>>import math >>> degrees = 45 >>> radians.
Control Flow (Python) Dr. José M. Reyes Álamo. 2 Control Flow Sequential statements Decision statements Repetition statements (loops)
BING 6004: Intro to Computational BioEngineering Spring 2016 Lecture 1: Using Python Expressions and Variables Bienvenido Vélez UPR Mayaguez Reference:
Bienvenido Vélez UPR Mayaguez Reference: How to Think Like a Computer Scientist: Learning with Python 1 Introduction to Python programming for Bioinformatics.
Lecture #1: Introduction to Algorithms and Problem Solving Dr. Hmood Al-Dossari King Saud University Department of Computer Science 6 February 2012.
Alex Ropelewski Pittsburgh Supercomputing Center National Resource for Biomedical Supercomputing Bienvenido Vélez
MARC: Developing Bioinformatics Programs Alex Ropelewski PSC-NRBSC Bienvenido Vélez UPR Mayaguez Reference: How to Think Like a Computer Scientist: Learning.
CMPT 120 Topic: Searching – Part 2 and Intro to Time Complexity (Algorithm Analysis)
MARC: Developing Bioinformatics Programs June 2012 Alex Ropelewski PSC-NRBSC Bienvenido Vélez UPR Mayaguez Reference: How to Think Like a Computer Scientist:
Alex Ropelewski PSC-NRBSC Bienvenido Vélez UPR Mayaguez Using Molecular Biology to Teach Computer Science High-level Programming with Python Finding Patterns.
High-level Programming with Python Expressions and Variables Alex Ropelewski PSC-NRBSC Bienvenido Vélez UPR Mayaguez Reference: How to Think Like a Computer.
MARC: Developing Bioinformatics Programs Alex Ropelewski PSC-NRBSC Bienvenido Vélez UPR Mayaguez Reference: How to Think Like a Computer Scientist: Learning.
MARC: Developing Bioinformatics Programs Alex Ropelewski PSC-NRBSC Bienvenido Vélez UPR Mayaguez Reference: How to Think Like a Computer Scientist: Learning.
1 A Short Introduction to Analyzing Biological Data Using Relational Databases Part II: Creating a Relational Database to Model Biological Data Alex Ropelewski.
MARC: Developing Bioinformatics Programs June 2012 Alex Ropelewski PSC-NRBSC Bienvenido Vélez UPR Mayaguez Reference: How to Think Like a Computer Scientist:
Introduction to Python
Control Flow (Python) Dr. José M. Reyes Álamo.
Using Molecular Biology to Teach Computer Science
Using Molecular Biology to Teach Computer Science
Introduction to Python programming for biologists
Python: Control Structures
Essential Computing for Bioinformatics
CS 115 Lecture 8 Structured Programming; for loops
Arithmetic operations, decisions and looping
Functions, Procedures, and Abstraction
3. Decision Structures Rocky K. C. Chang 19 September 2018
Introduction to Python
Computer Science Core Concepts
Introduction to Python
Introduction to Programming
REPETITION Why Repetition?
COMPUTING.
Lecture 6 - Recursion.
Presentation transcript:

MARC: Developing Bioinformatics Programs July 2009 Alex Ropelewski PSC-NRBSC Bienvenido Vélez UPR Mayaguez Reference: How to Think Like a Computer Scientist: Learning with Python 1 Essential Computing for Bioinformatics Lecture 4 High-level Programming with Python Controlling the flow of your program

The following material is the result of a curriculum development effort to provide a set of courses to support bioinformatics efforts involving students from the biological sciences, computer science, and mathematics departments. They have been developed as a part of the NIH funded project “Assisting Bioinformatics Efforts at Minority Schools” (2T36 GM008789). The people involved with the curriculum development effort include: Dr. Hugh B. Nicholas, Dr. Troy Wymore, Mr. Alexander Ropelewski and Dr. David Deerfield II, National Resource for Biomedical Supercomputing, Pittsburgh Supercomputing Center, Carnegie Mellon University. Dr. Ricardo González Méndez, University of Puerto Rico Medical Sciences Campus. Dr. Alade Tokuta, North Carolina Central University. Dr. Jaime Seguel and Dr. Bienvenido Vélez, University of Puerto Rico at Mayagüez. Dr. Satish Bhalla, Johnson C. Smith University. Unless otherwise specified, all the information contained within is Copyrighted © by Carnegie Mellon University. Permission is granted for use, modify, and reproduce these materials for teaching purposes. Most recent versions of these presentations can be found at Essential Computing for Bioinformatics

 Basics of Functions  Decision statements  Recursion  Iteration statements Outline These materials were developed with funding from the US National Institutes of Health grant #2T36 GM to the Pittsburgh Supercomputing Center 3

Built-in Functions These materials were developed with funding from the US National Institutes of Health grant #2T36 GM to the Pittsburgh Supercomputing Center 4 >>> import math >>> decibel = math.log10 (17.0) >>> angle = 1.5 >>> height = math.sin(angle) >>> degrees = 45 >>> angle = degrees * 2 * math.pi / >>> math.sin(angle) To convert from degrees to radians, divide by 360 and multiply by 2*pi Can you avoid having to write the formula to convert degrees to radians every time?

Defining Your Own Functions These materials were developed with funding from the US National Institutes of Health grant #2T36 GM to the Pittsburgh Supercomputing Center 5 def ( ): import math def radians(degrees): result = degrees * 2 * math.pi / return(result) >>> def radians(degrees):... result=degrees * 2 * math.pi / return(result)... >>> radians(45) >>> radians(180)

Monolithic Code These materials were developed with funding from the US National Institutes of Health grant #2T36 GM to the Pittsburgh Supercomputing Center 6 From string import * cds = “atgagtgaacgtctgagcattaccccgctggggccgtatatc” gc = float(count(cds, 'g') + count(cds, 'c'))/ len(cds) print gc

Step 1: Wrap Reusable Code in Function These materials were developed with funding from the US National Institutes of Health grant #2T36 GM to the Pittsburgh Supercomputing Center 7 def gcCount(sequence): gc = float(count(sequence, 'g') + count(sequence, 'c'))/ len(sequence) print gc >>> gcCount(“actgaccgggat”)

Step 2: Add function to script file These materials were developed with funding from the US National Institutes of Health grant #2T36 GM to the Pittsburgh Supercomputing Center 8 Save script in a file Re-load when you want to use the functions No need to retype your functions Keep a single group of related functions and declarations in each file

 Powerful mechanism for creating building blocks  Code reuse  Modularity  Abstraction (i.e. hide (or forget) irrelevant detail) Why Functions? These materials were developed with funding from the US National Institutes of Health grant #2T36 GM to the Pittsburgh Supercomputing Center 9

 Should have a single well defined 'contract'  E.g. Return the gc-value of a sequence  Contract should be easy to understand and remember  Should be as general as possible  Should be as efficient as possible  Should not mix calculations with I/O Function Design Guidelines These materials were developed with funding from the US National Institutes of Health grant #2T36 GM to the Pittsburgh Supercomputing Center 10

Applying the Guidelines These materials were developed with funding from the US National Institutes of Health grant #2T36 GM to the Pittsburgh Supercomputing Center 11 def gcCount(sequence): gc = float(count(sequence, 'g') + count(sequence, 'c'))/ len(sequence) print gc What can be improved? def gcCount(sequence): gc = float(count(sequence, 'g' + count(sequence, 'c'))/ len(sequence) return gc Why is this better? More reusable function Can call it to get the gcCount and then decide what to do with the value May not have to print the value Function has ONE well-defined objective or CONTRACT

Basics of Functions  Decision statements  Recursion  Iteration statements Outline 12 These materials were developed with funding from the US National Institutes of Health grant #2T36 GM to the Pittsburgh Supercomputing Center

Decision statements These materials were developed with funding from the US National Institutes of Health grant #2T36 GM to the Pittsburgh Supercomputing Center 13 if : elif : … else: Each is a BOOLEAN expressions Each is a sequence of statements Level of indentation determines what’s inside each block Indentation has meaning in Python

Compute the complement of a DNA base These materials were developed with funding from the US National Institutes of Health grant #2T36 GM to the Pittsburgh Supercomputing Center 14 def complementBase(base): if (base == ’a'): return ’t' elif (base == ’t'): return ’a' elif (base == ’c'): return ’g' elif (base == ’g'): return ’c' How can we improve this function?

 Expressions that yield True of False values  Ways to yield a Boolean value  Boolean constants: True and False  Comparison operators (>, =, <=)  Logical Operators (and, or, not)  Boolean functions  0 (means False)  Empty string '’ (means False) Boolean Expressions These materials were developed with funding from the US National Institutes of Health grant #2T36 GM to the Pittsburgh Supercomputing Center 15

 Lets assume that b,a are Boolean values:  (b and True) = b  (b or True) = True  (b and False) = False  (b or False) = b  not (a and b) = (not a) or (not b)  not (a or b) = (not a) and (not b) Some Useful Boolean Laws These materials were developed with funding from the US National Institutes of Health grant #2T36 GM to the Pittsburgh Supercomputing Center 16 De Morgan’s Laws

A strange Boolean function These materials were developed with funding from the US National Institutes of Health grant #2T36 GM to the Pittsburgh Supercomputing Center 17 def test(x): if x: return True else: return False What can you use this function for? What types of values can it accept?

Basics of Functions Decision statements  Recursion  Iteration statements Outline 18 These materials were developed with funding from the US National Institutes of Health grant #2T36 GM to the Pittsburgh Supercomputing Center

Recursive Functions These materials were developed with funding from the US National Institutes of Health grant #2T36 GM to the Pittsburgh Supercomputing Center 19 A classic! >>> fact(5) 120 >>> fact(10) >>> fact(100) L >>> def fact(n): if (n==0): return 1 else: return n * fact(n - 1)

Recursion Basics These materials were developed with funding from the US National Institutes of Health grant #2T36 GM to the Pittsburgh Supercomputing Center 20 n = 3 fact(2) fact(3) n = 2 n = 1 fact(1) n = 0 fact(0) 1 1 * 1 = 1 2 * 1 = 2 3 * 2 = 6 n = 3 n = 2 n = 1 n = 0 def fact(n): if (n==0): return 1 else: return n * fact(n - 1) Interpreter keeps a stack of activation records

Beware of Infinite Recursions! These materials were developed with funding from the US National Institutes of Health grant #2T36 GM to the Pittsburgh Supercomputing Center 21 def fact(n): if (n==0): return 1 else: return n * fact(n - 1) What if you call fact 5.5? Explain When using recursion always think about how will it stop or converge

 Compute the reverse of a sequence  Compute the molecular mass of a sequence  Compute the reverse complement of a sequence  Determine if two sequences are complement of each other  Compute the number of stop codons in a sequence  Determine if a sequence has a subsequence of length greater than n surrounded by stop codons  Return the starting position of the subsequence identified in exercise 6 Practice Exercises on Functions These materials were developed with funding from the US National Institutes of Health grant #2T36 GM to the Pittsburgh Supercomputing Center 22 Write recursive Python functions to satisfy the following specifications:

Reversing a sequence recursively def reverse(sequence): 'Returns the reverse string of the argument sequence' if (len(sequence)>1): return reverse(sequence[1:])+sequence[0] else: return sequence

Runtime Complexity - 'Big O' Notation These materials were developed with funding from the US National Institutes of Health grant #2T36 GM to the Pittsburgh Supercomputing Center 24 def fact(n): if (n==0): return 1 else: return n * fact(n - 1) How 'fast' is this function? Can we come up with a more efficient version? How can we measure 'efficiency' Can we compare algorithms independently from a specific implementation, software or hardware?

These materials were developed with funding from the US National Institutes of Health grant #2T36 GM to the Pittsburgh Supercomputing Center 25 Runtime Complexity - 'Big O' Notation What is a step? How can we measure the size of an input? Answer in both cases: YOU CAN DEFINE THESE! Big Idea Measure the number of steps taken by the algorithm as an asymptotic function of the size of its input

These materials were developed with funding from the US National Institutes of Health grant #2T36 GM to the Pittsburgh Supercomputing Center 26 'Big O' Notation - Factorial Example A 'step' is a function call to fact The size of an input value n is n itself T(0) = 0 T(n) = T(n-1) + 1 = (T(n-2) + 1) + 1 = … = T(n-n) + n = T(0) + n = 0 + n = n Step 1: Count the number of steps for input n Step 2: Find the asymptotic function T(n) = O(n) def fact(n): if (n==0): return 1 else: return n * fact(n - 1) A.K.A Linear Function

Outline Basics of Functions Decision statements Recursion Iteration statements 27 These materials were developed with funding from the US National Institutes of Health grant #2T36 GM to the Pittsburgh Supercomputing Center

Iteration These materials were developed with funding from the US National Institutes of Health grant #2T36 GM to the Pittsburgh Supercomputing Center 28 while : SYNTAX SEMANTICS Repeat the execution of as long as expression remains true SYNTAX = FORMAT SEMANTICS = MEANING

Iterative Factorial These materials were developed with funding from the US National Institutes of Health grant #2T36 GM to the Pittsburgh Supercomputing Center 29 def iterFact(n): result = 1 while(n>0): result = result * n n = n - 1 return result Work out the runtime complexity: whiteboard

Formatted Output using % operator These materials were developed with funding from the US National Institutes of Health grant #2T36 GM to the Pittsburgh Supercomputing Center 30 For more details visit: % >>> '%s is %d years old' % ('John', 12) 'John is 12 years old' >>> is a string is a list of values n parenthesis (a.k.a. a tuple) % produces a string replacing each %x with a correding value from the tuple

The For Loop: Another Iteration Statement These materials were developed with funding from the US National Institutes of Health grant #2T36 GM to the Pittsburgh Supercomputing Center 31 for in : SYNTAX SEMANTICS Repeat the execution of the binding variable to each element of the sequence

These materials were developed with funding from the US National Institutes of Health grant #2T36 GM to the Pittsburgh Supercomputing Center 32 For Loop Example start = first value end = value right after last one step = increment def iterFact2(n): result = 1 for i in xrange(1,n+1): result = result * i return result xrange(start,end,step) generates a sequence of values :

Revisiting code from Lecture 1 seq="ACTGTCGTAT" print seq Acount= seq.count('A') Ccount= seq.count('C') Gcount= seq.count('G') Tcount= seq.count('T') Total = float(len(seq)) APct = int((Acount/Total) * 100) print 'A percent = %d ' % APct CPct = int((Ccount/Total) * 100) print 'C percent = %d ' % CPct GPct = int((Gcount/Total) * 100) print 'G percent = %d ' % GPct TPct = int((Tcount/Total) * 100) print 'T percent = %d ' % TPct Can we reduce the amount of repetitive code?

Approach: Use For Loop bases = ['A', 'C', 'T', 'G'] sequence = "ACTGTCGTAT" for base in bases: nextPercent = 100 * sequence.count(base)/float(len(sequence)) print 'Percent %s: %d' % (base, nextPercent) How many functions would you refactor this code into?

Exercises on Functions These materials were developed with funding from the US National Institutes of Health grant #2T36 GM to the Pittsburgh Supercomputing Center 35 Write iterative Python functions to satisfy the following specifications: 1. Compute the reverse of a sequence 2. Compute the molecular mass of a sequence 3. Compute the reverse complement of a sequence 4. Determine if two sequences are complement of each other 5. Compute the number of stop codons in a sequence 6. Determine if a sequence has a subsequence of length greater than n surrounded by stop codons 7. Return the starting position of the subsequence identified in exercise 6

Finding Patterns Within Sequences These materials were developed with funding from the US National Institutes of Health grant #2T36 GM to the Pittsburgh Supercomputing Center 36 from string import * def searchPattern(dna, pattern): 'print all start positions of a pattern string inside a target string’ site = find (dna, pattern) while site != -1: print ’pattern %s found at position %d' % (pattern, site) site = find (dna, pattern, site + 1) Example from: Pasteur Institute Bioinformatics Using Python >>> searchPattern("acgctaggct","gc") pattern gc at position 2 pattern gc at position 7 >>>

Homework Extend searchPattern to handle unknown residues