“Everything Else”.


Find all substrings
We’ve learned how to find the first location of a string in another string with find. What about finding all matches? Start by looking at the documentation.
    S.find(sub [,start [,end]]) -> int
Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation. Return -1 on failure.

Experiment with find
    >>> seq = "aaaaTaaaTaaT"
    >>> seq.find("T")
    4
    >>> seq.find("T", 4)
    4
    >>> seq.find("T", 5)
    8
    >>> seq.find("T", 9)
    11
    >>> seq.find("T", 12)
    -1
    >>>

How to program it?
The only loop we’ve done so far is “for”. But we aren’t looking at every element in the list. We need some way to jump forward and stop when done.

The while statement
The solution is the while statement. While the test is true, do its code block.
    >>> pos = seq.find("T")
    >>> while pos != -1:
    ...     print "T at index", pos
    ...     pos = seq.find("T", pos+1)
    ...
    T at index 4
    T at index 8
    T at index 11
    >>>

There’s duplication...
Duplication is bad. (Unless you’re a gene?) The more copies there are, the more likely some will be different from others. Here the call to seq.find appears twice.
    >>> pos = seq.find("T")
    >>> while pos != -1:
    ...     print "T at index", pos
    ...     pos = seq.find("T", pos+1)
    ...
    T at index 4
    T at index 8
    T at index 11
    >>>

The break statement
The break statement says “exit this loop immediately” instead of waiting for the normal exit.
    >>> pos = -1
    >>> while 1:
    ...     pos = seq.find("T", pos+1)
    ...     if pos == -1:
    ...         break
    ...     print "T at index", pos
    ...
    T at index 4
    T at index 8
    T at index 11
    >>>
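
The same loop can be wrapped in a function so it can be reused for any string and substring. This is only a sketch and not part of the original slides: the name find_all is made up for illustration, and it is written for Python 3, where print is a function rather than a statement.

```python
def find_all(seq, sub):
    """Return the start index of every occurrence of sub in seq."""
    positions = []
    pos = -1
    while True:
        # Resume the search one character past the previous match.
        pos = seq.find(sub, pos + 1)
        if pos == -1:
            break  # find() reports failure with -1, so we are done
        positions.append(pos)
    return positions


print(find_all("aaaaTaaaTaaT", "T"))  # [4, 8, 11]
```

Because the search resumes at pos + 1 rather than past the end of the match, overlapping matches are reported too.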

break in a for
A break also works in the for loop. Find the first 10 sequences in a file which have a poly-A tail:
    sequences = []
    for line in open(filename):
        seq = line.rstrip()
        if seq.endswith("AAAAAAAA"):
            sequences.append(seq)
            if len(sequences) >= 10:
                break
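
To try the idea without a data file, the loop can be run over a plain list of lines. The sketch below assumes Python 3; the function name first_poly_a, its limit and tail parameters, and the sample lines are all invented for illustration.

```python
def first_poly_a(lines, limit=10, tail="AAAAAAAA"):
    """Collect at most `limit` sequences that end with `tail`."""
    sequences = []
    for line in lines:
        seq = line.rstrip()
        if seq.endswith(tail):
            sequences.append(seq)
            if len(sequences) >= limit:
                break  # stop reading as soon as we have enough
    return sequences


# Made-up input: three of the four lines end in eight A's.
lines = [
    "GATTACA" + "A" * 8 + "\n",
    "GATTACA\n",
    "CC" + "A" * 8 + "\n",
    "TT" + "A" * 8 + "\n",
]
print(first_poly_a(lines, limit=2))  # only the first two poly-A sequences
```

Any iterable of lines works here, including an open file, since a for loop over a file yields one line at a time.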

elif
Sometimes the if statement is more complex than if/else. “If the weather is hot then go to the beach. If it is rainy, go to the movies. If it is cold, read a book. Otherwise watch television.”
    if is_hot(weather):
        go_to_beach()
    elif is_rainy(weather):
        go_to_movies()
    elif is_cold(weather):
        read_book()
    else:
        watch_television()
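
The weather example calls helper functions that the slide does not define. A runnable variant, assuming Python 3 and using made-up temperature thresholds (30 and 10 degrees Celsius) in place of is_hot and is_cold, might look like this:

```python
def choose_activity(temperature_c, raining):
    """Pick an activity; the tests are tried in order and the first true one wins."""
    if temperature_c >= 30:      # assumed threshold for "hot"
        return "go to the beach"
    elif raining:
        return "go to the movies"
    elif temperature_c < 10:     # assumed threshold for "cold"
        return "read a book"
    else:
        return "watch television"


print(choose_activity(35, True))   # go to the beach
print(choose_activity(20, True))   # go to the movies
print(choose_activity(20, False))  # watch television
```

Note that a hot, rainy day still gives the beach: once one test succeeds, the remaining elif branches are skipped.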

tuples
Python has another fundamental data type - a tuple. A tuple is like a list except it’s immutable (can’t be changed).
    >>> data = ("Cape Town", 2004, [])
    >>> print data
    ('Cape Town', 2004, [])
    >>> data[0]
    'Cape Town'
    >>> data[0] = "Johannesburg"
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: object doesn't support item assignment
    >>> data[1:]
    (2004, [])
    >>>

Why tuples?
We already have a list type. What does a tuple add? This is one of those deep computer science answers. Tuples can be used as dictionary keys, because they are immutable so the hash value doesn’t change. Tuples are used as anonymous classes and may contain heterogeneous elements. Lists should be homogeneous (e.g., all strings or all numbers or all sequences or...).
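
A short illustration of the dictionary-key point, assuming Python 3. The variable names and the choice of counting adjacent base pairs are made up for this sketch.

```python
# Count how often each pair of adjacent bases occurs in a sequence.
seq = "GATTACA"
pair_counts = {}
for i in range(len(seq) - 1):
    pair = (seq[i], seq[i + 1])  # a tuple of strings is hashable, so it can be a key
    pair_counts[pair] = pair_counts.get(pair, 0) + 1

print(pair_counts[("T", "A")])  # 1

# A list holding the same two items is mutable and cannot be a key.
try:
    pair_counts[["T", "A"]] = 1
except TypeError as err:
    print("list as key failed:", err)
```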

String Formatting
So far all the output examples used the print statement. Print puts spaces between fields, and sticks a newline at the end. Often you’ll need to be more precise. Python has a new definition for the “%” operator when used with a string on the left-hand side - “string interpolation”.
    >>> name = "Andrew"
    >>> print "%s, come here" % name
    Andrew, come here
    >>>

Simple string interpolation
The left side of a string interpolation is always a string. The right side of the string interpolation may be a dictionary, a tuple, or anything else. Let’s start with the last. The string interpolation looks for a “%” followed by a single character (except that “%%” means to use a single “%”). The letter immediately following says how to interpret the object: %s for string, %d for integer, %f for float, and a few others. Most of the time you’ll just use %s.

% examples
Also note some of the special formatting codes.
    >>> "This is a string: %s" % "Yes, it is"
    'This is a string: Yes, it is'
    >>> "This is an integer: %d" % 10
    'This is an integer: 10'
    >>> "This is an integer: %4d" % 10
    'This is an integer:   10'
    >>> "This is an integer: %04d" % 10
    'This is an integer: 0010'
    >>> "This is a float: %f" % 9.8
    'This is a float: 9.800000'
    >>> "This is a float: %.2f" % 9.8
    'This is a float: 9.80'
    >>>

string % tuple
To convert multiple values, use a tuple on the right. (Tuple because it can be heterogeneous.) Objects are extracted left to right. First % gets the first element in the tuple, second % gets the second, etc.
    >>> "Name: %s, age: %d, language: %s" % ("Andrew", 33, "Python")
    'Name: Andrew, age: 33, language: Python'
    >>>
    >>> "Name: %s, age: %d, language: %s" % ("Andrew", 33)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: not enough arguments for format string
The number of % fields and tuple length must match.
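
Width codes and a tuple on the right can be combined to line up columns. The sketch below assumes Python 3; the function name format_row and the sample records are invented, and the %-8s code (left-justify in 8 columns) goes slightly beyond the codes shown on the slides.

```python
def format_row(name, age, score):
    """Format one record as fixed-width columns separated by '|'."""
    # %-8s: left-justified string, 8 wide
    # %3d:  right-justified integer, 3 wide
    # %6.2f: float, 6 wide with 2 decimals
    return "%-8s|%3d|%6.2f" % (name, age, score)


for record in [("Andrew", 33, 9.8), ("Bo", 7, 12.25)]:
    print(format_row(*record))

print("%d%% done" % 50)  # 50% done
```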

string % dictionary
When the right side is a dictionary, the left side must include a name, which is used as the key.
    >>> d = {"name": "Andrew",
    ...      "age": 33,
    ...      "language": "Python"}
    >>>
    >>> print "%(name)s is %(age)s years old. Yes, %(age)s." % d
    Andrew is 33 years old. Yes, 33.
A %(name)s field may be duplicated, and the dictionary size and % count don’t need to match.

Writing files
Opening a file for writing is very similar to opening one for reading. The "w" argument opens the file for writing.
    >>> infile = open("sequences.seq")
    >>> outfile = open("sequences_small.seq", "w")

The write method
    >>> infile = open("sequences.seq")
    >>> outfile = open("sequences_small.seq", "w")
    >>> for line in infile:
    ...     seq = line.rstrip()
    ...     if len(seq) < 1000:
    ...         outfile.write(seq)
    ...         outfile.write("\n")
    ...
    >>> outfile.close()
    >>> infile.close()
    >>>
I need to write my own newline. The close is optional, but good style. Don’t fret too much about it.
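
One way to avoid thinking about close at all is the with statement, which closes the files when its block ends. This is a sketch assuming Python 3, not something the slides cover; the function name copy_short_sequences and the throwaway demo files are made up so the example can run anywhere.

```python
import os
import tempfile


def copy_short_sequences(in_path, out_path, max_len=1000):
    """Copy sequences shorter than max_len from in_path to out_path."""
    with open(in_path) as infile, open(out_path, "w") as outfile:
        for line in infile:
            seq = line.rstrip()
            if len(seq) < max_len:
                outfile.write(seq + "\n")  # write() does not add a newline
    # Both files are closed here, even if an error occurred in the loop.


# Demonstration with temporary files and invented data.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "sequences.seq")
out_path = os.path.join(workdir, "sequences_small.seq")
with open(in_path, "w") as f:
    f.write("GATTACA\n" + "A" * 1500 + "\nCCGG\n")

copy_short_sequences(in_path, out_path)
with open(out_path) as f:
    print(f.read())
```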

Command-line arguments
The short version is that Python gives you access to the list of Unix command-line arguments through sys.argv, which is a normal Python list.
    % cat show_args.py
    import sys
    print sys.argv
    % python show_args.py
    ['show_args.py']
    % python show_args.py 2 3
    ['show_args.py', '2', '3']
    % python show_args.py "Hello, World"
    ['show_args.py', 'Hello, World']
    %
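
Every element of sys.argv is a string, so numbers have to be converted before use. The sketch below assumes Python 3 and uses a made-up function, sum_args, that is handed an explicit list so it can be tried without a command line; in a real script the list passed in would be sys.argv.

```python
def sum_args(argv):
    """Add up the integer arguments that follow the script name."""
    total = 0
    for arg in argv[1:]:   # argv[0] is the script name, so skip it
        total += int(arg)  # arguments arrive as strings
    return total


# Same shape as the list printed by "python show_args.py 2 3".
print(sum_args(["show_args.py", "2", "3"]))  # 5
```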

Exercise 1
The hydrophobic residues are [FILAPVM]. Write a program which asks for a protein sequence and prints “Hydrophobic signal” if (and only if) it has at least 5 hydrophobic residues in a row. Otherwise print “No hydrophobic signal.”

Test cases for #1
    Protein sequence? AA
    No hydrophobic signal
    Protein sequence? AAAAAAAAAA
    Hydrophobic signal
    Protein sequence? AAFILAPILA
    Hydrophobic signal
    Protein sequence? ANDREWDALKE
    No hydrophobic signal
    Protein sequence? FILAEPVM
    No hydrophobic signal
    Protein sequence? FILA
    No hydrophobic signal
    Protein sequence? QQPLIMAW
    Hydrophobic signal

Exercise #2
Modify your solution from Exercise #1 so that it prints “Strong hydrophobic signal” if the input sequence has 7 or more hydrophobic residues in a row, and “Weak hydrophobic signal” if it has 3 or more in a row. Otherwise, print “No hydrophobic signal.”
Some test cases
    Protein sequence? AA
    No hydrophobic signal
    Protein sequence? AAAAAAAAAA
    Strong hydrophobic signal
    Protein sequence? AAFILAPILA
    Strong hydrophobic signal
    Protein sequence? ANDREWDALKE
    No hydrophobic signal
    Protein sequence? FILAEPVM
    Weak hydrophobic signal
    Protein sequence? FILA
    Weak hydrophobic signal
    Protein sequence? QQPLIMAW
    Weak hydrophobic signal

Exercise #3
The Prosite pattern for a Zinc finger C2H2 type domain signature is
    C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}
Based on the pattern, create a sequence which is matched by it. Use Python to test that the pattern matches your sequence.

Exercise #4
The (made-up) enzyme APD1 cleaves DNA. It recognizes the sequence GAATTC and separates the two thymines. Every such site is cut, so if that pattern is present N times then the fully digested result has N+1 sequences. Write a program to get a DNA sequence from the user and “digest” it with APD1. For output print each new sequence, one per line. Hint: Start by finding the location of all cuts. See the next page for test cases.

Test cases for #4
    Enter DNA sequence: A
    A
    Enter DNA sequence: GAATTC
    GAAT
    TC
    Enter DNA sequence: AGAATTCCCAAGAATTCCTTTGAATTCAGTC
    AGAAT
    TCCCAAGAAT
    TCCTTTGAAT
    TCAGTC