Numbers, lists and tuples Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.

Slides:



Advertisements
Similar presentations
For loops Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
Advertisements

Container Types in Python
Chapter 6 Lists and Dictionaries CSC1310 Fall 2009.
Strings Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
String and Lists Dr. Benito Mendoza. 2 Outline What is a string String operations Traversing strings String slices What is a list Traversing a list List.
File input and output if-then-else Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble.
 2007 Pearson Education, Inc. All rights reserved C Formatted Input/Output.
Introduction to Python Lecture 1. CS 484 – Artificial Intelligence2 Big Picture Language Features Python is interpreted Not compiled Object-oriented language.
January 24, 2006And Now For Something Completely Different 1 “And Now For Something Completely Different” A Quick Tutorial of the Python Programming Language.
“Everything Else”. Find all substrings We’ve learned how to find the first location of a string in another string with find. What about finding all matches?
Parsing data records. >sp|P31946|1433B_HUMAN protein beta/alpha OS=Homo sapiens MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS WRVISSIEQKTERNEKKQQMGKEYREKIEAELQDICNDVLELLDKYLIPNATQPESKVFY.
 2005 Pearson Education, Inc. All rights reserved Formatted Output.
Lists in Python.
CS190/295 Programming in Python for Life Sciences: Lecture 3 Instructor: Xiaohui Xie University of California, Irvine.
Strings The Basics. Strings can refer to a string variable as one variable or as many different components (characters) string values are delimited by.
Lists and the ‘ for ’ loop. Lists Lists are an ordered collection of objects >>> data = [] >>> print data [] >>> data.append("Hello!") >>> print data.
Numbers, lists and tuples Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble.
Built-in Data Structures in Python An Introduction.
10. Python - Lists The list is a most versatile datatype available in Python, which can be written as a list of comma-separated values (items) between.
Introducing Python CS 4320, SPRING Lexical Structure Two aspects of Python syntax may be challenging to Java programmers Indenting ◦Indenting is.
Copyright © 2012 Pearson Education, Inc. Publishing as Pearson Addison-Wesley C H A P T E R 8 Lists and Tuples.
Python I Some material adapted from Upenn cmpe391 slides and other sources.
More about Strings. String Formatting  So far we have used comma separators to print messages  This is fine until our messages become quite complex:
Strings in Python. Computers store text as strings GATTACA >>> s = "GATTACA" s Each of these are characters.
1 CSC 221: Introduction to Programming Fall 2012 Lists  lists as sequences  list operations +, *, len, indexing, slicing, for-in, in  example: dice.
LISTS and TUPLES. Topics Sequences Introduction to Lists List Slicing Finding Items in Lists with the in Operator List Methods and Useful Built-in Functions.
File input and output and conditionals Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
Python Programing: An Introduction to Computer Science
Strings CSE 1310 – Introduction to Computers and Programming Alexandra Stefan University of Texas at Arlington 1.
Strings CSE 1310 – Introduction to Computers and Programming Alexandra Stefan University of Texas at Arlington 1.
Winter 2016CISC101 - Prof. McLeod1 CISC101 Reminders Quiz 3 this week – last section on Friday. Assignment 4 is posted. Data mining: –Designing functions.
Python Lists and Sequences Peter Wad Sackett. 2DTU Systems Biology, Technical University of Denmark List properties What are lists? A list is a mutable.
String and Lists Dr. José M. Reyes Álamo. 2 Outline What is a string String operations Traversing strings String slices What is a list Traversing a list.
28 Formatted Output.
String and Lists Dr. José M. Reyes Álamo.
C Formatted Input/Output
Excel STDEV.S Function.
Containers and Lists CIS 40 – Introduction to Programming in Python
CS 100: Roadmap to Computing
CSc 120 Introduction to Computer Programing II
Lists Part 1 Taken from notes by Dr. Neil Moore & Dr. Debby Keen
Numbers, lists and tuples
CISC101 Reminders Quiz 2 this week.
Basic Python Collective variables (sequences) Lists (arrays)
For loops Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble Notes for 2010: I skipped slide 10. This is.
Bryan Burlingame Halloween 2018
Creation, Traversal, Insertion and Removal
Strings Genome 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble.
String and Lists Dr. José M. Reyes Álamo.
Chapter 5: Lists and Dictionaries
Python Data Structures: Lists
Topics Sequences Introduction to Lists List Slicing
Lists Part 1 Taken from notes by Dr. Neil Moore
Python Lists and Sequences
CS190/295 Programming in Python for Life Sciences: Lecture 3
CHAPTER 3: String And Numeric Data In Python
15-110: Principles of Computing
Intro to Computer Science CS1510 Dr. Sarah Diesburg
CHAPTER 4: Lists, Tuples and Dictionaries
Python Data Structures: Lists
Python Data Structures: Lists
Introduction to Python Strings in Python Strings.
Bryan Burlingame Halloween 2018
Lists and the ‘for’ loop
Topics Sequences Introduction to Lists List Slicing
“Everything Else”.
CS 100: Roadmap to Computing
CSE 231 Lab 6.
Intro to Computer Science CS1510 Dr. Sarah Diesburg
Lists Like tuples, but mutable. Formed with brackets: Assignment: >>> a = [1,2,3] Or with a constructor function: a = list(some_other_container) Subscription.
Presentation transcript:

Numbers, lists and tuples Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas

Numbers Python defines various types of numbers: –Integer (1234) –Floating point number (12.34) –Octal and hexadecimal number (0177, 0x9gff) –Complex number ( j) You will likely only use the first two.

Conversions >>> 6/2 3 >>> 3/4 0 >>> 3.0/ >>> 3/ >>> 3*4 12 >>> 3* The result of a mathematical operation on two numbers of the same type is a number of that type. The result of an operation on two numbers of different types is a number of the more complex type. watch out - truncated rather than rounded integer → float

Formatting numbers The % operator formats a number. The syntax is % >>> "%f" % 3 ' ' >>> "%.2f" % 3 '3.00' >>> "%5.2f" % 3 ' 3.00'

Formatting codes %d = integer (d as in digit?) %f = float value (decimal number) %e = scientific notation %g = easily readable notation (i.e., use decimal notation unless there are too many zeroes, then switch to scientific notation)

More complex formats %[flags][width][.precision][code] Left justify (“-”) Include numeric sign (“+”) Fill in with zeroes (“0”) Number of digits after decimal Total width of output d, f, e, g

Examples >>> x = 7718 >>> "%d" % x '7718' >>> "%-6d" % x '7718 ' >>> "%06d" % x '007718' >>> x = >>> "%d" % x '1' >>> "%f" % x ' ' >>> "%e" % x ' e+00' >>> "%g" % x ' ' >>> "%g" % (x * ) ' e+07' Don’t worry if this all looks like Greek – you can figure out how to do these when you need them in your programs. . Read as “use the preceding code to format the following number” (It sure looks like to Greek to me)

Lists A list is an ordered set of objects >>> myString = "Hillary" >>> myList = ["Hillary", "Barack", "John"] Lists are –ordered left to right –indexed like strings (from 0) –mutable –possibly heterogeneous (including containing other lists) >>> list1 = [0, 1, 2] >>> list2 = ['A', 'B', 'C'] >>> list3 = ['D', 'E', 3, 4] >>> list4 = [list1, list2, list3] >>> list4 [[0, 1, 2], ['A', 'B', 'C'], ['D', 'E', 3, 4]]

Lists and dynamic programming # program to print scores in a DP matrix dpm = [ [0,-4,-8], [-4,10,6], [-8,6,20] ] print dpm[0][0], dpm[0][1], dpm[0][2] print dpm[1][0], dpm[1][1], dpm[1][2] print dpm[2][0], dpm[2][1], dpm[2][2] > python print_dpm.py GA G-4106 A-8620 this is called a 2-dimensional list (or a matrix, or a 2-dimensional array)

# program to print scores in a matrix dpm = [ [0,-4,-8], [-4,10,6], [-8,6,20] ] print "%3d" % dpm[0][0], "%3d" % dpm[0][1], "%3d" % dpm[0][2] print "%3d" % dpm[1][0], "%3d" % dpm[1][1], "%3d" % dpm[1][2] print "%3d" % dpm[2][0], "%3d" % dpm[2][1], "%3d" % dpm[2][2] > python print_dpm.py More readable output print integers with 3 characters each

Lists >>> L = ["adenine", "thymine"] + ["cytosine", "guanine"] >>> L = ["adenine", "thymine", "cytosine", "guanine"] >>> print L[0] adenine >>> print L[-1] guanine >>> print L[2:] ['cytosine', 'guanine'] >>> L * 3 ['adenine', 'thymine', 'cytosine', 'guanine', 'adenine', 'thymine', 'cytosine', 'guanine', 'adenine', 'thymine', 'cytosine', 'guanine'] >>> L[9] Traceback (most recent call last): File " ", line 1, in ? IndexError: list index out of range Lists and strings are similar Strings >>> s = 'A'+'T'+'C'+'G' >>> s = "ATCG" >>> print s[0] A >>> print s[-1] G >>> print s[2:] CG >>> s * 3 'ATCGATCGATCG' >>> s[9] Traceback (most recent call last): File " ", line 1, in ? IndexError: string index out of range (you can think of a string as an immutable list of characters)

Lists >>> L = ["adenine", "thymine", "cytosine", "guanine"] >>> print L ['adenine', 'thymine', 'cytosine', 'guanine'] >>> L[1] = "uracil" >>> print L ['adenine', 'uracil', 'cytosine', 'guanine'] >>> L.reverse() >>> print L ['guanine', 'cytosine', 'uracil', 'adenine'] >>> del L[0] >>> print L ['cytosine', 'uracil', 'adenine'] Lists can be changed; strings are immutable. Strings >>> s = "ATCG" >>> print s ATCG >>> s[1] = "U" Traceback (most recent call last): File " ", line 1, in ? TypeError: object doesn't support item assignment >>> s.reverse() Traceback (most recent call last): File " ", line 1, in ? AttributeError: 'str' object has no attribute 'reverse'

More list operations and methods >>> L = ["thymine", "cytosine", "guanine"] >>> L.insert(0, "adenine") # insert before position 0 >>> print L ['adenine', 'thymine', 'cytosine', 'guanine'] >>> L.insert(2, "uracil") >>> print L ['adenine', 'thymine', 'uracil', 'cytosine', 'guanine'] >>> print L[:2] ['adenine', 'thymine'] >>> L[:2] = ["A", "T"] # replace elements 0 and 1 >>> print L ['A', 'T', 'uracil', 'cytosine', 'guanine'] >>> L[:2] = [] >>> print L ['uracil', 'cytosine', 'guanine'] >>> L = ['A', 'T', 'C', 'G'] >>> L.index('C') # find index of first list element that is the same as 'C' 2 >>> L.remove('C') # remove first element that is the same a 'C' >>> print L ['A', 'T', 'G'] >>> last = L.pop() # remove and return last element in list >>> print last 'G' >>> print L ['A', 'T']

Methods for expanding lists >>> data = [] # make an empty list >>> print data [] >>> data.append("Hello!")# append means "add to the end" >>> print data ['Hello!'] >>> data.append(5) >>> print data ['Hello!', 5] >>> data.append([9, 8, 7])# append a list to end of the list >>> print data ['Hello!', 5, [9, 8, 7]] >>> data.extend([4, 5, 6]) # extend means append each element >>> print data ['Hello!', 5, [9, 8, 7], 4, 5, 6] >>> print data[2] [9, 8, 7] >>> print data[2][0] 9 notice that this list contains three different types of objects: a string, some numbers, and a list.

Turn a string into a list string.split(x) or list(S) >>> protein = "ALA PRO ILE CYS" >>> residues = protein.split() # split() uses whitespace >>> print residues ['ALA', 'PRO', 'ILE', 'CYS'] >>> list(protein) # list explodes each char ['A', 'L', 'A', ' ', 'P', 'R', 'O', ' ', 'I', 'L', 'E', ' ', 'C', 'Y', 'S'] >>> print protein.split() ['ALA', 'PRO', 'ILE', 'CYS'] >>> protein2 = "HIS-GLU-PHE-ASP" >>> protein2.split("-") # split at every “-” character ['HIS', 'GLU', 'PHE', 'ASP']

Turn a list into a string join is the opposite of split:.join(L) >>> L1 = ["Asp", "Gly", "Gln", "Pro", "Val"] >>> print "-".join(L1) Asp-Gly-Gln-Pro-Val >>> print "**".join(L1) Asp**Gly**Gln**Pro**Val >>> L2 = "\n".join(L1) >>> L2 'Asp\nGly\nGln\nPro\nVal' >>> print L2 Asp Gly Gln Pro Val the order is confusing. - string to join with is first. - list to be joined is second.

Tuples: immutable lists Tuples are immutable. Why? Sometimes you want to guarantee that a list won’t change. Tuples support operations but not methods. >>> T = (1,2,3,4) >>> T*4 (1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4) >>> T + T (1, 2, 3, 4, 1, 2, 3, 4) >>> T (1, 2, 3, 4) >>> T[1] = 4 Traceback (most recent call last): File " ", line 1, in ? TypeError: object doesn't support item assignment >>> x = (T[0], 5, "eight") >>> print x (1, 5, 'eight') >>> y = list(x) # converts a tuple to a list >>> print y.reverse() ('eight', '5', '1') >>> z = tuple(y) # converts a list to a tuple

Basic list operations: L = ['dna','rna','protein'] # list assignment L2 = [1,2,'dogma',L] # list hold different objects L2[2] = 'central' # change an element (mutable) L2[0:2] = 'ACGT' # replace a slice del L[0:1] = 'nucs' # delete a slice L2 + L # concatenate L2*3 # repeat list L[x:y] # define the range of a list len(L) # length of list ''.join(L) # convert a list to string S.split(x) # convert string to list- x delimited list(S) # convert string to list - explode list(T) # converts a tuple to list Methods: L.append(x) # add to the end L.extend(x) # append each element from x to list L.count(x) # count the occurrences of x L.index(x) # give element location of x L.insert(i,x) # insert at element x at element i L.remove(x) # delete first occurrence of x L.pop(i) # extract element I L.reverse() # reverse list in place L.sort() # sort list in place

Reminder - linked from the course web site is a Python cheat sheet that contains most of the basic information we are covering in a shorter reference form.

Sample problem #1 Write a program called dna-composition.py that takes a DNA sequence as the first command line argument and prints the number of A’s, C’s, G’s and T’s. > python dna-composition.py ACGTGCGTTAC 2 A’s 3 C’s 3 G’s 3 T’s

Solution #1 import sys sequence = sys.argv[1].upper() print sequence.count('A'), "A's" print sequence.count('C'), "C's" print sequence.count('G'), "G's" print sequence.count('T'), "T's" Note - this uses the trick that you can embed single quotes inside a double-quoted string (or vice versa) without using an escape code.

Sample problem #2 The object sys.argv is a list of strings. Write a program reverse-args.py that removes the program name from the beginning of this list and then prints the remaining command line arguments (no matter how many of them are given) in reverse order with asterisks in between. > python reverse-args.py *2*1

Solution #2 import sys args = sys.argv[1:] args.reverse() print "*".join(args)

Sample problem #3 The melting temperature of a primer sequence (with its exact reverse complement) can be estimated as: T = 2 * (# of A or T nucleotides) + 4 * (# of G or C nucleotides) Write a program melting-temperature.py that computes the melting temperature of a DNA sequence given as the first argument. > python melting-temperature.py ACGGTCA 22

Solution #3 import sys sequence = sys.argv[1].upper() numAs = sequence.count('A') numCs = sequence.count('C') numGs = sequence.count('G') numTs = sequence.count('T') temp = (2 * (numAs + numTs)) + (4 * (numGs + numCs)) print temp

Challenge problem Download the file "speech.txt" from the course web site. Read the entire file contents into a string, divide it into a list of words, sort the list of words, and print the list. Make the words all lower case so that they sort more sensibly (by default all upper case letters come before all lower case letters). Tips: To read the file as a single string use: speech_text = open("speech.txt").read() To sort a list of strings use: string_list.sort()

Challenge problem solution speech_text = open("speech.txt").read() # next line optional, just gets rid of punctuation speech_text = speech_text.replace(",","").replace(".","") speech_text = speech_text.lower() wordList = speech_text.split() wordList.sort() print wordList

Reading Chapters 10 and 12 of Think Python by Downey.