Advanced Python Concepts: Modules
BCHB524 2014, Lecture 14
Edwards, 10/20/2014

Modules (slide 2)

We've already used these, from the Python standard library and from BioPython:

    import sys
    filename = sys.argv[1]
    file_handle = open(filename)

    import Bio.SeqIO
    seq_records = Bio.SeqIO.parse(file_handle, "swiss")

    import gzip
    file_handle = gzip.open(filename)

    import urllib
    file_handle = urllib.urlopen(url)

    import math
    x = math.sqrt(y)

Modules (slide 3)

Modules contain previously defined functions, variables, and (soon!) classes.
- Store useful code for re-use in many programs
- Structure related code together

    import sys
    import MyNucStuff

    seqfilename = sys.argv[1]
    seq = MyNucStuff.read_seq_from_filename(seqfilename)
    cseq = MyNucStuff.complement(seq)
    rcseq = MyNucStuff.reverseComplement(seq)
    print " Sequence:",seq
    print " C sequence:",cseq
    print "RC sequence:",rcseq

Modules (slide 4)

In MyNucStuff.py:

    def read_seq_from_filename(seq_filename):
        seq_file = open(seq_filename)
        dna_seq = ''.join(seq_file.read().split())
        dna_seq = dna_seq.upper()
        seq_file.close()
        return dna_seq

    compl_dict = {'A':'T', 'T':'A', 'C':'G', 'G':'C'}

    def complement(seq):
        return ''.join(map(compl_dict.get,seq))

    def revseq(seq):
        return ''.join(reversed(seq))

    def reverseComplement(seq):
        return complement(revseq(seq))

    # plus anything else you like...
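The complement function above looks each base up with compl_dict.get, which returns None for any character missing from the dictionary, so a sequence containing N would make ''.join fail. A minimal sketch of one way around that, assuming unknown bases should simply come back as N (the slide does not say how to treat them). The names COMPL and reverse_complement are illustrative, and the code uses Python 3 syntax rather than the Python 2 of the slides.

```python
# Sketch only: extends the slide's dictionary so that N (unknown base) is
# tolerated. Mapping N to N is an assumption made for this illustration.
COMPL = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C', 'N': 'N'}

def complement(seq):
    # Any character not in COMPL is also reported as N.
    return ''.join(COMPL.get(base, 'N') for base in seq)

def reverse_complement(seq):
    # Reverse first, then complement, as on the slide.
    return complement(seq[::-1])

print(complement("ATGCN"))          # TACGN
print(reverse_complement("ATGCN"))  # NGCAT
```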

Modules (slide 5)

We can import specific functions directly, and reference them without the module name:

    import sys
    from MyNucStuff import read_seq_from_filename
    from MyNucStuff import complement
    from MyNucStuff import reverseComplement

    seqfilename = sys.argv[1]
    seq = read_seq_from_filename(seqfilename)
    cseq = complement(seq)
    rcseq = reverseComplement(seq)
    print " Sequence:",seq
    print " C sequence:",cseq
    print "RC sequence:",rcseq
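The two import forms give access to the same function; only the name you type differs. A small sketch using the standard math module in place of MyNucStuff (so it runs without any extra files), in Python 3 syntax:

```python
import math
from math import sqrt

# Both names refer to the very same function object.
print(math.sqrt is sqrt)   # True
print(sqrt(16.0))          # 4.0
```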

Modules (slide 6)

We can even import all the functions from a module:

    import sys
    from MyNucStuff import *

    seqfilename = sys.argv[1]
    seq = read_seq_from_filename(seqfilename)
    cseq = complement(seq)
    rcseq = reverseComplement(seq)
    print " Sequence:",seq
    print " C sequence:",cseq
    print "RC sequence:",rcseq
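One thing to keep in mind with the star form is that it brings in every public name of the module, which can silently replace a name you already had. The sketch below is only an illustration: it runs the star import inside a scratch dictionary (via exec) so the imported names can be listed, and it uses the standard math module as a stand-in for MyNucStuff. Python 3 syntax.

```python
import math

# Run "from math import *" in a scratch namespace so we can inspect
# exactly which names it would add to a program.
scratch = {}
exec("from math import *", scratch)

imported = sorted(name for name in scratch if not name.startswith("__"))
print(len(imported), "names imported, e.g.", imported[:5])

# math.pow now hides the built-in pow in that namespace.
print(scratch["pow"] is math.pow)   # True
```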

Packages (slide 7)

Packages are collections of modules, grouped together.
Implemented using files and folders/directories.

All equivalent:

    import Bio.SeqIO
    Bio.SeqIO.parse(handle, "swiss")

    from Bio import SeqIO
    SeqIO.parse(handle, "swiss")

    from Bio.SeqIO import parse
    parse(handle, "swiss")
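The same equivalence can be checked with a package from the standard library. This sketch uses xml.etree.ElementTree instead of Bio.SeqIO only so that it runs without BioPython installed; the XML string is made up for the example. Python 3 syntax.

```python
import xml.etree.ElementTree
from xml.etree import ElementTree
from xml.etree.ElementTree import fromstring

# All three spellings reach the same function inside the same module.
print(xml.etree.ElementTree.fromstring is ElementTree.fromstring)  # True
print(ElementTree.fromstring is fromstring)                        # True

root = fromstring("<seq id='P1'>MKV</seq>")
print(root.get("id"), root.text)   # P1 MKV
```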

What can go wrong? (slide 8)

Sometimes our own .py files can "collide" with Python's packages.

Test what happens with an "empty" module in files:
- xml.py (and then try to import ElementTree)
- Bio.py (and then try to import SeqIO)
- etc.
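The xml.py experiment can be scripted. The sketch below is one possible way to try it, not part of the lecture: it creates an empty xml.py in a temporary folder, then starts a fresh Python interpreter in that folder and asks it to import the standard ElementTree module. It assumes the interpreter searches the current directory first, which is the default for "python -c". Python 3 syntax.

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # An "empty" module whose name collides with the standard xml package.
    open(os.path.join(workdir, "xml.py"), "w").close()

    env = dict(os.environ)
    env.pop("PYTHONSAFEPATH", None)  # make sure the current directory is searched

    result = subprocess.run(
        [sys.executable, "-c", "import xml.etree.ElementTree"],
        cwd=workdir, env=env,
        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        universal_newlines=True,
    )

failed = result.returncode != 0
print("import failed:", failed)
# Show the last line of the error message, which names the problem.
print(result.stderr.strip().splitlines()[-1] if failed else "no error")
```

The local xml.py is found before the standard library's xml package, so the import of the submodule cannot succeed. Renaming or deleting the colliding file fixes it.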

A module for codon tables (slide 9)

Module is called: codon_table

Functions:
- read_codons_from_filename(filename): returns a dictionary of codons; each value is a pair (amino-acid symbol, initiation codon true/false)
- amino_acid(codon_table,codon): returns the amino-acid symbol for codon
- is_init(codon_table,codon): returns true if codon is an initiation codon, false otherwise
- get_ambig_aa(codon_table,codon): returns the single amino acid consistent with an ambiguous codon (containing N's), or X
- translate(codon_table,seq,frame): returns the amino-acid sequence for DNA sequence seq

A module for codon tables (slide 10)

    from MyNucStuff import *
    from codon_table import *
    import sys

    if len(sys.argv) < 3:
        print "Require codon table and DNA sequence on command-line."
        sys.exit(1)

    table = read_codons_from_filename(sys.argv[1])
    seq = read_seq_from_filename(sys.argv[2])

    if is_init(table,seq[:3]):
        print "Initial codon is an initiation codon"

    for frame in (1,2,3):
        print "Frame",frame,"(forward):",translate(table,seq,frame)

A module for codons (slide 11)

In codon_table.py:

    def read_codons_from_filename(codonfile):
        # magic
        return codon_table

    def amino_acid(table,codon):
        # magic
        return aa

    def is_init(table,codon):
        # magic
        return init

    def get_ambig_aa(table,codon):
        # magic
        return aa

    def translate(table,seq,frame):
        # magic
        return aaseq
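One possible way to fill in the "magic" for the last four functions is sketched below. It is not the lecture's solution. It assumes the table maps each codon to an (amino acid, is-initiation) pair as described for read_codons_from_filename, that frame is 1, 2, or 3, and that sequences contain only A, C, G, T, and N. To stay self-contained it builds the standard genetic code in memory and flags only ATG as an initiation codon, which is a simplification. Python 3 syntax.

```python
from itertools import product

BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Stand-in for read_codons_from_filename: codon -> (amino acid, is-initiation).
# Only ATG is flagged as an initiation codon here, to keep the toy table simple.
codon_table = {}
for aa, triple in zip(AAS, product(BASES, repeat=3)):
    codon = ''.join(triple)
    codon_table[codon] = (aa, codon == "ATG")

def amino_acid(table, codon):
    return table[codon][0]

def is_init(table, codon):
    return table[codon][1]

def get_ambig_aa(table, codon):
    # Expand each N to all four bases; every other base stays fixed.
    choices = [BASES if base == 'N' else base for base in codon]
    aas = set(table[''.join(c)][0] for c in product(*choices))
    # A single consistent amino acid is returned; otherwise X.
    return aas.pop() if len(aas) == 1 else 'X'

def translate(table, seq, frame):
    # frame is 1, 2 or 3: the 1-based position of the first codon.
    # An incomplete codon at the end of seq is ignored.
    aaseq = []
    for i in range(frame - 1, len(seq) - 2, 3):
        aaseq.append(get_ambig_aa(table, seq[i:i + 3]))
    return ''.join(aaseq)

print(translate(codon_table, "ATGGCNTAA", 1))   # MA*
print(translate(codon_table, "ATGGCNTAA", 2))   # WX
```

GCN translates to A because all four GC? codons encode alanine, while CNT is ambiguous between several amino acids and so becomes X.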

A module for codons (slide 12)

In codon_table.py:

    def read_codons_from_filename(codonfile):
        # Each line looks like "name = value"; keep the name and the value
        f = open(codonfile)
        data = {}
        for l in f:
            sl = l.split()
            key = sl[0]
            value = sl[2]
            data[key] = value
        f.close()

        b1 = data['Base1']
        b2 = data['Base2']
        b3 = data['Base3']
        aa = data['AAs']
        st = data['Starts']

        # Position i of each string describes one codon
        codon_table = {}
        n = len(aa)
        for i in range(n):
            codon = b1[i] + b2[i] + b3[i]
            isInit = (st[i] == 'M')
            codon_table[codon] = (aa[i],isInit)
        return codon_table
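Because the function takes fields 0 and 2 of each line, it expects whitespace-separated lines of the form "name = value", with one character per codon in each value. The sketch below feeds it a made-up, four-codon file to show the resulting dictionary. The toy file contents are an assumption for illustration (a real table has 64 columns), and the function is restated in Python 3 syntax with a small addition that skips blank lines.

```python
import os
import tempfile

def read_codons_from_filename(codonfile):
    # Same logic as the slide, using "with" and skipping short/blank lines.
    data = {}
    with open(codonfile) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 3:
                data[fields[0]] = fields[2]   # name, "=", value
    codon_table = {}
    for i, aa in enumerate(data['AAs']):
        codon = data['Base1'][i] + data['Base2'][i] + data['Base3'][i]
        codon_table[codon] = (aa, data['Starts'][i] == 'M')
    return codon_table

# Hypothetical four-codon table: TTT, TTC, TTA and ATG.
toy = """\
  AAs    = FFLM
  Starts = ---M
  Base1  = TTTA
  Base2  = TTTT
  Base3  = TCAG
"""

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write(toy)
try:
    table = read_codons_from_filename(handle.name)
finally:
    os.remove(handle.name)

print(table["ATG"])   # ('M', True)
print(table["TTA"])   # ('L', False)
```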

Exercise (slide 13)

1. Rework the lecture, and your solutions (or mine) from homework exercises #1 through #3, to make a MyDNAStuff module. Put as many useful nucleotide functions as possible into the module.
2. Rework the lecture, and your solutions (or mine) from homework exercises #4 and #5, to make the codon_table module with the functions specified in this lecture.
3. Demonstrate the use of these modules to translate a DNA sequence into amino-acid sequences in all six frames with just a few lines of code. The final result should look similar to Slide 10. Your program should handle DNA sequence with N's in it.
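"All six frames" means the three forward frames plus the three frames of the reverse complement. The sketch below only shows how the six frames can be enumerated as lists of codons; each codon would then be passed to the codon_table functions for translation. The helper names (codons, six_frames) and the choice to complement N as N are assumptions for this illustration. Python 3 syntax.

```python
COMPL = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C', 'N': 'N'}

def reverse_complement(seq):
    # Unknown characters are reported as N.
    return ''.join(COMPL.get(base, 'N') for base in reversed(seq))

def codons(seq, frame):
    # frame is 1, 2 or 3; an incomplete trailing codon is dropped.
    start = frame - 1
    return [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]

def six_frames(seq):
    rc = reverse_complement(seq)
    frames = []
    for frame in (1, 2, 3):
        frames.append(("forward %d" % frame, codons(seq, frame)))
    for frame in (1, 2, 3):
        frames.append(("reverse %d" % frame, codons(rc, frame)))
    return frames

for label, frame_codons in six_frames("ATGGCNTAA"):
    print(label, frame_codons)
```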

Class Project: Expectations (slide 14)

40% of your grade!
- Project Report: long version of your homework write-up
- Project Presentation: demo your program; describe your project solution

Class Project: Blast Database (slide 15)

1. Write a program that computes all pairwise blast alignments for two species' proteomes and stores the alignments in a relational database.
2. Write a program that retrieves the blast alignment for two proteins (specified by their accessions) from the relational database.
3. Write a program that finds pairs of orthologous proteins that are mutually best hits in the species' proteomes.
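The "mutually best hits" idea in step 3 can be sketched on plain Python lists before any database is involved. This is an illustration under assumptions, not a project solution: alignments are given as (query, subject, score) tuples, a higher score is taken to be better, and the accessions and scores are made up. Python 3 syntax.

```python
def best_hits(alignments):
    # Keep the top-scoring subject for each query.
    best = {}
    for query, subject, score in alignments:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return dict((query, hit[0]) for query, hit in best.items())

def mutual_best_hits(a_vs_b, b_vs_a):
    # A pair (a, b) qualifies when b is a's best hit and a is b's best hit.
    a_best = best_hits(a_vs_b)
    b_best = best_hits(b_vs_a)
    return sorted((a, b) for a, b in a_best.items() if b_best.get(b) == a)

# Made-up accessions and scores, purely for illustration.
a_vs_b = [("A1", "B1", 250.0), ("A1", "B2", 90.0),
          ("A2", "B2", 120.0), ("A3", "B1", 200.0)]
b_vs_a = [("B1", "A1", 245.0), ("B2", "A2", 118.0), ("B2", "A1", 80.0)]

print(mutual_best_hits(a_vs_b, b_vs_a))   # [('A1', 'B1'), ('A2', 'B2')]
```

A3's best hit is B1, but B1's best hit is A1, so A3 is not part of any mutual pair.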

Class Project: MS/MS Viewer (slide 16)

Write a program to display peptide fragmentation spectra from an mzXML file.
- The program will take an mzXML file, a scan number, and a peptide sequence as input.
- The peptide's b-ion and y-ion m/z values should be computed, and peaks matching these m/z values annotated with appropriate labels.
- The output figure/plot should aid the user in determining whether or not the peptide is a good match to the spectrum.
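The b-ion and y-ion calculation can be sketched as follows. This is a simplified illustration, assuming singly charged fragments, an unmodified peptide, and standard monoisotopic residue masses rounded to five decimal places. A b-ion is the sum of the first i residue masses plus a proton; a y-ion is the sum of the last i residue masses plus water plus a proton. The function name fragment_ions is hypothetical. Python 3 syntax.

```python
# Monoisotopic residue masses in daltons (standard values, rounded).
RESIDUE_MASS = {
    'G': 57.02146, 'A': 71.03711, 'S': 87.03203, 'P': 97.05276,
    'V': 99.06841, 'T': 101.04768, 'C': 103.00919, 'L': 113.08406,
    'I': 113.08406, 'N': 114.04293, 'D': 115.02694, 'Q': 128.05858,
    'K': 128.09496, 'E': 129.04259, 'M': 131.04049, 'H': 137.05891,
    'F': 147.06841, 'R': 156.10111, 'Y': 163.06333, 'W': 186.07931,
}
WATER = 18.01056
PROTON = 1.00728

def fragment_ions(peptide):
    # Singly charged b- and y-ion m/z values for an unmodified peptide.
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(("b%d" % i, sum(masses[:i]) + PROTON))
        y_ions.append(("y%d" % i, sum(masses[-i:]) + WATER + PROTON))
    return b_ions, y_ions

b_ions, y_ions = fragment_ions("GAK")
for label, mz in b_ions + y_ions:
    print("%s %.3f" % (label, mz))
```

Matching these values against the peaks of a spectrum would additionally need a mass tolerance, which is left out here.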

Class Project: Protein Digest (slide 17)

Write a simple web-server application using TurboGears to carry out an in silico enzymatic digest of a user-provided protein sequence.
- Users should be able to specify min and max length, min and max molecular weight, # of missed cleavages, and specific enzyme.
- Output should be a table of peptides, with their length, molecular weight, # of missed cleavages, and amino-acids to left and right of each peptide in the protein sequence.
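The digest itself, separate from the web server, might be sketched like this. The example assumes a simplified trypsin rule (cut after K or R unless the next residue is P), leaves out molecular weight and enzyme choice, and uses '-' to mark the protein's ends. Function names and the test sequence are made up for illustration. Python 3 syntax.

```python
def cleavage_fragments(protein):
    # Split after K or R, except before P and at the very end.
    fragments, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1:i + 2]
        if aa in "KR" and next_aa != "P" and i + 1 < len(protein):
            fragments.append(protein[start:i + 1])
            start = i + 1
    fragments.append(protein[start:])
    return fragments

def digest(protein, missed_cleavages=0, min_len=1, max_len=None):
    fragments = cleavage_fragments(protein)
    # Starting position of each fragment within the protein.
    offsets, position = [], 0
    for fragment in fragments:
        offsets.append(position)
        position += len(fragment)
    rows = []
    for i in range(len(fragments)):
        for missed in range(missed_cleavages + 1):
            if i + missed >= len(fragments):
                break
            peptide = ''.join(fragments[i:i + missed + 1])
            if len(peptide) < min_len:
                continue
            if max_len is not None and len(peptide) > max_len:
                continue
            start, end = offsets[i], offsets[i] + len(peptide)
            left = protein[start - 1] if start > 0 else '-'
            right = protein[end] if end < len(protein) else '-'
            rows.append((peptide, len(peptide), missed, left, right))
    return rows

# Columns: peptide, length, missed cleavages, residue to left, residue to right
for row in digest("AKRPGKC", missed_cleavages=1):
    print(row)
```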