Python.



What is Biopython?
Biopython is a Python library of resources for developers of Python-based software for bioinformatics and research.
It can parse bioinformatics files (FASTA, GenBank, BLAST output, ClustalW, etc.) into local data structures.
It can access many resources directly (web databases, NCBI) from within the script.
It works with sequences and records.
It offers many search algorithms, comparative algorithms and format options.

Installing Biopython
Comes with Anaconda, so there is nothing extra to install by hand (scripts still need their import statements). If you use the standard IDLE environment you will need to download Biopython and place it in the proper directory. Bioinformatics has become so important in recent years that almost every programming environment (C++, Perl, etc.) has its own bioinformatics libraries.

Sequence objects
Biological sequences are the main point of interest in bioinformatics processing. Biopython includes a special datatype for them, the Seq class. Seq objects are not the same as Python strings: they are really strings together with additional information, such as an alphabet, and a variety of methods such as translate(), reverse_complement() and so on.
from Bio.Seq import Seq
from Bio.Alphabet import Alphabet
dna = 'AGTACACTGGT'                       # this is a pure string
# Here is how you create a sequence object.
seqdna = Seq('AGTACACTGGT', Alphabet())   # sequence object
Note that seqdna is a sequence object, not just a string.

Alphabets
See IUPAC (International Union of Pure and Applied Chemistry).
An alphabet is just the set of allowable characters that may be used in the string.
IUPAC.unambiguous_dna is really just the set {A, C, G, T} of nucleotides.
IUPAC.unambiguous_rna is {A, C, G, U}.
IUPAC.protein is just the 20 standard amino acids {A, R, N, D, C, Q, E, G, H, I, L, K, M, F, P, S, T, W, Y, V}; other alphabets are available as well.
We will mainly use the {A, C, G, T} DNA set. Alphabets are nice for type checking our sequences.
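
The type checking mentioned above can be sketched in plain Python. This is only an illustration of the idea, not how Biopython does it; the helper name `uses_only` is made up, and the letter sets are the ones listed on the slide.

```python
# Letter sets as listed on the slide.
UNAMBIGUOUS_DNA = set("GATC")
UNAMBIGUOUS_RNA = set("GAUC")


def uses_only(sequence, letters):
    """Return True if every character of `sequence` belongs to `letters`."""
    return set(sequence.upper()) <= letters


print(uses_only("AGTACACTGGT", UNAMBIGUOUS_DNA))  # True
print(uses_only("AGUACACUGGU", UNAMBIGUOUS_DNA))  # False, the sequence contains U
print(uses_only("AGUACACUGGU", UNAMBIGUOUS_RNA))  # True
```

A set comparison keeps the check independent of sequence length and order; only the distinct letters matter.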

Dumping Alphabets
from Bio.Alphabet import IUPAC
print(IUPAC.unambiguous_dna.letters)
print(IUPAC.ambiguous_dna.letters)
print(IUPAC.unambiguous_rna.letters)
print(IUPAC.protein.letters)
OUTPUT
GATC
GATCRYWSMKHBVDN
GAUC
ACDEFGHIKLMNPQRSTVWY

Can work with Sequence objects like strings
from Bio.Seq import Seq
from Bio.Alphabet import IUPAC
from Bio.SeqUtils import GC
my_seq = Seq("GATCG", IUPAC.unambiguous_dna)
print(my_seq[0])                # prints the first letter
print(len(my_seq))              # prints the length of the sequence
print(Seq("AAAA").count("AA"))  # non-overlapping count, i.e. 2
print(GC(my_seq))               # gives the GC % of the sequence
print(my_seq[2:5])              # we can even slice them; returns a Seq
# convert the Seq object to a pure string object
dna_string = str(my_seq)
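
For readers without Biopython at hand, the GC percentage and the non-overlapping count can be reproduced on ordinary strings. This is a rough sketch; the function name `gc_percent` is hypothetical, and it only counts the letters G and C (no ambiguity codes).

```python
def gc_percent(sequence):
    """Percentage of G and C letters in a sequence (0.0 for an empty one)."""
    if not sequence:
        return 0.0
    seq = sequence.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


print(gc_percent("GATCG"))   # 60.0, three of the five letters are G or C
print("AAAA".count("AA"))    # 2, str.count is non-overlapping as well
print("GATCG"[2:5])          # TCG
```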

MutableSeq
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> my_seq = Seq("GCCATTGTAATGGGCCGCTGAAAGGGTGCCCGA", IUPAC.unambiguous_dna)
Observe what happens if you try to edit the sequence:
>>> my_seq[5] = "G"
Traceback (most recent call last):
...
TypeError: 'Seq' object does not support item assignment
However, you can convert it into a mutable sequence (a MutableSeq object) and do pretty much anything you want with it:
>>> mutable_seq = my_seq.tomutable()
>>> mutable_seq
MutableSeq('GCCATTGTAATGGGCCGCTGAAAGGGTGCCCGA', IUPACUnambiguousDNA())

We can modify these
>>> mutable_seq
MutableSeq('GCCATTGTAATGGGCCGCTGAAAGGGTGCCCGA', IUPACUnambiguousDNA())
>>> mutable_seq[5] = "C"
>>> mutable_seq
MutableSeq('GCCATCGTAATGGGCCGCTGAAAGGGTGCCCGA', IUPACUnambiguousDNA())
>>> mutable_seq.remove("T")
>>> mutable_seq
MutableSeq('GCCACGTAATGGGCCGCTGAAAGGGTGCCCGA', IUPACUnambiguousDNA())
>>> mutable_seq.reverse()
>>> mutable_seq
MutableSeq('AGCCCGTGGGAAAGTCGCCGGGTAATGCACCG', IUPACUnambiguousDNA())
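
The same immutable versus mutable distinction exists for ordinary Python strings and lists. The sketch below mirrors the three edits above on a plain list of letters; it is an analogy only and says nothing about how MutableSeq is implemented.

```python
dna = "GCCATTGTAATGGGCCGCTGAAAGGGTGCCCGA"

# Strings, like Seq objects, do not support item assignment,
# so work on a list of single letters instead.
letters = list(dna)
letters[5] = "C"       # replace the letter at index 5
letters.remove("T")    # remove the first T that is found
letters.reverse()      # reverse in place

edited = "".join(letters)
print(edited)  # AGCCCGTGGGAAAGTCGCCGGGTAATGCACCG
print(dna)     # the original string is unchanged
```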

Nucleotide sequences and (reverse) complements
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC", IUPAC.unambiguous_dna)
>>> my_seq
Seq('GATCGATGGGCCTATATAGGATCGAAAATCGC', IUPACUnambiguousDNA())
>>> my_seq.complement()
Seq('CTAGCTACCCGGATATATCCTAGCTTTTAGCG', IUPACUnambiguousDNA())
>>> my_seq.reverse_complement()
Seq('GCGATTTTCGATCCTATATAGGCCCATCGATC', IUPACUnambiguousDNA())

Reversing a Sequence
An easy way to just reverse a Seq object (or a Python string) is to slice it with a -1 step.
# FORWARD
>>> my_seq
Seq('GATCGATGGGCCTATATAGGATCGAAAATCGC', IUPACUnambiguousDNA())
# BACKWARD (using a -1 step slice)
>>> my_seq[::-1]
Seq('CGCTAAAAGCTAGGATATATCCGGGTAGCTAG', IUPACUnambiguousDNA())
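
The complement, reverse and reverse complement operations can also be written for plain strings. This is a minimal sketch assuming an unambiguous, upper-case DNA string; the names `COMPLEMENT`, `complement` and `reverse_complement` are illustrative and not taken from Biopython's source.

```python
# Translation table that swaps each base for its pairing partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(dna):
    """Complement each base, keeping the reading direction."""
    return dna.translate(COMPLEMENT)


def reverse_complement(dna):
    """Complement each base and reverse the result."""
    return complement(dna)[::-1]


my_seq = "GATCGATGGGCCTATATAGGATCGAAAATCGC"
print(complement(my_seq))          # CTAGCTACCCGGATATATCCTAGCTTTTAGCG
print(my_seq[::-1])                # CGCTAAAAGCTAGGATATATCCGGGTAGCTAG
print(reverse_complement(my_seq))  # GCGATTTTCGATCCTATATAGGCCCATCGATC
```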

Double Stranded DNA
5' ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG 3'
DNA coding strand (aka Crick strand, strand +1)
5' ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG 3'
   |||||||||||||||||||||||||||||||||||||||
3' TACCGGTAACATTACCCGGCGACTTTCCCACGGGCTATC 5'
DNA template strand (aka Watson strand, strand −1)

Transcription
5' ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG 3'
   |||||||||||||||||||||||||||||||||||||||
3' TACCGGTAACATTACCCGGCGACTTTCCCACGGGCTATC 5'
Transcription
5' AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG 3'
Single stranded messenger RNA

Let's do some Reverse Comp
from Bio.Seq import Seq
from Bio.Alphabet import IUPAC
coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG", IUPAC.unambiguous_dna)
template_dna = coding_dna.reverse_complement()
print(template_dna)
CTATCGGGCACCCTTTCAGCGGCCCATTACAATGGCCAT

Transcribe ( T->U )
from Bio.Seq import Seq
from Bio.Alphabet import IUPAC
coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG", IUPAC.unambiguous_dna)
messenger_rna = coding_dna.transcribe()
print(messenger_rna)
AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG
# or, starting from the template strand, you can do both steps at once
template_dna = coding_dna.reverse_complement()
messenger_rna = template_dna.reverse_complement().transcribe()
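
As the slide title says, transcribing the coding strand amounts to swapping T for U. Here is a minimal plain-string sketch of that idea; the function names are hypothetical, and the template-strand variant first recovers the coding strand by reverse complementing.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def transcribe(coding_dna):
    """Coding strand to mRNA: same letters, with T replaced by U."""
    return coding_dna.replace("T", "U")


def transcribe_template(template_dna):
    """Template strand (written 5' to 3') to mRNA."""
    coding_dna = template_dna.translate(COMPLEMENT)[::-1]
    return transcribe(coding_dna)


coding = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
template = "CTATCGGGCACCCTTTCAGCGGCCCATTACAATGGCCAT"
print(transcribe(coding))             # AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG
print(transcribe_template(template))  # same mRNA
```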

Translate into protein
from Bio.Seq import Seq
from Bio.Alphabet import IUPAC
messenger_rna = Seq("AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG", IUPAC.unambiguous_rna)
print(messenger_rna)
print(messenger_rna.translate())
# I added the spaces
AUG GCC AUU GUA AUG GGC CGC UGA AAG GGU GCC CGA UAG
MAIVMGR*KGAR*
# the * represents stop codons.
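
To make the codon-by-codon reading concrete, here is a minimal plain-Python sketch of translation with the standard genetic code. It is not Biopython's implementation: `STANDARD_TABLE` and `translate` are names made up for this example, and the 64-letter string is the standard code written in TCAG codon order.

```python
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # codons starting with T
    "LLLLPPPPHHQQRRRR"   # codons starting with C
    "IIIMTTTTNNKKSSRR"   # codons starting with A
    "VVVVAAAADDEEGGGG"   # codons starting with G
)
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
STANDARD_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(sequence):
    """Translate RNA or DNA, three letters at a time; '*' marks a stop codon."""
    dna = sequence.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3   # ignore an incomplete trailing codon
    return "".join(STANDARD_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


print(translate("AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG"))  # MAIVMGR*KGAR*
```

Like the slide's example, this sketch keeps translating past stop codons instead of stopping at the first one.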

Standard Translation Table

Printing Tables
from Bio.Data import CodonTable
stdTable = CodonTable.unambiguous_dna_by_id[1]
print(stdTable)
mitoTable = CodonTable.unambiguous_dna_by_id[2]
print(mitoTable)

Table 1 Standard, SGC0
  |  T      |  C      |  A      |  G      |
--+---------+---------+---------+---------+--
T | TTT F   | TCT S   | TAT Y   | TGT C   | T
T | TTC F   | TCC S   | TAC Y   | TGC C   | C
T | TTA L   | TCA S   | TAA Stop| TGA Stop| A
T | TTG L(s)| TCG S   | TAG Stop| TGG W   | G
C | CTT L   | CCT P   | CAT H   | CGT R   | T
C | CTC L   | CCC P   | CAC H   | CGC R   | C
C | CTA L   | CCA P   | CAA Q   | CGA R   | A
C | CTG L(s)| CCG P   | CAG Q   | CGG R   | G
A | ATT I   | ACT T   | AAT N   | AGT S   | T
A | ATC I   | ACC T   | AAC N   | AGC S   | C
A | ATA I   | ACA T   | AAA K   | AGA R   | A
A | ATG M(s)| ACG T   | AAG K   | AGG R   | G
G | GTT V   | GCT A   | GAT D   | GGT G   | T
G | GTC V   | GCC A   | GAC D   | GGC G   | C
G | GTA V   | GCA A   | GAA E   | GGA G   | A
G | GTG V   | GCG A   | GAG E   | GGG G   | G
Table 2 Vertebrate Mitochondrial, SGC1
  |  T      |  C      |  A      |  G      |
--+---------+---------+---------+---------+--
T | TTT F   | TCT S   | TAT Y   | TGT C   | T
T | TTC F   | TCC S   | TAC Y   | TGC C   | C
T | TTA L   | TCA S   | TAA Stop| TGA W   | A
T | TTG L   | TCG S   | TAG Stop| TGG W   | G
C | CTT L   | CCT P   | CAT H   | CGT R   | T
C | CTC L   | CCC P   | CAC H   | CGC R   | C
C | CTA L   | CCA P   | CAA Q   | CGA R   | A
C | CTG L   | CCG P   | CAG Q   | CGG R   | G
A | ATT I(s)| ACT T   | AAT N   | AGT S   | T
A | ATC I(s)| ACC T   | AAC N   | AGC S   | C
A | ATA M(s)| ACA T   | AAA K   | AGA Stop| A
A | ATG M(s)| ACG T   | AAG K   | AGG Stop| G
G | GTT V   | GCT A   | GAT D   | GGT G   | T
G | GTC V   | GCC A   | GAC D   | GGC G   | C
G | GTA V   | GCA A   | GAA E   | GGA G   | A
G | GTG V(s)| GCG A   | GAG E   | GGG G   | G

Codon - Amino Acids
Amino Acid      SLC   DNA codons
Isoleucine      I     ATT, ATC, ATA
Leucine         L     CTT, CTC, CTA, CTG, TTA, TTG
Valine          V     GTT, GTC, GTA, GTG
Phenylalanine   F     TTT, TTC
Methionine      M     ATG
Cysteine        C     TGT, TGC
Alanine         A     GCT, GCC, GCA, GCG
Glycine         G     GGT, GGC, GGA, GGG
Proline         P     CCT, CCC, CCA, CCG
Threonine       T     ACT, ACC, ACA, ACG
Serine          S     TCT, TCC, TCA, TCG, AGT, AGC
Tyrosine        Y     TAT, TAC
Tryptophan      W     TGG
Glutamine       Q     CAA, CAG
Asparagine      N     AAT, AAC
Histidine       H     CAT, CAC
Glutamic acid   E     GAA, GAG
Aspartic acid   D     GAT, GAC
Lysine          K     AAA, AAG
Arginine        R     CGT, CGC, CGA, CGG, AGA, AGG
Stop codons     Stop  TAA, TAG, TGA
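
The table above is the standard codon table read in the other direction, from amino acid to codons. A small sketch of how that reverse lookup could be built in plain Python follows; the names `codons_for` and `STANDARD_TABLE` are hypothetical, and `*` stands for Stop.

```python
from collections import defaultdict

BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
STANDARD_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Group the 64 codons by the single-letter code they translate to.
codons_for = defaultdict(list)
for codon, letter in STANDARD_TABLE.items():
    codons_for[letter].append(codon)

print(sorted(codons_for["I"]))  # ['ATA', 'ATC', 'ATT']
print(sorted(codons_for["*"]))  # ['TAA', 'TAG', 'TGA']
print(len(codons_for["L"]))     # 6
```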

The SeqRecord Object
A SeqRecord is a structure that allows the storage of additional information with a sequence. This includes the usual information found in standard GenBank files. The following is a sample of the fields in SeqRecord:
.seq - The sequence
.id - The primary ID used to identify the sequence (string)
.name - The common name of the sequence
.annotations - A dictionary of additional information about the sequence
.features - A list of SeqFeature objects
and others.
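
One way to picture the fields listed above is as a small record type. The dataclass below is a rough stand-in, not Biopython's SeqRecord; the class name `SimpleRecord` and the placeholder default strings are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleRecord:
    """Hypothetical stand-in holding a sequence plus descriptive fields."""
    seq: str
    id: str = "unknown"
    name: str = "unknown"
    description: str = ""
    annotations: dict = field(default_factory=dict)   # extra key/value information
    features: list = field(default_factory=list)      # e.g. gene or CDS entries


record = SimpleRecord("GATC", id="AC12345")
record.annotations["source"] = "made up for this example"
print(record.id, record.seq, record.features)
```

Using `default_factory` gives every record its own dictionary and list, so annotating one record never leaks into another.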

Build From Scratch
>>> from Bio.Seq import Seq
>>> simple_seq = Seq("GATC")
>>> from Bio.SeqRecord import SeqRecord
>>> simple_seq_r = SeqRecord(simple_seq)
You can pass in the id, description etc. when creating the record, or set them afterwards:
>>> simple_seq_r.id = "AC12345"
>>> simple_seq_r.description = "Made up sequence I wish I could write a paper about"
>>> print(simple_seq_r.description)
Made up sequence I wish I could write a paper about
>>> simple_seq_r.seq
Seq('GATC', Alphabet())

Fill SeqRecord from FASTA file
>gi|45478711|ref|NC_005816.1| Yersinia pestis biovar Microtus ... pPCP1, complete sequence
TGTAACGAACGGTGCAATAGTGATCCACACCCAACGCCTGAAATCAGATCCAGGGGGTAATCTGCTCTCC
------------------------------------------------------------------------------------------
>>> from Bio import SeqIO
# Note that SeqIO.read will only read one record.
>>> record = SeqIO.read("NC_005816.fna", "fasta")
>>> record
SeqRecord(seq=Seq('TGTAACGAACGGTGCAATAGTGATCCACACCCAACGCCTGAAATCAGATCCAGG...CTG', SingleLetterAlphabet()), id='gi|45478711|ref|NC_005816.1|', name='gi|45478711|ref|NC_005816.1|', description='gi|45478711|ref|NC_005816.1| Yersinia pestis biovar Microtus ... sequence', dbxrefs=[])
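
The FASTA layout itself is simple: a header line starting with `>` followed by one or more sequence lines. The sketch below reads that layout with plain Python; `read_fasta` is a hypothetical helper and the two records in the example are invented, not taken from the file on the slide.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)           # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


example = io.StringIO(">seq1 made-up record\nGATC\nGGTA\n>seq2\nTTAA\n")
records = list(read_fasta(example))
print(records)  # [('seq1 made-up record', 'GATCGGTA'), ('seq2', 'TTAA')]
```

In this sketch the first word of the header plays the role of the record id and the whole header line that of the description.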

Access the fields individually
>>> record.seq
Seq('TGTAACGAACGGTGCAATAGTGATCCACACCCAACGCCTGAAATCAGATCCAGG...CTG', SingleLetterAlphabet())
>>> record.id
'gi|45478711|ref|NC_005816.1|'
>>> record.description
'gi|45478711|ref|NC_005816.1| Yersinia pestis biovar Microtus ... pPCP1, complete sequence'

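These are missing
The FASTA file carries no cross-references, annotations or features, so these fields come back empty:
>>> record.dbxrefs
[]
>>> record.annotations
{}
>>> record.letter_annotations
{}
>>> record.features
[]
Note which is a dict and which is a list!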

Reading GenBank Files
LOCUS       NC_005816   9609 bp   DNA   circular   BCT   21-JUL-2008
DEFINITION  Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence.
ACCESSION   NC_005816
VERSION     NC_005816.1  GI:45478711
PROJECT     GenomeProject:10638
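
To show what the LOCUS line carries, here is a rough sketch that splits it into named fields. It assumes a whitespace-separated line with exactly the eight fields seen on the slide; `parse_locus` is a hypothetical helper, and real GenBank files can deviate from this shape, which is why a proper parser is preferable in practice.

```python
def parse_locus(line):
    """Split a LOCUS line shaped like the slide's example into named fields."""
    parts = line.split()
    if len(parts) != 8 or parts[0] != "LOCUS" or parts[3] != "bp":
        raise ValueError("unexpected LOCUS line: %r" % line)
    _, name, length, _, molecule, topology, division, date = parts
    return {
        "name": name,
        "length": int(length),   # sequence length in base pairs
        "molecule": molecule,
        "topology": topology,
        "division": division,
        "date": date,
    }


locus = parse_locus("LOCUS NC_005816 9609 bp DNA circular BCT 21-JUL-2008")
print(locus["name"], locus["length"], locus["topology"])  # NC_005816 9609 circular
```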

Read the File
>>> from Bio import SeqIO
>>> record = SeqIO.read("NC_005816.gb", "genbank")
>>> record
SeqRecord(seq=Seq('TGTAACGAACGGTGCAATAGTGATCCACACCCAACGCCTGAAATCAGATCCAGG...CTG', IUPACAmbiguousDNA()), id='NC_005816.1', name='NC_005816', description='Yersinia pestis biovar Microtus str. 91001 plasmid pPCP1, complete sequence.', dbxrefs=['Project:10638'])

Read a record
from Bio import SeqIO
record = SeqIO.read("micoplasmaGen.gb", "genbank")
print(record.description)
# count the features annotated as genes
ct = 0
for f in record.features:
    if f.type == 'gene':
        ct += 1
print(ct)
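
The counting loop generalises to tallying every feature type at once. The sketch below uses made-up stand-in objects with a `type` attribute instead of a real GenBank record, so `Feature`, `count_feature_types` and the sample list are all assumptions for illustration.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for a feature; only the `type` attribute is modelled.
Feature = namedtuple("Feature", ["type"])


def count_feature_types(features):
    """Return a Counter mapping each feature type to how often it occurs."""
    return Counter(f.type for f in features)


sample = [Feature("source"), Feature("gene"), Feature("CDS"),
          Feature("gene"), Feature("CDS")]
counts = count_feature_types(sample)
print(counts["gene"])   # 2
print(counts["tRNA"])   # 0, missing types count as zero
```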