1
Python
2
What is Biopython?
Biopython is a Python library of resources for developers of Python-based software for bioinformatics and research.
It can parse bioinformatics files (FASTA, GenBank, BLAST output, ClustalW, etc.) into local data structures.
It can access many resources directly (web databases, NCBI) from within a script.
It works with sequences and records.
It provides many search algorithms, comparative algorithms and format options.
3
Installing Biopython
Biopython is available through Anaconda (conda install biopython) and through pip (pip install biopython). You still need to import the modules you use at the top of each script. If you use the standard IDLE environment, you will need to install Biopython into the Python installation that IDLE runs. Bioinformatics has become so important in recent years that almost every programming language (C++, Perl, etc.) has its own bioinformatics libraries.
4
Sequence objects
Biological sequences are the main point of interest in bioinformatics processing. Biopython provides a special datatype for them called Seq. Seq objects are not the same as Python strings. They are really strings together with additional information, such as an alphabet, and a variety of methods such as translate(), reverse_complement() and so on.
    from Bio.Seq import Seq
    from Bio.Alphabet import Alphabet
    dna = 'AGTACACTGGT'                      # this is a pure string
    seqdna = Seq('AGTACACTGGT', Alphabet())  # this is a sequence object
Note that seqdna is a sequence object, not just a string.
5
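Alphabets - See IUPAC (International Union of Pure and Applied Chemistry)
An alphabet is just the set of allowable characters that may be used in the string.
IUPAC.unambiguous_dna is really just the set {A,C,G,T} of nucleotides.
IUPAC.unambiguous_rna is {A,C,G,U}.
IUPAC.protein is just the 20 standard amino acids {A,R,N,D,C,Q,E,G,H,I,L,K,M,F,P,S,T,W,Y,V}.
Other alphabets exist as well. We will mainly use the {A,C,G,T} DNA set. Alphabets are nice for type checking our sequences.
Note: the Bio.Alphabet module was removed in Biopython 1.78. In later versions a Seq is created from the sequence string alone, so the alphabet argument in these examples applies to older releases.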
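The type-checking idea can be sketched in plain Python without Biopython. This is only an illustration of treating an alphabet as a set of allowed letters; the names UNAMBIGUOUS_DNA, UNAMBIGUOUS_RNA and invalid_letters are hypothetical helpers, not part of the Biopython API.

```python
# Alphabets modelled as plain sets of allowed letters (illustrative only).
UNAMBIGUOUS_DNA = set("ACGT")
UNAMBIGUOUS_RNA = set("ACGU")


def invalid_letters(sequence, alphabet):
    """Return the sorted letters in `sequence` that are not in `alphabet`."""
    return sorted(set(sequence.upper()) - alphabet)


# A clean DNA string has no invalid letters.
dna_problems = invalid_letters("AGTACACTGGT", UNAMBIGUOUS_DNA)

# An RNA string checked against the DNA alphabet reports the stray U.
rna_in_dna = invalid_letters("AUGGCC", UNAMBIGUOUS_DNA)

print(dna_problems)
print(rna_in_dna)
```

An empty result means every letter belongs to the chosen alphabet.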
6
Can work with Sequence objects like strings
    from Bio.Seq import Seq
    from Bio.Alphabet import IUPAC
    from Bio.SeqUtils import GC          # needed for GC() below
    my_seq = Seq("GATCG", IUPAC.unambiguous_dna)
    print(my_seq[0])                     # prints the first letter
    print(len(my_seq))                   # prints the length of the sequence
    print(Seq("AAAA").count("AA"))       # non-overlapping count, i.e. 2
    print(GC(my_seq))                    # gives the GC % of the sequence
    print(my_seq[2:5])                   # we can even slice them; returns a Seq
    # convert the Seq object to a pure string object
    dna_string = str(my_seq)
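For readers without Biopython at hand, the same string-like operations can be tried on an ordinary Python string. The gc_percent function below is a hypothetical stand-in that sketches what a GC-percentage calculation does; it is not Biopython's implementation.

```python
def gc_percent(sequence):
    """Percentage of letters in `sequence` that are G or C (0.0 if empty)."""
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return 100.0 * (upper.count("G") + upper.count("C")) / len(upper)


my_seq = "GATCG"

first_letter = my_seq[0]             # indexing works like any string
length = len(my_seq)                 # number of letters
pair_count = "AAAA".count("AA")      # str.count is non-overlapping
middle = my_seq[2:5]                 # slicing returns a substring
gc = gc_percent(my_seq)              # 3 of the 5 letters are G or C

print(first_letter, length, pair_count, middle, gc)
```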
7
Nucleotide sequences and (reverse) complements
    >>> from Bio.Seq import Seq
    >>> from Bio.Alphabet import IUPAC
    >>> my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC", IUPAC.unambiguous_dna)
    >>> my_seq
    Seq('GATCGATGGGCCTATATAGGATCGAAAATCGC', IUPACUnambiguousDNA())
    >>> my_seq.complement()
    Seq('CTAGCTACCCGGATATATCCTAGCTTTTAGCG', IUPACUnambiguousDNA())
    >>> my_seq.reverse_complement()
    Seq('GCGATTTTCGATCCTATATAGGCCCATCGATC', IUPACUnambiguousDNA())
8
Reversing a Sequence
An easy way to reverse a Seq object (or a Python string) is to slice it with a step of -1.
    # FORWARD
    >>> my_seq
    Seq('GATCGATGGGCCTATATAGGATCGAAAATCGC', IUPACUnambiguousDNA())
    # BACKWARD (using a -1 step slice)
    >>> my_seq[::-1]
    Seq('CGCTAAAAGCTAGGATATATCCGGGTAGCTAG', IUPACUnambiguousDNA())
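Complementing and reversing combine into the reverse complement. The plain-Python sketch below shows that relationship on ordinary strings. The functions complement and reverse_complement here are hypothetical helpers written for illustration and handle only the unambiguous letters A, C, G and T.

```python
# Translation table that swaps each base for its pairing partner.
COMPLEMENT_TABLE = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(dna):
    """Swap A<->T and C<->G, keeping the original order."""
    return dna.translate(COMPLEMENT_TABLE)


def reverse_complement(dna):
    """Complement the sequence, then reverse it with a -1 step slice."""
    return complement(dna)[::-1]


my_seq = "GATCGATGGGCCTATATAGGATCGAAAATCGC"

comp = complement(my_seq)
rev = my_seq[::-1]
rev_comp = reverse_complement(my_seq)

print(comp)
print(rev)
print(rev_comp)
```

Applying reverse_complement twice returns the original string, which is a quick sanity check for this kind of helper.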
9
Double Stranded DNA
DNA coding strand (aka Crick strand, strand +1)
    5’ ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG 3’
       |||||||||||||||||||||||||||||||||||||||
    3’ TACCGGTAACATTACCCGGCGACTTTCCCACGGGCTATC 5’
DNA template strand (aka Watson strand, strand −1)
10
Transcription
    5’ ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG 3’
       |||||||||||||||||||||||||||||||||||||||
    3’ TACCGGTAACATTACCCGGCGACTTTCCCACGGGCTATC 5’
                      |
                Transcription
                      |
    5’ AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG 3’
Single stranded messenger RNA
11
Let's do a reverse complement
    from Bio.Seq import Seq
    from Bio.Alphabet import IUPAC
    coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG", IUPAC.unambiguous_dna)
    template_dna = coding_dna.reverse_complement()
    print(template_dna)
    # CTATCGGGCACCCTTTCAGCGGCCCATTACAATGGCCAT
12
Transcribe (T -> U)
    from Bio.Seq import Seq
    from Bio.Alphabet import IUPAC
    coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG", IUPAC.unambiguous_dna)
    messenger_rna = coding_dna.transcribe()
    print(messenger_rna)
    # AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG
    # Or, starting from the template strand (template_dna on the previous slide),
    # you can do both steps at once:
    messenger_rna = template_dna.reverse_complement().transcribe()
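The two routes to the messenger RNA can be compared in plain Python. This is a sketch on ordinary strings, assuming only unambiguous uppercase DNA; transcribe and mrna_from_template are hypothetical helper names, not Biopython functions.

```python
COMPLEMENT_TABLE = str.maketrans("ACGT", "TGCA")


def transcribe(coding_dna):
    """Transcribe the coding strand by replacing every T with U."""
    return coding_dna.replace("T", "U")


def mrna_from_template(template_dna):
    """Reverse complement the template strand to recover the coding
    strand, then transcribe it."""
    coding_dna = template_dna.translate(COMPLEMENT_TABLE)[::-1]
    return transcribe(coding_dna)


coding = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
template = "CTATCGGGCACCCTTTCAGCGGCCCATTACAATGGCCAT"

mrna_direct = transcribe(coding)
mrna_via_template = mrna_from_template(template)

print(mrna_direct)
print(mrna_via_template)
```

Both routes give the same RNA, because the template strand is the reverse complement of the coding strand.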
13
Translate into protein
    from Bio.Seq import Seq
    from Bio.Alphabet import IUPAC
    messenger_rna = Seq("AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG", IUPAC.unambiguous_rna)
    print(messenger_rna)
    print(messenger_rna.translate())
    # Output (spaces added between codons for readability):
    # AUG GCC AUU GUA AUG GGC CGC UGA AAG GGU GCC CGA UAG
    # MAIVMGR*KGAR*
    # The * represents a stop codon.
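To see what translation does codon by codon, here is a minimal plain-Python sketch. It builds the standard genetic code from the usual compact 64-letter string (bases ordered T, C, A, G) and reads the RNA three letters at a time. The translate function below is a hypothetical helper, not Biopython's method: it translates straight through stop codons and ignores any trailing partial codon.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for codons TTT, TTC, TTA, TTG, TCT, ... GGG (standard code).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna):
    """Translate an RNA (or DNA) string into one-letter amino acid codes."""
    dna = rna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3   # drop any trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


protein = translate("AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG")
print(protein)
```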
14
Translation Table
15
Codon - Amino Acids
    Amino Acid      SLC    DNA codons
    Isoleucine      I      ATT, ATC, ATA
    Leucine         L      CTT, CTC, CTA, CTG, TTA, TTG
    Valine          V      GTT, GTC, GTA, GTG
    Phenylalanine   F      TTT, TTC
    Methionine      M      ATG
    Cysteine        C      TGT, TGC
    Alanine         A      GCT, GCC, GCA, GCG
    Glycine         G      GGT, GGC, GGA, GGG
    Proline         P      CCT, CCC, CCA, CCG
    Threonine       T      ACT, ACC, ACA, ACG
    Serine          S      TCT, TCC, TCA, TCG, AGT, AGC
    Tyrosine        Y      TAT, TAC
    Tryptophan      W      TGG
    Glutamine       Q      CAA, CAG
    Asparagine      N      AAT, AAC
    Histidine       H      CAT, CAC
    Glutamic acid   E      GAA, GAG
    Aspartic acid   D      GAT, GAC
    Lysine          K      AAA, AAG
    Arginine        R      CGT, CGC, CGA, CGG, AGA, AGG
    Stop codons     Stop   TAA, TAG, TGA
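The table above can be turned into a lookup from codon to amino acid. The sketch below copies the table into a dictionary keyed by single-letter code (with "*" standing in for Stop, an assumption made here for convenience) and inverts it. The names CODONS_BY_AMINO_ACID and CODON_TO_AMINO_ACID are hypothetical.

```python
# The slide's table: single-letter code -> DNA codons ("*" used for Stop).
CODONS_BY_AMINO_ACID = {
    "I": "ATT ATC ATA",
    "L": "CTT CTC CTA CTG TTA TTG",
    "V": "GTT GTC GTA GTG",
    "F": "TTT TTC",
    "M": "ATG",
    "C": "TGT TGC",
    "A": "GCT GCC GCA GCG",
    "G": "GGT GGC GGA GGG",
    "P": "CCT CCC CCA CCG",
    "T": "ACT ACC ACA ACG",
    "S": "TCT TCC TCA TCG AGT AGC",
    "Y": "TAT TAC",
    "W": "TGG",
    "Q": "CAA CAG",
    "N": "AAT AAC",
    "H": "CAT CAC",
    "E": "GAA GAG",
    "D": "GAT GAC",
    "K": "AAA AAG",
    "R": "CGT CGC CGA CGG AGA AGG",
    "*": "TAA TAG TGA",
}

# Invert the table so each codon maps to its amino acid.
CODON_TO_AMINO_ACID = {
    codon: amino_acid
    for amino_acid, codons in CODONS_BY_AMINO_ACID.items()
    for codon in codons.split()
}

# Every one of the 4 * 4 * 4 = 64 possible codons should appear exactly once.
print(len(CODON_TO_AMINO_ACID))
print(CODON_TO_AMINO_ACID["ATG"], CODON_TO_AMINO_ACID["TGA"])
```

Counting the entries is a quick way to confirm that no codon was dropped or duplicated when the table was copied.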