GE3M25: Computer Programming for Biologists Python, Class 5


Karsten Hokamp, PhD, Genetics, TCD, 08/12/2015

Overview
Course page: http://bioinf.gen.tcd.ie/GE3M25/
- Recap
- Modules
- Dictionaries
- Working from the command line
- Weekly task

Recap
- Collections: list(), tuple(), set()
- List methods: append, clear, copy, count, extend, index, insert, pop, remove, reverse, sort
- Built-in functions: all(), any(), len(), max(), min(), sorted(), sum(), zip()
- Find out more with the help() function, e.g. help(list)
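A few of the recapped list methods and built-in functions, applied to a list of made-up fragment lengths (the numbers and gene names are only illustrative):

```python
lengths = [120, 45, 300, 45]

lengths.append(80)             # add to the end -> [120, 45, 300, 45, 80]
print(lengths.count(45))       # 2
print(lengths.index(300))      # 2 (position of the first 300)
lengths.sort()                 # sorts in place -> [45, 45, 80, 120, 300]
print(lengths)

print(len(lengths), min(lengths), max(lengths), sum(lengths))  # 5 45 300 590
print(sorted(lengths, reverse=True))   # returns a new list; lengths is unchanged
print(any(n > 200 for n in lengths), all(n > 40 for n in lengths))  # True True

genes = ['geneA', 'geneB', 'geneC']    # hypothetical names
for gene, n in zip(genes, lengths):    # zip stops at the shorter input
    print(gene, n)
```

Note the difference between the method lengths.sort(), which changes the list and returns None, and the function sorted(), which leaves the list alone and returns a sorted copy.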

Exercise:
- Create a variable 'seq' containing a DNA string
- Create a list 'dna1' from the DNA string
- Create a tuple 'dna2' from the DNA string
- Create a set 'dna3' from the DNA string
- Compare the structure and content of the collections
- Try to access the first element of each collection
- Try to modify the first element of each collection
- Add an element to each of your collections
- Try to remove the last element from each of your collections
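One possible walk-through of the exercise, using a short example string; the tuple and set steps show workarounds because those collections behave differently from lists:

```python
seq = 'ATGCGA'

dna1 = list(seq)    # ordered and mutable
dna2 = tuple(seq)   # ordered but immutable
dna3 = set(seq)     # unordered, duplicates removed: only A, C, G, T remain

print(dna1[0], dna2[0])   # A A
# dna3[0] would raise TypeError: sets do not support indexing

dna1[0] = 'C'             # fine: lists can be changed in place
try:
    dna2[0] = 'C'         # tuples cannot be changed
except TypeError as err:
    print('tuple:', err)

dna1.append('T')          # lists grow in place
dna2 = dna2 + ('T',)      # a tuple can only be replaced by a new, longer tuple
dna3.add('N')             # sets use add(), not append()

dna1.pop()                # removes and returns the last element
dna2 = dna2[:-1]          # slicing builds a new tuple without the last element
dna3.discard('N')         # sets have no "last" element; remove a given value instead
```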

Weekly task:
- Read in a DNA sequence in FASTA format from a file
- Prompt the user for a short motif
- Split the sequence at the sites that match
- Print the fragment lengths in sorted order
- Do not report fragments of zero length
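A minimal sketch of the core of this task, assuming a single-record FASTA file and using str.split(), which drops the motif itself from the fragments (whether motif bases should stay attached to a fragment is a design choice the task leaves open). The function names are hypothetical:

```python
def read_fasta(lines):
    """Join the sequence lines of a single-record FASTA file, skipping the header."""
    return ''.join(line.strip() for line in lines if not line.startswith('>'))

def fragment_lengths(sequence, motif):
    """Split sequence at every non-overlapping match of motif and return
    the lengths of the non-empty fragments in ascending order."""
    fragments = sequence.upper().split(motif.upper())
    return sorted(len(fragment) for fragment in fragments if len(fragment) > 0)

# Two adjacent matches produce an empty fragment, which is not reported
print(fragment_lengths('GAATTCAAAGAATTCGAATTCTT', 'gaattc'))  # [2, 3]
```

In the full script the sequence would come from an open file handle passed to read_fasta() and the motif from input(); note that str.split() raises ValueError for an empty motif.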

Python modules
- Software packages that add functionality
- Some are part of the standard Python distribution (random, math, string, ...)
- External packages: wiki.python.org/moin/UsefulModules

Python modules
- Load a module: import module_name
- Use a module: module_name.variable, module_name.function()
- Documentation: help(module_name)

Python modules
Examples (the results are random and will differ on each run):

    import random
    random.random()          # e.g. 0.231185
    random.randint(1, 10)    # e.g. 3
    random.choice('ACGT')    # e.g. 'G'

Python modules
Exercises:
- Create a random number
- Create a random integer between 50 and 100
- Get a random letter from the word 'mississippi'
- Check out the help for module 'string'
- Print all lowercase letters, one per line
- Sort the ascii_letters string: which letter comes first?
- Check out the help for module 'math'
- Calculate the log2 value of 0.5
- Print the value of pi
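One way to answer the module exercises (help(string) and help(math) are meant to be run interactively, so they are left out here):

```python
import math
import random
import string

print(random.random())               # float in the range [0.0, 1.0)
print(random.randint(50, 100))       # integer; both 50 and 100 can occur
print(random.choice('mississippi'))  # one letter picked from the word

for letter in string.ascii_lowercase:    # 'abcdefghijklmnopqrstuvwxyz'
    print(letter)

first = sorted(string.ascii_letters)[0]
print(first)      # 'A': uppercase letters come before lowercase in ASCII order

print(math.log2(0.5))   # -1.0
print(math.pi)          # 3.141592653589793
```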

Python modules
Exercise:
- Revisit the script 'gene_list.py' from the last lesson
- Change it to read a file name from the command line (instead of hard-coding it into the script)
- Tip: use module 'sys', object 'argv'
- Run your script from the command line:

    python3 gene_list.py ~/Downloads/gene_list.txt
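The original gene_list.py is not shown here, so this sketch assumes it simply prints each non-empty line of the gene list; read_gene_list() and main() are hypothetical names. The point is where the file name comes from: sys.argv[0] is the script name and sys.argv[1] the first command-line argument.

```python
import sys

def read_gene_list(filename):
    """Return the non-empty lines of a text file, without trailing newlines."""
    with open(filename) as handle:
        return [line.strip() for line in handle if line.strip()]

def main(argv):
    # argv[0] is the script name, argv[1] the file name given on the command line
    if len(argv) < 2:
        print('Usage: python3 gene_list.py <gene_list_file>', file=sys.stderr)
        return
    for gene in read_gene_list(argv[1]):
        print(gene)

if __name__ == '__main__':
    main(sys.argv)
```

Checking len(argv) first gives a helpful message instead of an IndexError when the script is run without a file name.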

Exercise:
Read in a file with probe ids, gene ids, fold-change and p-values, separated by tabs.
1. Print out only gene ids and fold-change
2. Print out gene ids and fold-change as log2 values
3. Print all the lines with absolute fold-change > 2 and p-value <= 0.05
4. Print the values to a file instead of the screen
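A sketch of the four steps, assuming the columns come in the order listed (probe id, gene id, fold-change, p-value) with no header line, and that fold-changes are positive ratios so log2 is defined. The example lines and function names are hypothetical:

```python
import math

def parse_rows(lines):
    """Yield (probe_id, gene_id, fold_change, p_value) from tab-separated lines."""
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue                       # skip blank lines
        probe_id, gene_id, fold_change, p_value = line.split('\t')
        yield probe_id, gene_id, float(fold_change), float(p_value)

def is_significant(row, min_abs_fc=2.0, max_p=0.05):
    """Criterion from the exercise: |fold-change| > 2 and p-value <= 0.05."""
    return abs(row[2]) > min_abs_fc and row[3] <= max_p

def report(rows, out=None):
    """Print gene id, fold-change and log2 fold-change; out=None means the screen."""
    for probe_id, gene_id, fc, p in rows:
        log2_fc = math.log2(fc) if fc > 0 else float('nan')  # log2 needs fc > 0
        print(gene_id, fc, round(log2_fc, 3), sep='\t', file=out)

# In the exercise these lines would come from open(filename)
example = ['p1\tGENE_A\t3.5\t0.01\n',
           'p2\tGENE_B\t4.0\t0.20\n',
           'p3\tGENE_C\t1.5\t0.01\n']
rows = list(parse_rows(example))
significant = [row for row in rows if is_significant(row)]
report(significant)
```

For step 4, passing an open file handle to report() (e.g. one opened with mode 'w') sends the same lines to the file instead of the screen.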

DNA → Protein translation
1. Process a DNA string three nucleotides at a time
2. Translate that codon
3. Print the amino acid

DNA → Protein translation
1. Process a DNA string three nucleotides at a time:

    dna = 'ATGCCAGGTTTACACGGT'
    codon = dna[0:3]
    print(codon)

DNA → Protein translation
1. Process a DNA string three nucleotides at a time:

    dna = 'ATGCCAGGTTTACACGGT'
    i = 0
    codon = dna[i:i+3]
    print(codon)

DNA → Protein translation
1. Process a DNA string three nucleotides at a time:

    dna = 'ATGCCAGGTTTACACGGT'
    for i in range(0, 16, 3):
        codon = dna[i:i+3]
        print(codon)

DNA → Protein translation
1. Process a DNA string three nucleotides at a time:

    dna = 'ATGCCAGGTTTACACGGT'
    for i in range(0, len(dna)-2, 3):
        codon = dna[i:i+3]
        print(codon)
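The same loop wrapped in a small (hypothetical) function makes it easier to see why the range stops at len(dna)-2: it works for any length, and one or two leftover bases at the end are simply skipped:

```python
def codons(dna):
    """Return the complete codons of dna, reading from position 0."""
    # The last start position must leave room for 3 bases, hence len(dna) - 2;
    # 1 or 2 trailing bases that do not form a full codon are ignored.
    return [dna[i:i+3] for i in range(0, len(dna) - 2, 3)]

print(codons('ATGCCAGGTTTACACGGT'))  # ['ATG', 'CCA', 'GGT', 'TTA', 'CAC', 'GGT']
print(codons('ATGCC'))               # ['ATG']
```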


DNA → Protein translation
2. Translate the codon:

    dna = 'ATGCCAGGTTTACACGGT'
    for i in range(0, len(dna)-2, 3):
        codon = dna[i:i+3]
        if codon == 'AAA':
            print('K')
        elif codon == 'AAC':
            print('N')
        …

We need a look-up table!

Dictionary
- Collection of key-value pairs
- Symbols: {} and []
- Initialisation: table = {} or table = dict()
- Storing values (here the key is a codon and the value an amino acid):

    table = {'AAA': 'K', 'AAG': 'K'}
    table['AAC'] = 'N'
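A short check of how storing works: each key can occur only once, so assigning to an existing key replaces its value, and the in operator looks at keys, not values:

```python
table = {}                          # empty dictionary
table = dict()                      # the same, written as a function call

table = {'AAA': 'K', 'AAG': 'K'}    # codon (key) -> amino acid (value)
table['AAC'] = 'N'                  # adds a new pair
table['AAA'] = 'K'                  # existing key: value is replaced, no duplicate

print(len(table))       # 3
print('AAC' in table)   # True: 'in' tests the keys
print('N' in table)     # False: values are not searched
```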

Dictionary
Accessing keys and values:

    aa = table['AAC']
    aa = table[codon]
    codons = table.keys()
    amino_acids = set(table.values())
    for codon in table.keys():
        print("translate %s into %s" % (codon, table[codon]))
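A runnable version with a small partial codon table; items(), which hands over key and value together, and get(), which avoids a KeyError for missing keys, are alternatives not shown on the slide:

```python
table = {'AAA': 'K', 'AAG': 'K', 'AAC': 'N', 'AAT': 'N'}

codon = 'AAC'
aa = table[codon]                   # 'N'
codons = sorted(table.keys())       # ['AAA', 'AAC', 'AAG', 'AAT']
amino_acids = set(table.values())   # {'K', 'N'}; printed order may vary

# items() yields (key, value) pairs, so no second look-up is needed
for codon, aa in table.items():
    print("translate %s into %s" % (codon, aa))

# table['TAA'] would raise KeyError; get() returns a default value instead
print(table.get('TAA', '?'))        # ?
```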

Dictionary
Exercise:
- Generate one million random integers from 1 to 10
- Use a dictionary (occ) to count how often each integer occurs
- Calculate and print the frequency of each integer
Tips:
- Check if a key exists: if key in occ.keys()
- Increase the value of an existing key: occ[key] += 1
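A sketch of the counting exercise using the two tips; count_occurrences() is a hypothetical helper name. Writing key in occ is equivalent to key in occ.keys():

```python
import random

def count_occurrences(values):
    """Count how often each value occurs, using a plain dictionary."""
    occ = {}
    for value in values:
        if value in occ:          # same test as: value in occ.keys()
            occ[value] += 1
        else:
            occ[value] = 1        # first time this value is seen
    return occ

n = 1_000_000
numbers = [random.randint(1, 10) for _ in range(n)]
occ = count_occurrences(numbers)
for number in sorted(occ):
    print(number, occ[number] / n)   # each frequency should be close to 0.1
```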

Dictionary
Look-up table for codons

Dictionary
Generate the table on the fly
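The slide's own code for this step is not preserved. One common way to generate the table, not necessarily the one used in class, loops over the bases in TCAG order and reads the standard genetic code from a 64-character string ('*' marks stop codons):

```python
bases = 'TCAG'
# Amino acids of the standard genetic code, one per codon in TCAG x TCAG x TCAG order
amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'

codon_table = {}
i = 0
for first in bases:
    for second in bases:
        for third in bases:
            codon_table[first + second + third] = amino_acids[i]
            i += 1

print(len(codon_table), codon_table['ATG'], codon_table['TAA'])  # 64 M *
```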

Dictionary
Exercise:
- Read a DNA sequence from a file and translate it into a protein sequence
- Make it work for upper- and lower-case input
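A minimal sketch, assuming a single-record FASTA file; converting to upper case before the look-up covers lower- and mixed-case input. Using 'X' for codons not in the table (e.g. containing N) is an assumption, not part of the exercise:

```python
bases = 'TCAG'
amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
codon_table = {a + b + c: amino_acids[16 * i + 4 * j + k]
               for i, a in enumerate(bases)
               for j, b in enumerate(bases)
               for k, c in enumerate(bases)}

def read_fasta(lines):
    """Join the sequence lines of a single-record FASTA file, skipping the header."""
    return ''.join(line.strip() for line in lines if not line.startswith('>'))

def translate(dna):
    """Translate dna codon by codon; unknown codons become 'X'."""
    dna = dna.upper()                      # accept lower- and mixed-case input
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(codon_table.get(dna[i:i+3], 'X'))
    return ''.join(protein)

print(translate('atgccaggtttacacggt'))    # MPGLHG
```

With a real file, an open file handle can be passed straight to read_fasta(), because iterating over a file yields its lines.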

Weekly task 5
Option a: 100 HOXA protein sequences have been assembled from UniProt. First align the sequences with the tool of your choice from the EBI website, then load the tree file into TreeDraw. Use the controls to generate a tree that best indicates the clustering of sequences and the relationships between genes from different species. Submit an image of your tree together with a short description of how you generated the alignment and the tree, and a discussion of the relationships it shows.
Possible points of discussion:
- Can you think of a suitable sequence to use for rooting the tree?
- Can you detect any inconsistencies or surprises in the tree with respect to the known or expected evolutionary relationships of the species?

Weekly task 5
Option b: Write a Python script that does the following:
- Read in a DNA sequence from a file in FASTA format
- Translate the DNA into a protein sequence and print it to the screen
- Repeat, mutating one nucleotide at a time, and stop if
  a) the start codon is changed, or
  b) a stop codon is introduced before the end of the sequence
- Report for each mutation where it occurs and what substitution is made
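A sketch of the mutation loop only, assuming the sequence starts with ATG and that "before the end" means any in-frame stop codon ahead of the last codon. Translating and printing the protein after each step is left to a translate function like the one from the previous exercise; all names here are hypothetical:

```python
import random

STOP_CODONS = {'TAA', 'TAG', 'TGA'}

def point_mutation(dna, rng):
    """Substitute one randomly chosen base; return (new_dna, position, old, new)."""
    pos = rng.randrange(len(dna))
    old = dna[pos]
    new = rng.choice([base for base in 'ACGT' if base != old])
    return dna[:pos] + new + dna[pos + 1:], pos, old, new

def has_internal_stop(dna):
    """True if an in-frame stop codon occurs before the last complete codon."""
    last_start = (len(dna) // 3 - 1) * 3       # start index of the last codon
    return any(dna[i:i+3] in STOP_CODONS for i in range(0, last_start, 3))

def mutate_until_disrupted(dna, rng):
    """Mutate until the ATG start codon is lost or an internal stop appears."""
    dna = dna.upper()
    mutations = []
    while True:
        dna, pos, old, new = point_mutation(dna, rng)
        mutations.append((pos + 1, old, new))   # report 1-based positions
        print('position %d: %s -> %s' % (pos + 1, old, new))
        if not dna.startswith('ATG') or has_internal_stop(dna):
            return dna, mutations

final_dna, mutations = mutate_until_disrupted('ATGCCAGGTTTACACGGT', random.Random(1))
print(final_dna, len(mutations))
```

Passing a seeded random.Random object makes a run repeatable, which helps when checking the report by hand.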

Weekly task 5
To be submitted by e-mail to kahokamp@tcd.ie before Thursday, 17th December, 5 pm.