Computer lab exercises #8. Comments on projects worth sharing: 1.Use BLINK whenever possible. It can save a lot of waiting and greatly accelerates explorations.

Slides:



Advertisements
Similar presentations
Integration of Protein Family, Function, Structure Rich Links to >90 Databases Value-Added Reports for UniProtKB Proteins iProClass Protein Knowledgebase.
Advertisements

1 Welcome to the Protein Database Tutorial This tutorial will describe how to navigate the section of Gramene that provides collective information on proteins.
Advanced Perl for Bioinformatics Lecture 5. Regular expressions - review You can put the pattern you want to match between //, bind the pattern to the.
Sequence Analysis MUPGRET June workshops. Today What can you do with the sequence? What can you do with the ESTs? The case of SNP and Indel.
Asteraceae (Compositae) Genome Resources at NCBI GenBank.
SAP Business Warehouse
Creating Web Page Forms
Advanced Perl for Bioinformatics Lecture 5. Regular expressions - review You can put the pattern you want to match between //, bind the pattern to the.
Making Sense of DNA and protein sequence analysis tools (course #2) Dave Baumler Genome Center of Wisconsin,
Working with the Conifer_dbMagic database: A short tutorial on mining conifer assembly data. This tutorial is designed to be used in a “follow along” fashion.
Python programs How can I run a program? Input and output.
SAS Workshop Lecture 1 Lecturer: Annie N. Simpson, MSc.
Basic Introduction of BLAST Jundi Wang School of Computing CSC691 09/08/2013.
Computer Programming for Biologists Class 7 Nov 27 th, 2014 Karsten Hokamp
MCB 5472 Psi BLAST, Perl: Arrays, Loops, Hashes J. Peter Gogarten Office: BPB 404 phone: ,
XP New Perspectives on Browser and Basics Tutorial 1 1 Browser and Basics Tutorial 1.
MCB 5472 Assignment #5: RBH Orthologs and PSI-BLAST February 19, 2014.
Inferring an Origin of Replication With Computational Methods Brian Smith.
MCB 5472 Assignment #6: HMMER and using perl to perform repetitive tasks February 26, 2014.
An Introduction to Designing and Executing Workflows with Taverna Katy Wolstencroft University of Manchester.
Regulatory Genomics Lab Saurabh Sinha Regulatory Genomics Lab v1 | Saurabh Sinha1 Powerpoint by Casey Hanson.
Copyright OpenHelix. No use or reproduction without express written consent1.
Using the feature color file (.fc) and fasta file to create figures (ATOH7 example) Using the feature color file (.fc) and fasta file to create figures.
Resetting Student PreTests. Within the MyNursingLab Study Plans, pretests can be taken only one time by the student.
Sequence-based Similarity Module (BLAST & CDD only ) & Horizontal Gene Transfer Module (Ortholog Neighborhood & GC content only)
Working on exercises (a few notes first). Comments Sometimes you want to make a comment in the Python code, to remind you what’s going on. Python ignores.
Getting Started with MATLAB 1. Fundamentals of MATLAB 2. Different Windows of MATLAB 1.
An Introduction to Designing and Executing Workflows with Taverna Aleksandra Pawlik materials by: Katy Wolstencroft University of Manchester.
Gnuplot From command line, type gnuplot G N U P L O T Linux version 3.7 patchlevel 1 last modified Fri Oct 22 18:00:00 BST 1999 Copyright(C) ,
Assignment feedback Everyone is doing very well!
Why? – Examples Speaking Computer-ise – How – What – Environment (windows) Basic Instructions – Declare – Conditional – Loop – Input Write a quiz game.
WRITING REPORTS Introduction Section 0 Lecture 1 Slide 1 Lecture 6 Slide 1 INTRODUCTION TO Modern Physics PHYX 2710 Fall 2004 Intermediate 3870 Fall 2015.
PIRSF Classification System PIRSF: Evolutionary relationships of proteins from super- to sub-families Homeomorphic Family: Homologous proteins sharing.
Sackler Medical School
Chapter 3 MATLAB Fundamentals Introduction to MATLAB Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Regulatory Genomics Lab Saurabh Sinha Regulatory Genomics | Saurabh Sinha | PowerPoint by Casey Hanson.
Computer Programming for Biologists Class 6 Nov 21 th, 2014 Karsten Hokamp
Genome annotation and search for homologs. Genome of the week Discuss the diversity and features of selected microbial genomes. Link to the paper describing.
Primary vs. Secondary Databases Primary databases are repositories of “raw” data. These are also referred to as archival databases. -This is one of the.
Bioinformatics for biologists Dr. Habil Zare, PhD PI of Oncinfo Lab Assistant Professor, Department of Computer Science Texas State University Presented.
Copyright OpenHelix. No use or reproduction without express written consent1.
UCSC Genome Browser Zeevik Melamed & Dror Hollander Gil Ast Lab Sackler Medical School.
Tools in Bioinformatics Genome Browsers. Retrieving genomic information Previous lesson(s): annotation-based perspective of search/data Today: genomic-based.
Copyright OpenHelix. No use or reproduction without express written consent1.
CS Class 04 Topics  Selection statement – IF  Expressions  More practice writing simple C++ programs Announcements  Read pages for next.
Welcome to the Protein Database Tutorial. This tutorial will describe how to navigate the section of Gramene that provides collective information on proteins.
HTML Structure II (Form) WEEK 2.2. Contents Table Form.
Designing, Executing and Sharing Workflows with Taverna 2.4 Different Service Types Katy Wolstencroft Helen Hulme myGrid University of Manchester.
Gene plot Frankia ACN vs Frankia CCI3
Regulatory Genomics Lab
Introduction to MATLAB
Mirela Andronescu February 22, 2005 Lab 8.3 (c) 2005 CGDN.
How to use a bioinformatics website!
Example of a common SNP in dogs
Cookies BIS1523 – Lecture 23.
Plot blast.out and blast.out.top
Aeromonas_hydrophila_ML09_119_uid versus Aeromonas_salmonicida A449
Genome Center of Wisconsin, UW-Madison
Motif p-values GENOME 559: Introduction to Statistical and Computational Genomics Prof. William Stafford Noble.
 .
BLAST.
Explore Evolution: Instrument for Analysis
Histogram of percent identity for significant and insignificant hits
Introduction to MATLAB
Amos Introduction In this tutorial, you will be briefly introduced to the student version of the SEM software known as Amos. You should download the current.
Regulatory Genomics Lab
Plot blast.out and blast.out.top
Regulatory Genomics Lab
How to search NCBI.
Plot blast.out and blast.out.top
Presentation transcript:

Computer lab exercises #8

Comments on projects worth sharing: 1.Use BLINK whenever possible. It can save a lot of waiting and greatly accelerates explorations. From a protein sequence entry in NCBI select “BLINK” under related information. (You might need to scroll down, in case the upper tables are expanded). BLINK provides a GUI interface to pre-computed BLASTP searches.

...

If one is looking for homologs in other phyla or domains,

Comments on projects worth sharing: 2.The Taxonomy Browser at NCBI can facilitate finding genome and EST projects. In entrez select taxonomy as databank

Search for an organisms you are interested in,

Select the group of organisms you are interested in

Check genomes and ESTs and whatever else you like, THEN click display

If you move up or down the taxonomic hierarchy, the checked items will be displayed for each Taxon or Retrieves all ESTs from Genbank

You can select format or send to file to get a multiple sequence fasta file with all sequences

Short in class exercise Write and apply a script to calculate cumulative strand bias for a genome (fna file).

In DNA replication the two strands are not created equal. The differences between leading and lagging strand are reflected in the number or ORFs encoded on the strands and the presence of motifs that bind factors initiating and halting replication, and the composition with respect to nucleotides, and n-mers of nucleotides. See Drew Berry’s Ted talk at for illustration (start at 3 min – 4.50 min, if not much time)

One way to look at strand bias, is to calculate the GC content in a rolling window. Window=1000, printed every 100

Window=10000, printed every 100

Usually one plots the Cumulative Strand Bias to more clearly see the turning points

Usually, *.fna files of bacterial genomes start with the origin of replication, and the direction is chosen so that the first encoded protein is DnaA (chromosomal replication initiator protein). Sometimes things go wrong. Mummer Plot: HB27 versus SG0 Ori Should be here Cumulative Strand Bias HB27

The same can be done with oligonucleotide bias (how often does an oligonucleotide occur on one strand minus occurrence on the other strand) Tetramer bias for Thermus thermophilus SG0

Download a bacterial or archaeal genome of your choice from ftp://ftp.ncbi.nih.gov/genomes/Bacteria/ ftp://ftp.ncbi.nih.gov/genomes/Bacteria/ We are only interested in the sequence of the main chromosome Write a script that 1.Reads in a genome 2.Assigns the nucleotides of the genome to an array 3.Goes through array and sequentially counts the numbers of Gs Cs As and Ts 4.Every 1000 (or 5000) nucleotides calculate the excess of Gs over Cs, As over Ts, and the excess of keto bases (G+T-A-C). Feel free to explore other biases. 5.Print the results into a table 6.Plot the columns of the table in Excel or gnuplot

Possibility for 1.Read in a genome The easiest will be to plagiarize a script you already wrote. The following works, if the script and the *.fna file are in the same directory. 1a: open input and output file, reset stuff

Possibility for 1b: read genome into array You have 2 possibilities either read and analyze the genome line by line, or read in everything and then start the analysis. The $_ is the variable the perl goes through in the foreach loop, see P13

Possibility for 1b’: read genome into array You have 2 possibilities either read and analyze the genome line by line, or read in everything and then start the analysis.....

Write a script that 1.Reads in a genome 2.Assigns the nucleotides of the genome to an array 3.Goes through array and sequentially counts the numbers of Gs Cs As and Ts 4.Every 1000 nucleotides calculates the excess of Gs over Cs, As over Ts, and the excess of keto bases (G+T-A-C). Feel free to explore other biases. 1.Print the results into a table Close loop(s), close files 6. Plot the columns of the table in Excel or gnuplot

Links to info on gnuplot is herehere If gnuplot is installed on you computer and able to communicate with your x11 terminal program, then > gnuplot Will invoke the gnuplot program > plot "my_table" using 1:2 with lines will plot the 2 nd columns against the 1st >set terminal x11 Will set the output to x11 (screen) >set terminal png Will set the output to a png file >set terminal postscript sets the output to a postscript file > set out "myplot.png” directs output to the file myplot.png > set multiplot Plots multiple curves into the same figure > plot "my_table" using 1:4 with lines,\ > "my_table" using 1:2 with lines,\ > "my_table" using 1:3 with lines plots multiple curves onto the same figure. gnuplot is installed on the cluster, but you need to direct the output to a file, which is inconvenient (extra credit, if you can make the cluster output to an x11-window on your laptop ) Often one uses a perl script to do the plotting example herehere