Bioinformatics Computing 1 CMP 807 – Day 4 Kevin Galens.

1 Bioinformatics Computing 1 CMP 807 – Day 4 Kevin Galens

2 Today’s Objectives
- Prokaryotes vs. Eukaryotes
- Gene Prediction
- Structure Analysis
- File Formats
  - GenBank
  - Swiss-Prot
  - PDB
  - Pfam

3 Prokaryotes vs. Eukaryotes

| Property | Prokaryote (bacteria) | Eukaryote (plant, animal) |
|---|---|---|
| Genetic Material | DNA | DNA |
| Location of Genetic Material | Cytoplasm | Nucleus |
| Chromosomal Arrangement | Single circular chromosome | Multiple linear chromosomes |
| Post-transcriptional Modification | Nearly none | Splice out introns, keep exons |
| Membrane Enclosed Organelles | None | Many (lysosome, Golgi, ER, etc.) |
| Size | 0.2-2.0 µm | 10-100 µm |

4 Gene Prediction

5 Similarity Search:
- Example: BLAST (GeneWise/PROCRUSTES)
- Strengths: easy to implement, fast
- Weaknesses: no new genes; alternative splicing?
Ab initio:
- No similarity search
- Utilize models (hidden Markov models, neural networks) and dynamic programming
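To make the "models plus dynamic programming" idea concrete, here is a minimal sketch of Viterbi decoding on a toy two-state HMM. It is not how any of the gene finders named in this deck work internally. The state names (`N` for non-coding, `C` for coding) and every probability are invented for illustration, with the only assumption being that the "coding" state prefers G and C.

```python
import math

STATES = ("N", "C")  # N = non-coding, C = coding (toy labels, an assumption)

# All probabilities below are invented for illustration only.
START = {"N": 0.5, "C": 0.5}
TRANS = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.1, "C": 0.9}}
EMIT = {
    "N": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},  # AT-rich state
    "C": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},  # GC-rich state
}


def viterbi(seq):
    """Return the most probable state path for seq under the toy HMM."""
    if not seq:
        return ""
    # score[s] = best log-probability of any path that ends in state s
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[i][s] = best predecessor of state s at position i + 1
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (score[prev] + math.log(TRANS[prev][s])
                            + math.log(EMIT[s][base]))
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


labels = viterbi("ATATATATATGCGCGCGCGCGC")
```

Each position is labelled with the state that lies on the single best path, so a long enough GC-rich stretch is called `C` even though switching states carries a penalty.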

6 Evaluate Methods
Sequence at: http://n8444l.com/cmp807/classLinks.html (click ‘Mystery Sequence’)
- Augustus: http://augustus.gobics.de/submission
- Grail: http://compbio.ornl.gov/grailexp/
- GenScan: http://genes.mit.edu/GENSCAN.html
- FGENESH: http://www.softberry.com

7 Process
1. Run mystery.fas sequence through gene finder
2. Take the best ‘looking’ protein prediction and make a FASTA file out of it
   Call it.fas
3. Run predicted protein through BLASTP
4. Get best BLASTP hit in FASTA
   Call it.fas
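The "make a FASTA file out of it" step can be sketched as follows. The helper names `format_fasta` and `parse_fasta`, the header text, and the 60-column line width are assumptions chosen for illustration. The format itself is just a `>` header line followed by sequence lines.

```python
def format_fasta(header, sequence, width=60):
    """Return one FASTA record: a '>' header line, then wrapped sequence lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical predicted protein; paste the gene finder's output here instead.
record = format_fasta("predicted_protein_1", "M" * 130)
round_trip = parse_fasta(record)
```

Writing `record` to a `.fas` file with `open(path, "w")` gives a file that BLASTP's web form or command line should accept as a protein query.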

8 Multiple Sequence Alignment
ClustalW
1. All vs. all pairwise alignment
2. Build a phylogenetic tree (neighbor-joining)
3. Align sequences sequentially, based on the tree
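The first ClustalW step, all vs. all pairwise alignment, can be illustrated with a small sketch. It uses a plain Needleman-Wunsch global alignment with made-up scores (match +1, mismatch -1, gap -2) and a simple "1 minus fraction identical" distance. ClustalW's real scoring matrices, gap penalties, and distance measure differ, and the function names and example sequences here are hypothetical.

```python
from itertools import combinations


def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def distance(a, b):
    """1 minus the fraction of identical alignment columns."""
    x, y = global_align(a, b)
    if not x:
        return 0.0
    same = sum(1 for p, q in zip(x, y) if p == q)
    return 1.0 - same / len(x)


def distance_matrix(seqs):
    """All vs. all distances, keyed by (name_a, name_b)."""
    return {(p, q): distance(seqs[p], seqs[q]) for p, q in combinations(seqs, 2)}


# Hypothetical toy sequences.
matrix = distance_matrix({"s1": "ACGT", "s2": "ACGA", "s3": "TTTT"})
closest = min(matrix, key=matrix.get)
```

A tree-building method such as neighbor-joining would take `matrix` as input. The closest pair is the one a progressive aligner would typically align first.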

9 Gene Finding/Multiple Sequence Alignment Questions

10 Structure Analysis

11 Structure Prediction
- Homology Modeling: MODELLER

12 Structure Prediction
- Ab initio: Folding@home

13 Protein Data Bank
- pdb.org
- BLAST the PDB!

14 Structure Analysis Questions?

15 File Formats

16 GenBank
Lines begin with field definitions:
- LOCUS – Short mnemonic (must be unique), sequence length, molecule type, GenBank division, modification date
- ACCESSION – Unique, unchanging code for each sequence
- DEFINITION – Short description of sequence
- SOURCE – Organism common name
- ORGANISM – Scientific name
- REFERENCE – Citations
- MEDLINE/PUBMED – ID for journal database
- FEATURES – Regions of biological significance
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&val=6552315
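Because every GenBank header line begins with a field keyword, the header can be read with a few lines of code. The sketch below assumes the usual flat-file layout, where the keyword occupies the first 12 columns and continuation lines leave those columns blank. The function name `parse_genbank_header` and the sample record are invented for illustration, and the record is not a real entry.

```python
def parse_genbank_header(text):
    """Collect GenBank header fields into {keyword: [values]}."""
    fields = {}
    current = None
    for line in text.splitlines():
        # The header ends where the feature table or the sequence begins.
        if line.startswith(("FEATURES", "ORIGIN", "//")):
            break
        keyword, value = line[:12].strip(), line[12:].strip()
        if keyword:
            current = keyword
            fields.setdefault(current, []).append(value)
        elif current is not None and value:
            # No keyword in the first 12 columns: continuation of the last field.
            fields[current][-1] = (fields[current][-1] + " " + value).strip()
    return fields


# Made-up record for illustration; not a real GenBank entry.
ROWS = [
    ("LOCUS", "XX000000  120 aa  linear  BCT 01-JAN-2000"),
    ("DEFINITION", "hypothetical example protein"),
    ("", "[Examplebacter demonstrans]."),
    ("ACCESSION", "XX000000"),
    ("SOURCE", "Examplebacter demonstrans"),
    ("  ORGANISM", "Examplebacter demonstrans"),
    ("REFERENCE", "1  (residues 1 to 120)"),
    ("  PUBMED", "0000000"),
    ("FEATURES", "Location/Qualifiers"),
]
SAMPLE = "\n".join(f"{key:<12}{value}" for key, value in ROWS)

header = parse_genbank_header(SAMPLE)
```

For real work, an established parser such as Biopython's `SeqIO` is the safer choice. This sketch only shows why the keyword-at-line-start layout is easy to read by machine.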

17 Swiss-Prot (TrEMBL)
- 2-letter codes for field definitions
- http://www.expasy.org/cgi-bin/get-sprot-entry?Q92560
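The two-letter line codes make Swiss-Prot entries simple to split into fields. The sketch below assumes the standard flat-file layout: code in columns 1-2, data from column 6, sequence lines starting with blanks, and `//` ending the entry. The function name and the sample entry are hypothetical, and the sample's `SQ` line is simplified.

```python
def parse_swissprot(text):
    """Return ({line_code: [values]}, sequence) for one Swiss-Prot style entry."""
    fields = {}
    sequence = []
    for line in text.splitlines():
        code, value = line[:2], line[5:].strip()
        if code == "//":          # end-of-entry marker
            break
        if code == "  ":          # sequence data lines start with blanks
            sequence.append(value.replace(" ", ""))
        elif code.strip():
            fields.setdefault(code, []).append(value)
    return fields, "".join(sequence)


# Made-up entry for illustration; not a real Swiss-Prot record.
SAMPLE = """\
ID   EXAMPLE_HUMAN           Reviewed;         12 AA.
AC   X00000;
DE   RecName: Full=Hypothetical example protein;
OS   Homo sapiens (Human).
SQ   SEQUENCE   12 AA;
     MKTAYIAKQR QI
//
"""

entry, protein = parse_swissprot(SAMPLE)
```

Here `ID` is the entry name, `AC` the accession number, `DE` the description, `OS` the organism, and `SQ` the sequence header.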

18 PDB Structure
- Like a GenBank file
- Includes x, y, z for each atom
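The per-atom x, y, z values live in fixed columns of each `ATOM` record, so they can be sliced out directly. The sketch below follows the fixed-width layout of the PDB format, with coordinates in columns 31-54. The `format_atom` helper exists only to build correctly aligned demo lines, and the atoms and coordinates are invented.

```python
import math


def format_atom(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a fixed-width ATOM line (demo helper; coordinates end at column 54)."""
    return (f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


def parse_atom_line(line):
    """Slice the fixed-width fields out of one ATOM record."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
    }


def atom_distance(a, b):
    """Euclidean distance between two parsed atoms (same units as the file)."""
    return math.dist((a["x"], a["y"], a["z"]), (b["x"], b["y"], b["z"]))


# Invented atoms for illustration.
line1 = format_atom(1, " CA ", "ALA", "A", 1, 11.104, 6.134, -6.504)
line2 = format_atom(2, " CA ", "GLY", "A", 2, 14.104, 10.134, -6.504)
atom1, atom2 = parse_atom_line(line1), parse_atom_line(line2)
separation = atom_distance(atom1, atom2)
```

Slicing by column, not splitting on whitespace, matters because adjacent coordinate fields can run together when values are large or negative.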

19 File Formats Questions?

