Published by Vivian Gilbert. Modified over 8 years ago.
1
Bioinformatics Computing 1 CMP 807 – Day 4 Kevin Galens
2
Today’s Objectives
- Prokaryotes vs. Eukaryotes
- Gene Prediction
- Structure Analysis
- File Formats: GenBank, Swiss-Prot, PDB, Pfam
3
Prokaryotes vs. Eukaryotes
Property | Prokaryote (bacteria) | Eukaryote (plant, animal)
Genetic Material | DNA | DNA
Location of Genetic Material | Cytoplasm | Nucleus
Chromosomal Arrangement | Single circular chromosome | Multiple linear chromosomes
Post-transcriptional Modification | Nearly none | Splice out introns, keep exons
Membrane Enclosed Organelles | None | Many (lysosome, Golgi, ER, etc.)
Size | 0.2-2.0 µm | 10-100 µm
4
Gene Prediction
5
Similarity Search
- Example: BLAST (GeneWise/PROCRUSTES)
- Strengths: easy to implement, fast
- Weaknesses: cannot find new genes; alternative splicing?
Ab initio
- No similarity search
- Utilizes models (HMM, NN) and dynamic programming
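To make the "HMM plus dynamic programming" idea concrete, here is a minimal sketch of the Viterbi algorithm on a toy two-state model. Everything in it is an assumption made for illustration: the state names (`N` for non-coding, `C` for coding), the probabilities, and the test sequence are invented, and real ab initio gene finders use far richer models.

```python
import math

# Toy two-state HMM; all numbers below are made up for illustration.
STATES = ("N", "C")  # N = non-coding (AT-rich), C = coding (GC-rich)
START = {"N": 0.5, "C": 0.5}
TRANS = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.1, "C": 0.9}}
EMIT = {
    "N": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
    "C": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4},
}


def viterbi(seq):
    """Return the most probable state path for a non-empty DNA string."""
    # Log-probability of the best path ending in each state at position 0.
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # one back-pointer table per position after the first
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Dynamic programming step: pick the best predecessor of state s.
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (score[prev] + math.log(TRANS[prev][s])
                            + math.log(EMIT[s][base]))
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi("ATATATGCGCGCATAT"))
```

With these toy parameters the GC-rich stretch in the middle is labelled `C` and the flanking AT-rich stretches `N`.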
6
Evaluate Methods
- Sequence at: http://n8444l.com/cmp807/classLinks.html (click ‘Mystery Sequence’)
- Augustus: http://augustus.gobics.de/submission
- Grail: http://compbio.ornl.gov/grailexp/
- GenScan: http://genes.mit.edu/GENSCAN.html
- FGENESH: http://www.softberry.com
7
Process
- Run mystery.fas sequence through gene finder
- Take the best ‘looking’ protein prediction and make a FASTA file out of it
- Call it.fas
- Run predicted protein through BLASTP
- Get best BLASTP hit in FASTA
- Call it.fas
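The "make a FASTA file out of it" step can be sketched as below. This is a minimal illustration, not part of the course material: the function names, the header `predicted_protein`, and the example sequence are hypothetical, and the 60-column wrap is just a common convention.

```python
def format_fasta(header, sequence, width=60):
    """Format one record as FASTA text, wrapping the sequence at `width`."""
    lines = [">" + header]
    for i in range(0, len(sequence), width):
        lines.append(sequence[i:i + width])
    return "\n".join(lines) + "\n"


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical predicted protein, for illustration only.
fasta_text = format_fasta("predicted_protein", "MKT" * 25)
print(fasta_text)
```

The text returned by `format_fasta` can be written to a `.fas` file and pasted into a BLASTP form; `parse_fasta` reads the best hit back in the same format.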
8
Multiple Sequence Alignment
ClustalW
- All vs. all pairwise alignment
- Phylogenetic tree: neighbor-joining
- Align sequences sequentially, based on tree
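The first ClustalW stage, all vs. all pairwise alignment, can be sketched with a plain Needleman-Wunsch global alignment score. This is a simplified stand-in, not ClustalW's own scoring: the match, mismatch, and gap values and the three short sequences are assumptions for illustration, and neither the neighbor-joining tree nor the progressive alignment is implemented here.

```python
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    # prev[j] holds the best score for aligning the processed prefix of a
    # with b[:j]; only one previous row is needed for the score.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def all_vs_all(seqs):
    """Score every unordered pair of named sequences."""
    return {
        (x, y): nw_score(seqs[x], seqs[y])
        for x, y in combinations(sorted(seqs), 2)
    }


# Hypothetical toy sequences.
toy = {"a": "GATTACA", "b": "GATTAC", "c": "CCCCCCC"}
scores = all_vs_all(toy)
closest = max(scores, key=scores.get)
print(scores, closest)
```

In a progressive aligner, the highest-scoring pair (here `closest`) would be the first to be joined in the guide tree and aligned.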
9
Gene Finding/Multiple Sequence Alignment Questions?
10
Structure Analysis
11
Structure Prediction
- Homology Modeling: MODELLER
12
Structure Prediction
- Ab initio: Folding@home
13
Protein Data Bank
- pdb.org
- BLAST the PDB!
14
Structure Analysis Questions?
15
File Formats
16
GenBank
Lines begin with field definitions:
- LOCUS: short mnemonic (must be unique), sequence length, molecule type, GenBank division, modification date
- ACCESSION: unique, unchanging code for each sequence
- DEFINITION: short description of sequence
- SOURCE: organism common name
- ORGANISM: scientific name
- REFERENCE: citations
- MEDLINE/PUBMED: ID for journal database
- FEATURES: regions of biological significance
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=protein&val=6552315
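Because each header line begins with its field keyword, the header of a GenBank flat file can be read with very little code. The sketch below assumes the usual layout in which continuation lines are indented by 12 spaces; the record `TOY_RECORD` is made up and is not a real GenBank entry, and repeated keywords such as REFERENCE would overwrite one another in this simple version.

```python
def parse_genbank_header(text):
    """Collect header fields (keyword -> value) up to the FEATURES table."""
    fields = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[:12].strip():
            # Continuation line: append to the most recent field.
            if key is not None:
                fields[key] += " " + line.strip()
            continue
        key, _, value = line.strip().partition(" ")
        if key == "FEATURES":
            break  # the feature table has its own layout; stop here
        fields[key] = value.strip()
    return fields


# Made-up toy record for illustration only.
TOY_RECORD = "\n".join([
    "LOCUS       TOY0001 120 bp DNA linear SYN 01-JAN-2000",
    "DEFINITION  Made-up example sequence for illustration",
    " " * 12 + "only.",
    "ACCESSION   TOY0001",
    "SOURCE      synthetic construct",
    "  ORGANISM  synthetic construct",
    "FEATURES             Location/Qualifiers",
])

for name, value in parse_genbank_header(TOY_RECORD).items():
    print(name, "=>", value)
```

For real work, an established parser such as the one in Biopython is a safer choice than hand-rolled code like this.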
17
Swiss-Prot (TrEMBL)
- 2-letter codes for field definitions
http://www.expasy.org/cgi-bin/get-sprot-entry?Q92560
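The two-letter line codes make a Swiss-Prot style entry easy to group by field. The sketch below assumes the flat-file layout with the code in the first two columns and `//` ending the entry; `TOY_ENTRY` is a made-up entry with a deliberately fake accession, not real Swiss-Prot data.

```python
from collections import defaultdict


def group_by_line_code(entry_text):
    """Group the lines of one entry by their two-letter line code."""
    grouped = defaultdict(list)
    for line in entry_text.splitlines():
        code = line[:2]
        if code == "//":
            break  # end-of-entry marker
        if not code.strip():
            continue  # sequence data lines start with spaces; skipped here
        grouped[code].append(line[2:].strip())
    return dict(grouped)


# Made-up toy entry for illustration only.
TOY_ENTRY = "\n".join([
    "ID   TOY_EXAMPLE             Reviewed;          10 AA.",
    "AC   TOY001;",
    "DE   RecName: Full=Made-up example protein;",
    "OS   Synthetic construct.",
    "SQ   SEQUENCE   10 AA;",
    "     MKTAYIAKQR",
    "//",
])

for code, values in group_by_line_code(TOY_ENTRY).items():
    print(code, values)
```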
18
PDB
- Structured like a GenBank file
- Includes x, y, z coordinates for each atom
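The x, y, z values sit in fixed columns of each ATOM (or HETATM) record, so they can be pulled out by slicing. This is a minimal sketch assuming the standard fixed-width PDB layout (x in columns 31-38, y in 39-46, z in 47-54); the sample line is assembled field by field from made-up values so that the column positions are explicit.

```python
def parse_atom_xyz(line):
    """Return (x, y, z) from an ATOM/HETATM record, or None for other lines."""
    if not line.startswith(("ATOM", "HETATM")):
        return None
    # Columns are 1-based in the format description; Python slices are 0-based.
    return (float(line[30:38]), float(line[38:46]), float(line[46:54]))


# Made-up ATOM record, built piece by piece to show the column layout.
SAMPLE_ATOM = (
    "ATOM  "      # columns 1-6: record name
    "    1"       # columns 7-11: atom serial number
    " "           # column 12: blank
    " N  "        # columns 13-16: atom name
    " "           # column 17: alternate location indicator
    "MET"         # columns 18-20: residue name
    " "           # column 21: blank
    "A"           # column 22: chain identifier
    "   1"        # columns 23-26: residue sequence number
    "    "        # columns 27-30: insertion code and blanks
    "   1.000"    # columns 31-38: x
    "   2.500"    # columns 39-46: y
    "  -3.250"    # columns 47-54: z
)

print(parse_atom_xyz(SAMPLE_ATOM))
```

Real PDB files carry more columns after the coordinates (occupancy, temperature factor, element), which this sketch ignores.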
19
File Formats Questions?