1
Genome analysis and annotation Part II
2
The Institute for Genomic Research (TIGR)
Modeling a gene
Evidence View tracks:
- S. mansoni PASA assemblies
- S. japonicum EST alignments
- Genewise alignments (predictions)
- nr protein alignments
- Caenorhabditis sp. protein alignments
- Brugia malayi protein alignments
3
Attributes of individual annotated genes
Sequence database hits
- Top: protein matches
- Bottom: EST matches
Gene predictions
Annotated gene
- Top: editing panel
- Bottom: final curation
Splice site predictions
- Red: acceptor sites
- Blue: donor sites
Not shown graphically: gene name, nucleotide and protein sequence, MW, pI, organellar targeting sequence, membrane-spanning regions, other domains.
Screenshot of a component within Neomorphic's annotation station: www.neomorphic.com
4
Assigning function to predicted gene products
5
Assigning function to predicted gene products
The primary tool for assigning function is homology to well-characterized proteins. However, transitive annotation can lead to errors that propagate.
[Figure labels: E. coli, H. influenzae, M. genitalium]
6
The modular nature of proteins can provide the basis for functional annotation
Proteins may share features that give clues to their structure and/or function.
1) A domain is a region of a protein that can adopt a particular three-dimensional structure. A group of proteins that share a domain is called a family. There are several databases of protein families, such as Pfam (http://www.sanger.ac.uk/Software/Pfam/).
2) Motifs are short, conserved regions of proteins, typically consisting of a pattern of amino acids that characterizes a protein family (http://www.expasy.org/prosite/). EF-hand: D-[DNS]-{ILVFYW}-[DENSTG]-[DNQGHRK]-{GP}-[LIVMC]-[DENQSTAGC]-x(2)-[DE]-[LIVMFYW]
3) HMM domains can also be defined and used to group proteins into families.
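To make the PROSITE notation concrete, here is a minimal sketch that translates a pattern such as the EF-hand motif quoted on this slide into a Python regular expression. It covers only the basic syntax (`x`, `[...]`, `{...}`, repeat counts and the `<`/`>` anchors); the function name `prosite_to_regex` and the test sequence are made up for illustration and are not part of any PROSITE tool.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern into a Python regular expression."""
    regex_parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        element = element.strip()
        prefix = suffix = ""
        if element.startswith("<"):          # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (2) or (2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":                      # any amino acid
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # any amino acid except these
        # [ABC] and single letters are already valid regex syntax.
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        regex_parts.append(prefix + core + repeat + suffix)
    return "".join(regex_parts)

# EF-hand pattern exactly as written on the slide.
EF_HAND = ("D-[DNS]-{ILVFYW}-[DENSTG]-[DNQGHRK]-{GP}-[LIVMC]-"
           "[DENQSTAGC]-x(2)-[DE]-[LIVMFYW]")
ef_hand_regex = re.compile(prosite_to_regex(EF_HAND))

# Hypothetical sequence built by hand to contain one match.
hit = ef_hand_regex.search("MKKDDKDGTITAAELGG")
```

Here `hit.group()` is the 12-residue stretch `DDKDGTITAAEL`; changing its third residue to a hydrophobic one (for example `L`) removes the match, because `{ILVFYW}` excludes those residues at that position.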
7
Top 20 PFAM domains in A. fumigatus
Counts in A. nidulans and A. oryzae
Protein domain frequencies can yield insights into the biology of an organism
Afu = A. fumigatus, Ana = A. nidulans, Aoa = A. oryzae
Accession  Domain                                                      Afu  Ana  Aoa
PF00400    WD domain, G-beta repeat                                    532  598  541
PF00023    Ankyrin repeat                                              368  633  430
PF00083    Major facilitator superfamily protein                       166  281  219
PF00172    Fungal Zn(2)-Cys(6) binuclear cluster domain                146  179  211
PF00515    TPR domain                                                  139  142  152
PF00096    Zinc finger, C2H2 type                                      124  113  142
PF04082    Fungal specific transcription factor domain                 110  163  159
PF00153    Mitochondrial carrier protein                               106  114  100
PF00069    Protein kinase domain                                       105, 101 *
PF00005    ABC transporter                                              93  129   86
PF00076    RNA recognition motif (a.k.a. RRM, RBD, or RNP domain)      93, 99 *
PF00106    Oxidoreductase, short chain dehydrogenase/reductase family   92  135  129
PF00271    Helicase conserved C-terminal domain                         73   80   69
PF00067    Cytochrome P450                                              63  134  102
PF00107    Oxidoreductase, zinc-binding dehydrogenase family            61  107   80
PF00501    AMP-binding enzyme                                           61   77   83
PF00560    Leucine rich repeat                                          47   50   54
PF00550    Phosphopantetheine attachment site                           46   54   60
PF00036    EF hand                                                     105250 *
* Only two of the three counts survive for PF00069 and PF00076, and the counts for PF00036 are fused into a single string, so the column assignment for these rows is unclear.
8
Domain-based paralogous families can be generated
The domain content of an entire proteome can be computed:
- Take all the proteins from a genome
- Run an HMM search against Pfam profiles
- Run an alignment search against homology-based domain alignments
- Store the search results in the database in the form of domain-based alignments
- Organize the proteins into domain-based paralogous families
Related families share one or more domains with other families.
Many putative novel domains are extensions of existing domains.
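The grouping step can be sketched in a few lines, assuming the HMM search results have already been reduced to a mapping from protein to the set of Pfam accessions it hits. This is only an illustration of the idea, not the pipeline used at TIGR: the protein identifiers are hypothetical, and defining a family as "all proteins with the same domain set" is a simplifying assumption.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical search results: protein -> set of Pfam domains found in it.
hits = {
    "prot1": {"PF00069"},             # protein kinase domain
    "prot2": {"PF00069"},
    "prot3": {"PF00069", "PF00036"},  # kinase domain plus EF hand
    "prot4": {"PF00036"},
    "prot5": {"PF00005"},             # ABC transporter
}

def domain_counts(hits):
    """Count how many proteins carry each domain."""
    return Counter(d for domains in hits.values() for d in domains)

def paralogous_families(hits):
    """Group proteins that have exactly the same domain content."""
    families = defaultdict(list)
    for protein, domains in hits.items():
        families[frozenset(domains)].append(protein)
    return dict(families)

def related_families(families):
    """Pairs of families whose domain sets share at least one domain."""
    return [(a, b) for a, b in combinations(families, 2) if a & b]

counts = domain_counts(hits)
families = paralogous_families(hits)
related = related_families(families)
```

With this toy input the kinase-only family and the EF-hand-only family are both related to the kinase-plus-EF-hand family, while the ABC transporter family stands alone.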
9
Hidden Markov Models (HMMs)
Statistical representations of sequence patterns. A query sequence is scored by how likely it is that the HMM would produce it.
Seed:
ACA---ATG
TCAACTATC
ACAC--AGC
AGA---ATC
ACCG--ATC
Model: (diagram on the slide)
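As a rough illustration of "how likely it is that the HMM would produce it", the sketch below builds a very small model from the seed alignment on this slide and scores a query against it. It rests on several assumptions that are not stated on the slide: alignment columns 4 to 6 are modelled by a single insert state, all other columns are match states, there are no delete states, and probabilities are plain observed frequencies without pseudocounts. Real tools such as HMMER use a richer model and report log-odds scores.

```python
from collections import Counter
from math import prod

SEED = ["ACA---ATG", "TCAACTATC", "ACAC--AGC", "AGA---ATC", "ACCG--ATC"]
INSERT_COLS = range(3, 6)  # assumption: columns 4-6 form one insert state

def frequencies(symbols):
    """Observed relative frequency of each symbol."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}

def build_model(seed, insert_cols):
    match_cols = [i for i in range(len(seed[0])) if i not in insert_cols]
    # Residues each seed sequence places in the insert region (may be empty).
    inserts = ["".join(row[i] for i in insert_cols).replace("-", "")
               for row in seed]
    used = [s for s in inserts if s]
    stay = sum(len(s) - 1 for s in used)   # insert -> insert transitions
    leave = len(used)                      # insert -> match transitions
    return {
        "match_emit": [frequencies(row[i] for row in seed) for i in match_cols],
        "insert_emit": frequencies("".join(inserts)),
        "n_before": sum(1 for i in match_cols if i < min(insert_cols)),
        "n_after": sum(1 for i in match_cols if i > max(insert_cols)),
        "p_enter": len(used) / len(seed),  # match -> insert
        "p_stay": stay / (stay + leave),
    }

def probability(model, query):
    """Probability that the model emits `query` (the path is unique here)."""
    n_before, n_after = model["n_before"], model["n_after"]
    k = len(query) - n_before - n_after    # residues emitted by the insert state
    if k < 0:
        return 0.0                         # would need delete states
    head = query[:n_before]
    middle = query[n_before:n_before + k]
    tail = query[n_before + k:]
    p = prod(e.get(c, 0.0) for e, c in zip(model["match_emit"], head + tail))
    p *= prod(model["insert_emit"].get(c, 0.0) for c in middle)
    if k == 0:
        p *= 1 - model["p_enter"]
    else:
        p *= model["p_enter"] * model["p_stay"] ** (k - 1) * (1 - model["p_stay"])
    return p

model = build_model(SEED, INSERT_COLS)
p_consensus_like = probability(model, "ACACATC")
p_unrelated = probability(model, "TTTTTTT")
```

A query that resembles the seed, such as `ACACATC`, gets a probability of about 0.047, whereas `TTTTTTT` gets 0 because T was never observed in the second match column. That hard zero is one reason practical profile HMMs add pseudocounts.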
10
Procedure for Preparing an HMM Seed
Inspect and edit a pairwise aligned group of gene products:
- Eliminate fragments
- Correct the alignment
- Remove sequence outside domain
- Eliminate redundancy
- BLAST, annotate and possibly expand the seed.
11
Homology-Based Alignment:
HMM Seed:
Trusted Hits:
12
What is Gene Ontology (GO)?
The Gene Ontology is a set of dynamic controlled vocabularies used to describe gene products in terms of their associated biological processes, cellular components and molecular functions in a species-independent manner (www.geneontology.org).
The Three Ontologies
Molecular function, biological process and cellular component are considered attributes of gene products.
(a) Biological Process: a biological objective that has more than one distinct step
(b) Molecular Function: what the gene product does; think 'activity'
(c) Cellular Component: location in the cell (or smaller unit) or part of a complex
13
Assigning GO IDs
Each GO ID is qualified with an evidence code. Evidence codes are:
- IMP: inferred from mutant phenotype
- IGI: inferred from genetic interaction
- IPI: inferred from physical interaction
- IDA: inferred from direct assay
- IEP: inferred from expression pattern
- ISS: inferred from sequence or structural similarity
- IEA: inferred from electronic annotation
- IC: inferred by curator
- TAS: traceable author statement
- NAS: non-traceable author statement
- ND: no biological data available
- NR: no longer used
Groups labeled on the slide: experimental evidence, sequence similarity, calculated by algorithm, author statement.
The "with/to" field: ISS, IPI and IGI require the accession of the similarity hit or the interacting entity.
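The rule about the "with/to" field lends itself to a simple consistency check. The sketch below is one way to express it, assuming each annotation is a plain dictionary; the keys `gene`, `go_id`, `evidence` and `with_field`, the gene names and the accession are illustrative and do not reflect any particular annotation file format.

```python
EVIDENCE_CODES = {
    "IMP": "inferred from mutant phenotype",
    "IGI": "inferred from genetic interaction",
    "IPI": "inferred from physical interaction",
    "IDA": "inferred from direct assay",
    "IEP": "inferred from expression pattern",
    "ISS": "inferred from sequence or structural similarity",
    "IEA": "inferred from electronic annotation",
    "IC": "inferred by curator",
    "TAS": "traceable author statement",
    "NAS": "non-traceable author statement",
    "ND": "no biological data available",
    "NR": "no longer used",
}

# Codes that must name the similarity hit or the interacting entity.
REQUIRES_WITH = {"ISS", "IPI", "IGI"}

def check_annotation(annotation: dict) -> list:
    """Return a list of problems found in one GO annotation record."""
    problems = []
    code = annotation.get("evidence")
    if code not in EVIDENCE_CODES:
        problems.append(f"unknown evidence code: {code!r}")
    elif code in REQUIRES_WITH and not annotation.get("with_field"):
        problems.append(f"{code} requires a with/to accession")
    return problems

# Hypothetical records; the gene names and accession are placeholders.
missing_with = {"gene": "geneX", "go_id": "GO:0005509",
                "evidence": "ISS", "with_field": ""}
complete = {"gene": "geneX", "go_id": "GO:0005509",
            "evidence": "ISS", "with_field": "ACC12345"}
```

Calling `check_annotation(missing_with)` reports the missing accession, while `check_annotation(complete)` returns an empty list.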
14
Gene ontologies can help interpret large scale datasets K-means clustering using TIGR Multi-Experiment Viewer (TMEV)
15
Cluster 4 Cluster 10 methanogenesis Translation, transcription
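The idea behind these two slides, clustering expression profiles and then asking which GO process terms dominate each cluster, can be sketched as follows. This is not TMEV: it is a bare-bones k-means with a deterministic start (the first k profiles serve as initial centroids), and the gene names, expression values and gene-to-process assignments are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical expression profiles (one value per experiment).
expression = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [3.0, 2.0, 1.0],
    "geneC": [1.1, 2.1, 2.9],
    "geneD": [2.9, 2.2, 1.1],
}
# Hypothetical GO biological process label for each gene.
process = {
    "geneA": "methanogenesis",
    "geneB": "translation",
    "geneC": "methanogenesis",
    "geneD": "transcription",
}

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    """Plain k-means; returns the cluster index assigned to each point."""
    centroids = [list(p) for p in points[:k]]  # deterministic initialization
    assignment = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

genes = list(expression)
labels = kmeans([expression[g] for g in genes], k=2)

# Tally the GO process terms seen in each cluster.
terms_by_cluster = defaultdict(Counter)
for gene, label in zip(genes, labels):
    terms_by_cluster[label][process[gene]] += 1
```

With this toy data the two rising profiles land in one cluster, annotated only with methanogenesis, and the two falling profiles land in the other, annotated with translation and transcription.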