Sequence based searches:


Starting from the genome sequence:
- Find coding genes to obtain predicted protein coding genes, then translate them.
- Run RNA finding (tRNAscan, RFAM, homology searches) to obtain predicted RNA genes.
- Collect any literature for the gene product and evaluate the evidence presented in each paper.

Sequence based searches:
- BLAST-type pairwise alignments
- HMM searches (Pfam, TIGRFAM, etc.)
- InterPro
- TMHMM
- SignalP
- TargetP
- COGs
- Paralogous families
- and more

Evaluation of evidence:

Pairwise alignments: Visually inspect the alignments, look for conserved active sites, and look for (generally) at least 35% identity across the full lengths of both proteins. If matches are not full length, check whether there are recognized functional domains in the region where the match occurs. Decide how much information can be transferred from the match protein to the query. To assert that the query has exactly the same function as the match protein, the match protein must be experimentally characterized. If there is any doubt about the specificity of the function, back up to a more general level of annotation.

Family/domain based evidence: Review the search results (InterPro, HMM) and check the specificity of the family in question. Can a specific function be assigned based on membership in the family, or is the family broad in functional scope? If it is broad, can a general function such as "kinase" or "oxidoreductase" be given? If not, can a name be given based on family membership even if the function is unknown?

Motif predictors: Consider what the presence of membrane spans, signal peptides, etc. tells you about the protein in light of the other search results. Is it all consistent? Does it add up to a particular cellular location or function? If all you have is a motif, you may still be able to make some annotations (e.g. "integral membrane protein" based on multiple TMHMM regions).
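The pairwise alignment guidance above can be sketched as a small decision function. This is only an illustration, not a replacement for visual inspection or for checking conserved active sites. The 35% identity threshold comes from the text; the 80% coverage cutoff used to approximate "full length", the `PairwiseHit` fields, and the returned labels are all assumptions made for this sketch.

```python
from dataclasses import dataclass

# The 35% identity threshold is taken from the guidance above.
# The 80% coverage cutoff for "full length" is an assumption of this sketch.
MIN_IDENTITY = 35.0
MIN_COVERAGE = 0.80


@dataclass
class PairwiseHit:
    """Hypothetical summary of one pairwise alignment (field names are illustrative)."""
    percent_identity: float
    query_coverage: float   # fraction of the query covered by the alignment
    match_coverage: float   # fraction of the match protein covered by the alignment
    match_is_experimentally_characterized: bool
    overlaps_known_domain: bool = False


def annotation_level(hit: PairwiseHit) -> str:
    """Suggest how much information might be transferred from match to query."""
    full_length = (
        hit.query_coverage >= MIN_COVERAGE
        and hit.match_coverage >= MIN_COVERAGE
    )
    if hit.percent_identity >= MIN_IDENTITY and full_length:
        # An exact function can only be asserted from a characterized match.
        if hit.match_is_experimentally_characterized:
            return "specific function"
        # Otherwise back up to a more general level of annotation.
        return "general function"
    if hit.overlaps_known_domain:
        # Partial match, but it falls in a recognized functional domain.
        return "domain-level"
    return "no transfer"
```

For example, a hit with 52% identity covering both proteins almost end to end would be suggested as "specific function" only when the match protein is experimentally characterized; a curator would still confirm the result by inspecting the alignment.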
Get candidate GO terms:
- from match proteins
- from matching families/domains/motifs
- from EC number mapping, InterPro2GO, other mappings, etc.

Search for GO terms if no candidates present themselves:
- GO search/browse tool AmiGO
- many other tools (e.g. Manatee, QuickGO, etc.)

Evaluate GO terms:
- Check that the quality of evidence supports candidate GO terms at a particular level of specificity.
- Read the literature relevant to the experimental characterization of any match proteins used as evidence.
- Check that any GO terms that may be assigned to the match protein are correct.
- Check GO trees and definitions to make sure the term makes sense for your organism.
- Generally it is safer to make function GO annotations than process ones based on sequence similarity to single proteins. See IGC chart for more on process annotations based on sequence.
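Gathering candidate GO terms from several evidence sources can be sketched as a simple merge that records which sources suggested each term. The function name, the source labels, and the ordering rule (terms suggested by more sources first) are assumptions made for illustration. The output is a list of candidates to evaluate, not a set of final annotations.

```python
from collections import defaultdict


def collect_candidate_go_terms(evidence):
    """Merge candidate GO IDs from several evidence sources.

    `evidence` maps a source label (e.g. "match protein") to an iterable
    of GO IDs suggested by that source. Returns a list of
    (go_id, sorted_sources) tuples; terms suggested by more sources come
    first, and ties are broken by GO ID.
    """
    sources_by_term = defaultdict(set)
    for source, go_ids in evidence.items():
        for go_id in go_ids:
            sources_by_term[go_id].add(source)
    return sorted(
        ((go_id, sorted(sources)) for go_id, sources in sources_by_term.items()),
        key=lambda item: (-len(item[1]), item[0]),
    )


# Illustrative input only: GO:0016301 is "kinase activity" and
# GO:0016491 is "oxidoreductase activity".
example = {
    "match protein": ["GO:0016301"],
    "family/domain": ["GO:0016301", "GO:0016491"],
    "mapping": [],
}
candidates = collect_candidate_go_terms(example)
```

Keeping the source labels alongside each term makes it easier to go back and check the quality of the evidence behind a candidate before assigning it.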