Annotation standards in ORegAnno (Draft)
Obi Griffith
The RegCreative Jamboree, Nov 29, 2006, Ghent, Belgium

Goal or purpose of annotation standards
1. Positive and negative control datasets
   - Develop motif detection algorithms
2. Training datasets for text-mining tools
   - Automate annotation
3. Resource of known regulatory sequences

Minimal information for an ORegAnno record
- Publication identifier
- Regulatory sequence type
- Species
- Target gene
- Transcription factor
- Sequence and flanking sequence
- Experimental evidence
- Outcome
- User information
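One way to picture the minimal record is as a structure with nine required slots. The sketch below is a hypothetical illustration, not ORegAnno's actual schema: the class name, field names and the "empty means missing" rule are assumptions made for this example.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class ORegAnnoRecord:
    """Hypothetical container for the nine minimal fields listed above."""
    pmid: Optional[str] = None                  # publication identifier
    sequence_type: Optional[str] = None         # e.g. "Transcription factor binding site"
    species_taxon_id: Optional[int] = None      # NCBI Taxonomy ID
    target_gene: Optional[str] = None           # one gene per record
    transcription_factor: Optional[str] = None  # one TF per record
    sequence_with_flanks: Optional[str] = None  # bound site upper case, flanks lower case
    evidence: List[str] = field(default_factory=list)  # one or more evidence lines
    outcome: Optional[str] = None               # positive, neutral or negative
    user: Optional[str] = None                  # annotating user


def missing_fields(record):
    """Return the names of minimal fields that are still empty (None, "" or [])."""
    return [f.name for f in fields(record)
            if getattr(record, f.name) in (None, "", [])]
```

A partly filled record, e.g. `missing_fields(ORegAnnoRecord(species_taxon_id=9606))`, lists every slot except the species, which makes incomplete submissions easy to flag before review.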

Publication identifier and species
- Publication identifier
  - PMID
  - Must be entered into the queue and checked out prior to annotation, and closed after annotation
  - Ensures traceability and prevents redundancy
- Species
  - NCBI Taxonomy ID
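The queue workflow behaves like a small state machine (queued, checked out, closed). The sketch below is a hypothetical model of that life cycle, not ORegAnno's implementation; in particular, the rule that only the user who checked a paper out may close it is an assumption added for illustration.

```python
class PublicationQueue:
    """Hypothetical sketch of the queue / check-out / close cycle for PMIDs."""

    def __init__(self):
        # pmid -> {"state": "queued" | "checked_out" | "closed", "user": str or None}
        self._entries = {}

    def enqueue(self, pmid):
        # Refusing a PMID that is already present is what prevents redundant annotation.
        if pmid in self._entries:
            raise ValueError(f"PMID {pmid} is already in the queue")
        self._entries[pmid] = {"state": "queued", "user": None}

    def check_out(self, pmid, user):
        entry = self._entries[pmid]
        if entry["state"] != "queued":
            raise ValueError(f"PMID {pmid} is {entry['state']}, not queued")
        # Recording who holds the paper provides traceability.
        entry["state"] = "checked_out"
        entry["user"] = user

    def close(self, pmid, user):
        entry = self._entries[pmid]
        # Assumption: only the user who checked the paper out may close it.
        if entry["state"] != "checked_out" or entry["user"] != user:
            raise ValueError(f"PMID {pmid} is not checked out by {user}")
        entry["state"] = "closed"

    def state(self, pmid):
        return self._entries[pmid]["state"]
```

Because a PMID can only move forward through the three states, two curators cannot silently annotate the same paper in parallel.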

Regulatory sequence type
- Transcription factor binding site (TFBS)
- Regulatory region
- Regulatory polymorphism
- Regulatory haplotype

Regulatory elements for CYP3A4 in ORegAnno
Matsumura et al. Identification of a novel polymorphic enhancer of the human CYP3A4 gene. Mol Pharmacol. 65(2):
- Regulatory region (enhancer)
- TFBS (3)
- TFBS (4)
- Regulatory polymorphism (TGT insertion in USF1 site)

Target gene and transcription factor (TF)
- Each record can be linked to one gene and one TF
- Identifiers: Entrez Gene ID, Ensembl gene ID, or user-defined
Tjian, R. (1995) "Molecular Machines That Control Genes", Scientific American, Feb 1995, p. 38.
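Since a gene can be given as an Entrez Gene ID, an Ensembl gene ID or a user-defined name, a curation tool has to tell these apart. The sketch below is a heuristic under stated assumptions: it treats any positive integer as an Entrez Gene ID and any `ENS...G` followed by 11 digits (optionally versioned) as an Ensembl gene ID; everything else falls back to user-defined. It is not ORegAnno's own validation logic.

```python
import re

# Heuristic patterns (assumptions, not an official specification).
ENTREZ_GENE = re.compile(r"^[1-9]\d*$")                    # numeric Gene ID
ENSEMBL_GENE = re.compile(r"^ENS[A-Z]*G\d{11}(\.\d+)?$")   # e.g. ENSG..., ENSMUSG...


def gene_id_source(identifier):
    """Guess which identifier system a target-gene ID comes from."""
    ident = identifier.strip()
    if ENTREZ_GENE.match(ident):
        return "Entrez Gene"
    if ENSEMBL_GENE.match(ident):
        return "Ensembl"
    return "user-defined"
```

Falling back to "user-defined" rather than rejecting the input mirrors the slide's allowance for genes that lack a public identifier.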

Sequence and flanking sequence
- Bound sequence in upper case, flanking sequence in lower case
- Minimum 40 bp total flanking sequence (recommended: ~100 bp)
- Use flanking sequence from the current genome assembly, not from the paper
- Verify the final sequence for unambiguous mapping
Example (bound sequence GTGACC):
actctgaagtggtctttgtccttgaacataggatacaaGTGACCcctgctctgttaattattggcaaattgcctaacttcaac
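The case convention makes the bound site and its flanks machine-readable. The sketch below splits a case-encoded sequence and checks the flank-length rules; it assumes the sequence contains only A, C, G, T or N and exactly one upper-case block, and it reads "~100 bp" as a threshold of 100, which is an interpretation rather than a stated rule.

```python
import re

MIN_TOTAL_FLANK = 40           # minimum total flanking sequence (bp), from the slide
RECOMMENDED_TOTAL_FLANK = 100  # "~100 bp" recommendation, treated here as a threshold

# Lower-case left flank, one upper-case bound site, lower-case right flank.
_PATTERN = re.compile(r"^([acgtn]*)([ACGTN]+)([acgtn]*)$")


def parse_annotated_sequence(seq):
    """Split a case-encoded sequence into (left flank, bound site, right flank)."""
    match = _PATTERN.match(seq)
    if match is None:
        raise ValueError("expected lower-case flank, one upper-case site, lower-case flank")
    return match.group(1), match.group(2), match.group(3)


def check_flanks(seq):
    """Report the bound site, flank lengths and whether the length rules are met."""
    left, site, right = parse_annotated_sequence(seq)
    total = len(left) + len(right)
    return {
        "site": site,
        "left_flank": len(left),
        "right_flank": len(right),
        "meets_minimum": total >= MIN_TOTAL_FLANK,
        "meets_recommendation": total >= RECOMMENDED_TOTAL_FLANK,
    }
```

Applied to the example above, the site comes back as GTGACC and the combined flanks clear the 40 bp minimum but fall short of the ~100 bp recommendation.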

Unambiguous hit to the target genome
- Perfect alignment
- Expected position relative to the gene
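The unambiguity check amounts to requiring exactly one perfect match on either strand. The sketch below does this with plain string search over an in-memory genome string; a real pipeline would use an aligner against the current assembly, and checking the expected position relative to the gene (which needs gene coordinates) is not shown.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T/N sequence (returned upper case)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_exact_hits(query, genome):
    """Return (strand, 0-based start) for every perfect match, allowing overlaps."""
    genome = genome.upper()
    forward = query.upper()
    reverse = reverse_complement(query)
    searches = [("+", forward)]
    if reverse != forward:  # a palindrome would otherwise be counted twice
        searches.append(("-", reverse))
    hits = []
    for strand, pattern in searches:
        start = genome.find(pattern)
        while start != -1:
            hits.append((strand, start))
            start = genome.find(pattern, start + 1)
    return hits


def is_unambiguous(query, genome):
    """True when the query aligns perfectly to exactly one genomic location."""
    return len(find_exact_hits(query, genome)) == 1
```

Searching both strands matters because a short bound site such as GTGACC can reappear as its reverse complement elsewhere, which is exactly why the slides ask for flanking sequence long enough to map uniquely.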

Experimental evidence
- Evidence class
  - Regulator (protein) or regulator site (sequence)
  - Transcription, transcript stability, translation
- Evidence type and subtype
  - Type: Reporter gene assay
  - Subtype: Transient transfection luciferase assay
- Cell type
  - eVOC cell type ontology
- Evidence comment

Experimental evidence (cont'd)
- Multiple lines of evidence
- Minimum: one line of evidence
- In silico evidence alone is not sufficient
- For regulatory polymorphisms: an association study alone is not sufficient
- Avoid use of the "literature-derived" evidence type
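The evidence rules can be checked mechanically once each evidence line is recorded. The sketch below is a hypothetical validator: the `Evidence` structure and the labels `"In silico"`, `"Association study"` and `"Literature-derived"` are assumed names chosen for this example, not ORegAnno's controlled vocabulary.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Evidence:
    evidence_class: str  # e.g. "Regulator" or "Regulator Site"
    evidence_type: str   # e.g. "Reporter Gene Assay"
    subtype: str = ""    # e.g. "Transient transfection luciferase assay"
    cell_type: str = ""  # eVOC cell type term
    comment: str = ""


# Hypothetical labels for the evidence types the slides single out.
IN_SILICO = "In silico"
ASSOCIATION_STUDY = "Association study"
LITERATURE_DERIVED = "Literature-derived"


def evidence_problems(evidence: List[Evidence], sequence_type: str) -> List[str]:
    """Return the rule violations found in a record's evidence lines."""
    problems = []
    types = [e.evidence_type for e in evidence]
    if not types:
        problems.append("at least one line of evidence is required")
    non_computational = [t for t in types if t != IN_SILICO]
    if types and not non_computational:
        problems.append("in silico evidence alone is not sufficient")
    if sequence_type == "Regulatory polymorphism" and non_computational:
        # Polymorphisms need something beyond association (and in silico) data.
        if all(t == ASSOCIATION_STUDY for t in non_computational):
            problems.append("an association study alone is not sufficient")
    if LITERATURE_DERIVED in types:
        problems.append("avoid the 'literature-derived' evidence type")
    return problems
```

Returning a list of messages, rather than a single pass/fail flag, lets a validator report every problem with a submission at once.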

Outcome and user information
- Outcome
  - Positive, neutral, or negative
  - Was the sequence proven functional? Yes, no, or uncertain
  - Sequences can only be considered positive or negative for the conditions under which they were tested
- User information
  - Encourages ownership and accountability
  - Associated with every record, comment, and validation a user creates
  - Three roles: 'User', 'Validator', 'Administrator'

Optional information for an ORegAnno record
- Dataset ID
- Locus name
- Sequence search space
- For regulatory polymorphism records:
  - Variant sequence and identifier
  - Type of variant
- Meta-data
- Comment

Discussion items
- Should a record reference only one publication? What should be done in cases where several papers describe experimental validation of the same regulatory sequence?
- Should further sub-categorization of regulatory regions be allowed (e.g. silencer, enhancer, locus control region)?
- Where sequence conservation is perfect between the species of interest and a model organism (e.g. mouse and human have an identical regulatory sequence upstream of an orthologous gene), could an assay in one species be considered evidence for function of the sequence in both?
- Should multiple TFs be allowed for a single record, or should each form a separate record?
- Should TF complexes be allowed?
- What is the minimal evidence we should allow?
- Are there other evidence classes that should be included?
- Should ORegAnno migrate to a more complex or formal ontology system for evidence?

Acknowledgements
- Supervisor: Steven Jones
- ORegAnno developers and co-authors: Stephen Montgomery, Monica Sleumer, Casey Bergman, Misha Bilenky, Erin Pleasance
- Co-op students: Yuliya Prychyna, Maggie Zhang, Bryan Chu
- RegCreative organizers and participants