DAS for ENCODE data coordination. Felix Kokocinski, WTSI.

Presentation transcript:

DAS for ENCODE data coordination
Felix Kokocinski, WTSI

Project Overview
Goal: Annotate all evidence-based gene features at high accuracy across the human genome
– protein-coding loci with isoforms
– non-coding (nc) loci with transcript evidence
– pseudogenes
Partners:
– HAVANA & EnsEMBL, Sanger Institute, UK
– University of Lausanne, CH
– Centre for Genomic Regulation, ES
– Spanish Nat. Cancer Res. Centre, ES
– University of California Santa Cruz, USA
– Washington University St. Louis, USA
– Broad Inst. of MIT and Harvard, USA
– Yale University, USA

Manual Genome Annotation
– ~20 annotators working according to HAVANA guidelines
– computational pipeline for alignments
– Otterlace software
– input from partner groups, import of data sources via DAS
– verification with RT-PCR, RACE & sequencing

Data Exchange using DAS
[Diagram; labels: Distributed Annotation Sources, WWW interface, GenTrack tracking system, Otterlace annotation software, high-priority issues, experimental verification issues, Perl API, Source Adaptors, Update Scripts]

GenTrack Annotation Tracking
– extension of the open-source Ruby on Rails (RoR) ticketing system Redmine
– data import via DAS
– modules for analyzing and flagging data
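The "data import via DAS" step can be pictured with a minimal sketch, assuming a generic DAS `features` request and a DASGFF XML response. The slides mention a Perl API; Python is used here only for illustration. The server name, source name, feature IDs and coordinates are hypothetical, and the function names `features_url` and `parse_features` are not from the talk.

```python
import xml.etree.ElementTree as ET


def features_url(server, source, segment, start, stop):
    """Build a DAS 'features' request URL for one segment range."""
    return f"{server.rstrip('/')}/das/{source}/features?segment={segment}:{start},{stop}"


def parse_features(xml_text):
    """Turn a DASGFF features response into a list of plain dicts."""
    root = ET.fromstring(xml_text)
    rows = []
    for seg in root.iter("SEGMENT"):
        for feat in seg.iter("FEATURE"):
            rows.append({
                "segment": seg.get("id"),
                "id": feat.get("id"),
                "type": feat.findtext("TYPE", default="").strip(),
                "method": feat.findtext("METHOD", default="").strip(),
                "start": int(feat.findtext("START")),
                "end": int(feat.findtext("END")),
                "notes": [n.text.strip() for n in feat.findall("NOTE") if n.text],
            })
    return rows


# Hypothetical response; every value below is made up for illustration.
SAMPLE = """<DASGFF>
  <GFF version="1.0" href="http://example.org/das/demo/features">
    <SEGMENT id="1" start="1000" stop="2000">
      <FEATURE id="demo_exon_1" label="demo_exon_1">
        <TYPE id="exon">exon</TYPE>
        <METHOD id="havana_manual_annotation">havana_manual_annotation</METHOD>
        <START>1100</START>
        <END>1250</END>
        <ORIENTATION>+</ORIENTATION>
        <NOTE>parent=demo_transcript_1</NOTE>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""

if __name__ == "__main__":
    print(features_url("http://example.org", "demo", "1", 1000, 2000))
    for row in parse_features(SAMPLE):
        print(row)
```

A tracking system could store each returned dict as one imported feature. How GenTrack actually maps features to tickets is not described in the slides.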

GenTrack Annotation Tracking

GenTrack: Workflow
Entry points:
– List of all genes & transcripts in region
– High-priority loci
– Loci with specific tags
Identify problem, compare in Otterlace
Resolve by
– Changing annotation or
– Disbelieving other source
– Note decision
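The workflow above can be modelled roughly as follows. This is an illustrative sketch only, not GenTrack's data model: the `Issue` class, the `Resolution` enum and the `entry_points` function are hypothetical names, and the rule that a decision note is mandatory is an assumption drawn from the "Note decision" step.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Resolution(Enum):
    """The two ways of resolving an issue named on the slide."""
    CHANGE_ANNOTATION = "annotation changed"
    DISBELIEVE_SOURCE = "other source disbelieved"


@dataclass
class Issue:
    locus: str
    tags: set = field(default_factory=set)
    high_priority: bool = False
    resolution: Optional[Resolution] = None
    note: str = ""

    def resolve(self, resolution, note):
        """Record how the issue was resolved; a decision note is required (assumption)."""
        if not note.strip():
            raise ValueError("a decision note is required")
        self.resolution = resolution
        self.note = note


def entry_points(issues, tag=None):
    """Return unresolved issues, high-priority first, optionally filtered by tag."""
    open_issues = [
        i for i in issues
        if i.resolution is None and (tag is None or tag in i.tags)
    ]
    # sorted() is stable, so the original order is kept within each priority group.
    return sorted(open_issues, key=lambda i: not i.high_priority)


if __name__ == "__main__":
    issues = [
        Issue("locus_a"),
        Issue("locus_b", high_priority=True),
        Issue("locus_c", tags={"needs_check"}),
    ]
    issues[1].resolve(Resolution.CHANGE_ANNOTATION, "extended 3' exon after comparison")
    print([i.locus for i in entry_points(issues)])
```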

DAS Specifics
Format: Specialized 1.53E
– from sequence ontology (exon: SO: )
– (havana_manual_annotation)
– Evidence code describing the type of method (inferred from RT-PCR experiment (ECO: ))
– key=value pairs
  - parent, lastmod [req] (LASTMOD= T15:15: )
  - transcripttype, etc. [opt]
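The key=value convention can be checked with a short sketch like the one below. It assumes each pair arrives as one `key=value` string and that keys are case-insensitive, since the slide shows both `lastmod` and `LASTMOD`. The function name `parse_notes` and all sample values are hypothetical, and the timestamp is made up rather than taken from the slide.

```python
# Keys the slide marks as [req]; everything else is treated as optional.
REQUIRED_KEYS = {"parent", "lastmod"}


def parse_notes(notes):
    """Parse 'key=value' strings into a dict and check the required keys."""
    pairs = {}
    for note in notes:
        key, sep, value = note.partition("=")
        if not sep:
            continue  # free-text note without '=', ignored here
        pairs[key.strip().lower()] = value.strip()
    missing = REQUIRED_KEYS - pairs.keys()
    if missing:
        raise ValueError(f"missing required note keys: {sorted(missing)}")
    return pairs


if __name__ == "__main__":
    example = [
        "parent=demo_transcript_1",        # hypothetical parent ID
        "LASTMOD=2000-01-01T00:00:00",     # made-up timestamp
        "transcripttype=demo_type",        # optional key
    ]
    print(parse_notes(example))
```

The timestamp is kept as a plain string because the exact LASTMOD format is truncated in the source.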

DAS Specifics

Thanks
Tim Hubbard, ENCODE partners, Andy Jenkinson, Jonathan Warren, Paul Bevan, Jody Clements, Steve Trevanion, James Gilbert, Anacode, Adam Frankish, Toby Hunt, Bronwen Aken, Steve Searle, Jennifer Harrow, Redmine.org