Introduction to DAS / State of the Union Tim Hubbard DAS developer workshop 10th March 2009 Wellcome Trust Genome Campus.

Presentation transcript:


Distributed Annotation System, or How I Learnt to Stop Worrying and Love Data Federation
Credit: Andreas Prlić

Distributed Annotation System
Origins:
– XML client/server specification (
– Lincoln Stein, Sean Eddy, Robin Dowell and LaDeana Hillier
– acedb-based prototype server
– Java-based prototype client
– Dowell, R.D., Jokerst, R.M., Day, A., Eddy, S.R. & Stein, L. (2001) BioMedCentral Bioinformatics 2.
Genome campus adoption:
– Initially via Ensembl becoming a DAS client (now also a DAS server)
– Code: Dazzle and Proserver servers; Bio::DASLite and biojava client libraries
– Hosts DAS registry (

DAS in a nutshell
Standardized set of web services:
– Reference servers (the sequence)
– Annotation servers (features: chr:start-end)
– Alignment servers (chr:start-end matches chr:start-end)
– Identifier-based servers (reference item X rather than a coordinate)
Standardization allows clients to connect to different DAS sources without additional programming.
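As a rough illustration of why one client can talk to many annotation servers, the sketch below builds a DAS 1.x style `features` request for a chr:start-end segment and parses a DASGFF-style XML reply. The host `das.example.org`, the source name `demo`, and the sample document are hypothetical placeholders, not taken from the talk, and only a few response fields are read.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def das_features_url(server, source, ref, start, end):
    """Build a DAS 1.x style 'features' URL for one segment (ref:start,end)."""
    # Keep ':' and ',' unescaped so the segment stays readable in the URL.
    query = urlencode({"segment": f"{ref}:{start},{end}"}, safe=":,")
    return f"{server}/das/{source}/features?{query}"


# Hypothetical, hand-written reply in the DASGFF layout (not real data).
SAMPLE_DASGFF = """<DASGFF>
  <GFF version="1.0" href="http://das.example.org/das/demo/features">
    <SEGMENT id="1" start="1000" stop="2000">
      <FEATURE id="exon1" label="Exon 1">
        <TYPE id="exon">exon</TYPE>
        <METHOD id="manual">manual</METHOD>
        <START>1200</START>
        <END>1350</END>
        <ORIENTATION>+</ORIENTATION>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""


def parse_features(xml_text):
    """Return one dict per FEATURE element, tagged with its parent segment id."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.iter("SEGMENT"):
        for feat in segment.iter("FEATURE"):
            features.append({
                "segment": segment.get("id"),
                "id": feat.get("id"),
                "type": feat.findtext("TYPE"),
                "start": int(feat.findtext("START")),
                "end": int(feat.findtext("END")),
            })
    return features


if __name__ == "__main__":
    print(das_features_url("http://das.example.org", "demo", "1", 1000, 2000))
    print(parse_features(SAMPLE_DASGFF))
```

Because every annotation source is expected to answer the same request shape with the same XML layout, pointing the client at another source should only require changing `server` and `source`.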

Data integration
Complete genomes provide the framework to pull all biological data together, such that each piece says something about biology as a whole.
Biology is too complex for any organisation to have a monopoly of ideas or data.
The more organisations provide data or analysis separately, the harder it becomes for anyone to make use of the results.

Utility of bioinformatics
[Figure: scientific impact; labels: Too little bioinformatics, Too many databases, Too diverse interfaces]

Split data and presentation
Databases are responsible for curating data and serving it as primitive datatypes defined by open standards (high cost).
Different front ends, or components of front ends, compete for users (development of each is low cost), cf. browsers.

Data Services

Campus DAS systems
[Diagram labels: Servers; Clients; e! contigview; epigenome; e! geneview; Genome Coordinates; Proserver; Apollo; Pfam; 3D structure; CDS Coordinates; Protein Coordinates; Stable Identifiers; Dazzle; LDAS; Sources; Ensembl; Pfam; UniProt; PubMed; COSMIC; Sequence Alignments; Registry; otterlace]

Rise of Federation Technologies
DAS for features
BioMart for data mining
BioMart server is a DAS server
New international genome data projects
– routinely using the F word
– frequently the D and B words too
– e.g. International Cancer Genome Consortium

DAS infrastructure status
Lots of progress
– Servers: Dazzle, Proserver, MyDas, Bio::Daslite
– Clients: Ensembl, Vega, Dasty, SPICE, Pfam, Jalview, Pepper, IGB
– >500 sources in DAS registry (
– Broadly adopted by large scale projects: Ensembl, biosapiens, efamily, ZF-models, eProtein, ENCODE annotation
– Extensions in 1.53E: stylesheets, semantic zooming, ontology support, timestamps, interactions
– Planned 1.6: incorporating some features of DAS2 specification
– Better adoption of DAS in US
Opportunities
– Searching, writeback
– Source ranking, credit, social networking
– Inter-client communications protocol
– Async delivery/caching; servers built on servers/workflows
– Alternative entry points from servers? Next left/right? Date of addition?

2008 the year of…
Open access to publications
– PMC, ukPMC, Zotero, Papers, MyNCBI, Citeulike, Connotea, 2collab and HubMed
– All WT funded publications open in 6 months
– All NIH funded publications open in 12 months
DAS for publications?
– Text is just a new coordinate system
Links to Social Networks?
– Google OpenSocial
Still waiting…

2009 the year of…
Massive datasets
– Track likely to be 50 million Solexa transcriptome reads
Need:
– Better ways for users to create tracks for large datasets

Problems of large user data (credits to Jim Kent, UCSC)
Easy to generate 1 GB files with next-gen sequencing.
– 25 million tag mappings at 40 bytes each
– Potential to translate into histograms with 1 floating point number every 12 bases
Slow to load into the MySQL database backend of a local DAS server; many users will not want to set up DAS servers.
Too large to upload to remote DAS server services (e.g. Ensembl) to create a track.
Most users only look at 5-50 sites, less than 1% of the genome.
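The file-size figures above can be checked with a little arithmetic. In the sketch below, the 25 million mappings, 40 bytes per mapping, and one value per 12 bases come from the slide; the genome length of about 3.1 billion bases and the 4-byte float are assumptions added here for illustration.

```python
# Figures taken from the slide.
TAG_MAPPINGS = 25_000_000
BYTES_PER_MAPPING = 40
BASES_PER_VALUE = 12

# Assumptions for illustration only (not stated in the talk).
GENOME_BASES = 3_100_000_000   # roughly human-sized genome
BYTES_PER_FLOAT = 4            # single-precision value


def mapping_file_bytes(mappings=TAG_MAPPINGS, size=BYTES_PER_MAPPING):
    """Size in bytes of a flat file holding every tag mapping."""
    return mappings * size


def histogram_bytes(genome=GENOME_BASES, step=BASES_PER_VALUE,
                    value_size=BYTES_PER_FLOAT):
    """Size in bytes of a genome-wide histogram with one value per `step` bases."""
    return (genome // step) * value_size


if __name__ == "__main__":
    print(f"tag mappings: {mapping_file_bytes() / 1e9:.2f} GB")
    print(f"histogram:    {histogram_bytes() / 1e9:.2f} GB")
```

Under these assumptions both representations come out at about 1 GB, which is the scale the slide describes as slow to load and too large to upload.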

Jim Kent’s idea
User runs a program to convert their data into a single indexed file (BigWig & BigBed)
Place it on their website
UCSC browser fetches parts of the file on demand using http(s) “byte range” queries
Relationship to DAS?
– Potential to create a DAS server plugin to serve BigWig/BigBed files as DAS servers
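A minimal sketch of the "byte range" mechanism, assuming a web server that honours the standard HTTP `Range` header. It is not the UCSC implementation; the function names and any URL passed in are hypothetical. A real BigWig/BigBed reader would first fetch the file header and index, then request only the blocks that overlap the region being viewed.

```python
import urllib.request


def byte_range_header(offset, length):
    """Return a Range header value covering `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive at both ends.
    return f"bytes={offset}-{offset + length - 1}"


def fetch_range(url, offset, length):
    """Fetch part of a remote file; fail if the server ignores the Range header."""
    request = urllib.request.Request(
        url, headers={"Range": byte_range_header(offset, length)}
    )
    with urllib.request.urlopen(request) as response:
        # 206 Partial Content means the server returned only the requested slice.
        if response.status != 206:
            raise RuntimeError(f"expected 206 Partial Content, got {response.status}")
        return response.read()
```

The user only needs to host a static file; no database or DAS server is involved, which is what makes the approach attractive for very large tracks.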

Acknowledgements
Ewan Birney, Tony Cox, Thomas Down, Rob Finn, Stefan Graf, David Jackson, Andreas Kahari, Eugene Kulesha, Henning Hermjakob, Roger Pettett, Matt Pocock, James Smith, Jim Stalker, Janet Thornton
Ensembl/Sanger Web team
efamily, biosapiens, eProtein
Zebrafish analysis (ZF-models)
Anacode/Acedb (otterlace/Zmap)
Jonathan Warren, Andy Jenkinson, Andreas Prlic

2009 the year of…
Massive datasets
– Track likely to be 50 million Solexa transcriptome reads
Private datasets
– EGA requires registration and logins
– Even summary data currently not public
Need:
– Better ways for users to create tracks for large datasets
– Federated access controls for patient data

DAS stylesheet magic (Eugene Kulesha)
To do: tiling array