The Encyclopedia of Life (EOL): medicine, research, education
The Annotation and Cataloging of Proteins, Life's Building Blocks for…
The Open Notebook


A Multitude of Data Sites

Current Problem Using Data Sites
– Difficult to keep track of data files
– Data often returned in various formats
– Searches are frequently repeated in their entirety, tying up server resources

Developments in Data Transfer
– XML increasingly being used to encapsulate data
– SOAP-based access to data services is springing up; SOAP is an XML-based method for exchanging information
– Example method: string[] getGenomeAnnotationStatus(int Format_option)
– The SOAP consumer invokes a SOAP method on the SOAP server over HTTP
– The SOAP server processes the request and returns any data to the SOAP consumer in an XML-formatted SOAP packet
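The request/response exchange described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the EOL implementation: the service namespace `urn:example:eol`, the `<item>` element name, and the sample response are assumptions made for the example. Only the method name `getGenomeAnnotationStatus` and its `Format_option` parameter come from the slide. In practice the request bytes would be sent to the server in an HTTP POST.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:eol"  # hypothetical service namespace (assumption)


def build_request(method: str, params: dict) -> bytes:
    """Wrap a method call and its parameters in a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def parse_string_array(response_xml: bytes) -> list:
    """Collect the text of every un-namespaced <item> element in the SOAP body."""
    root = ET.fromstring(response_xml)
    body = root.find(f"{{{SOAP_NS}}}Body")
    return [item.text for item in body.iter("item")]


# Made-up response, standing in for what a server might return.
SAMPLE_RESPONSE = (
    b'<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    b"<soap:Body>"
    b'<ns:getGenomeAnnotationStatusResponse xmlns:ns="urn:example:eol">'
    b"<item>genome A: complete</item><item>genome B: pending</item>"
    b"</ns:getGenomeAnnotationStatusResponse>"
    b"</soap:Body></soap:Envelope>"
)

request = build_request("getGenomeAnnotationStatus", {"Format_option": 1})
statuses = parse_string_array(SAMPLE_RESPONSE)
```

Because both directions are plain XML over HTTP, the consumer can be written in any language with an XML parser, which is the point the slide makes.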

Notebook Overview (diagram)
Diagram labels, in transcript order:
– XML/RDF store
– Background SOAP queries
– BLAST data, keyword data, stored queries, annotations
– SOAP server
– Session info, scheduler, BLAST, keyword queries
– Metadata sharing, virtual community messaging
– Application invoked by mime type
– Web Services interface
– Open Notebook, notebook link
– getIncrementalUpdate(string sequence, string date) …
– Annotations

Open Notebook Protocol
– Agreed set of protocols for invoking a client-side application and then feeding it with data, to enable client-side data persistence
– Not tied to one programming language

Invocation of Client-side Application
– Experimental mime type (as per RFC 2048): application/x-opennotebook
– Application registers with web browser/OS to handle this mime type
– Data then streams to the application in an agreed XML schema format …
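The activation scenario above amounts to a lookup from mime type to handler. The sketch below mimics that step inside a single Python process, purely as an illustration. Real registration happens in the browser or operating system, and the function names `register`, `dispatch`, and `open_in_notebook` are hypothetical. Only the mime type string comes from the slide.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

NOTEBOOK_MIME = "application/x-opennotebook"  # mime type named on the slide

# Hypothetical in-process registry standing in for browser/OS registration.
_handlers: Dict[str, Callable[[bytes], str]] = {}


def register(mime_type: str, handler: Callable[[bytes], str]) -> None:
    """Associate a handler with a mime type (case-insensitive)."""
    _handlers[mime_type.lower()] = handler


def dispatch(content_type_header: str, payload: bytes) -> str:
    """Route a payload to the handler registered for its mime type."""
    # Drop parameters such as "; charset=utf-8" before the lookup.
    mime_type = content_type_header.split(";", 1)[0].strip().lower()
    handler = _handlers.get(mime_type)
    if handler is None:
        raise LookupError(f"no application registered for {mime_type}")
    return handler(payload)


def open_in_notebook(payload: bytes) -> str:
    """Stand-in for the notebook application: parse the streamed XML."""
    root = ET.fromstring(payload)
    return f"notebook received <{root.tag}>"


register(NOTEBOOK_MIME, open_in_notebook)
result = dispatch("application/x-opennotebook; charset=utf-8", b"<notebook/>")
```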

Data would describe required data viewers
– Specialized viewers and their current availability specified in XML data
– Example values (XML markup lost in the transcript): download, blast, available, Java;win32;macosx
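Since the slide's XML markup did not survive, the descriptor below is a guess at what such data might look like. The element names `viewer`, `action`, `name`, `status`, and `platforms` are hypothetical. Only the four values come from the slide. The sketch shows how a client could decide whether a specialized viewer can be fetched for its platform.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Tuple

# Hypothetical descriptor: element names are assumptions, values are from the slide.
SAMPLE_DESCRIPTOR = (
    "<viewer>"
    "<action>download</action>"
    "<name>blast</name>"
    "<status>available</status>"
    "<platforms>Java;win32;macosx</platforms>"
    "</viewer>"
)


@dataclass(frozen=True)
class ViewerInfo:
    action: str
    name: str
    status: str
    platforms: Tuple[str, ...]

    def runs_on(self, platform: str) -> bool:
        """True if the viewer is available and lists this platform."""
        return self.status == "available" and platform.lower() in self.platforms


def parse_viewer(xml_text: str) -> ViewerInfo:
    """Read one viewer descriptor into a ViewerInfo record."""
    root = ET.fromstring(xml_text)
    raw = root.findtext("platforms", default="")
    platforms = tuple(p.strip().lower() for p in raw.split(";") if p.strip())
    return ViewerInfo(
        action=root.findtext("action", default=""),
        name=root.findtext("name", default=""),
        status=root.findtext("status", default=""),
        platforms=platforms,
    )


viewer = parse_viewer(SAMPLE_DESCRIPTOR)
```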

Data updates
– Indication whether data is updatable
– Example values (XML markup lost in the transcript): yes, getGenomes(string seq), yes …

Programming Language-Neutral
– Important to specify just the protocols and activation scenarios
– Enables development of a variety of different and branded versions
– Java is envisaged as an excellent programming language choice for starting development of an open source version

Encyclopedia of Life
– The Encyclopedia of Life (EOL) project is a joint development of the San Diego Supercomputer Center (SDSC) and scientists and biological resources worldwide
– EOL involves SDSC staff from HPC (High Performance Computing), DAKS (Distributed Annotation and Knowledge System), Grids, Clusters and Visualization
– EOL has three parts:
  – Putative functional and 3-D structure assignment through the largest computation ever attempted in biology
  – Integration of key biological resources
  – Making this data available to the end user through an intuitive interface
– Opportunity to start from the ground up

integrated Genomic Annotation Pipeline - iGAP (diagram)
Diagram labels, in transcript order:
– Deduced protein sequences
– Prediction of: signal peptides (SignalP, PSORT), transmembrane (TMHMM, PSORT), coiled coils (COILS), low complexity regions (SEG)
– Structural assignment of domains by PSI-BLAST on FOLDLIB
– Only sequences w/out A-prediction
– Structural assignment of domains by 123D on FOLDLIB
– Create PSI-BLAST profiles for protein sequences
– Store assigned regions in the DB
– Functional assignment by PFAM, NR, PSIPred assignments
– Reference data: FOLDLIB; NR, PFAM
– Building FOLDLIB: PDB chains, SCOP domains, PDP domains, CE matches PDB vs. SCOP; 90% sequence non-identical; minimum size 25 aa; coverage (90%, gaps <30, ends <30)
– Domain location prediction by sequence
– Structure info, sequence info: SCOP, PDB

integrated Genomic Annotation Pipeline - iGAP (diagram, with scale and compute-cost annotations)
Same pipeline diagram as the previous slide, with these added figures (in transcript order; the transcript does not preserve which step each figure belongs to):
– ~800, 10k-20k per, =~10^7 ORFs
– 4 CPU years, 228 CPU years, 3 CPU years, 9 CPU years, 252 CPU years, 3 CPU years
– 10^4 entries

EOL Data Flow (diagram)
Diagram labels, in transcript order:
– MySQL DataMart(s)
– Structure assignment by PSI-BLAST, structure assignment by 123D, domain location prediction
– Data warehouse
– Pipeline data, load/update scripts
– Integrated Genome Annotation Pipeline (iGAP)
– Sequence data from genomic sequencing projects
– Normalized DB2 schema
– Web Server / Web Services
– Application Server: JBOSS v3.1, Apache AXIS
– Query databases, return data
– Web Services consumers
– Web Interface
– Retrieve Web pages and invoke SOAP methods
Caption: Putative Functional and 3D Assignment Integrated with Other Resources

Local Data Aggregation (diagram)
Diagram labels, in transcript order:
– EOL Registry
– iGAP Oracle db
– Java Application Server
– Local lookup tables
– Temporary session search data
– PHProjekt
– Keyword search, BLAST, NLQ search

EOL Front End: Web Interface

Interactive Data Rendering
– Need for interactive client-side graphical data rendering
– Flash used in EOL prototype but…
  – development time high
  – thin client capabilities limited by player parsing capabilities
– Scalable Vector Graphics (SVG)
  – described by an XML-based text file
  – graphic description can be created server-side
  – standards-based
  – interactivity provided by embedded ECMAScript
– Negatives:
  – little native support in web browsers
  – must use proprietary plugin (Adobe) in practice

SVG Data Rendering
– Embedded ECMAScript makes calls to the EOL server for data
– Data is returned to the SVG component
– Diagram labels: EOL Web Server, EOL Data
– SVG XML-based graphic is generated in real time on the server …
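Generating an SVG graphic on the server, as described above, comes down to writing an XML document. The sketch below draws protein domains as rectangles along a sequence backbone. It is an illustration only: the function name `render_domains`, the layout constants, and the example domain are assumptions, and the embedded ECMAScript interactivity the slide mentions is left out.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def render_domains(seq_length: int, domains, width: int = 600) -> str:
    """Return an SVG document drawing each (name, start, end) domain as a box."""
    ET.register_namespace("", SVG_NS)  # serialize with a default xmlns, no prefix
    svg = ET.Element(f"{{{SVG_NS}}}svg", {"width": str(width), "height": "40"})
    scale = width / seq_length  # pixels per residue

    # Backbone line representing the full sequence.
    ET.SubElement(svg, f"{{{SVG_NS}}}line", {
        "x1": "0", "y1": "20", "x2": str(width), "y2": "20", "stroke": "black",
    })

    for name, start, end in domains:
        rect = ET.SubElement(svg, f"{{{SVG_NS}}}rect", {
            "x": f"{start * scale:.1f}",
            "y": "10",
            "width": f"{(end - start) * scale:.1f}",
            "height": "20",
            "fill": "steelblue",
        })
        # <title> gives the viewer a tooltip with the domain name.
        ET.SubElement(rect, f"{{{SVG_NS}}}title").text = name

    return ET.tostring(svg, encoding="unicode")


# Hypothetical input: one domain spanning residues 30-150 of a 300-residue protein.
svg_text = render_domains(300, [("example domain", 30, 150)])
```

Because the output is text, the server can build it per request from database results and the client needs no knowledge of how it was produced.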

Session Data Persistence
– Diagram labels: EOL Server, Temp Data
– Session object retains pointers to temp data

EOL Front End: Web Services (cont) (diagram)
Diagram labels, in transcript order:
– Web Server
– Application Server: JBOSS v3.1
– Open Notebook
– Apache AXIS
– org.eolproject.ejb package: getDomains(int id, int format_option)
– getDomains( , 1)
– Flash XML rendering
– getDomains( , 0)
– Integration into enterprise applications
– HTML rendering
– Open Notebook
– General data access

Open Notebook Software Wish List
– Multi-platform application
– Easy installation and update
– Local search functionality
– Data annotation
– Built-in basic data viewers for popular data, e.g. BLAST, sequence alignments, basic molecular rendering
– Automated download of specialized data viewers
– Automatic data updates via background use of web services
– User notification of new data
– Point-and-click interface to support the new breed of PDAs and tablets
– Peer-to-peer querying of annotation data

Easy Installation and Update
– Idiot-proof installation
– Java Network Launch Protocol (JNLP) a good contender, i.e. Web Start
– JNLP has the ability to provide application updates

Local search functionality
– Whatever kind of database is used, it needs to support some kind of search functionality
– For the Open Notebook project we would seek an open source XML-based database, and look to the XML:DB API as a means to interact with a native XML database
– eXist is one example of an open source, native XML database
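To make the local search requirement concrete, here is a rough sketch of a keyword search over a local XML store. It does not use the XML:DB API or eXist; Python's standard `xml.etree.ElementTree` stands in for a native XML database. The record schema (`notebook`, `record`, `title`, `note`) and the sample entries are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical local store; the schema and contents are assumptions.
STORE = (
    "<notebook>"
    '<record id="r1" type="blast"><title>BLAST result for query one</title></record>'
    '<record id="r2" type="annotation"><title>Note</title>'
    "<note>Possible coiled coil region</note></record>"
    '<record id="r3" type="blast"><title>BLAST result, coiled coil query</title></record>'
    "</notebook>"
)


def search(store_xml: str, keyword: str, record_type: str = None) -> list:
    """Return ids of records whose text contains the keyword (case-insensitive)."""
    root = ET.fromstring(store_xml)
    needle = keyword.lower()
    hits = []
    for record in root.iter("record"):
        if record_type is not None and record.get("type") != record_type:
            continue
        text = " ".join(record.itertext()).lower()
        if needle in text:
            hits.append(record.get("id"))
    return hits


all_hits = search(STORE, "coiled coil")
blast_hits = search(STORE, "coiled coil", record_type="blast")
```

A native XML database would replace the linear scan with indexed queries, but the client-facing operation, keyword plus optional record type, would look much the same.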

Data annotation & Peer-to-peer querying of annotation data
– Personal annotations on local data are a useful and relatively easy feature to implement
– Peer-to-peer access is contentious and needs to be well controlled
– Potentially could create a real community of online scientists
– Effectively a scientific “Napster”

Built-in Basic Data Viewers
– Need to have minimum built-in capability
  – Text viewer
  – SVG graphics viewer
  – NCBI DTD-based BLAST browser
  – Multiple sequence alignment viewer
  – Molecule renderer

Automatic data updates via SOAP calls
– Server side must be set up to provide SOAP method calls
– Potential to drastically reduce server load by performing incremental searches
– getBlastData(string sequence, string last-queried)
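The load reduction comes from the server returning only what is new since the client last asked. A minimal server-side sketch of that idea follows. The in-memory `HITS` table, its field names, and the hit identifiers are made up for illustration. Only the method name and its two parameters follow the slide, with `last-queried` spelled `last_queried` and typed as a date rather than a string.

```python
from datetime import date
from typing import List, Optional

# Made-up data standing in for the server's stored BLAST results.
HITS = [
    {"sequence": "MKV", "hit": "hit-1", "added": date(2003, 1, 10)},
    {"sequence": "MKV", "hit": "hit-2", "added": date(2003, 3, 5)},
    {"sequence": "GGA", "hit": "hit-3", "added": date(2003, 3, 6)},
]


def get_blast_data(sequence: str, last_queried: Optional[date] = None) -> List[str]:
    """Return hits for a sequence, limited to those added after last_queried."""
    return [
        row["hit"]
        for row in HITS
        if row["sequence"] == sequence
        and (last_queried is None or row["added"] > last_queried)
    ]


first_call = get_blast_data("MKV")                      # full result set
follow_up = get_blast_data("MKV", date(2003, 2, 1))     # only newer hits
```

The client stores the date of its last successful query alongside the cached results and passes it on the next background update.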

Point-and-click interface
– Intuitive interface
– Constructed with an eye on developments in personal computing, e.g. PDAs and tablet computers

What Next…?
– Upload a seed Java-based project onto the Bioinformatics.org site together with an RFC
– Discuss online the merits of the project

Summary
– A genuine need for a means to:
  – Collate data
  – Update data automatically
  – Enable shared data annotations
  – Perform specialized data processing
– Java provides a compelling platform to develop an open version of this client-side application

EOL Team
Dave Archbell, Kim Baldridge, Chaitanya Baru, Fran Berman, Philip Bourne, Robert Byrnes, Henri Casanova, Eliot Clingman, Neil Cotofana, Cassie Ferguson, Tony Fountain, Jerry Greenberg, Michael Gribskov, Dana Jermanis, Wilfred Li, Jennifer Matthews, Mark Miller, Julie Mitchell, Coleman Mosley, Greg Quinn, Vicente Reyes, Jerry Rowley, Peter Shin, Ilya Shindyalov, Chris Smith, David Stoner, Stella Veretnik

Further information: