Building CryptoDB using GUS Mark Heiges Center for Tropical and Emerging Global Diseases University of Georgia

[Diagram: Genomic Data and Analysis Results loaded into GUS by GUS Plugins; WDK running in Tomcat behind Apache]

[Roadmap diagram: GUS; External Resources: NCBI Taxonomy (SRes), SO (SRes), NRDB (DoTS), Our data (DoTS); Plugins; Analysis Input: contigs, proteins, NRDB; Analysis Results; helper script; Web Development Kit]

Site Design Considerations: the data types we wanted to warehouse; additional analyses desired; how to load data into GUS; how to visualize data (tables, text, graphics both interactive and static); what types of questions will be asked of the data.

Deciding Factors: what data was available; what the research community needed; what we could accomplish by the contractual deadline for our first release.

Crypto External Resource Data: genomic sequence and gene annotations for two species from GenBank (sequence, CDS translations, gene product descriptions, exon coordinates, RNA type (mRNA, tRNA, snoRNA, rRNA), other features); EST/mRNA sequences (GenBank).

Auxiliary Data Required: NRDB; NCBI Taxonomy reference; Sequence Ontology definitions.

[Roadmap diagram: GUS; External Resources: NCBI Taxonomy (SRes), SO (SRes), NRDB (DoTS), Our data (DoTS); Plugins; Analysis Input: contigs, proteins, NRDB; Analysis Results; helper scripts; Web Development Kit]

GUS Plugins: Perl modules for loading data into GUS, with facilities to connect to the GUS Perl object layer and the database, process command-line arguments, create tracking information in the database, and log and handle errors.
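To make that division of labour concrete, here is a minimal Python sketch of the same responsibilities: argument processing, a database connection, a tracking record for each run, and error handling. It is not the GUS plugin API (real plugins are Perl modules). The class `SketchPlugin`, the `algorithm_invocation` and `loaded_row` tables, and the dry-run meaning given to `--commit` are hypothetical stand-ins, with SQLite in place of the GUS database.

```python
import argparse
import logging
import sqlite3

log = logging.getLogger("sketch_plugin")


class SketchPlugin:
    """Hypothetical analogue of a GUS loading plugin (not the real Perl API)."""

    def __init__(self, argv, conn):
        # Process command-line arguments.
        parser = argparse.ArgumentParser(prog="sketch_plugin")
        parser.add_argument("--file", required=True)
        parser.add_argument("--commit", action="store_true")
        self.args = parser.parse_args(argv)
        self.conn = conn

    def run(self, rows):
        cur = self.conn.cursor()
        cur.execute("CREATE TABLE IF NOT EXISTS algorithm_invocation "
                    "(id INTEGER PRIMARY KEY, input_file TEXT, status TEXT)")
        cur.execute("CREATE TABLE IF NOT EXISTS loaded_row "
                    "(value TEXT, invocation_id INTEGER)")
        # Tracking information: every loaded row points back to this run.
        cur.execute("INSERT INTO algorithm_invocation (input_file, status) "
                    "VALUES (?, 'running')", (self.args.file,))
        inv_id = cur.lastrowid
        try:
            for value in rows:
                cur.execute("INSERT INTO loaded_row VALUES (?, ?)",
                            (value, inv_id))
            cur.execute("UPDATE algorithm_invocation SET status = 'done' "
                        "WHERE id = ?", (inv_id,))
        except sqlite3.Error:
            # Log and handle errors: undo this run's partial work, then re-raise.
            log.exception("load failed")
            self.conn.rollback()
            raise
        # Assumed behaviour: without --commit the run is a dry run.
        if self.args.commit:
            self.conn.commit()
        else:
            self.conn.rollback()
        return inv_id
```

Keeping the tracking row in the same transaction as the data means a failed or dry run leaves no orphaned bookkeeping behind.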

GUS Plugins: Supported and Community plugins are bundled with GUS. Plugins are versioned, and each plugin version must be registered with GUS before use; registration records the CVS version and an MD5 checksum for auditing.
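The registration idea can be sketched in a few lines, assuming a plugin version is identified by its name, a version string (a CVS revision in the setup described here) and the MD5 checksum of its source. The `registry` dictionary and the `register_plugin`/`check_registered` helpers are hypothetical and are not GUS commands. The point is that a plugin whose source changed after registration is refused, which is what keeps the audit trail trustworthy.

```python
import hashlib


def md5_of(source: bytes) -> str:
    """Hex MD5 checksum of a plugin's source code."""
    return hashlib.md5(source).hexdigest()


# Hypothetical registry: (plugin name, version) -> checksum recorded at registration.
registry = {}


def register_plugin(name, version, source):
    registry[(name, version)] = md5_of(source)


def check_registered(name, version, source):
    """Raise if the plugin version is unregistered or its source has changed."""
    expected = registry.get((name, version))
    if expected is None:
        raise LookupError(f"{name} {version} is not registered")
    if expected != md5_of(source):
        raise ValueError(f"{name} {version} changed since registration")
```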

Data Loading at CryptoDB: install GUS; register selected plugins; load controlled vocabularies (NCBI Taxonomy, Sequence Ontology definitions); load Crypto annotated sequences from GenBank records; load NRDB from a FASTA file.

Data Loading at CryptoDB (continued): load Crypto mRNA GenBank records; load ESTs from U Penn's database of NCBI's dbEST.
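The order of these loading steps matters: controlled vocabularies such as the taxonomy and SO terms have to be present before sequences that refer to them. A minimal sketch, assuming each step declares its prerequisites; the step names mirror the slides, but the dependency sets and the `run_in_order` helper are illustrative assumptions. It uses the standard-library `graphlib` module (Python 3.9+) to derive a valid order.

```python
from graphlib import TopologicalSorter

# Hypothetical step names mirroring the slides; each maps to its assumed prerequisites.
STEPS = {
    "install_gus": set(),
    "register_plugins": {"install_gus"},
    "load_ncbi_taxonomy": {"register_plugins"},
    "load_so_definitions": {"register_plugins"},
    "load_crypto_genbank": {"load_ncbi_taxonomy", "load_so_definitions"},
    "load_nrdb_fasta": {"load_ncbi_taxonomy"},
    "load_crypto_mrna": {"load_ncbi_taxonomy", "load_so_definitions"},
    "load_ests": {"load_ncbi_taxonomy", "load_so_definitions"},
}


def run_in_order(steps, run_step):
    """Run every step after all of its prerequisites; return the order used."""
    order = list(TopologicalSorter(steps).static_order())
    for name in order:
        run_step(name)
    return order
```

Declaring dependencies rather than hard-coding a sequence makes it harder to accidentally load sequences before the vocabularies they reference.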

CryptoDB Analyses: BLASTP (compare annotated proteins to NRDB); BLASTX (compare the whole genome to NRDB); BLASTN (synteny comparison of the two Crypto species we host); EST/mRNA clustering and alignment; signal peptide predictions; transmembrane predictions.

Analysis Workflow: load source data into GUS (NRDB, genomic sequences); dump the same data from GUS with GUS Ids; perform the analysis with this data (e.g. BLASTX); load the results into GUS. Because the dumped records carry GUS Ids, the results can be linked back to the analysis input data.
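The key point of this workflow is that the identifier travelling through the external analysis is the database primary key, so results need no fuzzy matching when they come back. A minimal sketch, assuming SQLite stands in for GUS and a tab-separated `query subject score` hit list stands in for the analysis output; the `aa_sequence` and `similarity` tables and both helpers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE aa_sequence (aa_sequence_id INTEGER PRIMARY KEY,
                              source_id TEXT, sequence TEXT);
    CREATE TABLE similarity (query_id INTEGER REFERENCES aa_sequence,
                             subject_id INTEGER REFERENCES aa_sequence,
                             score REAL);
""")
conn.executemany("INSERT INTO aa_sequence VALUES (?, ?, ?)",
                 [(336, "P1", "MSTK"), (337, "P2", "MSTR")])


def dump_fasta(conn):
    """Dump sequences using the database primary key as the FASTA identifier."""
    rows = conn.execute("SELECT aa_sequence_id, sequence FROM aa_sequence "
                        "ORDER BY aa_sequence_id")
    return "".join(f">{pk}\n{seq}\n" for pk, seq in rows)


def load_hits(conn, hit_lines):
    """Load 'query<TAB>subject<TAB>score' lines; the ids are already GUS keys."""
    for line in hit_lines:
        query, subject, score = line.rstrip("\n").split("\t")
        conn.execute("INSERT INTO similarity VALUES (?, ?, ?)",
                     (int(query), int(subject), float(score)))
    conn.commit()
```

After loading, a plain join on `similarity.query_id = aa_sequence.aa_sequence_id` links each result back to its input record.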

[Roadmap diagram: GUS; External Resources: NCBI Taxonomy (SRes), SO (SRes), NRDB (DoTS), Our data (DoTS); Plugins; Analysis Input: contigs, proteins, NRDB; Analysis helper script; Analysis Results; Web Development Kit]

Data Analysis - BLASTP: dump NRDB records from GUS to a FASTA file, with GUS Ids; dump annotated protein sequences from GUS to a FASTA file, with GUS Ids. Example FASTA record (excerpt):
>336 source_id= B secondary_identifier= tubulin alpha length=411
TIGGGDDSFNTFFSETGAGKHVPRAVFVDLEPTVIDEVRTGTYRQLFHPEQLITGKEDAA
NNYARGHYTIGKEIIDLVLDRIRKLADQCTGLQGFSVFHSFGGGTGSGFTSLLMERLSVD
YGKKSKLEFSIYPARQVSTAVVEPYNSILTTHTTLEHSDCAFMVDNEAIYDICRRNLDIE
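A dump in this shape is straightforward to produce and to parse back. A minimal sketch, assuming the header layout of the example record (GUS Id first, then `source_id`, `secondary_identifier` and `length`) and its 60-residue lines; `fasta_record` and `gus_id_from_header` are hypothetical helpers, not part of GUS.

```python
def fasta_record(gus_id, source_id, secondary_identifier, sequence, width=60):
    """Format one FASTA record whose identifier is the GUS primary key.

    Field order and line width imitate the example dump; the helper is hypothetical.
    """
    header = (f">{gus_id} source_id={source_id} "
              f"secondary_identifier={secondary_identifier} "
              f"length={len(sequence)}")
    # Slice rather than use textwrap, so gap characters never move line breaks.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header, *lines]) + "\n"


def gus_id_from_header(header_line):
    """Recover the GUS id: the first whitespace-delimited token after '>'."""
    return int(header_line[1:].split()[0])
```

Putting the GUS Id first means any downstream tool that keeps only the first header token (as BLAST does for sequence identifiers) still preserves the key needed to load results.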

[Roadmap diagram: GUS; External Resources: NCBI Taxonomy (SRes), SO (SRes), NRDB (DoTS), Our data (DoTS); Plugins; Analysis Input: contigs, proteins, NRDB; Analysis helper scripts; Analysis Results; Web Development Kit]

Data Analysis - BLASTP: run the BLASTP algorithm on these two GUS Id-labeled datasets, using a Perl wrapper to the BLAST executable (included with GUS) that produces plugin-compatible output. Then load the BLAST results with a plugin: ga GUS::Common::Plugin::LoadBlastSimFast --file blastSimilarity.out --restartAlgInvs "" --queryTable DoTS::ExternalNASequence --subjectTable DoTS::ExternalAASequence --commit
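The `ga` invocation above is long and includes an empty-string argument that is easy to mistype. One option is to assemble it in a small script. This minimal sketch builds the argument list using only the flags shown on the slide and can run it with `subprocess`. The helpers `build_load_blast_cmd`, `as_shell` and `run` are hypothetical, and the sketch assumes `ga` is on the PATH.

```python
import shlex
import subprocess


def build_load_blast_cmd(sim_file, query_table, subject_table, commit=False):
    """Assemble the LoadBlastSimFast call from the slide as an argument list."""
    cmd = ["ga", "GUS::Common::Plugin::LoadBlastSimFast",
           "--file", sim_file,
           "--restartAlgInvs", "",
           "--queryTable", query_table,
           "--subjectTable", subject_table]
    if commit:
        cmd.append("--commit")  # the slide's run passes --commit
    return cmd


def as_shell(cmd):
    """Render the command for a log file or a copy-paste into a terminal."""
    return shlex.join(cmd)


def run(cmd):
    # An argument list bypasses the shell, so the empty "" value needs no quoting.
    return subprocess.run(cmd, check=True)
```

Logging `as_shell(cmd)` next to each load records exactly what was run, which helps later when working out where results ended up.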

Post Data Loading: finding where the results were loaded. We read the documentation (ga GUS::Common::LoadBLAST --help), looked in the plugin source code, asked other users, used the gusdb.org schema browser, and went on fishing expeditions in the GUS tables.
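A more systematic version of the "fishing expedition" is to snapshot the row count of every table before and after a plugin run and report which tables grew. A minimal sketch using SQLite's `sqlite_master` catalogue. On Oracle or PostgreSQL the table list would come from that system's own catalogue views instead. The helper names are hypothetical.

```python
import sqlite3


def row_counts(conn):
    """Map every table in the database to its current row count."""
    names = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    # Names come from the catalogue, not user input, so quoting is sufficient.
    return {n: conn.execute(f'SELECT COUNT(*) FROM "{n}"').fetchone()[0]
            for n in names}


def tables_that_grew(before, after):
    """Return {table: rows_added} for tables whose row count increased."""
    return {t: after[t] - before.get(t, 0)
            for t in after if after[t] > before.get(t, 0)}
```

Take `before = row_counts(conn)`, run the load, then call `tables_that_grew(before, row_counts(conn))`. This narrows the search to the handful of tables the plugin actually wrote.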

Getting Our Database Online

Web Development Kit (WDK): provides accelerated development of database-driven web sites. Questions and records are defined in a model XML file, and default JavaServer Pages (JSP) views are provided. The WDK is not specific to GUS and can be used with any RDBMS.

WDK Question - Summary - Record Paradigm: users supply parameter values to a canned question on the website (e.g. "Which genes have at least __ exons?"). The result is returned in summary pages that list links to the record pages. A record page is a detailed view of a data object, combining text, graphics and tables.
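Here is a minimal sketch of the three layers for the example question "Which genes have at least __ exons?". SQLite stands in for the database. The `gene`/`exon` tables and the three functions are hypothetical, and the WDK itself expresses these pieces declaratively in its model XML rather than as code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gene (source_id TEXT PRIMARY KEY, product TEXT);
    CREATE TABLE exon (gene_id TEXT REFERENCES gene,
                       exon_start INTEGER, exon_end INTEGER);
    INSERT INTO gene VALUES ('g1', 'tubulin alpha'), ('g2', 'hypothetical protein');
    INSERT INTO exon VALUES ('g1', 1, 90), ('g1', 200, 400), ('g2', 10, 300);
""")


def question_min_exons(conn, min_exons):
    """Question: ids of genes with at least `min_exons` exons (bound parameter)."""
    sql = ("SELECT gene_id FROM exon GROUP BY gene_id "
           "HAVING COUNT(*) >= ? ORDER BY gene_id")
    return [r[0] for r in conn.execute(sql, (min_exons,))]


def summary(conn, ids):
    """Summary: one row per hit; the source_id is what links to the record page."""
    return [conn.execute("SELECT source_id, product FROM gene "
                         "WHERE source_id = ?", (i,)).fetchone() for i in ids]


def record(conn, source_id):
    """Record: the detailed view of one gene."""
    product = conn.execute("SELECT product FROM gene WHERE source_id = ?",
                           (source_id,)).fetchone()[0]
    exons = conn.execute("SELECT exon_start, exon_end FROM exon "
                         "WHERE gene_id = ? ORDER BY exon_start",
                         (source_id,)).fetchall()
    return {"source_id": source_id, "product": product, "exons": exons}
```

The question returns only ids. Summary and record pages are separate lookups keyed on those ids, which is why one record page can serve every question that returns genes.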

[Screenshots: Questions, Summary, Record]

WDK Model - View - Controller architecture: the Model XML configuration defines questions, answer summaries and records; the View displays the model and is defined in customizable JavaServer Pages; the Controller is internal and not configurable.

WDK Setup: build; write the WDK model (the WDK comes with a Toy site, and we spent some time with that beforehand); test the model from the command line; install the WDK into Tomcat; customize the view (JSP) pages; integrate Tomcat with Apache (personal preference).

WDK Model: Defining Questions <question name="GeneByContig" displayName="Genes by Contig" queryRef="GeneFeatureIds.GeneByContig" summaryAttributesRef="source_id,product,organism,contig" recordClassRef="GeneRecordClasses.GeneRecordClass"> Find gene located on a given contig

Find Genes By Contig ID. <![CDATA[ select g.source_id from dots.genefeature g, dots.naentry nae, dots.sequencetype st, dots.externalNAsequence enas where nae.na_sequence_id = g.na_sequence_id and enas.sequence_type_id = st.sequence_type_id and enas.na_sequence_id = nae.na_sequence_id and st.name = 'contig' and nae.source_id = '$$contig$$' ORDER BY g.source_id ]]>
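The `$$contig$$` token in the query above is a parameter placeholder that receives the user's value. Pasting user text straight into SQL invites injection, so a safer pattern is to turn each placeholder into a bind variable. Here is a minimal sketch of that idea with SQLite's `?` placeholders. It is not how the WDK is implemented, and `to_bind_query` is a hypothetical helper.

```python
import re
import sqlite3

# Matches a quoted placeholder ('$$name$$') or a bare one ($$name$$).
PLACEHOLDER = re.compile(r"'\$\$(\w+)\$\$'|\$\$(\w+)\$\$")


def to_bind_query(sql, params):
    """Replace $$name$$ placeholders with '?' and return (sql, ordered values)."""
    values = []

    def swap(match):
        name = match.group(1) or match.group(2)
        if name not in params:
            raise KeyError(f"no value supplied for $${name}$$")
        values.append(params[name])
        return "?"

    return PLACEHOLDER.sub(swap, sql), values


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("CREATE TABLE gene (source_id TEXT, contig TEXT);"
                       "INSERT INTO gene VALUES ('g1', 'c1'), ('g2', 'c2');")
    sql, vals = to_bind_query(
        "SELECT source_id FROM gene WHERE contig = '$$contig$$'",
        {"contig": "c1"})
    print(conn.execute(sql, vals).fetchall())  # [('g1',)]
```

The surrounding quotes are consumed along with the token because a bound value needs no SQL quoting.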

WDK Model - Record <recordClass idPrefix="" name="GeneRecordClass" type="Gene" attributeOrdering="source_id,exoncount,overview,product,linkout,dnaContext,genomeCompare,tmdata,blastpgraphic,translation,sequence,reference"> <![CDATA[ This $$organism$$ gene spans positions $$start_max$$ - $$end_min$$ of contig $$contig$$ which maps to chromosome $$chromosome$$ ]]>
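Two things happen when a record like this is displayed: `$$name$$` tokens in text attributes are filled from the record's other attributes, and attributes are laid out according to `attributeOrdering`. A minimal sketch of both, with hypothetical helper names. The rule that unlisted attributes follow the listed ones in their original order is an assumption, not documented WDK behaviour.

```python
import re

TOKEN = re.compile(r"\$\$(\w+)\$\$")


def render_text_attribute(template, attributes):
    """Fill $$name$$ tokens in a text attribute from the record's attribute values."""
    return TOKEN.sub(lambda m: str(attributes[m.group(1)]), template)


def order_attributes(attribute_ordering, attributes):
    """Return (name, value) pairs in attributeOrdering order.

    Assumption: attributes missing from the ordering follow the listed ones,
    in their original order. Stray spaces after commas are ignored.
    """
    listed = [n.strip() for n in attribute_ordering.split(",") if n.strip()]
    ordered = [(n, attributes[n]) for n in listed if n in attributes]
    rest = [(n, v) for n, v in attributes.items() if n not in listed]
    return ordered + rest
```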

Testing the Model: command-line tools: wdkXml (check XML syntax); wdkSummary (test a summary); wdkQuery (run a specific query); wdkRecord (test a record); wdkSanityTest (exercise all queries and records); wdkCache.

Install WDK into Tomcat: follow the installation instructions carefully. The installation relies on symbolic links from the Tomcat webapp to $GUS_HOME, which the default Tomcat configuration disallows. Keep an eye on the Tomcat logs for troubleshooting. Reload the webapp when the model changes, retest on the command line, and don't forget about the cache.

WDK Default View

CryptoDB Custom View: made style changes and added site branding; added additional form elements (radio buttons, check boxes); 'flattened out' the questions.

CryptoDB Custom View: altered record pages to achieve the desired ordering and placement of text, tables and graphics; used standard JSP tags to embed external objects (e.g. the GBrowse graphic).

Integrate Tomcat with Apache: an Apache front end answers all web requests. It serves the static pages and CGI tools (BLAST interface, motif search, BLASTX keyword search), and passes calls to the WDK to Tomcat.
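Here is a minimal sketch of the routing decision this front end makes. It assumes hypothetical URL prefixes (`/cgi-bin/` for the CGI tools, `/cryptodb/` for the WDK web application), since the slide does not give the site's real layout. In an actual deployment this logic lives in Apache configuration (for example via a connector module such as mod_jk or mod_proxy_ajp), not in application code.

```python
from enum import Enum


class Backend(Enum):
    STATIC = "apache static file"
    CGI = "apache cgi"
    TOMCAT = "tomcat (WDK)"


# Hypothetical prefixes; checked in order, first match wins.
ROUTES = [
    ("/cryptodb/", Backend.TOMCAT),
    ("/cgi-bin/", Backend.CGI),
]


def route(path):
    """Pick the backend for a request path; anything unmatched is a static file."""
    for prefix, backend in ROUTES:
        if path.startswith(prefix):
            return backend
    return Backend.STATIC
```

With this split, Apache handles the cheap static and CGI traffic itself, and only WDK requests reach the Java servlet container.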

[Roadmap diagram: GUS; External Resources: NCBI Taxonomy (SRes), SO (SRes), NRDB (DoTS), Our data (DoTS); Plugins; Analysis Input: contigs, proteins, NRDB; Analysis Results; helper scripts; Web Development Kit; Pipeline]