Building CryptoDB using GUS
Mark Heiges, Center for Tropical and Emerging Global Diseases, University of Georgia

1 Building CryptoDB using GUS
Mark Heiges
Center for Tropical and Emerging Global Diseases
University of Georgia
mheiges@uga.edu

2 [Overview diagram: Genomic Data, Analysis Results, GUS, Plugins, WDK, Tomcat, Apache]

9 [Architecture diagram: GUS holds external resources (NCBI Taxonomy and SO in SRes; NRDB in DoTS) and our data (DoTS); plugins load data; analysis input (contigs, proteins, NRDB); analysis results; helper script; Web Development Kit]

10 Site Design Considerations
- data types we wanted to warehouse
- additional analyses desired
- how to load data into GUS
- how to visualize data
  - tables
  - text
  - graphics (interactive, static)
- what types of questions will be asked of the data

11 Deciding Factors
- What data was available
- What the research community needed
- What we could accomplish by the contractual deadline for our first release

12 Crypto External Resource Data
- Genomic sequence and gene annotations for two species (GenBank)
  - sequence
  - CDS translations
  - gene product descriptions
  - exon coordinates
  - RNA type (mRNA, tRNA, snoRNA, rRNA)
  - other features
- EST/mRNA (GenBank)

13 Auxiliary Data Required
- NRDB
- NCBI Taxonomy Reference
- Sequence Ontology Definitions

15 GUS Plugins
Perl modules for loading data into GUS
- facilities to connect to the GUS Perl object layer and the database
- process command line arguments
- create tracking information in the database
- log and handle errors

16 GUS Plugins
- Supported and Community plugins bundled with GUS
- Plugins are versioned
- Each plugin version must be registered with GUS before use
  - records CVS version and MD5 checksum
  - auditing
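The point of recording a checksum at registration is that the database can later tell whether the code that loaded a row is byte-for-byte the version that was registered. A minimal Python sketch of that idea follows; the `PluginRegistration` record and the helper names are hypothetical illustrations, not the GUS registration plugin or its tables.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PluginRegistration:
    """Hypothetical registration row: plugin name, CVS version and MD5 checksum."""
    name: str
    cvs_version: str
    md5: str


def plugin_checksum(source: bytes) -> str:
    """Hex MD5 digest of a plugin's source code."""
    return hashlib.md5(source).hexdigest()


def register(name: str, cvs_version: str, source: bytes) -> PluginRegistration:
    """Capture the version and checksum of the plugin code at registration time."""
    return PluginRegistration(name, cvs_version, plugin_checksum(source))


def matches_registration(reg: PluginRegistration, source: bytes) -> bool:
    """True only if the code about to run is exactly the registered version."""
    return plugin_checksum(source) == reg.md5
```

In practice `source` would be the plugin file's contents (for example `Path(plugin_file).read_bytes()`); any edit to the file changes the digest, so an unregistered modification is caught before it writes audit records.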

17 Data Loading at CryptoDB
- Install GUS
- Register selected plugins
- Load controlled vocabularies
  - NCBI Taxonomy
  - Sequence Ontology definitions
- Load Crypto annotated sequences from GenBank records
- Load NRDB from a FASTA file
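The order above matters because later loads refer to rows created by earlier ones. The Python sketch below expresses the steps as a dependency graph and derives a valid order with the standard library's `graphlib` (Python 3.9+). The dependency edges are assumptions about why this order works (for example, that annotated sequences reference taxonomy and SO terms), not a description of CryptoDB's actual load scripts.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: step -> steps that must complete first.
# The edges are assumptions, not CryptoDB's documented load procedure.
LOAD_STEPS = {
    "install GUS": set(),
    "register plugins": {"install GUS"},
    "load NCBI Taxonomy": {"register plugins"},
    "load Sequence Ontology": {"register plugins"},
    "load Crypto GenBank records": {"load NCBI Taxonomy", "load Sequence Ontology"},
    "load NRDB": {"load NCBI Taxonomy"},
}


def load_order(steps: dict[str, set[str]]) -> list[str]:
    """Return one valid order; raises graphlib.CycleError if steps are circular."""
    return list(TopologicalSorter(steps).static_order())
```

Several orders can be valid (NRDB could load before or after the GenBank records); the graph only guarantees that nothing loads before what it depends on.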

18 Data Loading at CryptoDB
- Load Crypto mRNA GenBank records
- Load ESTs from U Penn's database of NCBI's dbEST

19 CryptoDB Analyses
- BLASTP: compare annotated proteins to NRDB
- BLASTX: compare the whole genome to NRDB
- BLASTN: synteny comparison of the two Crypto species we host
- EST/mRNA clustering and alignment
- signal peptide predictions
- transmembrane predictions

20 Analysis Workflow
1. Load source data into GUS (NRDB, genomic sequences)
2. Dump the same data from GUS with GUS Ids
3. Perform the analysis with this data (BLASTX)
4. Load the results into GUS
GUS Ids allow results to be linked back to the analysis input data.
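Step 4 only works because every result carries the GUS Ids that were written out in step 2. The Python sketch below illustrates that linking step; the `Hit` type and the id-to-source maps are hypothetical stand-ins for what would really be rows in GUS tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One analysis result, labeled only with the GUS Ids that were in the dump."""
    query_id: int    # GUS Id of a dumped genomic sequence
    subject_id: int  # GUS Id of a dumped NRDB entry
    score: float


def link_hits(hits, genomic, nrdb):
    """Resolve GUS Ids back to the input records.

    genomic and nrdb map GUS Id -> source identifier. Hits whose Ids are not
    in the dumped data are returned separately instead of being dropped.
    """
    linked, unresolved = [], []
    for hit in hits:
        query = genomic.get(hit.query_id)
        subject = nrdb.get(hit.subject_id)
        if query is None or subject is None:
            unresolved.append(hit)
        else:
            linked.append((query, subject, hit.score))
    return linked, unresolved
```

Reporting unresolved hits, rather than skipping them, flags results computed against data that was not the same as what is in the database.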

22 Data Analysis - BLASTP
- Dump NRDB records from GUS to a FASTA file, with GUS Ids
- Dump annotated protein sequences from GUS to a FASTA file, with GUS Ids

Example record:
>336 source_id=0703290B secondary_identifier=223280 tubulin alpha length=411
TIGGGDDSFNTFFSETGAGKHVPRAVFVDLEPTVIDEVRTGTYRQLFHPEQLITGKEDAA
NNYARGHYTIGKEIIDLVLDRIRKLADQCTGLQGFSVFHSFGGGTGSGFTSLLMERLSVD
YGKKSKLEFSIYPARQVSTAVVEPYNSILTTHTTLEHSDCAFMVDNEAIYDICRRNLDIE
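The example record puts the GUS Id first in the defline, followed by key=value fields, with the sequence wrapped at 60 residues. A Python sketch that writes a record in that layout and recovers the Id from a defline follows; the function names are hypothetical, and the field layout is inferred from the single example shown.

```python
def fasta_record(gus_id, source_id, secondary_id, description, sequence, width=60):
    """Format one record with the GUS Id as the first word of the defline."""
    header = (f">{gus_id} source_id={source_id} "
              f"secondary_identifier={secondary_id} {description} "
              f"length={len(sequence)}")
    # Wrap the sequence into fixed-width lines (60 residues, as in the example).
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header, *lines])


def gus_id_from_defline(defline):
    """Recover the GUS Id (first word after '>') when results come back."""
    return int(defline[1:].split()[0])
```

Keeping the Id as the first whitespace-delimited word matters because many sequence tools treat only that word as the sequence identifier, so it is what survives into the analysis output.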

24 Data Analysis - BLASTP
- Run the BLASTP algorithm with these two GUS Id-labeled datasets
  - used a Perl wrapper to the BLAST executable, included with GUS; it produces plugin-compatible output
- Load the BLAST results with a plugin:
  ga GUS::Common::Plugin::LoadBlastSimFast --file blastSimilarity.out --restartAlgInvs "" --queryTable DoTS::ExternalNASequence --subjectTable DoTS::ExternalAASequence --commit
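The layout of `blastSimilarity.out` written by the GUS wrapper is not shown here, so the Python sketch below swaps in a different, well-documented format, NCBI BLAST+ tabular output (`-outfmt 6`, 12 default columns), to illustrate what a loader does with each hit. It assumes the first two columns carry the GUS Ids from the dumped deflines; whether they do depends on how the BLAST database was built. The `Similarity` type is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Similarity:
    query_gus_id: int
    subject_gus_id: int
    percent_identity: float
    evalue: float
    bit_score: float


def parse_tabular(lines):
    """Parse BLAST+ -outfmt 6 rows, skipping blank lines and comment lines."""
    sims = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        # Columns: qseqid sseqid pident length mismatch gapopen
        #          qstart qend sstart send evalue bitscore
        sims.append(Similarity(int(f[0]), int(f[1]), float(f[2]),
                               float(f[10]), float(f[11])))
    return sims
```

The `--queryTable` and `--subjectTable` options in the command above play the matching role on the GUS side: they tell the plugin which tables the two Id columns point into.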

25 Post Data Loading
Find where the results were loaded:
- read the documentation: ga GUS::Common::LoadBLAST --help
- looked in the plugin source code
- asked other users
- gusdb.org schema browser
- fishing expeditions in GUS tables

26 Getting Our Database Online

28 Web Development Kit (WDK)
- provides accelerated development of database-driven web sites
  - define questions and records in a model XML file
  - default JavaServer Pages (JSP) views provided
- not specific to GUS; can be used with any RDBMS

29 WDK Question - Summary - Record Paradigm
- Users supply parameter values to a canned question on the website
  - "Which genes have at least __ exons?"
- The result is returned in summary pages that list links to the record pages
- Record page: detailed view of a data object
  - text
  - graphics
  - tables
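The three-step paradigm can be illustrated with plain Python over an in-memory list: a question maps a parameter value to a set of ids, a summary shows a few attributes per id, and a record is the full object. The `Gene` type and the three functions are hypothetical; in the WDK these pieces are declared in the model XML and backed by SQL queries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    source_id: str
    product: str
    exon_count: int


def question_min_exons(genes, min_exons):
    """Question: 'Which genes have at least __ exons?' -> matching ids."""
    return [g.source_id for g in genes if g.exon_count >= min_exons]


def summary_page(genes, ids, columns=("source_id", "product")):
    """Summary: one row per answer id with a few attributes (each links to a record)."""
    by_id = {g.source_id: g for g in genes}
    return [{col: getattr(by_id[i], col) for col in columns} for i in ids]


def record_page(genes, source_id):
    """Record: the detailed view of a single object."""
    return next(g for g in genes if g.source_id == source_id)
```

Separating the question (which ids) from the summary (which columns) means the same answer can be shown with different summary columns without rerunning the question.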

30 [Diagram: Questions, Summary, Record]

31 WDK Model - View - Controller architecture
- Model: XML configuration defines
  - questions
  - answer summaries
  - records
- View
  - displays the model
  - defined in customizable JavaServer Pages
- Controller
  - internal, not configurable

32 WDK Setup
- build
- write the WDK model (WDK comes with a toy site; we spent some time with that beforehand)
- test the model from the command line
- install WDK into Tomcat
- customize the view (JSP) pages
- integrate Tomcat with Apache (personal preference)

33 WDK Model: Defining Questions
<question name="GeneByContig"
          displayName="Genes by Contig"
          queryRef="GeneFeatureIds.GeneByContig"
          summaryAttributesRef="source_id,product,organism,contig"
          recordClassRef="GeneRecordClasses.GeneRecordClass">
  Find genes located on a given contig
</question>
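Each attribute of the question element wires it to another part of the model. The Python sketch below reads the slide's attributes with the standard library's `xml.etree.ElementTree` to make those links explicit. It assumes `queryRef` is a "set.name" dotted reference (suggested by `GeneFeatureIds.GeneByContig`, but not confirmed here); it is an illustration, not how the WDK itself loads its model.

```python
import xml.etree.ElementTree as ET

# The attributes from the slide, written as a self-closing element for parsing.
QUESTION_XML = """<question name="GeneByContig"
          displayName="Genes by Contig"
          queryRef="GeneFeatureIds.GeneByContig"
          summaryAttributesRef="source_id,product,organism,contig"
          recordClassRef="GeneRecordClasses.GeneRecordClass"/>"""


def describe_question(xml_text: str) -> dict:
    q = ET.fromstring(xml_text)
    # Assumption: queryRef is a "setName.queryName" reference.
    query_set, query_name = q.get("queryRef").split(".", 1)
    return {
        "name": q.get("name"),
        "display_name": q.get("displayName"),
        "query_set": query_set,
        "query": query_name,
        "summary_columns": q.get("summaryAttributesRef").split(","),
        "record_class": q.get("recordClassRef"),
    }
```

Read this way, the question names the SQL that finds ids (the query), the columns shown on the summary page, and the record class each summary row links to.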

34 Find Genes By Contig ID.
<![CDATA[
select g.source_id
from dots.genefeature g,
     dots.naentry nae,
     dots.sequencetype st,
     dots.externalNAsequence enas
where nae.na_sequence_id = g.na_sequence_id
  and enas.sequence_type_id = st.sequence_type_id
  and enas.na_sequence_id = nae.na_sequence_id
  and st.name = 'contig'
  and nae.source_id = '$$contig$$'
ORDER BY g.source_id
]]>
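To see how the four-way join behaves, the query can be run against a toy database. The Python sketch below uses the standard `sqlite3` module with hypothetical, pared-down versions of the four DoTS tables (only the columns the query touches) and made-up rows. One deliberate swap: the `$$contig$$` macro is replaced by a bound `?` parameter, which is how this query would normally be parameterized outside the WDK.

```python
import sqlite3

# Hypothetical, pared-down versions of the four DoTS tables the query joins.
SCHEMA = """
ATTACH DATABASE ':memory:' AS dots;
CREATE TABLE dots.sequencetype (sequence_type_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dots.externalNAsequence (na_sequence_id INTEGER PRIMARY KEY, sequence_type_id INTEGER);
CREATE TABLE dots.naentry (na_sequence_id INTEGER PRIMARY KEY, source_id TEXT);
CREATE TABLE dots.genefeature (source_id TEXT, na_sequence_id INTEGER);
"""

# The slide's query with the $$contig$$ macro replaced by a bound parameter.
GENES_BY_CONTIG = """
select g.source_id
from dots.genefeature g, dots.naentry nae, dots.sequencetype st, dots.externalNAsequence enas
where nae.na_sequence_id = g.na_sequence_id
  and enas.sequence_type_id = st.sequence_type_id
  and enas.na_sequence_id = nae.na_sequence_id
  and st.name = 'contig'
  and nae.source_id = ?
ORDER BY g.source_id
"""


def demo_connection():
    """In-memory database with made-up rows: one contig and one chromosome."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("insert into dots.sequencetype values (?, ?)",
                     [(1, "contig"), (2, "chromosome")])
    conn.executemany("insert into dots.externalNAsequence values (?, ?)", [(10, 1), (11, 2)])
    conn.executemany("insert into dots.naentry values (?, ?)", [(10, "contig_A"), (11, "chr_1")])
    conn.executemany("insert into dots.genefeature values (?, ?)",
                     [("gene_2", 10), ("gene_1", 10), ("gene_3", 11)])
    return conn


def genes_by_contig(conn, contig):
    return [row[0] for row in conn.execute(GENES_BY_CONTIG, (contig,))]
```

With this data, `genes_by_contig(demo_connection(), "contig_A")` returns the two genes on that contig in sorted order, while `"chr_1"` returns nothing, because the `st.name = 'contig'` condition filters out sequences of any other type.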

35 WDK Model - Record
<recordClass idPrefix="" name="GeneRecordClass" type="Gene"
             attributeOrdering="source_id,exoncount,overview,product,linkout,dnaContext,genomeCompare,tmdata,blastpgraphic,translation,sequence,reference">
  <![CDATA[
    This $$organism$$ gene spans positions $$start_max$$ - $$end_min$$ of contig $$contig$$
    which maps to chromosome $$chromosome$$
  ]]>
</recordClass>
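The record text is a template in which each `$$name$$` is filled with that gene's attribute value. The Python sketch below is a hypothetical re-implementation of that substitution for illustration, not the WDK's own code; it raises an error on a missing attribute instead of leaving the macro in the page.

```python
import re

# Hypothetical $$name$$ macro filling; not the WDK's implementation.
MACRO = re.compile(r"\$\$(\w+)\$\$")


def fill_template(template: str, values: dict) -> str:
    """Replace each $$name$$ with values[name]; fail loudly on a missing attribute."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for attribute {name!r}")
        return str(values[name])

    return MACRO.sub(substitute, template)
```

Passing a function to `re.sub` returns each replacement verbatim, so attribute values containing backslashes or `$` are not reinterpreted.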

36 Testing the Model
Command line tools:
- wdkXml: check XML syntax
- wdkSummary: test a summary
- wdkQuery: run a specific query
- wdkRecord: test a record
- wdkSanityTest: exercises all queries and records
- wdkCache

37 Install WDK into Tomcat
- follow the installation instructions carefully
- relies on symbolic links from the Tomcat webapp to $GUS_HOME
  - disallowed by the default Tomcat configuration
- keep an eye on the Tomcat logs for troubleshooting
- reload the webapp when the model changes
  - retest on the command line
  - don't forget about the cache

38 WDK Default View

39 CryptoDB Custom View
- made style changes, added site branding
- added additional form elements
  - radio buttons, check boxes
- 'flattened out' the questions

40 CryptoDB Custom View
- Record pages: alterations to achieve the desired ordering and placement of text, tables and graphics
- Standard JSP tags to embed external objects
  - GBrowse graphic

41 Integrate Tomcat with Apache
- Apache front end answers all web requests
- serves the static pages and CGI tools
  - BLAST interface
  - motif search
  - BLASTX keyword search
- calls to the WDK are passed to Tomcat
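The division of labor can be summarized as a routing decision made per request. The Python sketch below models that decision; the path prefixes are hypothetical, since the slides do not give CryptoDB's real URL layout.

```python
from urllib.parse import urlsplit

# Hypothetical path prefixes; CryptoDB's real URL layout is not given here.
WDK_PREFIX = "/cryptodb/"
CGI_PREFIX = "/cgi-bin/"


def route(url: str) -> str:
    """Decide which component handles a request that arrives at Apache."""
    path = urlsplit(url).path
    if path.startswith(WDK_PREFIX):
        return "tomcat"         # WDK pages are passed through to Tomcat
    if path.startswith(CGI_PREFIX):
        return "apache-cgi"     # BLAST interface, motif search, etc.
    return "apache-static"      # everything else is served as a static page
```

In a real deployment this decision lives in Apache's configuration (typically a connector such as mod_jk or mod_proxy_ajp), not in application code; the sketch only makes the rule explicit.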

43 [Architecture diagram as on slide 9, now with a Pipeline component added]
