GUS 3.0: Web Sites and Tools
June 20, 2002
Jonathan Crabtree
Outline
- Current web interfaces
  - examples: allgenes.org, PlasmoDB.org
  - Java Servlet, CGI-based
  - reusable Java and Perl code, install scripts
- The future?
  - PHP and JSP
  - "GUSWWW" schema redesign
GUS - Multiple Views & Projects
[Diagram labels: AllGenes.org, PlasmoDB.org, EPConDB, Other sites, Other projects; Java Servlets + Perl CGI; Perl Object Layer for Data Loading; Oracle RDBMS; Core, SRes, TESS, RAD, DoTS]
allgenes.org query: "Is my cDNA similar to any mouse genes that are predicted to encode transcription factors and have been localized to mouse chromosome 5?"
Select the allgenes.org boolean query page.
Click on the "AND" button.
Choose the RH map and GO function queries.
Select mouse chromosome 5 and "transcription factor".
There are 22 mouse RNAs (assemblies) that meet these criteria.
This query result set now appears on the query "history" page.
Now use the BLAST page to identify RNAs similar to my cDNA.
The results of the BLAST search appear in the query history.
Intersect ("AND") the BLAST search with the previous query.
And we have our answer (the third row on the query history page).
- Predicted GO function(s) (some manually reviewed)
- predicted protein
- CAP4 assembly
- EST expression profile
- UCSC BLAT
- Other transcripts from the same gene
- External links
- Mapping information
- Protein/motif hits
- Gene trap insertions, etc.
PlasmoDB: Combining Expression and Sequence Data
"List all genes whose proteins are predicted to contain a signal peptide and for which there is evidence that they are expressed in Plasmodium falciparum's late schizont stage."
Web Interface Components
GUS/www/allgenes/htdocs/
    GUS/www/allgenes/htdocs/index.html.in
    ...
GUS/www/allgenes/cgi-bin/
    GUS/www/allgenes/cgi-bin/rnaProtSimPng.pl.in
    ...
GUS/java/cbil/gus/servlet/
    GUS/java/cbil/gus/servlet/SiteServlet.java
    ...
GUS/www/install/
    GUS/www/install/allgenes-config.in
    GUS/www/install/installServlet.pl
GUS/perl/servlet/allgenes/
    GUS/perl/servlet/allgenes/rnaProtSim.pl.in
    ...
rnaProtSimPng.pl.in

#
# rnaProtSimPng.pl
#
# $Revision: 1.3 $ $Date: 2001/03/22 14:44:57 $ $Author: crabtree $
#
use strict;
require 'cgi_lib.perl';
require ...;

# Input using cgi_lib.perl
#
my %rq = &get_request();
my $naSeqId = $rq{'id'} || ...;
$naSeqId =~ s/[^\d]//g;
my $maxHits = $rq{'max_hits'};
$maxHits =~ s/[^\d]//g;

# Generate image using rnaSimilarityPng.pm
#
$| = 1;
my $mapName = "$naSeqId-prot";
my $imgData = &getImage($mapName, $naSeqId, 'ExternalAASequence');
print "Content-type: image/png\n\n$imgData";
cbil.gus.servlet.SiteServlet
- extends javax.servlet.http.HttpServlet and is the only actual servlet in our Java code
- reads a configuration file and instantiates the set of JavaBeans defined therein:
  - instances of PageGeneratorI - content generators
  - SqlQuery - parameterized SQL queries
  - "Param" and "Formatter" classes
- implements logging, dispatches requests
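The "read a configuration file and instantiate the beans defined therein" step can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the SiteServlet code: `ConfigBeans` and `DemoPool` are hypothetical names, the `name.class=...` / `name.Prop=value` layout is modeled on the allgenes-config.in excerpt, and the idea that every property maps to a `setProp(String)` method is an assumption. The example also uses fully qualified class names, whereas the real file shows short names such as `SqlQuery`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

public class ConfigBeans {

    /** Hypothetical stand-in bean; not a real GUS class. */
    public static class DemoPool {
        private String numConnections;
        public void setNumConnections(String value) { numConnections = value; }
        public String getNumConnections() { return numConnections; }
    }

    /**
     * Creates one object per "name.class" key, then applies every other
     * "name.Prop=value" entry by calling setProp(String) on that object.
     */
    public static Map<String, Object> load(String configText) {
        try {
            Properties props = new Properties();
            props.load(new StringReader(configText));
            Map<String, Object> beans = new TreeMap<>();
            // First pass: instantiate each named bean via its no-arg constructor.
            for (String key : props.stringPropertyNames()) {
                if (key.endsWith(".class")) {
                    String name = key.substring(0, key.length() - ".class".length());
                    Class<?> type = Class.forName(props.getProperty(key));
                    beans.put(name, type.getDeclaredConstructor().newInstance());
                }
            }
            // Second pass: push the remaining properties into the beans.
            for (String key : props.stringPropertyNames()) {
                int dot = key.indexOf('.');
                if (dot < 0 || key.endsWith(".class")) {
                    continue;
                }
                Object bean = beans.get(key.substring(0, dot));
                if (bean == null) {
                    continue; // property for a name with no ".class" entry
                }
                Method setter = bean.getClass()
                        .getMethod("set" + key.substring(dot + 1), String.class);
                setter.invoke(bean, props.getProperty(key));
            }
            return beans;
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException("Bad configuration", e);
        }
    }

    public static void main(String[] args) {
        String config = "demoLogin.class=" + DemoPool.class.getName() + "\n"
                + "demoLogin.NumConnections=6\n";
        DemoPool pool = (DemoPool) load(config).get("demoLogin");
        System.out.println(pool.getNumConnections());
    }
}
```

Instantiating everything in one pass at load time matches the "all objects instantiated when config. file read" behavior noted later in the talk.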
allgenes-config.in

# Oracle-specific routines
#
gusOraSql.class=cbil.gus.servlet.db.oracle.SQL

# Set of logins to GUS or GUSdev
#
gusLogin.class=cbil.gus.servlet.db.ConnectionPool
gusLogin.NumConnections=6
gusLogin.MaxQueryTime=120
gusLogin.CheckInterval=30
gusLogin.JDBCDrivers=oracle.jdbc.driver.OracleDriver
gusLogin.Sql=gusOraSql
gusLogin.PrintStatusMessages=true
...
# Retrieve an RNA's sequence from the DB
#
rnaSeqQ.class=SqlQuery
rnaSeqQ.DisplayName=RNA sequence
rnaSeqQ.Name=rnaSeqQ
rnaSeqQ.Abbrev=rnaSeq
rnaSeqQ.SQL=select nas.sequence \
  from dots.NASequenceImp nas, dots.ProjectLink pl \
  where nas.na_sequence_id = $$0$$ \
  and nas.na_sequence_id = pl.id \
  and pl.project_id = 813 \
  and pl.table_id in (56, 89)
rnaSeqQ.HtmlBrief=RNA sequence for RNA DT.
rnaSeqQ.Params=rnaID
rnaSeqQ.ResultFormatter=rnaSeqF
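The `$$0$$` placeholder in the SQL above suggests positional text substitution of parameter values. A minimal sketch of that idea follows; `SqlTemplate` and `fill` are hypothetical names, and how the actual SqlQuery class substitutes or escapes values is not shown in these slides.

```java
import java.util.Arrays;
import java.util.List;

public class SqlTemplate {

    /** Replaces each $$i$$ placeholder with the i-th parameter value (literal text replacement). */
    public static String fill(String template, List<String> params) {
        String sql = template;
        for (int i = 0; i < params.size(); i++) {
            // String.replace does plain text matching, so '$' needs no escaping.
            sql = sql.replace("$$" + i + "$$", params.get(i));
        }
        return sql;
    }

    public static void main(String[] args) {
        String template = "select nas.sequence from dots.NASequenceImp nas "
                + "where nas.na_sequence_id = $$0$$";
        // 12345 is an arbitrary example id, not a real sequence id.
        System.out.println(fill(template, Arrays.asList("12345")));
    }
}
```

Plain text substitution is only safe when every value has been validated first (for example, restricted to digits or to an enumerated list); with JDBC, `java.sql.PreparedStatement` bind variables are the usual alternative.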
# RH map location (DOTS only)
#
rhLocnID.DisplayName=Chromosomal location based on RH mapping
rhLocnID.Name=rhmap_locn_id
rhLocnID.Abbrev=rhLocn
rhLocnID.SQL=select distinct epcr.na_sequence_id \
  from dots.EPCR epcr, dots.RHMapMarker rmm, dots.RHMarker rm, dots.ProjectLink pl \
  where rmm.chromosome = '$$0$$' and rmm.centirays >= $$1$$ and rmm.centirays <= $$2$$ \
  and rm.rh_marker_id = rmm.rh_marker_id \
  and rm.taxon_id $$3$$ \
  and epcr.map_table_id = 366 \
  and rmm.rh_map_marker_id = epcr.map_id \
  and epcr.na_sequence_id = pl.id \
  and pl.project_id = \
  and pl.table_id = 56
rhLocnID.HtmlBrief= RNAs radiation hybrid mapped to \
  chromosome between and cR
rhLocnID.HtmlLong=This query returns DoTS predicted transcripts that can be \
  linked to a specific chromosomal location by the radiation hybrid map data. A DoTS \
  predicted transcript consists of an...
rhLocnID.Params=humanOrMouseChromP,centirayStartP,centirayEndP,taxonIdP
rhLocnID.ResultFormatter=dotsIdListF1
humanOrMouseChromP.class=EnumParam
humanOrMouseChromP.Prompt=Select a chromosome:
humanOrMouseChromP.Description=Human or mouse chromosome
humanOrMouseChromP.Values=1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y
humanOrMouseChromP.Help=Please select a human or mouse chromosome from \
  the list provided; note that chromosomes 'Y', '20', '21', and '22' are \
  only valid for humans.

centirayStartP.class=DoubleParam
centirayStartP.Prompt=Start position in centirays:
centirayStartP.Description=Start position in centirays
centirayStartP.Min=0.0
centirayStartP.Max=10290
centirayStartP.Initial=0.0
centirayStartP.Help=Enter a "start" position in centirays. The centiray \
  is the unit of distance used in radiation hybrid mapping \
  assays and the form should indicate the range of values \
  that are valid for this particular parameter.
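The `Values`, `Min`, and `Max` settings above imply that each parameter type checks its own input. A rough sketch of such checks is given below; `ParamCheck`, `checkDouble`, and `checkEnum` are hypothetical names and do not reflect the actual EnumParam or DoubleParam classes, whose code is not shown here.

```java
import java.util.Arrays;
import java.util.List;

public class ParamCheck {

    /** Returns null when the value is acceptable, otherwise an error message. */
    public static String checkDouble(String raw, double min, double max) {
        if (raw == null) {
            return "A value is required.";
        }
        double value;
        try {
            value = Double.parseDouble(raw.trim());
        } catch (NumberFormatException e) {
            return "'" + raw + "' is not a number.";
        }
        // NaN parses successfully but fails every range comparison, so reject it explicitly.
        if (Double.isNaN(value) || value < min || value > max) {
            return "Value must be between " + min + " and " + max + ".";
        }
        return null;
    }

    /** Returns null when the value is one of the allowed choices, otherwise an error message. */
    public static String checkEnum(String raw, List<String> allowed) {
        if (raw == null || !allowed.contains(raw.trim())) {
            return "Please choose one of: " + String.join(",", allowed);
        }
        return null;
    }

    public static void main(String[] args) {
        // Limits taken from the centirayStartP settings above.
        System.out.println(checkDouble("20000", 0.0, 10290.0));
        // Abbreviated chromosome list, for illustration only.
        System.out.println(checkEnum("5", Arrays.asList("1", "5", "X", "Y")));
    }
}
```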
"GUSwww" Cache Tables SQL> describe queries; Name Null? Type QUERY_ID NOT NULL NUMBER(12) SERVLET_NAME NOT NULL VARCHAR2(30) QUERY_NAME NOT NULL VARCHAR2(100) PARAM0 VARCHAR2(100) PARAM1 VARCHAR2(100). PARAM74 VARCHAR2(100) PARAM75 VARCHAR2(100) RESULT_TABLE NOT NULL VARCHAR2(30) START_TIME NOT NULL DATE END_TIME DATE SQL> describe cache435; Name Null? Type SPOT_FAMILY_RESULT_ID NOT NULL NUMBER(10) I NUMBER SQL> describe cache30687; Name Null? Type NA_SEQUENCE_ID NUMBER(12) I NUMBER(12)
installServlet.pl

install]$ ./installServlet.pl --port=9000 \
    --cgiDir=/world/ \
    --htdocsDir=/world/ \
    --cgiURL= \
    --htdocsURL= \
    --installDir=/world/ \
    --servletName=allgenes-zeus \
    --servletFilePrefix=allgenes \
    --servletConfig=allgenes-zeus \
    --production \
    --servletURL=

- install htdocs and cgi-bin files
  - perform substitutions defined by 'allgenes-zeus' (e.g. ORA_LOGIN, ORA_PASSWORD, PROJECT_ID)
- compile Java code, create .jar file and install
- install servlet configuration file
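The substitution step applied to the `.in` template files can be illustrated with a small sketch. The `@NAME@` token syntax used here is purely an assumption for illustration; the slides do not show how installServlet.pl marks tokens such as PROJECT_ID, and the real script is written in Perl. `TemplateInstall` and `substitute` are hypothetical names.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemplateInstall {
    // Assumed token syntax: @NAME@ (not confirmed by the source).
    private static final Pattern TOKEN = Pattern.compile("@([A-Z_]+)@");

    /** Replaces every known @NAME@ token; unknown tokens are left untouched. */
    public static String substitute(String template, Map<String, String> values) {
        Matcher m = TOKEN.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement keeps '$' and '\' in the value from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> site = Collections.singletonMap("PROJECT_ID", "813");
        System.out.println(substitute("and pl.project_id = @PROJECT_ID@", site));
    }
}
```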
Features of Current [Servlet] Implementation
- Automatic generation of HTML FORMs
  - Automated input checking
  - Integrated help features
  - INPUT elements populated from the database
- Query history facility
- Boolean queries (AND, OR, SUBTRACT)
- Declarative configuration file
- Base system is relatively independent of GUS
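The three Boolean operations amount to set operations on the ID lists returned by two earlier queries, as in the allgenes.org walkthrough where a BLAST result is intersected with a mapping/GO query. A minimal in-memory sketch follows; `BooleanQuery` is a hypothetical name and the IDs are made up. The servlet presumably combines results stored in the database rather than in Java collections, which these slides do not show.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class BooleanQuery {

    /** IDs present in both result sets. */
    public static Set<Long> and(Set<Long> a, Set<Long> b) {
        Set<Long> out = new TreeSet<>(a);
        out.retainAll(b);
        return out;
    }

    /** IDs present in either result set. */
    public static Set<Long> or(Set<Long> a, Set<Long> b) {
        Set<Long> out = new TreeSet<>(a);
        out.addAll(b);
        return out;
    }

    /** IDs in the first result set that are not in the second. */
    public static Set<Long> subtract(Set<Long> a, Set<Long> b) {
        Set<Long> out = new TreeSet<>(a);
        out.removeAll(b);
        return out;
    }

    public static void main(String[] args) {
        Set<Long> mappedAndGo = new TreeSet<>(Arrays.asList(101L, 102L, 103L));
        Set<Long> blastHits = new TreeSet<>(Arrays.asList(102L, 103L, 104L));
        System.out.println(and(mappedAndGo, blastHits)); // prints [102, 103]
    }
}
```

Each operation copies its first argument, so the stored results of earlier queries stay available in the history for further combinations.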
Limitations of Current Implementation
- Relatively steep learning curve
- Monolithic solution
  - No support for modifying configuration at runtime
  - All objects instantiated when config. file read
- Limited ability to customize presentation layer (i.e., HTML) without programming in Java
- Technical problems with Servlets/Tomcat
  - Must restart all servlets as a group
  - Not currently using Serializable sessions
Dynamic Web Content
- HTML fragments embedded in a program:
  - CGI programs (e.g. Perl - interpreted)
  - Java Servlets (compiled)
- Program fragments embedded in HTML:
  - PHP (interpreted)
  - JSP (compiled; once, as needed)
- Another axis: persistent vs. not (CGI/FastCGI)
Program Fragments in HTML
- Advantages:
  - faster development cycle; can edit in place
  - easier to see/validate structure of HTML pages
  - HTML has no functions, Java and PHP do
- Disadvantages:
  - must take care to manage complexity of application
- Recommendations:
  - move towards adopting this approach
  - move all persistent state into the database
PHP: PHP Hypertext Preprocessor
- Scripting language; can be embedded in HTML
- (Netcraft survey)
JSP - Java Server Pages
- Based on and can interact with Java Servlets
- Essentially Java embedded in HTML
- XML-based tags, scriptlets, and JavaBean calls
- Standard tag libraries available
- Pages typically compiled on demand
- Multiple implementations? (vs. single for PHP)
Next steps
- Agree on desired user interface functionality
  - saving queries for PlasmoDB
  - persistent preferences for genome browser
- Design parts of the schema to support it
- Migrate old code/write new code
- Easier to migrate existing code with JSP