BUSINESS SENSITIVE

SAAW - Sequence Annotation and Analysis Workshop

Boyu Yang and Gene Godbold
Battelle Memorial Institute, Charlottesville Operations
1001 Research Park Blvd, Charlottesville, VA

Abstract

The Sequence Annotation and Analysis Workshop (SAAW) is a semi-automated system that supports sequence annotation and analysis. It consists of a Java client application, a set of web services and a relational database. The Java client lets a curator examine a sequence manually at the single-residue level; annotated features are graphically aligned and displayed. Several sequence analysis tools are integrated into the system as web services that support semi-automated annotation: BLAST, PHI-BLAST, RPS-BLAST, MSA (Clustal-W), InterPro, InterProMatch, IDMap, UniProt, Pfam/CDD, PDB, Prints and GO. Jmol is integrated as a visualization plug-in for displaying 3D structures. A number of basic functions are built directly into the client: drawing dot plots and hydropathy plots, finding binding sites for various proteins, finding stem-loops, finding transcription terminators, finding ORFs, finding restriction enzyme cutting sites, reporting residue count statistics, finding subsequences by pattern and translating nucleotide sequences into protein sequences. The user interface is customizable and pluggable. A rich set of visualization plug-ins has been developed for (1) displaying BLAST results, (2) aligning multiple sequences, (3) displaying 3D structures and (4) showing domain/family information. An automatic annotation pipeline assigns features from UniProt, InterPro/InterProMatch, Pfam/CDD and Prints, so that curators do not have to annotate features from these data sources manually. The web services tier provides access to the relational database and performs computation-intensive jobs.
All sequences and annotation data are stored in a MySQL database.

Front End User Interface

The front-end user interface is a Java application with three major views: the editor view, the graph view and the table view (figures 1, 2 and 3). The editor view is for displaying, editing, searching and annotating a sequence. The graph view shows a sequence and its annotations graphically. The table view manages all sequences held in memory.

Figure 1 Editor View
Figure 2 Graph View

Embedded Basic Functions

Basic functions embedded directly in the client include finding protein binding sites on a nucleotide sequence, finding stem-loops in RNA, finding transcription terminators, identifying restriction enzyme cutting sites in a nucleotide sequence, finding subsequences by pattern and translating nucleotide sequences into proteins. Figure 4 demonstrates finding a subsequence by pattern. Figure 5 shows all embedded functions.

Figure 4 Finding a sequence pattern
Figure 5 Embedded basic functions

Creating Features

Features can be created from either the editor view or the graph view. A feature is defined by its type (e.g., a domain), its start and end positions, a note that describes the feature and a note source that gives the source reference. A feature does not have to be a continuous region; it may consist of several pieces spread across the sequence (figure 6).

Integrated Web Services

Applications that involve intensive computation or access to large amounts of data, including BLAST, InterPro and Pfam, are implemented as web services and are listed in the Tools menu (figure 7). All web services are accessed through a universal user interface; figure 8 shows this interface for the BLAST web service. The interface is constructed at runtime from the WSDL of the web service.
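To illustrate how an interface can be derived from a WSDL at runtime, the following minimal Python sketch reads a WSDL 1.1 document and lists, for each operation, the input fields a generic form would need. It is not the SAAW implementation (the SAAW client is a Java application); the WSDL fragment, the service name `BlastService`, the operation `runBlast` and its parameters are hypothetical and exist only for this example. Real WSDL files often describe their inputs through XSD element references, which would need further resolution.

```python
import xml.etree.ElementTree as ET

# Namespace of WSDL 1.1 documents.
WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

# Hypothetical, heavily simplified WSDL fragment (assumption, not from SAAW).
SAMPLE_WSDL = """<definitions name="BlastService"
    xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:tns="urn:example:blast"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    targetNamespace="urn:example:blast">
  <message name="runBlastRequest">
    <part name="sequence" type="xsd:string"/>
    <part name="database" type="xsd:string"/>
    <part name="eValue" type="xsd:double"/>
  </message>
  <message name="runBlastResponse">
    <part name="report" type="xsd:string"/>
  </message>
  <portType name="BlastPortType">
    <operation name="runBlast">
      <input message="tns:runBlastRequest"/>
      <output message="tns:runBlastResponse"/>
    </operation>
  </portType>
</definitions>
"""


def describe_form_fields(wsdl_text):
    """Return {operation name: [(field name, declared type), ...]}."""
    root = ET.fromstring(wsdl_text)

    # Index the parts of every message by message name.
    messages = {}
    for msg in root.findall("wsdl:message", WSDL_NS):
        messages[msg.get("name")] = [
            (part.get("name"), part.get("type") or part.get("element"))
            for part in msg.findall("wsdl:part", WSDL_NS)
        ]

    # For each operation, look up the parts of its input message.
    forms = {}
    for op in root.findall("wsdl:portType/wsdl:operation", WSDL_NS):
        inp = op.find("wsdl:input", WSDL_NS)
        if inp is None:
            continue
        msg_name = inp.get("message", "").split(":")[-1]  # drop the prefix
        forms[op.get("name")] = messages.get(msg_name, [])
    return forms


forms = describe_form_fields(SAMPLE_WSDL)
for operation, fields in forms.items():
    print(operation)
    for name, declared_type in fields:
        print(f"  input field: {name} ({declared_type})")
```

A client built this way needs no service-specific UI code: adding a new web service only requires pointing the client at its WSDL, and one input widget can be generated per listed field.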
Figure 3 Table View
Figure 6 Window for creating a feature
Figure 7 Menu for all integrated web services
Figure 8 Universal web service interface

Discussion

Biological data curators at Battelle have used the Sequence Annotation and Analysis Workshop to annotate hundreds of sequences manually and thousands of sequences automatically. The system is easy to learn. The graph view assists in viewing all annotated features. The integrated tools help to verify that entered features are valid, either by cross-checking them against BLAST results or by aligning the sequence with other annotated sequences. The automatic annotation pipeline, which integrates InterPro, UniProt, Pfam, Prints and other sources, lets curators focus on features that are not available from public resources, making annotation a much easier job. The 3D structure display provides another way to examine the annotated features. SAAW is designed as a three-tier system: the Java client can be extended by adding new visual plug-ins, the web services tier loosely couples the backend database and the client, and the universal web services interface makes system integration much easier.
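The feature definition given under Creating Features (a type, start and end positions, a note, a note source and possibly several non-contiguous pieces) can be sketched as a small data structure. This is an illustrative Python sketch, not the SAAW data model: the class name `Feature`, its field names, the 1-based inclusive coordinates and the example sequence are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Feature:
    """One annotated feature; coordinates are assumed 1-based and inclusive."""
    feature_type: str                       # e.g. "domain"
    segments: Tuple[Tuple[int, int], ...]   # one (start, end) pair per piece
    note: str = ""                          # free-text description
    note_source: str = ""                   # reference for the note

    def __post_init__(self):
        if not self.segments:
            raise ValueError("a feature needs at least one segment")
        for start, end in self.segments:
            if start < 1 or end < start:
                raise ValueError(f"invalid segment ({start}, {end})")

    @property
    def start(self) -> int:
        """Leftmost position covered by the feature."""
        return min(s for s, _ in self.segments)

    @property
    def end(self) -> int:
        """Rightmost position covered by the feature."""
        return max(e for _, e in self.segments)

    def is_contiguous(self) -> bool:
        return len(self.segments) == 1

    def extract(self, sequence: str) -> List[str]:
        """Return the residues of each segment from the given sequence."""
        if self.end > len(sequence):
            raise ValueError("feature extends past the end of the sequence")
        return [sequence[s - 1:e] for s, e in self.segments]


# Hypothetical example: a two-piece feature on a made-up protein sequence.
sequence = "MKTAYIAKQRQISFVKSHFSRQ"
feature = Feature(
    feature_type="binding site",
    segments=((3, 6), (15, 18)),
    note="illustrative discontinuous feature",
    note_source="manual curation (hypothetical)",
)
print(feature.start, feature.end, feature.is_contiguous())
print(feature.extract(sequence))
```

Storing a list of segments rather than a single start/end pair is one simple way to represent a feature made of several pieces; the overall start and end can still be derived for display.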