Tour of BioBIKE: Motif Discovery
Click anywhere to go on to the next slide. This demonstration is best viewed as a slide show, which lets you simulate a session and makes changes in cursor position more obvious. To do this, click Slide Show on the top tool bar, then View show.
BioBIKE (Biological Integrated Knowledge Environment) combines:
- Knowledge: All known genomes of interest to a specific scientific community.
- Analytical Tools: A powerful graphical language that permits creative expression to those with no programming experience.
Various BioBIKEs are available through:
Tour of BioBIKE: Motif Discovery
In this tour, you'll see how to:
- Log onto CyanoBIKE
- Find a gene from a short description of it
- Speak BioBIKE (the language of CyanoBIKE)
- Find orthologs of a gene
- Obtain upstream sequences of a gene or list of genes
- Search a set of sequences for common motifs
You can go to any slide in this tour at any time by typing the slide number and pressing Enter. Or go to the next slide by clicking the mouse.
Coming Attractions!
If you like this tour, you might also try:
Sequence Analysis
- Display a sequence
- Find similar sequences amongst metagenomes, known viruses, everything in GenBank
- Make a sequence alignment from a set of similar sequences
- Construct a phylogenetic tree
Analysis of Metagenome Aggregates
- Find the number of contigs in a metagenome
- Find the average contig size in a metagenome
- Find the average GC content within a metagenome
- Visualize the distribution of GC content amongst the contigs of a metagenome
To get to CyanoBIKE, click a link to one of the public sites. To see more tours like this one, click Guided tours of BioBIKE. Access this site at http://biobike.csbc.vcu.edu
- Enter anything you like as a login name, but no spaces or symbols.
- An email address is optional but may be useful if you want to send in questions or complaints.
- Click New Login.
Your name (no spaces)
The BioBIKE environment is divided into three areas as shown. You'll bring functions down from the function palette to the workspace, execute them, and note the results in the results window. The three areas: Function palette, Workspace, Results window.
Two very important buttons on the function palette: HELP! (general on-line help) and PROBLEM (Something went wrong? Tell us!).
Two very important buttons in the workspace: Undo (return to the workspace as it was before the last action) and Redo (get back the workspace you undid).
Our Story The glnA gene in the cyanobacterium Anabaena PCC 7120 encodes glutamine synthetase, a critical enzyme in nitrogen metabolism. The transcription of this gene is regulated by the availability of a nitrogen source. Suppose you want to understand the molecular mechanism by which the regulation takes place.
Our Story Your strategy is to presume that this highly conserved gene possesses the same upstream regulatory sequences in related organisms. You will collect orthologs of glnA in related organisms, collect their upstream sequences, and examine them for a conserved sequence motif. The first step is to get in hand one glnA gene, the one you already know about in Anabaena. Mouse over the GENES-PROTEINS button.
Mousing over a button in the function palette causes a menu to appear. You know the unofficial name of the gene, "glnA", and from that you want to get the official name of the gene described by "glnA". Mouse over DESCRIPTION-ANALYSIS.
Click on the function GENE-DESCRIBED-BY.
A GENE-DESCRIBED-BY function box is now in the workspace. Before continuing with the problem, let's consider what function boxes mean.
General Syntax of BioBIKE
Function-name   Argument (object)   Keyword object   Flag
The basic unit of BioBIKE is the function box. It consists of the name of a function, perhaps one or more required arguments, and optional keywords and flags. A function may be thought of as a black box: you feed it information, it produces a product.
General Syntax of BioBIKE
Function-name   Argument (object)   Keyword object   Flag
Function boxes contain the following elements:
- Function-name (e.g. SEQUENCE-OF or LENGTH-OF)
- Argument: Required, acted on by function
- Keyword clause: Optional, more information
- Flag: Optional, more (yes/no) information
General Syntax of BioBIKE
Function-name   Argument (object)   Keyword object   Flag
… and icons to help you work with functions:
- Option icon: Brings up a menu of keywords and flags
- Clear/Delete icon: Removes information you entered or removes the box entirely
- Action icon: Brings up a menu enabling you to execute a function, copy and paste information, get help, etc.
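If you think more easily in text than in boxes, the anatomy of a function box can be sketched in Python. This is only an illustration, not how BioBIKE is built: the class `FunctionBox` and its `render` method are made-up names, and the two example boxes borrow functions, the IN keyword, and the DNA flag from later in this tour.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionBox:
    """Toy stand-in for a BioBIKE function box (not BioBIKE's real internals)."""
    name: str                                      # function name, e.g. "SEQUENCE-OF"
    arguments: list = field(default_factory=list)  # required arguments
    keywords: dict = field(default_factory=dict)   # optional keyword -> object
    flags: set = field(default_factory=set)        # optional yes/no switches

    def render(self) -> str:
        """Show the box as text: name, then arguments, keywords, and flags."""
        parts = [self.name]
        parts += [str(argument) for argument in self.arguments]
        parts += [f"{keyword} {value}" for keyword, value in self.keywords.items()]
        parts += sorted(self.flags)
        return "(" + " ".join(parts) + ")"


# One required argument plus one keyword clause.
gene_box = FunctionBox("GENE-DESCRIBED-BY", ['"glnA"'], keywords={"IN": "A7120"})
# One required argument plus one flag.
motif_box = FunctionBox("MOTIFS-IN", ["my-sequences"], flags={"DNA"})

print(gene_box.render())
print(motif_box.render())
```

The point of the sketch is the division of labor: arguments are required, while keywords and flags are optional extras that you add only when you need them.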
Back to our story. Click on the Argument box to open it for entry…
…then type in the description you know, "glnA".
A very common error is to forget to close an entry box. A function can't be executed until all entry boxes are closed, by pressing either Enter or Tab. Do one or the other.
Left to its own devices, BioBIKE will search every organism it knows about for genes described by "glnA". You'll get a much faster response if you modify the function to search only Anabaena. Do this by mousing over the Option Icon…
… and clicking the IN option.
Then open the IN object box for entry by clicking on it.
You could type in the official name or nickname of the organism, but if you don't happen to know it, find it by mousing over the DATA button…
Anabaena PCC 7120 is a nitrogen-fixing cyanobacterium. Mouse over that choice.
Mouse over Anabaena PCC 7120,…
… and click on its official nickname, A7120.
That causes the name to appear in the selected box. The function is now ready for execution. Mouse over the Action Icon…
… and click Execute.
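For readers who want a textual picture of what was just executed, here is a rough Python sketch of a description search that can be narrowed to one organism. Everything in it is an assumption made for illustration: the `CATALOG` data, the function name `genes_described_by`, and every gene or organism name other than A7120 and alr2328 (the gene nickname used later in this tour) are invented, and BioBIKE's actual search may work quite differently.

```python
# Hypothetical mini-catalog: organism nickname -> {gene ID: description}.
# Only A7120 and alr2328 come from the tour; everything else is invented.
CATALOG = {
    "A7120": {
        "alr2328": "glutamine synthetase glnA",
        "gene-x1": "hypothetical protein",
    },
    "OrgB": {
        "gene-b7": "glnA homolog",
        "gene-b8": "unknown function",
    },
}


def genes_described_by(text, organisms=None):
    """Return gene IDs whose description mentions `text`.

    With organisms=None every organism is scanned; passing a list of
    nicknames narrows the scan, which is the role the IN option plays.
    """
    names = CATALOG.keys() if organisms is None else organisms
    needle = text.lower()
    return [
        gene
        for name in names
        for gene, description in CATALOG[name].items()
        if needle in description.lower()
    ]


print(genes_described_by("glnA"))             # scans every organism
print(genes_described_by("glnA", ["A7120"]))  # scans Anabaena PCC 7120 only
```

Narrowing the search helps for the same reason in the sketch as in BioBIKE: fewer organisms to scan means less work before the answer comes back.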
A result now appears in the Result Window. With the name of the gene in hand, you want to find all orthologs of it in cyanobacteria, to extract their upstream sequences. Mouse over the GENES-PROTEINS button…
… and click ORTHOLOG-OF.
Open the argument box of the function for entry by clicking on it…
And type in the nickname of Anabaena's glnA gene, alr2328.
Close the entry box by pressing Enter or Tab…
… and execute the function.
Lots of orthologs! It would be helpful to be able to refer to them as a group. To define such a group, mouse over the DEFINITION button…
… and click the DEFINE function.
The DEFINE function asks for two things: the name of the variable to be defined and the value it is to be given. The value will be all those orthologs. The name is up to you. Click on the variable argument box to open it up for entry…
… and type a name that makes sense to you, closing the box afterwards by pressing Tab.
Tab closes the entry box and automatically opens the next one (if it exists). There are many ways of getting that list of orthologs. You could copy and paste that list from the Result pane to the open value box, but it might be clearer to cut and paste the function that produced it. Let me show you. Click on the Action icon of ORTHOLOG-OF.
Click Cut. The function box will disappear but will be retained in the BioBIKE clipboard.
… then mouse over the Action Icon of the value argument box…
… and click Paste.
The definition is now complete (and reads well for future reference). But it will not take effect until the function is executed. Click the Action icon.
… and click Execute.
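The idea that a definition exists in the workspace but does nothing until it is executed can be sketched in Python. This is an analogy under stated assumptions, not BioBIKE's implementation: `variables`, `define`, and `ortholog_of` are stand-in names, the variable name `glnA-orthologs` is just one name you might have chosen, and the ortholog IDs are invented placeholders.

```python
variables = {}  # stand-in for what the VARIABLES menu will list


def ortholog_of(gene):
    """Hypothetical lookup; the ortholog IDs below are invented placeholders."""
    table = {"alr2328": ["ortholog-1", "ortholog-2", "ortholog-3"]}
    return table.get(gene, [])


def define(name, expression):
    """Build a definition; nothing is stored until the returned function runs."""
    def execute():
        variables[name] = expression()
        return variables[name]
    return execute


# Building the definition is like assembling the DEFINE box in the workspace.
definition = define("glnA-orthologs", lambda: ortholog_of("alr2328"))
print("glnA-orthologs" in variables)  # False: assembled but not yet executed

# Running it is like clicking Execute.
definition()
print(variables["glnA-orthologs"])
```

Pasting the ORTHOLOG-OF function into the value box, rather than pasting its result, corresponds to passing the expression itself: the definition records how the list was obtained.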
Notice that a new VARIABLES button appears. We'll use it later to access the newly defined list. For now, we need to get upstream regions from all those genes. Mouse over the GENES-PROTEINS button…
… then mouse over GENES-NEIGHBORHOOD…
… and click the SEQUENCE-UPSTREAM-OF function.
The function seems to call for a gene as the argument. However, like most BioBIKE functions, this one has the following useful property:
- Give it a single item, it returns a single answer.
- Give it a list of items, it returns a list of answers.
Open the argument box for input.
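The single-item versus list behavior described here can be imitated in Python with a small wrapper. This is a sketch of the idea only: the decorator `lift` is a made-up helper, the body of `sequence_upstream_of` is a placeholder that returns a label instead of real DNA, and `gene-b7` is an invented gene name.

```python
def lift(single_item_function):
    """Wrap a function so that a list in gives a list out."""
    def wrapper(item_or_list):
        if isinstance(item_or_list, list):
            return [single_item_function(item) for item in item_or_list]
        return single_item_function(item_or_list)
    return wrapper


@lift
def sequence_upstream_of(gene):
    """Placeholder: a real version would return DNA upstream of the gene."""
    return f"<upstream of {gene}>"


print(sequence_upstream_of("alr2328"))               # one gene, one answer
print(sequence_upstream_of(["alr2328", "gene-b7"]))  # list of genes, list of answers
```

Because the wrapper handles the looping, the same function box works whether you hand it one gene or the whole group you defined.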
We want the function to act on the group of genes we just defined. Mouse over the VARIABLES button…
… and click the name of the group you just defined. That will bring the group into the selected box.
We could execute the completed function, and then take those upstream sequences and look within them for sequence motifs. Alternatively we could skip the intermediate step and have the sequences go directly into the motif finder. To do that, we surround the function with the motif finder. To surround, mouse over the Action Icon…
… and click Surround with.
The entire function is now selected. We need to specify that we want to surround the function with a function that searches for motifs within sequences. Mouse over the STRINGS-SEQUENCES button…
… mouse over BIOINFORMATIC-TOOLS…
… and click MOTIFS-IN. (By the way, if the categories aren't sufficiently intuitive, you can always find functions alphabetically, through the ALL button on the Function Palette)
The upstream sequences returned by SEQUENCE-UPSTREAM-OF will now be given to the MOTIFS-IN function. Executing that function will execute everything inside of it. You might think it's time to go over to the Action Icon of MOTIFS-IN and execute …
… but hold that mouse! MOTIFS-IN, unless told otherwise, looks for amino-acid motifs. Eventually we'll get around to teaching it how to distinguish DNA from protein sequences automatically, but for now, mouse over the Option Icon…
… and click the DNA option.
Now execute the function.
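Surrounding one function with another is what programmers call nesting: the inner function runs first and hands its result straight to the outer one. The Python sketch below compares the two-step route with the nested route. Both function bodies are placeholders invented for illustration (they do no real sequence retrieval or motif finding), the `dna` parameter merely mimics the DNA flag, and the ortholog IDs are made up.

```python
def sequence_upstream_of(genes):
    """Placeholder returning one made-up upstream sequence per gene."""
    return [f"ACGT-{gene}" for gene in genes]


def motifs_in(sequences, dna=False):
    """Placeholder motif finder; it only reports what it was asked to search."""
    alphabet = "DNA" if dna else "protein"  # protein is the default, as in the tour
    return f"searching {len(sequences)} {alphabet} sequences"


orthologs = ["ortholog-1", "ortholog-2", "ortholog-3"]  # invented IDs

# Two-step version: execute the inner function, keep its result, then pass it on.
upstream = sequence_upstream_of(orthologs)
two_step = motifs_in(upstream, dna=True)

# "Surround with" version: the inner call feeds the outer call directly.
nested = motifs_in(sequence_upstream_of(orthologs), dna=True)

print(nested)
```

Both routes give the same answer; nesting simply skips the intermediate result you would otherwise have to carry from one step to the next.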
Notice "Executing" in the message bar. MOTIFS-IN might take seconds to execute. Don't try to run any other function during that time. MOTIFS-IN formats the sequences in a way a motif-finding program (Meme) likes to see and supplies its results in a separate window.
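To give a feel for the formatting step, here is a Python sketch that writes sequences in FASTA format, a plain-text format that Meme accepts as input. Whether MOTIFS-IN produces exactly this layout is an assumption; the function name `to_fasta`, the sequence names, and the sequences themselves are invented for illustration.

```python
def to_fasta(named_sequences, width=60):
    """Format (name, sequence) pairs as FASTA text, wrapping long lines."""
    lines = []
    for name, sequence in named_sequences:
        lines.append(f">{name}")  # header line for this sequence
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


# Invented upstream sequences, wrapped at 8 letters to keep the output short.
example = [("upstream-1", "ACGTACGTAC"), ("upstream-2", "TTGACA")]
fasta_text = to_fasta(example, width=8)
print(fasta_text)
```

Each sequence gets a header line starting with ">" followed by the sequence itself, which is the kind of bookkeeping BioBIKE spares you.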
A new window opens, which you can save to your own computer if you like. For now just scroll down.
Meme has found a motif with a very good E-value. It provides a histogram, showing the information content of each position of the motif. The higher the bar, the more conserved the position. Scroll further.
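The bar heights can be understood with a small calculation. A common, uncorrected definition of information content for a DNA position is 2 bits minus the Shannon entropy of the letters seen at that position, so a perfectly conserved position scores 2 bits and a position where all four letters are equally frequent scores 0. The Python sketch below applies that definition to four invented motif occurrences; Meme's own numbers may differ, for example if it adjusts for background letter frequencies or small sample sizes.

```python
import math
from collections import Counter


def information_content(sites, alphabet_size=4):
    """Bits of information per column for equal-length aligned sites."""
    max_bits = math.log2(alphabet_size)  # 2 bits for the four DNA letters
    result = []
    for column in zip(*sites):           # walk through the alignment column by column
        counts = Counter(column)
        total = len(column)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        result.append(max_bits - entropy)
    return result


sites = ["GTAC", "GTTC", "GAGC", "GCCC"]  # invented motif occurrences
print([round(bits, 2) for bits in information_content(sites)])
```

In this made-up example the first and last positions are fully conserved and get the tallest bars, the second position is partly conserved, and the third carries no information at all.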
You get the sequence of the motif for each upstream sequence in which it was found. Scroll further.
Meme also found a second good motif. Scroll to the end of the file.
At the end you get a map of all the motifs found and where in the upstream sequences they appear. Evidently, Motifs 2 and 3, when present, generally precede Motif 1.
BioBIKE
You've seen a knowledge environment in which:
- Knowledge and tools are integrated. Data conversion is seldom necessary.
- The language is uniform, facilitating access to many popular tools through a common interface.
- The language is as flexible as any general-purpose language, permitting construction of new tools.
- The programming language is easy to pick up, using graphical conventions familiar to those who don't program.
- The environment is well suited for teaching the concepts of molecular biology through computational experiment.
Collaborators
Michael Chaplin, Johnny Casey (Sequoia Cons.), Sarah Cousins (now Wistar Institute), Michiko Kato (now UC Davis), Hailan Liu, JP Massar (Berkeley), James Mastros (now Philip Morris), Bogdan Mihai, John Myers (Sequoia Cons.), Nihar Sheth, Jeff Shrager (Carnegie Inst.), Arnaud Taton, Hien Truong, Andy Whittam (Washington & Jefferson), … and many participating students
Development of BioBIKE was funded by a grant from the National Science Foundation.
Contact: Jeff Elhai, Center for the Study of Biological Complexity, Virginia Commonwealth University