Meandre Workbench
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
The SEASR project and its Meandre infrastructure are sponsored by The Andrew W. Mellon Foundation
Outline
–Meandre Data Flows
–Overview of Meandre Workbench
–Overview of Repositories
–Constructing Flows
Meandre: Data Driven Execution
Execution Paradigms
–Conventional programs perform computational tasks by executing a sequence of instructions.
–Data driven execution applies transformation operations to a flow or stream of data as the data becomes available.
Dataflow Approach (each component)
–May have zero to many inputs
–May have zero to many outputs
–Performs a logical operation when data is available
Meandre: Dataflow Example
[Figure: a component with inputs Value1 and Value2 and output Sum]
Meandre: Dataflow Example
Dataflow Addition Example
–Logical operation ‘+’
–Requires two inputs (Value1, Value2)
–Produces one output (Sum)
When two inputs are available
–The logical operation can be performed
–Sum is output
When output is produced
–Reset internal values
–Wait for two new input values to become available
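The addition example can be sketched in a few lines of Python. This is only an illustration of the firing behavior described on the slide, not Meandre code: the class name AdditionComponent, the push method, and the port names "value1" and "value2" are assumptions made for this sketch.

```python
class AdditionComponent:
    """Hypothetical dataflow component: fires only when both inputs are present."""

    def __init__(self):
        self._values = {}  # port name -> pending value

    def push(self, port, value):
        """Deliver a value to an input port; return the sum if the component fires."""
        if port not in ("value1", "value2"):
            raise ValueError(f"unknown port: {port}")
        self._values[port] = value
        if len(self._values) < 2:
            return None  # still waiting for the other input
        total = self._values["value1"] + self._values["value2"]
        self._values.clear()  # reset internal values and wait for two new inputs
        return total


adder = AdditionComponent()
first = adder.push("value1", 2)    # only one input available, nothing fires
second = adder.push("value2", 3)   # both inputs available, the sum is produced
third = adder.push("value2", 10)   # state was reset, so the component waits again
print(first, second, third)
```

The point of the sketch is that the caller never asks for a sum; the arrival of data decides when the operation runs.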
Meandre: The Dataflow Component
Data dictates component execution semantics
[Figure: a component labeled P with Inputs and Outputs, annotated with "Descriptor in RDF of its behavior" and "The component implementation"]
Meandre: Component Metadata
Describes a component
Separates:
–Component semantics (black box)
–Component implementation
Provides a unified framework:
–Basic building blocks or units (components)
–Complex tasks (flows)
–Use of standardized ontologies
Meandre: Component Types
Components are the basic building blocks of any computational task. There are two kinds of Meandre components:
–Executable components
  Perform computational tasks that require no human interaction at runtime
  Are initialized during flow startup and fired in accordance with the defined policies
–Control components
  Used to pause the dataflow during user interaction cycles
  The WebUI may be an HTML form, an applet, or another user interface
Meandre: Flow Connectivity
Defined by connecting outputs from one component to the inputs of another.
–Cyclical connections are supported
–Components may have
  Zero to many inputs
  Zero to many outputs
  Properties that control runtime behavior
Described using RDF
–Enables storage, reuse, and sharing, as with components
–Allows discovery and dynamic execution
Meandre: Flow (Complex Tasks)
A flow is a collection of connected components
[Figure: dataflow execution of a flow built from the components Read, Merge, Do, Show, and Get]
A Little More on the Nuts & Bolts!
–Programming Paradigm
–What does the Meandre Execution Engine do?
–What are the possible component scenarios?
–Data Driven Flow Creation (Workbench/ZigZag)
Meandre Server Flow Execution
The Meandre Server prepares a data-intensive flow by reading the RDF component descriptors
–Executable components and the connections between them are prepared using a queue mechanism that stores data as it becomes available on the ports.
Meandre provides each component with an executing thread for processing.
Meandre manages the logical queues for component connections in a flow.
Meandre activates components for initialization, data events, and termination.
Meandre provides components with access to runtime resources.
[Figure: Context A and Context B connected by a Queue]
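The queue-and-thread arrangement described above can be sketched with Python's standard queue and threading modules. This is an illustration under stated assumptions, not the Meandre Server's implementation: run_component, run_flow, and the STOP sentinel are hypothetical names, and the two operations are placeholders.

```python
import queue
import threading

STOP = object()  # sentinel used by this sketch to signal termination


def run_component(operation, inbox, outbox):
    """Run one component on its own thread, handling data events until termination."""
    while True:
        item = inbox.get()       # block until data arrives on the input port
        if item is STOP:
            outbox.put(STOP)     # pass the termination signal downstream
            break
        outbox.put(operation(item))


def run_flow(items):
    """Connect two components with a queue and push the items through them."""
    source = queue.Queue()   # feeds component A
    a_to_b = queue.Queue()   # logical queue for the connection between A and B
    sink = queue.Queue()     # collects the output of component B
    threads = [
        threading.Thread(target=run_component, args=(str.upper, source, a_to_b)),
        threading.Thread(target=run_component, args=(lambda s: s + "!", a_to_b, sink)),
    ]
    for thread in threads:
        thread.start()
    for item in items:
        source.put(item)
    source.put(STOP)
    for thread in threads:
        thread.join()
    results = []
    while True:
        item = sink.get()
        if item is STOP:
            break
        results.append(item)
    return results


result = run_flow(["read", "merge", "show"])
print(result)
```

Each component only sees its own input and output queues, which is what lets the server, rather than the component, own the connections.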
Meandre Server Relationship to Component
Meandre Server Infrastructure
Component RDF Descriptor defines:
–Firing.Policy ALL or ANY
–Input & Output data ports that require a logical queue to be managed by the server
[Figure: the component pulls inputs from the Meandre Server and pushes outputs to the Meandre Server]
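One plausible reading of the two firing policies is that ALL waits for data on every input port while ANY fires as soon as one port has data. The sketch below encodes that reading; the names FiringPolicy and ready_to_fire and the port names are assumptions for illustration and are not taken from the Meandre API.

```python
from enum import Enum


class FiringPolicy(Enum):
    ALL = "all"  # assumed meaning: every input port must have data
    ANY = "any"  # assumed meaning: at least one input port must have data


def ready_to_fire(policy, pending):
    """Decide whether a component may fire.

    pending maps each input port name to the number of items waiting in its queue.
    The sketch assumes the component has at least one input port.
    """
    has_data = [count > 0 for count in pending.values()]
    if policy is FiringPolicy.ALL:
        return all(has_data)
    return any(has_data)


queues = {"value1": 1, "value2": 0}  # hypothetical port names
print(ready_to_fire(FiringPolicy.ALL, queues))
print(ready_to_fire(FiringPolicy.ANY, queues))
```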
Meandre Server Flows & Connectors
Flows can have any number of components with
–“None to Many” input data ports
–“None to Many” output data ports
Flow components may have multiple connectors assigned to any input data port
Flows may contain connectors that are cyclical over one or more components
Flows are made up of “One or More” components with “None to Many” connectors that are described to the Meandre Server for management
Flows must contain at least one component with NO inputs to cause an Execute call to be made.
*Outputs are always optional.
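The rule that a flow needs at least one component with no inputs can be checked mechanically. The sketch below assumes a flow is represented as a plain dictionary of components and their port names; that representation, the function names, and the port names are hypothetical and are not Meandre's RDF format.

```python
def source_components(components):
    """Return the names of components that declare no input ports."""
    return [name for name, ports in components.items() if not ports["inputs"]]


def can_execute(components):
    """A flow can start only if at least one component has no inputs."""
    return bool(source_components(components))


tag_cloud_flow = {
    "Get": {"inputs": [], "outputs": ["text"]},
    "Count": {"inputs": ["text"], "outputs": ["counts"]},
    "Show": {"inputs": ["counts"], "outputs": []},  # outputs are optional
}
starters = source_components(tag_cloud_flow)
print(starters, can_execute(tag_cloud_flow))
```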
Meandre: Programming Paradigm
The programming paradigm creates complex tasks by linking together specialized components. Meandre's publishing mechanism allows components developed by third parties to be assembled in a new flow.
There are two ways to develop flows:
–Meandre’s Workbench visual programming tool
–Meandre’s ZigZag scripting language
Workbench
–Web-based UI
–Components and flows are retrieved from the server
–Additional locations of components and flows can be added to the server
–Create flows using a graphical drag-and-drop interface
–Change property values
–Execute the flow
What is it?
–Visual programming environment
–Thankfully, no code-writing skills are required
–Provides a mechanism to create and execute flows
–Built on top of GWT (Google Web Toolkit) and accessible from all major browsers
Getting Started
Fire up your favorite browser and connect
If you installed the Workbench on your local machine, use to access it; otherwise replace “localhost” with the address of the computer where the Workbench is running.
Log in
The Workbench
The Workspace: used as the main staging area for building and editing flows
The Output Panel
The Details Panel
The Repository Panel
The Workspace
Components can be dragged into this region from the “Components” panel and interconnected to create flows.
The Flow Toolbar
Provides access to frequently used functions
o save flows
o remove components
o control flow execution
Saving a Flow
Required metadata:
- Name
- Base URL
Separate tags with commas
Removing Components
Two ways:
1. Select the component and click “Remove” on the toolbar
2. Right-click the component you want to remove and select “Remove”
Controlling Flow Execution
Run Flow
Executes the current flow loaded in the Workspace. Any output from the flow is displayed in the Output panel. If the flow contains interactive components, they are displayed automatically.
Important: Set your browser to allow pop-ups from the Workbench; otherwise the interactive web components will not display!
Stop Flow
Sends a request to the Meandre server to abort the currently executing flow. This may take a while, because the server waits for components to finish their current operation.
The Repository Panel
Three sections:
- Components
- Flows
- Locations
Searching is supported
Display is customizable
Components
Software units that are designed to accomplish a particular task
May have inputs, outputs, and properties
Components with properties can be identified by a symbol appearing in the lower left-hand side of the component icon.
Flows
A flow is essentially an application: a group of components connected together to perform a certain task.
Click on the Flows tab in the Repository panel to view the flows in your Workbench.
Double-click on a flow to load it into the Workspace.
Locations
Adding a repository location causes all the components and flows hosted at that location to be imported into the user’s private repository on the server.
You can find a list of available repository locations at
Removing a location also removes the associated components and flows from the server.
The Details Panel
Shows the properties and description of a selected component or flow
Properties
Description
–For components, the Description displays information about the component's function.
–For flows, the Description displays information about the flow, the components it contains, and their property values.
The Output Panel
Displays output and error messages generated by the Workbench
Using the Workspace
Placing Components
The first step in building a flow is to choose components from the Repository panel and place them into the Workspace. To place a component, click on the Components section in the Repository panel and drag the desired component into the Workspace area.
Note: A flow must have at least one component with no inputs before the Meandre server can execute it.
Selecting Components
Components can be selected by single-clicking on them in the Workspace. When a component is selected, other selected items are deselected. While selected, a component can be moved about the Workspace or deleted. A selected component (or flow) can be deselected by using CTRL+click on that component (or flow).
Using the Workspace
Labeling Components
Editing the component label only changes the name of the component in the given flow. The label must remain unique among the component labels in the flow. The label can be edited by single-clicking on it and entering the desired text. Pressing ESC while editing a label cancels the edit and restores the original label.
Connecting and Disconnecting Components
To make a connection, click on the output port of the desired source component (the port you clicked will be colored red), and then click on the input port to which you wish to connect. A line now connects the output and input ports. To cancel the operation after selecting a port, click the same port again to deselect it.
The ports of two components should only be connected if their data types are compatible. Errors resulting from data incompatibilities surface only at runtime.
To remove a connection, right-click one of the ports and select “Disconnect” from the context menu. Alternatively, you can disconnect groups of ports by right-clicking the component and selecting the appropriate menu option.
Using the Workspace
Connecting and Disconnecting Components
A component’s output port may only be connected to one input port. However, a component’s input port may be connected to several different output ports. This can be useful when you are receiving the same data format from multiple components.
The connection line is highlighted when the user hovers over an input or output port. This is useful for verifying connections in a complex flow. When hovering over a component port, the description of that port is also briefly displayed.
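The connection rule stated above (one input port per output port, any number of output ports per input port) can be expressed as a small check. The sketch assumes connections are stored as (source, output port, target, input port) tuples; that representation, the function name, and the component and port names are hypothetical.

```python
def find_overused_outputs(connections):
    """Return output ports that are connected to more than one input port.

    connections is a list of (source, out_port, target, in_port) tuples.
    """
    targets_by_output = {}
    for source, out_port, target, in_port in connections:
        targets_by_output.setdefault((source, out_port), []).append((target, in_port))
    return {port: targets for port, targets in targets_by_output.items() if len(targets) > 1}


valid = [
    ("ReadA", "text", "Merge", "text"),  # two outputs feeding one input is allowed
    ("ReadB", "text", "Merge", "text"),
]
invalid = valid + [("ReadA", "text", "Show", "text")]  # ReadA.text now has two targets
print(find_overused_outputs(valid))
print(find_overused_outputs(invalid))
```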
Demonstration
We will demonstrate the use of the Workbench for creating flows
–Use TagCloudViewer as an example and explain how it was created
Learning Exercises
Explore the functionality of the Meandre Workbench
Use existing components to create a data-driven flow for a basic Tag Cloud Viewer, to become familiar with the mechanics of drag and drop, creating connections, setting properties, saving, and executing
–1. Retrieve text from a URL
–2. Count the words
–3. Visualize with the Tag Cloud Viewer
Learning Exercises
Improve the Tag Cloud flow that you created to "clean" it up a bit
–Filter HTML tags from the text
–Convert all words to lower case
–Remove stop words
–Filter to a specific number of words
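For readers who want to see what the cleaning steps compute, here is a rough Python equivalent of the counting and filtering stages. It is a sketch of the processing only and does not use Meandre components: the function name, the regular expressions, and the tiny stop-word list are assumptions made for illustration.

```python
import re
from collections import Counter

# Illustrative stop-word list; a real flow would use a much longer one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}


def tag_cloud_counts(html, max_words):
    """Return the most frequent words in an HTML string after basic cleaning."""
    text = re.sub(r"<[^>]+>", " ", html)               # filter HTML tags
    words = re.findall(r"[a-z']+", text.lower())       # lower-case and split into words
    kept = [w for w in words if w not in STOP_WORDS]   # remove stop words
    return Counter(kept).most_common(max_words)        # keep a specific number of words


sample = "<p>The flow reads the <b>Text</b> and the flow counts text.</p>"
top = tag_cloud_counts(sample, 2)
print(top)
```

In the Workbench each of these lines would be a separate component, so the order of the steps is set by how the ports are connected rather than by the order of statements.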
Learning Exercises – Dunning's
Extend the Tag Cloud Viewer you just created by performing a Dunning log-likelihood analysis
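As background for this exercise, the sketch below computes a log-likelihood score for one word across two text collections. It uses the two-term simplification that is common in corpus comparison, where each observed count is compared with the count expected if the word were equally frequent in both collections. Whether the Meandre component uses exactly this form is an assumption, and the function and parameter names are hypothetical.

```python
import math


def log_likelihood(count_a, total_a, count_b, total_b):
    """Log-likelihood score for a word seen count_a times in a collection of
    total_a words and count_b times in a collection of total_b words."""
    combined_rate = (count_a + count_b) / (total_a + total_b)
    expected_a = total_a * combined_rate
    expected_b = total_b * combined_rate
    score = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


same_rate = log_likelihood(10, 1000, 10, 1000)
different_rate = log_likelihood(30, 1000, 10, 1000)
print(same_rate, different_rate)
```

A score near zero means the word is about equally frequent in both collections; larger scores mark words whose frequencies differ more, and those are the words worth highlighting in a comparative tag cloud.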
Discussion Questions
–What are the possible obstacles for humanities scholars in using an environment like the Meandre Workbench to assemble and create flows for accomplishing their research needs?
–Are there parts of the Workbench that are unclear or that need extra explanation?
–Do you have any feature requests?
–Are there any tools that you would like to see componentized so that you can work with them in the Meandre Workbench?
–What are three advantages of using a component-driven environment for text analytics?