Unclassified// Protected Information// Proprietary Information
Building Applications with the MeDICi Integration Framework
[Figure: Scientist/Analyst, Miners, Plumbers, Tool Builders]
Ian Gorton, Justin Almquist, Jack Chatterton, Adam Wynne
Outline
- What is the MeDICi Integration Framework (MIF)?
- What can the MIF do for you?
- How does it do it?
- What’s available right now
- and it better be fast …
- How do I get started?
What is the MeDICi Integration Framework (MIF)?
- Java-based integration technology
- Component-based API for creating analytical pipelines
- Asynchronous component model for Java or non-Java codes (e.g., .exe, C/C++, R, Haskell) (flexible)
- Components can be distributed or run in the MIF container (scalable)
- Components communicate over a variety of protocols (e.g., JMS, Web Services, sockets) (configurable)
- Non-pipeline architectures are supported (e.g., feedback loops, worker pools)
- Built on robust, industry-tested Java technologies:
  - Mule (ESB/SOA compliant)
  - JMS (e.g., JBoss, ActiveMQ, SonicMQ)
  - ehcache
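To make "asynchronous component model" concrete, here is a minimal sketch in plain Java. It is not the MIF API: each hypothetical stage runs on its own thread, takes messages from an inbound `BlockingQueue` and puts results on an outbound one, so a sender never waits for downstream work. In MIF the queues would be replaced by configurable endpoints (JMS, VM, sockets, etc.).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.UnaryOperator;

/** Illustrative only: a two-stage asynchronous pipeline on plain Java queues (not the MIF API). */
public class AsyncPipelineSketch {

    /** Sentinel telling a stage to shut down after draining its input. */
    static final String END = "__END__";

    /** A hypothetical component: takes from 'in', applies 'work', puts the result on 'out'. */
    static Thread stage(BlockingQueue<String> in, BlockingQueue<String> out,
                        UnaryOperator<String> work) {
        Thread t = new Thread(() -> {
            try {
                for (String msg = in.take(); !END.equals(msg); msg = in.take()) {
                    out.put(work.apply(msg));
                }
                out.put(END);                     // propagate shutdown downstream
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    /** Sends each input through "clean" then "format" stages and collects the results. */
    public static List<String> run(List<String> inputs) {
        BlockingQueue<String> toClean = new LinkedBlockingQueue<>();
        BlockingQueue<String> toFormat = new LinkedBlockingQueue<>();
        BlockingQueue<String> results = new LinkedBlockingQueue<>();
        Thread clean = stage(toClean, toFormat, String::trim);
        Thread format = stage(toFormat, results, s -> "RESULT: " + s);
        try {
            for (String s : inputs) {
                toClean.put(s);                   // unbounded queue: the sender never waits on downstream work
            }
            toClean.put(END);
            List<String> collected = new ArrayList<>();
            for (String r = results.take(); !END.equals(r); r = results.take()) {
                collected.add(r);
            }
            clean.join();
            format.join();
            return collected;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("  alpha ", "beta")));
    }
}
```

Because each stage has one thread and the queues are FIFO, message order is preserved; a worker pool (one of the non-pipeline architectures above) would trade that ordering for throughput.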
[Figure: Example analytical pipeline message flow, connecting Filter, Calc1, Proxy, Merge, Viz, DB Query, Format and Useful Code components with a Reference Database and a Results Database]
What can the MIF do for you?
- Provide a common API for designing components
- Make downstream integration straightforward
- Make iterative development and integration testing easy
- Make it easy to create applications:
  - using new/legacy components that were not designed to work together
  - that must execute in a distributed environment
- Support flexible deployments:
  - Components are loosely coupled
  - Components can be configured to suit deployment needs
[Figure: existing components and a new component, each wrapped by the MIF API]
How does it do it?
- Some components execute in the MIF container: Java
- Some execute outside the MIF container: any language (who cares)
- MIF containers can be partitioned/replicated
[Figure: the example pipeline (Filter, Calc1, Proxy, Merge, Viz, DB Query, Format, Useful Code) showing distributed component code and configurable protocols]
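A proxy is what lets code written in any language take part in the pipeline. As a sketch only, assuming the external tool reads one message on stdin and writes its result to stdout, a Java-side proxy could launch it with `ProcessBuilder`. `ExternalProxy` and the stdin/stdout convention are hypothetical, not part of the MIF API; in the usage example the POSIX `tr` command stands in for a legacy executable.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical proxy: forwards one message to an external process and returns its output. */
public class ExternalProxy {
    private final List<String> command;   // e.g. a legacy .exe, an R script, a C program

    public ExternalProxy(List<String> command) {
        this.command = command;
    }

    public String process(String message) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true);               // merge stderr into stdout for simplicity
        Process p = pb.start();
        try (OutputStream stdin = p.getOutputStream()) {
            stdin.write(message.getBytes(StandardCharsets.UTF_8));
        }                                           // closing stdin signals end of input
        String result;
        try (InputStream stdout = p.getInputStream()) {
            result = new String(stdout.readAllBytes(), StandardCharsets.UTF_8);
        }
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("external component exited with code " + exit);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        ExternalProxy upper = new ExternalProxy(List.of("tr", "a-z", "A-Z"));
        System.out.println(upper.process("hello"));
    }
}
```

Writing all input before reading any output is fine for small messages; a proxy handling large payloads would read stdout on a separate thread so neither side blocks on a full pipe buffer.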
Scaling MIF applications
[Figure: Replicated MIF and Partitioned MIF deployments of the Filter, DB Query, Format and Useful Code components]
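Replication can be pictured as several identical copies of a component consuming from one shared queue (the competing-consumers pattern), so adding copies adds capacity. The plain-Java sketch below uses an `ExecutorService` thread pool as a stand-in for replicated MIF containers; it is an analogy under that assumption, not how MIF actually distributes work across servers. Partitioning, by contrast, would place different components in different containers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Illustrative competing consumers: N identical replicas share one stream of work. */
public class ReplicatedWorkers {

    /** The hypothetical "component" being replicated: a stand-in computation. */
    static int format(int record) {
        return record * 2;
    }

    /** Submits every record to 'replicas' identical workers and gathers results in input order. */
    public static List<Integer> processAll(List<Integer> records, int replicas) {
        ExecutorService pool = Executors.newFixedThreadPool(replicas);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (int r : records) {
                pending.add(pool.submit(() -> format(r)));   // any idle replica picks this up
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : pending) {
                results.add(f.get());                        // waits for that record's replica
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(processAll(List.of(1, 2, 3, 4), 3));
    }
}
```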
Example: Calculating Functional Overrepresentation Pipeline
Component Composition in MIF
[Figure: a Single Module, a Simple Pipeline, and a MIF Component]
Chat Traffic Analysis Example
- A “real-world” example application: analysis of chat messages
- Utilizes many MIF constructs:
  - Pipeline
  - Components
  - Modules
  - Aggregators
  - Routing
  - Endpoints
  - Package structure
Chat Traffic Analysis Model
Chat Example Code - Main
Create a pipeline and set up the pipeline endpoints (input and output to the application):

    import java.util.HashMap;
    import java.util.Map;
    // plus the MIF classes (MifPipeline, MifEndpoint, EndpointType, ...)

    MifPipeline pipeline = new MifPipeline();
    MifEndpoint inEndp = pipeline.addMifEndpoint("inEndp", EndpointType.JMS, "topic/ChatDataTopic");
    MifEndpoint outEndp = pipeline.addMifEndpoint("outEndp", EndpointType.STREAM,
            "console.out?outputMessage=CHAT RESULT: ");

Wire the pipeline and start listening for messages:

    // the component looks these endpoints up by name ("chat-in", "chat-out")
    Map<String, MifEndpoint> endps = new HashMap<String, MifEndpoint>();
    endps.put("chat-in", inEndp);
    endps.put("chat-out", outEndp);
    pipeline.addMifComponent(new ChatComponent("ChatComponent", endps));
    pipeline.start();
Chat Example Code – Component
Get the input/output endpoints:

    MifEndpoint inChatEndp = getEndpoint("chat-in");
    MifEndpoint outChatEndp = getEndpoint("chat-out");

Ingest (subset):

    // construct the ingest module; it writes to the in-memory VM queue "ingest.keyword.queue"
    MifEndpoint outIngestKeywordEndp = pipeline.addMifEndpoint("outIngestKeywordEndp",
            EndpointType.VM, "ingest.keyword.queue");
    MifModule ingestModule = new MifModule("IngestModule", Ingest.class.getName(),
            inChatEndp, outIngestKeywordEndp, null);
    // add the module to the pipeline
    pipeline.addMifModule(ingestModule);

Keyword:

    // the keyword module reads from the queue the ingest module writes to
    MifEndpoint inKeywordEndp = pipeline.addMifEndpoint("inKeywordEndp",
            EndpointType.VM, "ingest.keyword.queue");
    MifEndpoint outKeywordEndp = pipeline.addMifEndpoint("outKeywordEndp",
            EndpointType.VM, "keyword.queue");
    pipeline.addMifModule("KeywordModule", Keyword.class.getName(),
            inKeywordEndp, outKeywordEndp, null);
Chat Example Code – Component (cont’d)
Get the input/output endpoints:

    // the aggregator module reads from the queue the keyword module writes to
    MifEndpoint inKeywordAggEndp = pipeline.addMifEndpoint("inKeywordAggEndp",
            EndpointType.VM, "keyword.queue");
    // Create the aggregator module, which is just a placeholder for the actual aggregator
    // construct. Note that this is the final module in the component, so its outbound
    // endpoint is the one specified outside the component (outChatEndp).
    MifModule chatAggregateModule = new MifModule("AggregateModule",
            ChatAggregate.class.getName(), inKeywordAggEndp, outChatEndp, null);
    // Add the aggregator to the pipeline and assign it to the module itself
    MifAggregator chatAnalysisAggregator = pipeline.addMifAggregator(new ChatAnalysisAggregator());
    chatAggregateModule.setAggregator(chatAnalysisAggregator);
    // Finally, add the module to the pipeline; the component is now configured.
    pipeline.addMifModule(chatAggregateModule);
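The component hands message collation to a `ChatAnalysisAggregator`, whose internals are not shown. As a hedged illustration of what an aggregator generally does, the sketch below buffers messages that share a correlation id and releases one combined message once an expected number of parts has arrived. The class, the correlation-id parameter and the fixed group size are all hypothetical; this is not the `MifAggregator` interface.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical aggregator: groups parts by correlation id and emits each group once complete. */
public class CorrelationAggregator {
    private final int expectedParts;
    private final Map<String, List<String>> pending = new HashMap<>();

    public CorrelationAggregator(int expectedParts) {
        this.expectedParts = expectedParts;
    }

    /** Adds one part; returns the combined group when it is complete, otherwise empty. */
    public synchronized Optional<List<String>> add(String correlationId, String part) {
        List<String> group = pending.computeIfAbsent(correlationId, k -> new ArrayList<>());
        group.add(part);
        if (group.size() < expectedParts) {
            return Optional.empty();              // keep buffering until all parts arrive
        }
        pending.remove(correlationId);            // release the group and free its buffer
        return Optional.of(group);
    }

    public static void main(String[] args) {
        CorrelationAggregator agg = new CorrelationAggregator(2);
        System.out.println(agg.add("msg-1", "keywords=[storm]"));
        System.out.println(agg.add("msg-2", "keywords=[]"));
        System.out.println(agg.add("msg-1", "blackout=false"));
    }
}
```

A real aggregator would also need a timeout for groups that never complete; that is left out here to keep the sketch short.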
Chat Example Code – Processing Module
Blackout.java:

    import java.io.Serializable;
    import java.util.HashMap;
    import org.apache.log4j.Logger; // assumed: log4j, inferred from Logger.getLogger(Class)

    public class Blackout implements MifInOutProcessor {
        Logger log = Logger.getLogger(Blackout.class);
        private String pathToBlackoutFile = "blackout.txt";
        private static BlackoutId blackout = null;

        public Blackout() {
            initBlackout();
        }

        public Serializable listen(Serializable input) {
            MapWrapper data = (MapWrapper) input;
            HashMap message = data.getMap();
            if (blackout != null) {
                // delegate to the "real" implementation
                blackout.processContentAnalysis(message);
            }
            return new MapWrapper(message);
        }
        … … …
    }
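The pattern in `Blackout.java` (unwrap the message, delegate to a domain class, return a new message) can be exercised outside MIF. The sketch below defines a hypothetical `InOutProcessor` interface mirroring the `listen(Serializable)` signature, and a hypothetical `RedactingProcessor` that masks words from a blackout list. It passes a plain `HashMap` instead of MIF's `MapWrapper`, and word masking is only an assumed stand-in, since what `BlackoutId.processContentAnalysis` actually does is not shown.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Locale;
import java.util.Set;

/** Hypothetical stand-in for MifInOutProcessor: one message in, one message out. */
interface InOutProcessor {
    Serializable listen(Serializable input);
}

/** Hypothetical processor that masks blacked-out words in the message's "text" field. */
public class RedactingProcessor implements InOutProcessor {
    private final Set<String> blackoutWords;    // lower-case words to mask

    public RedactingProcessor(Set<String> blackoutWords) {
        this.blackoutWords = blackoutWords;
    }

    @Override
    public Serializable listen(Serializable input) {
        @SuppressWarnings("unchecked")
        HashMap<String, String> message = new HashMap<>((HashMap<String, String>) input); // copy: caller's map untouched
        String text = message.get("text");
        if (text != null) {
            String[] words = text.split(" ", -1);   // -1 keeps empty tokens, so spacing is preserved
            for (int i = 0; i < words.length; i++) {
                if (blackoutWords.contains(words[i].toLowerCase(Locale.ROOT))) {
                    words[i] = "****";
                }
            }
            message.put("text", String.join(" ", words));
        }
        return message;                             // HashMap is Serializable
    }

    public static void main(String[] args) {
        HashMap<String, String> msg = new HashMap<>();
        msg.put("text", "meet at Dock 7 tonight");
        System.out.println(new RedactingProcessor(Set.of("dock", "tonight")).listen(msg));
    }
}
```

Keeping the domain logic behind a small one-in/one-out method like this is what lets the same class be unit-tested on its own and then dropped into a pipeline module.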
What’s available right now?
- The MIF API
  - Used/tested in several DICI projects
- Documentation
- Hooks for connecting to our provenance technology
- …
Capturing Provenance
- Metadata about workflows:
  - What processes ran
  - What data was used in each step
- The MIF API has extensions to communicate provenance data
  - Asynchronous JMS events
- The current implementation captures raw in/out data
  - Useful but not scalable
  - We are designing a data virtualization layer to support references from provenance to the real data
[Figure: PNNL Provenance Architecture]
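Capturing raw in/out data is useful but not scalable, and references from provenance to the real data are the planned alternative. One way to sketch such a reference, assuming content addressing is acceptable, is to record a SHA-256 digest of each payload instead of the payload itself. `ProvenanceRecord` and its fields are hypothetical; they are not the MIF provenance API or the PNNL provenance architecture.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.Instant;

/** Hypothetical provenance event: which process ran, plus digests referencing its input and output. */
public class ProvenanceRecord {
    final String processName;
    final String inputRef;
    final String outputRef;
    final Instant timestamp;

    public ProvenanceRecord(String processName, byte[] input, byte[] output) {
        this.processName = processName;
        this.inputRef = sha256(input);        // fixed-size reference instead of the raw data
        this.outputRef = sha256(output);
        this.timestamp = Instant.now();
    }

    /** Returns "sha256:" followed by the lower-case hex digest of the data. */
    public static String sha256(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder("sha256:");
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    @Override
    public String toString() {
        return processName + " in=" + inputRef + " out=" + outputRef + " at=" + timestamp;
    }

    public static void main(String[] args) {
        byte[] in = "raw chat message".getBytes(StandardCharsets.UTF_8);
        byte[] out = "{keywords=[]}".getBytes(StandardCharsets.UTF_8);
        System.out.println(new ProvenanceRecord("KeywordModule", in, out));
    }
}
```

The record stays the same size no matter how large the payloads are; resolving a digest back to the stored data would be the job of the data virtualization layer.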
Using Provenance
and it better be fast …
- And of course scalable
- So we created a benchmark:
  - A friction test
  - A measure of ‘middleware’ overhead
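A friction test measures how many messages per second the middleware itself can move when the components do no real work. A minimal in-process sketch of the idea, assuming a bounded `BlockingQueue` between two threads as the stand-in for the messaging layer (the real benchmark ran across cluster servers, not inside one JVM), is:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Illustrative friction test: a producer and a do-nothing consumer, so only transport cost is timed. */
public class FrictionTest {

    /** Sends 'count' messages of 'size' bytes through a queue and returns messages per second. */
    public static double run(int count, int size) {
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(1024);
        byte[] payload = new byte[size];          // reused: the sketch times hand-off, not copying
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    queue.take();                 // the "component" does no work at all
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        long start = System.nanoTime();
        consumer.start();
        try {
            for (int i = 0; i < count; i++) {
                queue.put(payload);
            }
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return count / seconds;
    }

    public static void main(String[] args) {
        System.out.printf("%.0f msgs/sec for 1K messages%n", run(100_000, 1024));
    }
}
```

Because the components are empty, any drop in rate as message size or server count changes is attributable to the middleware, which is the point of a friction test.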
But you can trust us – we’re scientists
- 1650 messages/sec for 1K messages
  - Scales linearly to 7 servers
- Peak throughput of 5.4 TB/day for 128K messages on 2 servers
  - At that rate the cluster switch was saturated: a hardware limitation!
- 290 messages/sec on 1 server (3.3 TB/day throughput)
Grove cluster specs
- 9 nodes, all connected via a single 1 Gb switch
- Hardware:
  - 1 Dell 2850 connected to RAID (~5 TB)
  - 8 Dell 1850
  - Dual Intel Xeon processors, 3.0 GHz
  - 4 GB RAM
- Software:
  - Red Hat Enterprise Linux 4
  - Linux kernel Elsmp
  - SonicMQ 7.5
  - Java version "1.6.0_03"
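The message rates and daily volumes in these results are related by volume per day = messages/sec × message size × 86,400 s. Assuming 128K means 128 × 1024 bytes and 1 TB means 10^12 bytes (the slide does not define its units), a small helper lets you check the figures:

```java
/** Converts between a sustained message rate and data volume per day. */
public class Throughput {
    static final long SECONDS_PER_DAY = 86_400L;

    /** Terabytes (10^12 bytes) per day for a given rate and message size in bytes. */
    public static double tbPerDay(double messagesPerSec, long messageBytes) {
        return messagesPerSec * messageBytes * SECONDS_PER_DAY / 1e12;
    }

    /** Inverse: the message rate needed to sustain a given TB/day. */
    public static double messagesPerSec(double tbPerDay, long messageBytes) {
        return tbPerDay * 1e12 / ((double) messageBytes * SECONDS_PER_DAY);
    }

    public static void main(String[] args) {
        long msg128k = 128 * 1024;   // assumption: 128K = 131,072 bytes
        System.out.printf("290 msgs/sec x 128K = %.2f TB/day%n", tbPerDay(290, msg128k));
        System.out.printf("5.4 TB/day at 128K = %.0f msgs/sec%n", messagesPerSec(5.4, msg128k));
    }
}
```

Under these assumptions, 290 messages/sec gives about 3.28 TB/day, consistent with the 3.3 TB/day quoted for one server, and the 5.4 TB/day peak corresponds to roughly 477 messages/sec across the two servers.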
How do I get started?
- We have a wiki: medici.pnl.gov/wiki
  - API docs and installation guide
  - Examples
  - Design and programming guidelines
  - More being added every day :-)
- And we’re available to help:
  - Initial adoption/design
  - Support
  - ‘Consulting’
And finally - the MeDICi ‘Vision’
That’s all folks!
- We believe that the MIF can:
  - Help you deliver high-quality solutions to clients faster and cheaper, especially for ‘integration’ projects
  - Help you easily leverage other internal/external codes in your solutions
  - Give us a ‘lingua franca’: a step towards wide-scale component reuse
- But we’re just humble plumbers …
  - We need application partners to deliver to clients
  - You take the kudos; we write invisible plumbing sitting in dark corners …
  - We need feedback on how to improve the technology
Questions?