Messaging Systems for the Grid
Daniel Rodrigues
CERN IT Department, Internet Services, CH-1211 Genève 23, Switzerland

Summary
Messaging Systems Overview
Monitoring context in the Grid
The MSG – Messaging System for Grids
Fast Forward

Messaging Systems
Before going any further, the philosophy:
“Software development trend is to somehow mimic real world!” – Daniel Rodrigues
  – Procedural Programming: bureaucracy
  – Object Oriented: world entities and interaction
  – Aspects: cut through the mess!
  – Agents: real people
  – Messaging Systems: communication
    It might be sound, image, snail mail, etc.

Messaging Systems
Why use messaging?
  – For communicating we could use:
    File transfer
    Shared Databases
    Remote Procedure Invocation
    Web Services
    Mail
    CORBA
  – They do exist;
  – They have common ideas;
  – They share implementations;
  – You might be using more than one to achieve a result that suits your needs!
“Now look, you know different people think about life in different ways. Lawyers think life is a big court room; Doctors probably thinks life is like a big operation; Bus drivers think life is...er...a big bus I guess. Who knows what the hell those guys think. Anyway, I've always thought of life as a big football game...” Black Grape, England’s Irie

Messaging Systems
Why use messaging?
  – Key ideas and benefits:
    Loosely coupled distributed communication;
    Exceptional interoperability;
    Asynchronous;
    Reliable;
    Configurable persistence (just like your tax collector)
  – Drawbacks:
    More complex programming model (we do like bureaucracy after all)
    Harder to implement sequenced and synchronous models
    Performance? (maybe FTP could do the trick)

Messaging Systems
Ok, may we finally see a picture?
[Figure: Publisher and Consumer]

Messaging Systems
That’s all?
Enterprise Integration Patterns
  – Designing, Building, and Deploying Messaging Solutions
  – Gregor Hohpe / Bobby Woolf
Core Patterns
Some not so wild Patterns

Messaging Systems
Patterns: Message
  – Header
    Routing information
    Description
  – Body
    Data
    Ignored by the messaging system
  – EventMessage, CommandMessage, DocumentMessage, RequestReply
  – Could be SOAP, JMS, STOMP, etc.

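The header/body split can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Message` class and the `route` function are hypothetical names, not part of any messaging product, and the header values are borrowed from the MSG example later in these slides.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Message:
    """Hypothetical message type: headers for the broker, body for the application."""

    # Routing information and description, read by the messaging system.
    headers: Dict[str, str] = field(default_factory=dict)
    # Application data; the messaging system never interprets it.
    body: str = ""


def route(message: Message) -> str:
    """Choose a destination from the headers alone, without looking at the body."""
    return message.headers["destination"]


msg = Message(
    headers={"destination": "/topic/grid.usage.transfer", "persistent": "true"},
    body="voName: cms\nEOT\n",
)
print(route(msg))
```

The point of the split is that a broker can route, filter and persist messages without understanding the payload format.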
Messaging Systems
Patterns: Message Channel
  – Point-to-Point
    Snail mail
    Queues
  – Publish-Subscribe
    Television/radio broadcast
    Topics
  – DataTypeChannel, InvalidMessageChannel, DeadLetterChannel, ChannelAdapter, MessageBus, MessagingBridge

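The difference between the two channel types can be shown with a small in-memory sketch. The class names below are hypothetical and nothing here talks to a real broker; the sketch only illustrates the delivery semantics: a queue hands each message to one consumer, a topic hands a copy to every subscriber.

```python
from collections import deque
from typing import Callable, Deque, List

Handler = Callable[[str], None]


class PointToPointChannel:
    """Queue semantics: each message is delivered to exactly one consumer."""

    def __init__(self) -> None:
        self._pending: Deque[str] = deque()

    def send(self, message: str) -> None:
        self._pending.append(message)

    def receive(self) -> str:
        # Whichever consumer calls receive() first removes the message.
        return self._pending.popleft()


class PublishSubscribeChannel:
    """Topic semantics: every subscriber gets its own copy of each message."""

    def __init__(self) -> None:
        self._subscribers: List[Handler] = []

    def subscribe(self, handler: Handler) -> None:
        self._subscribers.append(handler)

    def publish(self, message: str) -> None:
        for handler in self._subscribers:
            handler(message)


queue = PointToPointChannel()
queue.send("letter")
first_reader = queue.receive()  # the queue is now empty for other consumers

topic = PublishSubscribeChannel()
inbox_a: List[str] = []
inbox_b: List[str] = []
topic.subscribe(inbox_a.append)
topic.subscribe(inbox_b.append)
topic.publish("broadcast")  # both inboxes receive a copy
```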
Messaging Systems
Patterns: Message Endpoint
  – Publisher
    Gets data from the application and creates a message.
  – Consumer
    Extracts data from a message and passes it on to the application.
  – SelectiveConsumer, CompetingConsumer, DurableSubscriber, MessageDispatcher, TransactionalClients, EventDrivenConsumer.
  – An endpoint either sends or receives messages and is channel specific. (Ears, mouth and eyes are not the same thing.)

Messaging Systems
Other Patterns
  – Message Routers
    A message may be routed to different channels depending on its characteristics;
    Simple example: use a wildcard topic!
    grid.usage.transfer.*, where it will be forwarded to grid.usage.transfer.
  – MessageTranslators
    Translation at different layers (data structure, types, representation, or transport).
    e.g. transport protocols: TCP => HTTP => SOAP => JMS
  – Pipes and Filters
    A message may need processing in different steps.
    A message goes through pipes and filters that perform different functions (e.g., authN, authZ)

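Wildcard routing can be sketched as a simple pattern match on dot-separated topic names. The sketch assumes that `*` stands for exactly one name component, which is how ActiveMQ treats `*` in destination names; the function `topic_matches` and the sub-topic name `grid.usage.transfer.gridftp` are hypothetical, chosen only for illustration.

```python
def topic_matches(pattern: str, topic: str) -> bool:
    """Return True if a dot-separated topic matches a pattern.

    Assumption: "*" matches exactly one name component and nothing else
    is treated as a wildcard.
    """
    pattern_parts = pattern.split(".")
    topic_parts = topic.split(".")
    if len(pattern_parts) != len(topic_parts):
        return False
    return all(p == "*" or p == t for p, t in zip(pattern_parts, topic_parts))


# A subscriber on the wildcard sees any single-level sub-topic.
print(topic_matches("grid.usage.transfer.*", "grid.usage.transfer.gridftp"))
# The parent topic itself has one component fewer, so it does not match.
print(topic_matches("grid.usage.transfer.*", "grid.usage.transfer"))
```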
Messaging Systems
Isn’t it too complex to implement all this?
  – Indeed. But someone has already done most of the work for you:
  – Commercial solutions: Tibco Rendezvous, IBM WebSphere MQ, SUN Java Message Service, Microsoft MSMQ, BEA MessageQ, SonicMQ, 29West UME/LBM.
  – Open source providers: Apache ActiveMQ, ObjectWeb JORAM, OpenJMS.
Each is suited to different problems:
  – Integration on different platforms;
  – Latency concerns;
  – High throughput.

Messaging Systems
Where is it used?
  – Financial services: exchanges, brokerages, hedge funds;
  – Insurance companies;
  – Banking industry;
  – Telecoms.
Usually embedded in integrated solutions
  – Enterprise backbones.
WebSphere MQ example (March 2007):
  – customers
  – 10 billion messages carrying US$1 quadrillion worth of business transactions.

Summary
Messaging Systems Overview
Monitoring context in the Grid
The MSG – Messaging System for Grids
Fast Forward

Monitoring Context
How does Message Oriented Middleware fit into the WLCG monitoring context?
The Grid is a complex infrastructure, with many different services deployed in different environments.
We need to monitor the services in order to:
  – Know when a repair action is necessary;
  – Help improve the overall reliability;
  – Provide stakeholders with current and historical status information.
A vast amount of monitoring data is produced:
  – Local fabric monitoring (e.g., Nagios, LEMON)
  – Remote monitoring (e.g., SAM)

Monitoring Context
Who is involved (stakeholders)?
  – Site Administrators
  – Grid Operators
    CIC on Duty
    Regional Operations Centre
  – WLCG Project management
  – Virtual Organizations
    WLCG Experiments
  – Monitoring developers + operators

Monitoring Context
High Level Model:
[Figure: high-level model. Labels: LEMON, Nagios, SAM, R-GMA, SAME, GridView, Experiment Dashboard, GridIce, HTTP, LDAP, GOCDB, Dashboard]

Monitoring Context
WLCG Monitoring Working Group:
  – Initially focused on stakeholder requirements
    Distill into a set of architectural principles
    Propose some new technologies to help
  – Reuse of standard commodity components
  – Used to design a site-local monitoring prototype
  – An attempt to extend this to a more global view
    Knowing that the operations model is changing from central to regional/national/local
Looking at the architectural principles…

Monitoring Context
Reduce time to respond
  – “Site administrators are closest to the problems, and need to know about them first”
Tell others what you want to know
  – “If you’re monitoring a site remotely, it’s only polite to give the data to the site” (Chris Brew)
  – Remote systems should feed back information to sites
Don’t impose systems on sites
  – Cannot dictate a monitoring system

Monitoring Context
No monolithic systems
  – Different systems should specialize in their areas of expertise
No central bottlenecks
  – “Local problems detected locally shouldn’t require remote services to work out what the problem is”
Specific visualization for each stakeholder
  – All are using the same underlying data

Monitoring Context
The starting point is what we have now:
  – Availability testing framework – SAM/RSV
  – Job and data reliability monitoring – Gridview
  – Grid topology – GOCDB/Registration DB
  – Dynamic view of the grid – BDII/CeMon
  – Accounting – APEL/Gratia
  – Experiment views – Dashboards
  – Fabric monitoring – Nagios, LEMON, …
  – Grid operations tools – CIC Portal
They work together right now
  – To a certain extent!

Monitoring Context
We need:
  – Loose coupling of systems
  – Distributed components
  – Reliable delivery of messages
  – Standard methods of communication
  – Flexibility to add new producers and consumers of the information without having to reconfigure everything
Message Oriented Middleware provides this
  – And is widely used in similar scenarios

Monitoring Context
Reliability and persistence of messaging are built into the broker network.
  – Mitigates the single points of failure we’ve had with previous solutions

Monitoring Context
Not a silver bullet
  – Still can end up with spaghetti
Tight specification of interaction of components
  – Message format specifications
  – Standard metadata schema
  – Message queue naming schemas
  – Protocols
System management is key
  – You’ve got code for free from the messaging system
  – But you need to write your management layer
    Component co-ordination
    Configuration
    Message tracing
    Debugging

Monitoring Context
Conclusion
  – The monitoring context is highly distributed;
  – Many components could benefit from gathering common information in a reliable, flexible way;
  – MOM is a way of leveraging the current underlying infrastructures.

Monitoring Context
A real life working example:

Monitoring Context
WLCG Monitoring – some worked examples
[Figure labels: Application, Database archiver component, Transparent Broker Network, Messaging System Adaptor, Standard process, Standard components]

Summary
Messaging Systems Overview
Monitoring context in the Grid
The MSG – Messaging System for Grids
Fast Forward

MSG Overview
An infrastructure providing an easy way to send messages;
  – Each message has a well defined format adhering to a message class specification
    Well defined set of message classes
Three main components:
  – Apache ActiveMQ broker;
  – msg-publish-simple;
  – msg-consume2oracle;
Using file-based SAN persistency;
Publish-Subscribe Channels (Topics)
Durable Subscribers

MSG: Message
Message endpoints on a topic should:
  – Consumers: expect a well formatted message
  – Producers: send a properly formatted message
Message Classes:
  – Each one has a corresponding specification
  – One message may contain multiple records
  – Each record consists of plain text key-value pairs, terminated by “EOT”
  – A few fields are mandatory: consumers are expecting them!
  – Some fields may be sent as a header (for later filtering using selectors)

MSG: Message
Example:
destination: /topic/grid.usage.transfer
persistent:true
transferProtocol: GridFTP
msgEncodedTime: T22:29:57,712Z

transferProtocol: GridFTP
publishingHost: lxfsrc5807.cern.ch
voName: cms
srcHost: lxfsrc5807.cern.ch
destHost: c2fs008.grid.sinica.edu.tw
gridftpStreams: 10
numberBytes:
fileName: //castor/cern.ch/cms/store/PhEDEx_LoadTest07_4/LoadTest07_CERN_3e6
startTime: T13:17: Z
endTime: T13:33: Z
userName: cms001
EOT
transferProtocol: GridFTP
publishingHost: lxfsrc5807.cern.ch
voName: cms
srcHost: lxfsrc5807.cern.ch
destHost: diskserv-san-20.cr.cnaf.infn.it
gridftpStreams: 3
numberBytes:
fileName: //castor/cern.ch/cms/store/PhEDEx_LoadTest07_4/LoadTest07_CERN_F1
startTime: T13:17: Z
endTime: T13:34: Z
userName: cms001
EOT

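A consumer has to split such a body back into records. The following is a minimal sketch, not the MSG code itself: it assumes one `key: value` pair per line and a line containing only `EOT` at the end of each record. The function name `parse_records` is hypothetical, and the sample body uses only fields whose values are fully visible in the example above.

```python
from typing import Dict, List


def parse_records(body: str) -> List[Dict[str, str]]:
    """Split a message body into records of key-value pairs.

    Assumptions: one "key: value" pair per line, and each record is
    closed by a line that contains only "EOT".
    """
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw_line in body.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line == "EOT":
            records.append(current)
            current = {}
            continue
        # Split on the first colon only, so values such as timestamps
        # or paths may themselves contain colons.
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    # Pairs after the last "EOT" are an unterminated record and are dropped.
    return records


sample_body = """\
voName: cms
destHost: c2fs008.grid.sinica.edu.tw
gridftpStreams: 10
EOT
voName: cms
destHost: diskserv-san-20.cr.cnaf.infn.it
gridftpStreams: 3
EOT
"""

for record in parse_records(sample_body):
    print(record["destHost"], record["gridftpStreams"])
```

Values are kept as strings; converting them to numbers or timestamps would be the job of the message class specification.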
MSG: Apache ActiveMQ
Powerful open source message broker
  – Currently running v4.1 & v5.1
Message Channels
  – Publish-Subscribe;
  – Point-to-Point;
  – VirtualDestinations, Wildcards, CompositeDestinations;
  – Synchronous / asynchronous sending.
Wide range of supported protocols and clients
  – OpenWire for high performance clients;
  – STOMP (Simple Text Oriented Messaging Protocol);
  – REST, XMPP, AMQP.

MSG: Apache ActiveMQ
Configurable persistence
  – JDBC + high performance journal
  – File based MessageStore (since 5.0)
Clustering
  – Master/Slave failover
    Provides high availability
  – Network of Brokers
    Avoids the single point of failure of client/server or hub/spoke topologies
    Store and forward with consumer priority
    Increases scalability
    Load balancing of consumers and producers
Selectors
Discovery

MSG: msg-publish-simple
Sends messages into the Message Channel
  – Validates that messages are well formatted against the message class;
  – Reassembles records according to selected headers;
Very lightweight script
  – Depends only on Python > 2.3
  – Uses Python asyncore
Designed to run anywhere (e.g. WNs)
  – Can use many broker endpoints (will select one which is available)
  – Uses either STOMP or plain HTTP

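To give an idea of what goes over the wire when publishing through STOMP, here is a minimal sketch that builds a SEND frame: a command line, `name:value` header lines, a blank line, the body and a terminating NUL byte. It is not the msg-publish-simple code; the function name `build_send_frame` is hypothetical, the sketch is written for current Python 3 rather than the Python 2.3 era the script targets, and it neither opens a connection nor performs the initial CONNECT exchange.

```python
from typing import Dict, Optional


def build_send_frame(
    destination: str, body: str, headers: Optional[Dict[str, str]] = None
) -> bytes:
    """Build a STOMP SEND frame as bytes (illustrative sketch only)."""
    lines = ["SEND", "destination:" + destination]
    for name, value in (headers or {}).items():
        lines.append(name + ":" + value)
    # A blank line separates the headers from the body; NUL ends the frame.
    frame = "\n".join(lines) + "\n\n" + body + "\x00"
    return frame.encode("utf-8")


frame = build_send_frame(
    "/topic/grid.usage.transfer",
    "voName: cms\nEOT\n",
    headers={"persistent": "true"},
)
print(frame)
```

Fields placed in `headers` travel outside the body, which is what makes them available to broker-side selectors.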
MSG: msg-consume2oracle
Consumes messages
  – Creates a durable subscription;
  – Can read different message classes on different topics (one durable subscription per topic!)
Publishes into Oracle.
  – Extracts records from incoming messages;
  – Inserts records into an Oracle view corresponding to the message class definition.
  – Only need to worry about the trigger!
Configurable system management
  – Publishes back client status information
    Messages received in a topic;
    Records inserted for a given message class;
Very lightweight script
  – Depends only on Python > 2.3
  – Also cx_Oracle

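The "insert into a view, worry only about the trigger" idea can be illustrated with a small stand-in. This sketch uses Python's built-in `sqlite3` module instead of Oracle and cx_Oracle so that it runs anywhere; the table, view, trigger and column names are all hypothetical, and the real msg-consume2oracle schema is not shown in these slides.

```python
import sqlite3
from typing import Dict

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Storage table owned by the database administrator (hypothetical).
    CREATE TABLE transfers (vo_name TEXT, src_host TEXT, dest_host TEXT);

    -- View that mirrors the message class definition (hypothetical).
    CREATE VIEW transfer_records AS
        SELECT vo_name, src_host, dest_host FROM transfers;

    -- The trigger decides what an insert into the view really does.
    CREATE TRIGGER transfer_records_insert
    INSTEAD OF INSERT ON transfer_records
    BEGIN
        INSERT INTO transfers (vo_name, src_host, dest_host)
        VALUES (NEW.vo_name, NEW.src_host, NEW.dest_host);
    END;
    """
)


def insert_record(connection: sqlite3.Connection, record: Dict[str, str]) -> None:
    """Insert one parsed record into the view; the trigger does the rest."""
    connection.execute(
        "INSERT INTO transfer_records (vo_name, src_host, dest_host) "
        "VALUES (?, ?, ?)",
        (record["voName"], record["srcHost"], record["destHost"]),
    )


insert_record(
    conn,
    {
        "voName": "cms",
        "srcHost": "lxfsrc5807.cern.ch",
        "destHost": "c2fs008.grid.sinica.edu.tw",
    },
)
stored = conn.execute("SELECT vo_name, dest_host FROM transfers").fetchall()
print(stored)
```

With this arrangement the consumer only needs to know the view, while the storage layout behind the trigger can change without touching the consumer.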
MSG: performance
Extensive testing of the broker’s many features under different configurations
Test results are available on the twiki; here are some:
Broker ran for 6 weeks with no crashes
  – 50 million messages of several sizes (0 to 10 kB) forwarded to consumers;
  – 12 million incoming messages from producers;
  – Up to 40 producers/80 consumers;
  – Stable under irregular testing pattern.
Enabling persistence limits throughput.

MSG: performance
[Figure: Throughput testing]

MSG: performance
[Figure: Testing persistency]

MSG: performance
[Figure: Testing clustering]
  – Fast internal OpenWire!

MSG: results
Flagship: OSG RSV – SAM bridge
  – Running since January.
  – Crashed once, because not enough file descriptors were configured.
Gridview – GridFTP transfers.
  – Currently publishing from 27 CMS t1transfer machines;
  – A validation consumer is in the testbed right now.

Summary
Messaging Systems Overview
Monitoring context in the Grid
The MSG – Messaging System for Grids
Fast Forward
  – In the monitoring context.

MSG: results
[Figure: Migrating to Regions]

MSG: results
[Figure: Messaging based archiving & reporting]

Thank you for your attention.
Additional Questions?