Download presentation
Presentation is loading. Please wait.
Published byJudith Bridges Modified over 9 years ago
1
Big Data Open Source Software and Projects ABDS in Summary XVI: Layer 13 Part 1 Data Science Curriculum March 5 2015 Geoffrey Fox gcf@indiana.edu http://www.infomall.org School of Informatics and Computing Digital Science Center Indiana University Bloomington
2
Functionality of 21 HPC-ABDS Layers 1)Message Protocols: 2)Distributed Coordination: 3)Security & Privacy: 4)Monitoring: 5)IaaS Management from HPC to hypervisors: 6)DevOps: 7)Interoperability: 8)File systems: 9)Cluster Resource Management: 10)Data Transport: 11)A) File management B) NoSQL C) SQL 12)In-memory databases&caches / Object-relational mapping / Extraction Tools 13)Inter process communication Collectives, point-to-point, publish-subscribe, MPI: Part 1 14)A) Basic Programming model and runtime, SPMD, MapReduce: B) Streaming: 15)A) High level Programming: B) Application Hosting Frameworks 16)Application and Analytics: 17)Workflow-Orchestration: Here are 21 functionalities. (including 11, 14, 15 subparts) 4 Cross cutting at top 17 in order of layered diagram starting at bottom
3
Publish-Subscribe Technology Helped by Supun Kamburugamuve
4
Apache Kafka (LinkedIn) http://kafka.apache.org/ Apache Kafka is a message brokering system designed for high throughput and low latency, as well as a high level of durability and fault tolerance. It uses a publish/subscribe model, where producers publish messages to a cluster of brokers, consumers poll messages from the brokers. Kafka was originally developed to track web site activity such as page views and searches, but it has been applied to varied activities. Kafka was developed at LinkedIn, and was made open source in 2011. It became a top level Apache project in 2012. Kafka uses another Apache project, Zookeeper, to maintain its broker clusters.
5
Apache ActiveMQ http://activemq.apache.org/ Apache ActiveMQ is a very popular publish-subscribe message broker Mainly supports the JMS 1.1. standard. JMS is an API and not a wire protocol. Has its own wire protocol called OpenWire and supports standard protocols like Stomp and MQTT. ActiveMQ supports clustered brokers and networked brokers for scalability and fault tolerance. Producers and consumers can be written in different programming languages
6
RabbitMQ http://www.rabbitmq.com/ RabbitMQ is an open source messaging framework under the Mozilla Public License. Developed to support the AMQP open messaging standard and has support available for other protocols like MQTT. Developed using erlang programming language and boasts on its high throughput and low latency. RabbitMQ supports clustering for fault tolerance and scalability Any AMQP compliant client can publish messages to RabbitMQ as well as consume messages. Python interface py-librabbitmq used by Celery
7
Apache Qpid https://qpid.apache.org/ Qpid is the Apache project implementing the AMQP protocol. The broker is mainly written in Java and has clients written in different programming languages Support clustering for high availability
8
Kestrel http://robey.github.io/kestrel/ Kestrel is a simple distributed messaging queue. The nodes in a Kestrel cluster doesn’t communicate with each other, resulting in loosely ordered queues across the cluster. Because of this simple design Kestrel can scale to thousands of nodes. The project is developed at Twitter and available in Github. Kestrel supports memcache protocol and thrift based protocol for sending and receiving messages
9
ZeroMQ http://zeromq.org/ ZeroMQ is a embeddable library for creating custom messaging solutions for applications Provides sockets that can be used to do inter-process, intra-process, TCP multicast messaging. The sockets can be connected 1 to 1, N to N with patterns like fan-out, pub-sub etc. The library is asynchronous in nature and provide very efficient and fast communication channels to the applications. Primarily developed in C but the library can be used from difference programming languages like Java, PHP etc.
10
Netty http://netty.io/ Netty is a NIO based Java framework which enables easy development of high performance network applications like protocol Servers and Clients. Netty was developed at Red Hat JBoss and now available in Github under the Apache license version 2.0. The library provides out of the box support for popular application protocols like HTTP. The library can be used to build custom transport protocols using TCP or UDP.
11
NaradaBrokering (NB) http://www.naradabrokering.org/ Development stopped in 2009 and ideas further developed into Granules (layer 14B) For 10 years NB was state of art in exploring publish subscribe systems and their use in collaboration and computing but now other systems have caught up as described in this slide deck NB supported security, robustness (fault-tolerance), quality of service (such as variation in arrival time, synchronization of streams, delivery order), Web Services. Multiple transport protocols, high performance, distributed efficient set of multiple cooperating brokers, sophisticated topic specification and search
12
Public Cloud Pub/Sub Google Cloud Pub Sub https://developers.google.com/pubsub is a publish-subscribe messaging system offered by Google as a cloud service https://developers.google.com/pubsub – Supports many to many, one to many and many to one communications. – The publishers can use the HTTP API for sending the data. – The subscribers can use a pull based API or push based API for receiving the data. – The service is available as a developer preview and free of charge. – Part of Google Cloud Dataflow that also has FlumeJava and Google MillWheel See Simple Notification Service (Amazon SNS) http://aws.amazon.com/sns/ for Amazon equivalent and http://aws.amazon.com/sns/ Azure Queues and Service Bus Queues (advanced functionality) http://msdn.microsoft.com/en- us/library/azure/hh767287.aspx for Azure equivalenthttp://msdn.microsoft.com/en- us/library/azure/hh767287.aspx See Azure Event Hubs later
13
~2010 Comparison Important Brokers changed since then
14
Azure Event Hubs I This is used by Azure Stream Analytics It appears to be built on Azure service-bus publish subscribe messaging system It has usual feature of supporting input and output to Stream Analytics with multiple subscribers
15
Azure Event Hubs II These figures show that a given event hub can support multiple stream hosts Also the event hub brokers are auto scaled in response to increased load
16
Amazon Lambda Event triggered computing http://aws.amazon.com/lambda/ http://aws.amazon.com/lambda/ AWS Lambda can automatically run code in response to modifications to objects in Amazon S3 buckets, messages arriving in Amazon Kinesis streams, or table updates in Amazon DynamoDB. At launch AWS Lambda supports user defined functions written in Node.js (JavaScript). Your code can include existing Node.js libraries, even native ones. Lambda has scaling, fault tolerance, security, automated administration Could also/instead be in layer 14B as a stream programming model
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.