Big Data Open Source Software and Projects ABDS in Summary XVII: Layer 13 Part 2 Data Science Curriculum March 5 2015 Geoffrey Fox

Slides:



Advertisements
Similar presentations
Eclipse, M2M and the Internet of Things
Advertisements

Overview of Web Services
Distributed Processing, Client/Server and Clusters
Big Data Open Source Software and Projects ABDS in Summary XIV: Level 14B I590 Data Science Curriculum August Geoffrey Fox
Big Data Open Source Software and Projects ABDS in Summary I I590 Data Science Curriculum August Geoffrey Fox
Big Data Open Source Software and Projects ABDS in Summary XIX: Layer 14B Data Science Curriculum March Geoffrey Fox
Distributed components
Technical Architectures
Big Data Open Source Software and Projects ABDS in Summary XVI: Layer 13 Part 1 Data Science Curriculum March Geoffrey Fox
Big Data Open Source Software and Projects ABDS in Summary II: Layers 3 to 4 Data Science Curriculum March Geoffrey Fox
V1.00 © 2009 Research In Motion Limited Introduction to Mobile Device Web Development Trainer name Date.
Big Data Open Source Software and Projects ABDS in Summary XIII: Level 14A I590 Data Science Curriculum August Geoffrey Fox
Big Data Open Source Software and Projects ABDS in Summary XII: Level 13 I590 Data Science Curriculum August Geoffrey Fox
CS 415 N-Tier Application Development By Umair Ashraf July 6,2013 National University of Computer and Emerging Sciences Lecture # 9 Introduction to Web.
Web Services Michael Smith Alex Feldman. What is a Web Service? A Web service is a message-oriented software system designed to support inter-operable.
Principles for Collaboration Systems Geoffrey Fox Community Grids Laboratory Indiana University Bloomington IN 47404
Indiana University Faculty Geoffrey Fox, David Crandall, Judy Qiu, Gregor von Laszewski Dibbs Research at Digital Science
Messaging Technologies Group: Yuzhou Xia Yi Tan Jianxiao Zhai.
HPC-ABDS: The Case for an Integrating Apache Big Data Stack with HPC
Iterative computation is a kernel function to many data mining and data analysis algorithms. Missing in current MapReduce frameworks is collective communication,
© 2006 IBM Corporation SOA on your terms and our expertise Software Overview IBM WebSphere Message Broker Extender for TIBCO RV.
.NET, and Service Gateways Group members: Andre Tran, Priyanka Gangishetty, Irena Mao, Wileen Chiu.
By Mihir Joshi Nikhil Dixit Limaye Pallavi Bhide Payal Godse.
Networks – Network Architecture Network architecture is specification of design principles (including data formats and procedures) for creating a network.
Lecture 3: Sun: 16/4/1435 Distributed Computing Technologies and Middleware Lecturer/ Kawther Abas CS- 492 : Distributed system.
Enterprise Java Beans Java for the Enterprise Server-based platform for Enterprise Applications Designed for “medium-to-large scale business, enterprise-wide.
BIG DATA APPLICATIONS & ANALYTICS LOOKING AT INDIVIDUAL HPCABDS SOFTWARE LAYERS 1/26/2015 Cloud Computing Software 1 Geoffrey Fox January BigDat.
Big Data Open Source Software and Projects ABDS in Summary I: Layers 1 to 2 Data Science Curriculum March Geoffrey Fox
1 Introduction to Middleware. 2 Outline What is middleware? Purpose and origin Why use it? What Middleware does? Technical details Middleware services.
Harp: Collective Communication on Hadoop Bingjing Zhang, Yang Ruan, Judy Qiu.
Big Data Open Source Software and Projects ABDS in Summary XVIII: Layer 14A Data Science Curriculum March Geoffrey Fox
Message Queuing
Distribution and components. 2 What is the problem? Enterprise computing is Large scale & complex: It supports large scale and complex organisations Spanning.
XML and Web Services (II/2546)
Hwajung Lee.  Interprocess Communication (IPC) is at the heart of distributed computing.  Processes and Threads  Process is the execution of a program.
CORBA1 Distributed Software Systems Any software system can be physically distributed By distributed coupling we get the following:  Improved performance.
Ipgdec5-01 Remarks on Web Services PTLIU Laboratory for Community Grids Geoffrey Fox, Marlon Pierce, Shrideep Pallickara, Choonhan Youn Computer Science,
Big Data Open Source Software and Projects ABDS in Summary IV: Level 7 I590 Data Science Curriculum August Geoffrey Fox
Kemal Baykal Rasim Ismayilov
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks MSG - A messaging system for efficient and.
© Copyright 2013 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Big Data Directions Greg.
1 Pertemuan 5 Networking Models. Discussion Topics Using layers to analyze problems in a flow of materials Using layers to describe data communication.
AMQP, Message Broker Babu Ram Dawadi. overview Why MOM architecture? Messaging broker like RabbitMQ in brief RabbitMQ AMQP – What is it ?
Big Data Open Source Software and Projects ABDS in Summary XII: Level 13 I590 Data Science Curriculum August Geoffrey Fox
Panel Discussion Software Defined Ecosystems June BigSystem Software-Defined Ecosystems at HPDC Vancouver Canada Geoffrey Fox.
Big Data Open Source Software and Projects ABDS in Summary II: Layer 5 I590 Data Science Curriculum August Geoffrey Fox
SALSASALSA Large-Scale Data Analysis Applications Computer Vision Complex Networks Bioinformatics Deep Learning Data analysis plays an important role in.
Network Models. The OSI Model Open Systems Interconnection (OSI). Developed by the International Organization for Standardization (ISO). Model for understanding.
E-commerce Architecture Ayşe Başar Bener. Client Server Architecture E-commerce is based on client/ server architecture –Client processes requesting service.
Distributed Systems Architecure. Architectures Architectural Styles Software Architectures Architectures versus Middleware Self-management in distributed.
BIG DATA/ Hadoop Interview Questions.
AMSA TO 4 Advanced Technology for Sensor Clouds 09 May 2012 Anabas Inc. Indiana University.
Chapter 9 – RPCs, Messaging & EAI
University of Technology
connectivity | autonomous | electrification | architecture
connectivity | autonomous | electrification | architecture
I590 Data Science Curriculum August
Data Science Curriculum March
Scalable Parallel Interoperable Data Analytics Library
Inventory of Distributed Computing Concepts
Evolution of messaging systems and event driven architecture
Big Data Open Source Software and Projects ABDS in Summary I
Department of Intelligent Systems Engineering
Message Queuing.
Enterprise Integration
Big Data, Simulations and HPC Convergence
Motivation Contemporary big data tools such as MapReduce and graph processing tools have fixed data abstraction and support a limited set of communication.
Convergence of Big Data and Extreme Computing
I590 Data Science Curriculum August
Presentation transcript:

Big Data Open Source Software and Projects ABDS in Summary XVII: Layer 13 Part 2 Data Science Curriculum March Geoffrey Fox School of Informatics and Computing Digital Science Center Indiana University Bloomington

Functionality of 21 HPC-ABDS Layers 1)Message Protocols: 2)Distributed Coordination: 3)Security & Privacy: 4)Monitoring: 5)IaaS Management from HPC to hypervisors: 6)DevOps: 7)Interoperability: 8)File systems: 9)Cluster Resource Management: 10)Data Transport: 11)A) File management B) NoSQL C) SQL 12)In-memory databases&caches / Object-relational mapping / Extraction Tools 13)Inter process communication Collectives, point-to-point, publish-subscribe, MPI: Part 2 14)A) Basic Programming model and runtime, SPMD, MapReduce: B) Streaming: 15)A) High level Programming: B) Application Hosting Frameworks 16)Application and Analytics: 17)Workflow-Orchestration: Here are 21 functionalities. (including 11, 14, 15 subparts) 4 Cross cutting at top 17 in order of layered diagram starting at bottom

Messaging Standards

JMS, AMQP, Stomp, MQTT I These are messaging standards typically used to describe messages sent from clients to brokers (like Kafka) where they are queued and inspected by subscribers JMS Java Message Service was historically very important as first broadly used/available technology for “message oriented middleware” although commercial solutions like “MQ Series” from IBM were available earlier. JMS is supported by many brokers like RabbitMQ that also support the protocols AMQP, Stomp, MQTT JMS is an API i.e. defines how accessed in software but does not define nature of messages so different implementations can not be mixed AMQP, Stomp, MQTT your-messaging-protocol-amqp-mqtt-or-stomp.html are protocols i.e. nature of messages is defined so messages in these protocols should be interoperable between implementations. your-messaging-protocol-amqp-mqtt-or-stomp.html

JMS, AMQP, Stomp, MQTT II AMQP, which stands for Advanced Message Queuing Protocol, was designed as an open replacement for existing proprietary messaging middleware. Two of the most important reasons to use AMQP are reliability and interoperability. is an OASIS standard AMQP is a binary wire protocol which was designed for interoperability between different vendors. Where other protocols have failed, AMQP adoption has been strong. Companies like JP Morgan use it to process 1 billion messages a day. As the name implies, it provides a wide range of features related to messaging, including reliable queuing, topic-based publish-and-subscribe messaging, flexible routing, transactions, and security. AMQP exchanges route messages by topic, and also based on headers. There’s a lot of fine-grained control possible with such a rich feature set. You can restrict access to queues, manage their depth, and more. Features like message properties, annotations and headers make it a good fit for a wide range of enterprise applications. This protocol was designed for reliability at the many large companies who depend on messaging to integrate applications and move data around their organization. In the case of RabbitMQ, there are many different language implementations and great samples available, making it a good choice for building large scale, reliable, resilient, or clustered messaging infrastructures.

JMS, AMQP, Stomp, MQTT III STOMP is the Simple Text Oriented Messaging Protocolhttp://stomp.github.io/ Like AQMP, STOMP provides an interoperable wire format so that STOMP clients can communicate with any STOMP message broker to provide easy and widespread messaging interoperability among many languages, platforms and brokers. STOMP is a very simple and easy to implement protocol, coming from the HTTP school of design; the server side may be hard to implement well, but it is very easy to write a client to get yourself connected. STOMP does not, however, deal in queues and topics—it uses a SEND semantic with a “destination” string. The broker must map onto something that it understands internally such as a topic, queue, or exchange. Consumers then SUBSCRIBE to those destinations. Since those destinations are not mandated in the specification, different brokers may support different types of destination. So, it’s not always straightforward to port code between brokers.

JMS, AMQP, Stomp, MQTT IV MQTT (Message Queue Telemetry Transport) is a machine-to-machine (M2M)/"Internet of Things" connectivity protocol standardized in OASIShttp://mqtt.org/ It was originally developed out of IBM’s pervasive computing team and their work with partners in the industrial sector. Over the past couple of years the protocol has been moved into the open source community, seen significant growth in popularity as mobile applications have taken off. The design principles and aims of MQTT are much more simple and focused than those of AMQP—it provides publish-and-subscribe messaging (no queues, in spite of the name) and was specifically designed for resource-constrained devices and low bandwidth, high latency networks such as dial up lines and satellite links, for example. Basically, it can be used effectively in embedded systems. One of the advantages MQTT has over more full-featured “enterprise messaging” brokers is that its intentionally low footprint makes it ideal for today’s mobile and developing “Internet of Things” style applications. In fact, companies like Facebook are using it as part of their mobile applications because it has such a low power draw and is light on network bandwidth. Some of the MQTT-based brokers support many thousands of concurrent device connections. It offers three qualities of service: 1) fire-and-forget / unreliable,2) “at least once” to ensure it is sent a minimum of one time (but might be sent more than one time), and 3) “exactly once”. MQTT’s strengths are simplicity (just five API methods), a compact binary packet payload (no message properties, compressed headers, much less verbose than something text-based like HTTP), and it makes a good fit for simple push messaging scenarios such as temperature updates, stock price tickers, oil pressure feeds or mobile notifications. It is also very useful for connecting machines together, such as connecting an Arduino device to a web service with MQTT.

Inter process communication

MPI MPI or Message Passing Interface is dominant parallel computing technology developed by a community standardization effort – the MPI forum – buildings many previous technologies MPI supports two basic types of communication: Point to Point and Collective Collective communication involves many to many, one to many (Scatter) or many to one (gather) communication. It also involves reduction operations i.e. ability to form add, min, max etc. operations as messages flow through system. MPI includes rich variety of messaging semantics allowing communication and computation to be overlapped or not It also includes one sided operations: write to remote memory (put), a read from remote memory (get) MPI-IO supports parallel access to disk storage MPI just finished its third revision

Harp Hadoop Plugin (on Hadoop and Hadoop 2.2.0) from Indiana University Hierarchical data abstraction on arrays, key-values and graphs for easy programming expressiveness. Collective communication model to support various communication operations on the data abstractions (will extend to Point to Point) Caching with buffer management for memory allocation required from computation and communication BSP style parallelism Fault tolerance with checkpointing Parallelism Model Architecture Shuffle M M MM Optimal Communication M M MM RR Map-Collective or Map- Communication Model MapReduce Model YARN MapReduce V2 Harp MapReduce Applications Map-Collective or Map- Communication Applications Application Framework Resource Manager Work of Judy Qiu Bingjing Zhang