Fundamental Operations Scalability and Speedup

DataLab: Active Storage for Data-Driven Scientific Computing
Brandon Rich and Douglas Thain, University of Notre Dame

Many data-intensive scientific computing applications are constrained not by the number of CPU cycles available, but by the ability of the I/O system to deliver data. To serve such applications, we present DataLab, an active storage solution in which a cluster of conventional machines is used primarily for its aggregate I/O capacity. Each node, called an active storage unit (ASU), is equipped with a local disk and processing capability. Large data sets are partitioned across the distributed storage units. Small programs are then dispatched to the location of the data that they wish to process, rather than vice versa.

Fundamental Operations
Apply F on A into B, C, D
Select F from B into D
Compare F on X and Y into Z

Example Application in Biometrics
Convert 58,000 iris images from TIFF to BMP.
Select all images with a particular artifact.
Reduce all of those into a feature space.
Compare all features against each other to produce a matrix.
Retrieve the matrix of values from the system.

Why Active Storage?
After deploying data once, you never have to move it again.
Processing data is a matter of transmitting the function, not moving data.

Useful Abstractions
Create typed sets and populate them with data files.
Define functions to act on that data.
Apply, select, or compare (see figure at right).
Output can be another data set or a report of results.

Architecture

Fault Tolerance
Jobs are transaction-oriented, with fault tolerance at both the job and the set level.
We log and report errors on each individual execution.
Our underlying execution model re-attempts failed executions.
We maintain state information during job startup so that interrupted jobs may be resumed (see figure below).

What Comes Next?
Better ways to accommodate adding or removing hosts from the pool.
Data duplication to avoid data loss and facilitate runtime job optimization.

Scalability and Speedup
TIFF to BMP image conversion of 58,000 images over n hosts. Superlinear scalability on the fast cluster, sublinear on workstations.

Failure Recovery
Job startup has three phases:
1. Generate an execution plan for each host and record the job in the database.
2. Distribute the execution batch to each host; "begin" the host's work without starting it.
3. Commit the execution on each host.
Jobs that fail during any phase can be resumed by restarting the client.

Presented at High Performance Desktop Computing, 23 June 2008.
This work was supported by National Science Foundation Grants CCF-0621434 and CNS-0643229.
Web address: http://www.cse.nd.edu/~ccl
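
To make the three fundamental operations more concrete, here is a minimal in-memory sketch in Python. It is not DataLab's interface: the names `partition`, `apply_op`, `select_op`, and `compare_op`, the host names, and the use of a dictionary to stand in for a set partitioned across ASUs are all assumptions made for illustration. Apply is also simplified to a single output set, whereas the poster's form ("into B, C, D") allows several.

```python
# Hypothetical stand-in for a data set partitioned across active storage
# units (ASUs): a dict mapping host name -> list of items stored on that host.

def partition(items, hosts):
    """Spread items round-robin across the given hosts."""
    parts = {h: [] for h in hosts}
    for i, item in enumerate(items):
        parts[hosts[i % len(hosts)]].append(item)
    return parts

def apply_op(f, a):
    """Apply F on A: run f on each item; the output stays on the same host."""
    return {host: [f(x) for x in items] for host, items in a.items()}

def select_op(f, b):
    """Select F from B: keep only the items for which f returns true."""
    return {host: [x for x in items if f(x)] for host, items in b.items()}

def compare_op(f, x, y):
    """Compare F on X and Y: evaluate f on every pair, producing a matrix."""
    xs = [v for items in x.values() for v in items]
    ys = [v for items in y.values() for v in items]
    return [[f(p, q) for q in ys] for p in xs]

# Toy run loosely mirroring the biometrics example (file names are made up).
hosts = ["asu1", "asu2", "asu3"]
images = partition(["a.tiff", "b.tiff", "c.tiff", "d.tiff"], hosts)
bmps = apply_op(lambda name: name.replace(".tiff", ".bmp"), images)
picked = select_op(lambda name: name[0] in "ab", bmps)
matrix = compare_op(lambda p, q: int(p == q), picked, picked)
```

In this sketch the function travels to each partition and the data never changes hosts during apply or select, which is the property the poster emphasizes; only compare gathers items, and a real system would presumably schedule those pairwise comparisons more carefully.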
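
The three-phase job startup and its resume-on-restart behavior can be illustrated with a small Python sketch. The poster does not say how DataLab implements this, so everything here is an assumption: the `Host` class, the `start_job` function, the `fail_after` parameter used to simulate an interrupted client, and the plain dictionary standing in for the job database.

```python
class Host:
    """Hypothetical stand-in for one active storage unit."""
    def __init__(self, name):
        self.name = name
        self.batch = None       # staged work, not yet running
        self.running = False

    def begin(self, batch):
        self.batch = batch      # phase 2: stage the batch without starting it

    def commit(self):
        self.running = True     # phase 3: start the staged work


def start_job(job_db, job_id, hosts, files, fail_after=None):
    """Run or resume job startup; job_db records per-host progress."""
    # Phase 1: generate a plan for each host and record the job.
    # On a resume the recorded plan is reused rather than regenerated.
    if job_id not in job_db:
        job_db[job_id] = {
            "plan": {h.name: files[i::len(hosts)] for i, h in enumerate(hosts)},
            "state": {h.name: "planned" for h in hosts},
        }
    job = job_db[job_id]
    done = 0

    def advance(host, action, new_state):
        nonlocal done
        # fail_after simulates the client dying after that many steps.
        if fail_after is not None and done >= fail_after:
            raise ConnectionError("client interrupted during job startup")
        action()
        job["state"][host.name] = new_state
        done += 1

    # Phase 2: distribute the batch to every host that has not received it.
    for h in hosts:
        if job["state"][h.name] == "planned":
            advance(h, lambda h=h: h.begin(job["plan"][h.name]), "begun")
    # Phase 3: commit on every host that has begun but not yet committed.
    for h in hosts:
        if job["state"][h.name] == "begun":
            advance(h, h.commit, "committed")
    return job


# Simulate a client that fails after one step, then is restarted.
hosts = [Host("asu1"), Host("asu2")]
db = {}
try:
    start_job(db, "job1", hosts, ["a", "b", "c"], fail_after=1)
except ConnectionError:
    pass
state_after_failure = dict(db["job1"]["state"])
job = start_job(db, "job1", hosts, ["a", "b", "c"])
```

Because progress is recorded per host after each step, restarting the client skips hosts that already reached a given phase and continues with the rest, which is one plausible reading of "jobs that fail during any phase can be resumed by restarting the client".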