Fundamental Operations Scalability and Speedup

DataLab: Active Storage for Data-Driven Scientific Computing
Brandon Rich and Douglas Thain, University of Notre Dame

Many data-intensive scientific computing applications are constrained not by the number of CPU cycles available, but by the ability of the I/O system to deliver data. To serve such applications, we present DataLab, an active storage solution in which a cluster of conventional machines is used primarily for its aggregate I/O capacity. Each node, called an active storage unit (ASU), is equipped with a local disk and processing capability. Large data sets are partitioned across the distributed storage units. Small programs are then dispatched to the location of the data they process, rather than vice versa.

Fundamental Operations
- Apply F on A into B, C, D
- Select F from B into D
- Compare F on X and Y into Z

Example Application in Biometrics
1. Convert 58,000 iris images from TIFF to BMP.
2. Select all images with a particular artifact.
3. Reduce all of those into a feature space.
4. Compare all features against each other to produce a matrix.
5. Retrieve the matrix of values from the system.

Why Active Storage?
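A minimal, in-process sketch of the three fundamental operations shows the idea concretely. The data set is deployed once, the function is sent to each unit's partition, and only results come back. All names here (`StorageUnit`, `ActiveStore`, `deploy`, `gather`) are hypothetical. The "ASUs" are plain Python objects rather than networked machines with local disks. Apply writes a single output set instead of the several (B, C, D) shown above.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class StorageUnit:
    """Hypothetical in-process stand-in for one active storage unit (ASU)."""
    name: str
    sets: Dict[str, List[Any]] = field(default_factory=dict)

    def run(self, func: Callable[[Any], Any], set_name: str) -> List[Any]:
        # The function travels to the unit and runs on its local partition only.
        return [func(item) for item in self.sets.get(set_name, [])]


class ActiveStore:
    """Toy cluster of ASUs exposing Apply, Select and Compare."""

    def __init__(self, n_units: int):
        self.units = [StorageUnit(f"asu{i}") for i in range(n_units)]

    def deploy(self, set_name: str, items: List[Any]) -> None:
        # Deploy once: partition the set round-robin across the units.
        for i, item in enumerate(items):
            unit = self.units[i % len(self.units)]
            unit.sets.setdefault(set_name, []).append(item)

    def apply(self, func: Callable[[Any], Any], src: str, dst: str) -> None:
        # Apply F on src into dst: each unit stores its output locally,
        # so the new set is already partitioned where it was produced.
        for unit in self.units:
            unit.sets[dst] = unit.run(func, src)

    def select(self, pred: Callable[[Any], bool], src: str, dst: str) -> None:
        # Select F from src into dst: matching items stay on the same unit.
        for unit in self.units:
            unit.sets[dst] = [x for x in unit.sets.get(src, []) if pred(x)]

    def gather(self, set_name: str) -> List[Any]:
        # Retrieve a set's contents from every unit back to the client.
        return [x for unit in self.units for x in unit.sets.get(set_name, [])]

    def compare(self, func: Callable[[Any, Any], Any],
                x_set: str, y_set: str) -> List[List[Any]]:
        # Compare F on X and Y into Z: an all-pairs matrix. For brevity this
        # sketch gathers both (already reduced) sets on the client.
        xs, ys = self.gather(x_set), self.gather(y_set)
        return [[func(a, b) for b in ys] for a in xs]
```

In the biometrics example, the TIFF-to-BMP conversion would map to `apply`, the artifact filter to `select`, and the feature matrix to `compare`. In this sketch, only the final comparison pulls data off the units.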
- After deploying data once, you never have to move it again.
- Processing data is a matter of transmitting the function, not moving data.

Useful Abstractions
- Create typed sets and populate them with data files.
- Define functions to act on that data.
- Apply, select, or compare (see figure at right).
- Output can be another data set or a report of results.

[Figure: Architecture]

Fault Tolerance
- Jobs are transaction-oriented, with fault tolerance at both the job and set level.
- We log and report errors on each individual execution.
- Our underlying execution model re-attempts failed executions.
- We maintain state information during job startup so that interrupted jobs can be resumed (see figure below).

What Comes Next?
- Better ways to accommodate adding or removing hosts from the pool.
- Data duplication to avoid data loss and facilitate runtime job optimization.

[Figure: Failure Recovery]

Scalability and Speedup
TIFF to BMP image conversion of 58,000 images over n hosts. Scalability was superlinear on the fast cluster (speedup greater than n on n hosts) and sublinear on the workstations.

Job startup has three phases:
1. Generate an execution plan for each host and record the job in the database.
2. Distribute the execution batch to each host; "begin" the host's work without starting it.
3. Commit the execution on each host.
Jobs that fail during any phase can be resumed by restarting the client.

Presented at High Performance Desktop Computing, 23 June 2008. This work was supported by National Science Foundation Grants CCF-0621434 and CNS-0643229.
Web address: http://www.cse.nd.edu/~ccl
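The three-phase job startup described above, with resumption by restarting the client, can be sketched as follows. This is a minimal illustration under stated assumptions, not DataLab's implementation:

- A JSON file stands in for the job database.
- `Host` objects stand in for remote machines.
- `fail_times` is a hypothetical knob that simulates a transient failure.

Because every completed step is recorded before the next one starts, calling `start_job` again after an exception skips finished work. The "begin" step is idempotent (re-sending a batch just replaces it), so a host contacted just before a crash can safely be contacted again. The pattern loosely resembles a two-phase commit.

```python
import json
import tempfile
from pathlib import Path


class JobDatabase:
    """Hypothetical stand-in for the job database: one JSON file per job."""

    def __init__(self, path):
        self.path = Path(path)

    def load(self):
        return json.loads(self.path.read_text()) if self.path.exists() else None

    def save(self, state):
        # Write a temp file, then rename it over the record, so a crash
        # never leaves a half-written job state behind.
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(state))
        tmp.replace(self.path)


class Host:
    """Hypothetical remote host; fail_times simulates transient failures."""

    def __init__(self, name, fail_times=0):
        self.name = name
        self.fail_times = fail_times
        self.batch = None      # staged work: "begun" but not started
        self.running = False   # becomes True only on commit

    def begin(self, batch):
        if self.fail_times > 0:
            self.fail_times -= 1
            raise ConnectionError(f"{self.name} unreachable")
        self.batch = list(batch)  # idempotent: re-sending replaces the batch

    def commit(self):
        self.running = True


def start_job(db, hosts, tasks):
    """Three-phase startup; calling it again after a failure resumes the job."""
    state = db.load()
    names = sorted(hosts)
    if state is None:
        # Phase 1: plan work for every host and record the job first.
        plan = {name: [] for name in names}
        for i, task in enumerate(tasks):
            plan[names[i % len(names)]].append(task)
        state = {"plan": plan, "begun": [], "committed": []}
        db.save(state)
    # Phase 2: ship each host its batch without starting it.
    for name in names:
        if name not in state["begun"]:
            hosts[name].begin(state["plan"][name])
            state["begun"].append(name)
            db.save(state)
    # Phase 3: commit, i.e. tell each host to start its staged batch.
    for name in names:
        if name not in state["committed"]:
            hosts[name].commit()
            state["committed"].append(name)
            db.save(state)
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        db = JobDatabase(Path(d) / "job.json")
        hosts = {"a": Host("a"), "b": Host("b", fail_times=1)}
        try:
            start_job(db, hosts, ["img1.tif", "img2.tif", "img3.tif"])
        except ConnectionError as err:
            print("client stopped:", err)
        state = start_job(db, hosts, [])  # restart: the saved plan is reused
        print(state["committed"])  # ['a', 'b']
```

On restart, the `tasks` argument is ignored because the recorded plan takes precedence. A fuller version would also retry individual executions, which the poster lists as a separate fault-tolerance mechanism.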