DDN Web Object Scaler for Big Data Management
Shaun de Witt, Roger Downing (STFC), Glenn Wright (DDN)



Topics
A brief introduction to Object Stores
What is WOS
  – Additional features
Integration with Other Tools
Some performance testing metrics
Costs
Wrap up

Object Stores

What is an Object Store?
NOT a traditional file system
  – No hierarchical namespace
      No directory hierarchy
  – No POSIX (by default)
      No ‘ls’, ‘stat’, ‘open’, etc. methods
  – Not mountable (‘by default’)
Mostly limited functionality
  – Read, write, delete
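To make the "read, write, delete" model concrete, here is a minimal sketch of a flat-namespace store. It is a toy in-memory stand-in, not WOS or any real product's API: the class name `FlatObjectStore`, its method names, and the choice to have the store generate the object ID are all assumptions made for illustration.

```python
import uuid


class FlatObjectStore:
    """Toy in-memory store: flat namespace, opaque IDs, three operations."""

    def __init__(self):
        # One flat mapping of object ID -> bytes; no directories, no paths.
        self._objects = {}

    def write(self, data: bytes) -> str:
        # Assumption: the store (not the caller) chooses the object ID.
        oid = uuid.uuid4().hex
        self._objects[oid] = bytes(data)
        return oid

    def read(self, oid: str) -> bytes:
        # Raises KeyError if the ID is unknown.
        return self._objects[oid]

    def delete(self, oid: str) -> None:
        del self._objects[oid]


store = FlatObjectStore()
oid = store.write(b"hello")   # the caller keeps the returned ID
payload = store.read(oid)     # lookup is by ID only; there is nothing to 'ls'
store.delete(oid)
```

Note what is missing: there is no way to list, rename, or partially update an object, which is the "limited functionality" trade-off the slide describes.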

So why do you want one?
Massively scalable
  – 10s of petabytes
  – Billions of objects (that’s UK billions!)
  – Ideal for ‘cloudy’ storage
Easier to administer
  – Survives losses of drives or storage nodes
  – No central point of control
  – Faster rebuild times
Fast access
  – Data distributed across many nodes
  – Fast lookup of object location
Simple interface
  – Write, Read, Delete (again)

So where can I get one? Kinetic

What is WOS?

WOS Summary
DataDirect Networks Web Object Scaler
For the geeks, each WOS ‘appliance’ has
  – Up to 60 drives
  – 4x10GigE, 40GigE, IB
  – Up to 128MB RAM
  – 4U chassis

So what makes it different?
Global, policy based, federation
  – Based on ‘zones’
  – See next slides
Security (Kerberos)
‘Heat map’ of network connections
  – Makes sure you can get to data as fast as possible from wherever you are
Policy based self-healing

A Simple Policy
Make 2 copies in Zone A
[Diagram: Client]

A More Complex Policy
Write to Zone A
Replicate to Zone B
[Diagram: Client, Zone A, Zone B]

More Complex Still…
Write to Zone A
Replicate to Zone B asynchronously
[Diagram: Client, Zone A, Zone B]

And policies can be quite complex…
Synchronously create replicas in zones A and B, then asynchronously create replicas in zones C, D and E
[Diagram: Client, Zone A, Zone B, Zone C, Zone D, Zone E]

Summary of Policies
Based on ‘zones’ which can overlap
Simple to set up web based admin interface
But to make the best use of them…
  – understand your use case
  – have some knowledge of network latencies
  – ask if you really need this functionality
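The difference between synchronous and asynchronous replication in a zone policy can be sketched as a small data structure. This is an illustrative model only: `Policy`, `ZoneCluster`, and `drain` are hypothetical names, zones are plain dictionaries, and nothing here reflects how WOS actually stores or schedules replicas.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    name: str
    sync_zones: tuple          # replicas created before the write returns
    async_zones: tuple = ()    # replicas created later, in the background


class ZoneCluster:
    """Hypothetical model: each zone is a dict of object ID -> data."""

    def __init__(self, zone_names):
        self.zones = {name: {} for name in zone_names}
        self.pending = deque()  # queued (zone, oid, data) async copies

    def write(self, oid, data, policy):
        for zone in policy.sync_zones:
            self.zones[zone][oid] = data          # done before returning
        for zone in policy.async_zones:
            self.pending.append((zone, oid, data))  # deferred

    def drain(self):
        # Stand-in for the background replication work.
        while self.pending:
            zone, oid, data = self.pending.popleft()
            self.zones[zone][oid] = data

    def holders(self, oid):
        return sorted(z for z, objs in self.zones.items() if oid in objs)


# The most complex policy from the slides: sync to A and B, async to C, D, E.
policy = Policy("complex", sync_zones=("A", "B"), async_zones=("C", "D", "E"))
cluster = ZoneCluster("ABCDE")
cluster.write("obj-1", b"data", policy)
holders_before = cluster.holders("obj-1")  # only the synchronous zones
cluster.drain()
holders_after = cluster.holders("obj-1")   # all five zones
```

The window between `write` returning and `drain` completing is where network latency between zones matters, which is why the slide advises knowing your latencies before choosing a policy.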

Data Integrity (per object)

Scheme                     Capacity
Traditional Replication    3x
Local Copy ObjectAssure    1.25x
Replicate ObjectAssure     2.5x
Global ObjectAssure        <1.88x

Each scheme was also rated for performance, efficiency, reliability and scalability.
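As a rough worked example of what these capacity factors mean, the sketch below computes the raw storage needed to hold 100 TB of user data under each scheme. It assumes the capacity factors pair with the schemes in the order they appear on the slide; the 100 TB figure and the function name `raw_capacity_tb` are illustrative, and the Global ObjectAssure value is an upper bound because the slide gives it as "<1.88x".

```python
# Capacity factor per scheme, as read from the slide (pairing is an assumption).
OVERHEAD = {
    "Traditional Replication": 3.0,
    "Local Copy ObjectAssure": 1.25,
    "Replicate ObjectAssure": 2.5,
    "Global ObjectAssure": 1.88,  # slide says "<1.88x": treat as an upper bound
}


def raw_capacity_tb(usable_tb, factor):
    """Raw storage consumed for a given amount of user data."""
    return round(usable_tb * factor, 2)


usable_tb = 100  # hypothetical amount of user data
raw_needed = {
    scheme: raw_capacity_tb(usable_tb, factor)
    for scheme, factor in OVERHEAD.items()
}

for scheme, raw in raw_needed.items():
    print(f"{scheme:26s} {raw:7.2f} TB raw for {usable_tb} TB of data")
```

On these numbers, moving from traditional replication to the local scheme would cut raw capacity from 300 TB to 125 TB for the same data, at whatever cost in the other ratings the slide assigned.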

Integration with Other Tools

iRODS + WOS
User: search, access, add and manage data & metadata
iRODS
  – iRODS Rule Engine: track policies
  – iRODS Metadata Catalogue: track data
WOS
  – Data Locality Manager: policy driven replication
  – Object Supervisor: global data mgmt.

GPFS + WOS
[Diagram: GPFS, NFS/CIFS, WOS Bridge, WOS Access]

Performance Testing

Info and Caveats
Box was on loan from OCF
  – limited testing time
Two WOS appliances available
  – at two separated sites
WOS software has been updated since testing
  – figures are probably better now
Not enough repetition

Testing through iRODS
Namespace scalability
Sequentially add small (100 byte) files with checkpointing every 5k
2.26 million objects inserted in 1 week
  – Performance impacted by erasure encoding security
  – Latest software much more performant
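For scale, the headline figure can be turned into an average insert rate. This is back-of-envelope arithmetic only: it assumes "1 week" means exactly seven days of continuous inserting, which the slide does not state.

```python
objects_inserted = 2_260_000           # 2.26 million objects, from the slide
seconds_per_week = 7 * 24 * 60 * 60    # 604,800 s, assuming a full 7 days

# Average sustained rate over the whole run.
objects_per_second = objects_inserted / seconds_per_week

# Average time per 5k-file checkpoint interval at that rate.
seconds_per_5k = 5000 / objects_per_second

print(f"{objects_per_second:.2f} objects/s on average")
print(f"{seconds_per_5k:.0f} s per 5k files on average")
```

That works out to roughly 3.7 objects per second, or about 22 minutes per 5k-file checkpoint, averaged over the run; the per-checkpoint times actually measured may vary considerably around that mean.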

[Figure: Time to Write 5k Files (secs) against Number of Objects]

Scale-out Tests
Read/Write/Delete jobs on 1GB file
Increasing number of compute nodes
Used wget and curl for all commands
  – First two slides show performance in ‘typical’ compute where jobs are scheduled
  – Third shows impact of all jobs starting concurrently
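The distinction between scheduled jobs and jobs that all start at once can be sketched with a small timing harness. This is not the test setup used here, which ran wget and curl against real appliances with a 1 GB file: the sketch swaps in threads on one machine and an in-memory dictionary, and `run_jobs` and `write_read_delete` are hypothetical names.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

store = {}                  # in-memory stand-in for the object store
store_lock = threading.Lock()


def write_read_delete(job_id):
    """One job: write an object, read it back, delete it."""
    key = f"job-{job_id}"
    with store_lock:
        store[key] = b"x" * 1024   # 1 KiB here, not the 1 GB used in the tests
    with store_lock:
        _ = store[key]
    with store_lock:
        del store[key]


def run_jobs(n_jobs, job, start_together):
    """Run n_jobs workers and return each job's duration in seconds.

    With start_together=True a barrier holds every worker until all are
    ready, modelling the 'all jobs starting concurrently' case.
    """
    barrier = threading.Barrier(n_jobs) if start_together else None

    def worker(job_id):
        if barrier is not None:
            barrier.wait()
        started = time.perf_counter()
        job(job_id)
        return time.perf_counter() - started

    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(worker, range(n_jobs)))


staggered = run_jobs(8, write_read_delete, start_together=False)
concurrent = run_jobs(8, write_read_delete, start_together=True)
```

Comparing the two lists of durations as the worker count grows gives the same shape of result as the slides: per-job time under normal scheduling versus under a simultaneous start.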

More testing
Also tested failover between two sites and loss of disks
Rebuild times very quick compared with our current RAID6

COSTS

Costs
NOT A SALESMAN OR REPRESENTING DDN
But most purchasers know you pay a premium for DDN hardware
BUT… WOS is available software only
  – Can be (and is) run on modern commodity hardware
If you want to know more – contact your local salesman

Wrap-Up

Summary
WOS is potentially a very useful tool
  – Multi site/multi domain data sharing
  – Simple and resilient storage
  – Flexibility in defining rules
  – Good for multiple projects
  – Seems very scalable from initial tests
For more info, check the DDN web pages – web-object-scaler-wos/

Thanks…
To OCF and DDN for supplying the hardware and support
Part of this work was performed on behalf of EUDAT (
… and to all of you for listening