Digital Preservation Partners Meeting July 21, 2010 Andrew Woods DuraCloud: Data Integrity Monitoring in the Cloud.

Presentation transcript:

Digital Preservation Partners Meeting July 21, 2010 Andrew Woods DuraCloud: Data Integrity Monitoring in the Cloud

Overview
What is DuraCloud?
Fixity service use case
Basic flow
Cost and performance
Next steps

What is it?
Cloud-based service offered by the not-for-profit organization DuraSpace
An open source cloud storage/compute application
  – Focused on preservation support and
  – Data access for reuse and sharing
Cloud storage across multiple commercial & non-commercial providers
An open canvas for cloud-based services

Fixity use case
DuraCloud user has replicated content across one or more cloud stores
Need for periodic verification of bit integrity
Seeking balance between cost & trust

0: Content Topology

1: Data load

1a: Replicate

1b: MD5 export

2: Determine MD5s*...running fixity service

3: Compare & Report
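
The compare-and-report step can be sketched as follows. This is a minimal illustration, not DuraCloud's implementation: the function name `compare_manifests`, the dictionary-based manifest format (content ID mapped to MD5 hex digest), and the four report categories are all assumptions made for the example.

```python
from typing import Dict, List


def compare_manifests(expected: Dict[str, str],
                      actual: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two {content_id: md5_hex} listings and bucket the results.

    Hypothetical helper: 'expected' stands in for the exported MD5 listing,
    'actual' for the MD5s determined by a later fixity run.
    """
    report: Dict[str, List[str]] = {
        "match": [], "mismatch": [], "missing": [], "unexpected": []
    }
    for content_id, md5 in expected.items():
        if content_id not in actual:
            # Item was listed originally but was not found in the later run.
            report["missing"].append(content_id)
        elif actual[content_id].lower() == md5.lower():
            # Hex digests are compared case-insensitively.
            report["match"].append(content_id)
        else:
            report["mismatch"].append(content_id)
    # Items present in the later run but absent from the original listing.
    report["unexpected"] = [cid for cid in actual if cid not in expected]
    for bucket in report.values():
        bucket.sort()
    return report
```

Keeping "missing" and "unexpected" separate from "mismatch" lets a report distinguish a corrupted item from one that was deleted or added between runs.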

0: Trust vs. Cost
Trust in...
  – Underlying storage providers
  – DuraCloud and open source software
  – Requester of service (administrator)

1: Trust vs. Cost
Three approaches:
  – Request stored value [inexpensive & fast]
  – Stream out content & re-calculate [compute intensive & slow]
  – Stream out content & re-calculate with salt [user intensive, compute intensive & slow]
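
The three approaches above could look roughly like this in code. It is a sketch under stated assumptions rather than the service's actual logic: the function names are hypothetical, content is modeled as an in-memory byte stream instead of a cloud store, and the salt is assumed to be prepended to the content before hashing (the slides do not say how the salt is applied).

```python
import hashlib
import io
from typing import BinaryIO


def md5_of_stream(stream: BinaryIO, salt: bytes = b"",
                  chunk_size: int = 8192) -> str:
    """Stream content through MD5 in chunks; an optional salt is hashed first."""
    digest = hashlib.md5(salt)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_stored(stored_md5: str, expected_md5: str) -> bool:
    """Approach 1: trust the checksum the storage provider reports."""
    return stored_md5.lower() == expected_md5.lower()


def verify_recalculated(stream: BinaryIO, expected_md5: str) -> bool:
    """Approach 2: stream the content out and recompute its MD5."""
    return md5_of_stream(stream) == expected_md5.lower()


def verify_salted(stream: BinaryIO, salt: bytes,
                  expected_salted_md5: str) -> bool:
    """Approach 3: recompute with a requester-chosen salt.

    The requester must compute the expected salted value independently,
    which is why this approach costs the user more effort. A previously
    stored or cached plain MD5 cannot satisfy this check.
    """
    return md5_of_stream(stream, salt=salt) == expected_salted_md5.lower()


if __name__ == "__main__":
    content = b"example preserved content"
    plain = hashlib.md5(content).hexdigest()
    salted = hashlib.md5(b"nonce-123" + content).hexdigest()
    print(verify_stored(plain, plain))
    print(verify_recalculated(io.BytesIO(content), plain))
    print(verify_salted(io.BytesIO(content), b"nonce-123", salted))
```

The cost ordering on the slide follows from how much data moves: the first approach reads only metadata, while the second and third read every byte of every item.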

2: Determine MD5s*
Options for providing expected MD5
  – With initial listing
  – After MD5 calculation

2a: MD5 at non-primary
Additional cost of processing content not local to compute

Next steps
Scalability
  – MD5 calculation across Hadoop cluster
Multi-administration efficiency
  – On-demand compute at secondary provider
Event logging

Thank you
Requesting comments & review