Virtual Global File System


Background & Short Story
- Launched in 2011 as a skunkworks project at Cambridge Computer
- Spun out into a separate company in January 2014
- $5M investment
- Inspired by SRB and iRODS, as well as home-grown software written by our many clients
- 100% original code, built from the ground up by Starfish Storage
- Early customers (run as a managed service by our R&D team)
  - Harvard Med School
  - US Library of Congress

2016 – Moving Toward GA Release
- The software has been running in production for approximately 5 years.
- Foundation for managed services
  - Archiving
  - Fixity checking
  - Other file system rules automation
- Professional services tool
  - File system analysis
  - Data migrations
- We are working aggressively toward a GA release.
- In the meantime, the product is very usable.
- Get in on our early adopter program!

Technical Leadership
- Jacob Farmer, Founder and Chief Evangelist
  - CTO for Cambridge Computer
  - Storage industry veteran, 30 years' experience
- Don Preuss, CTO for Starfish Storage
  - Former CTO of NIH (US National Institutes of Health)
  - Formerly head of systems at the NIH National Center for Biotechnology Information (NCBI) at the National Library of Medicine
    - PubMed
    - dbGaP
    - SRA

Target Markets
- Research Computing, Digital Libraries
  - Policy-based management of all files across a large, diverse institution
  - Capture metadata throughout the scientific pipeline
  - Curate files in place
    - It is not necessary to transfer files to an archival storage facility
- Digital Libraries
  - File system middleware for automating the housekeeping of digital collections
- Enterprise Computing
  - Storage management for NAS and large-scale file systems

The Grand Vision
Manage the Lifecycle of Institution-scale File Collections
- Publication/Preservation (Librarians, Archivists, Curators)
  - Open Links / DOIs, Metadata Extraction, Curation Workflows, Version Controls, Access Controls, Fixity Checks
- Content Creation (Scientists, Engineers, Artists)
  - Metadata Tagging, Workflow Automation, Data Management Plans, Open Access, Data Reusability, Collaboration
- IT Operations (Storage & Backup Administrators, IT Governance)
  - Data Movement, Tiered Storage, Backup, Restore, Data Migration, Governance, Permissions Management, Auditing, Chargeback / Show-back, Reporting, Capacity Planning, Aging / Utilization, File System Analysis

What Starfish Does and How It Works
- Modular architecture designed to enable highly customized file-based workflows
- Also designed to be useful right out of the box

Starfish Is Made Up of 3 Main Components
- File System Catalog
  - Syncs the metadata of your file systems to a database
  - Allows you to assign additional metadata to files and directories via API
- Jobs Manager
  - Runs batch jobs that act on your files
  - Enforces policies
- Access Gateways
  - Provide alternative methods for addressing and accessing files
  - Allow programmatic manipulation of access controls

The File System Catalog
- Starfish scans your file systems and builds a database that mirrors their metadata.
- You can add metadata of your own (via API) that better describes the files and directories.
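The scan-to-database idea can be illustrated with a minimal sketch. This is not Starfish's implementation or API: the `scan_tree` function, the `entries` table, and its columns are hypothetical, and SQLite stands in for whatever database the product actually uses.

```python
import os
import sqlite3
import stat


def scan_tree(root, conn):
    """Walk `root` and store one row of stat metadata per file and directory.

    Hypothetical schema, for illustration only.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        "path TEXT PRIMARY KEY, is_dir INTEGER, size INTEGER, mtime REAL)"
    )
    rows = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)  # lstat: describe symlinks, do not follow them
            rows.append((full, int(stat.S_ISDIR(st.st_mode)), st.st_size, st.st_mtime))
    # INSERT OR REPLACE keeps a rescan idempotent: one row per path.
    conn.executemany("INSERT OR REPLACE INTO entries VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

A call such as `scan_tree("/data", sqlite3.connect("catalog.db"))` would populate the table. A single-threaded walk like this would not scale to billions of files; it only shows the shape of the data being captured.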

What Makes the File System Catalog Awesome?
- Massively scalable
  - Handles billions of files
  - Multi-threaded and multi-host for greater parallelization
  - Highly tunable and configurable
- Agents for specific file systems
  - Agents capture file system events, reducing the need to crawl and compare
  - Agents capture device-specific metadata
- Versioning
  - We track version changes in the file system
  - The next release of our catalog supports directory tree versioning
  - Navigate the file system tree through time
  - Reconcile path names, even if the directory has been moved or renamed

What Can You Do With Metadata?
- Generate more meaningful reports
  - Utilization, aging, trending, chargeback / show-back
- Inventory management for your files
  - Content classification
  - Deeper understanding of file system contents
- Rules-based administration
  - Backup, archiving, publishing, processing
- Addressing and retrieval
  - A virtual global namespace
  - Search
- Tracking provenance
- Facilitate collaboration
  - Define meaningful subsets of files and directories
  - Enable more purposeful access controls (compared with LDAP/POSIX/ACLs)
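As a rough illustration of the reporting use case, the sketch below builds a per-owner aging report from catalog-style records. The record layout `(owner, size_bytes, mtime_epoch)`, the bucket boundaries, and the function name are all assumptions made for the example, not part of the product.

```python
import time
from collections import defaultdict

# Hypothetical age buckets: (upper bound in days, label).
AGE_BUCKETS = [(30, "0-30d"), (365, "31-365d")]
OLDEST_LABEL = ">365d"


def aging_report(entries, now=None):
    """Sum bytes per owner and age bucket.

    `entries` is an iterable of (owner, size_bytes, mtime_epoch) tuples.
    """
    now = time.time() if now is None else now
    report = defaultdict(lambda: defaultdict(int))
    for owner, size, mtime in entries:
        age_days = (now - mtime) / 86400
        label = next(
            (lbl for limit, lbl in AGE_BUCKETS if age_days <= limit),
            OLDEST_LABEL,
        )
        report[owner][label] += size
    return {owner: dict(buckets) for owner, buckets in report.items()}
```

The same totals, multiplied by a cost per byte, would give a simple chargeback or show-back figure per owner.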

Where Does the Metadata Come From?
- Metadata can be added via the API
- Metadata can be inherited from a parent directory
- A job can extract metadata from a file and update the database via the API
  - Examples
    - Reading tags from a TIFF file
    - Extracting codecs from video files
- An externally defined workflow might make API calls to add metadata to the catalog as files are created and manipulated
- Worst case, metadata can be entered by hand through a GUI
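Inheritance from a parent directory can be pictured as a merge of tags along the path from the root down to the file. The sketch below assumes a plain dictionary of tags keyed by POSIX path and a rule that the nearest entry wins; both are illustrative assumptions, since the slides do not describe how the catalog resolves inherited values.

```python
import posixpath


def effective_tags(path, tags_by_path):
    """Return the tags that apply to `path`, including inherited ones.

    Tags set on ancestors are applied first, so a tag set closer to
    `path` overrides the same key set higher up the tree.
    """
    chain = []
    current = path
    while True:
        chain.append(current)
        parent = posixpath.dirname(current)
        if parent == current:  # reached "/" (or "" for relative paths)
            break
        current = parent
    merged = {}
    for ancestor in reversed(chain):  # root first, `path` itself last
        merged.update(tags_by_path.get(ancestor, {}))
    return merged
```

For example, tagging `/lab` with a project name makes every file under `/lab` report that project unless a deeper directory or the file itself overrides it.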

The Jobs Manager

The Jobs Manager
- Jobs work by querying the database on an ad hoc or scheduled basis.
- Query results go into a queue.
- Agents take work out of the queue if they can mount the file systems in question.
- You can have any number of agents.
  - They are easy to install. The essential components discover each other automatically.
  - They can run on dedicated hardware or virtual machines.
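The queue-and-agent flow described above can be sketched in a few lines. The `Agent` class, the `(volume, path)` work items, and the least-loaded assignment rule are assumptions for illustration; in the real system agents pull work themselves, whereas this sketch simulates that with a single dispatcher.

```python
from collections import deque


class Agent:
    """A worker that can only handle volumes it is able to mount."""

    def __init__(self, name, mounts):
        self.name = name
        self.mounts = set(mounts)


def dispatch(work_items, agents):
    """Drain a queue of (volume, path) items to eligible agents.

    Each item goes to the least-loaded agent that mounts its volume.
    Items whose volume no agent mounts are returned as unclaimed.
    """
    queue = deque(work_items)
    assigned = {agent.name: [] for agent in agents}
    unclaimed = []
    while queue:
        item = queue.popleft()
        volume, _path = item
        eligible = [a for a in agents if volume in a.mounts]
        if not eligible:
            unclaimed.append(item)
            continue
        target = min(eligible, key=lambda a: len(assigned[a.name]))
        assigned[target.name].append(item)
    return assigned, unclaimed
```

Adding an agent that mounts a previously unreachable volume is enough for its items to be picked up on the next pass, which mirrors the "any number of agents" point on the slide.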

What Kinds of Jobs Can You Run?
- Copy-Move-Delete
  - With versioning enabled, a copy command works like enterprise backup, but is much more powerful and flexible
  - To and from POSIX file systems and object / cloud stores
- Calculate hashes
  - Fixity checking (data integrity checking)
  - Duplicate file detection
  - Content addressing (address files by their hashes)
- Metadata extraction
- Format conversion
- Anything you can imagine
  - Refactor your batch scripts as Starfish commands
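The hash-based jobs listed above all rest on the same primitive: a content hash per file. The sketch below uses SHA-256 from the Python standard library; the choice of algorithm and the function names are assumptions, as the slides do not say which hashes Starfish computes.

```python
import hashlib
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_failures(recorded):
    """Return the paths whose current hash differs from the recorded one.

    `recorded` maps path -> hex digest captured at an earlier time.
    """
    return [p for p, expected in recorded.items() if sha256_of(p) != expected]


def find_duplicates(paths):
    """Group paths by content hash and return the groups with more than one member."""
    groups = defaultdict(list)
    for p in paths:
        groups[sha256_of(p)].append(p)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```

The digest returned by `sha256_of` can also serve as a content address: two paths with the same digest refer to the same bytes, whatever they are called.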

Starfish System Topology

- Out-of-band for live file systems
- In-band for backup and archive

A “Virtual” Global File System

Common Use Cases for Digital Libraries
- Backup and Restore for Large File Systems
  - Granular control of backup policies
  - Ensure data integrity with hash comparisons
  - Direct files to the appropriate backup target based on metadata
- Tiered Storage
  - Migrate files to archival tiers while maintaining the namespace
- Reporting
  - Utilization, Aging, Duplicate Files
- Access Controls
- Fixity Checking
- Metadata Extraction

Bragging Rights
- The software typically installs in less than 10 minutes. The major components discover each other automatically.
- An initial scan of a billion files typically takes less than a day.
- Our largest single installation has over 8 billion file system objects.
- The upgrade process is invoked by a single command from the CLI.
- We have over 25 sites using the software as of October 2016.
  - Most are top-tier data centers and/or household names