Virtual Global File System


1 Virtual Global File System

2 Background & Short Story
Launched in 2011 as a skunkworks project at Cambridge Computer
Spun out into a separate company in January 2014
$5M investment
Inspired by SRB and iRODS, as well as home-grown software written by our many clients
100% original code, built from the ground up by Starfish Storage
Early customers (run as a managed service by our R&D team)
  Harvard Med School
  US Library of Congress

3 2016 – Moving Toward GA Release
The software has been running in production for approximately 5 years.
  Foundation for managed services
    Archiving
    Fixity checking
    Other file system rules automation
  Professional services tool
    File system analysis
    Data migrations
We are working aggressively toward a GA release
  But in the meantime, the product is very usable
  Get in on our early adopter program!

4 Technical Leadership
Jacob Farmer, Founder and Chief Evangelist
  CTO for Cambridge Computer
  Storage industry veteran, 30 years’ experience
Don Preuss, CTO for Starfish Storage
  Former CTO of NIH (US National Institutes of Health)
  Formerly head of systems at the NIH National Center for Biotechnology Information (NCBI) at the National Library of Medicine
    PubMed
    dbGaP
    SRA

5 Target Markets
Research Computing
  Policy-based management of all files across a large, diverse institution
  Capture metadata throughout the scientific pipeline
  Curate files in place
    It is not necessary to transfer files to an archival storage facility
Digital Libraries
  File system middleware for automating the housekeeping of digital collections
Enterprise Computing
  Storage management for NAS and large-scale file systems

6 The Grand Vision
Manage the Lifecycle of Institution-scale File Collections
Publication/Preservation (Librarians, Archivists, Curators)
  Open Links / DOIs, Metadata Extraction, Curation Workflows, Version Controls, Access Controls, Fixity Checks
Content Creation (Scientists, Engineers, Artists)
  Metadata Tagging, Workflow Automation, Data Management Plans, Open Access, Data Reusability, Collaboration
IT Operations (Storage & Backup Administrators, IT Governance)
  Data Movement, Tiered Storage, Backup, Restore, Data Migration, Governance, Permissions Management, Auditing, Chargeback / Show-back, Reporting, Capacity Planning, Aging / Utilization, File System Analysis

7 What Starfish Does And How it Works
Modular Architecture Designed to Enable Highly Customized File-based Workflows
Also Designed to be Useful Right out of the Box

8 Starfish is Made Up of 3 Main Components
File System Catalog
  Syncs the metadata of your file systems to a database
  Allows you to assign additional metadata to files and directories via API
Jobs Manager
  Runs batch jobs that act on your files
  Enforces policies
Access Gateways
  Provide alternative methods for addressing and accessing files
  Allow programmatic manipulation of access controls

9 The File System Catalog
Starfish scans your file systems and builds a database that reflects their metadata. You can add additional metadata (via API) that better describes the files and directories.
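The scan-and-sync idea can be illustrated with a minimal sketch. This is not Starfish code or its API: the `catalog` and `tags` tables, and the function names `scan_tree` and `add_tag`, are assumptions made up for this example, using only the Python standard library.

```python
import os
import sqlite3
import stat


def _ensure_schema(conn):
    # Hypothetical schema: one row per file system object, plus user tags.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS catalog ("
        "path TEXT PRIMARY KEY, is_dir INTEGER, size INTEGER, mtime REAL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tags ("
        "path TEXT, name TEXT, value TEXT, PRIMARY KEY (path, name))"
    )


def scan_tree(root, conn):
    """Walk `root` and upsert one catalog row per file or directory below it."""
    _ensure_schema(conn)
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for entry in dirnames + filenames:
            full = os.path.join(dirpath, entry)
            st = os.lstat(full)  # do not follow symlinks
            conn.execute(
                "INSERT OR REPLACE INTO catalog VALUES (?, ?, ?, ?)",
                (full, int(stat.S_ISDIR(st.st_mode)), st.st_size, st.st_mtime),
            )
            count += 1
    conn.commit()
    return count


def add_tag(conn, path, name, value):
    """Attach an extra name/value pair to a cataloged path."""
    _ensure_schema(conn)
    conn.execute("INSERT OR REPLACE INTO tags VALUES (?, ?, ?)", (path, name, value))
    conn.commit()
```

Rescanning the same tree updates rows in place rather than duplicating them, because the path is the primary key. A production catalog at the scale of billions of files would need a very different design; this only shows the shape of the data.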

10 What Makes the File System Catalog Awesome?
Massively scalable
  Handles billions of files
  Multi-threaded and multi-host for greater parallelization
  Highly tunable and configurable
Agents for specific file systems
  Agents capture file system events, reducing the need to crawl and compare
  Agents capture device-specific metadata
Versioning
  We track version changes in the file system
  The next release of our catalog supports directory tree versioning
    Navigate the file system tree throughout time
    Reconcile path names, even if the directory has been moved or renamed

11 What Can You Do With Metadata?
Generate more meaningful reports
  Utilization, aging, trending, chargeback / show-back
Inventory management for your files
Content classification
  Deeper understanding of file system contents
Rules-based administration
  Backup, archiving, publishing, processing
Addressing and retrieval
  A virtual global namespace
  Search
Tracking provenance
Facilitate collaboration
  Define meaningful subsets of files and directories
  Enable more purposeful access controls (compared with LDAP/POSIX/ACLs)
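As one illustration of a metadata-driven report, an aging report can be computed from nothing more than file sizes and modification times. The sketch below is an assumption about how such a report might be built, not Starfish's reporting code; the function name `aging_report` and the bucket boundaries are made up for the example.

```python
def aging_report(entries, now, edges=(30, 365)):
    """Sum bytes per age bucket.

    `entries` is an iterable of (size_bytes, mtime_epoch_seconds) pairs,
    `now` is the reference time in epoch seconds, and `edges` are the
    bucket boundaries in days, in ascending order.
    """
    labels = (
        [f"<{edges[0]}d"]
        + [f"{lo}-{hi}d" for lo, hi in zip(edges, edges[1:])]
        + [f">={edges[-1]}d"]
    )
    totals = dict.fromkeys(labels, 0)
    for size, mtime in entries:
        age_days = (now - mtime) / 86400
        # The bucket index is the number of boundaries the age has reached.
        index = sum(age_days >= edge for edge in edges)
        totals[labels[index]] += size
    return totals
```

Feeding it rows pulled from a catalog gives the bytes that have not been modified in each age range, which is the raw input for tiering or chargeback decisions.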

12 Where Does the Metadata Come From?
Metadata can be added via the API
Metadata can be inherited from a parent directory
A job can be executed that extracts metadata from a file and updates the database via API
  Examples
    Reading tags from a TIFF file
    Extracting codecs from video files
An externally defined workflow might make API calls to add metadata to the catalog as files are created and manipulated
Worst case, metadata can be entered manually through a GUI
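Inheritance from a parent directory can be pictured as merging tags along the path from the root down to the file. The sketch below is one plausible way to resolve that, assuming tags are stored per path in a plain dictionary and that nearer entries override farther ones; the slides do not say how Starfish resolves conflicts, so the override rule is an assumption.

```python
from pathlib import PurePosixPath


def effective_metadata(path, tags):
    """Merge tags from the root down to `path`; nearer entries override.

    `tags` maps a path string to a dict of name/value pairs
    (a hypothetical stand-in for the catalog).
    """
    target = PurePosixPath(path)
    # `parents` lists the nearest ancestor first, so reverse it to start at the root.
    chain = list(reversed(list(target.parents))) + [target]
    merged = {}
    for node in chain:
        merged.update(tags.get(str(node), {}))
    return merged


tags = {
    "/lab": {"project": "genomics", "retention": "5y"},
    "/lab/run1/img.tiff": {"retention": "10y"},
}
# The file inherits "project" from /lab and overrides "retention" itself.
print(effective_metadata("/lab/run1/img.tiff", tags))
```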

13 The Jobs Manager

14 The Jobs Manager
Jobs work by querying the database on an ad hoc or scheduled basis
  Query results go into a queue
  Agents take work out of the queue if they can mount the file systems in question
You can have any number of agents
  They are easy to install. The essential components discover each other automatically
  They can run on dedicated hardware or virtual machines
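The pull model described here, where an agent only takes work for file systems it can mount, can be sketched in a few lines. This is an in-memory toy, not the Starfish Jobs Manager: the `Agent` class, the `claim` and `run` functions, and the volume names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    mounts: set  # volumes this agent can reach
    done: list = field(default_factory=list)


def claim(queue, agent):
    """Remove and return the first queued (volume, path) item the agent can mount."""
    for i, (volume, path) in enumerate(queue):
        if volume in agent.mounts:
            del queue[i]
            return (volume, path)
    return None


def run(queue, agents):
    """Let agents take turns claiming work until no agent can claim anything."""
    progress = True
    while queue and progress:
        progress = False
        for agent in agents:
            item = claim(queue, agent)
            if item is not None:
                agent.done.append(item)
                progress = True
    return list(queue)  # items no agent could mount
```

Items left over after `run` returns are the ones for which no agent mounts the volume, which is a signal to deploy another agent rather than an error in the queue.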

15 What Kinds of Jobs Can You Run?
Copy-Move-Delete
  With versioning enabled, a copy command is just like enterprise backup, except much more powerful and flexible
  To and from POSIX file systems and object / cloud stores
Calculate hashes
  Fixity checking (data integrity checking)
  Duplicate file detection
  Content addressing (address files by their hashes)
Metadata extraction
Format conversion
Anything you can imagine
  Refactor your batch scripts as Starfish commands
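The hash-based jobs share one building block: a content digest per file. The sketch below shows how fixity checking and duplicate detection both fall out of it. SHA-256 is chosen here only for illustration, since the slides do not say which hash Starfish uses, and the function names are hypothetical.

```python
import hashlib
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_failures(recorded):
    """Return the paths whose current digest no longer matches the recorded one.

    `recorded` maps path -> digest captured at an earlier time.
    """
    return [path for path, digest in recorded.items() if file_digest(path) != digest]


def duplicates(paths):
    """Group paths by digest and return only the groups with more than one member."""
    groups = defaultdict(list)
    for path in paths:
        groups[file_digest(path)].append(path)
    return [group for group in groups.values() if len(group) > 1]
```

Content addressing uses the same digest as the lookup key: a dictionary from digest to path list, as built inside `duplicates`, is already a small content-addressed index.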

16 Starfish System Topology

17 Out-of-band for Live File Systems, In-band for Backup and Archive

18 A “Virtual” Global File System

19 Common Use Cases For Digital Libraries
Backup and Restore for Large File Systems
  Granular control of backup policies
  Ensure data integrity with hash compares
  Direct files to appropriate backup target based on metadata
Tiered Storage
  Migrate files to archival tiers while maintaining namespace
Reporting
  Utilization, Aging, Duplicate Files
Access controls
Fixity Checking
Metadata Extraction
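Directing files to a backup target based on metadata amounts to evaluating an ordered list of rules against each file's tags. The following is a minimal sketch under that assumption; the rule format, the target names, and the function `choose_target` are invented for illustration and do not reflect Starfish's policy syntax.

```python
def choose_target(metadata, rules, default="standard-backup"):
    """Return the target of the first rule whose conditions all match.

    `rules` is an ordered list of (conditions, target) pairs, where
    `conditions` is a dict of metadata names to required values.
    """
    for conditions, target in rules:
        if all(metadata.get(name) == value for name, value in conditions.items()):
            return target
    return default


# Hypothetical policy: first match wins, so list the most specific rules first.
rules = [
    ({"collection": "masters", "format": "tiff"}, "tape-archive"),
    ({"collection": "masters"}, "object-store"),
]
print(choose_target({"collection": "masters", "format": "tiff"}, rules))
```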

20 Bragging Rights
The software typically installs in less than 10 minutes. The major components discover each other automatically.
We can typically perform an initial scan of a billion files in less than a day.
Our largest single installation has over 8 billion file system objects.
The upgrade process is invoked by a single command from the CLI.
We have over 25 sites using the software as of October 2016.
  Most are top-tier data centers and/or household names

