Seafile - Scalable Cloud Storage System

Slides:



Advertisements
Similar presentations
Introduction To GIT Rob Di Marco Philly Linux Users Group July 14, 2008.
Advertisements

© Copyright 2012 STI INNSBRUCK Apache Lucene Ioan Toma based on slides from Aaron Bannert
G O O G L E F I L E S Y S T E M 陳 仕融 黃 振凱 林 佑恩 Z 1.
Mobile Sync Cloud Alejandro M. Ramallo - Group Head of Technology, BAT Phil Shotton - Cloudscape.
© 2014 VMware Inc. All rights reserved. BlazeMeter Load Testing Solution with vCloud Air High-level Overview Jan 2015.
om om GIT - Tips & Tricks / git.dvcs git-tips.com
SOFTWARE PRESENTATION ODMS (OPEN SOURCE DOCUMENT MANAGEMENT SYSTEM)
Low-Cost Data Deduplication for Virtual Machine Backup in Cloud Storage Wei Zhang, Tao Yang, Gautham Narayanasamy University of California at Santa Barbara.
Acceleratio Ltd. is a software development company based in Zagreb, Croatia, founded in Acceleratio specializes in developing high-quality enterprise.
LYU0101 Wireless Digital Information System Lam Yee Gordon Yeung Kam Wah Supervisor Prof. Michael Lyu Second semester FYP Presentation 2001~2002.
Other File Systems: AFS, Napster. 2 Recap NFS: –Server exposes one or more directories Client accesses them by mounting the directories –Stateless server.
Google Bigtable A Distributed Storage System for Structured Data Hadi Salimi, Distributed Systems Laboratory, School of Computer Engineering, Iran University.
ObjectStore Martin Wasiak. ObjectStore Overview Object-oriented database system Can use normal C++ code to access tuples Easily add persistence to existing.
Distributed File System: Data Storage for Networks Large and Small Pei Cao Cisco Systems, Inc.
Database System Architectures  Client-server Database System  Parallel Database System  Distributed Database System Wei Jiang.
Secure storage for your data in the Internet! If you have any question, you can contact us on: om.
Barracuda Networks Confidential1 Barracuda Backup Service Integrated Local & Offsite Data Backup.
MetaSync File Synchronization Across Multiple Untrusted Storage Services Seungyeop Han Haichen Shen, Taesoo Kim*, Arvind Krishnamurthy,
Network File System (NFS) in AIX System COSC513 Operation Systems Instructor: Prof. Anvari Yuan Ma SID:
File Systems (2). Readings r Silbershatz et al: 11.8.
Russ Houberg Senior Technical Architect, MCM KnowledgeLake, Inc.
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google∗
22-Aug-15 | 1 |1 | Help! I need more servers! What do I do? Scaling a PHP application.
Programming in Teams And how to manage your code.
1 The Google File System Reporter: You-Wei Zhang.
AnyShare File Synchronization and Sharing for Enterprise
CSC 456 Operating Systems Seminar Presentation (11/13/2012) Leon Weingard, Liang Xin The Google File System.
Copyright ®xSpring Pte Ltd, All rights reserved Versions DateVersionDescriptionAuthor May First version. Modified from Enterprise edition.NBL.
#devshark welcome to #devshark. #devshark HELLO! I’M Ville Rauma Fingersoft Product Owner Web
Distributed Systems. Interprocess Communication (IPC) Processes are either independent or cooperating – Threads provide a gray area – Cooperating processes.
M i SMob i S Mob i Store - Mobile i nternet File Storage Platform Chetna Kaur.
Sofia, Bulgaria | 9-10 October SQL Server 2005 High Availability for developers Vladimir Tchalkov Crossroad Ltd. Vladimir Tchalkov Crossroad Ltd.
Internet Information Services 7.0 Infrastructure Planning and Design Series.
GIT An introduction to GIT Source Control. What is GIT (1 of 2) ▪ “Git is a free and open source distributed version control system designed to handle.
Tejasvi Kumar Technology Specialist – VSTS Microsoft Corporation
1 Introductory Notes on the Git Source Control Management Ric Holt, 8 Oct 2009.
Bonrix SMPP Client. Index Introduction Software and Hardware Requirements Architecture Set Up Installation HTTP API Features Screen-shots.
Windows Azure Conference 2014 Deploy your Java workloads on Windows Azure.
Large Scale Test of a storage solution based on an Industry Standard Michael Ernst Brookhaven National Laboratory ADC Retreat Naples, Italy February 2,
Distributed File Systems Overview  A file system is an abstract data type – an abstraction of a storage device.  A distributed file system is available.
Introduction to DFS. Distributed File Systems A file system whose clients, servers and storage devices are dispersed among the machines of a distributed.
Amit Warke Jerry Philip Lateef Yusuf Supraja Narasimhan Back2Cloud: Remote Backup Service.
SWGData and Software Access - 1 UCB, Nov 15/16, 2006 THEMIS SCIENCE WORKING TEAM MEETING Data and Software Access Ken Bromund GST Inc., at NASA/GSFC.
Future home directories at CERN
GLOBAL EDGE SOFTWERE LTD1 R EMOTE F ILE S HARING - Ardhanareesh Aradhyamath.
FCM Workflow using GCM.
Sofia Event Center May 2014 Martin Kulov Git For TFS Developers.
Digital Vault Kick-off 02/12/2015.  Fast & scalable object level storage  Secure content persistence  Secure bi-directional content sharing  Secure.
A Technical Overview Bill Branan DuraCloud Technical Lead.
Introduction TO Network Administration
Getting Started with your Cloud File Sync Tool. Part I: Getting Started.
AFS/OSD Project R.Belloni, L.Giammarino, A.Maslennikov, G.Palumbo, H.Reuter, R.Toebbicke.
Presenter: Yue Zhu, Linghan Zhang A Novel Approach to Improving the Efficiency of Storing and Accessing Small Files on Hadoop: a Case Study by PowerPoint.
Wingu A Synchronization Layer For Safe Concurrent Access to Cloud Storage Craig Chasseur and Allie Terrell December 20, 2010.
GIT: (a)Gentle InTroduction Bruno Bossola. Agenda About version control Concepts Working locally Remote operations Enterprise adoption Q&A.
Introduction to threads
File Syncing Technology Advancement in Seafile -- Drive Client and Real-time Backup Server Johnathan Xu CTO, Seafile Ltd.
Scaling HDFS to more than 1 million operations per second with HopsFS
BEST CLOUD COMPUTING PLATFORM Skype : mukesh.k.bansal.
Slicer: Auto-Sharding for Datacenter Applications
Large Scale Parallel Print Service
Hybrid Cloud Architecture for Software-as-a-Service Provider to Achieve Higher Privacy and Decrease Securiity Concerns about Cloud Computing P. Reinhold.
Simple Storage Service
XenFS Sharing data in a virtualised environment
Version Control System
Replication Middleware for Cloud Based Storage Service
NFS and AFS Adapted from slides by Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift.
Ch 9 – Distributed Filesystem
OpenStack for the Enterprise
Presentation transcript:

Seafile - Scalable Cloud Storage System Johnathan Xu Seafile Ltd.

Agenda Seafile Introduction Feature Overview System Design & Performance Roadmap

What is Seafile? VS Seafile is a FAST, SCALABLE, and PRIVATE file sync & share solution

What can Seafile do? Fast and reliable file sync between cloud and devices Scales to millions of files, PB class storage High performance, light weight Productive file collaboration Groups File prview, discussion Message and notification

Who are using Seafile? https://github.com/haiwen/seafile 2400+ stars Estimated at least 100K users worldwide, most in Europe Universities in Rhineland-Palatine (Germany) belgian royal institute of natural sciences

Agenda Seafile Introduction Feature Overview System Design & Performance Roadmap

File Sync and Share Files are organized into Libraries Selective sync library to devices Sync with existing folder Client-side end-to-end data encryption Full platform support: Win, OSX, Linux, mobile Share to a person or a group Read-write and read-only share LDAP/AD integration

View all your libraries in the home page

All libraries shared to a group

Desktop Client Selective sync library Cloud file browser Starred files Notifications

Desktop Client

Collaboration File activities Group discussion File discussion Message notifications

File Activities

Message Notifications

Agenda Seafile Introduction Feature Overview System Design & Performance Roadmap

Server Architecture Seafile is a “file system” built on top of object storage Non-POSIX, User space, Light weight

File System Design Data model similar to Git Head Commit ID Relational DB SHA-1 ID Object Storage Data model similar to Git

Design Advantage Object storage is more scalable than file system Heavy DB + Filesystem v.s. Light DB + Object Storage No database bottleneck Metadata is in object storage Filesystem level versioning v.s. File-level versioning File system designed for syncing Storage/Network deduplication No upload/download limit, fast upload Backend daemons implemented in C

Deduplication Dedup with Content Defined Chunking (CDC) algorithm Only store/send delta between file system snapshots Back link Commit 1 Commit 2 Dir Dir File 1 v1 File 2 File 1 v2 Block 1 Block 2 Block 3 Block 4 Block 5

Cluster Architecture Seafile server is stateless, scales horizontally MySQL cluster Ceph/Swift/S3 Load Balancer Seafile Servers Seafile server is stateless, scales horizontally Head commit ID and user-library mapping in MySQL cluster All data and metadata in object storage

Fast and Reliable File Syncing Detect file changes with OS mechanisms Low CPU usage on client and server side Sync 100K files easily and quickly No data transfer after rename/move Don’t send duplicate files. Delta dection. Handles conflicts Concurrent updates Case conflict: sync ABC.txt and abc.txt to Windows Never remove a file unless user does Devil is in the details

How Syncing Works Almost looks like Git 2:Write objects Relational DB Object Storage 3: Update head commit ID after objects are saved 1: Client uploads commit, dir, file, and block objects 4: Client download objects and check out to folder Almost looks like Git

Syncing Performance Keep version info for the whole fs tree Results Combine many file updates into 1 commit A few database writes for a few K files Results 1 core, 1GB memory VM server 40K small files, ~20 files/s upload and download; single TCP connection; server CPU 2% - 5% Big file, ~8MB/s upload and download in 100bps network; server CPU 50%

Agenda Seafile Introduction Feature Overview System Design & Performance Roadmap

Roadmap Sync & Share Auth integration File locking for better collaboration Hierarchical access control within a library Auth integration OAuth Shibboleth Improve GUI responsibility with backbone.js

Conclusion Do one thing and do one thing well Choose any three ;-) Reliablity Scalability Performance Lightweight DB + Object Storage Git like data model, no client-side history Syncing model similar to Git, redesigned for auto syncing Choose any three ;-)

Thanks!

File Syncing Algorithm Client data 3 stages: worktree, index, repo Worktree: user visible folder, one worktree per library Index file: last modification time of each file in worktree Repo: Internal representation of the latest fs tree for the library. Only have delta blocks. commit commit worktree index repo checkout checkout

File Syncing Algorithm Sync State Machine

File Syncing Algorithm Upload Client creates new commit from batch of local changes Diff between local repo and the cached server fs tree After objects are uploaded, update server head commit ID in database Server do merge on concurrent updates, resolve conflicts Commit from client A HEAD commit on server Commit from client B

File Syncing Algorithm Version Check(init) Client caches server’s head commit ID Compare with server every 30s, if not the same trigger download download Server calculate update list with diff Client download and apply the update to worktree Update cached server head commit ID