Working with Cloud-based Storage CSE423 VIRTUALIZATION AND CLOUD COMPUTING
Introduction: The world is creating massive amounts of data. A large percentage of that data is already stored in the cloud, will be stored in the cloud, or will pass through the cloud at some point during its lifecycle. Cloud storage systems are among the most successful cloud computing applications in use today. This chapter surveys the area of cloud storage systems, categorizes the different types of cloud storage systems, and discusses file-sharing and backup software and systems.
Lecture Outline: measuring the digital universe; provisioning cloud storage; creating cloud storage systems; cloud backup solutions; cloud storage interoperability.
Measuring the Digital Universe: Facts About the Hunger for Storage. An email with a 1GB attachment sent to 3 people can generate an estimated 5GB of stored, managed data. Only 25% of the data stored is unique, while 75% is duplicated. 70% of the data stored in the world is user-initiated; the remainder is enterprise-generated content.
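The "25% unique" figure is the kind of number a deduplicating store measures by hashing content. The sketch below is a toy model, not how IDC produced its estimate: the function name unique_fraction, the tiny 4-byte chunk size, and the assumption that one attachment ends up stored four times (sender plus three recipients) are all illustrative choices.

```python
import hashlib


def unique_fraction(blobs: list[bytes], chunk_size: int = 4) -> float:
    """Return the fraction of fixed-size chunks that are unique across all blobs.

    Toy deduplication model: real systems use much larger (often variable-size)
    chunks, but the idea of identifying duplicates by content hash is the same.
    """
    seen: set[str] = set()
    total = 0
    for blob in blobs:
        for i in range(0, len(blob), chunk_size):
            seen.add(hashlib.sha256(blob[i:i + chunk_size]).hexdigest())
            total += 1
    return len(seen) / total if total else 0.0


# Hypothetical case: one attachment stored four times (sender plus three recipients).
attachment = b"ABCDEFGH"
print(unique_fraction([attachment] * 4))  # 0.25
```

With four identical copies, only one copy's worth of chunks is unique, so 75% of what is stored is duplicated.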
Measuring the Digital Universe: Facts About the Hunger for Storage. More than 50% of the data created every day is generated automatically (called shadow data, or the digital shadow), especially by video cameras and surveillance photos, financial transaction event logs, performance data, and so on. A great deal of shadow data is retained without ever having been touched by a human being. Much of the data produced is temporary: it is stored briefly and then deleted.
Measuring the Digital Universe: The storage giant EMC has an interest in knowing just how much data is being stored worldwide, and over the past decade it has funded studies to assess the size of what it calls “The Digital Universe.” An IDC study done in 2007-2008 predicted that by 2011 the world would store 1,800 exabytes (EB), or 1.8 zettabytes (ZB), of data. By 2020, stored data is forecast to reach an astonishing 35ZB. https://www.emc.com/leadership/digital-universe/index.htm
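The two forecasts above imply a growth rate. The worked calculation below uses decimal SI prefixes and assumes smooth compound (exponential) growth between the 2011 and 2020 figures; that assumption is ours, not the study's.

```latex
1\ \text{ZB} = 1000\ \text{EB} \;\Rightarrow\; 1800\ \text{EB} = 1.8\ \text{ZB}

r = \left(\frac{35\ \text{ZB}}{1.8\ \text{ZB}}\right)^{1/(2020-2011)} - 1
  = 19.4^{1/9} - 1 \approx 0.39

T_{\text{double}} = \frac{\ln 2}{\ln(1 + r)} \approx \frac{0.693}{0.329} \approx 2.1\ \text{years}
```

Under that assumption, stored data grows by roughly 39% per year and doubles about every two years.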
EMC’s Digital Universe Homepage
Cloud Storage Data Usage in 2020 (source: International Data Corporation, Digital Universe study, May 2010)
Cloud Storage Definition: In the IaaS model, storage is accessed through a Web service API. Cloudy characteristics: network access (most often through a browser), on-demand provisioning, and user control. In the SaaS model, a software package sits on top of cloud storage to provide backup, synchronization, archiving, etc.
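In the IaaS model, "accessed by Web service API" usually means plain HTTP verbs against a URL that names the stored object. The sketch below only builds the request with Python's standard urllib and never sends it; the endpoint storage.example.com, the endpoint/bucket/key URL layout, and the helper name build_put_request are hypothetical, and a real provider would also require authentication headers.

```python
import urllib.request


def build_put_request(endpoint: str, bucket: str, key: str,
                      data: bytes) -> urllib.request.Request:
    """Build (but do not send) an HTTP PUT that stores `data` under bucket/key.

    The URL layout is an assumption modeled on common object-storage APIs;
    real services additionally require authentication headers.
    """
    url = f"{endpoint.rstrip('/')}/{bucket}/{key}"
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},
    )


req = build_put_request("https://storage.example.com", "my-bucket", "notes.txt", b"hello")
print(req.get_method(), req.full_url)
# PUT https://storage.example.com/my-bucket/notes.txt
```

Reading the object back would be a GET on the same URL, which is why such storage is easy to reach from any network-connected client or browser.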
Storage Devices: A block storage device is raw storage that can be partitioned to create volumes. Data is transferred in blocks (examples: hard disks, flash drives). It offers faster data transfers but places additional overhead on clients. A file storage device exposes its storage to clients in the form of files (example: a file server, most often in the form of a Network Attached Storage (NAS) device). It offers slower transfers but places less overhead on clients.
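The difference between the two device types shows up in their interfaces. In this hypothetical sketch (the class and method names are illustrative, not any real driver API), a block device only reads and writes numbered, fixed-size blocks, so the client must track which blocks belong to which file; a file device accepts named files directly.

```python
class BlockDevice:
    """Toy block device: a flat array of fixed-size blocks addressed by number."""

    def __init__(self, num_blocks: int, block_size: int = 512) -> None:
        self.num_blocks = num_blocks
        self.block_size = block_size
        self._data = bytearray(num_blocks * block_size)

    def _offset(self, n: int) -> int:
        if not 0 <= n < self.num_blocks:
            raise IndexError("block number out of range")
        return n * self.block_size

    def write_block(self, n: int, payload: bytes) -> None:
        # Only whole blocks are transferred; the client handles file layout.
        if len(payload) != self.block_size:
            raise ValueError("block devices transfer whole blocks only")
        start = self._offset(n)
        self._data[start:start + self.block_size] = payload

    def read_block(self, n: int) -> bytes:
        start = self._offset(n)
        return bytes(self._data[start:start + self.block_size])


class FileStore:
    """Toy file device (like a NAS share): clients see named files, not blocks."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}

    def write_file(self, path: str, content: bytes) -> None:
        self._files[path] = content

    def read_file(self, path: str) -> bytes:
        return self._files[path]
```

With BlockDevice the client needs its own file system to map files onto blocks, which is the "additional overhead on clients"; FileStore moves that work to the storage side.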
Provisioning Cloud Storage: Cloud storage may be broadly categorized into two major classes: unmanaged storage and managed storage.
Cloud Storage Types: Unmanaged storage. Unmanaged storage is presented to the user as if it were a ready-to-use disk drive, and the user has little control over how the disk is used. It is preconfigured storage with only a limited level of management: you cannot (1) format it as you like, (2) install your own file system (FAT, NTFS), or (3) change drive properties (compression, encryption). It is reliable, relatively cheap, and easy to work with. Example: SaaS web services are applications that use this kind of storage.
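The hypothetical UnmanagedDrive class below models what the user of unmanaged storage actually gets. It is a sized, preconfigured space where files can be stored, read, and deleted. There is deliberately no way to format the drive or change its file system, compression, or encryption. The quota check is an illustrative assumption about how a provider enforces the partition size.

```python
class UnmanagedDrive:
    """Toy model of a preconfigured cloud drive with a fixed quota.

    The provider has already chosen the file system, compression and
    encryption; the user can only store, read and delete files.
    """

    def __init__(self, quota_bytes: int) -> None:
        self.quota_bytes = quota_bytes
        self._files: dict[str, bytes] = {}

    @property
    def used_bytes(self) -> int:
        return sum(len(v) for v in self._files.values())

    def put(self, name: str, content: bytes) -> None:
        # Replacing an existing file frees its old space first.
        old = len(self._files.get(name, b""))
        if self.used_bytes - old + len(content) > self.quota_bytes:
            raise OSError("quota exceeded")
        self._files[name] = content

    def get(self, name: str) -> bytes:
        return self._files[name]

    def delete(self, name: str) -> None:
        del self._files[name]
```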
Cloud Storage Types: Managed storage. Managed storage involves provisioning a raw virtualized disk and using that disk to support applications that rely on cloud-based storage. Because it is provided as a raw disk, you can (1) format and partition the disk, (2) attach or mount the disk, and (3) make storage assets available to applications and other users. It supports applications built using Web services. Example: IaaS web services are applications that use this kind of storage.
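By contrast, a managed volume arrives raw and the user does the setup. The hypothetical RawVolume and Partition classes below model the partition-then-format steps. Partitions are placed back to back from byte 0, and partition-table overhead is ignored. Both are simplifying assumptions, not how any real partitioning tool lays out a disk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Partition:
    start: int                        # first byte offset on the raw disk
    size: int                         # length in bytes
    filesystem: Optional[str] = None  # None until the user formats it


class RawVolume:
    """Toy model of a raw virtual disk that the user must partition and format."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.partitions: list[Partition] = []

    def create_partition(self, size: int) -> Partition:
        # Place the new partition right after the last one (no table overhead).
        start = sum(p.size for p in self.partitions)
        if start + size > self.size:
            raise ValueError("not enough unallocated space")
        part = Partition(start, size)
        self.partitions.append(part)
        return part

    def format(self, index: int, filesystem: str) -> None:
        # The user, not the provider, chooses the file system.
        self.partitions[index].filesystem = filesystem
```

For example, a 100-unit volume could be split into a 60-unit data partition formatted as ext4 and a 40-unit partition left unformatted until needed.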
Unmanaged Cloud Storage: With the development of high-capacity disks in the mid to late 1990s, a new class of storage provider, the Storage Service Provider (SSP), appeared with the intent of offering online storage. IDrive, FreeDrive, MyVirtualDrive, OmniDrive, and Xdrive offered file hosting services in unmanaged storage form. Volumes were first accessible using FTP, then through dedicated utilities, and later within browsers. Dropbox is an example of a file transfer utility. In unmanaged cloud storage, disk space is provided to the user as a sized partition.
Dropbox – File Transfer Utility
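Managed Cloud Storage: The user provisions storage on demand and pays for it under a pay-as-you-go model. The system appears to the user as a raw disk that the user must partition and format (strictly, object stores such as S3 expose buckets and objects through a Web service API rather than a raw block device). Examples: Amazon Simple Storage Service (S3) (http://aws.amazon.com/s3/), Rackspace Cloud (http://www.rackspace.com/index.php), and Google Storage for Developers (https://cloud.google.com/storage/).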
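Pay-as-you-go means the bill is computed from metered usage rather than from a fixed allocation. The sketch below shows that arithmetic. The three rates are made-up placeholders, not any provider's actual prices. Real providers publish tiered rates that change over time, and the billing dimensions chosen here (storage, requests, outbound transfer) are an assumption.

```python
# HYPOTHETICAL prices for illustration only; real providers publish their own
# tiered rates, which change over time.
PRICE_PER_GB_MONTH = 0.02
PRICE_PER_1000_REQUESTS = 0.005
PRICE_PER_GB_OUT = 0.09


def monthly_bill(avg_gb_stored: float, requests: int, gb_out: float) -> float:
    """Pay-as-you-go: charge only for what was actually used this month."""
    storage = avg_gb_stored * PRICE_PER_GB_MONTH
    request_fees = (requests / 1000) * PRICE_PER_1000_REQUESTS
    egress = gb_out * PRICE_PER_GB_OUT
    return round(storage + request_fees + egress, 2)


print(monthly_bill(avg_gb_stored=500, requests=200_000, gb_out=50))  # 15.5
```

A month with no usage costs nothing, which is the key contrast with buying and provisioning disks up front.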
Amazon S3 and Rackspace Cloud
Creating Cloud Storage Systems: Concepts. Multiple copies of data are stored on multiple servers and in multiple locations, managed by storage virtualization software. Failover is performed by changing the pointers to the stored object's location. Example: Amazon Web Services (EC2, S3) supports failover and load balancing, but you must purchase these features.
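The "change the pointer" idea can be sketched in a few lines. In this hypothetical ReplicatedStore, each object is copied to several servers and a pointer table records which replica clients read from. When a server fails, only the pointers move; no data is copied at failover time. Server names stand in for locations, and re-creating lost replicas after a failure is left out for brevity. Both are simplifying assumptions, not how any particular provider implements it.

```python
class ReplicatedStore:
    """Toy model: each object is copied to several servers, and a pointer
    records which replica clients should currently read from."""

    def __init__(self, servers: list[str], copies: int = 3) -> None:
        self.servers: dict[str, dict[str, bytes]] = {name: {} for name in servers}
        self.healthy = set(servers)
        self.copies = copies
        self.replicas: dict[str, list[str]] = {}  # key -> servers holding a copy
        self.pointer: dict[str, str] = {}         # key -> server to read from

    def put(self, key: str, data: bytes) -> None:
        targets = sorted(self.healthy)[: self.copies]
        if not targets:
            raise RuntimeError("no healthy servers available")
        for name in targets:
            self.servers[name][key] = data
        self.replicas[key] = targets
        self.pointer[key] = targets[0]

    def mark_failed(self, server: str) -> None:
        self.healthy.discard(server)
        # Failover: repoint every object whose active copy was on that server.
        for key, active in list(self.pointer.items()):
            if active == server:
                survivors = [s for s in self.replicas[key] if s in self.healthy]
                if not survivors:
                    raise RuntimeError(f"all replicas of {key!r} lost")
                self.pointer[key] = survivors[0]

    def get(self, key: str) -> bytes:
        return self.servers[self.pointer[key]][key]
```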
Evaluating Cloud Storage: Important considerations include client self-service; strong management capabilities; the ability to scale up (add more disks) and scale out (add more storage systems); performance characteristics such as throughput; support for block-based or file-based protocols; and seamless maintenance and upgrades.
Cloud Backup Solutions: Cloud backup is the last line of defense in a strong backup routine. Backup types: full system or image backups; point-in-time (PIT) backups, or snapshots; and incremental backups. The 3-2-1 backup rule: keep 3 copies of your data (1 primary and 2 backups), on 2 different media, with 1 copy stored offsite.
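The 3-2-1 rule is concrete enough to check mechanically. This sketch encodes it as a predicate over a backup plan. The BackupCopy record and the medium labels are hypothetical. The primary copy counts toward the three, as the rule above states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    medium: str    # e.g. "ssd", "external-hdd", "tape", "cloud"
    offsite: bool  # stored away from the primary site?


def satisfies_3_2_1(copies: list[BackupCopy]) -> bool:
    """Check the 3-2-1 rule: at least 3 copies in total (the primary counts),
    on at least 2 different media, with at least 1 copy offsite."""
    return (
        len(copies) >= 3
        and len({c.medium for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )
```

For example, a primary on a local SSD, a backup on a local external drive, and a cloud copy together satisfy the rule. Dropping the cloud copy fails it twice over, since there are then only two copies and none of them is offsite.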
Backup Types: Full System/Image Backups. An image backup creates a complete copy of a volume, including all system files, the boot record, and any other data contained on the disk. To create an image backup of an active system, all applications must be stopped. Example: Ghost.
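An image backup copies the volume byte for byte from the start, so the boot record comes along with everything else. The sketch below shows that copy plus a SHA-256 checksum for verifying the image later. It uses in-memory BytesIO objects as stand-ins for a real device and backup file. It assumes the volume is not being written during the copy, which is why applications are stopped. The function name image_backup is hypothetical.

```python
import hashlib
import io
from typing import BinaryIO


def image_backup(volume: BinaryIO, target: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Copy every byte of `volume` (boot record, system files, free space and
    all) into `target`, returning a SHA-256 digest for later verification."""
    digest = hashlib.sha256()
    volume.seek(0)  # an image starts at byte 0, where the boot record lives
    while chunk := volume.read(chunk_size):
        target.write(chunk)
        digest.update(chunk)
    return digest.hexdigest()


raw = io.BytesIO(b"BOOT" + b"system files and user data")  # stand-in volume
copy = io.BytesIO()
checksum = image_backup(raw, copy)
assert copy.getvalue() == raw.getvalue()
assert checksum == hashlib.sha256(raw.getvalue()).hexdigest()
```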
Backup Types: Point-in-Time (PIT) Backups or Snapshots. Also referred to as incremental backups, these are created frequently. They let you restore your data to a point in time and keep multiple copies of any files that have changed. Example: Carbonite.
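Snapshots are cheap to take because each one stores only what changed, yet any of them can be restored. The hypothetical SnapshotBackup class below detects changes by content hash and restores by replaying snapshots in order. File deletions are not tracked, a simplification commercial products such as Carbonite would not make.

```python
import hashlib


class SnapshotBackup:
    """Toy point-in-time backup: each snapshot stores only files whose content
    changed since the previous snapshot, yet any snapshot can be restored."""

    def __init__(self) -> None:
        self.snapshots: list[dict[str, bytes]] = []  # changed files per snapshot
        self._last_hash: dict[str, str] = {}

    def take_snapshot(self, files: dict[str, bytes]) -> int:
        changed = {}
        for path, content in files.items():
            h = hashlib.sha256(content).hexdigest()
            if self._last_hash.get(path) != h:
                changed[path] = content
                self._last_hash[path] = h
        self.snapshots.append(changed)
        return len(self.snapshots) - 1  # snapshot id

    def restore(self, snapshot_id: int) -> dict[str, bytes]:
        # Replay snapshots 0..snapshot_id; later versions overwrite earlier ones.
        state: dict[str, bytes] = {}
        for delta in self.snapshots[: snapshot_id + 1]:
            state.update(delta)
        return state
```

The first snapshot effectively acts as the full backup, and every later one is incremental.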
Cloud Backup Features: logon authentication; strong encryption of data transfers; automated and scheduled backups; fast backups (snapshots) after a full online backup, with 10-30 historical versions of each file retained; and the ability to retrieve historical versions of a file.
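Retaining a bounded number of historical versions maps naturally onto a fixed-length queue. In this hypothetical VersionedFile sketch, the default of 10 versions is taken from the low end of the 10-30 range above. Once the limit is reached, saving a new version silently discards the oldest one.

```python
from collections import deque


class VersionedFile:
    """Keep the most recent `max_versions` copies of a file; older ones roll off."""

    def __init__(self, max_versions: int = 10) -> None:
        self._versions: deque[bytes] = deque(maxlen=max_versions)

    def save(self, content: bytes) -> None:
        self._versions.append(content)  # drops the oldest when full

    def version(self, n_back: int = 0) -> bytes:
        """Return the current version (n_back=0) or an earlier one."""
        return self._versions[-1 - n_back]

    def __len__(self) -> int:
        return len(self._versions)
```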
Cloud Backup Features (2): multiplatform support (Windows/Mac/Linux); a Web-based management console with easy-to-use features such as drag and drop; 24x7 technical support; logging and reporting of operations; and multisite storage or replication, enabling data failover.
Cloud Attached Backup
CTERA sells a server called Cloud Attached Storage, which is meant for the Small and Medium Business (SMB) market, branch offices, and the Small Office Home Office (SOHO) market. The CTERA Cloud Attached Storage backup server has the attributes of a NAS (Network Attached Storage), with an added feature: after you select the systems you want to back up, create user accounts, and set the backup options through a browser interface, the system automatically backs up, copies, and synchronizes your data with cloud storage. Backed-up data may be shared between users.
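Synchronizing local data with cloud storage typically starts by comparing what is local against a record of what is already in the cloud. The sketch below is a generic one-way mirror and is not CTERA's actual algorithm. It assumes the cloud side is summarized as a manifest mapping each path to a SHA-256 digest, and plan_sync is a hypothetical name.

```python
import hashlib


def plan_sync(local: dict[str, bytes],
              cloud_manifest: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare local files against a manifest of cloud copies (path -> SHA-256)
    and return (paths to upload, paths to delete from the cloud)."""
    local_hashes = {p: hashlib.sha256(c).hexdigest() for p, c in local.items()}
    to_upload = sorted(p for p, h in local_hashes.items() if cloud_manifest.get(p) != h)
    to_delete = sorted(p for p in cloud_manifest if p not in local_hashes)
    return to_upload, to_delete
```

Only new or changed files are uploaded, which keeps automated, scheduled synchronization cheap after the first full copy.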
Cloud Storage Interoperability: Interoperability depends on open standards that are operating-system neutral and file-system neutral. Relevant workgroups and standards include the Cloud Data Management Interface (CDMI) from the Storage Networking Industry Association (SNIA), http://www.snia.org, and the Open Cloud Computing Interface (OCCI) from SNIA and the Open Grid Forum (OGF), http://www.ogf.org.
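CDMI is itself an HTTP-based interface, so a client built to the standard can talk to any compliant provider. The sketch below builds, without sending, a request to create a text data object. It is based on our reading of CDMI 1.0.x: a PUT with the application/cdmi-object content type, the X-CDMI-Specification-Version header, and a JSON body with a value field. Check the SNIA specification before relying on it. The root URI cdmi.example.com and the helper name cdmi_create_object are hypothetical.

```python
import json
import urllib.request


def cdmi_create_object(root_uri: str, container: str, name: str,
                       text: str) -> urllib.request.Request:
    """Build (but do not send) a CDMI request that creates a text data object.

    Header and body fields follow our reading of CDMI 1.0.x; verify against
    the SNIA specification for the version your provider supports.
    """
    body = json.dumps({"mimetype": "text/plain", "metadata": {}, "value": text})
    return urllib.request.Request(
        f"{root_uri.rstrip('/')}/{container}/{name}",
        data=body.encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/cdmi-object",
            "Accept": "application/cdmi-object",
            "X-CDMI-Specification-Version": "1.0.2",
        },
    )


req = cdmi_create_object("https://cdmi.example.com/", "backups", "hello.txt", "Hello")
```

Because the request shape is fixed by the standard rather than by a vendor, switching providers ideally means changing only the root URI and credentials.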
References: Chapter 15 of the course book: Barrie Sosinsky, Cloud Computing Bible, Wiley Publishing Inc., 2011.