Windows Azure storage(WAS) A HIGHLY AVAILABLE CLOUD STORAGE SERVICE WITH STRONG CONSISTENCY
Introduction It is a scalable cloud storage system Used inside Microsoft for apps such as social networking search, Xbox, Skype Storage is provided in the form of blobs, tables and queues
Key design features Strong consistency Global and scalable Namespace/Storage Disaster recovery Multi tenancy and Cost of Storage
High level architecture 1. Storage stamp Cluster of N racks of storage nodes Append-only distributed file system holds up to 30PB Each storage stamp consists of 3 layers 2. Location Service Tracks resource usage Disaster recovery and load balancing Ability to easily add new region, location and storage stamp User account setting up
Storage Stamp Architecture Incoming Write Request Ack Front End Layer FE FE FE FE FE Partition Master Lock Service Partition Layer Partition Server Partition Server Partition Server Partition Server M M Paxos Stream Layer M Extent Nodes (EN)
Stream Layer Stream Layer (Distributed File System) An extent is replicated 3 times across different fault and upgrade domains With random selection for where to place replicas for fast MTTR Checksum all stored data Verified on every client read Scrubbed every few days Re-replicate on disk/node/rack failure or checksum mismatch M Stream Layer (Distributed File System) M Paxos M Extent Nodes (EN)
Stream //foo/myfile.data Stream Layer Concepts Block Min unit of write/read Checksum Up to N bytes (e.g. 4MB) Extent Unit of replication Sequence of blocks Size limit (e.g. 1GB) Sealed/unsealed Stream Hierarchical namespace Ordered list of pointers to extents Append/Concatenate Stream //foo/myfile.data Ptr E1 Ptr E2 Ptr E3 Ptr E4 Extent E1 Extent E2 Extent E3 Extent E4 Block Block Block Block Block Block Block Block Block Block Block Block Block Block Block sealed sealed sealed unsealed
Creating an Extent Paxos Create Stream/Extent SM Stream Master Paxos Create Stream/Extent Partition Layer EN1 Primary EN2, EN3 Secondary Allocate Extent replica set EN 1 EN 2 EN 3 EN Primary Secondary A Secondary B
Replication Flow (intra stamp replication) SM Paxos Partition Layer EN1 Primary EN2, EN3 Secondary Ack Append EN 1 EN 2 EN 3 EN Primary Secondary A Secondary B
Partition Layer Provides data model for objects stored Master Provides data model for objects stored Logic and semantics to process objects Massively scalable namespace for the objects Load balancing to access objects across PS Transaction ordering and strong consistency for Blobs, Tables and Queues Inter-stamp (geo) replication by shipping logs to other stamps Stores and reads the objects to/from extents in the Stream layer Lock Service Partition Server Partition Server Partition Server
Partition Layer – Index Range Partitioning Blob Index Account Name Container Name Blob Name aaaa aaaaa ……… harry pictures sunrise Account Name Container Name Blob Name aaaa aaaaa …….. zzzz zzzzz Split index into RangePartitions Split at PartitionKey boundaries PartitionMap tracks Index RangePartition Front-End caches the PartitionMap to route user requests Each part of the index is assigned to only one Partition Server at a time Storage Stamp Partition Master A-H: PS1 H’-R: PS2 R’-Z: PS3 Partition Server Account Name Container Name Blob Name harry pictures sunset ……… richard videos soccer Front-End Server PS 1 A-H Partition Map A-H: PS1 H’-R: PS2 R’-Z: PS3 Partition Server Partition Server Account Name Container Name Blob Name richard videos tennis ……… zzzz zzzzz PS 3 PS 2 H’-R R’-Z Partition Map
Front-End layer and VIP Authenticates and authorize requests. Routes the requests to Partition Layer Partition Map tracks of the partitions for the service being accessed ( Blobs, Queues, Tables) VIP - Virtual IP IP address exposed to the outside world New account request -> Location Service chooses a storage stamp -> Store account metadata LS then updates the DNS to allow requests from the user account to that storage stamp's virtual IP.
blob "Binary Large OBject" - collection of binary data as a single entity Stores unstructured object data Also known as Object storage Blob Data - data which does not conform to an established data type as defined by the database eg: image files Used: Social data such as photos, videos, music, and blogs Backups of files, documents, computers, databases, and devices Images and text for web applications Providing online storage of media files
BLOB - Containers Blobs are stored in a directory-like structure called “container” Blob Service Concepts: Access to Storage done through storage account An account can have unlimited number of containers Container can store unlimited number of blobs Each blob is of the type page, block or append Containers provide a useful way to assign security policies to groups of objects (images) (Pic01.jpg) Page/Block/Append (movies) Page/Block/Append (Mov01.AVI)
Types of blobs – BLOCK , page, Append Ideal for streaming workload operations Made of blocks each having unique blockID. Max size 200GB Multiple blocks can be modified and Commit Operation commits the modifications in a single go Eg: Video files accessed in sequence, documents, back ups , jpg's PAGE: Ideal for random read/write operations “page” is defined as a range of data. Max size 1TB Write operations happens in place and Commit Operation commits the modification then and there Eg: VHD files, Azure VMs to store OS and data disks APPEND: Consists of blocks Optimized for append operations - only by adding a new block to the end Multiple clients can write to same blob - no synchronization needed Eg: Logging
Table Structured or Semi-structured non-relational data NoSQL key/attribute store – it has a schemaless design (free flow – no fixed columns for rows) Used: Flexible datasets user data for web applications address books, device information
Table - ENTity Table is a collection of entities Table Service Concepts: Access to Storage done through storage account An account can have unlimited number of tables Table can store unlimited number of entities Each entity is a collection of properties Property is a name-value pair Entity also has 3 system properties a partition key a row key a timestamp Each entity can be up to 1MB in size Each entity can include up to 252 properties to store data Property
queue Cloud messaging solution for asynchronous communication between application components Provides reliable messaging for workflow processing Handle traffic bursts Build flexible applications : more scalable and less sensitive Used: Creating a backlog of work to process asynchronously Passing messages from an Azure web role to an Azure worker role
QUEUE - messages Queue is a set of messages Queue Service Concepts: Queue Service Concepts: Access to Storage done through storage account An account can have unlimited number of queues Queue can store unlimited number of messages Each message can be in any format up to 64KB Sender sends the message and the client receives and process and deletes them – guarantee success Maximum time a message can be is 7 days
Applications that run on azure Xbox OneDrive Bing Skype
advantages AND DRAWBACKS Diverse OS Supports many different programming languages, tools and frameworks Upgrades can be easily added Cost saving Flexibility / Scalability Only windows centric applications are supported by Microsoft for technical assistance Just started with Linux – limited support
Improvement points Compared with AS3 Can add the change notification feature Can remove the limitation of storage ( 500TB / account ) Extend technical support beyond windows What is best compared to AS3? Append, Page with web interface
Summary Architecture – Geo-replication, Availability with Consistency, Durability Blob – Unstructured data. Container. Types : Block, Page and Append Table – Structured data, NoSQL schema less. Entity – Property Queue – Messaging solution for asynchronous communication. Messages
reference Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency : Brad Calder, Ju Wang, Aaron Ogus, Niranjan Nilakantan, Arild Skjolsvold, Sam McKelvie, Yikang Xu, Shashwat Srivastav, Jiesheng Wu, Huseyin Simitci, Jaidev Haridas, Chakravarthy Uddaraju, Hemal Khatri, Andrew Edwards, Vaman Bedekar, Shane Mainali, Rafay Abbasi, Arpit Agarwal, Mian Fahim ul Haq, Muhammad Ikram ul Haq, Deepali Bhardwaj, Sowmya Dayanand, Anitha Adusumilli, Marvin McNett, Sriram Sankaran, Kavitha Manivannan, Leonidas Rigas http://sigops.org/sosp/sosp11/current/2011-Cascais/printable/11-calder.pdf http://sigops.org/sosp/sosp11/current/2011-Cascais/11-calder.pptx https://docs.microsoft.com/en-us/azure/storage/storage-introduction
Questions???