1
Azure Cloud DB
Feiping Li, Jingjing Yang, CSS534
2
Outline
Intra-Stamp Replication
Inter-Stamp Replication
Some Examples
Drawbacks & Improvement
The Windows Azure cloud platform runs many cloud services across different data centers and geographic regions. Windows Azure Storage (WAS) is a scalable cloud storage system that stores seemingly unlimited amounts of data for any duration of time. Today we introduce the WAS architecture, its resource provisioning and load balancing, and both local and geographic replication. We also give several examples that illustrate it from a fault-tolerance point of view. Finally, we discuss some drawbacks of the architecture and possible solutions. The paper we mainly discuss was written by the WAS architects and designers themselves: Calder, Brad, et al. "Windows Azure Storage: a highly available cloud storage service with strong consistency." Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles. ACM, 2011.
3
Two Replication Engines
Intra-Stamp Replication: provides durability against hardware failures.
Inter-Stamp Replication: provides geo-redundancy against geo-disasters.
Azure is prepared for both temporary and large-scale failures. How does Microsoft achieve this? That is what I discuss in the following slides. The picture shows the high-level architecture of Azure. It has 3 layers. Last class, one team presented the components of each layer and how they interact, so I will not go through the details. In terms of fault tolerance, I focus on the two replication engines that provide high availability and durability. The first is intra-stamp replication, which guards against hardware failures such as hard-disk crashes and against temporary availability issues in dependent services such as storage or networking. The second is inter-stamp replication, which provides geo-redundancy against geographical disasters.
* A storage stamp is a cluster of a certain number of racks of storage nodes, where each rack is built out as a separate fault domain with redundant networking and power. A cluster typically spans a range of racks.
4
Intra-Stamp Replication
5
3 Replicas
Intra-stamp synchronous replication
There are 3 copies of each DB: a primary and two secondary replicas. The primary database performs the transactions and sends the updates to the replicas.
[Diagram: one primary (P) and two secondaries]
Intra-stamp replication is also called locally redundant storage. It is synchronous. You get high availability by deploying the 3 replicas of each DB within the same region. The next question is where Azure deploys the 3 instances.
6
Fault domains
Each fault domain is a fully independent physical sub-system with its own server racks and network routers.
[Diagram: fault domains 1 to 4]
The 3 replicas are deployed across multiple fault domains. A fault domain is a physical unit of failure, and fault domains are fully independent of each other.
7
Locally Redundant Storage
Each replica is stored in a different fault domain. This provides durability against hardware failures.
[Diagram: the primary (P) and its secondaries spread over fault domains 1 to 4]
This means the replicas are deployed over multiple racks. If you lose one of the racks on which your application is deployed, the application does not go offline, because some of the replicas are still available in other fault domains.
8
Primary Fails
When the server containing the primary database fails, one of the secondary replicas is promoted to primary.
[Diagram: the primary (P) fails in one of fault domains 1 to 4]
Suppose a hardware failure happens, for example a disk, node, or rack failure; such failures are frequent in large-scale systems. If the primary replica goes down, the partition manager promotes one of the secondary replicas to primary.
9
Secondary Fails Self-healing
When a server that contains secondary replicas fails, new replicas are created.
[Diagram: a secondary fails in one of fault domains 1 to 4 and is recreated elsewhere]
If one of the secondary replicas goes down, the system creates a new replica. This is Azure's concept of self-healing: if a few replicas go offline, Azure can simply recreate the lost replicas somewhere else, which is possible because applications built on Azure are stateless. This is one of the interesting high-availability features of cloud computing.
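The placement, promotion, and self-healing steps from the last few slides can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how Azure implements it: the `Replica` class and the `place_replicas` and `handle_domain_failure` functions are hypothetical names, and "promote the first surviving secondary" is a simplification of whatever policy the partition manager really applies.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """One copy of a database (hypothetical model, not an Azure type)."""
    fault_domain: int
    role: str  # "primary" or "secondary"


def place_replicas(fault_domains, count=3):
    """Put each replica in a distinct fault domain; the first one is primary."""
    if len(fault_domains) < count:
        raise ValueError("need at least as many fault domains as replicas")
    return [
        Replica(fd, "primary" if i == 0 else "secondary")
        for i, fd in enumerate(fault_domains[:count])
    ]


def handle_domain_failure(replicas, failed_domain, fault_domains):
    """Drop replicas in the failed domain, promote if needed, then self-heal."""
    primary_lost = any(
        r.fault_domain == failed_domain and r.role == "primary" for r in replicas
    )
    survivors = [r for r in replicas if r.fault_domain != failed_domain]
    lost = len(replicas) - len(survivors)

    # Promotion: assumed policy is "first surviving secondary becomes primary".
    if primary_lost and survivors:
        survivors[0].role = "primary"

    # Self-healing: recreate lost replicas in healthy, unused fault domains.
    used = {r.fault_domain for r in survivors}
    spare = [fd for fd in fault_domains if fd != failed_domain and fd not in used]
    for fd in spare[:lost]:
        survivors.append(Replica(fd, "secondary"))
    return survivors


domains = [1, 2, 3, 4]
after_primary_failure = handle_domain_failure(place_replicas(domains), 1, domains)
after_secondary_failure = handle_domain_failure(place_replicas(domains), 3, domains)
```

In both cases the set ends up with three replicas again, each in its own fault domain, and exactly one of them is the primary.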
10
Replication Flow
[Diagram: clients send a request; the Stream Manager allocates 3 replicas; the write flows from the primary in fault domain 1 to the secondaries in fault domains 2 and 3, in steps 1 to 6]
For a write request from a client, the primary database performs the transaction and sends the updates to the secondary replicas. The system returns success to the client only after the update has been applied to all 3 replicas. All operations occur synchronously, so the data stays consistent across the replicas. Customers get the full benefit of a replicated database without having to configure or maintain complicated hardware, software, OS, or virtualization environments.
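The rule "acknowledge only after all 3 replicas have the update" can be sketched as follows. This is a simplified in-memory model under stated assumptions: `Replica` and `replicated_write` are hypothetical names, and the sketch does not roll back or repair a partial write, which a real system would have to handle.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the data in its own fault domain (hypothetical model)."""
    fault_domain: int
    healthy: bool = True
    log: list = field(default_factory=list)

    def apply(self, record: str) -> bool:
        # A failed replica cannot acknowledge the update.
        if not self.healthy:
            return False
        self.log.append(record)
        return True


def replicated_write(primary: Replica, secondaries: list, record: str) -> bool:
    """Return True only if the primary and every secondary acknowledged."""
    if not primary.apply(record):
        return False
    acks = [s.apply(record) for s in secondaries]
    return all(acks)


primary = Replica(fault_domain=1)
secondaries = [Replica(fault_domain=2), Replica(fault_domain=3)]

# All three replicas are healthy, so the client sees success.
ok = replicated_write(primary, secondaries, "row-1")

# One secondary is down, so the client does not get a success reply.
secondaries[1].healthy = False
failed = replicated_write(primary, secondaries, "row-2")
```

The design choice is the usual trade-off of synchronous replication: every acknowledged write is on all replicas, at the cost of waiting for the slowest one.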
11
Inter-Stamp Replication
12
Inter-Stamp Replication: Geo Redundant Storage
Geo-redundancy against geo-disasters, e.g. a catastrophic failure of your application because you lose complete components within a region.
Intra-stamp replication covers the minor failures. Next we look at catastrophic failures: in addition to individual servers breaking down, you may lose complete components within a region or, even worse, the complete region, for example due to an earthquake. The next part covers how to build your application with the tools available in Azure to overcome this kind of failure.
13
Strategy – Deploy in Multiple Data Centers
Windows Azure data centers: 6 data centers across 3 continents.
North America Region: North Central US, South Central US
Europe Region: Northern Europe, Western Europe
Asia Pacific Region: East Asia, South East Asia
[Diagram: asynchronous inter-stamp replication from a primary region to a secondary region]
The strategy is to make the data geographically redundant. Your data is always stored 3 times in your primary region, and it is also copied asynchronously 3 times to a data center in a different region. Taking the picture as an example, if you have a storage account in North Central US, where the data lives 3 times, it is also copied asynchronously 3 times to the South Central US data center. In total you have 6 replicas of your DB. If your primary region goes down, traffic is directed to the secondary region and your application keeps running.
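A minimal sketch of the "3 local copies now, 3 remote copies later" behaviour, assuming a simple in-memory model. `Stamp`, `GeoReplicatedAccount`, and the `pending` queue are hypothetical names for illustration; they are not part of any Azure API.

```python
from collections import deque


class Stamp:
    """A storage stamp holding three local replicas (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.replicas = [[], [], []]

    def commit(self, record):
        # Intra-stamp step: all three local replicas receive the record.
        for replica in self.replicas:
            replica.append(record)


class GeoReplicatedAccount:
    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self.pending = deque()  # committed locally, not yet shipped

    def write(self, record):
        # The client is acknowledged after the local commit only.
        self.primary.commit(record)
        self.pending.append(record)
        return True

    def replicate_pending(self, max_records=None):
        """Background step: ship queued records to the secondary stamp."""
        shipped = 0
        while self.pending and (max_records is None or shipped < max_records):
            self.secondary.commit(self.pending.popleft())
            shipped += 1
        return shipped

    def total_copies(self, record):
        stamps = (self.primary, self.secondary)
        return sum(r.count(record) for s in stamps for r in s.replicas)


account = GeoReplicatedAccount(Stamp("North Central US"), Stamp("South Central US"))
account.write("a")
account.write("b")
account.replicate_pending(max_records=1)  # only "a" has been shipped so far

copies_a = account.total_copies("a")  # 3 local + 3 remote
copies_b = account.total_copies("b")  # 3 local only, still in the queue
```

Record "b" shows the window that asynchronous replication leaves open: it is acknowledged and stored 3 times, but it does not yet exist in the secondary region.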
14
Inter-Stamp Replication
[Diagram: a single deployment in West Europe (Amsterdam)]
In a typical deployment, the domain name of the application points directly to the deployment. This means that if we lose the West Europe data center, the application is completely offline.
15
Traffic Manager
[Diagram: Traffic Manager in front of the primary region, West Europe (Amsterdam), and the secondary region, North Europe (Dublin)]
To solve this issue, we can use Traffic Manager for failover. You point the hostname or domain name of the application at Traffic Manager instead of at a single deployment. Traffic Manager is an intelligent policy engine for DNS queries: each time a user wants to visit the application, Traffic Manager tells the user where to find it. If everything goes well, the user is connected to the Amsterdam data center. If something is wrong with that data center, Traffic Manager notices and users are failed over to the secondary deployment, in this case Dublin. As soon as the primary deployment is back, Traffic Manager fails users back to it. This small component makes sure that the application keeps running even if you lose a complete data center.
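The failover policy described here boils down to "answer with the highest-priority healthy endpoint". A minimal sketch, assuming a priority-ordered list and an external health check; the `resolve` function and the `example.com` hostnames are hypothetical and are not Traffic Manager's API.

```python
def resolve(endpoints, is_healthy):
    """Return the first healthy endpoint in priority order, or None."""
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    return None


# Hypothetical hostnames: Amsterdam is the primary, Dublin the secondary.
endpoints = ["west-europe.example.com", "north-europe.example.com"]
health = {endpoint: True for endpoint in endpoints}

normal = resolve(endpoints, health.get)

# The primary data center is lost: users are sent to the secondary.
health[endpoints[0]] = False
during_outage = resolve(endpoints, health.get)

# The primary comes back: users are sent to the primary again.
health[endpoints[0]] = True
after_recovery = resolve(endpoints, health.get)
```

Because the decision is made per DNS query, clients that have cached an earlier answer keep using it until their cached record expires, so failover is not instantaneous.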
16
Read Access Geo Redundant Storage
[Diagram: asynchronous copy from the primary region, West Europe (Amsterdam), to a read-only secondary region, North Europe (Dublin)]
Compute is only a small part of your application; underneath it you are probably using blobs, tables, SQL, or other components. As mentioned, a storage account is locally redundant by default, which means you have three copies within the same region. The next step is to make it geo-redundant, which means your data is also copied asynchronously three times to a different data center within the same geopolitical region. With standard geo-redundant storage, the only way to fail over is to call Microsoft and ask them to do it. A newer option, read access geo redundant storage, addresses this: your application can read the secondary copy on its own when the primary is unavailable, so it stays online, perhaps with slightly degraded functionality, because the secondary is read-only.
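From the application's side, read access to the secondary can be used as a simple fallback. The sketch below is an in-memory illustration under stated assumptions: `RegionStore`, `RegionUnavailable`, and `read_with_fallback` are hypothetical names, and the direct assignment to `secondary.blobs` stands in for the asynchronous copy.

```python
class RegionUnavailable(Exception):
    """Raised when a region cannot be reached (hypothetical)."""


class RegionStore:
    def __init__(self, name, read_only=False):
        self.name = name
        self.read_only = read_only
        self.online = True
        self.blobs = {}

    def read(self, key):
        if not self.online:
            raise RegionUnavailable(self.name)
        return self.blobs[key]

    def write(self, key, value):
        if not self.online:
            raise RegionUnavailable(self.name)
        if self.read_only:
            raise PermissionError(f"{self.name} is read-only")
        self.blobs[key] = value


def read_with_fallback(primary, secondary, key):
    """Read from the primary; fall back to the read-only secondary."""
    try:
        return primary.read(key), primary.name
    except RegionUnavailable:
        return secondary.read(key), secondary.name


primary = RegionStore("West Europe")
secondary = RegionStore("North Europe", read_only=True)

primary.write("report", "v1")
secondary.blobs["report"] = "v1"  # stands in for the asynchronous copy

primary.online = False  # the primary region is lost
value, served_by = read_with_fallback(primary, secondary, "report")

try:
    secondary.write("report", "v2")
    write_allowed = True
except PermissionError:
    write_allowed = False
```

Reads keep working from the secondary, while writes are rejected; this is the "degraded functionality" mentioned above. Data read this way may be slightly stale if the last updates had not been copied yet.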
17
Service Bus: Paired Namespaces
[Diagram: a back-end using paired namespaces, one in the primary region, West Europe (Amsterdam), and one in the secondary region, North Europe (Dublin)]
Now let's talk about Service Bus. For cloud services, Microsoft Azure offers Service Bus to help applications interact with each other. An application communicating over Azure Service Bus usually uses a single namespace. If the region goes down, that namespace becomes a single point of failure. Paired namespaces were added to address this. You keep your primary namespace in your primary region and create another namespace in a secondary region, which you can use as soon as you lose the primary namespace. This means your application keeps running even if Service Bus is offline in the primary data center.
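The idea behind a paired namespace can be sketched as a sender that falls back to the secondary namespace when the primary is unreachable. This is an in-memory illustration only; `Namespace`, `NamespaceUnavailable`, and `PairedSender` are hypothetical names and do not reproduce the Service Bus SDK.

```python
class NamespaceUnavailable(Exception):
    """Raised when a namespace cannot be reached (hypothetical)."""


class Namespace:
    def __init__(self, name):
        self.name = name
        self.online = True
        self.queue = []

    def send(self, message):
        if not self.online:
            raise NamespaceUnavailable(self.name)
        self.queue.append(message)


class PairedSender:
    """Send to the primary namespace, fall back to the secondary."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def send(self, message):
        try:
            self.primary.send(message)
            return self.primary.name
        except NamespaceUnavailable:
            self.secondary.send(message)
            return self.secondary.name


primary = Namespace("orders-west-europe")
secondary = Namespace("orders-north-europe")
sender = PairedSender(primary, secondary)

first = sender.send("order-1")   # primary is healthy
primary.online = False           # Service Bus is offline in the primary region
second = sender.send("order-2")  # accepted by the secondary namespace
```

The sketch covers sending only. Receivers would also need to read from the secondary namespace, or the messages would have to be moved back once the primary returns.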
18
Paired Namespace API
19
Drawbacks
Unavailability during the failover
Potential loss of recent updates
Now let's look at the drawbacks of the system. If a disaster strikes and an abrupt failover has to occur, the service is unavailable during the failover. Because inter-stamp replication is asynchronous, recent updates to a customer's storage that have not yet been copied to the secondary region may also be lost.
20
Possible Solution
Keep additional full or incremental backups
Backup storage: in a separate account, in a different location, or with another cloud provider
[Diagram: backup storage alongside the primary region, West Europe (Amsterdam), and the secondary region, North Europe (Dublin)]
One possible solution is to add one step to the processing: store a copy in a completely separate account, in a different location, or with another cloud provider. Performing these full or incremental backups as a synchronization or replication of the data in your storage account protects you better than relying on the built-in geo-replication alone. These backups should let you restore your services to a point in time. It is up to you to decide how long to keep the backups and how far back in time you need to be able to restore data.
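The two decisions left to you, how long to keep backups and how far back you can restore, can be made concrete with a small sketch. It uses full copies only for simplicity (no incremental backups), and `BackupStore`, `take_backup`, and `restore` are hypothetical names, not a real backup tool.

```python
from datetime import datetime, timedelta


class BackupStore:
    """Keeps timestamped full backups for a fixed retention period."""

    def __init__(self, retention: timedelta):
        self.retention = retention
        self.snapshots = []  # (timestamp, copy of the data), oldest first

    def take_backup(self, data: dict, now: datetime):
        self.snapshots.append((now, dict(data)))
        # Drop backups that are older than the retention period.
        cutoff = now - self.retention
        self.snapshots = [(t, d) for t, d in self.snapshots if t >= cutoff]

    def restore(self, point_in_time: datetime) -> dict:
        """Return the newest backup taken at or before point_in_time."""
        candidates = [(t, d) for t, d in self.snapshots if t <= point_in_time]
        if not candidates:
            raise LookupError("no backup that old is retained")
        return dict(candidates[-1][1])


day0 = datetime(2024, 1, 1)
store = BackupStore(retention=timedelta(days=7))
store.take_backup({"orders": 1}, day0)
store.take_backup({"orders": 2}, day0 + timedelta(days=3))
store.take_backup({"orders": 3}, day0 + timedelta(days=8))  # prunes the day-0 backup

restored = store.restore(day0 + timedelta(days=5))  # state as of the day-3 backup

try:
    store.restore(day0)
    too_old_available = True
except LookupError:
    too_old_available = False
```

A longer retention period lets you restore further back in time at the cost of more backup storage.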
21
Cited
Calder, Brad, et al. "Windows Azure Storage: a highly available cloud storage service with strong consistency." Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles. ACM, 2011.
Ravi Jhawar and Vincenzo Piuri, "Fault Tolerance and Resilience in Cloud Computing", in Computer and Information Security Handbook, 2nd edition, J. Vacca (ed.), Morgan Kaufmann, 2013.
Rabi Prasad Padhy, Manas Ranjan Patra, Suresh Chandra Satapathy, "Windows Azure PaaS Cloud: An Overview", International Journal of Computer Application, Issue 2, Volume 1, February 2012.
Fault-tolerance in Windows Azure SQL Database, us/blog/fault-tolerance-in-windows-azure-sql-database/
Get started with Service Bus queues, us/azure/service-bus-messaging/service-bus-dotnet-get-started-with-queues