Data Parallel Application Development and Performance with Windows Azure
Advisor: Professor Gagan Agrawal
Presented by: Yu Zhang
Agenda
Motivation
Goals
The same facilities that a desktop OS provides, but on a set of connected servers:
- Abstract execution environment
- Shared file system
- Resource allocation
- Programming environments
Utility computing:
- 24/7 operation
- Pay for what you use
- Simpler, transparent administration
Windows Azure PaaS
Applications: Windows Azure Service Model
Runtimes: .NET 3.5/4, ASP.NET, PHP
Operating System: Windows Server 2008/R2-Compatible OS
Virtualization: Windows Azure Hypervisor
Server: Microsoft Blades
Database: SQL Azure
Storage: Windows Azure Storage (Blob, Queue, Table)
Networking: Windows Azure-Configured Networking
A Windows Azure application is called a “service”:
- Definition information
- Configuration information
- At least one “role”
Service definition is in ServiceDefinition.csdef:
- Defines aspects of a service that cannot be changed without redeployment
- Types of roles and static role configuration
- Set of configuration settings for a role
- Contract with the environment in which the code runs
Service configuration is in ServiceConfiguration.cscfg:
- Defines values for properties that can be dynamically updated for a running deployment
- Values of a configuration parameter
- Number of running instances
Definition:
- Role name
- Role type
- VM size (e.g. small, medium, etc.)
- Network endpoints
Code:
- Web/Worker Role: hosted DLL and other executables
- VM Role: VHD
Configuration:
- Number of instances
- Number of update and fault domains
Desktop And Related Azure Concepts
[Diagram labels: Storage Services, Public Internet, Web Role, Load Balancer]
[Diagram labels: Storage Service, Worker Role, Web Role]
Windows Azure Storage Abstractions
Queue Usage Example
[Diagram: producers P1 and P2, a queue, and consumers C1 and C2]
Communicating sequential processes:
- Each process runs in its own local address space.
- Processes exchange data and synchronize via message passing.
- Usually, but not always, the same code is executed by all processes.
- Locality must be managed to achieve performance; message passing makes this explicit.
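The producer/consumer pattern in the queue diagram can be sketched in plain Python. This is a minimal in-process illustration, assuming Python's standard `queue.Queue` as a stand-in for an Azure queue; the names `producer`, `consumer`, and `run_demo` are hypothetical and are not part of any Azure API.

```python
import queue
import threading

def producer(name, q, count):
    """Put `count` messages on the shared queue."""
    for i in range(count):
        q.put(f"{name}-msg{i}")

def consumer(q, received, lock):
    """Take messages until a None sentinel arrives."""
    while True:
        msg = q.get()
        if msg is None:          # sentinel: no more work
            break
        with lock:
            received.append(msg)

def run_demo(n_producers=2, n_consumers=2, per_producer=3):
    q = queue.Queue()            # stand-in for an Azure queue (assumption)
    received, lock = [], threading.Lock()
    consumers = [threading.Thread(target=consumer, args=(q, received, lock))
                 for _ in range(n_consumers)]
    producers = [threading.Thread(target=producer,
                                  args=(f"P{i + 1}", q, per_producer))
                 for i in range(n_producers)]
    for t in consumers + producers:
        t.start()
    for t in producers:
        t.join()
    for _ in consumers:          # one sentinel per consumer
        q.put(None)
    for t in consumers:
        t.join()
    return sorted(received)
```

Producers and consumers never call each other directly; the queue is the only coupling between them, which is what lets the number of each side vary independently.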
Azure Parallel Programming Model
[Diagram labels: LB, IIS, VMs, Web Role, Worker Role, Queue or WCF]
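MPI_Reduce(inbuf, outbuf, count, type, op, root, comm)
- inbuf: address of input buffer
- outbuf: address of output buffer
- count: number of elements in input buffer
- type: datatype of input buffer elements
- op: reduction operation
- root: rank of the root process
- comm: communicator

Queue-based counterpart in a worker role:

    public class WorkerRole : RoleEntryPoint
    {
        // queue is a CloudQueue field set up elsewhere (not shown).
        public override void Run()
        {
            // DoWork is assumed to return this worker's partial result as a string.
            string partial = DoWork();
            var msg = new CloudQueueMessage(partial);
            queue.AddMessage(msg);
        }
    }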
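The reduce pattern above can be sketched without MPI or Azure. The following is a minimal sequential simulation, assuming `queue.Queue` stands in for the Azure queue and that each worker's local computation is a simple sum; `worker`, `root_reduce`, and `reduce_demo` are hypothetical names chosen for illustration.

```python
import operator
import queue
from functools import reduce

def worker(rank, data, out_q):
    """Compute a local partial result and post it, as a worker role would."""
    out_q.put((rank, sum(data)))

def root_reduce(out_q, n_workers, op=operator.add):
    """Collect exactly one message per worker and combine them with `op`."""
    partials = [out_q.get()[1] for _ in range(n_workers)]
    return reduce(op, partials)

def reduce_demo(chunks, op=operator.add):
    q = queue.Queue()            # stand-in for the shared result queue
    for rank, chunk in enumerate(chunks):
        worker(rank, chunk, q)
    return root_reduce(q, len(chunks), op)
```

For example, `reduce_demo([[1, 2], [3, 4], [5]])` combines the partial sums 3, 7, and 5 at the root. Only the root sees the combined value, which mirrors the role of the `root` argument in `MPI_Reduce`.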
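MPI_Allreduce(inbuf, outbuf, count, type, op, comm)
- inbuf: address of input buffer
- outbuf: address of output buffer
- count: number of elements in input buffer
- type: datatype of input buffer elements
- op: reduction operation
- comm: communicator

Queue-based counterpart in a worker role:

    public class WorkerRole : RoleEntryPoint
    {
        // queue is a CloudQueue field set up elsewhere (not shown).
        public override void Run()
        {
            if (queue.Exists())
            {
                var msg = queue.GetMessage();
                if (msg != null)
                {
                    DoWork();
                    queue.DeleteMessage(msg);
                }
            }
            // DoWork is assumed to return the partial result as a string.
            string partial = DoWork();
            var reply = new CloudQueueMessage(partial);
            queue.AddMessage(reply);
        }
    }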
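An all-reduce differs from a reduce in that every participant ends up with the combined value. One way to sketch this with queues is a reduce at a root followed by a broadcast; this is an assumption made for illustration, not necessarily how the deck's implementation works. `allreduce_demo` and the per-worker `inboxes` are hypothetical names, and `queue.Queue` stands in for Azure queues.

```python
import operator
import queue
from functools import reduce

def allreduce_demo(local_values, op=operator.add):
    n = len(local_values)
    to_root = queue.Queue()
    inboxes = [queue.Queue() for _ in range(n)]   # one inbox per worker

    # Phase 1: every worker posts its local value to the root's queue.
    for value in local_values:
        to_root.put(value)

    # Phase 2: the root combines the n values with `op`.
    total = reduce(op, (to_root.get() for _ in range(n)))

    # Phase 3: the root broadcasts the result to every worker's inbox.
    for inbox in inboxes:
        inbox.put(total)

    # Each worker reads the same global result from its own inbox.
    return [inbox.get() for inbox in inboxes]
```

The returned list holds what each worker would see, so every entry is identical.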
1. Each worker role reads matrix B.
2. Matrix A is partitioned into n parts, where n is the number of worker roles.
3. Each worker role gets one part of matrix A. For an N×N matrix, each worker role therefore holds two data sets: matrix B and its part of A, denoted A_k (1 ≤ k ≤ n).
4. Each worker role computes A_k × B and adds the result to its queue.
5. The web role performs the reduce operation and obtains the final result.
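The matrix multiplication steps above can be sketched as follows. This is a minimal single-process illustration, assuming A is partitioned into contiguous row blocks (the slides do not say how A is split) and that `queue.Queue` stands in for the workers' result queues; `split_rows`, `multiply`, and `parallel_matmul` are hypothetical names.

```python
import queue

def split_rows(a, n):
    """Split matrix `a` (a list of rows) into n contiguous row blocks."""
    size, extra = divmod(len(a), n)
    blocks, start = [], 0
    for k in range(n):
        end = start + size + (1 if k < extra else 0)
        blocks.append(a[start:end])
        start = end
    return blocks

def multiply(a_block, b):
    """Product of a row block of A with the full matrix B."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a_block]

def parallel_matmul(a, b, n_workers):
    results = queue.Queue()
    for k, a_k in enumerate(split_rows(a, n_workers)):
        results.put((k, multiply(a_k, b)))        # worker k posts A_k x B
    # Web role: collect the blocks and stitch them back in order of k.
    blocks = sorted((results.get() for _ in range(n_workers)),
                    key=lambda kv: kv[0])
    return [row for _, block in blocks for row in block]
```

With row-block partitioning the reduce step is a concatenation, because each worker produces a disjoint set of rows of the result.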
1. The web role calculates the initial means.
2. The web role broadcasts the k centroids to all worker roles.
3. Each worker role computes the distance of each local document vector to the centroids.
4. Each worker role assigns its points to the closest centroid and computes the local MSE (Mean Squared Error).
5. A reduction produces the global centroids and the global MSE value.
6. The web role broadcasts the new centroids to all worker roles; steps 3 to 6 repeat until no points move.
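The k-means steps above can be sketched in a single process. This is a simplified illustration under stated assumptions: initial centroids are passed in rather than computed, workers return per-centroid sums and counts so the web role can average them, and "no points move" is approximated by the centroids no longer changing. `local_step`, `global_step`, and `kmeans` are hypothetical names.

```python
def local_step(points, centroids):
    """Worker: assign each local point to its closest centroid.

    Returns per-centroid coordinate sums, per-centroid counts, and the
    local sum of squared errors."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    sse = 0.0
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        j = dists.index(min(dists))
        counts[j] += 1
        sums[j] = [s + a for s, a in zip(sums[j], p)]
        sse += dists[j]
    return sums, counts, sse

def global_step(partials, centroids):
    """Web role: reduce worker partials into new centroids and global MSE."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    sse = 0.0
    for p_sums, p_counts, p_sse in partials:
        for j in range(k):
            sums[j] = [s + t for s, t in zip(sums[j], p_sums[j])]
            counts[j] += p_counts[j]
        sse += p_sse
    # An empty cluster keeps its previous centroid.
    new = [[s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
           for j in range(k)]
    return new, sse / sum(counts)

def kmeans(partitions, centroids, max_iter=100):
    """Iterate until the centroids stop changing (proxy for 'no points move')."""
    for _ in range(max_iter):
        partials = [local_step(part, centroids) for part in partitions]
        new, mse = global_step(partials, centroids)
        if new == centroids:
            break
        centroids = new
    return centroids, mse
```

Sending sums and counts instead of local means keeps the reduction exact when workers hold different numbers of points per cluster.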
1. The web role is the master; the other N worker roles are slaves.
2. The master divides the training samples into N subsets and distributes one subset to each worker role.
3. Each worker role computes the distance measures independently and stores them in a local array.
4. When a worker role finishes its distance calculation, it sends a message to the web role indicating the end of processing.
5. The web role notes the end of processing for the sender and acquires the computed measures by reduction.
6. After the web role has collected the distance measures from all worker roles, it performs the following steps:
   - Sort all distance measures in ascending order
   - Select the top k measures
   - Count the number of occurrences of each class in the top k measures
   - Assign the input element to the class with the highest count among the top k measures
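The KNN steps above can be sketched as follows. This is a minimal single-process illustration, assuming squared Euclidean distance (which ranks neighbours the same way as Euclidean distance) and training points stored as `(vector, label)` pairs; `local_distances` and `classify` are hypothetical names.

```python
from collections import Counter

def local_distances(sample, subset):
    """Worker: squared Euclidean distance from `sample` to each local
    training point, paired with that point's class label."""
    return [(sum((a - b) ** 2 for a, b in zip(sample, x)), label)
            for x, label in subset]

def classify(sample, subsets, k):
    """Web role: gather all worker results, then vote among the k nearest."""
    gathered = []
    for subset in subsets:                 # one subset per worker role
        gathered.extend(local_distances(sample, subset))
    gathered.sort(key=lambda d: d[0])      # ascending distance
    votes = Counter(label for _, label in gathered[:k])
    return votes.most_common(1)[0][0]
```

Each worker only needs the query sample and its own subset, so the distance computation is independent across workers and only the final vote is centralized.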
What is Windows Communication Foundation (WCF)?
- WCF is Microsoft’s implementation of industry standards for a communication subsystem that lets applications communicate across process boundaries on one machine or across multiple machines.
- WCF is a core component of the .NET Framework 3.0 and later versions, which is included with Windows 7 and Vista as well as the future version of Windows Server.
- The WCF API unifies ASMX Web Services, .NET Remoting, distributed transactions, and messaging into a single service-oriented programming model that is fundamental to the .NET Framework.
[Diagram: ASMX, WSE, .NET Remoting, COM+ (Enterprise Services), and MSMQ converge into WCF]
WCF: Address, Binding, Contract
- A client and a service exchange messages through endpoints.
- Each endpoint has an Address (where?), a Binding (how?), and a Contract (what?): the ABC of WCF.
- WCF services are deployed, discovered, and consumed as endpoints.
WCF : Endpoint
WCF in Azure
maxBufferSize=" " maxReceivedMessageSize=" "
Programming model evolution:
Era    Paradigm          Characteristics                                     Platform in this work
1980s  Object-Oriented   Polymorphism, Encapsulation, Subclassing            C & C++ with MPI
1990s  Component-Based   Interface-based, Dynamic Loading, Runtime Metadata  Queue with Azure
2000s  Service-Oriented  Message-based, Schema+Contract, Binding via Policy  WCF with Azure
Experimental Evaluation
Benchmarks: Matrix Multiplication, Kmeans, KNN. Each table reports time in seconds for MPI, Queue, and WCF on 8, 4, and 2 processors.
Surviving entries of the first table:
Processors  MPI         Queue            WCF
8           0.0993 sec  [value missing]  [value missing]
4           0.1656 sec  [value missing]  6.349 sec
2           0.4723 sec  [value missing]  [value missing]
[The values of the remaining two tables are missing.]
QUEUE Performance
Operation  Fastest  Slowest
Read       31 ms    203 ms
Write      31 ms    234 ms
Delete     0 ms     593 ms
The queue is simply a reliable method of delivering messages between processes.
Azure vs. Traditional Cluster
        CPU      RAM   Bandwidth
Glenn   2.7 GHz  8 GB  20 Gbps
Azure   1.6 GHz  2 GB  10 Gbps
Conclusion