Diagnostics and Service Management
In this module:
- Learn how to find out what is going on
- Do something about it
How do you do diagnostics today?
- How do you manage your applications and services today?
Challenges with diagnostics in the cloud
- Many instances
- They move around
- Massive amounts of data
- Can't remote desktop in
- No remote tools
The diagnostics engine brings all of the sources together for you. MonAgentHost.exe is started automatically by default. The trace listener is wired up in app.config or web.config. You need to define a storage account connection string.
How does it work (in a nutshell)?
- Role instance starts
- Diagnostic Monitor starts**
- Monitor is configured
  - Imperatively at start time
  - Remotely at any time
- Monitor buffers data locally
  - User can set a quota (FIFO)
- User initiates transfer to storage
  - Scheduled or on demand
[Diagram: Role, Diagnostic Monitor, local directory storage]
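The buffering behavior above (local buffer, FIFO quota, explicit transfer) can be pictured with a small toy model. This is only a sketch in Python, not the diagnostic monitor's actual implementation: the class `LocalBuffer`, its methods, and the use of string length as a stand-in for size on disk are all assumptions made for illustration.

```python
from collections import deque


class LocalBuffer:
    """Toy model of a per-instance diagnostics buffer with a FIFO quota."""

    def __init__(self, quota):
        self.quota = quota          # maximum total size held locally
        self.entries = deque()      # oldest entry sits at the left
        self.used = 0

    def append(self, entry):
        """Buffer an entry; evict the oldest entries once over quota."""
        self.entries.append(entry)
        self.used += len(entry)     # string length stands in for size
        while self.used > self.quota and self.entries:
            oldest = self.entries.popleft()
            self.used -= len(oldest)

    def transfer(self, storage):
        """Move everything currently buffered into persistent storage."""
        moved = len(self.entries)
        storage.extend(self.entries)
        self.entries.clear()
        self.used = 0
        return moved


buffer = LocalBuffer(quota=10)
for line in ["aaaa", "bbbb", "cccc"]:
    buffer.append(line)             # the third append pushes "aaaa" out

storage = []
moved = buffer.transfer(storage)
print(moved, storage)
```

The point of the model: data that ages out of the local quota before a transfer happens is gone, so the quota and the transfer schedule need to be sized together.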
Sources (data source, default state, destination)
- Enabled by default
  - Trace Logs (Azure Table)
  - Diagnostic Infrastructure Logs (Azure Table)
  - IIS Logs (Blob)
- Disabled by default
  - Performance Counters
  - Windows Event Logs
  - IIS Failed Request Logs
  - Crash Dumps
  - Arbitrary Files
The Escape Hatch
- Allows you to collect any file that is in a defined directory
- Can be used for:
  - Collecting custom audit files
  - Any source of data
  - Usage data for billing
Loading the Diagnostic Agent
- The agent is loaded as an Azure module in the ServiceDefinition.csdef
- The module expects a connection string under a specific setting name
- A production connection string must use HTTPS
Write to Trace Output
Common Patterns
- Get the config
  - From the default
  - From the currently running config
- Make a change to the config
- Start the diagnostic agent with the new config
Changing Config
- Can change from within the instance
  - Affects only that instance
  - Then start the agent immediately
- Can change from outside for all roles
  - Change the central file
  - Agent notices the change and reloads
  - Affects all instances of that role
[Diagram: Remote Configuration. Labels: Role, Role Instance, Diagnostic Monitor, Local directory storage, Poll Interval]
Get the Current Configuration
- First create a cloud storage account object that points to the storage account used for diagnostic data
- Call CreateRoleInstanceDiagnosticManager on that, passing in the instance info
- Then call GetCurrentConfiguration
Make changes to the config
- Create a performance counter config object
- Add the counter specifier
- Change the sample rate
- Add the config object to the DataSources collection
- Adjust the scheduled transfer period
Commit the change
- Commit the change by passing the config object from above to SetCurrentConfiguration
Sample Results
Log Filters
- Does not filter the data collected
- Only filters what is transferred
- transferOptions.LogLevelFilter = LogLevel.Error;
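To make the distinction concrete, here is a minimal Python sketch of a filter that applies at transfer time only. The `LogLevel` enum below is a stand-in defined for this example (lower value means more severe), and `select_for_transfer` is a hypothetical helper, not part of any SDK.

```python
from enum import IntEnum


class LogLevel(IntEnum):
    # Stand-in enum for this sketch: lower value = more severe.
    CRITICAL = 1
    ERROR = 2
    WARNING = 3
    INFORMATION = 4
    VERBOSE = 5


def select_for_transfer(buffered, level_filter):
    """Return only entries at least as severe as level_filter.

    The local buffer is not modified: collection is unfiltered,
    only the transfer is.
    """
    return [(level, msg) for level, msg in buffered if level <= level_filter]


buffered = [
    (LogLevel.VERBOSE, "cache warm-up finished"),
    (LogLevel.ERROR, "payment gateway timeout"),
    (LogLevel.WARNING, "queue depth rising"),
    (LogLevel.CRITICAL, "role instance recycling"),
]

transferred = select_for_transfer(buffered, LogLevel.ERROR)
print(len(buffered), "collected,", len(transferred), "transferred")
```

With the filter set to Error, the warning and verbose entries stay in the local buffer and are simply not sent to storage.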
Visualizing the data Cerebrata’s Azure Diagnostics Manager
Visualizing the data Configuring counters remotely…
Scheduled Transfers
- Each source is assigned its own schedule
- Data is transferred at the right time
- Set the interval to 0 to disable transfer
On Demand Transfers
- Handy for responding to events
- Handled like an external config change
- Requests are handled asynchronously
- Returns a request ID when submitted
- Can report success to a queue
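The asynchronous shape of an on-demand transfer (submit, get a request ID back right away, learn about completion from a queue) can be sketched as follows. This is a Python toy model under stated assumptions: `begin_on_demand_transfer`, the message fields, and the in-process `queue.Queue` standing in for a storage queue are all hypothetical.

```python
import queue
import threading
import uuid


def begin_on_demand_transfer(buffered, storage, notifications):
    """Start a transfer in the background and return its request id.

    The thread is returned as well so the caller in this sketch can
    wait for it; a real client would just watch the queue.
    """
    request_id = str(uuid.uuid4())
    snapshot = list(buffered)       # transfer what is buffered right now

    def worker():
        storage.extend(snapshot)
        notifications.put({"request_id": request_id, "status": "completed"})

    thread = threading.Thread(target=worker)
    thread.start()
    return request_id, thread


notifications = queue.Queue()
storage = []
request_id, thread = begin_on_demand_transfer(
    ["error: disk full", "error: retry exhausted"], storage, notifications
)
thread.join()
message = notifications.get(timeout=5)
print(message["status"], message["request_id"] == request_id)
```

Matching the request ID on the queue message against the one returned at submit time is what lets a caller tell its own transfer apart from others.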
Service Management API
- Allows us to do almost anything the portal can do
- Limitations
  - No billing data
  - Cannot create a subscription
  - Cannot create a storage or compute service
  - Cannot deploy management certificates
- Free
- Don't abuse it: you may get throttled
API Authentication
- All API calls must be signed with a registered administrative certificate
- X509 certificates are used
- You can register up to five certificates
- You can revoke a certificate at any time
- Certificates can be self-signed
- Upload the .cer file through the portal
Deploying Services
- Delete/Create Deployment
  - Visual Studio does this
  - VIP will change
  - Service model updates don't matter
- VIP Swap
  - Bring up another environment in Staging and swap
  - Only input endpoints (external ports) matter
- In-Place Upgrade
  - Rolling upgrade across roles
  - Most restrictive on changes (no size, endpoints, roles, etc.)
- Web Deploy*
Configuring VS2010
Deployment Environments
- Two environments to choose from
- Nearly identical…
  - Production: <servicename>.cloudapp.net
  - Staging: <deploymentID>.cloudapp.net
- VIP Swap between them
If the cube is gray, you're OK. If the cube is blue, a bill is due. Even when you 'suspend' your service, you will still be charged. Suspend only disables inbound traffic. The code is actually still running.
Worried about leaving something running?
- Download the Grey Box application
  - GreyBox.CodePlex.com
  - Open source, originally written by Strategic Data Systems and Mike Wood
- Reminds you if you have apps running, and helps you stop them
- Avoids overrunning your MSDN allocation
Grey Box
MOCP will notify you
- MOCP will send an email to the Live ID of the subscriber when compute reaches:
  - 75%
  - 100%
  - 125%
- Only works for committed hours, not for pay-as-you-grow hours
VIP Swap Upgrades
- Swap virtual IPs between the two slots
  - Production becomes Staging
  - Staging becomes Production
- Instances are not affected
- DNS and load balancer remain intact
- Happens very fast
- Can only be used when the service model hasn't changed
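A VIP swap is essentially an exchange of two pointers guarded by a compatibility check. The Python sketch below models that idea only; `Deployment`, `HostedService`, and the choice to compare just the set of input endpoints are assumptions for illustration, not how the fabric is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    label: str
    input_endpoints: frozenset      # e.g. {("HttpIn", 80)}


class HostedService:
    """Toy model of a service with a Production and a Staging slot."""

    def __init__(self, production, staging):
        self.production = production
        self.staging = staging

    def vip_swap(self):
        """Exchange the slots; instances themselves are untouched."""
        if self.production.input_endpoints != self.staging.input_endpoints:
            raise ValueError("input endpoints differ; VIP swap not allowed")
        self.production, self.staging = self.staging, self.production


endpoints = frozenset({("HttpIn", 80)})
service = HostedService(
    production=Deployment("v1", endpoints),
    staging=Deployment("v2", endpoints),
)
service.vip_swap()
print(service.production.label, service.staging.label)
```

Because the old deployment is still sitting in Staging after the swap, rolling back is just a second swap.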
[Diagram: VIP Swap. Load balancer in front of the Production and Staging deployments, each containing Web Role and Worker Role VMs]
In-Place Upgrades
- Rolling upgrades are IT nirvana
- Difficult to do in traditional IT
- Leverages upgrade domains
- Service model must be identical (i.e. no new roles, no changes in .csdef, etc.)
- For each upgrade domain:
  - Stop instances
  - Update
  - Start instances
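The per-domain loop above can be written out as a short Python sketch. It is a toy model, assuming instances are plain dictionaries with hypothetical keys (`name`, `upgrade_domain`, `version`, `running`); it records how many instances were still serving while each domain was down.

```python
def rolling_upgrade(instances, new_version):
    """Upgrade one upgrade domain at a time.

    Returns a list of (upgrade_domain, instances_still_running) pairs,
    measured while that domain's instances were stopped.
    """
    log = []
    domains = sorted({inst["upgrade_domain"] for inst in instances})
    for domain in domains:
        batch = [i for i in instances if i["upgrade_domain"] == domain]
        for inst in batch:                      # stop instances
            inst["running"] = False
        serving = sum(1 for i in instances if i["running"])
        for inst in batch:                      # update
            inst["version"] = new_version
        for inst in batch:                      # start instances
            inst["running"] = True
        log.append((domain, serving))
    return log


instances = [
    {"name": "web_0", "upgrade_domain": 0, "version": "v1", "running": True},
    {"name": "web_1", "upgrade_domain": 1, "version": "v1", "running": True},
    {"name": "web_2", "upgrade_domain": 0, "version": "v1", "running": True},
    {"name": "web_3", "upgrade_domain": 1, "version": "v1", "running": True},
]
log = rolling_upgrade(instances, "v2")
print(log)
```

With four instances in two upgrade domains, half the capacity stays up at every step, which is why a role needs at least two instances for the upgrade to be non-disruptive.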
[Diagram: In-Place Upgrade. Load balancer in front of Production Web Role and Worker Role VMs, spread across racks and split into upgrade domains #1 and #2]
Fault and Upgrade Domains
- Fault Domains
  - Represent groups of resources anticipated to fail together (i.e. same rack, same server)
  - Fabric spreads instances across fault domains
  - Default of 2
- Upgrade Domains
  - Represent groups of resources that will be upgraded together
  - Specified by upgradeDomainCount in ServiceDefinition
  - Default of 5
- Fabric splits upgrade domains across fault domains and across roles
Upgrade Domains
- Defined in .csdef
- Instances evenly distributed
                   Fault Domain 1       Fault Domain 2       Fault Domain 3
                   (isolated hardware)  (isolated hardware)  (isolated hardware)
Upgrade Domain 1   Role A Instance 1    Role B Instance 2    Role C Instance 3
Upgrade Domain 2   Role B Instance 1    Role C Instance 2    Role A Instance 3
Upgrade Domain 3   Role C Instance 1    Role A Instance 2    Role B Instance 3
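"Evenly distributed" can be illustrated with a plain round-robin assignment. The Python sketch below is a deliberate simplification: the fabric's real placement also takes fault domains and roles into account, as the grid above shows, and `assign_upgrade_domains` is a hypothetical helper. The default of 5 domains is the one stated earlier.

```python
def assign_upgrade_domains(instance_count, upgrade_domain_count=5):
    """Spread instance indexes over upgrade domains round-robin."""
    domains = {d: [] for d in range(upgrade_domain_count)}
    for index in range(instance_count):
        domains[index % upgrade_domain_count].append(index)
    return domains


layout = assign_upgrade_domains(instance_count=7)
for domain, members in layout.items():
    print(f"upgrade domain {domain}: instances {members}")
```

With 7 instances and 5 domains, no domain holds more than 2 instances, so at most 2 of the 7 are offline at any point in a rolling upgrade.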
How?
- Can be done via the portal or the management API
- Upgrade mode: Automatic or Manual
- Manual waits for human intervention to confirm the upgrade is OK before proceeding
Changing Configuration
- Change any setting in .cscfg
- Change the number of instances running
- Three approaches:
  - Edit on the portal
  - Upload a new file in the portal
  - Upload a new file with the management API
- By default, changing the number of instances does not affect running instances
- Any other config change restarts the instances
Deployment and Management Tools
- Visual Studio*
- CSManage.exe
- Windows Azure MMC
- Windows Azure Service Management (WASM) cmdlets
- 3rd party tools
Windows Azure Service Management Cmdlets
- Set of PowerShell cmdlets
- Wraps the Management REST API and the Diagnostics API
- Enables building of sophisticated deployment scripts
- Works with the rest of the .NET CLR
Windows Azure MMC
- MMC snap-in providing a graphical view of services, diagnostics, and storage
- Built on top of the WASM cmdlets
- Plugin-based, extensible
- Remotely configure diagnostics
- Download and view diagnostics
Monitoring
- Windows Azure Diagnostics
- Windows Azure Monitoring MP for SCOM
  - Available as RC now!
  - Monitors health, scales, and more
Autoscaling
- Azure does not autoscale
- Azure gives you the tools
- What is 'busy' for your app is different from what is busy for someone else's
- Not an easy problem to crack
  - Define inputs
  - Define rules to determine busy or stagnant state
  - Make adjustments
- Don't run amok; put a human in the loop somewhere
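The inputs, rules, adjustments, and human check above can be strung together in a few lines. This Python sketch is one possible shape, not a recommended policy: the CPU thresholds, the instance bounds, and both function names are assumptions picked for the example.

```python
def recommend_instance_count(current, cpu_samples,
                             busy_above=75.0, idle_below=25.0,
                             minimum=2, maximum=10):
    """Rule: step up when busy, step down when idle, within bounds."""
    average = sum(cpu_samples) / len(cpu_samples)   # input
    if average > busy_above:
        target = current + 1
    elif average < idle_below:
        target = current - 1
    else:
        target = current
    return max(minimum, min(maximum, target))       # never run amok


def apply_change(current, target, approve):
    """Adjust only if a human (the approve callback) signs off."""
    if target == current:
        return current
    return target if approve(current, target) else current


target = recommend_instance_count(current=3, cpu_samples=[80, 90, 85])
new_count = apply_change(3, target, approve=lambda cur, tgt: tgt - cur <= 1)
print(target, new_count)
```

Stepping by one instance at a time and clamping to a minimum and maximum keeps a bad metric from scaling the service to zero or to an expensive surprise.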
Deploying Applications in Windows Azure Lab
- Exercises
  - Deploying via the Management Portal
  - Deployment via PowerShell
  - Deployment via Visual Studio
  - Securing Azure with SSL*
- Labs location: C:\WAPTK\Labs\WindowsAzureDeploymentVS2010