Debugging Citrix XenDesktop & XenApp

1 Debugging Citrix XenDesktop & XenApp
Jamie Baker Manager, US Escalation Team Kapil Ramlal Sr. Software Maintenance Engineer May, 2010

2 Agenda
Troubleshooting Theory
Brief Architectural Review
XenApp and XenDesktop Common Components
Common Problem Types
Debugging Tools and Techniques
Citrix Confidential - Do Not Distribute

3 Troubleshooting Theory

4 Facts of Life Good Bad Compare them Both

5 Troubleshooting
Good: Know how the system is supposed to work. Determine how to collect the data you need.
Bad: Collect data from the bad system.
Compare the Good system with the Bad.
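The good-versus-bad comparison can be sketched in a few lines. This is only an illustration of the idea: the setting names and values below are hypothetical, and in practice the data would come from whatever collection tools fit the environment.

```python
def diff_settings(good, bad):
    """Return the settings whose values differ between a good and a bad system."""
    differences = {}
    for key in sorted(set(good) | set(bad)):
        good_value = good.get(key, "<missing>")
        bad_value = bad.get(key, "<missing>")
        if good_value != bad_value:
            differences[key] = (good_value, bad_value)
    return differences


# Hypothetical data collected from each system (illustrative values only).
good_system = {"time_skew_seconds": 2, "port_8080_open": True, "dns_resolves_ddc": True}
bad_system = {"time_skew_seconds": 2, "port_8080_open": False, "dns_resolves_ddc": True}

for setting, (good_value, bad_value) in diff_settings(good_system, bad_system).items():
    print(f"{setting}: good={good_value!r} bad={bad_value!r}")
```

Only the settings that differ are reported, which keeps attention on what separates the working system from the failing one.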

6 Brief Architectural Review

7 Putting It All Together
Diagram steps: Find "best" virtual desktop; Acquire license and determine settings; Authenticate; Start VM; Connect using ICA; Register; PXE-boot VM and stream OS; Log in; Deliver apps; Apply profile
Diagram components: Desktop Delivery Controller, SAN, PVS, Virtual Machines, XenServer, Active Directory with roaming profiles, XenApp
Full range of authentication methods supported through web interface technology
Full support for SmartAccess and ICA session policies

8 Brief Architecture Review: XenApp
Uses Zone architecture for logical server grouping
Each XenApp Server runs on a Terminal Server accessed over ICA
Uses the IMA service as its backbone to communicate Farm data
IMA uses subsystem DLLs to control the various features of XenApp (such as Load Balancing or Resource Monitoring)
Uses a Data Store for static Farm data, a Local Host Cache (LHC) for static member server data, and a volatile (in-memory) Data Store for dynamic data (ZDCs only)
Can operate on the LHC if the Data Store is inaccessible
Uses IMA heartbeat messages to ensure server availability
Connections are routed to the Least Loaded Server based on Load Balancing
XenApp can stream server desktops, published applications, and offline applications

9 Brief Architecture Review: XenDesktop
Utilizes an on-demand "brokered" architecture
Leverages a connection broker called a Desktop Delivery Controller (DDC), which is similar to a Zone Data Collector (ZDC) on XenApp
Typically integrates with a virtualized hosting infrastructure such as Citrix XenServer or Microsoft Hyper-V, which can start the Virtual Desktop Agents (VDAs) on demand
End users connect over ICA to virtual desktops, which are typically virtualized Windows operating system images running the VDA software
Unlike XenApp and Terminal Services, XenDesktop uses its own protocol stack for incoming connections, called PortICA
Leverages Citrix Web Interface as the initial connection portal
Integrates with Active Directory for user authentication and management
Data Store hosted on an external database (SQL Server)

10 The Desktop Delivery Controller
Desktop Delivery Controller (Broker) Farm Uses core XA technology IMA, licensing, WI, ... Delivers and controls access to virtual desktops VDA technology agnostic User authentication / single sign on VM power management ICA policy decision Licensing Web Service WI User DDC Servers VM Host (XenServer, Hyper-V, VMware) LDAP ICA Web Service LDAP AD VDAs (VMs, Blade PCs) Citrix Confidential - Do Not Distribute

11 The Desktop Delivery Controller
Worker / controller design Few controllers / many VDAs DDC scales to 1000s of VDAs DDC not in connection path Dependent on Active Directory User authentication Communication security Controller discovery Licensing Web Service WI User DDC Servers VM Host (XenServer, Hyper-V, VMware) LDAP ICA Web Service LDAP AD VDAs (VMs, Blade PCs)

12 The Desktop Delivery Controller Core Architecture
DDC Server Pool Manager VM Host Web Svc Domain Controller LDAP HTTP WI 80, 443 XML Service Workstation (VM or blade PC) VDA (VM or blade PC) DCOM, WCF AMC 2514, 8000, DCOM IMA PSC 2513 MFCOM / IMAProxy WCF 8080 Controller Service Desktop Service CGP Svc PortICA Drivers PortICA Svc DDC Servers IMA 2512 Licensing License Server Data Store ADO

13 The Desktop Delivery Controller Core Architecture
DDC is based on XenApp technology, using IMA as the core service
- Meta-installer wraps up XenApp 4.5 and XenDesktop-specific patches in a simple UI
- DDC inherits an install-time dependency on Terminal Services, but there is no need for TS-CALs
- IIS is installed and used on every DDC server by default (for WI and the XML service)
- Responsible for publishing, "load balancing", and distributing information
- Single-zone IMA farm only; desktop launch and power management operations are handled by the zone master
"Controller Service" communicates with and manages virtual desktops
- Implemented in .NET, using WCF to implement web services
- All communication to the workstation goes through this service
- Scales to several thousand workstations per server
"Pool Manager" service interfaces to VM hosting
- Plug-ins for XenServer, VMware Virtual Center, SCVMM
Diagram labels: Client (via AG), ICA 1494, CGP 2598

14 Virtual Desktop Agent
"Virtual Desktop Agent": collection of services, drivers, ...
- "PortICA": ICA connectivity
- "Desktop Service": web service interface communicating with DDCs
How does it relate to XenApp?
- Majority of ICA code is shared
- Does not use Terminal Services
- Major changes in: WinLogon integration, session management, USB support

15 XenDesktop and Active Directory
XenDesktop relies on AD for:
1. Authentication of end users and admins
2. Mutual authentication of DDCs and VDAs
3. Encryption of network traffic
4. Discovery and authorization of DDCs by VDAs
Each DDC farm can have an OU
- Only used for purpose 4
- May (but need not) contain computer accounts
- Need not be configured at root OU level
Alternative discovery method
- Configure DDC identity in VDA registry (see CTX118976)

16 Virtual Machine Integration
Out-of-the-box integration with XenServer Hyper-V via SCVMM Virtual Infrastructure via Virtual Center Each DDC farm can have an OU Only used for purpose 4. May (but need not) contain computer accounts Need not be configured at root OU level Alternative available with XD3 Configure DDC identity in VDA registry (see CTX118976) Citrix Confidential - Do Not Distribute

17 Pool Manager Operation
IMA zone master handles all VM operations
- Operations are forwarded by other DDCs
Plug-ins store meta-data on VMs
- "Dirty" flag, SID of OS running in VM, internal ID
- Keep meta-data in mind when cloning VMs that host VDAs
Hypervisor infrastructure is linked to desktop groups
- Each desktop group uses a connection to the hypervisor (pooling fix for XenServer plug-in)
Pool management operations are throttled
- Up to 10% of VMs in a pool are started/stopped at a time for idle pools
- Fine-grained control available through config file
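The throttling rule can be pictured as splitting a pool into batches of at most 10%. The sketch below is only an illustration of that arithmetic, not the Pool Manager's actual implementation; the function name, the VM names, and the choice to always process at least one VM per batch are assumptions.

```python
def throttled_batches(vm_names, percent=10):
    """Yield successive batches holding at most `percent` of the pool (minimum one VM)."""
    # Integer arithmetic avoids floating-point rounding surprises.
    batch_size = max(1, len(vm_names) * percent // 100)
    for start in range(0, len(vm_names), batch_size):
        yield vm_names[start:start + batch_size]


pool = [f"vm{i:02d}" for i in range(1, 26)]  # hypothetical pool of 25 VMs
for number, batch in enumerate(throttled_batches(pool), start=1):
    print(f"batch {number}: {batch}")
```

With 25 VMs the batch size works out to 2, so a full start or stop of the pool takes 13 rounds. For pools smaller than 10 VMs the one-VM minimum exceeds the 10% figure, which is a simplification made here for illustration.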

18 Common Components
The same ICA client can be used to access both XenDesktop and XenApp
Citrix Web Interface can also be used for both
Active Directory
XML
IMA
DDC/ZDC (although roles are a bit different)

19 Common Problem Types
XenDesktop:
- VDA Registration Failure
- VDA Connectivity / Reconnecting failure
- Hypervisor Issues
- Service hangs (DDC/Poolmgr)
- .NET Global Assembly Cache Exceptions
- CPU
- Memory
XenApp:
- Data Store
- Server Connectivity / Reconnecting failure
- Load Balancing
- Black Hole
- Service hangs (IMA/ZDC)
- CPU
- Memory

20 Debugging VDA Registration
Depending on environmental factors, VDA registration problems may occur if a dependent component is not functioning as expected.

21 Under The Hood: VDA Registration
Diagram labels: DDC Server (Controller Service, IMA), VDA (Desktop Service), WCF, LDAP, AD
- Service starts, looks up farm OU
- Service queries DC for all SCPs in that OU
- Service selects one DDC at random, looks up the DDC computer account and initiates a connection through WCF
- DDC service receives connection (Kerberos prevents man-in-the-middle)
- OS retrieves Kerberos ticket for DDC
- Service looks up VDA computer account
- Service validates that caller is member of "Controllers" group, and sets configuration
- DDC checks that computer account (SID) is in published group in IMA
- Service initiates WCF connection to VDA (using peer IP address and Kerberos ticket)
- Call succeeds, VDA marked registered
Now let's take a look at some of the key steps of the registration process…

22 Registration: What Can Go Wrong?
Time between VDA and DDC not in sync
- Kerberos tickets are time-stamped and rejected if not valid, to avoid replay attacks
Personal firewall on the VDA
- Prevents DDC from calling the VDA, which uses port 8080 by default
DNS lookups fail
- VDA reads DDC's hostname from AD and uses DNS to resolve its IP address
- DDC uses peer IP address, new in XD3 (it used to look up the IP address, which can fail due to caching)
VDA locked down
- VDA needs to authenticate the connection; DDC needs the "access this computer from the network" privilege

23 Debugging VDA Registration
Let's begin by reviewing one issue that can potentially affect the XD environment: failure to register a VDA. So what are some things that can go wrong with VDA registration? Clock skew between VDA and DDC, a personal firewall on the VDA, failing DNS lookups, and a locked-down VDA.
Use XDPing to check for time sync issues
Check port connectivity (Telnet, XDPing, CtxPrtChk)
Check resultant set of policy for AD inconsistencies
Check Event Viewer
Capture remote CDF trace using CDFControl (CTX111961)
DEMO: Capturing a remote CDF trace with CDFControl
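When the Citrix tools are not at hand, the first three checks (time sync, DNS, port connectivity) can be approximated with standard-library calls. This is a rough sketch and not a replacement for XDPing: the helper names are made up for illustration, and the five-minute limit is the usual Active Directory default for Kerberos clock skew, assumed here rather than taken from the slides.

```python
import socket
from datetime import datetime, timedelta

# Assumed Kerberos tolerance: five minutes is the common AD default.
MAX_KERBEROS_SKEW = timedelta(minutes=5)


def clock_skew_ok(vda_time, ddc_time, max_skew=MAX_KERBEROS_SKEW):
    """Return True when the two clocks are within the allowed Kerberos skew."""
    return abs(vda_time - ddc_time) <= max_skew


def resolve_host(hostname):
    """Return the IPv4 address for hostname, or None if DNS resolution fails."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:
        return None


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Illustrative clock readings, 7.5 minutes apart (beyond the assumed limit).
vda_clock = datetime(2010, 5, 1, 12, 0, 0)
ddc_clock = datetime(2010, 5, 1, 12, 7, 30)
print("clock skew acceptable:", clock_skew_ok(vda_clock, ddc_clock))
```

From a DDC, a call such as `tcp_port_open("vda01.example.local", 8080)` (hypothetical hostname, default port from the previous slide) would show whether a firewall is blocking the controller's call to the VDA, and `resolve_host` on the same name would expose a DNS failure.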

24

25 Debugging VDA Connectivity
Depending on environmental factors, VDA registration problems may occur if a dependent component is not functioning as expected.

26 Debugging VDA Connectivity
Leverage XDPing (CTX123278) to rule out common causes
Capture remote CDF trace
Check for display driver switching issues
- Ensure no WDDM display driver is being used
- Try reducing screen resolution and color depth
Some additional items to look out for:
- Check for Winlogon hangs (synchronization problems)
- Check for lingering applications that might not send appropriate display change messages when transitioning to and from ICA (a WM_DISPLAYCHANGE message should be sent to all top-level windows at session disconnection)
- Collect complete memory dump and MPSReports

27 The Global Assembly Cache
Switching gears a bit, let us focus on another component on a VDA that can ultimately lead to a connectivity failure. How can that happen, you might be asking...

28 The Global Assembly Cache (GAC)
Each computer where the CLR is installed has a machine-wide code cache called the global assembly cache (GAC)
The GAC stores assemblies specifically designated to be shared by several applications on the computer
Citrix XenDesktop VDA and DDC services use .NET, and rely on the GAC for shared assemblies
If a GAC assembly load failure occurs, then the component referencing the library encounters an exception
In the case of the Citrix services, these types of exceptions typically cause the offending service to terminate, or fail to start
Reference: Further information:

29 The Global Assembly Cache (GAC)
Debugging
So how do you know when to check for GAC-related problems, and how?
When? .NET components (such as WCF .NET services) fail to start or encounter an exception, such as:
- FileNotFoundException
- FileLoadException
- BadImageFormatException
How?
- Enable Fusion Logging (registry-based setting)
- Use a debugger and attach to the service for a live debug
- The GAC Utility: comes with the Microsoft Visual Studio IDE, and can be used to reinstall GAC components (this seems to be a useful technique for re-registering components)

30 Debugging a Windows Service
So you want to just debug the service instead of leveraging Fusion Logging?
- Attach to the service using a debugger, such as Windbg.exe from the Microsoft Debugging Tools for Windows package
- Configure the service to start with a debugger attached
- See Microsoft KB article KB for detailed debugger setup instructions
So you've made it as far as getting the debugger attached!
- Load the Son of Strike CLR debugging extension, SOS.dll: .loadby sos mscorwks
- View all threads to see if any contains an exception: !threads
- If an exception exists, switch to that thread and view the exception details: ~Xs (where X is the actual number (index) of the thread), then !PrintException
- Use -nested if the debugger warns about it!

31 The Black Hole Problem (XenApp)
So enough about the GAC. How many of you have ever heard of the black hole problem that could potentially affect XenApp?

32 The XenApp “Black Hole” Problem
Connections are routed to the Least Loaded Server in the Farm
XenApp servers use IMA heartbeat pings to communicate with member servers to ensure they are alive
If an underlying problem exists on a Least Loaded XenApp server, such as with RPC or Terminal Services, then that server might refuse to accept new connections even though it can still respond to heartbeat pings
The ZDC gets the pings and keeps routing to the broken server, so all connections are innocently sucked into a "Black Hole"
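A toy model makes the failure mode concrete. The sketch below is not how IMA or the ZDC is implemented; the `Server` fields, the load numbers, and the `deep_check` switch are all invented for illustration. It only shows why a heartbeat-only health check keeps selecting a server that cannot start sessions.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    load: int                 # current load value; lower means less loaded
    answers_heartbeat: bool   # responds to the heartbeat ping
    accepts_sessions: bool    # can actually start a new session


def pick_least_loaded(servers, deep_check=False):
    """Pick the least-loaded server that passes the health check in use."""
    def healthy(server):
        if not server.answers_heartbeat:
            return False
        # The deep check also requires that the server can start sessions.
        return server.accepts_sessions if deep_check else True

    candidates = [s for s in servers if healthy(s)]
    return min(candidates, key=lambda s: s.load) if candidates else None


# Hypothetical farm: XA01 is broken, so no sessions land and its load stays at 0.
farm = [
    Server("XA01", load=0, answers_heartbeat=True, accepts_sessions=False),
    Server("XA02", load=3500, answers_heartbeat=True, accepts_sessions=True),
    Server("XA03", load=4200, answers_heartbeat=True, accepts_sessions=True),
]

print("heartbeat-only check picks:", pick_least_loaded(farm).name)
print("session-level check picks:", pick_least_loaded(farm, deep_check=True).name)
```

Because failed logons never raise the broken server's load, it remains the least loaded and keeps winning the selection until a deeper test (such as those MedEvac or Health Monitoring and Recovery run) takes it out of rotation.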

33 The XenApp “Black Hole” Problem
What are the 3 most important XenApp servers, from a connecting user's standpoint?
- Zone Data Collectors
- XML Broker
- Least Loaded Server
What is the fastest way to validate the health of these 3 servers?
- MedEvac
How to monitor Farm Health using MedEvac (CTX119899)
- Runs tests against: Terminal Services, RPC, XML

34 What about XenDesktop? DEMO: Xnapshot (Sneak Peek)
The XenDesktop architecture does not have a similar "Black Hole" concern due to the nature of its architecture (i.e. Users have their own desktops, which they can reboot) A similar tool to MedEvac is currently under development, that will profile and troubleshoot the entire XenDesktop environment: Xnapshot As the name suggests, it will capture a snapshot of the environment details for later comparison Leverages other useful utilities Citrix Confidential - Do Not Distribute

35 A typical customer support engineer
A reasonable customer I got a problem I need information A Here it is When can you fix it? I need information B I need information X “He does not know what he is doing”

36 Have you changed anything ?
A typical customer support engineer A capable customer I got a problem Have you changed anything ? Hmmm. Not really “I better not tell him what I did”

37 Xnapshot tool makes it easy
Next generation customer Take-it-easy support engineer “Smart – he knows what he is doing” We got all your information and we know what was changed.

38 Citrix Confidential - Do Not Distribute

39 Capturing Post-Mortem Memory Dumps

40 Capturing Post-Mortem Memory Dumps
User and Kernel Space
Windows uses 2 levels of protection to restrict access to areas of memory
System memory is divided into 2 spaces:
- User Space: applications run here
- Kernel Space: operating system code and drivers run here
Some key points:
- Processes have their own protected virtual address space
- If 2 processes are running and one crashes, the other should remain running
- In the kernel, drivers and the OS all share the same virtual address space, which means big problems can occur if something goes wrong there

41 Capturing Post-Mortem Memory Dumps
User Dump Capture
Set up a default post-mortem debugger to catch crashing applications
- How to Set the NT Symbolic Debugger as a Default Windows Postmortem Debugger (CTX105888)
- Check for the managed debugger (JIT debugging, .NET) under HKLM\Software\Microsoft\.NETFramework, value DbgManagedDebugger
Automatically capturing a dump when a process crashes:
~snip: How do I configure the debugger?
1. Download and install the latest "Debugging Tools for Windows."
   a. If you're running a 64-bit OS, you'll want both the 32- and 64-bit versions*
   b. You can either install the entire set of tools on the machine (it's a quick, small install) or you can install to one machine and copy "cdb.exe" from the install directory to any number of target machines.
   *Note: My sample .reg files below assume you install the 32-bit debugger to c:\debuggers\x86\ and the 64-bit version to c:\debuggers\x64\.
2. Create/set the following registry keys and values (if you're working on a 64-bit version of Windows, you'll need to set these keys under the Wow6432node as well†):
   a. Key: HKLM\Software\Microsoft\Windows NT\CurrentVersion\AeDebug
      i. Value: "Debugger"
         Type: String
         Value data: <path to cdb> -pv -p %ld -c ".dump /u /ma <dump file path\name.dmp>;.kill;qd"
      ii. Value: "Auto"
         Value data: "1"
   b. Key: HKLM\Software\Microsoft\.NETFramework
      i. Value: "DbgManagedDebugger"
         Value data: <path to cdb> -pv -p %ld -c ".dump /u /ma <dump file path\name.dmp>;.kill;qd"
      ii. Value: "DbgJITDebugLaunchSetting"
         Type: DWORD (32-bit)
         Value data: 2
   †Note: You should set the keys to point to the appropriate "bitness" debugger, i.e. you want the OS/CLR to launch the 64-bit debugger for 64-bit process crashes and the 32-bit version for 32-bit crashes. Make sure your debugger paths are set accordingly.
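The registry values above are easy to mistype because of the nested quotes and backslashes, so it can help to generate the .reg text rather than write it by hand. The following is a minimal sketch under stated assumptions: the helper names are invented, the dump path `c:\dumps\crash.dmp` is a placeholder, and the debugger path must not contain spaces. Review the output before importing it, and repeat it under Wow6432Node with the matching debugger on 64-bit systems as the note describes.

```python
def debugger_command(cdb_path, dump_path):
    """Build the postmortem debugger command line described in the steps above."""
    return f'{cdb_path} -pv -p %ld -c ".dump /u /ma {dump_path};.kill;qd"'


def reg_escape(value):
    """Escape backslashes and double quotes for a .reg string value."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_reg_file(cdb_path, dump_path):
    """Return .reg text that sets the AeDebug and .NET managed debugger values."""
    command = reg_escape(debugger_command(cdb_path, dump_path))
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        r"[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug]",
        f'"Debugger"="{command}"',
        '"Auto"="1"',
        "",
        r"[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework]",
        f'"DbgManagedDebugger"="{command}"',
        '"DbgJITDebugLaunchSetting"=dword:00000002',
        "",
    ]
    return "\n".join(lines)


# 32-bit debugger location from the note above; the dump path is a placeholder.
print(build_reg_file(r"c:\debuggers\x86\cdb.exe", r"c:\dumps\crash.dmp"))
```

Generating the text keeps the escaping in one place (`reg_escape`), so the same command line can be reused for both keys without hand-editing the quotes.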

42 Capturing Post-Mortem Memory Dumps
System Dump Capture Configure in Startup and Recovery settings Ensure pagefile can store dump See MS Article cc for registry The above registry settings can be used to set the complete memory dump option when using more than 4GB of RAM.

43 Capturing Post-Mortem Memory Dumps
System Dump Capture
Dedicated Dump Drive (Windows 7+)
Location: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\CrashControl
Name: DedicatedDumpFile
Type: REG_SZ
Value: Dump path, such as D:\dedicateddumpfile.sys
How to Recover Windows Kernel Level Dump Files from Provisioned Target: CTX123642

44 Capturing Post-Mortem Memory Dumps
System Dump Capture Tools The NMI switch (MS KB927069) Keyboard Initiated (MS FF545499) SystemDump 3.1 for 32-bit and 64-bit platforms (CTX111072) Citrix Confidential - Do Not Distribute

45 Intelligent Load Balancing(XenApp)
A vast majority of the Citrix community is familiar with checking Farm load levels using "QFARM /LOAD". Recently, customers upgrading to the most recent versions of XenApp have noticed some anomalies in how evenly user load is dispersed amongst their servers. This can be even more noticeable when using a moving average load (like CPU/Memory). The main problem is that relying on "QFARM /LOAD" alone is not a fair way to monitor load when ILB is enabled.

46 How ILB Works
Works by giving logons a higher load bias
Default ILB algorithm assigns a bias of ½ the remaining load
Default algorithm: Current Load += [(Max Load - Current Load) / 2]
The ILB adjusts itself back down after pending logons are complete!
Troubleshooting Load Balancing Issues: CTX112082
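A quick worked example of the default algorithm shows why reported load can jump during a burst of logons. This sketch applies the formula exactly as written on the slide; the maximum load of 10000 and the idea that the bias is applied once per pending logon are assumptions made for illustration.

```python
MAX_LOAD = 10000  # assumed full-load value; not stated on the slide


def apply_logon_bias(current_load, max_load=MAX_LOAD):
    """Apply the default ILB bias: add half of the remaining load headroom."""
    return current_load + (max_load - current_load) / 2


load = 2000
for pending_logon in range(1, 4):
    load = apply_logon_bias(load)
    print(f"after pending logon {pending_logon}: {load:.0f}")
```

Starting from 2000, the biased load moves to 6000, then 8000, then 9000, so a server with slow logons can look heavily loaded even though few sessions are running. Once the pending logons complete, the bias is removed and the load falls back.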

47 Intelligent Load Balancing
Debugging
- Use "QFARM /LTLOAD" to obtain the "load throttle" load level of the servers in the XenApp Farm
- Check for applications slowing down the logon process, and also for profile-related issues (which can slow down the logon process)
- If an underlying problem is causing slow logons, then it will impact the ILB value, which can cause the server to have a higher load
- Determine the root cause of the slow logon, or adjust/disable the ILB setting
- Capture a CDF trace of the LMS subsystem and perform a clear test showing: the user and expected destination server based on "QFARM /LTLOAD", and the actual server where the connection went

48 API Hooking
We see quite a number of cases related to API hooking DLLs causing problems. We've even seen cases where our own hooking DLLs have caused problems. To troubleshoot hooking-related issues, we don't need to know how to write hooks, just some basic knowledge about how hooking works. An API can be considered a building block that can be used to build a program, and programs achieve a lot of their functionality by calling APIs. Other programs can achieve further functionality by "hooking" the original building block and instead returning and using a "better" building block, or API. This can be considered API hooking. For example, the Citrix Mmhook.dll module can hook display-related function calls on a server with a single physical monitor connected, to help render the display to multiple displays on the client side.

49 To Hook or not to Hook
Both XenDesktop and XenApp use features which rely on API hooking
API hooking is when an executable program makes a function call destined for a specific memory address, but instead ends up calling another function in its place, which finally ends up calling the intended function
- This is sometimes necessary to implement features based on certain architectural requirements (such as to remote certain function calls over the network)
- This is sometimes not required and, depending on the type of hook, may cause compatibility issues for some applications
Try capturing a CDF trace on the particular component for deeper insight
- Example: MF_Hook_SCardHook
Try excluding the application from the hook to see how it behaves
- If it runs fine based on testing, then leave it excluded
CTX107825: HOW TO DISABLE CITRIX HOOKS ON A PER APP BASIS

50 XenApp Health Monitoring and Recovery
Can run Citrix health checks against individual XenApp servers
Can take action based on test failure, such as removing the server from Load Balancing, or shutting it down
Can run custom tests!
Demo: Using CtxLicChk to proactively monitor the License Server's ability to dispense a license

51 The Windows Performance Tools
Starting with Windows 7, the Windows Performance Tools leverage the Event Tracing for Windows (ETW) subsystem in a way that allows administrators to collect a wealth of data, such as top DLL calls, startup information, stack traces, and more. The Event Tracing subsystem is a high-performance tracing mechanism that's built into the OS and runs in the kernel, so it works well even in production environments. What's nice about the Windows Performance Tools is that the Xperf Viewer (which comes with it) automagically downloads the symbols needed to parse the trace data!

52 WPT: The Big Picture
[Diagram: data flow and control/status between Event Providers, ETW Session, XPerf, and XPerfView; other labels: ETL file, Merged ETL file, XML file, System and Symbol Information]
1. ETW Session: collection of configurable in-memory buffers that is managed by the kernel
2. Event Providers: any component that has been instrumented with the Event Tracing API
3. XPerf: controls logging sessions and enables/disables providers
4. Post processing: metadata injection
5. XPerfView: GUI trace analysis via graphs and summary tables
6. XPerf: CLI trace analysis via actions

53 The Windows Performance Tools (WPT)

54 XenDesktop & XenApp Core Services
DDC: Pool Manager Service (CdsPoolMgr.exe) XML Service IMA Desktop Delivery Service (CDSController.exe) VDA: Workstation Agent PortICA CtxSvcHost XML ZDC: IMA XML Service MEMBER XA SERVER: XML

55 TechEdge Survey, Video Postings & PPTs
The TechEdge survey will be emailed out to end-user customers. If you complete the survey, you will be entered to win a $250 Amazon gift card. The winner will be announced June 1st. View TechEdge videos & PPTs on the Knowledge Center by Monday, May 17th.

56

