This post was co-authored by Fernando Klurfan, Principal Product Manager, and Georgy Momchilov, Distinguished Engineer.

Service Continuity is a major redesign of the Citrix Virtual Apps and Desktops product line, and it is coming full force into Citrix Workspace.

Did you know airbags were invented by two British dentists to prevent jaw injuries during collisions in World War I? The airbag’s productization story is filled with anecdotes, incredulity, and technological breakthroughs that have made this safety feature ubiquitous in automobiles.

What does this have to do with cloud computing? Well, what if there was a kind of airbag or safety net for cloud-based products? What if there was something that could protect organizations from outages?

During one of our typical Tuesday product management meetings a while back, Citrix executives Sridhar Mullapudi and Juan Rivera presented a remarkably simple diagram. Looking back, I think of this as a “Doc Brown” moment: a flux capacitor singularity.

Their diagram showed a cloud outage scenario where users were still able to launch apps.

This was the inception of a project we called “Shield,” and when we started it, we knew it was going to require determination, tenacity, and momentum.

Full disclosure: Sometimes I had my doubts, but our architect Georgy Momchilov was the inspirational driving force. The word architect traces back to the Greek for “master builder”: a person skilled in the art of building, one who plans, designs, and supervises construction.

So, together with our “art-chitect” and the most fearsome group of engineers in the company (across the U.S., China, India, and England), we embarked on a project we hope will change the way you work, with many more benefits beyond outage protection.

What Is Service Continuity?

Service Continuity is not a new feature or a new protocol. It is an innovative approach to building products in Citrix Cloud, adding design-for-failure to the Citrix Virtual Apps and Desktops service architecture.

This product evolution is the outcome of key learnings from scaling and servicing Citrix Virtual Apps and Desktops for thousands of customers over the years and striving to provide customers with “more 9s.”

The way we broker connections has evolved. Instead of gaining authorization to access a resource and then immediately connecting to that resource, Service Continuity decouples the two stages so they do not have to be executed synchronously.

Service Continuity is designed to remove (or, if not possible, minimize) the dependency on the availability of the components involved in the connection process. That means that users can still launch their virtual apps and desktops, regardless of the health status of the cloud services.

Even if Citrix Workspace service is down. Even if our cloud broker is down. Even if your internet is down (provided users still have network connection to a resource location).

It is not designed to make each Citrix Cloud service highly available/scalable. Those initiatives are driven in parallel by every engineering group within Citrix. As proof, in Q4 2020 we increased our SLA.

But Service Continuity is more than an insurance policy we give to our Citrix Workspace customers.

It is designed to deliver higher levels of reliability than any existing cloud or on-prem deployment.

Let me repeat that. Even higher reliability than on-prem Citrix Virtual Apps and Desktops by removing on-prem access layer boxes (Citrix ADC, Citrix StoreFront) and with a new brokering mechanism.

The project was guided by three main principles: resiliency, security, and performance. Every decision we took fell into one of these buckets. Backward compatibility was also a key consideration (in other words, customers do not need to upgrade their VDA workloads to reap the benefits). The only requirement is a Citrix Workspace app upgrade on the user’s device. Citrix takes care of everything else.

How Does Service Continuity Work?

Service Continuity is an extension of the Citrix Virtual Apps and Desktops service and is designed for Citrix Cloud customers who leverage Citrix Workspace and Citrix Gateway service.

The pillar of this Service Continuity effort is a new type of .ICA file, called Workspace connection lease (or CL for short). Let’s walk through an example. When a user first logs in to Citrix Workspace on a Monday morning, Citrix Workspace app also makes a request for CLs that are generated by Citrix Cloud, which are then stored securely on the user’s machine. (Existing CLs are refreshed/extended every time the user logs into Citrix Workspace.)

CLs are long-lived authorization tokens, caching every entitlement that is published to the user (regardless of a previous successful launch) coupled with information on the Resource Location.

Let us take a closer look at what this means:

  • Long-lived: The IT admin can configure the CL validity period to be between one and 30 days.
  • Authorization: The CLs are eventually redeemed with either a Citrix Gateway service PoP or a Connector in the Resource Location, which then grants authorization for a network connection to go through. If users have valid CLs, Citrix Gateway service and/or a Connector concludes that they have passed the authentication process with Citrix Workspace and the connection should be permitted. User authentication is deferred to the VDA, which presents the Windows Logon UI and prompts the user one last time for AD credentials or Smart Card PIN. Because CLs are currently a fallback path, this happens only during outages.
  • Enumeration: CLs contain enumeration of every published resource entitlement that is leasable. However, the visual representation of published resources in Citrix Workspace app, including in offline conditions, is achieved based on separate new Progressive Web App and Service Worker technology. Icons for currently non-leasable apps (such as SaaS apps coming from Citrix Secure Workspace Access) are greyed out.
  • Tokens: CLs make session launch stateless, which means they are nothing more than client-side data, removing dependencies on any outbound service.
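
To make the "long-lived" and "refreshed on login" properties concrete, here is a minimal Python sketch. It is not the actual CL format: the class name, field names, and example values are hypothetical and chosen only for illustration. The one detail taken from the description above is the validity period of one to 30 days.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone

MIN_VALIDITY_DAYS = 1    # lower bound mentioned in the post
MAX_VALIDITY_DAYS = 30   # upper bound mentioned in the post


@dataclass(frozen=True)
class ConnectionLease:
    """Hypothetical client-side view of a connection lease (CL)."""
    user: str
    resource: str            # a published app or desktop
    resource_location: str   # where the resource can be reached
    issued_at: datetime
    validity_days: int       # configured by the IT admin

    def __post_init__(self):
        if not MIN_VALIDITY_DAYS <= self.validity_days <= MAX_VALIDITY_DAYS:
            raise ValueError("validity period must be between 1 and 30 days")

    @property
    def expires_at(self) -> datetime:
        return self.issued_at + timedelta(days=self.validity_days)

    def is_valid(self, now: datetime) -> bool:
        return self.issued_at <= now < self.expires_at

    def refreshed(self, now: datetime) -> "ConnectionLease":
        """Return a copy whose validity window restarts at `now` (for example, on login)."""
        return replace(self, issued_at=now)


monday = datetime(2020, 12, 7, 9, 0, tzinfo=timezone.utc)
lease = ConnectionLease("alice", "Excel", "datacenter-east", monday, validity_days=7)
print(lease.is_valid(monday + timedelta(days=3)))  # True
print(lease.is_valid(monday + timedelta(days=8)))  # False
```

The lease is plain client-side data, so checking whether it is still usable needs no call to any cloud service.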

Security

A substantial portion of the security architecture of Service Continuity is based on public-key cryptography (both public-key encryption and digital signature), without relying on a Public Key Infrastructure (PKI). The CLs are signed by Citrix Cloud, are user and device bound, and the sensitive payload in the CLs is encrypted, which means the CLs cannot be tampered with or viewed by unauthorized entities.
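
The two checks described above, tamper detection and user/device binding, can be sketched in a few lines of Python. One loud caveat: Python's standard library has no public-key signatures, so this sketch swaps in an HMAC (a shared-secret tag) as a stand-in for the digital signature. The real design verifies with a public key, and this sketch also leaves out the payload encryption entirely. All function and field names here are hypothetical.

```python
import hashlib
import hmac
import json


def issue_lease(signing_key: bytes, user: str, device_id: str, resource: str) -> dict:
    """Create a lease body and attach a tag (HMAC stand-in for a signature)."""
    body = {"user": user, "device_id": device_id, "resource": resource}
    payload = json.dumps(body, sort_keys=True).encode()
    tag = hmac.new(signing_key, payload, hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def verify_lease(signing_key: bytes, lease: dict, user: str, device_id: str) -> bool:
    """Reject a lease that was modified or presented by another user or device."""
    payload = json.dumps(lease["body"], sort_keys=True).encode()
    expected = hmac.new(signing_key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, lease["tag"]):
        return False  # the body no longer matches its tag: tampered
    body = lease["body"]
    return body["user"] == user and body["device_id"] == device_id


key = b"demo-key-for-illustration-only"
lease = issue_lease(key, "alice", "laptop-01", "Excel")
print(verify_lease(key, lease, "alice", "laptop-01"))  # True

lease["body"]["resource"] = "Payroll"                  # tamper with the lease
print(verify_lease(key, lease, "alice", "laptop-01"))  # False
```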

The Unexpected: An Outage

There are multiple ways a CL can spring into action in the current phase one of our Service Continuity efforts. By default, the Citrix Workspace app still relies on .ICA files first and the online brokering process you know today.

(For now! CLs are designed to operate the same way either during an outage or during normal conditions, so eventually they will supersede .ICA files even for cloud-online scenarios.)

Today, if Workspace Service <–> Identity Service <–> 3rd party IDPs <–> Cloud Broker <–> Ticketing Services <–> Gateway Service are all online, .ICA files are used.
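
As a simplified model of that rule, consider the Python sketch below. The service names come from the chain above, but the health table and the function are hypothetical. Citrix Workspace app does not literally consult such a table; it falls back when an online launch step fails.

```python
from typing import Dict

# Services in the online launch chain, as listed above.
ONLINE_LAUNCH_SERVICES = (
    "Workspace Service",
    "Identity Service",
    "3rd party IDPs",
    "Cloud Broker",
    "Ticketing Services",
    "Gateway Service",
)


def choose_launch_path(health: Dict[str, bool], has_valid_lease: bool) -> str:
    """Pick the .ICA path when every service is online, else fall back to a CL."""
    # A service missing from the table is treated as offline.
    if all(health.get(name, False) for name in ONLINE_LAUNCH_SERVICES):
        return "ica"
    return "connection_lease" if has_valid_lease else "unavailable"


all_online = {name: True for name in ONLINE_LAUNCH_SERVICES}
print(choose_launch_path(all_online, has_valid_lease=True))     # ica

broker_down = dict(all_online, **{"Cloud Broker": False})
print(choose_launch_path(broker_down, has_valid_lease=True))    # connection_lease
```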

But what happens if one of these components breaks?

  • Workspace outage: This is most evident to end users because their Citrix Workspace app cannot load the https://<mydomain>.cloud.com store, and it is the earliest form of an outage users experience. The Citrix Workspace app UI will automatically enter “outage mode” and enumerate the apps based on local cache.
  • Identity Service outage: If an authentication cannot be completed, a user can cancel the store logon UI. Citrix Workspace app will also enter “outage mode” by enumerating the icons for apps that are leasable from the local cache.
  • Cloud Broker outage: The Remote Broker service in the Connector in the Resource Location reports an error and enters “pending outage” mode. It continues to probe the Cloud Broker until it finally enters “initial outage mode.” The HA service in the Connector takes over, and all brokering responsibilities are transferred to it. A VDA re-registration storm takes place, where all the VDAs that were registered with the principal Broker now register with the HA Broker. During this re-registration period, some users might not be able to launch their apps. Does this sound like Local Host Cache (LHC)? Almost! We see Service Continuity as the Workspace evolution of LHC, without requiring any on-prem access layer (i.e., no Citrix StoreFront or Citrix ADC). Future versions of Service Continuity will aim to remove the LHC dependency.
  • Ticketing Service outage: If this cloud service (or any other involved in .ICA file generation) is down, Citrix Workspace app fails to request an .ICA file or fails to launch with an .ICA file. CLs are now invoked as a fallback.
  • Citrix Gateway service outage: This is the most interesting scenario because external users have no other option but to connect via Citrix Gateway service (CGS). That means that Service Continuity is only as good as the CGS cloud availability, which is multi-cloud and multi-PoP. After reviewing every single outage in the last two years, we feel confident that CGS outages will not cause a launch failure. Let me explain. Citrix uses intelligent traffic management (ITM) for DNS services in Citrix Gateway service, so we can steer requests based on real-time performance data. Basically, g.nssvc.net is our global FQDN, and ITM acts as the authoritative DNS for nssvc.net. When clients resolve the global Citrix Gateway service FQDN, the query lands on ITM, which returns the IP of the closest healthy Citrix Gateway service PoP. If any PoP goes down, the Controller takes it out of rotation. Because the CLs are tokens, Citrix Gateway service does not need to contact any additional service in Citrix Cloud and can autonomously authorize a connection, as well as fail over an existing connection to another Citrix Gateway service PoP if necessary, without any database dependency. (There is no dependency on STA Service.) This is a recipe for a stateless Citrix Gateway service cloud, which is exactly what we need for Service Continuity to work!
  • Internet outage: This is the holy grail of continuity. If your network is configured in a way that still allows users to ping the Resource Location directly (MPLS or VPN links), then the Citrix Workspace app can contact the Connector and VDA over TCP 2598 and can launch the session without going through the Citrix Gateway service. You can configure a Resource Location to be both external and internal at the same time, and Citrix Workspace app will try to connect directly. If that fails, it will use Citrix Gateway service.
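
The "direct first, then Citrix Gateway service" order from the last bullet can be sketched as follows. This is an illustration under stated assumptions, not how Citrix Workspace app is implemented: the function names are hypothetical, and a plain TCP reachability probe on port 2598 (the port named above) stands in for the real session launch.

```python
import socket
from typing import Callable, Optional

DIRECT_PORT = 2598  # TCP port named in the post for direct connections


def tcp_reachable(host: str, port: int = DIRECT_PORT, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def connect_with_lease(
    try_direct: Callable[[], bool],
    try_gateway: Callable[[], bool],
) -> Optional[str]:
    """Try the direct path first and only fall back to the gateway if it fails."""
    if try_direct():
        return "direct"
    if try_gateway():
        return "gateway"
    return None


# Internet is down, but the internal link to the Resource Location still works.
print(connect_with_lease(lambda: True, lambda: False))   # direct
# The direct path is blocked, so the gateway path is used instead.
print(connect_with_lease(lambda: False, lambda: True))   # gateway
```

In a real probe, `try_direct` could be something like `lambda: tcp_reachable("vda.example.internal")`, where the hostname is a placeholder.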

You Are in Control

Think about the last point for a second. If you can guarantee connectivity to any Resource Location where the app and/or desktop is published, internal users are protected against the most catastrophic type of outages, and no dependency on any external component (ISP or cloud) is introduced in the launch path.

As long as the Connectors and VDAs are online, users can stay productive. Using the CLs, Citrix Workspace app recursively attempts to connect to a Resource Location until one is found where a VDA can host a session.
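
The search across Resource Locations can be pictured with a short Python sketch. The names are hypothetical, and a simple loop stands in for whatever retry strategy Citrix Workspace app actually uses.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def find_session_host(
    resource_locations: Iterable[str],
    can_host_session: Callable[[str], bool],
) -> Tuple[Optional[str], List[str]]:
    """Return the first Resource Location that can host a session, plus every location tried."""
    attempted: List[str] = []
    for location in resource_locations:
        attempted.append(location)
        if can_host_session(location):
            return location, attempted
    return None, attempted


# Hypothetical example: only the second location has a VDA available.
online = {"datacenter-west"}
chosen, tried = find_session_host(
    ["datacenter-east", "datacenter-west", "branch-office"],
    lambda location: location in online,
)
print(chosen)  # datacenter-west
print(tried)   # ['datacenter-east', 'datacenter-west']
```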

You now have ultimate control!

Preventive Maintenance

We have enhanced Citrix Workspace app with better built-in diagnostic tools and UI error messages, codes, and correlation IDs, so in the rare event that you need Citrix technical support, they can quickly troubleshoot and resolve your issue.

In addition, Citrix added information about the session launch in our Workspace app Connection Center. More importantly, this information will also be included in Citrix Analytics for Performance in the future.

Sign Up for Public Tech Preview

So, go ahead, sign up here and start testing the public tech preview! We will send you a step-by-step companion guide on how to configure the feature, simulate outages, and validate its functionality.

What’s Next?

Service Continuity is an innovation that will transform how organizations provide digital workspaces, and how they handle outages, helping them to maintain the best possible availability for their end users.

Service Continuity is also a new dimension in the Citrix Workspace portfolio, and we plan to expand this soon to other Citrix Workspace app platforms (Linux, Mac and Workspace app for Web).

Connection Leases are also extensible to other Workspace offerings, so stand by. We have your back there too.

If you have any questions, contact the product team at servicecontinuity@citrix.com. For more information, check out this Citrix Tech Zone article and product documentation. And look out for Part 2, which will include more technical information.

Happy Holidays, and adios 2020!


Disclaimer: The development, release and timing of any features or functionality described for our products remain at our sole discretion and are subject to change without notice or consultation. The information provided is for informational purposes only and is not a commitment, promise or legal obligation to deliver any material, code or functionality and should not be relied upon in making purchasing decisions or incorporated into any contract.