Migrate VMs behind Standard Load Balancer to another region with Azure Site Recovery
Published Aug 31 2021 12:35 AM
Microsoft

The original Japanese Edition is here.

https://logico-jp.io/2021/08/30/use-azure-site-recovery-to-migrate-virtual-machines-behind-standard-...

 

Medium edition is here.

https://medium.com/microsoftazure/migrate-vms-behind-standard-load-balancer-to-another-region-with-a...

 

The other day, one of my customers asked me the following question.

 

Inquiry from customer

 

We have a system which consists of an Azure Load Balancer and two VMs behind the load balancer. To meet our rules around BCDR (business continuity & disaster recovery), we would like to migrate this system with Azure Site Recovery (ASR), but the error “Site Recovery configuration failed (151196)” occurs and prevents us from configuring ASR. What is the root cause? Do you have any workarounds or solutions?

As the inquiry was not clear to me, I asked them to elaborate on their environment and the issue.

  • They use Standard Load Balancer.
  • ExpressRoute connects their on-premises environment to Azure, and forced tunneling is enabled.
  • The application running on the VMs uses Table storage as a data source. A Service Endpoint for Table storage is already configured.
  • As state is not shared between the VMs, a simple migration of each VM is all that is required.

The following diagram reflects my understanding of the customer’s environment.

image-23[1].png

 

The VNet connected to ExpressRoute is not a hub network, so the integration between ExpressRoute and Site Recovery described in the following document is not required in this case.

 

Integrate ExpressRoute with disaster recovery for Azure VMs

https://docs.microsoft.com/azure/site-recovery/azure-vm-disaster-recovery-with-expressroute

 

Cause

If you are familiar with Azure, you will probably spot the root cause at once.

VMs behind a Standard Load Balancer cannot reach destinations outside the VNet by default. So, outbound access to the ASR-related resources outside the VNet has to be configured explicitly. Forced tunneling is indeed configured, but it does not work for VMs behind a Standard Load Balancer.

 

This is mentioned in the following document.

 

Issue 2: Site Recovery configuration failed (151196)

https://docs.microsoft.com/azure/site-recovery/azure-to-azure-troubleshoot-network-connectivity#issu...

 

If the VMs are behind a Standard internal load balancer, by default, it wouldn’t have access to the Microsoft 365 IPs such as login.microsoftonline.com. Either change it to Basic internal load balancer type or create outbound access as mentioned in the article Configure load balancing and outbound rules in Standard Load Balancer using Azure CLI.

ASR needs access to Azure Active Directory services such as login.microsoftonline.com, but outbound access to such services had not been configured. Forced tunneling lets you redirect or “force” all Internet-bound traffic back to your on-premises location, with the default route advertised from the on-premises side. However, forced tunneling does not work for VMs behind a Standard Load Balancer.

 

Outbound connectivity

The outbound destinations that the VMs must be able to reach are listed below. They are required when replicating VMs with Azure Site Recovery.

Service | URL
Storage | *.blob.core.windows.net
Azure Active Directory | login.microsoftonline.com
Replication | *.hypervrecoverymanager.windowsazure.com
Service Bus | *.servicebus.windows.net
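
To see which of these destinations a VM can actually reach, a quick probe from inside the VM helps. The following is a minimal sketch, not an official diagnostic tool. It assumes the services are reached over TCP port 443, and the host names passed to it are placeholders: the wildcard entries above have to be replaced with the concrete names used by your own vault and cache storage account.

```python
import socket
from typing import Callable, Dict, Iterable


def tcp_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, timeouts, and refused or filtered connections.
        return False


def check_endpoints(
    hosts: Iterable[str],
    connect: Callable[[str], bool] = tcp_connect,
) -> Dict[str, bool]:
    """Probe every host and report which ones are reachable.

    `connect` is injectable so the function can be exercised without
    network access (for example in a unit test).
    """
    return {host: connect(host) for host in hosts}


# Example call from a VM (host names are hypothetical placeholders):
# check_endpoints([
#     "login.microsoftonline.com",
#     "mycachestorage.blob.core.windows.net",
# ])
```

A successful TCP connection only shows that the route is open. It does not prove that authentication or replication itself will succeed.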

 

This is mentioned in the following document.

 

Troubleshoot Azure-to-Azure VM network connectivity issues

https://docs.microsoft.com/azure/site-recovery/azure-to-azure-troubleshoot-network-connectivity

 

Solutions

We have the following options to establish outbound connectivity required for replicating VMs with Azure Site Recovery.

  1. Replace Standard Load Balancer with Basic Load Balancer.
  2. Assign public IPs to VMs behind Standard Load Balancer.
  3. Assign NAT Gateway to subnet where VMs connect.
  4. Add Public Load Balancer and configure outbound rule from VMs.
  5. Add Azure Firewall, configure UDR (User defined route) to route 0.0.0.0/0 to Azure Firewall, and set UDR to the subnet where VMs connect.
  6. Use Service Endpoint and Private Endpoint to open routes to required services.

 

1. Replace Standard Load Balancer with Basic Load Balancer.

Basic Load Balancer lets the VMs behind it connect to destinations outside the VNet by default, while Standard Load Balancer does not.

 

Azure Load Balancer SKUs

https://docs.microsoft.com/azure/load-balancer/skus

 
image-22[1].png

 

When forced tunneling is enabled, replication traffic leaves the Azure boundary (i.e. it goes out to the Internet). As the following document says, this configuration is not recommended. This option is fine if forced tunneling is disabled.

 

Forced tunneling

https://docs.microsoft.com/azure/site-recovery/azure-to-azure-about-networking#forced-tunneling

 

2. Assign public IPs to VMs behind Standard Load Balancer.

Public IPs are assigned to both VMs so that they can directly access destinations outside the VNet.

image-21[1].png

 

With this solution, not only can outbound traffic leave the VMs, but inbound traffic from outside the VNet can also reach them. So, the following configuration is mandatory.

  • An NSG (Network Security Group) should be configured to control inbound/outbound traffic.
  • It is simpler to assign the NSG to the subnet to which the VMs connect than to assign an NSG to each VM's NIC.

If Microsoft network routing is chosen, traffic between the VMs and Azure services does not leave the Azure boundary.

 

3. Assign NAT Gateway to subnet where VMs connect.

Instead of assigning public IP addresses to VMs, NAT gateway is assigned to the subnet where VMs connect.

image-17[1].png

 

NAT gateway works for outbound access only, and inbound traffic cannot use the public IP address(es) assigned to the NAT gateway. So, NAT gateway prevents the VMs from being accessed from outside the VNet.

If Microsoft network routing is chosen, traffic between the VMs and Azure services does not leave the Azure boundary.

 

Virtual Network NAT Documentation 

https://docs.microsoft.com/azure/virtual-network/nat-gateway/

 

4. Add Public Load Balancer and configure outbound rule from VMs.

A Public Load Balancer with an outbound rule allows us to permit outbound traffic from the VMs behind the load balancer.

image-18[1].png

 

This solution is similar to the 2nd and 3rd solutions, but it is more expensive than either of them. If Microsoft network routing is chosen, traffic between the VMs and Azure services does not leave the Azure boundary.

 

5. Add Azure Firewall, configure UDR (User defined route) to route 0.0.0.0/0 to Azure Firewall, and set UDR to the subnet where VMs connect.

Azure Firewall allows us to manage inbound/outbound traffic from/to the VMs. The default route of the subnet to which the VMs connect is changed to point to Azure Firewall with a UDR (User Defined Route).

image-19[1].png
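
To illustrate why a 0.0.0.0/0 UDR takes the traffic away from the forced-tunneling default route, here is a simplified model of route selection. It assumes longest-prefix match first and, for identical prefixes, user-defined routes preferred over BGP routes, which in turn are preferred over system routes. The `Route` class, the next-hop labels, and the sample prefixes are my own illustrations, not Azure objects.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional

# Tie-breaker for identical prefixes (simplified): lower number wins.
SOURCE_PRIORITY = {"user": 0, "bgp": 1, "system": 2}


@dataclass(frozen=True)
class Route:
    prefix: str    # address prefix in CIDR notation
    next_hop: str  # descriptive label only
    source: str    # "user", "bgp", or "system"


def select_route(routes: List[Route], destination: str) -> Optional[Route]:
    """Pick the route for `destination`: longest prefix, then source priority."""
    dest = ipaddress.ip_address(destination)
    candidates = [r for r in routes if dest in ipaddress.ip_network(r.prefix)]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda r: (
            ipaddress.ip_network(r.prefix).prefixlen,
            -SOURCE_PRIORITY[r.source],
        ),
    )


# Hypothetical routes for the scenario in this article.
ROUTES = [
    Route("10.0.0.0/16", "VNet", "system"),
    Route("0.0.0.0/0", "on-premises via ExpressRoute", "bgp"),  # forced tunneling
    Route("0.0.0.0/0", "Azure Firewall", "user"),               # the UDR
]
```

In this model, `select_route(ROUTES, "203.0.113.10")` returns the Azure Firewall route, and removing the UDR from the list sends the same destination to the on-premises route again. Traffic inside 10.0.0.0/16 stays on the VNet route in both cases.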

Azure Firewall allows us to control inbound/outbound traffic not only by IP address but also by FQDN, whereas an NSG cannot filter by FQDN. If Microsoft network routing is chosen, traffic between the VMs and Azure services does not leave the Azure boundary.

Azure Firewall is indeed powerful, but this option is the most expensive of all the options mentioned in this entry.
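
As a rough illustration of FQDN-based filtering, the sketch below checks a host name against the outbound destinations that ASR requires. It is only a model of the idea: Azure Firewall has its own rule syntax and wildcard semantics, and the function name and the sample host names are hypothetical.

```python
from fnmatch import fnmatchcase

# The outbound destinations listed earlier in this article.
REQUIRED_FQDN_PATTERNS = [
    "*.blob.core.windows.net",
    "login.microsoftonline.com",
    "*.hypervrecoverymanager.windowsazure.com",
    "*.servicebus.windows.net",
]


def is_allowed(fqdn: str, patterns=REQUIRED_FQDN_PATTERNS) -> bool:
    """Return True if `fqdn` matches at least one allowed pattern."""
    # Normalize case and drop a trailing root dot before matching.
    name = fqdn.lower().rstrip(".")
    return any(fnmatchcase(name, pattern) for pattern in patterns)
```

A rule set built this way would let `mycachestorage.blob.core.windows.net` through and drop `example.com`, which is something an NSG rule based on IP addresses cannot express directly.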

 

6. Use Service Endpoint and Private Endpoint to open routes to required services.

Instead of assigning public IP address(es) to either VMs or the subnet, routes to services required for ASR replication are opened with Service Endpoint and Private Endpoint.

image-25[1].png

The following document describes how to enable replication with private endpoints.

 

Replicate machines with private endpoints

https://docs.microsoft.com/azure/site-recovery/azure-to-azure-how-to-enable-replication-private-endp...

 

The services required for ASR replication and the acceptable option(s) for each are listed below.

  • Azure Active Directory: Service Endpoint (if access to Microsoft 365 has to be permitted explicitly, NAT Gateway is the best solution.)
  • Service Bus: Service Endpoint only (As the destination is not clear, Service Endpoint is the only option.)
  • Storage Service: Either Service Endpoint or Private Endpoint
  • Recovery Services vault: Private Endpoint only
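
The list above can be turned into a small pre-flight check that reports which required services still have no route. This is just a sketch of the bookkeeping. The dictionary keys and the `"service_endpoint"` / `"private_endpoint"` labels are names I made up for illustration, and nothing here queries Azure.

```python
# Acceptable route types per service, transcribed from the list above.
ACCEPTABLE_ROUTES = {
    "Azure Active Directory": {"service_endpoint"},
    "Service Bus": {"service_endpoint"},
    "Storage": {"service_endpoint", "private_endpoint"},
    "Recovery Services": {"private_endpoint"},  # the recovery vault entry
}


def uncovered_services(configured: dict) -> list:
    """Return required services that have no acceptable route configured.

    `configured` maps a service name to the set of route types that
    have been set up for it on the VM subnet.
    """
    return sorted(
        service
        for service, acceptable in ACCEPTABLE_ROUTES.items()
        if not (set(configured.get(service, ())) & acceptable)
    )
```

For example, if only Private Endpoints for storage and the vault have been created so far, the function returns Azure Active Directory and Service Bus as still uncovered.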

This solution is ideal for the following reasons.

  • No traffic leaves the Azure boundary.
  • Public IP addresses are required only if NAT Gateway is used.
  • It is cost effective.

Note the following points when configuring this solution.

  • Depending on the SKU (Premium or Standard) of the storage account used for cache storage, the roles to be granted to the managed identity of the Recovery Services vault vary.

Storage SKU | Roles to be granted
Standard | Contributor, Storage Blob Data Contributor
Premium | Contributor, Storage Blob Data Owner

 

  • In the document above, configuring a Private Endpoint for cache storage is optional. In this case, however, we have to configure either a Private Endpoint or a Service Endpoint for cache storage, as the VMs are behind a Standard Load Balancer.
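
When the role assignment is scripted, the SKU-to-role mapping from the first note can be kept in one place. The helper below is a minimal sketch. The function name is hypothetical, and it assumes the SKU is passed either as a plain tier ("Premium") or as a SKU name such as "Standard_LRS", from which only the tier prefix is used.

```python
# Roles for the vault's managed identity, per cache storage tier
# (transcribed from the note above).
CACHE_STORAGE_ROLES = {
    "standard": ("Contributor", "Storage Blob Data Contributor"),
    "premium": ("Contributor", "Storage Blob Data Owner"),
}


def roles_for_cache_storage(sku: str) -> tuple:
    """Return the roles to grant for a cache storage account of this SKU."""
    # "Standard_LRS" -> "standard", "Premium" -> "premium"
    tier = sku.strip().split("_")[0].lower()
    if tier not in CACHE_STORAGE_ROLES:
        raise ValueError(f"Unknown storage SKU tier: {sku!r}")
    return CACHE_STORAGE_ROLES[tier]
```

Failing loudly on an unknown tier is deliberate here: silently granting the wrong data role would only show up later as a replication error.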

Summary and customer decision

We have several options to solve this situation, and each option has pros and cons. After I explained these options to the customer, they chose option #6.

Option | Does traffic leave the Azure boundary even when Microsoft network routing is chosen? | Is a public IP needed? | Cost | Configuration points | Remarks
1 | Yes in some cases. | No | Outgoing traffic cost might increase. | On-premises firewall rules | When forced tunneling is used, storage replication traffic goes to the Internet.
2 | No | Yes | | NSG (inbound/outbound) |
3 | No | Yes | | NSG (especially outbound) |
4 | No | Yes | | Public Load Balancer (outbound rule) |
5 | No | Yes | Azure Firewall is expensive. | UDR, firewall rules |
6 | No | Yes if NAT Gateway is used. | | Grant roles to the managed identity of the Recovery Services vault. |

 

Last update: Jul 31 2023 03:15 AM