VMware Horizon/AppVolumes LB with HAProxy and Keepalived on PhotonOS

At the time of writing, VMware Horizon provides a built-in load-balancing/high-availability option only for the Unified Access Gateway. If you want your Horizon Connection servers or your AppVolumes managers to be load balanced and highly available, you have to rely on other VMware or 3rd party solutions. For my homelab I wanted the same load-balanced, highly available setup for my Connection servers and AppVolumes managers, so I could get hands-on experience with it and do upgrades with zero downtime.

So I came up with the idea to use the VMware Photon OS appliance, put HAProxy and Keepalived on it and get it up and running. In the end, this is what my environment should look like:

Prepare the servers

I wanted to have load balancing but also high availability, so I needed 2 servers. Therefore, the following steps need to be executed twice. I chose the OVA-hw13_uefi download of PhotonOS 3.0 revision 3, which is a minimal PhotonOS version, optimized for running on VMware vSphere with minimal resources.

(If you only want load balancing in your environment, you can limit your setup to a single server and skip the parts where I install and configure Keepalived.)

  • Import the OVA in your vCenter or ESXi host
  • By default, this OVA of PhotonOS is set up to acquire a DHCP address and SSH is enabled at startup
  • Get the IP address of the new VM and connect to it with an SSH client
  • Login with user “root” and password “changeme”. You’ll be prompted to change the password immediately to a new password.
  • First thing I did is update my VM with the latest security patches. PhotonOS uses tdnf as default package manager, which is a customized version of the DNF package manager without the Python dependencies. I also prefer to use the nano editor to change configuration files, so I installed that package also.
tdnf upgrade -y
tdnf install nano -y
  • Next thing is to change networking to have a static IP. By default, there’s a network configuration file in /etc/systemd/network called 99-dhcp-en.network which specifies to use DHCP on all network adapters.
  • Edit the default “/etc/systemd/network/99-dhcp-en.network” file to disable DHCP:
[Match]
Name=e*
[Network]
DHCP=no
  • Create a new file “/etc/systemd/network/10-static-en.network”
[Match]
Name=eth0
[Network]
# Use 192.168.1.252/24 on the second server
Address=192.168.1.251/24
Gateway=192.168.1.254
DNS=192.168.1.254
[DHCP]
UseDNS=false
  • Change the owner of the new file:
chown systemd-network:systemd-network /etc/systemd/network/10-static-en.network

To be able to use the virtual IP in both Keepalived and HAProxy, I needed to make some changes to allow ipv4 forwarding and to allow both services to use an IP that is not defined on a physical interface (virtual IP’s). By default this is disabled on PhotonOS. I’ll enable those by creating a new file in /etc/sysctl.d called 55-keepalived.conf and put the following lines in it:

#Enable IPv4 Forwarding
net.ipv4.ip_forward = 1
#Enable non-local IP bind
net.ipv4.ip_nonlocal_bind = 1

You’ll notice there’s already a file called 50-security-hardening.conf in the same folder. Because the files are read in name order, using a higher number for our new configuration file makes it possible to override settings that are already defined in the default file.
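To make that ordering concrete, here is a minimal sketch that works out which value a key ends up with when several files define it. The function name effective_sysctl is made up for this post, and it only scans a single directory, whereas the real sysctl.d logic also merges other locations.

```shell
#!/bin/sh
# Print the value a key ends up with after reading every *.conf file in one
# directory in name order; a later file overrides an earlier one.
# Simplification (assumption): only the given directory is scanned.
# Arguments: key directory
effective_sysctl() (
  key=$1
  dir=$2
  value=
  for f in "$dir"/*.conf; do
    [ -f "$f" ] || continue
    # Strip whitespace, drop comment lines, keep the last match in this file.
    v=$(sed -e 's/[[:space:]]//g' -e '/^[#;]/d' "$f" |
      awk -F= -v k="$key" '$1 == k { val = $2 } END { print val }')
    [ -n "$v" ] && value=$v
  done
  printf '%s\n' "$value"
)

# Example on the load balancer:
#   effective_sysctl net.ipv4.ip_nonlocal_bind /etc/sysctl.d
```

If you don’t want to wait for the reboot later on, running “sysctl --system” should reload all sysctl configuration files right away.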

The final change I needed to make is to the iptables configuration, to allow http/https access. For that I changed the file /etc/systemd/scripts/ip4save and added the three rules for ports 80, 443 and 8404. The reason to allow access on port 8404 will become clear later on.

# init
*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT DROP [0:0]
# Allow local-only connections
-A INPUT -i lo -j ACCEPT
-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
#keep commented till upgrade issues are sorted
#-A INPUT -j LOG --log-prefix "FIREWALL:INPUT "
-A INPUT -p tcp -m tcp --dport 22 -j ACCEPT
-A INPUT -p icmp -m icmp --icmp-type 8 -j ACCEPT
-A INPUT -p tcp --dport 80 -j ACCEPT
-A INPUT -p tcp --dport 443 -j ACCEPT
-A INPUT -p tcp --dport 8404 -j ACCEPT

-A OUTPUT -j ACCEPT
COMMIT

To enable the new settings I rebooted my VM. I repeated all of the above steps for the second server.
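Since the same file has to be edited on both servers, a quick check that no rule was forgotten can help. This is a small sketch, not part of my original setup; the function name missing_ports is made up, and it only looks for plain “--dport <port> -j ACCEPT” rules on the INPUT chain like the ones above.

```shell
#!/bin/sh
# Print every TCP port from the argument list that has no
# "-A INPUT ... --dport <port> -j ACCEPT" rule in the given rules file.
# Arguments: rules_file port...
missing_ports() (
  file=$1
  shift
  for port in "$@"; do
    grep -Eq -- "^-A INPUT .*--dport $port -j ACCEPT" "$file" ||
      printf '%s\n' "$port"
  done
)

# Example on the load balancer (prints nothing when all rules are present):
#   missing_ports /etc/systemd/scripts/ip4save 80 443 8404
```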

High-Availability with Keepalived

To create a highly available load balancer, I used Keepalived to create a master and a backup node for HAProxy. Keepalived uses VRRP (Virtual Router Redundancy Protocol) to assign a virtual IP (VIP) to the master node so that HAProxy (or any other service you need) is always available.

Installation is pretty easy:

tdnf install keepalived -y

Once installed, there’s a default keepalived.conf file in /etc/keepalived. I renamed that file to keepalived.conf.orig. You can have a look at the default configuration file to check things like alerts and so on, which I won’t go into in this post.
I created a new keepalived.conf file and put the following configuration in it on PhotonLB1 (which is being set up as the MASTER peer):

! Configuration File for keepalived

global_defs {
   router_id PhotonLB1
   vrrp_skip_check_adv_addr
   vrrp_garp_interval 0
   vrrp_gna_interval 0
}
vrrp_script chk_haproxy {
  script "/usr/bin/kill -0 haproxy"
  interval 2
  weight 2
}
vrrp_instance LB_VIP {
  interface eth0
  state MASTER                  # BACKUP on PhotonLB2
  priority 101                  # 100 on PhotonLB2
  virtual_router_id 11          # same on all peers
  authentication {              # same on all peers
    auth_type AH
    auth_pass Pass1234
  }
  unicast_src_ip 192.168.1.251    # real IP of MASTER peer
  unicast_peer {
    192.168.1.252                 # real IP of BACKUP peer
  }
  virtual_ipaddress {
    192.168.1.250                 # Virtual IP for HAProxy loadbalancer
    192.168.1.20                  # Virtual IP for Horizon
    192.168.1.30                  # Virtual IP for AppVolumes Manager
  }
  track_script {
    chk_haproxy                 # if HAProxy is not running on this peer, start failover
  }
}

On PhotonLB2 (the BACKUP peer) the configuration is slightly different:

! Configuration File for keepalived

global_defs {
   router_id PhotonLB2
   vrrp_skip_check_adv_addr
   vrrp_garp_interval 0
   vrrp_gna_interval 0
}
vrrp_script chk_haproxy {
  script "/usr/bin/kill -0 haproxy"
  interval 2
  weight 2
}
vrrp_instance LB_VIP {
  interface eth0
  state BACKUP                  # MASTER on PhotonLB1
  priority 100                   # 101 on PhotonLB1
  virtual_router_id 11          # same on all peers
  authentication {              # same on all peers
    auth_type AH
    auth_pass Pass1234
  }
  unicast_src_ip 192.168.1.252    # real IP of BACKUP peer
  unicast_peer {
    192.168.1.251                 # real IP of MASTER peer
  }
  virtual_ipaddress {
    192.168.1.250                 # Virtual IP for HAProxy loadbalancer
    192.168.1.20                  # Virtual IP for Horizon
    192.168.1.30                  # Virtual IP for AppVolumes Manager
  }
  track_script {
    chk_haproxy                 # if HAProxy is not running on this peer, start failover
  }
}
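The two files differ in only five values: router_id, state, priority and the two unicast addresses. One way to keep them from drifting apart is to render both from a single template. The following is just a sketch of that idea; the function name render_keepalived is made up, and the inline comments from the listings above are left out.

```shell
#!/bin/sh
# Render keepalived.conf for one peer from the five values that differ.
# Arguments: router_id state priority local_ip peer_ip
render_keepalived() {
  cat <<EOF
! Configuration File for keepalived

global_defs {
   router_id $1
   vrrp_skip_check_adv_addr
   vrrp_garp_interval 0
   vrrp_gna_interval 0
}
vrrp_script chk_haproxy {
  script "/usr/bin/kill -0 haproxy"
  interval 2
  weight 2
}
vrrp_instance LB_VIP {
  interface eth0
  state $2
  priority $3
  virtual_router_id 11
  authentication {
    auth_type AH
    auth_pass Pass1234
  }
  unicast_src_ip $4
  unicast_peer {
    $5
  }
  virtual_ipaddress {
    192.168.1.250
    192.168.1.20
    192.168.1.30
  }
  track_script {
    chk_haproxy
  }
}
EOF
}

# Example:
#   render_keepalived PhotonLB1 MASTER 101 192.168.1.251 192.168.1.252
#   render_keepalived PhotonLB2 BACKUP 100 192.168.1.252 192.168.1.251
```

Redirecting the output to /etc/keepalived/keepalived.conf on the matching peer gives the same two files as above.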

In short:

  • MASTER has priority 101, BACKUP has priority 100
  • Every 2 seconds Keepalived checks if HAProxy is running. If it is, the priority is raised by 2 (the weight of the script); if it isn’t, the priority stays at its configured value.
  • So while HAProxy is running on both peers, the priorities are MASTER: 101+2 = 103, BACKUP: 100+2 = 102. The highest priority wins and becomes MASTER.
  • If HAProxy stops on the MASTER, its priority drops back to 101, which is lower than that of the BACKUP (101 < 102), so BACKUP becomes the MASTER and the virtual IP’s move over to the other side. Once HAProxy is active again on the MASTER, all VIP’s move back to the MASTER peer.
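The arithmetic above can be written down as a small sketch. It only models the positive-weight case used in this configuration (the weight is added while the check succeeds); the function names are made up and this is not how Keepalived itself is implemented.

```shell
#!/bin/sh
# Effective VRRP priority for a positive script weight: the weight is added
# while the check succeeds, otherwise the configured priority applies.
# Arguments: base_priority weight haproxy_up(1|0)
effective_priority() {
  if [ "$3" -eq 1 ]; then
    echo $(( $1 + $2 ))
  else
    echo "$1"
  fi
}

# Name the peer that should hold the VIPs for a given HAProxy state.
# Arguments: lb1_haproxy_up lb2_haproxy_up
elect_master() (
  p1=$(effective_priority 101 2 "$1")
  p2=$(effective_priority 100 2 "$2")
  if [ "$p1" -gt "$p2" ]; then echo PhotonLB1; else echo PhotonLB2; fi
)

elect_master 1 1   # both healthy, 103 vs 102: prints PhotonLB1
elect_master 0 1   # HAProxy down on PhotonLB1, 101 vs 102: prints PhotonLB2
elect_master 1 0   # HAProxy down on PhotonLB2, 103 vs 100: prints PhotonLB1
```

This also shows why the weight has to be larger than the difference between the two base priorities: with a weight of 1, a failed check on PhotonLB1 would leave both peers at 101 and the outcome would no longer be decided by the priority alone.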

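At this point, the basic configuration of Keepalived is ready and the service should be able to start. The tracking script will fail on both peers because I haven’t installed HAProxy yet, so neither peer gets the +2 priority bonus. With a non-zero weight that only affects the priority; the MASTER/BACKUP election still takes place.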

First, let’s start the service and see if it’s working:

systemctl start keepalived
journalctl -r

The last command shows the journal log in reverse order (so the newest entries on top). You can also open a new ssh session and start “journalctl -f” to show the log with live updates. This can be handy if you’re testing things.

In the journal log you should see that the service is started, but the tracking script fails because HAProxy isn’t installed yet. On the MASTER peer you will see the message “Entering MASTER STATE”. On the BACKUP peer you’ll see “Entering BACKUP STATE”.

Keepalived_vrrp[777]: Script `chk_haproxy` now returning 1
Keepalived_vrrp[777]: VRRP_Script(chk_haproxy) failed (exited with status 1)
Keepalived_vrrp[777]: (LB_VIP) Receive advertisement timeout
Keepalived_vrrp[777]: (LB_VIP) Entering MASTER STATE

Keepalived_vrrp[777]: (LB_VIP) setting VIPs.
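When the log gets longer, it can be easier to pull out just the current state. The following is a small sketch, assuming the log lines look like the ones above and the instance is called LB_VIP; the function name last_vrrp_state is made up. It expects the lines in chronological order, so feed it the normal journalctl output and not the reversed “-r” output.

```shell
#!/bin/sh
# Read journal lines on stdin (oldest first) and print the most recent
# state that VRRP instance LB_VIP entered, for example MASTER or BACKUP.
last_vrrp_state() {
  sed -n 's/.*(LB_VIP) Entering \([A-Z]*\) STATE.*/\1/p' | tail -n 1
}

# Example on the load balancer (assumes the systemd unit is named keepalived):
#   journalctl -u keepalived --no-pager | last_vrrp_state
```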

Next, enable the Keepalived service to start at boot:

systemctl enable keepalived

Load balancing with HAProxy

Now let’s install HAProxy on PhotonOS:

tdnf install haproxy -y

Once installed, an example configuration file is placed in “/etc/haproxy/haproxy.cfg”. I’ve renamed it and created a new blank haproxy.cfg. Unlike the Keepalived configuration, which was different on both peers, the HAProxy configuration must be exactly the same on both nodes. So if you make changes, be sure to make them on both peers!
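A simple way to verify that both peers really have the same file is to compare checksums. This is a sketch, assuming sha256sum is available on the appliance and that root SSH access between the peers works; the function name config_sum is made up.

```shell
#!/bin/sh
# Print the SHA-256 checksum of a file, without the file name, so that
# checksums taken on different hosts can be compared as plain strings.
config_sum() {
  sha256sum < "$1" | cut -d' ' -f1
}

# Example, run on PhotonLB1:
#   local_sum=$(config_sum /etc/haproxy/haproxy.cfg)
#   remote_sum=$(ssh root@192.168.1.252 'sha256sum < /etc/haproxy/haproxy.cfg' | cut -d' ' -f1)
#   [ "$local_sum" = "$remote_sum" ] && echo "in sync" || echo "configs differ"
```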

First create an extra directory where HAProxy will be chrooted

mkdir /var/lib/haproxy
chmod 755 /var/lib/haproxy

Next, create the following configuration in /etc/haproxy/haproxy.cfg.

# HAProxy configuration

#Global definitions
global
  chroot /var/lib/haproxy
  stats socket /var/lib/haproxy/stats
  daemon

defaults
  timeout connect 5s
  timeout client 30s
  timeout server 30s

### Statistics & Admin configuration ###
userlist stats-auth
  group admin   users admin
  user admin insecure-password LetMeIn
  group ro users stats
  user stats insecure-password ReadOnly
frontend stats-http8404
  mode http
  bind 192.168.1.250:8404
  default_backend statistics
backend statistics
  mode http
  stats enable
  stats show-legends
  stats show-node
  stats refresh 30s
  acl AUTH http_auth(stats-auth)
  acl AUTH_ADMIN http_auth_group(stats-auth) admin
  stats http-request auth unless AUTH
  stats admin if AUTH_ADMIN
  stats uri /stats
######

### Horizon Connection servers ###
frontend horizon-http
  mode http
  bind 192.168.1.20:80
  # Redirect http to https
  redirect scheme https if !{ ssl_fc }

frontend horizon-https
  mode tcp
  bind 192.168.1.20:443
  default_backend horizon
backend horizon
  mode tcp
  option ssl-hello-chk
  balance source
  server cs1 192.168.1.21:443 weight 1 check inter 30s fastinter 2s downinter 5s rise 3 fall 3
  server cs2 192.168.1.22:443 weight 1 check inter 30s fastinter 2s downinter 5s rise 3 fall 3
######

### AppVolume Managers ###
frontend appvol-http
  mode http
  bind 192.168.1.30:80
  redirect scheme https if !{ ssl_fc }
frontend appvol-https
  mode tcp
  bind 192.168.1.30:443
  default_backend appvol

backend appvol
  mode tcp
  option ssl-hello-chk
  balance source
  server avm1 192.168.1.31:443 weight 1 check inter 30s fastinter 2s downinter 5s rise 3 fall 3
  server avm2 192.168.1.32:443 weight 1 check inter 30s fastinter 2s downinter 5s rise 3 fall 3
######
  • Statistics & Admin configuration: This part creates 2 groups and 2 users: 1 admin user and 1 read-only user. Those are used to view the statistics of HAProxy and to put backend servers into maintenance. It defines the frontend, which binds to the LB VIP on port 8404 (remember the iptables rule for port 8404?), and the backend, where some statistics options are specified and authentication is taken care of. So I’ll be able to see statistics or enable/disable backends using http://192.168.1.250:8404/stats.
  • Horizon Connection servers: The first part is just a redirect from http to https. Next the frontend is configured using the VIP for Horizon. The backend specifies the Connection servers and the load balancing algorithm.
    As I use tcp mode for Horizon instead of http mode, I don’t have to put any certificates on my load balancer. I just use the certificates of the backend servers (which are wildcard certificates, so my load balancer host name “horizon.domain.com” works with my certificate of *.domain.com).
    The option ssl-hello-chk is needed to make sure that HAProxy not only checks if port 443 is open on the backend to set the backend as active, but to also check that there’s actually a valid SSL connection to the backend. If you don’t specify this, the backend will become active for HAProxy, while Horizon services might still be starting up and not be available yet for use.
    In normal conditions, every 30s the backends are checked (inter 30s). When a backend is down, the check is performed every 5s (downinter 5s), and when a check fails or succeeds after a previous failure the backends are checked every 2s (fastinter 2s).
    The balancing algorithm is set to “source” so that a client keeps going to the same connection server. If you don’t set this, you’ll get regular login prompts for the users because they’re being sent to different backend servers every time.
  • AppVolumes Managers: The AppVolumes section is similar to the Horizon section.
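To illustrate why “balance source” keeps a client on the same server: HAProxy hashes the client’s source IP and uses the result to pick one of the running servers, so the same IP gives the same answer as long as the set of servers doesn’t change. The sketch below shows the idea only. It uses cksum as a stand-in hash and ignores server weights, so it will not reproduce HAProxy’s actual choices; the function name pick_server is made up.

```shell
#!/bin/sh
# Map a client IP to one of the given servers by hashing the IP.
# cksum stands in for HAProxy's internal hash (assumption for illustration).
# Arguments: client_ip server...
pick_server() (
  ip=$1
  shift
  hash=$(printf '%s' "$ip" | cksum | cut -d' ' -f1)
  target=$(( hash % $# ))
  i=0
  for server in "$@"; do
    if [ "$i" -eq "$target" ]; then
      printf '%s\n' "$server"
    fi
    i=$(( i + 1 ))
  done
)

# The same client IP always lands on the same server:
pick_server 192.168.1.100 cs1 cs2
pick_server 192.168.1.100 cs1 cs2
```

The downside is visible in the modulo: when a server goes down, the number of servers changes and clients can be mapped to a different server than before.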

Once the configuration is done, I started HAProxy and checked the logs to see everything was fine:

systemd[1]: Starting HAProxy Load Balancer...
haproxy[3143]: [NOTICE] 040/100726 (3143) : New worker #1 (3145) forked
systemd[1]: Started HAProxy Load Balancer.
Keepalived_vrrp[3127]: Script `chk_haproxy` now returning 0
Keepalived_vrrp[3127]: VRRP_Script(chk_haproxy) succeeded
Keepalived_vrrp[3127]: (LB_VIP) Changing effective priority from 101 to 103
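For reference, starting the service can be combined with a syntax check of the configuration, so a typo doesn’t take the load balancer down. This is a minimal sketch, assuming the systemd unit is named haproxy; the function name start_haproxy is made up, and “haproxy -c -f” only validates the file without starting anything.

```shell
#!/bin/sh
# Validate the configuration first; start and enable the service only if
# the check passes.
# Arguments: [config_file] (defaults to /etc/haproxy/haproxy.cfg)
start_haproxy() {
  haproxy -c -f "${1:-/etc/haproxy/haproxy.cfg}" &&
    systemctl start haproxy &&
    systemctl enable haproxy
}

# Example on each peer:
#   start_haproxy && journalctl -u haproxy --no-pager | tail
```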

HAProxy starts without any errors, and Keepalived has already noticed it and raised the priority of the MASTER peer to 103. So that’s all good. Let’s see if I can open up the statistics using http://192.168.1.250:8404/stats. Depending on the user I log in with, I can see only statistics, or I have additional options to select a backend server to put it into maintenance or bring it back up. This is what the “admin” interface looks like:

I’ve set up a refresh timer of 30s in the configuration, so the page will reload automatically every 30s.

When you select a backend server, you can put it into maintenance mode, temporarily disable checks, …

When I stop the HAProxy service on PhotonLB1, the VIP’s will move to PhotonLB2 and on the statistics page, you’ll see that the hostname on top of the page will be updated (Statistics Report for pid nnn on PhotonLBx). Also in the logs of PhotonLB2 you’ll see it becomes the new MASTER peer:

Keepalived_vrrp[10257]: (LB_VIP) Entering MASTER STATE
Keepalived_vrrp[10257]: (LB_VIP) setting VIPs.

To check if the VIP’s are actually on the node, you can use the “ip add” command:

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:50:56:83:ad:96 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.252/24 brd 192.168.1.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.1.250/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.1.20/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet 192.168.1.30/32 scope global eth0
       valid_lft forever preferred_lft forever
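Checking three addresses by eye works, but the same check can be scripted, for example for a monitoring job. The following is a sketch that parses output like the listing above; the function name missing_vips is made up.

```shell
#!/bin/sh
# Read "ip -4 addr show" output on stdin and print each expected VIP that
# is not assigned. The exit status is non-zero if any VIP is missing.
# Arguments: vip...
missing_vips() (
  out=$(cat)
  status=0
  for vip in "$@"; do
    if ! printf '%s\n' "$out" | grep -qF "inet $vip/"; then
      printf '%s\n' "$vip"
      status=1
    fi
  done
  exit "$status"
)

# Example on the peer that should currently be MASTER:
#   ip -4 addr show dev eth0 | missing_vips 192.168.1.250 192.168.1.20 192.168.1.30
```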

Once I start the HAProxy service again on PhotonLB1, it becomes the MASTER peer again:

systemd[1]: Starting HAProxy Load Balancer...
haproxy[3650]: [NOTICE] 040/102326 (3650) : New worker #1 (3652) forked
systemd[1]: Started HAProxy Load Balancer.
Keepalived_vrrp[3127]: Script `chk_haproxy` now returning 0
Keepalived_vrrp[3127]: VRRP_Script(chk_haproxy) succeeded
Keepalived_vrrp[3127]: (LB_VIP) Changing effective priority from 101 to 103
Keepalived_vrrp[3127]: (LB_VIP) Entering MASTER STATE
Keepalived_vrrp[3127]: (LB_VIP) setting VIPs.

Horizon

Before we can use the load balancer for Horizon, we need to make some configuration changes on the connection servers.
In the file “C:\Program Files\VMware\VMware View\Server\sslgateway\conf\locked.properties” (if the file doesn’t exist, create a new empty text file) you need to specify the hostname used to connect to the load balancer using the option “balancedHost=”

balancedHost=horizon.domain.com

After changing this file, you’ll need to restart the VMware View Connection server to apply the new settings.

AppVolumes Manager

For accessing the AppVolumes manager, I can just enter https://appvol.domain.com to access one of the managers now. No need to adjust any settings here.

On the AppVolume agents, you can now change the AppVolumes manager keys in the registry to use the new load balancer hostname for AppVolumes. There are 2 places you need to change this in the registry:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\CloudVolumes\Agent]
"Manager_Address"="appvol.domain.com"

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\svservice\Parameters]
"Manager1"="appvol.domain.com:443"

Conclusion

So in the end, I have 2 minimal VM’s running a hardened PhotonOS, and with those I can make my Horizon Connection servers and AppVolumes managers load balanced and highly available. It’s not a “next, next, finish”-style setup, but once it’s running, the basic statistics/admin interface gives me enough options to monitor and control the backends.

I hope you’ve found this post useful. If you have any questions or comments about this, don’t hesitate to leave them below.
