High Availability

Virtualizor now supports High Availability for KVM Virtualization.

NOTE: This feature is in Public Beta.

Requirements

  • CentOS 7.x
  • yum
  • Shared Storage to create the VPS disks.
  • Shared mount point for KVM XML configuration files at /etc/libvirt
  • At least four nodes, including the Virtualizor master, to create an HA cluster with Virtualizor KVM (needed for a reliable quorum).
  • A shared IP pool across the HA server group, so that a VM keeps the same IP on the server it is migrated to after a failure.
  • Virtualizor version 2.9.9 or later
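
Some of the requirements above can be checked from a shell on a prospective node. The following is a minimal sketch, not a Virtualizor tool: the helper name is_centos7 is hypothetical, and the script assumes the release string lives in /etc/centos-release. It can only tell whether /etc/libvirt is a mount point, not whether that mount is actually shared between the nodes.

```shell
#!/bin/sh
# Preflight sketch for a prospective HA node (hypothetical helper, not part of Virtualizor).

# Succeeds if the release string passed as $1 names CentOS 7.x.
is_centos7() {
    case "$1" in
        *"CentOS"*" release 7."*) return 0 ;;
        *) return 1 ;;
    esac
}

# Run the live checks only on a machine that has a CentOS release file.
if [ -r /etc/centos-release ]; then
    if is_centos7 "$(cat /etc/centos-release)"; then
        echo "OS: CentOS 7.x"
    else
        echo "OS: not CentOS 7.x"
    fi

    if command -v yum >/dev/null 2>&1; then
        echo "yum: found"
    else
        echo "yum: missing"
    fi

    # mountpoint(1) reports whether a directory is a mount point;
    # it cannot confirm that the mount is shared between nodes.
    if mountpoint -q /etc/libvirt; then
        echo "/etc/libvirt: is a mount point"
    else
        echo "/etc/libvirt: not a mount point"
    fi
fi
```

Shared storage, the shared IP pool and the Virtualizor version still need to be verified by hand.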

Installation

  • Log in to the Virtualizor Master using the server's root credentials.
  • Click Servers -> Add Server Groups and check the High Availability checkbox to enable High Availability for the server group.

NOTE: You MUST add the server group with High Availability enabled before adding Slave servers to the HA cluster. Otherwise Virtualizor cannot create the HA cluster or install the HA utilities on a new server added to the HA-enabled server group.

Check whether the server group has HA enabled:
Click Servers -> Server Groups/Regions.

Add Server in HA enabled Server Group

Once the HA server group has been added and enabled, you can add new servers to the HA group/cluster.
To add a new server to the HA server group, select the HA-enabled server group while adding the server.

After entering all the details for the new server and selecting the HA-enabled server group, click Add Server.
You can follow the installation progress in the task wizard.

Create VPS with HA Enabled

If the server has HA enabled, the VM will automatically be created with HA enabled.
NOTE: The High Availability option is shown if the selected server belongs to an HA-enabled server group.

Monitor HA Cluster(s)

Once you have created/added a server with HA enabled, you can monitor the resources created on that HA cluster.

To check resources and nodes, go to Admin Panel -> Virtual Servers -> High Availability.

Select the HA-enabled group from the dropdown to fetch the status of that cluster.

Simulating HA

Perform a failover with the following steps:

# pcs status
 Cluster name: HA_Group_1
 Stack: corosync
 Current DC: ha2 (version 1.1.20-5.el7_7.1-3c4c782f70) - partition with quorum
 Last updated: Wed Mar 11 04:13:42 2020
 Last change: Fri Feb  7 02:09:16 2020 by root via crm_resource on ha3 
3 nodes configured
1 resource configured

Online: [ ha2 ha3 ha4 ]

Full list of resources:
 resource_v1001_4csljb16ihzegay         (ocf::heartbeat:VirtualDomain): Started ha2 

Daemon Status:
   corosync: active/enabled
   pacemaker: active/enabled
   pcsd: active/enabled 

You can see that the v1001 resource is Started on a particular node (in this example, ha2).
Shut down Pacemaker and Corosync on that node to trigger a failover:

# pcs cluster stop ha2

A cluster command such as pcs cluster stop nodename can be run from any node in the cluster, not just the affected node.

Verify that Pacemaker and Corosync are no longer running on the ha2 server:
# pcs status
 Error: cluster is not currently running on this node 

Go to another node and check the cluster status:

# pcs status
 Cluster name: HA_Group_1
 Stack: corosync
 Current DC: ha2 (version 1.1.20-5.el7_7.1-3c4c782f70) - partition with quorum
 Last updated: Wed Mar 11 07:30:09 2020
 Last change: Fri Feb  7 02:09:16 2020 by root via crm_resource on ha3
 3 nodes configured
 1 resource configured

 Online: [ ha3 ha4 ]

 Full list of resources:

  resource_v1001_4csljb16ihzegay (ocf::heartbeat:VirtualDomain): Started ha3

Daemon Status:
   corosync: active/enabled
   pacemaker: active/enabled
   pcsd: active/enabled

Notice that v1001 is now running on ha3.
The failover happened automatically and no errors are reported.
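
Rather than reading the full status by eye, the node hosting a resource can be pulled out of the pcs status output. The following is a minimal sketch: resource_node is a hypothetical helper name, v1001 is the resource from the example above, and the parsing assumes the "Started <node>" line format shown in this section.

```shell
#!/bin/sh
# Print the node on which a resource is reported as Started.
#   $1:    pattern identifying the resource (for example v1001)
#   stdin: output of `pcs status`
resource_node() {
    awk -v pat="$1" '$0 ~ pat && /Started/ { print $NF }'
}

# Live usage, only where pcs is installed:
if command -v pcs >/dev/null 2>&1; then
    node=$(pcs status 2>/dev/null | resource_node v1001)
    echo "v1001 is running on: ${node:-unknown}"
fi
```

Running it before and after pcs cluster stop should report two different nodes if the failover succeeded.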

You can also view this in the Admin Panel -> Virtual Servers -> High Availability.

Troubleshooting HA

Check whether the pcsd service is running:

systemctl status pcsd.service

Use corosync-cfgtool to check whether cluster communication is active:

corosync-cfgtool -s

The pcs status command should always show "partition with quorum" and no stonith-related errors; otherwise high availability may not work correctly.
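
The checks in this section can be combined into one quick health report. The following is a minimal sketch under stated assumptions: has_quorum and stonith_lines are hypothetical helper names, and the stonith check is a crude text match that simply lists every line mentioning stonith for manual review.

```shell
#!/bin/sh
# Succeeds if the `pcs status` text on stdin reports "partition with quorum".
has_quorum() {
    grep -q 'partition with quorum'
}

# Print every line on stdin that mentions stonith (case-insensitive).
# Crude heuristic: the lines still need to be reviewed by hand.
stonith_lines() {
    grep -i 'stonith' || true
}

# Live usage, only where pcs is installed:
if command -v pcs >/dev/null 2>&1; then
    if systemctl is-active --quiet pcsd.service; then
        echo "pcsd: active"
    else
        echo "pcsd: NOT active"
    fi

    # Show the ring status as reported by corosync.
    corosync-cfgtool -s

    status=$(pcs status 2>&1)
    if printf '%s\n' "$status" | has_quorum; then
        echo "quorum: yes"
    else
        echo "quorum: NO"
    fi
    printf '%s\n' "$status" | stonith_lines
fi
```

An empty result from the last step only means the word stonith does not appear in the output; it is not a full validation of the cluster.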
