HeartBeat: Explanation

Description

This article provides a detailed explanation of the Heartbeat mechanism in StarWind VSAN, designed to prevent data corruption in the event of a synchronization channel failure. The mechanism allows the service to check node availability via an alternate heartbeat network interface and, if necessary, trigger the failover process to preserve data integrity.

Explanation

What is “split-brain”?

A “split-brain” scenario can occur when cluster nodes with highly available (HA) StarWind devices are unable to synchronize data with each other but continue to accept write commands from initiators independently. The StarWind service on each node may assume that the partner nodes are offline and continue operating in single-node mode, using only the data written to it. The data then appears to be “split” between the nodes and is never replicated. Any further replication between the nodes may cause data blocks to be overwritten by the partner node, leading to data corruption.
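The divergence described above can be illustrated with a toy model. This is only a sketch: the block maps, the node names, and the `divergent_blocks` helper are hypothetical and do not represent how StarWind VSAN stores or compares data.

```python
# Toy model: each node holds a map of block number -> contents.
# All names here are hypothetical and used for illustration only.
node_a = {0: "base", 1: "base"}
node_b = dict(node_a)  # the replica starts out identical

# Synchronization is lost, but both nodes keep accepting writes independently.
node_a[0] = "written-to-A"
node_b[0] = "written-to-B"
node_b[1] = "written-to-B"


def divergent_blocks(a, b):
    """Return the sorted block numbers whose contents differ between two nodes."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


conflicts = divergent_blocks(node_a, node_b)
print(conflicts)
```

Every block listed in `conflicts` has two competing versions. Replicating in either direction would overwrite one of them, which is the data corruption risk the article describes.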

Heartbeat helps!

Heartbeat technology is used to prevent the “split-brain” scenario by providing an alternate network interface through which StarWind services can communicate with each other and trigger the failover process. This alternate interface is called the Heartbeat interface, and the failover process in this case is called the Heartbeat Failover Policy.

StarWind Virtual SAN allows several synchronization links and several heartbeat links to be configured for each HA device. An HA device created with the Heartbeat Failover Policy has a different priority number on each node where it is replicated (1st, 2nd, 3rd, etc.). The 1st priority is considered the highest.

If all the synchronization links appear to be disconnected but at least one heartbeat link remains active, the StarWind services on the nodes can communicate over that link to resolve the synchronization loss: the HA device with the lowest priority is marked as not synchronized and is blocked for further read and write operations until the synchronization channel is restored. At the same time, the partner device on the node that remains synchronized flushes data from the cache to the disk to preserve data integrity in case that node goes down unexpectedly.
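The decision described above can be sketched for the two-node case. This is a simplified model under stated assumptions, not StarWind's implementation: the `Node` class, the `resolve_sync_loss` function, and the returned status strings are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical model of one side of a two-node HA device."""
    name: str
    priority: int               # 1 is the highest priority
    synchronized: bool = True
    accepts_io: bool = True
    cache_flushed: bool = False


def resolve_sync_loss(nodes, sync_links_up, heartbeat_links_up):
    """Apply the heartbeat failover rule to a two-node HA device (sketch)."""
    if len(nodes) != 2:
        raise ValueError("this sketch models the two-node case only")
    if sync_links_up > 0:
        return "synchronized"        # replication still works, nothing to do
    if heartbeat_links_up == 0:
        return "split-brain risk"    # the nodes cannot coordinate at all
    # The lowest priority is the largest priority number.
    lowest = max(nodes, key=lambda n: n.priority)
    for node in nodes:
        if node is lowest:
            node.synchronized = False   # marked as not synchronized
            node.accepts_io = False     # blocked for reads and writes
        else:
            node.cache_flushed = True   # survivor flushes cache to disk
    return "failover"


a = Node("node-a", priority=1)
b = Node("node-b", priority=2)
status = resolve_sync_loss([a, b], sync_links_up=0, heartbeat_links_up=1)
print(status, a, b)
```

In this model only `node-a` keeps serving I/O after the synchronization links drop, so the two copies cannot diverge while the channel is down.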

When can “split-brain” still happen?

With the Heartbeat Failover Policy, “split-brain” can still occur when all synchronization and heartbeat channels disconnect simultaneously (if they were configured on the same physical network adapter) and the partner nodes do not respond to the node’s requests.
“Split-brain” can also occur if all the heartbeat links disconnect first and the synchronization links disconnect later.

That is why it is recommended to assign multiple independent heartbeat channels during replica creation to improve system stability and avoid the “split-brain” issue. Either the heartbeat links should be located on different physical network adapters from one another, or the synchronization link(s) and the heartbeat link(s) should be located on separate physical network adapters.
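A planned link layout can be checked against this recommendation before a replica is created. The sketch below is only an illustration: the `(role, adapter)` tuples, the adapter names, and the `heartbeat_warnings` function are hypothetical and are not part of any StarWind tool or API.

```python
def heartbeat_warnings(links):
    """Check a planned link layout against the recommendation above.

    `links` is a list of (role, adapter) tuples, where role is "sync" or
    "heartbeat" and adapter is the physical network adapter carrying the link.
    """
    sync = {adapter for role, adapter in links if role == "sync"}
    heartbeat = {adapter for role, adapter in links if role == "heartbeat"}
    warnings = []
    if not heartbeat:
        warnings.append("no heartbeat link configured")
    elif heartbeat <= sync:
        # Losing the shared adapters would drop both channels at once.
        warnings.append("every heartbeat link shares an adapter with a synchronization link")
    if len(heartbeat) == 1:
        warnings.append("only one adapter carries heartbeat traffic")
    return warnings


shared = [("sync", "nic1"), ("heartbeat", "nic1")]
separate = [("sync", "nic1"), ("heartbeat", "nic2"), ("heartbeat", "nic3")]
print(heartbeat_warnings(shared))
print(heartbeat_warnings(separate))
```

The first layout triggers both warnings because a single adapter failure would take down the synchronization and heartbeat channels together. The second layout produces no warnings.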

Conclusion

The Heartbeat mechanism helps StarWind VSAN-based clusters maintain data integrity and continue operations even if the synchronization channels fail. By using heartbeat links, the system can avoid split-brain situations and allow a single node to continue serving I/O safely until at least one synchronization channel is restored. Proper configuration of multiple independent heartbeat channels significantly improves cluster stability and reliability.

 
