Symptoms

A RAID malfunction can affect StarWind in several ways. The most common symptoms are:

1. High response time of HA storage. Results in:

VMs/databases may stop responding for a while or hang completely

Numerous ‘degraded performance’ notifications in the StarWind and Windows Application logs

2. Looping sync. Results in:

VMs/databases may stop responding for a while or hang completely

Numerous ‘synchronization status changes’ notifications in the StarWind and Windows Application logs

3. Locked HA. Results in:

VM/database failure

Corrupted data

Cause

Any Virtual SAN software can be sensitive to RAID issues. Operational delays on a RAID array become significant when its state changes to degraded, which slows down the whole environment. Moreover, if a delay reaches 60 seconds, synchronization is lost and a full resynchronization starts. The resynchronization adds workload, pushes response time even higher, and may leave the HA device locked.
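The 60-second rule above can be illustrated with a minimal Python sketch. It is not StarWind code: the function names, the state labels, and the idea of feeding it a list of measured delays are assumptions made for illustration. Only the 60-second limit comes from this article.

```python
from typing import Iterable

# From the article: a delay that reaches 60 seconds causes loss of synchronization.
SYNC_TIMEOUT_SECONDS = 60.0


def classify_delay(delay_seconds: float, timeout: float = SYNC_TIMEOUT_SECONDS) -> str:
    """Return 'sync_lost' if a single delay reaches the timeout, else 'in_sync'.

    Hypothetical helper: the state names are illustrative, not StarWind terms.
    """
    if delay_seconds < 0:
        raise ValueError("delay must be non-negative")
    return "sync_lost" if delay_seconds >= timeout else "in_sync"


def worst_state(delays: Iterable[float], timeout: float = SYNC_TIMEOUT_SECONDS) -> str:
    """Return 'sync_lost' if any observed delay reaches the timeout."""
    states = [classify_delay(d, timeout) for d in delays]
    return "sync_lost" if "sync_lost" in states else "in_sync"


if __name__ == "__main__":
    # Example delays in seconds (made-up values): one of them exceeds the limit.
    print(worst_state([0.4, 12.0, 61.5]))  # prints "sync_lost"
```

A single delay at or above the limit is enough to flag the whole series, which mirrors the behavior described above: one long stall triggers a full resynchronization.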

Workaround

The fastest way to restore good response time is to stop the StarWind service on the node where the RAID state changed to degraded.

If StarWind has HA images that reside not only on the malfunctioning RAID partition but also on healthy drives, stopping the StarWind service is not recommended. The best approach in this case is to remove the replica to the faulty node for those particular synchronous replication partners.

The service may be started again and the replica recreated only after the RAID issues are fixed (i.e., all faulty drives are replaced and the RAID rebuild is finished).

Note that this simple workaround must be applied if you have reached symptom 2, to prevent possible data corruption. It may be skipped if you have encountered only symptom 1.
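The decision logic of this workaround can be summarized in a short Python sketch. It only encodes the rules stated in this section. The function and field names are hypothetical, and the sketch does not call any StarWind API or stop any service.

```python
from typing import NamedTuple


class Recommendation(NamedTuple):
    action: str     # what to do on the node with the degraded RAID
    required: bool  # True when the workaround must not be skipped


# Action texts paraphrase the article; they are labels, not commands.
STOP_SERVICE = "stop the StarWind service on the degraded node"
REMOVE_REPLICA = "remove the replica to the faulty node"


def recommend(looping_sync: bool, images_on_healthy_drives: bool) -> Recommendation:
    """Pick the workaround described in the article.

    looping_sync: symptom 2 has been observed.
    images_on_healthy_drives: some HA images also reside on healthy drives,
    in which case stopping the whole service is not recommended.
    """
    action = REMOVE_REPLICA if images_on_healthy_drives else STOP_SERVICE
    return Recommendation(action=action, required=looping_sync)


def may_restore(faulty_drives_replaced: bool, rebuild_finished: bool) -> bool:
    """The service or replica may be restored only after the RAID is fully fixed."""
    return faulty_drives_replaced and rebuild_finished


if __name__ == "__main__":
    print(recommend(looping_sync=True, images_on_healthy_drives=False))
    print(may_restore(faulty_drives_replaced=True, rebuild_finished=False))
```

Symptom 3 is deliberately left out of the sketch because this section gives explicit guidance only for symptoms 1 and 2.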

Request a Product Feature

To request a new product feature or to provide feedback on a StarWind product, please email our support team at support@starwind.com with “Request a Product Feature” as the subject.
