(Failover Cluster) How to Move VMs When the "Client and Cluster" Network Is Lost
🧠 Cluster Behavior Recap
- WSFC checks node health using heartbeats sent over networks marked for cluster use.
- If a node can still:
  - Talk to the cluster (via the Cluster Only network),
  - Access the CSVs (Cluster Shared Volumes),
  - And keep its VM healthy,
👉 Then the cluster considers the node healthy, even if it’s isolated from clients.
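To see which networks the cluster actually uses for heartbeats, it can help to list the cluster networks and their roles. The following is a minimal sketch, assuming the FailoverClusters module is available on the node; the function names `Get-ClusterNetworkSummary` and `Get-ClusterInterfaceSummary` are hypothetical helpers introduced here, not part of the module.

```powershell
# Assumes the FailoverClusters module (installed with the Failover Clustering feature).

# Hypothetical helper: one row per cluster network.
# Role is None (0), Cluster (1, shown in the GUI as "Cluster Only"),
# or ClusterAndClient (3, "Cluster and Client").
function Get-ClusterNetworkSummary {
    Get-ClusterNetwork | Select-Object Name, Role, State, Address
}

# Hypothetical helper: state of each node's adapter on each cluster network.
function Get-ClusterInterfaceSummary {
    Get-ClusterNetworkInterface | Select-Object Node, Network, State
}
```

Running `Get-ClusterNetworkSummary | Format-Table -AutoSize` on a node shows whether the client-facing network is also marked for cluster use, which is what decides whether its loss affects node health.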
🔧 So How Can You Change That?
Create a Custom Health Detection Script (Advanced)
- You can write a cluster resource script or a VM health script that:
  - Checks whether the client network is available.
  - If it is not, fails the VM resource, causing failover.
This is how monitoring agents like SCOM, Veeam, or 3rd-party management tools often handle extended health checks.
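The detection half of such a script could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the placeholder address `192.0.2.1`, and the threshold and interval values are all hypothetical and would need to match your environment.

```powershell
# Hypothetical helper: $true when at least one target answers a ping.
function Test-ClientNetwork {
    param(
        [string[]]$Targets = @('192.0.2.1'),  # placeholder gateway address (assumption)
        [int]$PingCount = 2
    )
    foreach ($target in $Targets) {
        # -Quiet makes Test-Connection return a Boolean instead of ping result objects.
        if (Test-Connection -ComputerName $target -Count $PingCount -Quiet) {
            return $true
        }
    }
    return $false
}

# Hypothetical helper: tracks consecutive failures so that a single
# dropped ping does not trigger a failover.
function Update-FailureCount {
    param([int]$Current, [bool]$Reachable)
    if ($Reachable) { return 0 }
    return $Current + 1
}

# Blocks until the targets have been unreachable for $Threshold checks in a row
# (roughly $Threshold * $IntervalSeconds seconds), then returns the failure count.
# While the network is healthy this loop keeps running by design.
function Wait-ClientNetworkFailure {
    param(
        [string[]]$Targets = @('192.0.2.1'),
        [int]$Threshold = 3,
        [int]$IntervalSeconds = 10
    )
    $failures = 0
    while ($failures -lt $Threshold) {
        $reachable = Test-ClientNetwork -Targets $Targets
        $failures = Update-FailureCount -Current $failures -Reachable $reachable
        if ($failures -lt $Threshold) { Start-Sleep -Seconds $IntervalSeconds }
    }
    return $failures
}
```

Requiring several consecutive failures is a deliberate choice here: a failover causes downtime, so the check should tolerate brief packet loss.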
✅ Recommended Approach (If You Really Want Failover on Client Network Loss)
- Create a VM health check script that:
  - Pings or tests reachability of external IPs or gateways.
  - If they are unreachable for X seconds, moves the VM's cluster group off the node using:
Move-ClusterGroup -Name "YourVMGroupName"
(With no -Node argument, the cluster chooses the destination node.)
❓ Will Move-ClusterGroup -Name "YourVMGroupName" perform Live Migration?
🧨 No, it will not perform Live Migration.
Instead, it moves the group the way a failover does:
- The VM is taken offline on the current node.
- The cluster brings it online on another node.
⚠️ The VM is interrupted while it comes up on the new node, so downtime occurs.
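The failover step itself could look roughly like this. It is a sketch, assuming the FailoverClusters module and a VM group name you supply; `Invoke-VmFailover` is a hypothetical wrapper, and it uses `Move-ClusterGroup` without `-Node` so that the cluster selects the destination.

```powershell
# Hypothetical wrapper: moves a VM's cluster group to another node
# (not a Live Migration, so the VM is interrupted) and reports the owner change.
function Invoke-VmFailover {
    param(
        [Parameter(Mandatory)][string]$GroupName
    )

    # Remember where the group lives before the move.
    $before = (Get-ClusterGroup -Name $GroupName).OwnerNode.Name

    # With no -Node argument the cluster selects the destination node itself.
    $moved = Move-ClusterGroup -Name $GroupName -ErrorAction Stop

    [pscustomobject]@{
        Group = $GroupName
        From  = $before
        To    = $moved.OwnerNode.Name
        State = $moved.State
    }
}
```

For example, `Invoke-VmFailover -GroupName "YourVMGroupName"` returns one object showing the previous and new owner node.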
✅ What to Use for Live Migration
If you want Live Migration (no perceptible downtime), use:
Move-ClusterVirtualMachineRole -Name "YourVMName" -Node "TargetNode" -MigrationType Live
Note that Move-ClusterGroup -Name "YourVMName" -Node "TargetNode" also moves the role, but according to Microsoft's documentation it does not use Live Migration for the move.
- Move-ClusterVirtualMachineRole performs Live Migration if:
  - The VM is running.
  - The cluster is healthy.
  - A Live Migration network is available.
  - No blockers exist (e.g. loss of storage or a paused VM).
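Some of the conditions above can be checked before attempting the move. The sketch below is illustrative only: `Move-VmLive` is a hypothetical wrapper, it assumes the clustered role has the same name as the VM, and it checks only the target node state and the role state, not storage or the Live Migration network.

```powershell
# Hypothetical wrapper: verifies basic preconditions, then requests a Live Migration.
function Move-VmLive {
    param(
        [Parameter(Mandatory)][string]$VMName,     # assumed to match the cluster role name
        [Parameter(Mandatory)][string]$TargetNode
    )

    # The destination node must be an active cluster member.
    $node = Get-ClusterNode -Name $TargetNode
    if ($node.State -ne 'Up') {
        throw "Target node '$TargetNode' is $($node.State), not Up."
    }

    # Live Migration only applies to a running VM, so the role must be online.
    $group = Get-ClusterGroup -Name $VMName
    if ($group.State -ne 'Online') {
        throw "Role '$VMName' is $($group.State), not Online."
    }

    # Ask explicitly for Live Migration rather than relying on a default.
    Move-ClusterVirtualMachineRole -Name $VMName -Node $TargetNode -MigrationType Live -ErrorAction Stop
}
```

A call such as `Move-VmLive -VMName "YourVMName" -TargetNode "TargetNode"` fails fast with a readable message when a precondition is not met, instead of starting a migration that cannot complete.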
🔍 Domain Controller's Role in Failover Clustering
| Function | Dependency on the Domain Controller |
| --- | --- |
| Cluster Creation | Required: nodes must authenticate with the domain to form or join the cluster. |
| Cluster Startup | Typically required: during initial boot, the cluster service authenticates via the domain. (Windows Server 2012 and later are designed to bootstrap the cluster without a reachable DC.) |
| Quorum Witness Access (File Share) | Nodes must resolve and authenticate to the file share to use a file share witness. |
| Cluster Name Object (CNO) | Stored in Active Directory. Requires a DC to register and update. |
| Live Migration | Uses Kerberos authentication, so it requires a DC and DNS. |
| DNS Resolution | All cluster nodes resolve the cluster name, witness, and partner nodes through DNS. |
| Hyper-V Authorization | Uses AD credentials for remote management, permissions, and constrained delegation. |
💡 Without a reachable DC, some features will hang or fail (especially Live Migration, witness arbitration, or remote cluster management).
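Whether a node can still reach a DC can be probed before relying on any of these features. The sketch below is one possible approach, not the only one: it looks up the domain's DC locator SRV records in DNS and then tries the Kerberos port (TCP 88) on each DC found. `Test-DomainControllerReachable` is a hypothetical helper, and `corp.example.com` in the usage note is a placeholder domain.

```powershell
# Hypothetical helper: $true if DNS returns at least one DC for the domain
# and that DC accepts a TCP connection on the Kerberos port (88).
function Test-DomainControllerReachable {
    param(
        [Parameter(Mandatory)][string]$DomainName
    )

    try {
        # DC locator records live under _ldap._tcp.dc._msdcs.<domain>.
        $records = Resolve-DnsName -Name "_ldap._tcp.dc._msdcs.$DomainName" -Type SRV -ErrorAction Stop |
            Where-Object { $_.QueryType -eq 'SRV' }   # skip additional A/AAAA records
    }
    catch {
        # DNS itself is unreachable or the domain is unknown.
        return $false
    }

    foreach ($record in $records) {
        $probe = Test-NetConnection -ComputerName $record.NameTarget -Port 88 -WarningAction SilentlyContinue
        if ($probe.TcpTestSucceeded) { return $true }
    }
    return $false
}
```

For example, `Test-DomainControllerReachable -DomainName "corp.example.com"` returns `$false` both when DNS fails and when no DC answers on port 88, which covers the two failure modes described above.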
❌ Move-ClusterVirtualMachineRole Will Likely Fail When No DC Is Reachable
Why?
Live Migration requires:
| Requirement | Why it breaks without a reachable DC |
| --- | --- |
| Kerberos Auth | Needs to talk to the DC, so it is broken without DNS/AD. |
| DNS Resolution | Migration needs to resolve the target node FQDN, which fails without DNS. |
| Cluster Coordination | RPC/SMB authentication between nodes fails without AD trust. |
| Access to Cluster Name Object (CNO) | Used for coordination, and it cannot be resolved without DNS/AD. |
So even though the "Cluster Only" network is still active for heartbeats and CSVs, the node becomes partially isolated (sometimes called “AD dark”).
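Given that Live Migration may fail on a partially isolated node, a script could attempt it first and fall back to a plain group move. This is a hedged sketch only: `Move-VmWithFallback` is a hypothetical wrapper, it assumes the role name equals the VM name, and the fallback is not guaranteed to succeed either if cluster authentication is also affected.

```powershell
# Hypothetical wrapper: try Live Migration first; if it fails, fall back to a
# regular group move, which interrupts the VM. Returns which path was taken.
function Move-VmWithFallback {
    param(
        [Parameter(Mandatory)][string]$VMName,     # assumed to match the cluster role name
        [Parameter(Mandatory)][string]$TargetNode
    )

    try {
        $null = Move-ClusterVirtualMachineRole -Name $VMName -Node $TargetNode -MigrationType Live -ErrorAction Stop
        return 'Live'
    }
    catch {
        Write-Warning "Live Migration failed: $($_.Exception.Message)"
        # Fallback: plain move with downtime. This can still throw if the
        # cluster cannot complete the move.
        $null = Move-ClusterGroup -Name $VMName -Node $TargetNode -ErrorAction Stop
        return 'Failover'
    }
}
```

The return value (`'Live'` or `'Failover'`) makes it easy to log whether the VM experienced downtime during the move.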