
(Failover Cluster) How to Move VMs When the Client and Cluster Network Is Lost

아이셩짱셩 2025. 5. 30. 11:30

🧠 Cluster Behavior Recap

  • WSFC checks node health using heartbeats sent over networks marked for cluster use.
  • If a node can still:
    • Talk to the cluster (via Cluster Only network),
    • Access the CSV (Cluster Shared Volumes),
    • And the VM is healthy,

👉 Then the cluster considers the node healthy, even if it’s isolated from clients.
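
To see which networks carry heartbeats only and which also serve clients, you can list the cluster networks by role. The sketch below assumes the FailoverClusters module is available on a cluster node; Select-NetworkByRole and Get-ClusterNetworkSummary are hypothetical helper names, not built-in cmdlets. The numeric Role values (0 = None, 1 = Cluster only, 3 = Cluster and Client) are the ones the cluster uses.

```powershell
# Sketch only: run on a cluster node with the FailoverClusters module installed.
# Cluster network Role values: 0 = None, 1 = Cluster only, 3 = Cluster and Client.

function Select-NetworkByRole {
    # Hypothetical helper: passes through network objects whose Role matches.
    param(
        [Parameter(ValueFromPipeline = $true)] $Network,
        [int] $Role
    )
    process {
        if ([int]$Network.Role -eq $Role) { $Network }
    }
}

function Get-ClusterNetworkSummary {
    # Hypothetical helper: groups cluster network names by role.
    $all = Get-ClusterNetwork
    [pscustomobject]@{
        ClusterOnly      = @($all | Select-NetworkByRole -Role 1 | ForEach-Object Name)
        ClusterAndClient = @($all | Select-NetworkByRole -Role 3 | ForEach-Object Name)
    }
}
```

Calling Get-ClusterNetworkSummary on a node shows at a glance whether a separate "Cluster Only" network exists that could keep the node looking healthy after the client-facing network drops.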


🔧 So How Can You Change That?

Create a Custom Health Detection Script (Advanced)

  • You can write a cluster resource script or a VM health script that:
    • Checks whether the client network is available.
    • If it is not, fails the VM resource, causing failover.

This is how monitoring agents like SCOM, Veeam, or 3rd-party management tools often handle extended health checks.


 

✅ Recommended Approach (If You Really Want Failover on Client Network Loss)

  • Create a VM Health Check script that:
    1. Pings or tests reachability to external IPs or gateways.
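    2. If unreachable for X seconds, takes the VM group offline and brings it online on another node, for example:
      Stop-ClusterGroup -Name "YourVMGroupName"
      Move-ClusterGroup -Name "YourVMGroupName"
      Start-ClusterGroup -Name "YourVMGroupName"

    Note: Stop-ClusterGroup has no -MoveToBestNode parameter. Running Move-ClusterGroup without -Node lets the cluster choose the destination node.

❓ Will stopping the group and starting it on another node perform Live Migration?

🧨 No, it will not perform Live Migration.

Instead, it behaves like a failover, which causes:

  • The VM to go offline on the current node (saved or shut down, depending on its configured offline action).
  • The cluster to bring it up on another node.

⚠️ The VM is interrupted until it comes back up on the new node, so downtime occurs.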
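
The health check described above could be sketched roughly as follows. This is a minimal illustration, not a tested production script: Watch-ClientNetwork and its parameters are hypothetical names, the probe is a single ICMP ping per target, and the default failure action (stop the group, move it, start it again) is one assumed way to force the role onto another node. The probe and the action are passed in as script blocks so either can be swapped out.

```powershell
# Sketch only: the default action needs the FailoverClusters module on a cluster node.
function Watch-ClientNetwork {
    param(
        [Parameter(Mandatory)] [string[]] $Targets,      # gateways or external IPs to probe
        [Parameter(Mandatory)] [string]   $VMGroupName,  # clustered VM group to move
        [int] $FailureThreshold = 3,                     # consecutive failed rounds before acting
        [int] $IntervalSeconds  = 10,                    # pause between rounds
        [int] $MaxChecks        = 0,                     # 0 = run until the action fires
        [scriptblock] $Probe = {
            param($target)
            Test-Connection -ComputerName $target -Count 1 -Quiet
        },
        [scriptblock] $OnFailure = {
            param($group)
            # Offline, move, online: this causes downtime and is not a Live Migration.
            Stop-ClusterGroup  -Name $group
            Move-ClusterGroup  -Name $group
            Start-ClusterGroup -Name $group
        }
    )

    $failures = 0
    $checks   = 0
    while ($MaxChecks -le 0 -or $checks -lt $MaxChecks) {
        $checks++

        # The round succeeds if any single target answers.
        $reachable = $false
        foreach ($t in $Targets) {
            if (& $Probe $t) { $reachable = $true; break }
        }

        if ($reachable) { $failures = 0 } else { $failures++ }

        if ($failures -ge $FailureThreshold) {
            & $OnFailure $VMGroupName | Out-Null
            return $true    # the failure action was triggered
        }
        if ($IntervalSeconds -gt 0) { Start-Sleep -Seconds $IntervalSeconds }
    }
    return $false           # stopped after MaxChecks without triggering
}
```

With the defaults, three failed rounds ten seconds apart trigger the action after roughly 30 seconds of lost connectivity. Requiring consecutive failures and accepting any one reachable target are design choices meant to reduce false positives from a single dropped ping.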


✅ What to Use for Live Migration

If you want Live Migration (near-zero downtime), use:

Move-ClusterVirtualMachineRole -Name "YourVMName" -Node "TargetNode"
 

Or simply:

Move-ClusterGroup -Name "YourVMName" -Node "TargetNode"

 

  • These commands perform Live Migration if:
    • The VM is running.
    • The cluster is healthy.
    • Live Migration network is available.
    • No blockers exist (e.g. loss of storage, VM paused, etc.)
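
As an illustration of these preconditions, the sketch below checks that the VM group is online and the target node is up before requesting a Live Migration explicitly with -MigrationType Live. Move-VMLive is a hypothetical wrapper name, and it assumes the clustered group has the same name as the VM, which is common but not guaranteed.

```powershell
# Sketch only: requires the FailoverClusters module on a cluster node.
function Move-VMLive {
    param(
        [Parameter(Mandatory)] [string] $VMName,      # assumed to equal the cluster group name
        [Parameter(Mandatory)] [string] $TargetNode
    )

    # The VM role must be running to be live migrated.
    $group = Get-ClusterGroup -Name $VMName -ErrorAction Stop
    if ("$($group.State)" -ne 'Online') {
        Write-Warning "Group '$VMName' is $($group.State); Live Migration needs a running VM."
        return $false
    }

    # The destination node must be an active cluster member.
    $node = Get-ClusterNode -Name $TargetNode -ErrorAction Stop
    if ("$($node.State)" -ne 'Up') {
        Write-Warning "Node '$TargetNode' is $($node.State); choose another target."
        return $false
    }

    try {
        Move-ClusterVirtualMachineRole -Name $VMName -Node $TargetNode -MigrationType Live -ErrorAction Stop | Out-Null
        return $true
    }
    catch {
        Write-Warning "Live Migration of '$VMName' failed: $($_.Exception.Message)"
        return $false
    }
}
```

A call such as Move-VMLive -VMName "YourVMName" -TargetNode "TargetNode" returns $true only when the migration request completes without error. It does not check the Live Migration network or storage, so a failure message from the cmdlet is still the main diagnostic.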

🔍 Domain Controller's Role in Failover Clustering

Function | Role of Domain Controller
Cluster Creation | Required. Nodes must authenticate with the domain to form or join the cluster.
Cluster Startup | Required. During initial boot, the cluster service authenticates via the domain.
Quorum Witness Access (File Share) | Nodes must resolve and authenticate to the file share used as the quorum witness.
Cluster Name Object (CNO) | Stored in Active Directory. Requires a DC to register and update.
Live Migration | Uses Kerberos authentication, which requires a reachable DC and DNS.
DNS Resolution | All cluster nodes resolve the cluster name, witness, and partner nodes through DNS.
Hyper-V Authorization | Uses AD credentials for remote management, permissions, and constrained delegation.
 

💡 Without a reachable DC, some features will hang or fail (especially Live Migration, witness arbitration, or remote cluster management).

❌ Move-ClusterVirtualMachineRole Will Likely Fail in This Case

Why?

Live Migration requires:

Requirement | Why It Fails Without "Client and Cluster"
Kerberos Auth | Needs to talk to the DC, which is unreachable without DNS/AD.
DNS Resolution | Migration needs to resolve the target node FQDN, which fails without DNS.
Cluster Coordination | RPC/SMB authentication between nodes fails without AD trust.
Access to Cluster Name Object (CNO) | Used for coordination, but it can't be resolved.
 

So even though the "Cluster Only" network is still active for heartbeats and CSVs, the node becomes partially isolated (sometimes called “AD dark”).
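
One way to detect this "AD dark" state from the node itself is to test whether a domain controller can still be located through DNS and reached on the Kerberos port. The sketch below is an assumption about how such a check might look: Test-DomainReachability is a hypothetical name, the domain is passed in by the caller, and it relies on the Resolve-DnsName and Test-NetConnection cmdlets that ship with Windows.

```powershell
# Sketch only: run on the cluster node whose client network may be down.
function Test-DomainReachability {
    param(
        [Parameter(Mandatory)] [string] $Domain,   # AD DNS domain name, supplied by the caller
        [int] $KerberosPort = 88
    )

    $result = [pscustomobject]@{
        Domain           = $Domain
        DnsOk            = $false
        DomainController = $null
        KerberosOk       = $false
    }

    # Step 1: locate a DC through the standard SRV record.
    try {
        $srv = Resolve-DnsName -Name "_ldap._tcp.dc._msdcs.$Domain" -Type SRV -ErrorAction Stop |
            Where-Object { $_.NameTarget } |
            Select-Object -First 1
    }
    catch {
        return $result          # DNS itself is unreachable or has no answer
    }
    if (-not $srv) { return $result }

    $result.DnsOk            = $true
    $result.DomainController = $srv.NameTarget

    # Step 2: check that the DC answers on the Kerberos TCP port.
    $result.KerberosOk = [bool](Test-NetConnection -ComputerName $srv.NameTarget -Port $KerberosPort -InformationLevel Quiet -WarningAction SilentlyContinue)

    return $result
}
```

If DnsOk or KerberosOk comes back $false, a Live Migration attempt from that node is likely to fail for the reasons listed above, so a script could skip straight to an offline move instead of waiting for the migration to time out.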
