(Failover Cluster) Heartbeat Health Check

아이셩짱셩 2025. 5. 30. 13:58

📦 Environment

Each host node connects to a switch that carries two separate VLANs:

VLAN   | Cluster Network Name | Purpose
VLAN A | "Client and Cluster" | Client traffic, DNS/DC, Live Migration, VM access
VLAN B | "Cluster Only"       | S2D, CSV traffic, and node-to-node heartbeat only

Each host has two or more NICs mapped to these VLANs:

  • NIC1 = "Client and Cluster"
  • NIC2 = "Cluster Only"
  • (optionally, NIC3 = "Client Only" for DNS and management traffic)

✅ Failover Cluster Behavior Based on Adapter/Network Loss

Scenario | Cluster Status | Live Migration | Will VMs Fail Over? | Node Evicted?
"Client and Cluster" NIC down (VLAN A) | Cluster stays online via "Cluster Only" | ❌ No (Kerberos/DNS fails) | ❌ No (VMs stay; the cluster still sees the node as healthy) | ❌ No
"Cluster Only" NIC down (VLAN B) | Cluster stays online via "Client and Cluster" | ✅ Yes | ❌ No (no need to move) | ❌ No
Both NICs down (total loss) | Node is unreachable → cluster detects the failure | ❌ No (VMs are force-restarted on another node instead) | ✅ Yes (VMs restarted on another node) | ✅ Yes (removed from active membership until it rejoins)
"Client Only" NIC down (if separate) | Cluster unaffected | ✅ Yes (as long as DC/DNS is reachable) | ❌ No | ❌ No

✅ Heartbeats in a Cluster

Windows Failover Clustering uses heartbeats to check the health of cluster nodes. Heartbeats are sent over every cluster network that is currently up and whose role allows cluster communication.

In your setup, the roles are usually like this:

Cluster Network Name | VLAN   | Role                   | Used for Heartbeat? | Used for Storage (S2D)?
Client and Cluster   | VLAN A | Cluster and Client (3) | ✅ Yes              | ✅ Possibly (if allowed)
Cluster Only         | VLAN B | Cluster Only (1)       | ✅ Yes              | ✅ Yes (preferred for S2D/CSV traffic)

🔍 Heartbeat Path Logic

Which VLAN is used?

All cluster networks marked as Cluster or Cluster and Client are used for heartbeats.

If multiple networks are available between nodes, the cluster:

  • Sends heartbeats over all cluster-enabled networks.
  • Uses whichever network still works to maintain node visibility.
  • Declares a node "down" only when heartbeats fail on all of these networks.

So:

  • VLAN A (Client and Cluster) → Used
  • VLAN B (Cluster Only) → Also used
    ✅ This gives redundancy: if VLAN A goes down, VLAN B can keep the node "alive".
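The two rules above (only Cluster or Cluster and Client networks carry heartbeats, and a node is "down" only when every such network fails) can be expressed as a small Python model. It is a toy sketch, not the cluster service's implementation: the role numbers 1 and 3 match the Role column earlier, 0 is the "None" role, and the helper names heartbeat_networks and peer_is_up are hypothetical.

```python
# Toy model, not Failover Clustering's real code.
ROLE_NONE, ROLE_CLUSTER, ROLE_CLUSTER_AND_CLIENT = 0, 1, 3
HEARTBEAT_ROLES = {ROLE_CLUSTER, ROLE_CLUSTER_AND_CLIENT}


def heartbeat_networks(networks):
    """Return names of networks whose role allows cluster traffic (role 1 or 3)."""
    return [name for name, role in networks.items() if role in HEARTBEAT_ROLES]


def peer_is_up(networks, heartbeat_ok):
    """A peer stays 'up' while at least one heartbeat-enabled network still works."""
    return any(heartbeat_ok.get(name, False) for name in heartbeat_networks(networks))


networks = {"Client and Cluster": ROLE_CLUSTER_AND_CLIENT, "Cluster Only": ROLE_CLUSTER}

# VLAN A down, VLAN B still carrying heartbeats: node stays alive.
print(peer_is_up(networks, {"Client and Cluster": False, "Cluster Only": True}))
# Both down: node is declared down.
print(peer_is_up(networks, {"Client and Cluster": False, "Cluster Only": False}))
```

Using any() captures the redundancy argument directly: a single working network is enough to keep the node visible.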

What IP/MAC are checked?

  • The heartbeat uses the IP address of each node on the respective VLAN.
  • Failover Cluster picks the IPs from:
    • Cluster-managed NICs/interfaces
    • IPs assigned to each Cluster Network

You can see this by running:

Get-ClusterNetwork | Get-ClusterNetworkInterface | Format-Table Node, Network, Address, State

You’ll get output like:

Node       Network             Address         State
----       -------             -------         -----
Node1      Client and Cluster  10.0.1.11       Up
Node2      Client and Cluster  10.0.1.12       Up
Node1      Cluster Only        172.16.0.11     Up
Node2      Cluster Only        172.16.0.12     Up

These are the IP endpoints used for cluster heartbeats. MAC addresses are only used at the link layer to deliver packets to these IPs; the cluster does not monitor them directly.


🔄 Heartbeat Mechanism: What Really Happens

🔹 Protocol

  • Windows Failover Clustering sends UDP packets on port 3343 between all pairs of nodes.
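  • By default, a small heartbeat packet is sent every second (SameSubnetDelay = 1000 ms).
  • If SameSubnetThreshold consecutive heartbeats are missed, the peer is considered unreachable on that network. The default threshold is 10 on Windows Server 2016 and later (5 on Windows Server 2012 R2 and earlier).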
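The delay/threshold rule can be sketched in Python to show how long a peer must stay silent before it is treated as unreachable. This is a simplified model, assuming the default values stated above; the function names detection_time_ms and first_failure are hypothetical, and the real cluster service has more nuance (per-route tracking, cross-subnet settings).

```python
from typing import List, Optional

DEFAULT_DELAY_MS = 1000  # assumed SameSubnetDelay default
DEFAULT_THRESHOLD = 10   # assumed SameSubnetThreshold default (Windows Server 2016+)


def detection_time_ms(delay_ms: int = DEFAULT_DELAY_MS,
                      threshold: int = DEFAULT_THRESHOLD) -> int:
    """Approximate silence needed before the peer is treated as unreachable."""
    return delay_ms * threshold


def first_failure(received: List[bool],
                  threshold: int = DEFAULT_THRESHOLD) -> Optional[int]:
    """Index of the heartbeat at which `threshold` consecutive misses are reached, or None."""
    misses = 0
    for i, ok in enumerate(received):
        misses = 0 if ok else misses + 1  # any received heartbeat resets the counter
        if misses >= threshold:
            return i
    return None


print(detection_time_ms())             # 10000 (ms, 2016+ defaults)
print(detection_time_ms(threshold=5))  # 5000 (ms, older threshold of 5)

# True = heartbeat received, False = heartbeat missed
pattern = [True, False, False, True] + [False] * 5
print(first_failure(pattern, threshold=5))   # 8
print(first_failure(pattern, threshold=10))  # None
```

Note how the single received heartbeat at index 3 resets the counter: only consecutive misses count toward the threshold.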

Microsoft's documentation lists UDP port 3343 as the port used for cluster heartbeats between nodes, so it must be open between all nodes on every cluster network.


🔹 One-to-One Heartbeating

Each node exchanges heartbeats with every other node over every cluster-enabled network.

Example:
If you have 3 nodes (Node1, Node2, Node3), and two cluster networks ("Client and Cluster", "Cluster Only"), the cluster creates:

Node1 <-> Node2 via VLAN A
Node1 <-> Node2 via VLAN B
Node1 <-> Node3 via VLAN A
Node1 <-> Node3 via VLAN B
...
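The full list of heartbeat routes follows from pairing every two nodes on every network, so the count is n(n-1)/2 pairs times the number of networks. A minimal Python sketch of that enumeration (the function name heartbeat_routes is hypothetical):

```python
from itertools import combinations


def heartbeat_routes(nodes, networks):
    """One route per unordered node pair per cluster network."""
    return [(a, b, net) for a, b in combinations(nodes, 2) for net in networks]


nodes = ["Node1", "Node2", "Node3"]
networks = ["VLAN A", "VLAN B"]
routes = heartbeat_routes(nodes, networks)

for a, b, net in routes:
    print(f"{a} <-> {b} via {net}")
print(len(routes))  # 6 = 3 node pairs x 2 networks
```

The first four lines printed match the example list above; the remaining two are Node2 <-> Node3 on each VLAN.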

🔹 How IP is Determined:

  • When a NIC on a node has an IP address (DHCP or static) on a subnet shared with other cluster nodes, Failover Clustering detects it.
  • If a node has multiple interfaces, the cluster compares their subnets to decide which ones can be used for cluster communication.
  • Once it confirms that a subnet is shared by more than one node, it registers that IP as a cluster network interface.
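The subnet-matching step described above can be modeled with Python's ipaddress module: group each interface by its subnet and keep only subnets that appear on more than one node. This is a sketch of the logic as described here, not the cluster's actual discovery code; the /24 prefix lengths, the extra 192.168.50.5 interface, and the helper name shared_subnets are assumptions for illustration.

```python
import ipaddress
from collections import defaultdict


def shared_subnets(interfaces):
    """Group (node, 'ip/prefix') pairs by subnet; keep subnets seen on more than one node."""
    by_subnet = defaultdict(set)
    for node, cidr in interfaces:
        iface = ipaddress.ip_interface(cidr)  # e.g. 10.0.1.11/24 -> network 10.0.1.0/24
        by_subnet[iface.network].add(node)
    return {str(net): sorted(nodes) for net, nodes in by_subnet.items() if len(nodes) > 1}


interfaces = [
    ("Node1", "10.0.1.11/24"), ("Node2", "10.0.1.12/24"),      # Client and Cluster
    ("Node1", "172.16.0.11/24"), ("Node2", "172.16.0.12/24"),  # Cluster Only
    ("Node1", "192.168.50.5/24"),  # hypothetical NIC whose subnet no other node shares
]
print(shared_subnets(interfaces))
# {'10.0.1.0/24': ['Node1', 'Node2'], '172.16.0.0/24': ['Node1', 'Node2']}
```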

You can validate the exact interfaces and IPs used by:

Get-ClusterNetworkInterface | ft Node, Network, Address, State

🧪 BONUS: How to See Port 3343 Heartbeats

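netstat and Get-NetUDPEndpoint confirm that the cluster service is listening on UDP port 3343 on each node. To see the heartbeat packets themselves, capture traffic with Wireshark using the display filter udp.port == 3343.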

Example:

Get-NetUDPEndpoint | Where-Object { $_.LocalPort -eq 3343 }
# OR
netstat -ano | findstr ":3343"