(Failover Cluster) Heartbeat Health Check
📦 Environment
Each host node is connected to a switch that separates two VLANs:
| VLAN | Cluster Network Name | Purpose |
|---|---|---|
| VLAN A | "Client and Cluster" | Client traffic, DNS/DC, Live Migration, VM access |
| VLAN B | "Cluster Only" | Cluster-internal traffic only: S2D, CSV, node-to-node heartbeat |
Each host has 2+ NICs mapped to these VLANs:
- NIC1 = "Client and Cluster"
- NIC2 = "Cluster Only"
- (possibly NIC3 = "Client only" for DNS, management)
✅ Failover Cluster Behavior Based on Adapter/Network Loss
| Scenario | Cluster Status | Live Migration | Will VMs Fail Over? | Node Removed from Membership? |
|---|---|---|---|---|
| "Client and Cluster" NIC down (VLAN A) | Cluster stays online via "Cluster Only" | ❌ No (Kerberos/DNS fails) | ❌ No (VMs stay; the cluster still considers the node healthy) | ❌ No |
| "Cluster Only" NIC down (VLAN B) | Cluster stays online via "Client and Cluster" | ✅ Yes | ❌ No (no need to move) | ❌ No |
| Both NICs down (total loss) | Node is unreachable → cluster detects failure | ❌ No (VMs are restarted on another node instead) | ✅ Yes (VMs restarted on another node) | ✅ Yes (dropped from active membership until it reconnects; not a permanent eviction) |
| "Client only" NIC down (if separate) | Cluster unaffected | ✅ Yes (as long as DC/DNS is reachable) | ❌ No | ❌ No |
✅ Heartbeats in a Cluster
Windows Failover Clustering uses heartbeats to check the health of cluster nodes. These heartbeats are sent over all enabled cluster networks (based on network roles and availability).
In your setup, the roles are usually like this:
| Cluster Network | VLAN | Role (value) | Heartbeat? | Storage traffic? |
|---|---|---|---|---|
| Client and Cluster | VLAN A | Cluster and Client (3) | ✅ Yes | ✅ Possibly (if allowed) |
| Cluster Only | VLAN B | Cluster Only (1) | ✅ Yes | ✅ Preferred for storage heartbeat |
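To check which role each network actually has on a live cluster, something like the following can be used. This is a minimal sketch that assumes the FailoverClusters module is installed and that the session runs on a cluster node. Depending on the Windows Server version, the Role column may show a name or a number.

```powershell
# Requires the FailoverClusters module; run on a cluster node.
Import-Module FailoverClusters

# List each cluster network with its role and subnet.
# Role values: 0 = None, 1 = Cluster only, 3 = Cluster and Client
Get-ClusterNetwork |
    Select-Object Name, Role, Address, AddressMask, State, Metric |
    Format-Table -AutoSize
```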
🔍 Heartbeat Path Logic
Which VLAN is used?
All cluster networks marked as Cluster or Cluster and Client are used for heartbeats.
If multiple networks are available between nodes, the cluster:
- Sends heartbeats over all cluster-enabled networks.
- Uses whichever network still works to maintain node visibility.
- Only declares a node "down" if all cluster heartbeats fail.
So:
- VLAN A (Client and Cluster) → Used
- VLAN B (Cluster Only) → Also used
✅ This gives redundancy: if VLAN A goes down, VLAN B can keep the node "alive".
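The redundancy rule above can be modelled in a few lines. This is only an illustration of the logic, not how the cluster service is implemented; `Test-NodeReachable` is a hypothetical helper name and the hashtable input is an assumption made for the example.

```powershell
# Hypothetical helper: models the rule "a node is only declared down
# when heartbeats fail on every cluster-enabled network".
function Test-NodeReachable {
    param(
        # Map of network name -> $true if heartbeats still arrive on it
        [Parameter(Mandatory)]
        [hashtable]$HeartbeatOkByNetwork
    )
    # Reachable as long as at least one network still carries heartbeats
    return (@($HeartbeatOkByNetwork.Values) -contains $true)
}

# VLAN A is down, VLAN B still works: the node stays "alive"
Test-NodeReachable @{ 'Client and Cluster' = $false; 'Cluster Only' = $true }    # True

# Both networks are down: the node is treated as failed
Test-NodeReachable @{ 'Client and Cluster' = $false; 'Cluster Only' = $false }   # False
```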
What IP/MAC are checked?
- The heartbeat uses the IP address of each node on the respective VLAN.
- Failover Cluster picks the IPs from:
  - Cluster-managed NICs/interfaces
  - IPs assigned to each Cluster Network
You can see this by running:
Get-ClusterNetwork | Get-ClusterNetworkInterface
You’ll get output like:
Node   Network             Address      State
----   -------             -------      -----
Node1  Client and Cluster  10.0.1.11    Up
Node2  Client and Cluster  10.0.1.12    Up
Node1  Cluster Only        172.16.0.11  Up
Node2  Cluster Only        172.16.0.12  Up
These are the IP endpoints used for cluster heartbeats. The MAC addresses are just part of the network layer used to reach these IPs — they aren’t directly monitored.
🔄 Heartbeat Mechanism: What Really Happens
🔹 Protocol
- Windows Failover Clustering sends UDP packets on port 3343 between all pairs of nodes.
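- By default, a small heartbeat packet is sent every second (cluster property SameSubnetDelay = 1000 ms).
- If too many consecutive heartbeats are missed, the node is considered down. The default limit (SameSubnetThreshold) is 5 on Windows Server 2012 R2 and earlier and 10 on Windows Server 2016 and later. Nodes in different subnets use CrossSubnetDelay and CrossSubnetThreshold instead.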
You can see this port is reserved in official docs:
UDP Port 3343 – used for cluster heartbeats between nodes. This is critical.
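The heartbeat interval and the missed-heartbeat limit are exposed as cluster properties, so the values in effect can be read directly. A minimal sketch, assuming the FailoverClusters module is available and the session runs on a cluster node. The "detection window" calculation is a rough approximation added for illustration.

```powershell
# Requires the FailoverClusters module; run on a cluster node.
Import-Module FailoverClusters

# Delay values are in milliseconds; thresholds are counts of missed heartbeats.
$cluster = Get-Cluster
$cluster | Format-List SameSubnetDelay, SameSubnetThreshold, CrossSubnetDelay, CrossSubnetThreshold

# Rough detection window: heartbeat interval x allowed consecutive misses
$windowMs = $cluster.SameSubnetDelay * $cluster.SameSubnetThreshold
"Same-subnet failure detection window: about $($windowMs / 1000) seconds"
```

Raising the threshold makes the cluster more tolerant of short network interruptions, at the cost of slower failure detection.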
🔹 One-to-One Heartbeating
Each node communicates with each other node over each cluster-enabled network.
Example:
If you have 3 nodes (Node1, Node2, Node3), and two cluster networks ("Client and Cluster", "Cluster Only"), the cluster creates:
Node1 <-> Node2 via VLAN A
Node1 <-> Node2 via VLAN B
Node1 <-> Node3 via VLAN A
Node1 <-> Node3 via VLAN B
...
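The full set of heartbeat paths for the example can be enumerated with a short script. This is only a sketch of the pairing logic; `Get-HeartbeatPath` is a hypothetical helper and does not query a real cluster.

```powershell
# Hypothetical helper: lists every node pair on every cluster network.
function Get-HeartbeatPath {
    param(
        [string[]]$Nodes,
        [string[]]$Networks
    )
    for ($i = 0; $i -lt $Nodes.Count; $i++) {
        for ($j = $i + 1; $j -lt $Nodes.Count; $j++) {
            foreach ($net in $Networks) {
                # One bidirectional heartbeat path per node pair per network
                "{0} <-> {1} via {2}" -f $Nodes[$i], $Nodes[$j], $net
            }
        }
    }
}

$paths = Get-HeartbeatPath -Nodes 'Node1', 'Node2', 'Node3' -Networks 'VLAN A', 'VLAN B'
$paths
$paths.Count   # 3 node pairs x 2 networks = 6
```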
🔹 How IP is Determined:
- When you enable a NIC on a node, it has an IP address (via DHCP or static), and it is on a subnet shared with other cluster nodes, the Failover Cluster detects it.
- If multiple interfaces exist, the cluster compares their subnets and decides whether they can be used for cluster communication.
- Once it verifies that the subnet is shared by more than one node, it registers that IP as a cluster network interface.
You can validate the exact interfaces and IPs used by:
Get-ClusterNetworkInterface | ft Node, Network, Address, State
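The subnet comparison described above can be approximated offline. The sketch below is a simplification, not the cluster's actual algorithm. `Get-SubnetAddress` is a hypothetical helper, the sample addresses are taken from the example output earlier in this post, and the /24 mask is an assumption.

```powershell
# Hypothetical helper: returns the network address for an IPv4 address and mask.
function Get-SubnetAddress {
    param([string]$Address, [string]$Mask)
    $ip = [System.Net.IPAddress]::Parse($Address).GetAddressBytes()
    $m  = [System.Net.IPAddress]::Parse($Mask).GetAddressBytes()
    # Bitwise AND of each octet with the mask gives the subnet address
    $net = for ($i = 0; $i -lt 4; $i++) { $ip[$i] -band $m[$i] }
    return ($net -join '.')
}

# Sample interfaces (addresses from the example output; mask is assumed)
$interfaces = @(
    [pscustomobject]@{ Node = 'Node1'; Address = '10.0.1.11';   Mask = '255.255.255.0' }
    [pscustomobject]@{ Node = 'Node2'; Address = '10.0.1.12';   Mask = '255.255.255.0' }
    [pscustomobject]@{ Node = 'Node1'; Address = '172.16.0.11'; Mask = '255.255.255.0' }
    [pscustomobject]@{ Node = 'Node2'; Address = '172.16.0.12'; Mask = '255.255.255.0' }
)

# Group by subnet; a subnet seen on more than one node is a cluster network candidate
$interfaces |
    Group-Object { Get-SubnetAddress $_.Address $_.Mask } |
    Where-Object { @($_.Group.Node | Sort-Object -Unique).Count -gt 1 } |
    Select-Object Name, @{ Name = 'Nodes'; Expression = { $_.Group.Node -join ', ' } }
```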
🧪 BONUS: How to See Port 3343 Heartbeats
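You can confirm that the cluster service is bound to UDP port 3343 with PowerShell or netstat. These commands show the listening endpoint, not the packets themselves. To watch live heartbeat traffic, capture it in Wireshark with the display filter udp.port == 3343.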
Example:
Get-NetUDPEndpoint | Where-Object { $_.LocalPort -eq 3343 }
# OR
netstat -ano | findstr ":3343"