(Failover Cluster) Quorum

아이셩짱셩 2025. 5. 9. 18:43
🔷 What Is Quorum?

Quorum is the mechanism that prevents split-brain, a condition in which multiple partitions of a cluster each believe they are in charge and corrupt shared data.

In simple terms:

👉 Quorum = Majority agreement among voting members (nodes + witness) about which part of the cluster should stay online.

If quorum is not reached, the cluster shuts down (partially or completely) to protect data.
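The majority rule above is simple arithmetic. A minimal sketch (not the Windows implementation, just the counting logic it describes):

```python
# A cluster keeps running only while the online voters (nodes + witness)
# form a strict majority of the total configured votes.

def has_quorum(votes_online: int, total_votes: int) -> bool:
    """True if the online voters form a strict majority."""
    return votes_online > total_votes // 2

# 3 nodes + 1 file share witness = 4 total votes.
# Losing one node (3 of 4 online) keeps quorum; a 2/2 split does not.
print(has_quorum(3, 4))  # True
print(has_quorum(2, 4))  # False
```

This is why a witness matters in even-node clusters: it makes the vote total odd, so a clean majority always exists.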

 

🔹 File Share Witness (FSW) — What Is It?

File Share Witness is a concept and component of Windows Failover Clustering — not specific to Storage Spaces Direct (S2D) or Cluster Shared Volumes (CSV).

It is used to help maintain quorum, which is the decision-making mechanism in a cluster (i.e., to determine which parts stay online during a failure).

 

🔹 How FSW Detects Failures (Especially Network Partitions)

🔍 Cluster Node View of Witness

  • The Cluster Service on each node continuously checks connectivity to:
    • Other cluster nodes
    • The File Share Witness
  • If a node loses contact with others or with the FSW, it does not automatically assume failure.
  • The cluster performs quorum arbitration, and only the side with quorum stays online.

If Only One Node Contacts the FSW, How Do the Other Nodes Count Its Vote?

Because:

  • The Cluster Service replicates vote status between nodes.
  • If Node A (FSW Coordinator) successfully locks the FSW and confirms its vote:
    • It informs the cluster: “Witness vote is active.”
    • Other nodes add it to the quorum count.

So:

🟢 Nodes do not each contact the FSW — they rely on the cluster membership and internal sync.
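The replication of the witness vote can be sketched as follows. This is a hypothetical illustration of the idea above (the `Node` class and `lock_witness` function are not real cluster APIs): only the coordinator touches the witness, and the result is propagated through cluster membership so every node counts the same total.

```python
# Hypothetical sketch: one coordinator locks the file share witness,
# then replicates "witness vote is active" to its peers.

class Node:
    def __init__(self, name: str):
        self.name = name
        self.witness_vote_active = False

def lock_witness(coordinator: Node, peers: list[Node]) -> None:
    # Coordinator takes the lock on the file share witness...
    coordinator.witness_vote_active = True
    # ...then informs the other cluster members via internal sync.
    for peer in peers:
        peer.witness_vote_active = True

a, b, c = Node("A"), Node("B"), Node("C")
lock_witness(a, [b, c])
votes = 3 + (1 if a.witness_vote_active else 0)  # node votes + witness vote
print(votes)  # 4
```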

 

🔹 Goal of Failover Clustering

In the event of network failure, the cluster must:

  1. Detect the failure of a node.
  2. Determine if the node is truly unreachable, or if it’s just a temporary glitch.
  3. Decide whether to fail over workloads (e.g., VMs, file shares).
  4. Ensure quorum is maintained (i.e., the cluster doesn’t split).

🔹 1. Detection of Node Failure (via Heartbeats)

Process:

  • Cluster nodes send heartbeats to each other every second (by default).
  • If a node misses 5 consecutive heartbeats (~5 seconds by default), the other nodes suspect it has failed.
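The detection policy above reduces to two tunables: the heartbeat interval and the miss threshold. A sketch using the defaults stated above (real clusters expose these as configurable settings, and defaults vary by Windows Server version):

```python
# Illustrative sketch of the stated default policy: one heartbeat per
# second, and a node is suspected down after 5 consecutive misses.

HEARTBEAT_INTERVAL_SEC = 1   # heartbeat sent every second (default)
MISS_THRESHOLD = 5           # consecutive misses before suspicion

def is_suspect(missed_heartbeats: int) -> bool:
    return missed_heartbeats >= MISS_THRESHOLD

print(is_suspect(4))  # False - still tolerated
print(is_suspect(5))  # True  - ~5 seconds of silence
```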

📡 Protocols Involved:

Type          Protocol / Use                           Port
Heartbeat     UDP + RPC                                3343 (UDP), 135 (RPC)
Cluster Comm  TCP/IP + SMB                             Dynamic ports, SMB (445)
SMB           File witness, cluster shared access      445
ICMP (Ping)   Optional basic reachability checks       ICMP

Heartbeats use a combination of UDP multicast/unicast and RPC.

🔹 2. Validation and Voting

After a missed heartbeat:

  • The cluster attempts additional checks (e.g., RPC, SMB connections).
  • If the node fails all checks, it's marked down.
  • The cluster runs quorum arbitration to determine if enough nodes (or witness) are available to keep running.
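The escalation above can be sketched as a simple decision: a missed heartbeat alone never marks a node down; the follow-up checks (RPC and SMB in a real cluster, stubbed out here as booleans) must also fail first.

```python
# Hedged sketch of the validation step: a node is marked down only
# when the heartbeat is missing AND every additional check fails.

def node_is_down(heartbeat_ok: bool, rpc_ok: bool, smb_ok: bool) -> bool:
    if heartbeat_ok:
        return False                  # no suspicion at all
    return not (rpc_ok or smb_ok)     # down only if every check fails

print(node_is_down(False, True, False))   # False - RPC still answers
print(node_is_down(False, False, False))  # True  - marked down
```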

🔹 3. Quorum Check & Arbitration

The Cluster Service checks:

  • How many votes (nodes + witness) are online.
  • Whether the current node(s) are part of the majority.

🧠 Scenarios:

Scenario                                         Result
Majority of votes present (including witness)    Cluster stays online
Less than a majority                             Cluster pauses (goes offline) to avoid split-brain

🔹 4. Failover of Workloads

If a node is confirmed as failed:

  • The Resource Hosting Subsystem (RHS) moves clustered roles (VMs, SQL Server instances, etc.) to a surviving node.
  • Any Cluster Shared Volumes (CSV) are brought online on another node.
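The reassignment step can be sketched as a simple ownership remap. This is a hypothetical illustration only; real placement also weighs preferred owners, anti-affinity rules, and node load.

```python
# Hypothetical sketch of failover: roles owned by the failed node are
# reassigned to a surviving node.

roles = {"VM1": "Node1", "SQL1": "Node1", "FS1": "Node2"}

def fail_over(roles: dict, failed: str, survivor: str) -> dict:
    return {r: (survivor if owner == failed else owner)
            for r, owner in roles.items()}

print(fail_over(roles, failed="Node1", survivor="Node2"))
# {'VM1': 'Node2', 'SQL1': 'Node2', 'FS1': 'Node2'}
```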