Article intro
In this article I share what I have been doing to test the fault tolerance and self-healing behavior of MySQL Galera Cluster.
I assume you already have a MySQL Galera Cluster installed, so I won't cover the setup from scratch!

Lab Information:
# Cluster 1 - segment 0
10.5.1.5 mysql-galera-db-1
10.5.3.7 mysql-galera-db-2
10.5.1.228 mysql-galera-db-3
# Cluster 2 - segment 1
10.5.0.80 mysql-galera-db-4
10.5.2.76 mysql-galera-db-5
10.5.1.219 mysql-galera-db-6
# Garbd
10.5.2.133 mysql-galera-db-7
Quick Intro: MySQL Galera Cluster
MySQL Galera Cluster is a multi-master, synchronous replication solution providing high availability and data consistency across nodes. It uses the wsrep API and quorum-based voting to maintain cluster integrity. Key components:
- Nodes: active database instances (read/write)
- Galera Arbitrator (garbd): a lightweight arbitrator that contributes a quorum vote (but stores no data), useful in even-node setups
Lab 1: Isolate Segment 1 → Write 1M Records to Segment 0 → Rejoin Segment 1
Test split-brain prevention, quorum, and IST/SST recovery.
Lab 2: Block Traffic Between Segment 0 & 1 (Garb Still Reachable)
Validate Garb’s role in quorum decision during partial network partition.
Lab 3: Simulate Unstable Network with TC (Packet Loss, Latency, Reorder)
Use Linux Traffic Control (tc) - built-in, powerful, zero-install - to inject packet loss (e.g. 5%) and delay with jitter (e.g. 200ms ± 50ms).
Quick Monitor
Tools/CLI used during the labs: sysbench, tc, iptables, systemctl, journalctl (plus tcpdump and nc later on).
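For reference, on a Debian/Ubuntu host (which this lab appears to use, given /etc/mysql/conf.d and systemctl) the tooling can be installed roughly like this - the package names are my assumption:
# Install the client-side test tooling (Debian/Ubuntu package names assumed)
apt-get update
apt-get install -y sysbench iproute2 iptables tcpdump netcat-openbsd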
root@mysql-galera-db-3:~# mysql -e "show global status where Variable_name rlike 'wsrep_(ready|connected|cluster_size|local_state_comment|local_recv_queue_avg|flow_control_paused|cluster_status)';"
+------------------------------+-----------+
| Variable_name | Value |
+------------------------------+-----------+
| wsrep_local_recv_queue_avg | 0.0182108 |
| wsrep_flow_control_paused_ns | 0 |
| wsrep_flow_control_paused | 0 |
| wsrep_local_state_comment | Synced |
| wsrep_cluster_size | 7 |
| wsrep_connected | ON |
| wsrep_cluster_status | Primary |
| wsrep_ready | ON |
+------------------------------+-----------+
Variable explanations:
- wsrep_local_recv_queue_avg: average length of the receive queue. Should stay below 1.0; a high value means the node is lagging behind the cluster.
- wsrep_flow_control_paused: fraction of time replication was paused by flow control because a node is slow. Should be close to 0 (range 0.0 - 1.0; anything above 0.1 points to a problem!).
- wsrep_local_state_comment, wsrep_cluster_status, wsrep_cluster_size: overall cluster state.
- wsrep_cluster_status: should be "Primary".
- wsrep_ready: should be "ON".
- wsrep_cluster_conf_id: an interesting metric - it increments on every cluster reconfiguration (node join/leave, quorum loss and recovery, network issues). If this number climbs quickly, something is seriously wrong!
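During the labs I kept a loop like this running in a spare terminal; a minimal sketch of the same status query (the 2-second interval is my own choice):
# Poll the key wsrep health counters on the local node every 2 seconds (Ctrl-C to stop)
while true; do
  mysql -e "SHOW GLOBAL STATUS WHERE Variable_name RLIKE 'wsrep_(ready|connected|cluster_size|cluster_status|local_state_comment|local_recv_queue_avg|flow_control_paused|cluster_conf_id)'"
  sleep 2
done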
Lab 1: Isolate Segment 1 → Write 1M Records to Segment 0 → Rejoin Segment 1
Status before graceful shutdown.
root@mysql-galera-db-1:~# mysql -e "SHOW STATUS LIKE 'wsrep_cluster_%';"
+----------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------+--------------------------------------+
| wsrep_cluster_weight | 7 |
| wsrep_cluster_capabilities | |
| wsrep_cluster_conf_id | 75 |
| wsrep_cluster_size | 7 |
| wsrep_cluster_state_uuid | 2730e7d3-ae4e-11f0-a3f7-f290a6348920 |
| wsrep_cluster_status | Primary |
+----------------------------+--------------------------------------+
After shutting down the whole of segment 1 (systemctl stop mysql on nodes 4, 5, 6):
root@mysql-galera-db-1:~# mysql -e "SHOW STATUS LIKE 'wsrep_cluster_%';"
+----------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------+--------------------------------------+
| wsrep_cluster_weight | 4 |
| wsrep_cluster_capabilities | |
| wsrep_cluster_conf_id | 78 |
| wsrep_cluster_size | 4 |
| wsrep_cluster_state_uuid | 2730e7d3-ae4e-11f0-a3f7-f290a6348920 |
| wsrep_cluster_status | Primary |
+----------------------------+--------------------------------------+
Now insert 1M records with sysbench:
root@mysql-galera-db-1:~# sysbench oltp_read_write --mysql-db=test --mysql-user=root --mysql-password=123123 --table-size=1000000 prepare
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)
Creating table 'sbtest1'...
Inserting 1000000 records into 'sbtest1'
Creating a secondary index on 'sbtest1'...
root@mysql-galera-db-1:~# mysql -e "select count(*) from test.sbtest1"
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
Start MySQL again on all the segment-1 nodes that were shut down and verify that the data has synced.
root@mysql-galera-db-4:~# mysql -e "SHOW STATUS LIKE 'wsrep_cluster_%';"
+----------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------+--------------------------------------+
| wsrep_cluster_weight | 7 |
| wsrep_cluster_capabilities | |
| wsrep_cluster_conf_id | 81 |
| wsrep_cluster_size | 7 |
| wsrep_cluster_state_uuid | 2730e7d3-ae4e-11f0-a3f7-f290a6348920 |
| wsrep_cluster_status | Primary |
+----------------------------+--------------------------------------+
root@mysql-galera-db-4:~# mysql -e "select count(*) from test.sbtest1"
+----------+
| count(*) |
+----------+
| 1000000 |
+----------+
The scenario above used a graceful shutdown - what about an ungraceful one (pkill -9 mysql)?
With a kill -9 the segment-1 nodes never announce that they are leaving the cluster, so the segment-0 nodes cannot tell whether segment 1 died or the network partitioned. Galera sees 3 nodes vanish at once -> suspects a split-brain -> switches to non-Primary to prevent the two segments from writing independently.
root@mysql-galera-db-1:~# mysql -e "SHOW STATUS LIKE 'wsrep_cluster_%';"
+----------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------+--------------------------------------+
| wsrep_cluster_weight | 0 |
| wsrep_cluster_capabilities | |
| wsrep_cluster_conf_id | 18446744073709551615 |
| wsrep_cluster_size | 1 |
| wsrep_cluster_state_uuid | 2730e7d3-ae4e-11f0-a3f7-f290a6348920 |
| wsrep_cluster_status | non-Primary |
+----------------------------+--------------------------------------+
The cluster only recovers once the segment-1 nodes come back, which is not realistic in a real outage, so let's move on to the next case.
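As a side note (not something I ran in this lab): if the lost segment really were gone for good, the surviving segment can be forced back to Primary by hand via the pc.bootstrap provider option. Only ever do this on one side of a partition, otherwise you create a genuine split-brain:
# Run on ONE node of the surviving segment only; it re-forms a Primary Component from the current members
mysql -e "SET GLOBAL wsrep_provider_options='pc.bootstrap=YES';"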
Lab 2: Block Traffic Between Segment 0 & 1 (Garb Still Reachable)
# Run this on the 3 nodes of segment 1 (blocking the segment-0 IPs)
iptables -A INPUT -s 10.5.1.5 -j DROP
iptables -A INPUT -s 10.5.3.7 -j DROP
iptables -A INPUT -s 10.5.1.228 -j DROP
iptables -A OUTPUT -d 10.5.1.5 -j DROP
iptables -A OUTPUT -d 10.5.3.7 -j DROP
iptables -A OUTPUT -d 10.5.1.228 -j DROP
# Run this on the 3 nodes of segment 0 (blocking the segment-1 IPs)
iptables -A INPUT -s 10.5.0.80 -j DROP
iptables -A INPUT -s 10.5.2.76 -j DROP
iptables -A INPUT -s 10.5.1.219 -j DROP
iptables -A OUTPUT -d 10.5.0.80 -j DROP
iptables -A OUTPUT -d 10.5.2.76 -j DROP
iptables -A OUTPUT -d 10.5.1.219 -j DROP
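(To undo the partition later, the same rules have to be deleted again. A sketch for the segment-1 side - segment 0 is symmetric - or simply flush with iptables -F if nothing else is configured on these hosts.)
# Mirror of the -A commands above, run on the segment-1 nodes
for ip in 10.5.1.5 10.5.3.7 10.5.1.228; do
  iptables -D INPUT  -s "$ip" -j DROP
  iptables -D OUTPUT -d "$ip" -j DROP
done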
root@mysql-galera-db-4:~# iptables -L -n -v
Chain INPUT (policy ACCEPT 632K packets, 289M bytes)
pkts bytes target prot opt in out source destination
28 2280 DROP 0 -- * * 10.5.1.5 0.0.0.0/0
29 1920 DROP 0 -- * * 10.5.3.7 0.0.0.0/0
33 5156 DROP 0 -- * * 10.5.1.228 0.0.0.0/0
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 551K packets, 659M bytes)
pkts bytes target prot opt in out source destination
33 11216 DROP 0 -- * * 0.0.0.0/0 10.5.1.5
53 35948 DROP 0 -- * * 0.0.0.0/0 10.5.3.7
33 8448 DROP 0 -- * * 0.0.0.0/0 10.5.1.228
root@mysql-galera-db-5:~# iptables -L -n -v
Chain INPUT (policy ACCEPT 644K packets, 688M bytes)
pkts bytes target prot opt in out source destination
52 7560 DROP 0 -- * * 10.5.1.5 0.0.0.0/0
46 2940 DROP 0 -- * * 10.5.3.7 0.0.0.0/0
46 2940 DROP 0 -- * * 10.5.1.228 0.0.0.0/0
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 501K packets, 50M bytes)
pkts bytes target prot opt in out source destination
59 33164 DROP 0 -- * * 0.0.0.0/0 10.5.1.5
81 67768 DROP 0 -- * * 0.0.0.0/0 10.5.3.7
57 33208 DROP 0 -- * * 0.0.0.0/0 10.5.1.228
root@mysql-galera-db-6:~# iptables -L -n -v
Chain INPUT (policy ACCEPT 696K packets, 691M bytes)
pkts bytes target prot opt in out source destination
48 10444 DROP 0 -- * * 10.5.1.5 0.0.0.0/0
46 12344 DROP 0 -- * * 10.5.3.7 0.0.0.0/0
46 11364 DROP 0 -- * * 10.5.1.228 0.0.0.0/0
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 607K packets, 68M bytes)
pkts bytes target prot opt in out source destination
153 181K DROP 0 -- * * 0.0.0.0/0 10.5.1.5
176 213K DROP 0 -- * * 0.0.0.0/0 10.5.3.7
152 179K DROP 0 -- * * 0.0.0.0/0 10.5.1.228
After that, check the cluster status. We still see 7 nodes, which means garbd can still reach both segments and wsrep_cluster_size stays at 7.
root@mysql-galera-db-1:~# mysql -e "SHOW STATUS LIKE 'wsrep_cluster_%';"
+----------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------+--------------------------------------+
| wsrep_cluster_weight | 7 |
| wsrep_cluster_capabilities | |
| wsrep_cluster_conf_id | 7 |
| wsrep_cluster_size | 7 |
| wsrep_cluster_state_uuid | 0e38a58c-aff7-11f0-b1f0-6ee869c6a893 |
| wsrep_cluster_status | Primary |
+----------------------------+--------------------------------------+
root@mysql-galera-db-4:~# mysql -e "SHOW STATUS LIKE 'wsrep_cluster_%';"
+----------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------+--------------------------------------+
| wsrep_cluster_weight | 7 |
| wsrep_cluster_capabilities | |
| wsrep_cluster_conf_id | 7 |
| wsrep_cluster_size | 7 |
| wsrep_cluster_state_uuid | 0e38a58c-aff7-11f0-b1f0-6ee869c6a893 |
| wsrep_cluster_status | Primary |
+----------------------------+--------------------------------------+
Cross-check with ping and a port test from a random node:
root@mysql-galera-db-5:~# nc -vz -w 1 mysql-galera-db-1 4567
nc: connect to mysql-galera-db-1 (10.5.1.5) port 4567 (tcp) timed out: Operation now in progress
root@mysql-galera-db-5:~# ping -w 1 mysql-galera-db-1
PING mysql-galera-db-1 (10.5.1.5) 56(84) bytes of data.
--- mysql-galera-db-1 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
-----------------------------------------------------
root@mysql-galera-db-1:~# nc -vz -w 1 mysql-galera-db-5 4567
nc: connect to mysql-galera-db-5 (10.5.2.76) port 4567 (tcp) timed out: Operation now in progress
root@mysql-galera-db-1:~# ping -w 1 mysql-galera-db-5
PING mysql-galera-db-5 (10.5.2.76) 56(84) bytes of data.
--- mysql-galera-db-5 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
Hmm, test from one more node to make sure the block really works both ways:
root@mysql-galera-db-2:~# nc -vz -w 1 mysql-galera-db-4 4567
nc: connect to mysql-galera-db-4 (10.5.0.80) port 4567 (tcp) timed out: Operation now in progress
root@mysql-galera-db-2:~# ping -w 1 mysql-galera-db-4
PING mysql-galera-db-4 (10.5.0.80) 56(84) bytes of data.
--- mysql-galera-db-4 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
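Another quick check (a sketch): list the established group-communication connections on port 4567. With the inter-segment path blocked, the only remaining remote peer on a blocked node should be the garbd host, 10.5.2.133:
# Show established Galera group-communication (port 4567) connections on this node
ss -tn state established '( dport = :4567 or sport = :4567 )'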
OK, let's create new databases on each side to check that data still synchronizes.
# Node 1 (segment 0)
mysql> create database test1;
Query OK, 1 row affected (0.01 sec)
mysql> create database test2;
Query OK, 1 row affected (0.01 sec)
------------------------------------
# Node 5 (segment 1)
root@mysql-galera-db-5:~# mysql -e "create database test3"
root@mysql-galera-db-5:~# mysql -e "create database test4"
root@mysql-galera-db-5:~# mysql -e "show databases"
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| sys |
| test1 |
| test2 |
| test3 |
| test4 |
+--------------------+
Now test a heavier write from node 1 (segment 0):
root@mysql-galera-db-1:~# sysbench oltp_read_write --mysql-db=test2 --mysql-user=root --mysql-password=123123 --table-size=2000000 prepare
sysbench 1.0.20 (using system LuaJIT 2.1.0-beta3)
Creating table 'sbtest1'...
Inserting 2000000 records into 'sbtest1'
Creating a secondary index on 'sbtest1'...
Check the data on node 4 (segment 1):
root@mysql-galera-db-4:~# mysql -e "select count(*) from test2.sbtest1;"
+----------+
| count(*) |
+----------+
| 2000000 |
+----------+
At this point the obvious question is: why does replication still work? The answer is the garbd node, and it took a little while to prove it. The first step is to look at the traffic with tcpdump; the capture below was running on the garbd node while node 1 was inserting.
root@mysql-galera-db-7:~# tcpdump -i eth0 port 4567 -w galera.pcap
tcpdump: listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
^C2147624 packets captured
2148191 packets received by filter
0 packets dropped by kernel
root@mysql-galera-db-7:~# ls -lh
total 2.6G
-rw-r--r-- 1 tcpdump tcpdump 2.6G Oct 24 09:34 galera.pcap
Quite a heavy capture. Open the pcap file in Wireshark.

It looks confusing and I don't understand much of it - I should probably take a Wireshark course =_=. When in doubt, ask; here is the answer.
Based on this tcpdump data, the analysis goes as follows (copied from Claude AI xD):
Key Observation
Node 10.5.2.133 is acting as a clear relay/coordinator.
Communication Pattern:
- 10.5.2.133 receives data from many nodes (10.5.2.76, 10.5.1.5, 10.5.1.228, 10.5.3.7, 10.5.0.80, 10.5.1.219)
- Then broadcast/forward to ALL other nodes
- All using port 4567 (destination port)
Evidence of relay behavior:
- Frame 1236874-1236875: 10.5.2.133 sends data to 10.5.2.76 (80+108 bytes)
- Frame 1236876: 10.5.2.133 sends data to 10.5.1.5 (108 bytes)
- Frame 1236879: 10.5.2.133 sends data to 10.5.3.7 (108 bytes)
- Frame 1236880: 10.5.2.133 sends data to 10.5.1.228 (108 bytes)
- Frame 1236881: 10.5.2.133 sends data to 10.5.0.80 (108 bytes)
- Frame 1236882: 10.5.2.133 sends data to 10.5.1.219 (108 bytes)
Data size pattern:
- Majority: 108 bytes payload (may be standard message size)
- Occasional: 80 bytes (may be ACK or heartbeat)
- Base TCP overhead: 66 bytes (pure ACK)
Timing analysis:
- All events occur in ~0.1 seconds (from 25.174043 → 25.174227)
- Node 10.5.2.133 sends messages almost simultaneously to all nodes
- Latency is very low, indicating this is a local network
Conclusion
10.5.2.133 is definitely a relay/hub node, not just participating in voting.
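If Wireshark feels like overkill, the same hub pattern can be spotted on the CLI with tshark (assuming it is installed on the capture host); the TCP conversation summary shows 10.5.2.133 holding a conversation with every other node:
# Per-conversation packet/byte totals for the capture taken on the garbd node
tshark -r galera.pcap -q -z conv,tcp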
Of course, since the explanation came from an AI, there has to be corroborating evidence. First, this article: http://blog.yannickjaquier.com/mariadb/galera-arbitrator-as-an-odd-node-in-galera-cluster.html (search for the keyword "relay").
And the reality on the garbd node itself:
root@mysql-galera-db-7:/var/log# cat garbd.log |grep -i relay
2025-10-23 17:02:33.870 INFO: (6325ebf1-9e99, 'tcp://10.5.2.133:4567') turning message relay requesting off
2025-10-23 17:02:44.767 INFO: (6325ebf1-9e99, 'tcp://10.5.2.133:4567') turning message relay requesting on, nonlive peers: tcp://10.5.0.80:4567
2025-10-23 17:02:47.871 INFO: (6325ebf1-9e99, 'tcp://10.5.2.133:4567') turning message relay requesting off
2025-10-23 17:03:17.872 INFO: (6325ebf1-9e99, 'tcp://10.5.2.133:4567') turning message relay requesting off
2025-10-23 17:03:34.873 INFO: (6325ebf1-9e99, 'tcp://10.5.2.133:4567') turning message relay requesting off
2025-10-23 17:32:49.280 INFO: (9d377f5a-9e1b, 'tcp://10.5.2.133:4567') turning message relay requesting off
2025-10-23 17:33:50.779 INFO: (c1df7af4-ac85, 'tcp://10.5.2.133:4567') turning message relay requesting off
2025-10-23 17:34:44.237 INFO: (e1bc7bfa-8f5d, 'tcp://10.5.2.133:4567') turning message relay requesting off
2025-10-23 17:43:04.498 INFO: (0bea5770-a3e5, 'tcp://10.5.2.133:4567') turning message relay requesting off
2025-10-23 17:44:23.824 INFO: (0bea5770-a3e5, 'tcp://10.5.2.133:4567') turning message relay requesting on, nonlive peers: tcp://10.5.1.5:4567
2025-10-23 17:44:24.816 INFO: (0bea5770-a3e5, 'tcp://10.5.2.133:4567') turning message relay requesting off
2025-10-23 17:44:33.148 INFO: (0bea5770-a3e5, 'tcp://10.5.2.133:4567') turning message relay requesting off
So in this case, even with the network blocked between segment 0 and segment 1, the cluster keeps synchronizing thanks to garbd acting as a relay. Now stop garbd while keeping the block in place, and the result looks like this:
root@mysql-galera-db-7:~# systemctl stop garb
root@mysql-galera-db-1:~# mysql -e "SHOW STATUS LIKE 'wsrep_cluster_%';"
+----------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------+--------------------------------------+
| wsrep_cluster_weight | 0 |
| wsrep_cluster_capabilities | |
| wsrep_cluster_conf_id | 18446744073709551615 |
| wsrep_cluster_size | 3 |
| wsrep_cluster_state_uuid | 0e38a58c-aff7-11f0-b1f0-6ee869c6a893 |
| wsrep_cluster_status | non-Primary |
+----------------------------+--------------------------------------+
root@mysql-galera-db-6:~# mysql -e "SHOW STATUS LIKE 'wsrep_cluster_%';"
+----------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------+--------------------------------------+
| wsrep_cluster_weight | 0 |
| wsrep_cluster_capabilities | |
| wsrep_cluster_conf_id | 18446744073709551615 |
| wsrep_cluster_size | 3 |
| wsrep_cluster_state_uuid | 0e38a58c-aff7-11f0-b1f0-6ee869c6a893 |
| wsrep_cluster_status | non-Primary |
+----------------------------+--------------------------------------+
What's next?
Start garbd again and see whether the cluster recovers once traffic can flow through the garbd node:
root@mysql-galera-db-7:~# systemctl start garb
root@mysql-galera-db-1:~# mysql -e "SHOW STATUS LIKE 'wsrep_cluster_%';"
+----------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------+--------------------------------------+
| wsrep_cluster_weight | 0 |
| wsrep_cluster_capabilities | |
| wsrep_cluster_conf_id | 18446744073709551615 |
| wsrep_cluster_size | 7 |
| wsrep_cluster_state_uuid | 0e38a58c-aff7-11f0-b1f0-6ee869c6a893 |
| wsrep_cluster_status | non-Primary |
+----------------------------+--------------------------------------+
Even though all 7 nodes are visible again and the iptables rules have been flushed, the cluster has already lost quorum and stays non-Primary. After trying to restart, no node would come up; they all reported errors like this:
2025-10-23T10:36:43.663708Z 0 [System] [MY-000000] [WSREP] P: (1cffc6a5-9efc, 'tcp://0.0.0.0:4567') turning message relay requesting off
2025-10-23T10:36:51.164207Z 0 [ERROR] [MY-000000] [WSREP] P: failed to open gcomm backend connection: 110: failed to reach primary view
at ./gcomm/src/pc.cpp:connect():160
2025-10-23T10:36:51.164260Z 0 [ERROR] [MY-000000] [WSREP] P: ./gcs/src/gcs_core.cpp:gcs_core_open():256: Failed to open backend connection: -110 (Connection timed out)
2025-10-23T10:36:52.164756Z 0 [ERROR] [MY-000000] [WSREP] P: ./gcs/src/gcs.cpp:gcs_open():1701: Failed to open channel 'fpt_id' at 'gcomm://10.5.1.5,10.5.3.7,10.5.1.228': -110 (Connection timed out)
2025-10-23T10:36:52.164796Z 0 [ERROR] [MY-000000] [WSREP] P: gcs connect failed: Operation timed out
2025-10-23T10:36:52.164803Z 0 [ERROR] [MY-000000] [WSREP] wsrep::connect(gcomm://10.5.1.5,10.5.3.7,10.5.1.228) failed: 7
2025-10-23T10:36:52.164809Z 0 [ERROR] [MY-010119] [Server] Aborting
Time to recover manually. In this example node 1 is the node with the most complete data (highest seqno).
Edit /etc/mysql/conf.d/galera.conf and comment out the node list in this line so the node can bootstrap on its own: wsrep_cluster_address="gcomm://node1,node2,node3"
Then set safe_to_bootstrap: 1 in grastate.dat (as I understand it, this has to be changed manually):
root@mysql-galera-db-1:~# cat /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid: 0e38a58c-aff7-11f0-b1f0-6ee869c6a893
seqno: 1234
safe_to_bootstrap: 1
Then start it and check the databases:
root@mysql-galera-db-1:~# mysql -e "show databases"
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| sys |
| test1 |
| test2 |
| test3 |
| test4 |
+--------------------+
Proceed to start the other nodes plus garbd. Once everything is up, remember to revert the config change on the bootstrap node - leaving it in bootstrap mode is an easy way to get into trouble later. You don't strictly need to restart node 1 right away, but restarting it to confirm the reverted config works is fine.
root@mysql-galera-db-6:~# systemctl start mysql
sleep 10
mysql -e "SHOW STATUS LIKE 'wsrep_cluster_%';"
+----------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------+--------------------------------------+
| wsrep_cluster_weight | 7 |
| wsrep_cluster_capabilities | |
| wsrep_cluster_conf_id | 6 |
| wsrep_cluster_size | 7 |
| wsrep_cluster_state_uuid | 0e38a58c-aff7-11f0-b1f0-6ee869c6a893 |
| wsrep_cluster_status | Primary |
+----------------------------+--------------------------------------+
# Node 1 after reverting the MySQL config and restarting!
root@mysql-galera-db-1:~# mysql -e "SHOW STATUS LIKE 'wsrep_cluster_%';"
+----------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------+--------------------------------------+
| wsrep_cluster_weight | 7 |
| wsrep_cluster_capabilities | |
| wsrep_cluster_conf_id | 8 |
| wsrep_cluster_size | 7 |
| wsrep_cluster_state_uuid | 0e38a58c-aff7-11f0-b1f0-6ee869c6a893 |
| wsrep_cluster_status | Primary |
+----------------------------+--------------------------------------+
In conclusion: if the connection to garbd is lost while the network between the two segments is blocked, recovery has to be done manually. If only the inter-segment network is blocked and garbd is still alive, the cluster keeps working because the data traffic is relayed through the garbd node.
That leaves the question: how do you know which node to bootstrap from during recovery?
- Documentation: https://mariadb.com/docs/galera-cluster/high-availability/recovering-a-primary-component#manual-bootstrap-using-grastate.dat
- The grastate.dat file is the key to determining which node has the latest data; bootstrap the cluster from that node.
The grastate.dat file contains metadata about the node state:
# GALERA saved state
version: 2.1
uuid: 5ee99582-bb8d-11e3-b6c1-3c970e3bf0cb
seqno: -1
safe_to_bootstrap: 1
seqno (sequence number):
- The last transaction number this node committed
- The node with the highest seqno has the latest data
- seqno: -1 means the node crashed or was not shut down gracefully
safe_to_bootstrap:
- 1 = safe to bootstrap from this node
- 0 = should not bootstrap from this node
- The last node to shut down usually has the value 1
(A small helper to compare these values across all nodes is sketched right after this list.)
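Here is that helper - a quick sketch assuming root SSH access to the lab hostnames; it just dumps the relevant grastate.dat fields from every data node so you can pick the highest seqno:
# Print uuid/seqno/safe_to_bootstrap from each data node; bootstrap from the highest seqno
for h in mysql-galera-db-{1..6}; do
  echo "== $h =="
  ssh "root@$h" 'grep -E "uuid|seqno|safe_to_bootstrap" /var/lib/mysql/grastate.dat'
done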
Correct recovery procedure
Reference links:
- https://galeracluster.com/documentation/html_docs_remove-wsrep-new-cluster/documentation/crash-recovery.html
- https://mariadb.com/docs/galera-cluster/high-availability/recovering-a-primary-component
# 1. Check grastate.dat on ALL nodes
cat /var/lib/mysql/grastate.dat
# 2. Select the node with the highest seqno
# Example:
# Node A: seqno: 1850
# Node B: seqno: 1848
# Node C: seqno: -1
# => Select Node A
# 3. If every node shows seqno: -1, recover the position with:
mysqld --wsrep-recover
# Output will show:
# Recovered position: 5ee99582-bb8d-11e3-b6c1-3c970e3bf0cb:1850
# 4. Bootstrap from the node with the latest data
# Method 1: Set safe_to_bootstrap = 1 in grastate.dat
vi /var/lib/mysql/grastate.dat
# Change safe_to_bootstrap: 0 to safe_to_bootstrap: 1
# Method 2: Force bootstrap
# This applies to MySQL Galera (Codership builds); the guides above mostly document the MariaDB equivalent (galera_new_cluster)
mysqld_bootstrap
# 5. Start the remaining nodes normally
systemctl start mysql
Lab 3: Simulate Unstable Network with TC (Packet Loss, Latency, Reorder)
Scenario: simulate a bad network by dropping packets and adding latency. There are many tools, but the two most prominent are:
- tc (Traffic Control): built into Linux - available everywhere, powerful, nothing extra to install
- Comcast: found on GitHub with ~10.5k stars, seems stable. https://github.com/tylertreat/comcast
To keep it simple I will use tc - whichever is easier wins.
TL;DR: the output below is long, so here is the summary for the impatient:
| Network Condition | Packet Loss | Latency | Jitter | Result |
|-------------------|-------------|---------|---------|--------------|
| **Good** | ≤ 5% | ~50ms | ±10ms | OK |
| **Bad** | ≥ 30% | ~100ms | ±50ms | Cluster goes non-Primary |
| **Common** | ≤ 5% | ~100ms | ±50ms | OK |
The test writes continuously to both segments against the same database. (Keycloak, our use case, uses UUIDv4 as its IDs, so multi-writer mode is possible without fear of conflicts - hence we test the same pattern here.)
# Create the DB on any node.
root@mysql-galera-db-1:~# mysql -e "CREATE DATABASE test_failover;"
root@mysql-galera-db-1:~# mysql -e "USE test_failover; CREATE TABLE users (id INT PRIMARY KEY AUTO_INCREMENT, name VARCHAR(50), created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP);"
# Check on node 4 (segment 1)
root@mysql-galera-db-4:~# mysql -e "show databases"|grep test_failover
test_failover
Basic tc commands:
### 1. Packet loss (flaky network) ###
# Add 10% packet loss on interface eth0
tc qdisc add dev eth0 root netem loss 10%
# 10% packet loss with 20% correlation (losses tend to come in bursts)
tc qdisc add dev eth0 root netem loss 10% 20%
# Delete rule
tc qdisc del dev eth0 root
### 2. Network delay (latency) ###
# Add 100ms delay
tc qdisc add dev eth0 root netem delay 100ms
# Delay 100ms ± 10ms (jitter)
tc qdisc add dev eth0 root netem delay 100ms 10ms
# Delay with distribution
tc qdisc add dev eth0 root netem delay 100ms 20ms distribution normal
### 3.Bandwidth limit ###
# Bandwidth limit 1Mbit
tc qdisc add dev eth0 root tbf rate 1mbit burst 32kbit latency 400ms
### 4. Combine multiple conditions (most realistic) ###
# Packet loss + delay + jitter (like real network)
tc qdisc add dev eth0 root netem loss 5% delay 50ms 10ms
### 5. After testing, remember to delete the rules ###
tc qdisc del dev eth0 root
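One caveat: netem attached at root shapes every packet leaving the interface, including SSH and intra-segment traffic. If you only want to degrade the cross-segment path, netem can be hung off a prio qdisc and fed by per-destination filters - a sketch for a segment-1 node, using the segment-0 IPs from this lab:
# Default prio qdisc keeps most traffic in bands 1:1/1:2; netem sits on band 1:3
# and only the filtered destinations are steered into it.
tc qdisc add dev eth0 root handle 1: prio
tc qdisc add dev eth0 parent 1:3 handle 30: netem loss 5% delay 50ms 10ms
for ip in 10.5.1.5 10.5.3.7 10.5.1.228; do
  tc filter add dev eth0 protocol ip parent 1:0 prio 3 u32 match ip dst "$ip"/32 flowid 1:3
done
# Cleanup when done:
# tc qdisc del dev eth0 root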
We will use the combined command so the test looks like a real network. The test is driven from node 1 (segment 0) and node 4 (segment 1).
Explanation of tc qdisc add dev eth0 root netem loss 5% delay 50ms 10ms:
- loss 5%: 5% of packets are dropped
- delay 50ms: every packet is delayed by about 50ms
- jitter 10ms: the latency fluctuates, so the actual delay lands roughly between 40ms and 60ms
Like this:
Packet 1: delay = 50ms + random(-10ms to +10ms) = 45ms
Packet 2: delay = 50ms + random(-10ms to +10ms) = 58ms
Packet 3: delay = 50ms + random(-10ms to +10ms) = 42ms
Packet 4: delay = 50ms + random(-10ms to +10ms) = 60ms
Packet 5: DROPPED (5% packet loss)
# Insert loop - run from node 1 and node 4 (one row per second into test_failover.users)
while true; do
  mysql -e "USE test_failover; INSERT INTO users (name) VALUES ('User $(date +%s)');"
  sleep 1
done
### Count at node 2 and node 5 ###
root@mysql-galera-db-2:~# mysql -e "use test_failover; select count(*) from users;"
+----------+
| count(*) |
+----------+
| 116 |
+----------+
root@mysql-galera-db-5:~# mysql -e "use test_failover; select count(*) from users;"
+----------+
| count(*) |
+----------+
| 132 |
+----------+
### Now for the scenario where all nodes in segment 1 have network problems (loss 5%, delay 50ms, jitter 10ms) ###
# Apply on nodes 4, 5, 6 and verify the qdisc is in place
root@mysql-galera-db-4:~# tc qdisc add dev eth0 root netem loss 5% delay 50ms 10ms
root@mysql-galera-db-4:~# tc qdisc show dev eth0
qdisc netem 8001: root refcnt 5 limit 1000 delay 50ms 10ms loss 5%
root@mysql-galera-db-5:~# tc qdisc add dev eth0 root netem loss 5% delay 50ms 10ms
root@mysql-galera-db-5:~# tc qdisc show dev eth0
qdisc netem 8001: root refcnt 5 limit 1000 delay 50ms 10ms loss 5%
root@mysql-galera-db-6:~# tc qdisc add dev eth0 root netem loss 5% delay 50ms 10ms
root@mysql-galera-db-6:~# tc qdisc show dev eth0
qdisc netem 8001: root refcnt 5 limit 1000 delay 50ms 10ms loss 5%
Demo ping from segment 0 → segment 1 with latency 100ms + 50ms jitter applied:
root@mysql-galera-db-3:~# ping mysql-galera-db-4
PING mysql-galera-db-4 (10.5.0.80) 56(84) bytes of data.
64 bytes from mysql-galera-db-4 (10.5.0.80): icmp_seq=1 ttl=64 time=151 ms
64 bytes from mysql-galera-db-4 (10.5.0.80): icmp_seq=2 ttl=64 time=146 ms
64 bytes from mysql-galera-db-4 (10.5.0.80): icmp_seq=3 ttl=64 time=117 ms
64 bytes from mysql-galera-db-4 (10.5.0.80): icmp_seq=4 ttl=64 time=99.6 ms
64 bytes from mysql-galera-db-4 (10.5.0.80): icmp_seq=5 ttl=64 time=110 ms
64 bytes from mysql-galera-db-4 (10.5.0.80): icmp_seq=6 ttl=64 time=91.5 ms
Results after a short while: with 5% packet loss, no problem - at least that's what the row counts say. These are the counts after stopping the insert loops on both ends:
root@mysql-galera-db-2:~# mysql -e "use test_failover; select count(*) from users;"
+----------+
| count(*) |
+----------+
| 1156 |
+----------+
root@mysql-galera-db-5:~# mysql -e "use test_failover; select count(*) from users;"
+----------+
| count(*) |
+----------+
| 1156 |
+----------+
So 5% packet loss, 50ms latency, and 10ms jitter is not a problem. Let's crank it up and see:
- Packet loss: 30%
- Latency: 100ms
- Jitter: 50ms
# Again applied on the segment-1 nodes (4, 5, 6)
root@mysql-galera-db-4:~# tc qdisc del dev eth0 root
root@mysql-galera-db-4:~# tc qdisc add dev eth0 root netem loss 30% delay 100ms 50ms
root@mysql-galera-db-4:~# tc qdisc show dev eth0
qdisc netem 8002: root refcnt 5 limit 1000 delay 100ms 50ms loss 30%
.............. (same rule applied on nodes 5 and 6)
# Hmm, almost as soon as the write load started, the cluster turned non-Primary:
root@mysql-galera-db-1:~# mysql -e "SHOW STATUS LIKE 'wsrep_cluster_%';"
+----------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------+--------------------------------------+
| wsrep_cluster_weight | 0 |
| wsrep_cluster_capabilities | |
| wsrep_cluster_conf_id | 18446744073709551615 |
| wsrep_cluster_size | 1 |
| wsrep_cluster_state_uuid | 0e38a58c-aff7-11f0-b1f0-6ee869c6a893 |
| wsrep_cluster_status | non-Primary |
+----------------------------+--------------------------------------+
root@mysql-galera-db-4:~# mysql -e "SHOW STATUS LIKE 'wsrep_cluster_%';"
+----------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------+--------------------------------------+
| wsrep_cluster_weight | 0 |
| wsrep_cluster_capabilities | |
| wsrep_cluster_conf_id | 18446744073709551615 |
| wsrep_cluster_size | 1 |
| wsrep_cluster_state_uuid | 0e38a58c-aff7-11f0-b1f0-6ee869c6a893 |
| wsrep_cluster_status | non-Primary |
+----------------------------+--------------------------------------+
root@mysql-galera-db-5:~# mysql -e "SHOW STATUS LIKE 'wsrep_cluster_%';"
+----------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------+--------------------------------------+
| wsrep_cluster_weight | 0 |
| wsrep_cluster_capabilities | |
| wsrep_cluster_conf_id | 18446744073709551615 |
| wsrep_cluster_size | 1 |
| wsrep_cluster_state_uuid | 0e38a58c-aff7-11f0-b1f0-6ee869c6a893 |
| wsrep_cluster_status | non-Primary |
+----------------------------+--------------------------------------+
root@mysql-galera-db-6:~# mysql -e "SHOW STATUS LIKE 'wsrep_cluster_%';"
+----------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------+--------------------------------------+
| wsrep_cluster_weight | 0 |
| wsrep_cluster_capabilities | |
| wsrep_cluster_conf_id | 18446744073709551615 |
| wsrep_cluster_size | 1 |
| wsrep_cluster_state_uuid | 0e38a58c-aff7-11f0-b1f0-6ee869c6a893 |
| wsrep_cluster_status | non-Primary |
+----------------------------+--------------------------------------+
# After deleting the 30%-loss rule (tc qdisc del dev eth0 root on nodes 4, 5, 6), the cluster recovered on its own
root@mysql-galera-db-1:~# mysql -e "SHOW STATUS LIKE 'wsrep_cluster_%';"
+----------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------+--------------------------------------+
| wsrep_cluster_weight | 7 |
| wsrep_cluster_capabilities | |
| wsrep_cluster_conf_id | 9 |
| wsrep_cluster_size | 7 |
| wsrep_cluster_state_uuid | 0e38a58c-aff7-11f0-b1f0-6ee869c6a893 |
| wsrep_cluster_status | Primary |
+----------------------------+--------------------------------------+
root@mysql-galera-db-4:~# mysql -e "SHOW STATUS LIKE 'wsrep_cluster_%';"
+----------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------+--------------------------------------+
| wsrep_cluster_weight | 7 |
| wsrep_cluster_capabilities | |
| wsrep_cluster_conf_id | 9 |
| wsrep_cluster_size | 7 |
| wsrep_cluster_state_uuid | 0e38a58c-aff7-11f0-b1f0-6ee869c6a893 |
| wsrep_cluster_status | Primary |
+----------------------------+--------------------------------------+
root@mysql-galera-db-5:~# mysql -e "SHOW STATUS LIKE 'wsrep_cluster_%';"
+----------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------+--------------------------------------+
| wsrep_cluster_weight | 7 |
| wsrep_cluster_capabilities | |
| wsrep_cluster_conf_id | 9 |
| wsrep_cluster_size | 7 |
| wsrep_cluster_state_uuid | 0e38a58c-aff7-11f0-b1f0-6ee869c6a893 |
| wsrep_cluster_status | Primary |
+----------------------------+--------------------------------------+
root@mysql-galera-db-6:~# mysql -e "SHOW STATUS LIKE 'wsrep_cluster_%';"
+----------------------------+--------------------------------------+
| Variable_name | Value |
+----------------------------+--------------------------------------+
| wsrep_cluster_weight | 7 |
| wsrep_cluster_capabilities | |
| wsrep_cluster_conf_id | 9 |
| wsrep_cluster_size | 7 |
| wsrep_cluster_state_uuid | 0e38a58c-aff7-11f0-b1f0-6ee869c6a893 |
| wsrep_cluster_status | Primary |
+----------------------------+--------------------------------------+
In conclusion: with 5% packet loss and 40-60ms latency the cluster works fine; with 30% packet loss the cluster cooks itself.
One extra note on test case 3: with 30% loss, why don't the segments split-brain instead of everything going non-Primary? Because Galera needs consensus on membership before quorum even enters the picture. Here is the discussion: https://groups.google.com/g/codership-team/c/VWE3YE5dZgE/m/sgeJEU5EuJwJ
Well, before you even start thinking about *quorum*, you need to come to
a *consensus* about membership. And you know what *consensus* means.
*Everybody* must agree. Now suppose you have nodes A, B, C, D, E. And
for the sake of the argument you have such (50%) packet loss on E, so
that at a given round it sees only B and D. The overall picture is:
A sees A, B, C, D
B sees A, B, C, D, E
C sees A, B, C, D
D sees A, B, C, D, E
E sees B, D, E
So not only all 5 of them can't agree on the membership, even 4 "well
connected" nodes can't. And more than half of the nodes see E. Now it
still may seem quite simple to you, since you know that the packet loss
is on E. But how do the nodes know? Maybe the packet loss is actually on
A and C?
So by default Galera tries 3 times and gives up (instead of trying
indefinitely which would have an appearance of hanging).
Had the packet loss been lower, 1 of three attempts would have resulted
in all of A, B, C, D seeing E and reaching consensus on 5-node
membership. Had the packet loss been higher, 1 of the attempts would
have resulted in all of A, B, C, D not seeing E and reaching consensus
on 4-node membership.
Only then you can get to "quorum" calculation.
The Galera documentation mentions this problem as well: https://galeracluster.com/library/galera-documentation.pdf
If no messages were received from the node for a period greater than the evs.inactive_timeout (page 282) period, the
node is declared failed regardless of the consensus. The failed node remains non-operational until all members agree
on its membership. If the members cannot reach consensus on the liveness of a node, the network is too unstable for
cluster operations.
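If a cluster has to live with a link like this, one knob worth knowing is the pair of EVS liveness timeouts in wsrep_provider_options: raising them makes the cluster wait longer before evicting a silent node, at the price of slower failure detection. A sketch of the syntax only - the values below are illustrative, not something tuned or tested in this lab:
# galera.conf excerpt - illustrative values; keep any options you already set
# (for example gmcast.segment) on the same wsrep_provider_options line
wsrep_provider_options="evs.suspect_timeout=PT30S; evs.inactive_timeout=PT1M"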
Conclusion
Through these three comprehensive lab scenarios, we've explored the fault tolerance and self-healing capabilities of MySQL Galera Cluster under various network conditions. Here are the key takeaways:
Lab 1: Segment Isolation & Data Synchronization
Key Findings:
- Graceful shutdown: Cluster handles segment isolation cleanly. When the 3 nodes of segment 1 shut down gracefully, the remaining 4 nodes (segment 0 + garbd) maintain quorum and continue operations. Upon rejoin, data synchronizes seamlessly using IST/SST.
- Ungraceful shutdown (kill -9): Leads to split-brain suspicion. All segments enter non-Primary state to prevent conflicting writes. Recovery requires manual intervention via bootstrap procedure.
- Recovery procedure: Critical to identify the node with highest seqno in grastate.dat for safe bootstrap. The mysqld --wsrep-recover command is essential when all nodes show seqno: -1.
Lab 2: Network Partition with Garbd Relay
Key Findings:
- Garbd as relay: When segment 0 and segment 1 cannot communicate directly (via iptables blocks), Galera Arbitrator acts as a message relay, maintaining cluster integrity and enabling data synchronization between isolated segments.
- Garbd failure impact: If garbd goes down during network partition, both segments lose quorum and enter non-Primary state, requiring manual recovery.
- Network validation: Always verify connectivity with nc and ping when troubleshooting. Tcpdump analysis confirms garbd's relay behavior - receiving data from all nodes and broadcasting to others.
Lab 3: Network Instability Simulation
Key Findings:
- Good network conditions (≤5% packet loss, ~50ms latency, ±10ms jitter): Cluster operates normally with successful data replication across segments.
- Poor network conditions (≥30% packet loss, ~100ms latency, ±50ms jitter): Cluster fails to reach consensus on membership, causing all nodes to enter non-Primary state.
- Consensus before quorum: The critical insight is that Galera requires unanimous consensus on cluster membership before calculating quorum. With high packet loss, nodes see different membership views, preventing consensus - even if majority would support quorum.
Best Practices Learned
- Monitoring: Track wsrep_cluster_conf_id - rapid increases indicate cluster instability (nodes joining/leaving frequently).
- Recovery preparation: Document which node has the latest data (grastate.dat with the highest seqno) before maintenance.
- Garbd placement: Deploy Galera Arbitrator in a network position that can reach all segments to enable relay functionality.
- Network requirements: Maintain packet loss <5% and latency <100ms for stable operations. Higher packet loss leads to consensus failure.
- Split-brain prevention: Galera's conservative approach (entering non-Primary on suspicion) prevents data conflicts but requires manual recovery.
Final Thoughts
MySQL Galera Cluster demonstrates robust synchronous replication with strong consistency guarantees. However, it's sensitive to network quality - requiring stable, low-latency connections. The arbitrator (garbd) is not just a "voting member" but can actively relay messages during network partitions, making its placement crucial for multi-segment deployments.
Understanding the difference between consensus (all nodes agree on membership) and quorum (majority calculation) is fundamental to troubleshooting Galera clusters. Always ensure your network infrastructure meets Galera's requirements: <5% packet loss, <100ms latency, and reliable connectivity between all nodes.
For production deployments, combine these insights with proper monitoring, documented recovery procedures, and regular disaster recovery drills to ensure your Galera cluster can handle real-world network challenges.