
Configuring MariaDB Galera Cluster for Synchronous Replication Across WAN Links with High Latency

Published by The adllm Team. Tags: MariaDB Galera Cluster WAN High Latency Synchronous Replication Database Geo-Replication High Availability Database Tuning

MariaDB Galera Cluster offers a robust solution for synchronous multi-master replication, providing high availability and data consistency across database nodes. However, deploying Galera Cluster across Wide Area Networks (WANs) with inherent high latency presents unique challenges. Because a transaction's write-set must be replicated to every node before the commit returns success, network delays directly lengthen commit times and can significantly impact overall application performance.

This article provides a comprehensive guide for experienced engineers on configuring and tuning MariaDB Galera Cluster for optimal performance and stability in high-latency WAN environments. We will delve into critical configuration parameters, best practices, and diagnostic techniques to help you build resilient geo-distributed database architectures.

Understanding the Impact of WAN Latency on Galera

The cornerstone of Galera’s consistency model is its virtually synchronous replication. When a transaction commits on one node, its write-set is broadcast to all other nodes in the cluster, and the commit completes only once the write-set has been delivered and certified cluster-wide; applying then proceeds asynchronously on the other nodes. Consequently, the network Round Trip Time (RTT) to the most distant node adds directly to commit latency: with 100 ms of RTT between the farthest data centers, every write transaction pays roughly an extra 100 ms at commit.

Several key Galera concepts are particularly relevant in WAN scenarios:

  • Flow Control: Prevents nodes from becoming overwhelmed by replication traffic. If a node’s receive queue (wsrep_local_recv_queue) grows too large, it signals other nodes to slow down or pause sending write-sets. Over WAN, this mechanism needs careful tuning to avoid premature or excessive throttling.
  • Quorum: Ensures cluster integrity by requiring a majority of nodes (or weighted nodes) to be operational to process writes, preventing split-brain scenarios. WAN link instability can affect quorum calculations.
  • State Snapshot Transfer (SST): A full data copy used when a node joins or rejoins the cluster and cannot catch up via Incremental State Transfer (IST). SSTs over WAN can be extremely time-consuming and resource-intensive. More details on SST can be found in the MariaDB documentation on State Snapshot Transfers.
  • Incremental State Transfer (IST): Allows a node to receive only the missing write-sets from a donor’s gcache (Galera cache). A sufficiently sized gcache is vital to favor IST over SST, especially in WAN setups where node downtimes or network partitions can last longer; the log check after this list shows how to confirm which transfer type a rejoining node used.
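When a node rejoins after an outage, it is worth confirming whether it caught up via IST or fell back to a full SST. A minimal check, assuming the error log lives at the default Debian/Ubuntu path (adjust for your distribution):

# On the rejoining node, inspect recent state-transfer decisions in the error log
grep -iE 'IST|SST|state transfer' /var/log/mysql/error.log | tail -n 20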

Pre-Configuration Essentials: Measuring and Planning

Before deploying or tuning a WAN Galera cluster, thorough planning and measurement are crucial.

Measure Inter-Node RTT

Accurately measure the RTT between all nodes that will participate in the cluster, especially those in different data centers. This is the single most important metric for tuning various timeout parameters.

Use the ping utility for basic RTT measurement. For instance, to measure RTT from a node in DC1 to a node in DC2:

# On a node in DC1, pinging a node (e.g., node2.dc2.example.com) in DC2
ping -c 10 node2.dc2.example.com

Repeat this for all pairs of nodes across different locations. Document the average and maximum RTT values.

Network Bandwidth and Quality

While latency primarily affects commit times, sufficient and stable bandwidth is necessary for replicating write-sets. Packet loss and jitter on WAN links can also severely degrade performance and stability. Verify that your WAN links can sustain the anticipated replication load before going to production.
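To complement the RTT numbers, measure throughput and packet loss on each inter-DC path. A quick sketch using iperf3 and mtr (both assumed to be installed; hostnames are placeholders):

# On a node in DC2, start an iperf3 server
iperf3 -s

# On a node in DC1, measure throughput towards DC2 for 30 seconds
iperf3 -c node2.dc2.example.com -t 30

# Report per-hop packet loss and jitter over 100 probes
mtr --report --report-cycles 100 node2.dc2.example.com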

Cluster Topology and Quorum

For WAN deployments, an odd number of segments (data centers) is generally recommended for simpler quorum management. If you have an even number of data centers (e.g., two), deploying a Galera Arbitrator (garbd) in a third, independent location is essential for maintaining quorum during a link failure between the two main sites.
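If a third site is genuinely unavailable, Galera’s pc.weight parameter offers a less robust alternative: giving one data center more total voting weight lets it keep quorum when the inter-DC link fails, at the cost of the lighter site never being able to continue on its own. A sketch for a two-DC layout (pc.weight defaults to 1):

# Nodes in the primary data center (DC1) carry extra quorum weight
wsrep_provider_options="pc.weight=2;gmcast.segment=1;..."

# Nodes in the secondary data center (DC2) keep the default weight
wsrep_provider_options="pc.weight=1;gmcast.segment=2;..."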

Core Galera Configuration for WAN Deployments

Galera’s behavior is primarily controlled via the wsrep_provider_options string in your MariaDB configuration file (e.g., my.cnf or a file in my.cnf.d/). You can find a comprehensive list of these options in the MariaDB Knowledge Base on Galera Cluster System Variables and the Codership Galera Parameters documentation.

Segments (gmcast.segment): The Cornerstone of WAN Optimization

In Galera Cluster versions 3.x and later (MariaDB 10.1+), segments are critical for WAN deployments. A segment typically corresponds to a single data center or network location. Nodes within the same segment communicate freely. For inter-segment communication, write-sets are relayed only once between designated nodes in each segment, drastically reducing cross-WAN traffic compared to an all-to-all mesh.

Give each data center’s segment a distinct integer ID (0 is the default, so non-zero IDs are conventionally used for the data-bearing segments). Lightweight voting members that hold no data, such as the Galera Arbitrator, are commonly left in segment 0 or given a segment of their own so they are never chosen as relays or SST/IST donors.

Your wsrep_provider_options string in my.cnf for nodes in different data centers would include specific gmcast.segment values:

# In my.cnf for a node in Data Center 1
[galera]
wsrep_provider_options="gmcast.segment=1;..." # Other options follow

# In my.cnf for a node in Data Center 2
[galera]
wsrep_provider_options="gmcast.segment=2;..." # Other options follow

Ensure all nodes list at least one contact point from each active segment in wsrep_cluster_address for robust discovery.
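For example, a four-node cluster spanning two data centers might use an address list like this on every node (hostnames are placeholders):

# In my.cnf on every node: contact points from both segments
wsrep_cluster_address="gcomm://node1.dc1.example.com,node2.dc1.example.com,node1.dc2.example.com,node2.dc2.example.com"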

A more illustrative wsrep_provider_options string for a node in Segment 1 might contain parameters like these (conceptually this is one long string in my.cnf):

# Example parameters within wsrep_provider_options for a node in DC1
# "gmcast.listen_addr=tcp://0.0.0.0:4567;gmcast.segment=1; \
#  evs.send_window=512;evs.user_send_window=512; \
#  evs.inactive_check_period=PT15S;evs.suspect_timeout=PT30S; \
#  evs.inactive_timeout=PT1M;evs.consensus_timeout=PT1M20S; \
#  gcache.size=2G"
# Actual my.cnf entry is one line:
# wsrep_provider_options="gmcast.listen_addr=tcp://0.0.0.0:4567;gmcast.segment=1;evs.send_window=512;..."

The above comment block illustrates how multiple parameters form the single wsrep_provider_options string.

Tuning Network Timeouts (evs.* parameters)

Extended Virtual Synchrony (EVS) parameters manage group communication. Defaults are often too aggressive for high-latency WAN links. Key timeout parameters to adjust within wsrep_provider_options:

  • evs.inactive_check_period: How often to check for inactive connections. Default PT1S. Increase, e.g., PT15S or PT30S.
  • evs.suspect_timeout: Time after which a non-responsive node is suspected. Default PT5S. Increase based on max RTT, e.g., PT30S.
  • evs.inactive_timeout: Time after which a suspected node is declared inactive and dropped. Default PT15S. Must be greater than suspect_timeout, e.g., PT1M.
  • evs.consensus_timeout: Timeout for reaching consensus in membership changes. Default PT30S. Increase for WAN, e.g., PT1M or PT1M20S.
  • evs.install_timeout: Timeout for installing a new group view. Can be increased if view installations are slow over the WAN, e.g., PT1M30S.

Example settings for these timeouts within the wsrep_provider_options string:

# Part of the wsrep_provider_options string
# ...;evs.inactive_check_period=PT15S;evs.suspect_timeout=PT30S; \
# evs.inactive_timeout=PT1M;evs.consensus_timeout=PT1M20S;...

Always set these timeouts well above the maximum observed RTT between any two nodes, with enough headroom to ride out transient congestion or brief routing changes without dropping a healthy node.

Optimizing Send Windows (evs.send_window, evs.user_send_window)

These control the maximum number of data packets in flight. For high-latency links, larger send windows can improve throughput.

  • evs.send_window: Maximum number of data packets in flight for replication traffic. Default 4. For WANs, consider 256, 512, or 1024.
  • evs.user_send_window: The same limit for application-originated packets; keep it at or below evs.send_window. Default 2. Increase it alongside evs.send_window.

Include these in wsrep_provider_options:

# Part of the wsrep_provider_options string
# ...;evs.send_window=512;evs.user_send_window=256;...

Managing GCache for IST (gcache.size)

The Galera cache (gcache) stores recent write-sets for IST. A larger gcache.size extends the window during which a rejoining node can catch up via IST instead of a full SST, which is crucial over WAN. Set gcache.size in wsrep_provider_options:

# Part of the wsrep_provider_options string
# ...;gcache.size=4G;... # Example: 4 Gigabytes

Consider also setting gcache.recover=yes so the node attempts to recover its gcache on startup and can continue to serve IST after a clean restart.
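To pick a value for gcache.size, a common rule of thumb is to multiply the cluster’s write-set byte rate by the longest outage IST should be able to cover. A rough sketch, assuming the mariadb client can connect locally and that a 60-second sample is representative of normal load:

# Sample replication byte counters, wait 60 seconds, sample again,
# then scale the per-second rate to a 4-hour IST window.
B1=$(mariadb -N -e "SELECT CAST(SUM(VARIABLE_VALUE) AS UNSIGNED) FROM information_schema.GLOBAL_STATUS WHERE VARIABLE_NAME IN ('wsrep_replicated_bytes','wsrep_received_bytes')")
sleep 60
B2=$(mariadb -N -e "SELECT CAST(SUM(VARIABLE_VALUE) AS UNSIGNED) FROM information_schema.GLOBAL_STATUS WHERE VARIABLE_NAME IN ('wsrep_replicated_bytes','wsrep_received_bytes')")
echo "Suggested gcache.size in bytes: $(( (B2 - B1) / 60 * 4 * 3600 ))"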

Adjusting Flow Control (gcs.fc_limit, gcs.fc_factor, gcs.fc_master_slave)

Flow control settings might need to be less aggressive for WANs.

  • gcs.fc_limit: Max write-sets in receive queue before flow control. Default often 16. Increase for WAN (e.g., 100-200+).
  • gcs.fc_factor: Fraction of gcs.fc_limit below which replication resumes after a flow-control pause. Setting it to 1.0 keeps pauses as short as possible.
  • gcs.fc_master_slave: Set to YES when all writes go to a single node; Galera then stops scaling the effective gcs.fc_limit with cluster size.

Example adjustments in wsrep_provider_options:

# Part of the wsrep_provider_options string
# ...;gcs.fc_limit=200;gcs.fc_factor=1.0;...

Securing Replication Traffic (socket.ssl_*)

Replication traffic over WAN should be encrypted. Galera supports SSL/TLS. Refer to Encrypting Galera Cluster Traffic in MariaDB documentation or Codership’s guide on traffic encryption.

Include SSL options in wsrep_provider_options:

# Part of the wsrep_provider_options string
# ...;socket.ssl_key=/path/to/server-key.pem; \
# socket.ssl_cert=/path/to/server-cert.pem; \
# socket.ssl_ca=/path/to/ca-cert.pem; \
# socket.ssl_cipher=AES128-SHA;...

Ensure certificates and keys are properly generated and distributed.
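A minimal sketch for generating a self-signed CA and one node certificate with openssl (file names and subject names are placeholders; repeat the node steps for each member and distribute ca-cert.pem to all of them):

# Create a CA key and self-signed CA certificate
openssl genrsa -out ca-key.pem 4096
openssl req -new -x509 -days 3650 -key ca-key.pem -out ca-cert.pem -subj "/CN=galera-ca"

# Create a key and a CA-signed certificate for one node
openssl genrsa -out server-key.pem 4096
openssl req -new -key server-key.pem -out server-req.pem -subj "/CN=node1.dc1.example.com"
openssl x509 -req -days 3650 -in server-req.pem -CA ca-cert.pem -CAkey ca-key.pem -set_serial 01 -out server-cert.pem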

The Galera Arbitrator (garbd) in WAN Scenarios

garbd is a lightweight daemon acting as a voting member, useful for quorum in even-numbered DC setups. See the MariaDB Galera Arbitrator documentation for details.

A basic garbd startup command:

# Example command to start garbd
garbd --address "gcomm://node1_dc1_ip:4567,node1_dc2_ip:4567" \
      --group "my_galera_cluster_name" \
      --options "gmcast.segment=0" \
      --log "/var/log/garbd.log"

Setting gmcast.segment=0 places garbd in its own segment; it still receives replication traffic and votes in quorum decisions, but it stores no data and is never used as an SST/IST donor.

SST/IST Considerations over WAN

  • SST Method: mariabackup (see the MariaDB documentation for mariabackup) is generally preferred over rsync for WAN SSTs because the donor remains available for writes during the transfer, whereas rsync blocks the donor (see the my.cnf sketch after this list).
  • wsrep_sst_receive_address: Crucial if nodes are behind NAT.
  • Bandwidth Throttling: Some SST methods allow bandwidth limits (e.g., rsync --bwlimit).
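A minimal my.cnf sketch tying these together (the SST user and password are placeholders; the account needs the privileges mariabackup requires):

[galera]
wsrep_sst_method = mariabackup
wsrep_sst_auth = sst_user:sst_password
# Set explicitly when this node sits behind NAT so the donor can reach it
wsrep_sst_receive_address = 203.0.113.10:4444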

Monitoring and Diagnosing WAN Clusters

Regular monitoring using SHOW GLOBAL STATUS LIKE 'wsrep_%'; is vital. Consult the MariaDB Galera Cluster Status Variables documentation for variable meanings.

Key status variables to watch:

SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status'; -- Should be 'Primary'
SHOW GLOBAL STATUS LIKE 'wsrep_connected';      -- Should be 'ON'
SHOW GLOBAL STATUS LIKE 'wsrep_ready';          -- Should be 'ON'
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_size';   -- Expected number of nodes
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment'; -- Node state
SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue';    -- Replication queue size
SHOW GLOBAL STATUS LIKE 'wsrep_local_send_queue';    -- Send queue size
SHOW GLOBAL STATUS LIKE 'wsrep_flow_control_paused'; -- Fraction of time paused
SHOW GLOBAL STATUS LIKE 'wsrep_cert_deps_distance';  -- Parallelization info
SHOW GLOBAL STATUS LIKE 'wsrep_gcomm_uuid';          -- This node's group communication UUID
SHOW GLOBAL STATUS LIKE 'wsrep_incoming_addresses'; -- Addr of connected nodes

High wsrep_local_recv_queue or wsrep_flow_control_paused values indicate issues. Also monitor MariaDB error logs and use network diagnostic tools like ping, mtr, and traceroute.
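For a quick ad-hoc view during testing, the key counters can be polled in a loop (a sketch assuming the mariadb client can authenticate from this host):

# Refresh the most latency-sensitive counters every 5 seconds
watch -n 5 "mariadb -e \"SHOW GLOBAL STATUS WHERE Variable_name IN ('wsrep_cluster_size','wsrep_local_recv_queue','wsrep_local_send_queue','wsrep_flow_control_paused')\""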

Common Challenges and Pitfalls in WAN Galera Setups

  • Ignoring RTT: Leads to instability.
  • Not Using Segments: Causes poor performance.
  • Insufficient Bandwidth: Results in growing queues.
  • Too Small gcache.size: Causes slow SSTs.
  • Chatty Applications: Amplify latency impact. Batch operations.
  • Unstable WAN Links: Cause persistent problems.
  • Firewall Misconfigurations: Block essential Galera ports (by default 4567 TCP/UDP for group communication, 4568 TCP for IST, 4444 TCP for SST, plus 3306 TCP for client traffic); see the firewall example after this list.
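As an example, the required ports could be opened with firewalld as shown below (adapt to your firewall tooling and restrict source addresses to cluster members):

# Open MariaDB and Galera ports; tighten with zones or rich rules in production
firewall-cmd --permanent --add-port=3306/tcp --add-port=4567/tcp --add-port=4567/udp --add-port=4568/tcp --add-port=4444/tcp
firewall-cmd --reload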

Best Practices Summary for WAN Galera Deployments

  1. Measure RTT meticulously and configure timeouts accordingly.
  2. Always use gmcast.segment for multi-DC deployments.
  3. Tune evs.send_window and evs.user_send_window.
  4. Allocate sufficient gcache.size.
  5. Adjust flow control parameters to be less aggressive.
  6. Use an odd number of segments/sites or employ garbd for quorum.
  7. Encrypt all replication traffic over WAN.
  8. Monitor cluster health continuously.
  9. Design applications for local reads where possible.
  10. Test failover scenarios rigorously.

Alternative Architectures (Brief Mention)

For extremely high latencies where synchronous commit times are unacceptable:

  • Asynchronous Replication Between Galera Clusters: Each region runs its own Galera Cluster, linked by standard MariaDB asynchronous replication (a minimal sketch follows this list).
  • Database-as-a-Service (DBaaS) Geo-Replication: Cloud solutions like Amazon Aurora Global Database or Google Cloud Spanner offer alternatives.
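As a rough sketch of the first option, one node in each regional cluster acts as an asynchronous replication endpoint. The Galera node serving as the async master needs binary logging and log_slave_updates enabled; hostnames and credentials below are placeholders:

-- On a node of the secondary cluster, point async replication at the primary cluster
CHANGE MASTER TO
  MASTER_HOST='node1.dc1.example.com',
  MASTER_USER='repl_user',
  MASTER_PASSWORD='repl_password',
  MASTER_USE_GTID=slave_pos;
START SLAVE;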

Conclusion

Configuring MariaDB Galera Cluster for synchronous replication across high-latency WAN links is complex but achievable. It demands a deep understanding of Galera, meticulous planning, careful tuning, and continuous monitoring. By leveraging segments, appropriately adjusting parameters, and following best practices, you can build a resilient geo-distributed database system. Remember the inherent trade-off: strong consistency over WAN comes at the cost of write latency influenced by distance and network quality.