Troubleshooting inter-cluster replication issues

Srinivasan Vembakottai

Updated March 22, 2024 22:30

Question:

Troubleshooting inter-cluster replication issues

Answer:

1. Check that every cluster participating in replication has a unique cluster name.

2. Check that every “replicationjournal” data source on the Secondary clusters, including the failover data sources, points only to the CJ/Primary Cluster.

3. Run the below command on the CJ cluster.

(in Windows)

%RLI_HOME%\bin\monitoring.bat -d cloud-replication

(in Linux)

$RLI_HOME/bin/monitoring.sh -d cloud-replication

This fetches properties for HDAP stores under replication.

The above command performs the following operations:

The primary cluster sends Encrypt(RJ password + auth type + SSHA) to the remote HTTP servlet on the Secondary Cluster, taking the hostname from the "replica" in cn=replication,cn=config.
The Secondary Cluster HTTP servlet fetches its local Encrypt(RJ password + auth type + SSHA) and compares the two values.
If they match, the servlet responds to the HTTP request with the cn=Directory Manager user and password, fetched from vds_server.conf in ZK.
The primary cluster uses the DM user and password from the response to perform an LDAP bind and a base search on HDAP to fetch the "vdssynccursor".
NOTE: The HTTP port is 8089 or 8090, depending on whether SSL is checked in the VDSHA data source on the secondary cluster.
If there is any problem in this process, you will see this error: | errors | ["SYNC_CURSOR_CONNECT"] |
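
If you run this check regularly, the command from step 3 can be wrapped in a small script that flags the SYNC_CURSOR_CONNECT error. The following is only a sketch: the helper names (find_error_lines, run_monitoring) are hypothetical, and the exact layout of the monitoring output is not assumed, so the script only looks for the error string on each line.

```python
import os
import subprocess

# Error string quoted in this article; everything else here is illustrative.
ERROR_MARKER = "SYNC_CURSOR_CONNECT"


def find_error_lines(output, marker=ERROR_MARKER):
    """Return every line of the monitoring output that contains the marker."""
    return [line for line in output.splitlines() if marker in line]


def run_monitoring(rli_home, windows=False):
    """Run the cloud-replication monitoring command and return its output.

    rli_home is the RadiantOne install directory (the value of RLI_HOME).
    """
    script = "monitoring.bat" if windows else "monitoring.sh"
    cmd = [os.path.join(rli_home, "bin", script), "-d", "cloud-replication"]
    completed = subprocess.run(cmd, capture_output=True, text=True, check=False)
    # Scan both streams, since we do not assume where errors are printed.
    return completed.stdout + completed.stderr
```

For example, find_error_lines(run_monitoring(os.environ["RLI_HOME"])) returns an empty list when the error string does not appear in the output.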

4. For data sources configured over a secure port, make sure that the Settings>>Client Certificate Truststore on both clusters contains the other cluster's server certificates.
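
To confirm which certificate a remote cluster actually presents before comparing it with the truststore entries, a short script along these lines may help. This is a sketch using only the Python standard library. The function names are hypothetical, and the host and secure port must be supplied by you because this article does not state them.

```python
import hashlib
import ssl


def fetch_server_certificate(host, port):
    """Fetch the certificate presented by host:port, in PEM form.

    The certificate is retrieved without validation, so it can be
    inspected even when it is not trusted yet.
    """
    return ssl.get_server_certificate((host, port))


def pem_fingerprint(pem):
    """Return the SHA-256 fingerprint (hex) of a PEM-encoded certificate."""
    der = ssl.PEM_cert_to_DER_cert(pem)
    return hashlib.sha256(der).hexdigest()
```

Comparing pem_fingerprint(fetch_server_certificate(host, port)) with the fingerprint of the certificate imported on the other cluster shows whether the two match.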

5. Check that the 'vdsha' data source is configured with the right credentials and ports. Inconsistencies here might cause a red arrow with a cross to appear in the Replication Monitoring Tab.

6. Make sure there are NO firewalls blocking ports 8089, 8090, 9100, and 9101.
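
A quick way to test reachability of these ports from one cluster node to another is a plain TCP connection attempt. The sketch below assumes Python is available on the node; check_ports is a hypothetical helper name. Note that a failed connection can mean either a firewall block or a service that is not listening, so treat the result as a starting point.

```python
import socket

# Ports listed in step 6 of this article.
REPLICATION_PORTS = (8089, 8090, 9100, 9101)


def check_ports(host, ports=REPLICATION_PORTS, timeout=3.0):
    """Try a TCP connection to each port; return {port: True/False}."""
    results = {}
    for port in ports:
        try:
            # create_connection raises OSError on refusal, timeout, or DNS failure.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results
```

Run check_ports("<remote cluster hostname>") from each cluster against the other, since a firewall rule may block only one direction.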

7. In Active-Active clusters, check that the “vdssynccursor” on each cluster's HDAP store is close to the “lastchangenumber” in the “cn=replicationjournal” store.
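
Once both values have been read (for example with an LDAP search), the comparison in step 7 can be scripted. The sketch below assumes the two values have already been extracted as integers. The function names and the max_lag threshold of 100 are hypothetical, since this article does not define how close the values should be.

```python
def cursor_lag(last_change_number, sync_cursor):
    """Number of journal changes the store has not applied yet."""
    return last_change_number - sync_cursor


def is_close(last_change_number, sync_cursor, max_lag=100):
    """True when the cursor is at most max_lag changes behind the journal.

    max_lag is an illustrative threshold, not a product default.
    A cursor ahead of the journal (negative lag) is reported as not close,
    because it is unexpected and worth investigating.
    """
    lag = cursor_lag(last_change_number, sync_cursor)
    return 0 <= lag <= max_lag
```

A lag that keeps growing between checks is a stronger sign of a replication problem than a single large value.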

For more details on how the "vdsSyncCursor" works and which log messages to look for, please check the link below:
https://support.radiantlogic.com/hc/en-us/articles/22722562887444-Troubleshooting-inter-cluster-replication-issues-Part-2-Based-on-Log-analysis-
