Troubleshooting inter-cluster replication issues


1. Check that every cluster participating in replication has a unique cluster name.

2. On every Secondary cluster, check that the “replicationjournal” datasource points only to the CJ/Primary cluster (including its failover datasources).

3. Run the following command on the CJ cluster:
$RLI_HOME/bin/advanced/monitoring.bat(.sh) -d cloud-replication
This fetches the properties of the HDAP stores under replication.

The command performs the following operations:

  • The Primary cluster sends Encrypt(RJ password + auth type + SSHA) to the remote HTTP servlet on the Secondary cluster. The hostname is taken from the "replica" in ou=replication,cn=config.
  • The HTTP servlet on the Secondary cluster fetches its local Encrypt(RJ password + auth type + SSHA) value and compares the two.
  • If they match, the HTTP response returns the cn=Directory Manager user and password, fetched from vds_server.conf in ZK, to the Primary cluster.
  • The Primary cluster uses the returned DM user and password to perform an LDAP bind and a base search on HDAP to fetch the "vdssynccursor".
  • NOTE: The HTTP port is 8089 or 8090, depending on whether SSL is checked in the VDSHA data source on the Secondary cluster.
  • If any part of this process fails, you will see this error: | errors | ["SYNC_CURSOR_CONNECT"] |
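
When the command output is long, it can help to scan it for error rows rather than read it by eye. The following is a minimal sketch, assuming the output has already been captured to a string and that error rows look like the sample above (| errors | [...] |). The function name find_replication_errors is hypothetical and not part of the product.

```python
import json
import re

# Matches a table row such as: | errors | ["SYNC_CURSOR_CONNECT"] |
# The row layout is an assumption based on the sample error in this step.
ERRORS_ROW = re.compile(r"\|\s*errors\s*\|\s*(\[.*?\])\s*\|")


def find_replication_errors(output):
    """Return every error code listed in 'errors' rows of the given text."""
    codes = []
    for match in ERRORS_ROW.finditer(output):
        try:
            parsed = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # not a JSON-style list; skip rather than guess
        codes.extend(str(code) for code in parsed)
    return codes
```

An empty result only means that no row of this shape was found; it does not prove that replication is healthy.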

4. For datasources configured over a secure port, make sure the Client Certificate Truststore (Settings >> Client Certificate Truststore) on each cluster contains the other cluster's server certificates.
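
One way to confirm which certificate a remote cluster actually presents is to fetch it and compare its fingerprint with the certificate imported into the local truststore. The sketch below uses only the Python standard library. The function names are hypothetical, and the host and port you pass in are whatever secure port the datasource is configured with.

```python
import hashlib
import ssl


def pem_fingerprint(pem_cert):
    """Return the SHA-256 fingerprint (hex) of a PEM-encoded certificate."""
    der = ssl.PEM_cert_to_DER_cert(pem_cert)
    return hashlib.sha256(der).hexdigest()


def server_fingerprint(host, port):
    """Fetch the certificate a server presents and fingerprint it.

    No chain validation is performed here; the certificate is only
    retrieved so that it can be compared with the truststore entry.
    """
    pem = ssl.get_server_certificate((host, port))
    return pem_fingerprint(pem)
```

If the fingerprint returned for the remote server does not match any certificate in the local Client Certificate Truststore, the secure connection between the clusters is likely to fail.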

5. Check that the 'vdsha' datasource is configured with the correct credentials and ports. Inconsistencies here can cause a red arrow with a cross to appear in the Replication Monitoring tab.

6. Make sure that NO firewalls are blocking ports 8089, 8090, 9100 and 9101.
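
A quick way to test this from one cluster node is to attempt a TCP connection to each port on a node of the other cluster. The sketch below is a minimal example using only the Python standard library. The function names are hypothetical, and the hostname you pass in is an assumption about your environment.

```python
import socket

# Ports named in step 6 of this checklist.
REPLICATION_PORTS = (8089, 8090, 9100, 9101)


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host, ports=REPLICATION_PORTS, timeout=3.0):
    """Map each port to True (reachable) or False (blocked or closed)."""
    return {port: port_is_reachable(host, port, timeout) for port in ports}
```

For example, check_ports("secondary-node.example.com") returns a dictionary keyed by port number. A False value does not by itself prove that a firewall is involved; it can also mean that nothing is listening on that port, for instance because only one of 8089 and 8090 is enabled.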

7. In an Active-Active deployment, check that the “vdssynccursor” value on each cluster's HDAP store is close to the “lastchangenumber” in the “cn=replicationjournal” store.
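
The comparison can be reduced to simple arithmetic once both values are in hand. The sketch below assumes that the two values have already been read from the directory and reduced to plain integer change numbers. The function names and the max_lag threshold of 100 are hypothetical, so pick a threshold that matches your own change rate.

```python
def cursor_lag(last_change_number, sync_cursor):
    """Number of journal changes the cursor has not yet caught up on."""
    return int(last_change_number) - int(sync_cursor)


def is_in_sync(last_change_number, sync_cursor, max_lag=100):
    """True if the cursor trails the journal by at most max_lag changes.

    A negative lag (cursor ahead of the journal) is treated as out of
    sync, since it suggests the values come from mismatched sources.
    The default max_lag is an illustrative assumption, not a product value.
    """
    lag = cursor_lag(last_change_number, sync_cursor)
    return 0 <= lag <= max_lag
```

A lag that keeps growing between two checks is a stronger sign of a replication problem than a single large reading taken during a burst of writes.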

