Troubleshooting RadiantOne Cluster Nodes Not Connecting After Migration (SSL, ADAP, and HDAP State Issues)

Overview

This article describes how to troubleshoot RadiantOne FID cluster follower nodes that stop functioning or disappear from the dashboard after migration, including ADAP connection failures and SSL-related internode communication problems.

Symptoms

  • Follower nodes no longer connect to ZooKeeper and stop functioning after configuration import or migration.
  • Errors such as “Failed to fetch ADAP token” and “Connection refused” when accessing ADAP endpoints over HTTP/HTTPS.
  • Control Panel on port 7171 does not load, or HTTP port shows as 0 in vds_server.log.
  • Directory Browser becomes reachable after the HTTP port is changed, but no data or stores load.
  • Leader node is up and running, but does not appear on follower dashboards in the cluster overview.
  • Logs show repeated connection timeouts to leader on port 9101 when adding store cn=config.
  • Logs show HDAP state and index errors (for example, “unrecognized shard ID” and bad hdapstate for multiple stores).

Root Causes

  • The SSL configuration for internode communication was not migrated correctly; follower nodes cannot validate the leader's certificate, so internode SSL connections fail.
  • Keystore and truststore files (server certificates and CA certificates) are not automatically migrated and must be manually reconfigured on each node.
  • The ADAP service is not reachable on the expected HTTP/HTTPS ports (7070/7171), or those ports are blocked by a firewall or local security policy.
  • HDAP state and transaction logs (tlog) are inconsistent between the leader and followers, so followers fail to recover and cannot rejoin the cluster.

Pre‑Checks

  • Confirm RadiantOne version and OS version for all nodes in the cluster.
  • Verify that Control Panel is reachable on the expected HTTP/HTTPS ports and that HTTP Port is not set to 0 under Settings → Server FE → Other Protocols → Web Services.

Step 1 – Validate HTTP/HTTPS and ADAP Access

  • In the Main Control Panel for each node, check:
    • Settings → Server FE → Other Protocols → Web Services → HTTP Port; ensure a valid port (for example, 8089) is set and saved, then perform a graceful restart.
    • Confirm that after restart the Control Panel and Directory Browser load, and ADAP endpoints are reachable on HTTP 7070 or HTTPS 7171 as configured.
  • If ADAP errors persist (for example, “Failed to fetch ADAP token” or I/O error on ADAP URL), verify:
    • ADAP service is running on the node.
    • Firewall or local security policies are not blocking ports 7070/7171.
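
The port checks in Step 1 can be scripted when several nodes need to be tested. The following is a minimal sketch, not a RadiantOne tool: it only tests whether a TCP connection can be opened, so a successful result does not prove that ADAP itself is healthy. The function names are illustrative, and the default ports are the 7070/7171 values mentioned above; adjust them to whatever is configured on your nodes.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_nodes(hosts, ports=(7070, 7171)):
    """Map each (host, port) pair to True (reachable) or False (not reachable)."""
    return {(host, port): port_open(host, port) for host in hosts for port in ports}
```

Calling check_nodes with the list of cluster host names returns a dictionary that can be printed or logged. A False entry for a node whose service is confirmed to be running suggests a firewall or local policy block rather than a service failure.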

Step 2 – Review Internode SSL Configuration

  • If internode replication works only when “Never use SSL” is selected, the issue is likely with SSL configuration (certificates or trust).
  • Under Cluster Configuration → Internode Communication, review the SSL settings:
    • If SSL is disabled as a workaround (Never use SSL), plan to restore secure communication after fixing keystores/truststores.
    • Ensure that when “Use SSL” is enabled, all nodes are configured with matching, valid certificates that are mutually trusted.
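
To see whether a follower can actually trust the certificate the leader presents, a TLS handshake can be attempted from the follower host. The sketch below uses only the Python standard library and rests on some assumptions: the CA certificate has been exported to a PEM file (ca_file), and the host and port you pass in are the ones your cluster uses for SSL internode traffic. The function name tls_handshake is illustrative and not part of RadiantOne.

```python
import socket
import ssl
from typing import Optional, Tuple


def tls_handshake(host: str, port: int, ca_file: Optional[str] = None,
                  timeout: float = 5.0) -> Tuple[bool, str]:
    """Attempt a TLS handshake and report (success, detail).

    ca_file is an assumed PEM export of the CA that signed the node
    certificates; when it is None, the system default trust store is used.
    """
    context = ssl.create_default_context(cafile=ca_file)
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return True, tls.version() or "unknown"
    except ssl.SSLCertVerificationError as exc:
        # The peer answered, but its certificate chain or host name was rejected.
        return False, f"certificate not trusted: {exc.verify_message}"
    except ssl.SSLError as exc:
        return False, f"TLS error: {exc}"
    except OSError as exc:
        return False, f"connection failed: {exc}"
```

A "certificate not trusted" result points to the truststore or host-name problems described in this step, while "connection failed" points to a network or port issue instead.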

Step 3 – Reconfigure Keystore and Truststore on Each Node

  • After migration, SSL certificates and related files are not automatically copied; keystore and truststore must be manually set up on every node.
  • For each node in the cluster:
    • Ensure the required keystore and truststore exist and contain the correct server and CA certificates, typically under paths such as:
      • RLI_HOME/vds_server/conf/keystore.jks or RLI_HOME/vds_server/conf/<fid>keystore.jks
      • RLI_HOME/vds_server/conf/truststore.jks or RLI_HOME/vds_server/conf/<fid>truststore.jks
    • If missing or outdated, copy them from the source environment or re-import the correct certs.
    • Use keytool to validate contents (run from RLI_HOME/vds_server so that the relative conf/ paths resolve):
      • keytool -list -v -keystore conf/<fid>keystore.jks
      • keytool -list -v -keystore conf/<fid>truststore.jks
  • Restart the RadiantOne service on each node after updating keystores, then re‑enable SSL under Cluster Configuration → Internode Communication → Use SSL and test connectivity.
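
When the same keytool check has to be repeated on every node, it can be wrapped in a small script. The following is a sketch under stated assumptions: keytool is on the PATH, the store file names are the generic keystore.jks and truststore.jks from the list above (substitute the <fid>-prefixed names if your installation uses them), and the helper names are illustrative. It reports missing files instead of failing, since missing stores are one of the conditions this step looks for.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Dict, List, Optional


def keytool_list_cmd(store: Path, storepass: Optional[str] = None) -> List[str]:
    """Build the 'keytool -list -v' command line for one store file."""
    cmd = ["keytool", "-list", "-v", "-keystore", str(store)]
    if storepass is not None:
        cmd += ["-storepass", storepass]
    return cmd


def inspect_stores(conf_dir: Path,
                   names=("keystore.jks", "truststore.jks"),
                   storepass: Optional[str] = None) -> Dict[str, str]:
    """Return keytool output per store, or a short reason when it cannot run."""
    results = {}
    for name in names:
        store = conf_dir / name
        if not store.is_file():
            results[name] = "missing"
            continue
        if shutil.which("keytool") is None:
            results[name] = "keytool not on PATH"
            continue
        proc = subprocess.run(
            keytool_list_cmd(store, storepass),
            capture_output=True, text=True,
            stdin=subprocess.DEVNULL,  # never block on a password prompt
        )
        results[name] = proc.stdout if proc.returncode == 0 else (proc.stderr or proc.stdout)
    return results
```

Comparing the output across nodes (aliases, issuers, and validity dates) helps confirm that every node holds the expected server and CA certificates before SSL is re-enabled.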

Step 4 – Clean Up HDAP States and Data Folders (Cluster‑Wide)

If logs show errors such as invalid tlog, bad hdapstate, or unrecognized shard IDs, follow this procedure for all nodes in the cluster.

  • Stop VDS server services on all nodes in order:
    • Stop follower nodes first.
    • Stop the leader node last.
  • In the Main Control Panel (while servers are still down), go to the ZooKeeper tab:
    • Expand /radiantone/v1/<cluster-name>/config/nodes/hdapstates in ZooKeeper.
    • Export the subtree for backup if needed.
    • Delete the stores under hdapstates to remove inconsistent HDAP state information.
  • On each node, rename or move the affected data folders under vds\vds_server\data\... so that VDS no longer uses the old inconsistent data files; new data/state will be created at startup.
  • Start VDS services again:
    • Start the leader node first.
    • Start each follower node one by one and confirm that they join the cluster without HDAP state or tlog errors.
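
The folder rename in this step can be scripted so that it is applied the same way on every node. This is a minimal sketch and should only be run while the VDS services are stopped. The data directory and the folder names are parameters you supply: the article does not give the full path below the data folder, so nothing here assumes one. The function name set_aside and the ".bak-" suffix are illustrative choices.

```python
import time
from pathlib import Path
from typing import Iterable, List, Optional, Tuple


def set_aside(data_dir: Path, folder_names: Iterable[str],
              suffix: Optional[str] = None) -> List[Tuple[Path, Path]]:
    """Rename each listed folder under data_dir to '<name>.bak-<suffix>'.

    Folders that do not exist are skipped. Nothing is deleted, so the old
    data can be restored by renaming the folder back.
    """
    suffix = suffix or time.strftime("%Y%m%d-%H%M%S")
    moved = []
    for name in folder_names:
        src = data_dir / name
        if not src.is_dir():
            continue
        dst = src.with_name(f"{src.name}.bak-{suffix}")
        if dst.exists():
            raise FileExistsError(f"backup target already exists: {dst}")
        src.rename(dst)
        moved.append((src, dst))
    return moved
```

Renaming instead of deleting keeps a rollback path available until the cluster has been verified in Step 5.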

Step 5 – Verify Cluster Health

  • In the cluster dashboard:
    • Confirm that leader and follower nodes appear under the cluster overview and that CPU, memory, and ports (LDAP, LDAPS, HTTP, VRS, ZK) show as expected.
    • Confirm ZooKeeper connectivity and internode health show as OK for all nodes.
  • In the Directory Browser:
    • Verify that root naming contexts and stores (for example, cn=config, cn=registry, production namespaces) load correctly on all nodes.
  • Check VDS and monitoring logs for any new timeouts, tlog, or HDAP errors before returning the cluster to normal use.
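
The final log check can be partly automated by counting lines that contain the error phrases quoted in this article. The sketch below is only a rough filter: the phrases come from the symptoms listed above, the exact wording in your log files may differ (the "timeout" pattern in particular is a guess at how connection timeouts are logged), and the function names are illustrative.

```python
from collections import Counter
from typing import Iterable

# Lower-case substrings taken from the symptoms in this article.
# "timeout" is an assumed wording and may need adjusting.
PATTERNS = (
    "failed to fetch adap token",
    "connection refused",
    "unrecognized shard id",
    "hdapstate",
    "tlog",
    "timeout",
)


def scan_lines(lines: Iterable[str]) -> Counter:
    """Count how many lines contain each pattern (case-insensitive)."""
    counts = Counter()
    for line in lines:
        lowered = line.lower()
        for pattern in PATTERNS:
            if pattern in lowered:
                counts[pattern] += 1
    return counts


def scan_file(path: str) -> Counter:
    """Scan one log file, tolerating undecodable bytes."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return scan_lines(handle)
```

Running scan_file against vds_server.log on each node after the restart gives a quick count per pattern. Any non-zero count is a prompt to read the surrounding log lines, not a diagnosis in itself.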