Product: FID
Version: 7.3.23
Description: One of the nodes in a cluster goes down. When the VDS is started again, it goes down again a few minutes later.
Procedure:
First, verify that ZooKeeper is up on every node in the cluster.
Always back up the folder you are about to change.
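One way to script the ZooKeeper check is sketched below. It is only an illustration, not a RadiantOne tool: it sends ZooKeeper's standard "ruok" four-letter command and expects "imok" back. The host names and client port are placeholders you must supply, and on recent ZooKeeper versions four-letter commands have to be enabled explicitly, so a False result may simply mean the command is not allowed in your environment.

```python
import socket


def zookeeper_is_ok(host, port, timeout=3.0):
    """Send ZooKeeper's 'ruok' command; return True only if the reply is 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            chunks = []
            while True:
                data = sock.recv(16)
                if not data:  # server closed the connection after replying
                    break
                chunks.append(data)
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
    return b"".join(chunks) == b"imok"


def check_nodes(nodes, port):
    """Return a {host: bool} map for a list of hypothetical node host names."""
    return {host: zookeeper_is_ok(host, port) for host in nodes}
```

For example, check_nodes(["node1", "node2", "node3"], port) with your own host names and ZooKeeper client port returns one True/False entry per node.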
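In vds_events.log, look for repeated "Leader index not ready yet" warnings. The server retries once per second until countDown reaches 0; in the example below, the countdown runs from 240 down to 1 over about four minutes.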
2023-02-15T14:08:18,615 WARN - IndexCluster - [cn_registry - FS(unmap=true):/apps/radiantone/vds/vds_server/data/cn_registry/] Leader index not ready yet. Retrying [1s countDown=240]
2023-02-15T14:08:19,627 WARN - IndexCluster - [cn_registry - FS(unmap=true):/apps/radiantone/vds/vds_server/data/cn_registry/] Leader index not ready yet. Retrying [1s countDown=239]
2023-02-15T14:08:20,631 WARN - IndexCluster - [cn_registry - FS(unmap=true):/apps/radiantone/vds/vds_server/data/cn_registry/] Leader index not ready yet. Retrying [1s countDown=238]
...
2023-02-15T14:12:16,498 WARN - IndexCluster - [cn_registry - FS(unmap=true):/apps/radiantone/vds/vds_server/data/cn_registry/] Leader index not ready yet. Retrying [1s countDown=3]
2023-02-15T14:12:17,500 WARN - IndexCluster - [cn_registry - FS(unmap=true):/apps/radiantone/vds/vds_server/data/cn_registry/] Leader index not ready yet. Retrying [1s countDown=2]
2023-02-15T14:12:18,503 WARN - IndexCluster - [cn_registry - FS(unmap=true):/apps/radiantone/vds/vds_server/data/cn_registry/] Leader index not ready yet. Retrying [1s countDown=1]
When the countdown finishes, a critical event is logged and the server stops:
2023-02-15T14:12:19,509 ERROR - Failed to add store cn=registry - critical error, stopping server.
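To spot this pattern without reading the log by eye, a small script along these lines may help. It is a minimal sketch: the regular expressions are inferred only from the sample lines above and may need adjusting for other log layouts, and the function name summarize is made up for this example.

```python
import re
import sys

# Patterns inferred from the sample log lines in this article (an assumption).
RETRY_RE = re.compile(
    r"^(?P<ts>\S+) WARN - IndexCluster - \[(?P<store>\S+) - .*?\] "
    r"Leader index not ready yet\. Retrying \[1s countDown=(?P<count>\d+)\]"
)
CRITICAL_RE = re.compile(
    r"^(?P<ts>\S+) ERROR - Failed to add store (?P<store>\S+) "
    r"- critical error, stopping server\."
)


def summarize(lines):
    """Return (retries, critical).

    retries maps store name -> (first countDown seen, last countDown seen,
    number of retry lines); critical is a list of (timestamp, store) tuples.
    """
    retries = {}
    critical = []
    for line in lines:
        m = RETRY_RE.match(line)
        if m:
            store = m.group("store")
            count = int(m.group("count"))
            first, _, seen = retries.get(store, (count, count, 0))
            retries[store] = (first, count, seen + 1)
            continue
        m = CRITICAL_RE.match(line)
        if m:
            critical.append((m.group("ts"), m.group("store")))
    return retries, critical


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python script.py /path/to/vds_events.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        retries, critical = summarize(fh)
    print("retry countdowns:", retries)
    print("critical events:", critical)
```

A store whose last countDown is close to 1 and that is followed by a critical event matches the failure described here.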
How to Fix It:
Stop the VDS server on all nodes, but keep ZooKeeper running. Stop the follower nodes first and the leader node last.
First: On each node, open the Zookeeper tab, expand radiantone/v2/cluster/shards/shard1/services/vds/hdapstates, and delete every HDAP state related to cn=registry. If there are no HDAP states related to cn=registry, continue with the next step.
Note: Do not save after deleting; just delete the HDAP states.
Second: On the file system, navigate to the $RLI_HOME\vds_server\data folder and delete any tlogs* entries under the cn_registry folder.
Note: Repeat these steps on all three nodes.
Third: Start the VDS server on the leader node first, then start it on the followers.
After this, the node with the issue should start correctly.
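The backup and tlogs* cleanup from the second step can be sketched in a script like the one below. It is an illustration under assumptions, not an official procedure: the function names are made up, the tlogs* pattern is taken literally from this article, and the script handles both files and folders because the article does not say which they are. Run it only while the VDS server is stopped, and start with dry_run=True to see what would be removed.

```python
import shutil
from pathlib import Path


def find_tlogs(store_dir):
    """List entries named tlogs* directly under the store folder."""
    return sorted(Path(store_dir).glob("tlogs*"))


def backup_and_clear_tlogs(store_dir, backup_dir, dry_run=True):
    """Copy store_dir to backup_dir, then delete its tlogs* entries.

    With dry_run=True nothing is copied or deleted; the matching
    entries are only returned so they can be reviewed first.
    """
    store_dir = Path(store_dir)
    targets = find_tlogs(store_dir)
    if dry_run:
        return targets
    # copytree raises an error if backup_dir already exists, which
    # avoids silently overwriting an earlier backup.
    shutil.copytree(store_dir, Path(backup_dir))
    for target in targets:
        if target.is_dir():
            shutil.rmtree(target)
        else:
            target.unlink()
    return targets
```

Here store_dir would be the cn_registry folder under the vds_server data folder, and backup_dir is any new location of your choice outside that folder.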