How do I recover my cluster when I see the "Invalid tlog reported" error?
FID/VDS uses the tlog mechanism to push changes from the leader to all followers in a clustered environment.
This error occurs when a new leader is elected and there is a tlog revision mismatch of more than 1 revision between the old leader and the newly elected leader.
In the example here, the newly elected leader is 5 revisions behind.
In $RLI_HOME\logs\vds_events.log, you will see this message:
"Invalid tlog reported. Last transactions from leader node haven't been published to this node. Refusing to become a leader to preserve cluster consistency. To force this node as a leader, remove its hdapStates in zookeeper or use the command line option -force when starting the server. Last invalid revision = 1240080 store revision=1240075"
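The size of the mismatch can be read from the end of this message: 1240080 - 1240075 = 5 revisions. As a minimal sketch, the Python below extracts both revision numbers from such a log line. The function name tlog_gap and the regular expression are illustrative assumptions based only on the message shown above; they are not part of the product, and the wording of the message may differ between versions.

```python
import re

# Pattern built from the tail of the message shown above. The exact wording
# is an assumption and may vary between product versions.
_REV_PATTERN = re.compile(
    r"Last invalid revision\s*=\s*(\d+)\s+store revision\s*=\s*(\d+)"
)

def tlog_gap(line):
    """Return (reported_rev, store_rev, gap), or None if the line does not match."""
    match = _REV_PATTERN.search(line)
    if match is None:
        return None
    reported_rev = int(match.group(1))  # "Last invalid revision" in the message
    store_rev = int(match.group(2))     # "store revision" on this node
    return reported_rev, store_rev, reported_rev - store_rev

message = "Last invalid revision = 1240080 store revision=1240075"
print(tlog_gap(message))  # (1240080, 1240075, 5)
```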
Letting a node that is behind become the leader with this revision mismatch would lead to a loss of data between the cluster nodes, so the whole cluster is proactively shut down.
Your application or load balancer should then direct traffic to another working cluster.
To recover from this, find the "lastLeaderId" in the ZK tab under /radiantone/v2/<clustername>, then start the VDS server on that node first. You can start the rest of the nodes afterwards.
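The restart order described above can be sketched as follows. This is only an illustration of the ordering rule (last leader first, remaining nodes afterwards); the function name start_order and the node IDs are hypothetical, and the lastLeaderId value must still be looked up manually as described.

```python
def start_order(node_ids, last_leader_id):
    """Return the nodes in restart order: the last leader first, then the rest."""
    if last_leader_id not in node_ids:
        raise ValueError("lastLeaderId does not match any known node")
    # Keep the remaining nodes in their original relative order.
    others = [node for node in node_ids if node != last_leader_id]
    return [last_leader_id] + others

# Hypothetical node IDs, for illustration only.
print(start_order(["node1", "node2", "node3"], "node2"))
```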