Problem Description
Production environments can experience performance issues related to connection pooling and processing queue management. Symptoms include:
- Connection pool reaching high limits (e.g., 1000+ connections) during peak times
- Processing queue ("Process Q") utilization reaching 200% or higher
- Overall system performance degradation when many processes run simultaneously
- Uneven connection distribution across cluster nodes
Root Cause Analysis
Several factors can contribute to connection pool and processing queue performance issues:
- RadiantOne does not have a built-in load-balancing mechanism to distribute connections between cluster nodes; backend connections operate independently per node
- Backend connection issues typically arise when caching or views are not properly leveraged, which increases load on backend servers
- Large volumes of modification requests (provisioning/deprovisioning operations) can impact performance
- Processing queue utilization reaching 200% indicates insufficient worker threads relative to the workload
- Multiple simultaneous API modification requests and backend interception scripts can contribute to delays in processing
Diagnostic Steps
Analyze Connection Usage
Review the fid_dump.log file located at $RLI_HOME/vds_server/logs/fid_dump.log to identify which processes are consuming the most connections and resources. This log helps pinpoint resource-intensive processes and their usage patterns.
Monitor Connection Pools
Add monitoring for connection pool usage via an LDAP query under cn=conn-pools,cn=monitor to track real-time connection consumption across cluster nodes.
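As a rough illustration, the query could be scripted per node. The sketch below only builds and runs an `ldapsearch` command against the `cn=conn-pools,cn=monitor` branch named above. It assumes the OpenLDAP `ldapsearch` client is installed; the host names, port, and bind DN are placeholders to replace with your own values, and the attributes returned depend on your FID version.

```python
import subprocess


def build_conn_pool_query(host, port, bind_dn,
                          base_dn="cn=conn-pools,cn=monitor"):
    """Build an ldapsearch argument list for one cluster node.

    host, port and bind_dn are placeholders supplied by the caller;
    only the base DN comes from this article.
    """
    return [
        "ldapsearch",
        "-x",                            # simple authentication
        "-H", f"ldap://{host}:{port}",   # node to query
        "-D", bind_dn,                   # bind DN (assumption: an admin account)
        "-W",                            # prompt for the password
        "-b", base_dn,                   # connection pool monitoring branch
        "-s", "sub",                     # search the whole subtree
        "(objectClass=*)",               # return every entry under the branch
    ]


def query_node(host, port, bind_dn):
    """Run the query for one node and print the raw LDIF output."""
    subprocess.run(build_conn_pool_query(host, port, bind_dn), check=True)
```

Calling `query_node()` once per cluster node (for example from a scheduled job) gives a per-node view, which helps spot the uneven connection distribution described in the symptoms.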
Review Modification Requests
Analyze the dump file to count stuck modifications and identify patterns in queued or stuck queries, particularly modification requests involving large user batches.
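One simple way to get a first count is to tally how many lines of the dump mention a given keyword. The sketch below is deliberately generic because the dump layout is not described here: the marker strings you pass in (for example, an operation name or a thread state) are assumptions that you should adapt after inspecting your own file.

```python
from collections import Counter


def tally_markers(lines, markers):
    """Count how many lines contain each marker (case-insensitive).

    `markers` are caller-chosen substrings; they are hypothetical and
    must be matched to what actually appears in your dump file.
    """
    counts = Counter({marker: 0 for marker in markers})
    for line in lines:
        lowered = line.lower()
        for marker in markers:
            if marker.lower() in lowered:
                counts[marker] += 1
    return counts
```

To use it, open the dump file and pass the file object directly, e.g. `tally_markers(open(path), ["modify"])`, where `path` points to the fid_dump.log location given above.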
Solution and Configuration Changes
Increase Worker Threads
Navigate to FID Settings → Limits → Custom Limits and increase Max Concurrent Working Threads from the default (e.g., 16) to a higher value (e.g., 64). Balance this increase with available CPU and memory resources.
Increase Connection Pool Limit
Increase the Connection Pool Limit to accommodate peak loads (e.g., 5,000 connections). Configuration details can be found at:
https://developer.radiantlogic.com/idm/v7.4/sys-admin-guide/connection-pooling/
Restart FID Servers
Restart the FID servers one at a time so the configuration changes take effect without interrupting service.
Implement Request Throttling
Coordinate with upstream systems (e.g., identity management platforms) to implement throttling or request dispersion. Limit simultaneous modification volumes and reduce batch sizes (e.g., limiting bursts to 1,000 users at a time) to allow sufficient processing time.
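The batching idea can be sketched on the upstream side as follows. This is a minimal illustration, not the behavior of any particular identity management platform: `send` stands in for whatever call submits modifications to FID, and the 60-second pause is an arbitrary placeholder to tune against observed queue drain times.

```python
import time
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def send_in_batches(users, send, batch_size=1000, pause_seconds=60.0,
                    sleep=time.sleep):
    """Submit users in bursts of `batch_size`, pausing between bursts.

    `send` is a hypothetical callable that submits one batch;
    `sleep` is injectable so the pacing can be tested without waiting.
    """
    sent = 0
    for index, chunk in enumerate(batched(users, batch_size)):
        if index > 0:
            sleep(pause_seconds)  # give the processing queue time to drain
        send(chunk)
        sent += len(chunk)
    return sent
```

With 2,500 users and the default batch size, this submits bursts of 1,000, 1,000, and 500 users with a pause before the second and third bursts.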
Isolate Leader Node
Consider excluding the leader node from regular traffic by working with the load balancer team. Verify leader node status using the HTTP query endpoint described in this article:
How to determine who is the Leader Node in a cluster setup:
https://support.radiantlogic.com/hc/en-us/articles/13472309774996-How-to-determine-who-is-the-Leader-Node-in-a-cluster-setup
Configure Custom Alerts
Set up monitoring and alerting for critical metrics:
- Configure alerts for connection pool usage (e.g., alert when connections reach 950)
- Set up processing queue alerts with appropriate thresholds (e.g., alert when utilization reaches 200%)
- Configure alerts under FID Settings → Monitoring → Alerts Settings
Reference documentation:
https://developer.radiantlogic.com/idm/v7.4/monitoring-and-reporting-guide/alerts-settings/#processing-load-on-radiantone
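If you also run an external monitoring script alongside the built-in alerts, the same thresholds can be expressed as a small check. This is only a sketch: the function name and message wording are hypothetical, the input values must come from your own monitoring source, and the defaults simply mirror the example thresholds above.

```python
def check_thresholds(active_connections, queue_utilization_pct,
                     conn_alert_at=950, queue_alert_at=200.0):
    """Return a list of alert messages for values at or above threshold."""
    alerts = []
    if active_connections >= conn_alert_at:
        alerts.append(
            f"Connection pool usage {active_connections} "
            f"reached threshold {conn_alert_at}"
        )
    if queue_utilization_pct >= queue_alert_at:
        alerts.append(
            f"Processing queue utilization {queue_utilization_pct}% "
            f"reached threshold {queue_alert_at}%"
        )
    return alerts
```

An empty list means both metrics are below their thresholds.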
Additional Tuning Recommendations
Processing Queue Configuration
Adjust the number of processing queues based on workload requirements. Reference:
https://developer.radiantlogic.com/idm/v7.4/deployment-and-tuning-guide/01-global-tuning/#number-of-processing-queues
Plan Version Upgrade
Consider upgrading to the latest FID version, which includes multiple performance optimizations, bug fixes, and security remediations. Newer versions contain improvements for connection handling, TLS/SSL connections, inter-cluster replication, and processing queue management.
Expected Outcomes
After implementing these configuration changes, you can expect:
- Improved handling of peak connection loads
- Reduced processing queue congestion
- Better distribution of workload across cluster nodes
- Enhanced monitoring and alerting capabilities for proactive issue detection
- Overall improved system performance and stability