Metrics for troubleshooting Keycloak deployment

Learn about metrics that can indicate where an issue lies, for example, when a service level objective is not met.

For a running Keycloak deployment, it is important to understand how the system performs and whether it meets your service level objectives (SLOs). For more details on SLOs, see the Keycloak service level indicators (SLIs) guide.

This guide will provide directions to answer the question: “What can I do when my SLOs are not met?”

Keycloak consists of several components, and an issue or misconfiguration in any one of them can push your service level indicators to undesirable values.

The following example illustrates the kind of guidance this guide provides:

Observation: The latency service level objective is not met.

Metrics that indicate a problem:

  1. Keycloak’s database connection pool is often exhausted, and there are threads queuing for a connection to be retrieved from the pool.

  2. Keycloak’s users cache hit ratio is low, around 5%. This means only 1 out of 20 user searches obtains user data from the cache, and the rest need to load it from the database.
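The two indicators above reduce to simple arithmetic on counters. The following is a minimal sketch, not part of Keycloak: the function names and the sample values are hypothetical, and the inputs are assumed to come from whichever cache hit, cache miss, and awaiting-connection metrics your deployment exposes.

```python
def cache_hit_ratio(hits: float, misses: float) -> float:
    """Return the fraction of lookups served from the cache (0.0 to 1.0)."""
    total = hits + misses
    if total == 0:
        return 0.0  # no lookups recorded yet; avoid dividing by zero
    return hits / total


def pool_has_queueing(awaiting_threads: float) -> bool:
    """True when at least one thread is waiting for a database connection."""
    return awaiting_threads > 0


# Hypothetical counter values, chosen to reproduce the 5% case above:
# 50 hits out of 1000 lookups is 1 hit in every 20 lookups.
ratio = cache_hit_ratio(hits=50, misses=950)
print(f"users cache hit ratio: {ratio:.0%}")
print(f"pool queueing: {pool_has_queueing(awaiting_threads=3)}")
```

A low ratio together with queueing threads suggests that cache misses are adding to the load on the connection pool, which is the situation the mitigations address.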

Possible mitigations suggested:

  • Increasing the users cache size, which would decrease the number of reads from the database.

  • Increasing the number of connections in the connection pool. Check this change against the metrics for your database, which may need tuning for the higher load, for example, by increasing the number of available processors.

Note:

  • This guide focuses on Keycloak metrics. Troubleshooting the database itself is out of scope.

  • This guide provides general guidance. Always confirm a configuration change by running a performance test that compares the metrics in question for the old and the new configuration.
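Comparing the metrics in question for the old and the new configuration can be sketched as follows. This is an illustrative outline only: the `RunMetrics` fields, the `improved` helper, and all numbers are hypothetical and made up for the example; they are not measurements or Keycloak APIs.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    """Metrics captured from one performance test run (hypothetical shape)."""
    p95_latency_ms: float
    users_cache_hit_ratio: float
    pool_awaiting_max: float


def improved(old: RunMetrics, new: RunMetrics) -> dict[str, bool]:
    """Report, per metric, whether the new configuration did better."""
    return {
        # Lower latency is better.
        "p95_latency_ms": new.p95_latency_ms < old.p95_latency_ms,
        # A higher hit ratio means fewer reads from the database.
        "users_cache_hit_ratio": new.users_cache_hit_ratio > old.users_cache_hit_ratio,
        # Fewer threads waiting for a connection is better.
        "pool_awaiting_max": new.pool_awaiting_max < old.pool_awaiting_max,
    }


# Made-up numbers for illustration only.
old_run = RunMetrics(p95_latency_ms=480.0, users_cache_hit_ratio=0.05, pool_awaiting_max=12)
new_run = RunMetrics(p95_latency_ms=210.0, users_cache_hit_ratio=0.80, pool_awaiting_max=0)

for name, better in improved(old_run, new_run).items():
    print(f"{name}: {'improved' if better else 'not improved'}")
```

Running both test runs under the same load keeps the comparison meaningful; a change that improves one metric while another regresses deserves a closer look before it is rolled out.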
