This is part of the Metrics for troubleshooting Keycloak deployment guide.
Metrics must be enabled for Keycloak. Follow the Enabling Keycloak Metrics guide for more details.
A monitoring system must be collecting the metrics.
Global tags
Tag | Description |
---|---|
cache=<name> | The cache name. |
Monitor the number of entries in your cache using these two metrics. If the cache is clustered, each entry has an owner node and zero or more backup copies of different nodes.
Sum the unique entry size metric across all nodes to get the total number of entries in the cluster.
Metric | Description |
---|---|
vendor_statistics_approximate_entries | The approximate number of entries stored by the node, including backup copies. |
vendor_statistics_approximate_entries_unique | The approximate number of entries stored by the node, excluding backup copies. |
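Because unique counts exclude backup copies, adding them up across nodes gives the cluster-wide count without double counting. The following is a minimal Python sketch. It assumes the unique-entries metric (taken here to be vendor_statistics_approximate_entries_unique) has already been scraped from every node for a single cache. The function name and the node names are hypothetical.

```python
from typing import Mapping


def cluster_entry_count(unique_entries_by_node: Mapping[str, float]) -> float:
    """Approximate total number of entries in the cluster for one cache.

    Keys are node names and values are each node's unique-entries sample
    (hypothetical input shape). Unique counts exclude backup copies, so
    summing them counts each entry once; summing the metric that includes
    backup copies would over-count in a clustered cache.
    """
    return sum(unique_entries_by_node.values())


if __name__ == "__main__":
    # Hypothetical samples scraped from three nodes for the same cache.
    samples = {"node-a": 1200.0, "node-b": 1180.0, "node-c": 1215.0}
    print(cluster_entry_count(samples))  # 3595.0
```

In Prometheus, the same aggregation could be written as sum by (cache) (vendor_statistics_approximate_entries_unique), again assuming that metric name.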
The following metrics monitor the cache accesses, such as the reads, writes and their duration.
A store operation is a write operation that writes or updates a value stored in the cache.
Metric | Description |
---|---|
vendor_statistics_store_times_seconds_count | The total number of store requests. |
vendor_statistics_store_times_seconds_sum | The total duration of all store requests. |
When histograms are enabled, the percentile buckets are available. They are useful for creating heat maps, but collecting and exposing the percentile buckets may have a negative impact on deployment performance.
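Dividing the total duration by the number of requests gives the mean duration of a single store. The following is a minimal Python sketch. It assumes the duration is exposed as vendor_statistics_store_times_seconds_sum, alongside the vendor_statistics_store_times_seconds_count counter used in the expressions in this guide. The input values are hypothetical.

```python
def average_duration_seconds(total_seconds: float, request_count: float) -> float:
    """Mean duration of one request, from a timer's _sum and _count values.

    Returns 0.0 when no requests have been recorded yet, to avoid
    dividing by zero.
    """
    if request_count == 0:
        return 0.0
    return total_seconds / request_count


if __name__ == "__main__":
    # Hypothetical samples of the store timer: 4.5 s spent across 1500 stores.
    avg = average_duration_seconds(4.5, 1500)
    print(f"{avg * 1000:.1f} ms per store")  # 3.0 ms per store
```

The same approach applies to the read, remove and transaction timers. For recent behavior rather than lifetime averages, a Prometheus query such as rate(vendor_statistics_store_times_seconds_sum[5m]) / rate(vendor_statistics_store_times_seconds_count[5m]) would apply the same division over a sliding window, under the same naming assumption.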
A read operation reads a value from the cache. Reads fall into two groups: a hit if the value is found, and a miss if it is not.
Metric | Description |
---|---|
vendor_statistics_hit_times_seconds_count | The total number of read hit requests. |
vendor_statistics_hit_times_seconds_sum | The total duration of all read hit requests. |
vendor_statistics_miss_times_seconds_count | The total number of read miss requests. |
vendor_statistics_miss_times_seconds_sum | The total duration of all read miss requests. |
When histograms are enabled, the percentile buckets are available. They are useful for creating heat maps, but collecting and exposing the percentile buckets may have a negative impact on deployment performance.
A remove operation removes a value from the cache. Removes fall into two groups: a hit if the value exists, and a miss if it does not.
Metric | Description |
---|---|
vendor_statistics_remove_hit_times_seconds_count | The total number of remove hit requests. |
vendor_statistics_remove_hit_times_seconds_sum | The total duration of all remove hit requests. |
vendor_statistics_remove_miss_times_seconds_count | The total number of remove miss requests. |
vendor_statistics_remove_miss_times_seconds_sum | The total duration of all remove miss requests. |
When histograms are enabled, the percentile buckets are available. They are useful for creating heat maps, but collecting and exposing the percentile buckets may have a negative impact on deployment performance.
Hit Ratio for read and remove operations
An expression can be used to compute the hit ratio for a cache in systems such as Prometheus. As an example, the hit ratio for read operations can be expressed as:
vendor_statistics_hit_times_seconds_count / (vendor_statistics_hit_times_seconds_count + vendor_statistics_miss_times_seconds_count)
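The hit ratio expression above can be written as a small Python function. It guards against the case where the cache has not served any requests yet, because the ratio is undefined in that case. This is a minimal sketch: the function name and the counter values are hypothetical, and the same function works for remove hits and misses.

```python
from typing import Optional


def hit_ratio(hits: float, misses: float) -> Optional[float]:
    """Fraction of read (or remove) requests that found a value.

    Mirrors hits / (hits + misses) and returns None when there were
    no requests, since the ratio is undefined then.
    """
    total = hits + misses
    if total == 0:
        return None
    return hits / total


if __name__ == "__main__":
    # Hypothetical counter values for one cache.
    print(hit_ratio(hits=950, misses=50))  # 0.95
    print(hit_ratio(hits=0, misses=0))  # None
```

Both counters are cumulative, so this ratio describes the whole lifetime of the node. In Prometheus, wrapping each counter in rate(...[5m]) would restrict the ratio to recent traffic.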
Read/Write ratio
An expression can be used to compute the read-write ratio for a cache, using the metrics above:
(vendor_statistics_hit_times_seconds_count + vendor_statistics_miss_times_seconds_count) / (vendor_statistics_hit_times_seconds_count + vendor_statistics_miss_times_seconds_count + vendor_statistics_remove_hit_times_seconds_count + vendor_statistics_remove_miss_times_seconds_count + vendor_statistics_store_times_seconds_count)
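Note that this expression divides the reads by all operations: read hits and misses, remove hits and misses, and stores. The result is therefore the share of operations that are reads, rather than a reads-to-writes quotient. The following is a minimal Python sketch of the same computation. The function name and the counter values are hypothetical.

```python
from typing import Optional


def read_share(
    hits: float,
    misses: float,
    remove_hits: float,
    remove_misses: float,
    stores: float,
) -> Optional[float]:
    """Share of cache operations that are reads, following the expression above.

    reads = hits + misses; all operations = reads + removes + stores.
    Returns None when no operation has been recorded yet.
    """
    reads = hits + misses
    total = reads + remove_hits + remove_misses + stores
    if total == 0:
        return None
    return reads / total


if __name__ == "__main__":
    # Hypothetical counter values for one cache.
    share = read_share(hits=800, misses=200, remove_hits=50, remove_misses=10, stores=140)
    print(round(share, 3))  # 0.833
```

A value close to 1.0 indicates a read-mostly cache. Lower values indicate that writes and removes make up a larger part of the traffic.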
Eviction limits the cache size: when the cache is full, an entry is removed to make room for a new entry.
As Keycloak caches the database entities in the users, realms and authorization caches, once one of these caches is full, every database access is accompanied by an eviction event.
Metric | Description |
---|---|
vendor_statistics_evictions | The total number of eviction events. |
Eviction rate
A rapid increase in evictions combined with very high database CPU usage means that the users or realms cache is too small for smooth Keycloak operation: data has to be reloaded from the database very often, which slows down responses.
If enough memory is available, consider increasing the maximum cache size using the CLI options cache-embedded-users-max-count or cache-embedded-realms-max-count.
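To spot a rapid increase, the eviction counter is more useful as a rate than as a raw total. The following is a minimal Python sketch that derives a per-second rate from two scrapes of the counter, assumed here to be vendor_statistics_evictions. This guide does not define a threshold for "rapid", so compare the result against your own baseline and the database CPU usage.

```python
def per_second_rate(earlier: float, later: float, interval_seconds: float) -> float:
    """Per-second increase of a monotonic counter between two scrapes.

    If the counter went down (for example, because a node restart reset it),
    the later value is treated as the increase since the reset, similar in
    spirit to how Prometheus' rate() handles counter resets.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    increase = later - earlier if later >= earlier else later
    return increase / interval_seconds


if __name__ == "__main__":
    # Hypothetical eviction counter values for the users cache, 60 s apart.
    print(per_second_rate(10_000, 16_000, 60))  # 100.0
```

In Prometheus, a query such as rate(vendor_statistics_evictions{cache="users"}[5m]) would compute the same quantity, under the same naming assumption.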
Write and remove operations hold the lock until the value is replicated in the local cluster and to the remote site.
On a healthy cluster, the number of locks held should remain constant, but deadlocks may create temporary spikes.
Metric | Description |
---|---|
vendor_lock_manager_number_of_locks_held | The number of locks currently being held by this node. |
Transactional caches use both One-Phase-Commit and Two-Phase-Commit protocols to complete a transaction. These metrics keep track of the operation duration.
The PESSIMISTIC locking mode uses One-Phase-Commit and does not create commit requests.
In a healthy cluster, the number of rollbacks should remain zero. Deadlocks should be rare, but they increase the number of rollbacks.
Metric | Description |
---|---|
vendor_transactions_prepare_times_seconds_count | The total number of prepare requests. |
vendor_transactions_prepare_times_seconds_sum | The total duration of all prepare requests. |
vendor_transactions_rollback_times_seconds_count | The total number of rollback requests. |
vendor_transactions_rollback_times_seconds_sum | The total duration of all rollback requests. |
vendor_transactions_commit_times_seconds_count | The total number of commit requests. |
vendor_transactions_commit_times_seconds_sum | The total duration of all commit requests. |
When histograms are enabled, the percentile buckets are available. They are useful for creating heat maps, but collecting and exposing the percentile buckets may have a negative impact on deployment performance.
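Since the rollback count should stay at zero on a healthy cluster, it is useful to watch for any increase between scrapes. The following is a minimal Python sketch of that check. It assumes the rollback counter is scraped periodically (the metric is assumed here to be vendor_transactions_rollback_times_seconds_count). The function name and the scrape values are hypothetical.

```python
def rollbacks_since(before: float, after: float) -> float:
    """Rollbacks recorded between two scrapes of the rollback counter.

    A drop in the counter is treated as a reset (e.g. node restart), in
    which case the later value is the number of rollbacks since the reset.
    """
    return after - before if after >= before else after


if __name__ == "__main__":
    # Hypothetical scrapes of the rollback counter taken a few minutes apart.
    scrapes = [0, 0, 0, 3]
    for earlier, later in zip(scrapes, scrapes[1:]):
        new = rollbacks_since(earlier, later)
        # Any new rollback is worth investigating; deadlocks are one cause.
        status = "ok" if new == 0 else f"{new:g} new rollback(s), check for deadlocks"
        print(status)
```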
State transfer happens when a node joins or leaves the cluster. It is required to balance the data stored and guarantee the desired number of copies.
State transfer increases resource usage and negatively affects overall performance.
Metric | Description |
---|---|
vendor_state_transfer_manager_inflight_transactional_segment_count | The number of in-flight transactional segments the local node requested from other nodes. |
vendor_state_transfer_manager_inflight_segment_transfer_count | The number of in-flight segments the local node requested from other nodes. |
The cluster data replication can be the main source of failure. These metrics not only report the response time, i.e., the time it takes to replicate an update, but also the failures.
On a healthy cluster, the average replication time should be stable or show little variance, and the number of failures should not increase.
Metric | Description |
---|---|
vendor_rpc_manager_replication_count | The total number of successful replications. |
vendor_rpc_manager_replication_failures | The total number of failed replications. |
vendor_rpc_manager_average_replication_time | The average time spent, in milliseconds, replicating data in the cluster. |
Success ratio
An expression can be used to compute the replication success ratio:
(vendor_rpc_manager_replication_count) / (vendor_rpc_manager_replication_count + vendor_rpc_manager_replication_failures)
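Applied to lifetime totals, this expression can hide a recent burst of failures behind a long history of successful replications. The following is a minimal Python sketch that applies the same formula to the increase of both counters between two scrapes. It assumes no counter reset happened between the scrapes. The ReplicationSample container and the sample values are hypothetical.

```python
from typing import NamedTuple, Optional


class ReplicationSample(NamedTuple):
    """One scrape of the replication counters (hypothetical container)."""

    replications: float  # vendor_rpc_manager_replication_count
    failures: float  # vendor_rpc_manager_replication_failures


def window_success_ratio(
    before: ReplicationSample, after: ReplicationSample
) -> Optional[float]:
    """Success ratio for the replications made between two scrapes.

    Uses count / (count + failures) on the counter increases, so recent
    failures are not diluted by the lifetime totals. Returns None when no
    replication happened in the window. Assumes no counter reset.
    """
    ok = after.replications - before.replications
    failed = after.failures - before.failures
    total = ok + failed
    if total <= 0:
        return None
    return ok / total


if __name__ == "__main__":
    first = ReplicationSample(replications=10_000, failures=5)
    second = ReplicationSample(replications=12_000, failures=25)
    print(round(window_success_ratio(first, second), 4))  # 0.9901
```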
Return to the Metrics for troubleshooting Keycloak deployment guide.