After starting Keycloak, consider adapting your instance to the required load using these scaling and tuning guidelines:
minimize resource utilization
achieve target response times
minimize database pool contention
resolve out-of-memory errors or excessive garbage collection overhead
provide higher availability via horizontal scaling
As you monitor your Keycloak workload, check whether the CPU or memory is underutilized or overutilized. Consult Concepts for sizing CPU and memory resources to better tune the resources available to the Java Virtual Machine (JVM).
Before increasing the amount of memory available to the JVM, in particular when experiencing an out-of-memory error, it is best to determine what is contributing to the increased footprint using a heap dump. Excessive response times may also indicate that the HTTP work queue is too large, in which case tuning for load shedding would be better than simply providing more memory. See the tuning guidance that follows.
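One way to obtain such a heap dump is to have the JVM write it at the moment the out-of-memory error occurs. The fragment below is a minimal sketch of a container env entry, assuming extra JVM flags are passed through the JAVA_OPTS_APPEND environment variable. The flags are standard HotSpot options; the dump path is a hypothetical location and must point to writable storage with enough free space.

```yaml
# Fragment of a container spec (not a complete manifest).
env:
  - name: JAVA_OPTS_APPEND
    # Write a heap dump when an OutOfMemoryError is thrown.
    # /tmp/keycloak-heap.hprof is a placeholder path (assumption).
    value: "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/keycloak-heap.hprof"
```

The resulting file can then be opened in a heap analysis tool to see which objects dominate the footprint.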
Keycloak automatically adjusts the number of used threads based upon how many cores you make available. Manually changing the thread count can improve overall throughput. For more details, see Concepts for configuring thread pools. However, changing the thread count must be done in conjunction with other JVM resources, such as database connections; otherwise, you may be moving a bottleneck somewhere else. For more details, see Concepts for database connection pools.
To limit memory utilization of queued work and to provide for load shedding, see Concepts for configuring thread pools.
If you are experiencing timeouts in obtaining database connections, you should consider increasing the number of connections available. For more details, see Concepts for database connection pools.
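Because the thread pool, the request queue, and the database connection pool interact, it can help to set them side by side. The fragment below is a sketch of a container env list, assuming the options http-pool-max-threads, http-max-queued-requests, and db-pool-max-size are supplied as environment variables with the KC_ prefix. All values are placeholders for illustration, not recommendations; derive real values from the Concepts guides and from load testing.

```yaml
# Fragment of a container spec (not a complete manifest).
# All numbers are illustrative assumptions.
env:
  - name: KC_HTTP_POOL_MAX_THREADS      # option: http-pool-max-threads
    value: "50"
  - name: KC_HTTP_MAX_QUEUED_REQUESTS   # option: http-max-queued-requests
    value: "1000"                       # requests beyond this limit are shed, not queued
  - name: KC_DB_POOL_MAX_SIZE           # option: db-pool-max-size
    value: "50"
```

Keeping the three settings in one place makes it easier to spot a change that only moves the bottleneck, for example raising the thread count without enough database connections to serve the extra threads.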
Some platforms, such as Kubernetes, provide mechanisms to vertically autoscale. Vertical autoscaling is not recommended for Keycloak if it requires restarting the server instance, which is currently the case for Java on Kubernetes. You can consider instead providing higher CPU and/or memory limits to allow your JVM to adapt within those limits as needed.
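As a rough sketch of that alternative, the container resources fragment below sets limits above the requests, so the JVM has headroom without a restart. The numbers are placeholders chosen for illustration only; size real values using Concepts for sizing CPU and memory resources.

```yaml
# Fragment of a container spec (not a complete manifest).
# Values are illustrative assumptions.
resources:
  requests:
    cpu: "2"        # capacity the scheduler reserves for the Pod
    memory: 2Gi
  limits:
    cpu: "4"        # upper bound the JVM can grow into under load
    memory: 3Gi
```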
A single Keycloak instance is susceptible to availability issues. If the instance goes down, you experience a full outage until another instance comes up. By running two or more cluster members on different machines, you greatly increase the availability of Keycloak.
A single JVM has a limit on how many concurrent requests it can handle. Additional server instances can provide roughly linear scaling of throughput until associated resources, such as the database or distributed caching, limit that scaling.
In general, consider allowing the Keycloak Operator to handle horizontal scaling concerns. When using the Operator, set the Keycloak custom resource spec.instances as desired to horizontally scale. For more details, see Deploy Keycloak for HA with the Keycloak Operator.
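A minimal sketch of the relevant part of a Keycloak custom resource is shown here. The resource name example-kc is hypothetical, and the other settings a working deployment needs (database, hostname, TLS) are omitted.

```yaml
apiVersion: k8s.keycloak.org/v2alpha1
kind: Keycloak
metadata:
  name: example-kc   # hypothetical name
spec:
  instances: 3       # number of Keycloak instances the Operator should run
  # database, hostname, and other settings omitted
```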
If you are not using the Operator, please review the following:
Higher availability is possible if your instances are on separate machines. On Kubernetes, use Pod anti-affinity to enforce this.
Use distributed caching; for multi-site clusters, use external caching for cluster members to share the same state. For details on the relevant configuration, see Configuring distributed caches. The embedded Infinispan cache has horizontal scaling considerations including:
Your instances need a way to discover each other. For more information, see discovery in Configuring distributed caches.
This cache is not optimal for clusters that span multiple availability zones, which are also called stretch clusters. For the embedded Infinispan cache, aim to have all instances in one availability zone. The goal is to avoid unnecessary cross-zone round-trips, whose latency would add up in the response times. On Kubernetes, use Pod affinity to enforce this grouping of Pods.
This cache does not gracefully handle multiple members joining or leaving concurrently. In particular, members leaving at the same time can lead to data loss. On Kubernetes, you can use a StatefulSet with the default serial handling to ensure Pods are started and stopped sequentially.
To avoid losing service availability when a whole site is unavailable, consider a multi-site deployment. For more details, see Multi-site deployments in the high availability guide.
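The Kubernetes-specific points above (separate machines, a single availability zone, and sequential start and stop) can be sketched in one StatefulSet. This is an illustration under stated assumptions, not a complete deployment: the names keycloak and keycloak-discovery and the app: keycloak label are hypothetical, and the container configuration (database, hostname, cache discovery) is omitted.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: keycloak                       # hypothetical name
spec:
  serviceName: keycloak-discovery      # hypothetical headless Service
  replicas: 3
  podManagementPolicy: OrderedReady    # the default: Pods start and stop one at a time
  selector:
    matchLabels:
      app: keycloak                    # hypothetical label
  template:
    metadata:
      labels:
        app: keycloak
    spec:
      affinity:
        podAntiAffinity:
          # Never place two Keycloak Pods on the same node.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: keycloak
              topologyKey: kubernetes.io/hostname
        podAffinity:
          # Keep all Keycloak Pods in the same availability zone.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: keycloak
              topologyKey: topology.kubernetes.io/zone
      containers:
        - name: keycloak
          image: quay.io/keycloak/keycloak
          # args, env, ports, and probes omitted
```

The anti-affinity rule uses the node hostname as the topology key, while the affinity rule uses the zone label, so the two constraints complement rather than contradict each other.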
Horizontal autoscaling allows for adding or removing Keycloak instances on demand. Keep in mind that startup times will not be instantaneous and that optimized images should be used to minimize the start time.
When using the embedded Infinispan cache cluster, dynamically adding or removing cluster members requires Infinispan to perform a rebalancing of the Infinispan caches, which can get expensive if many entries exist in those caches.
To minimize this time, Keycloak limits the number of entries in session-related caches to 10000 by default. Note that this optimization is possible only if the persistent-user-sessions feature is not explicitly disabled in your configuration.
On Kubernetes, the Keycloak custom resource is scalable, meaning that it can be targeted by the built-in autoscaler.
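As a sketch of what targeting the custom resource can look like, the HorizontalPodAutoscaler below scales a Keycloak custom resource on CPU utilization. The names keycloak-hpa and example-kc, the replica bounds, and the 80% target are assumptions for illustration. CPU-based scaling also presumes that the Pods declare CPU requests and that a metrics source is available in the cluster.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: keycloak-hpa            # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: k8s.keycloak.org/v2alpha1
    kind: Keycloak
    name: example-kc            # hypothetical Keycloak custom resource
  minReplicas: 2                # keep at least two instances for availability
  maxReplicas: 10               # placeholder upper bound
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80   # placeholder target
```

Keeping the minimum at two or more instances preserves the availability benefit of running multiple cluster members, and conservative scale-down settings reduce the chance of several members leaving the embedded cache cluster at once.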