A single Keycloak instance is susceptible to availability issues. If the instance goes down, you experience a full outage until another instance comes up. By running two or more cluster members on different machines, you greatly increase the availability of Keycloak.
A single JVM has a limit on how many concurrent requests it can handle. Additional server instances can provide roughly linear scaling of throughput until associated resources, such as the database or distributed caching, limit that scaling.
In general, consider allowing the Keycloak Operator to handle horizontal scaling concerns. When using the Operator, set the Keycloak custom resource spec.instances as desired to scale horizontally. For more details, see Deploy Keycloak for HA with the Keycloak Operator.
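As a minimal sketch, a Keycloak custom resource scaled to three instances could look as follows; the metadata values are illustrative placeholders:

```yaml
# Sketch of a Keycloak custom resource running three instances.
# The name and namespace are placeholders for your deployment.
apiVersion: k8s.keycloak.org/v2alpha1
kind: Keycloak
metadata:
  name: keycloak
  namespace: keycloak-cluster
spec:
  instances: 3
```

The Operator then reconciles the underlying Pods to match the requested instance count.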
If you are not using the Operator, review the following:
- Higher availability is possible if your instances are on separate machines. On Kubernetes, use Pod anti-affinity to enforce this.
- Use distributed caching; for multi-site clusters, use external caching for cluster members to share the same state. For details on the relevant configuration, see Configuring distributed caches. The embedded Infinispan cache has horizontal scaling considerations, including:
- Your instances need a way to discover each other. For more information, see discovery in Configuring distributed caches.
- This cache is not optimal for clusters that span multiple availability zones, which are also called stretch clusters. For the embedded Infinispan cache, keep all instances in one availability zone. The goal is to avoid unnecessary round-trips in the communication that would amplify response times. On Kubernetes, use Pod affinity to enforce this grouping of Pods.
- This cache does not gracefully handle multiple members joining or leaving concurrently. In particular, members leaving at the same time can lead to data loss. On Kubernetes, you can use a StatefulSet with the default serial handling to ensure Pods are started and stopped sequentially.
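As one way to keep instances on separate machines, a required Pod anti-affinity rule can be added to the workload's Pod template. This is a minimal sketch, assuming the Pods carry an app: keycloak label:

```yaml
# Sketch of a Pod anti-affinity rule for the Pod template spec.
# Assumes Pods are labeled app: keycloak; adjust to your labels.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: keycloak
      # Spread across nodes; use a zone topology key instead if you
      # want to spread across availability zones.
      topologyKey: kubernetes.io/hostname
```

With requiredDuringSchedulingIgnoredDuringExecution, the scheduler refuses to co-locate two matching Pods on the same node; a preferred rule would make this best-effort instead.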
To avoid losing service availability when a whole site is unavailable, consider a multi-site deployment. For more information, see Multi-site deployments.
Horizontal Autoscaling
Horizontal autoscaling allows for adding or removing Keycloak instances on demand. Keep in mind that startup times will not be instantaneous and that optimized images should be used to minimize the start time.
When using the embedded Infinispan cache cluster, dynamically adding or removing cluster members requires Infinispan to rebalance the Infinispan caches, which can be expensive if many entries exist in those caches.
To minimize this time, the number of entries in session-related caches is limited to 10,000 by default. Note that this optimization is possible only if the persistent-user-sessions feature is not explicitly disabled in your configuration.
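If a different limit is needed, the default can be adjusted through server options. As an illustrative sketch, assuming a cache-embedded-sessions-max-count server option, it could be set in the Keycloak custom resource:

```yaml
# Hypothetical excerpt of a Keycloak custom resource; the option
# name cache-embedded-sessions-max-count is an assumption - check
# your Keycloak version's configuration reference.
spec:
  additionalOptions:
    - name: cache-embedded-sessions-max-count
      value: "20000"
```

Raising the limit trades faster rebalancing for more sessions kept in memory, so size it against your memory budget.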
On Kubernetes, the Keycloak custom resource is scalable, meaning that it can be targeted by the built-in autoscaler. For example, to scale on average CPU utilization:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: keycloak-hpa
  namespace: keycloak-cluster
spec:
  scaleTargetRef:
    apiVersion: k8s.keycloak.org/v2alpha1
    kind: Keycloak
    name: keycloak
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
Scaling on memory is generally not needed with persistent sessions enabled, and should not be needed at all when using remote Infinispan. If you are using persistent sessions or remote Infinispan and you experience memory issues, it is best to fully diagnose the problem and revisit the Concepts for sizing CPU and memory resources guide. Adjusting the memory request and limit is preferable to horizontal scaling.