Simulate failures of Keycloak in Kubernetes
How to automate the simulation of Keycloak Pod failures in a Kubernetes environment to test how Keycloak recovers after a failure.
Why failure testing
The introduction to the chaos testing tool krkn contains an excellent writeup on why chaos testing tools are needed in general.
Running the failure test using the kc-chaos.sh script
Preparations
- Extract the keycloak-benchmark-${version}.[zip|tar.gz] file.
- Make sure you can access the Kubernetes cluster from the machine where you plan to run the failure tests, and that you can run commands such as:
kubectl get pods -n keycloak-keycloak
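The access check above can be turned into a small preflight step. This is a minimal sketch, assuming `kubectl` is on the PATH and that Keycloak runs in the `keycloak-keycloak` namespace from the example command. The helper name `check_cluster_access` and the 10-second timeout are made up for this illustration and are not part of the benchmark distribution.

```shell
# Preflight sketch: confirm that kubectl can list pods in the Keycloak namespace.
# NAMESPACE defaults to the namespace from the example above; override it if yours differs.
NAMESPACE="${NAMESPACE:-keycloak-keycloak}"

# check_cluster_access is a hypothetical helper, not part of the benchmark distribution.
check_cluster_access() {
  if ! command -v kubectl >/dev/null 2>&1; then
    echo "kubectl not found on PATH" >&2
    return 1
  fi
  # A non-zero exit usually means a missing kubeconfig, an expired login, or a wrong namespace.
  kubectl get pods -n "$1" --request-timeout=10s >/dev/null
}

if check_cluster_access "$NAMESPACE"; then
  echo "cluster access OK for namespace $NAMESPACE"
else
  echo "cannot list pods in namespace $NAMESPACE" >&2
fi
```

Running this before a test run catches access problems early, so that a failed login is not mistaken for a Keycloak failure later on.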
Simulating load
Use the Running benchmarks from the CLI guide to simulate load against a specific Kubernetes environment.
Running the failure tests
Once there is enough load going against the Keycloak application hosted on an existing Kubernetes/OpenShift cluster, execute the command below:
./kc-chaos.sh <RESULT_DIR_PATH>
Set the environment variables below to configure how and where this script is executed.
INITIAL_DELAY_SECS - Time in seconds the script waits before it triggers the first failure.
CHAOS_DELAY_SECS - Time in seconds the script waits between simulating failures.
PROJECT - Namespace of the Keycloak pods.
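As an illustration of how these variables fit together, the following sketch exports them and then starts the script. The values (30 seconds, 10 seconds, the `keycloak-keycloak` namespace) and the result directory `./chaos-results` are assumptions chosen for the example, not defaults of kc-chaos.sh.

```shell
# Example settings; the values are illustrative assumptions, not the script's defaults.
export INITIAL_DELAY_SECS=30        # wait 30 seconds before the first failure
export CHAOS_DELAY_SECS=10          # wait 10 seconds between failures
export PROJECT=keycloak-keycloak    # namespace of the Keycloak pods

# Hypothetical result directory, passed as <RESULT_DIR_PATH>.
RESULT_DIR_PATH="${RESULT_DIR_PATH:-./chaos-results}"

# Run the script only when it is present and executable in the current directory.
if [ -x ./kc-chaos.sh ]; then
  ./kc-chaos.sh "$RESULT_DIR_PATH"
else
  echo "kc-chaos.sh not found in $(pwd)" >&2
fi
```

Exporting the variables, rather than prefixing them on the command line, keeps them available for repeated runs in the same shell session.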
Running the failure test using the Krkn chaos testing framework
We integrated the chaos testing framework krkn into the Taskfile Chaos.yaml and created individual tasks that run the pod-scenarios test against different components of the multi-site Keycloak setup on Kubernetes.
The integration focuses on simulating Pod failure scenarios for the Keycloak and Infinispan applications.
Preparations
- This Taskfile requires Podman/Docker to be installed and configured on the system.
- The Kubernetes configuration file for the ROSA cluster must be available in the specified ISPN_DIR directory.
- Make sure to set the required environment variables before running the tasks.
- You can customize the behavior of the tasks by overriding the default values for the variables.
kraken-pod-scenarios
This is an internal task that provides the core functionality for running Kraken pod failure scenarios. It uses the pod-scenarios image from the krkn-chaos/krkn-hub repository. The task requires the following variables:
ROSA_CLUSTER_NAME - The name of the ROSA cluster
POD_LABEL - A label selector to identify the target pods
EXPECTED_POD_COUNT - The expected number of pods after the disruption
ISPN_DIR - The directory containing the Infinispan configuration
The task sets some default values for variables like DEFAULT_NAMESPACE, DISRUPTION_COUNT, WAIT_DURATION, and ITERATIONS. It also has a precondition to ensure the existence of the Kubernetes configuration file.
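To make the role of EXPECTED_POD_COUNT more concrete, here is a rough manual equivalent of the recovery condition. It is only a sketch and not how krkn implements the check: it assumes that "recovered" means the number of pods in the Running status matches the expected count. The helper names and the sample pod listing are invented for the example.

```shell
# count_running_pods reads `kubectl get pods --no-headers` output on stdin
# and prints how many pods report the Running status (third column).
count_running_pods() {
  awk '$3 == "Running" { n++ } END { print n + 0 }'
}

# pods_recovered succeeds when the running count equals the expected count.
# Usage: kubectl get pods -n NAMESPACE -l LABEL --no-headers | pods_recovered EXPECTED
pods_recovered() {
  [ "$(count_running_pods)" -eq "$1" ]
}

# Made-up sample listing in the column layout NAME READY STATUS RESTARTS AGE.
sample_output='keycloak-0   1/1   Running   0   5m
keycloak-1   0/1   Pending   0   10s
keycloak-2   1/1   Running   1   5m'

printf '%s\n' "$sample_output" | count_running_pods   # prints 2
```

Against a live cluster, the sample listing would be replaced by the output of `kubectl get pods` filtered with the same label selector that is passed as POD_LABEL.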
kill-gossip-router
This task kills the JGroups Gossip Router pod in the Infinispan cluster. It calls the kraken-pod-scenarios task with specific values for POD_LABEL, DISRUPTION_COUNT, and EXPECTED_POD_COUNT.
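A possible invocation of this task is sketched below. It assumes that the `task` CLI is installed, that Chaos.yaml is in the current directory, and that the Taskfile picks up ROSA_CLUSTER_NAME and ISPN_DIR from the environment. The cluster name `my-rosa-cluster` and the directory `./infinispan` are placeholders, not values from the project.

```shell
# Hypothetical values; replace them with your own cluster name and directory.
export ROSA_CLUSTER_NAME="${ROSA_CLUSTER_NAME:-my-rosa-cluster}"
export ISPN_DIR="${ISPN_DIR:-./infinispan}"

# Assumption: Chaos.yaml is in the current directory and the task CLI is installed.
if command -v task >/dev/null 2>&1 && [ -f Chaos.yaml ]; then
  task --taskfile Chaos.yaml kill-gossip-router
else
  echo "task CLI or Chaos.yaml not available; nothing was run" >&2
fi
```

Because kill-gossip-router supplies POD_LABEL, DISRUPTION_COUNT, and EXPECTED_POD_COUNT itself, only the cluster name and the Infinispan directory are set here.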
Right now, the |
Limitations
- Currently, we cannot inspect the Krkn report. It is generated inside the kraken pod and is lost when the pod is removed, because it is kept in ephemeral storage. A fix is planned and tracked in a GitHub issue.