Health checks for Keycloak

When customers run the cross-site setup in their environment, they should be able to point their monitoring to specific URLs to see if everything is up and running as expected.

This page provides an overview of these URLs, Kubernetes resources, and health check endpoints for a cross-site setup of Keycloak.

Overview

A proactive monitoring strategy, which aims to detect and alert on issues before they impact users, is key to a highly resilient and highly available Keycloak application.

Health checks across various architectural components (like application health, load balancing, caching, and overall system status) are critical for:

  • Ensuring High Availability: Verifying that all sites and the load balancer are operational helps guarantee that the system can handle requests even if one site goes down.

  • Maintaining Performance: Checking the health and distribution of the Infinispan cache ensures that Keycloak can maintain optimal performance by efficiently handling sessions and other temporary data.

  • Operational Resilience: By continuously monitoring the health of both Keycloak and its dependencies within the OpenShift environment, the system can quickly identify and possibly auto-remediate issues, reducing downtime.

How to set it up

Prerequisites:

  1. OpenShift CLI is installed and configured.

  2. Install jq if it is not available on your OS already.
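
Before running any of the checks, it can help to confirm that both tools are on the PATH. The following is a minimal sketch; the helper name require_cmd is hypothetical and not part of any Keycloak tooling.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the named command is found on PATH.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "Missing required command: $1" >&2
  return 1
}

# Report on the two tools this page relies on.
for tool in oc jq; do
  require_cmd "$tool" || echo "Install $tool before continuing."
done
```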

More on the specific Health Checks

Keycloak Load Balancer and Sites

Verifies the health of the Keycloak application through its load balancer and both primary and backup sites. This ensures that Keycloak is accessible and that the load balancing mechanism is functioning correctly across different geographical or network locations.

This command returns the health status of the Keycloak application’s connection to its configured database, confirming that database connections are reliable. It is available only on the management port, not from the external URL. In a Kubernetes setup, the sub-status health/ready is checked periodically to mark the Pod as ready.

curl -s https://keycloak:managementport/health

This command verifies the lb-check endpoint of the load balancer and ensures the Keycloak application cluster is up and running.

curl -s https://keycloak-load-balancer-url/lb-check

These commands return the running status of Site A and Site B of Keycloak in a cross-site setup (be it Active/Passive or Active/Active).

curl -s https://keycloak_site_a_url/lb-check
curl -s https://keycloak_site_b_url/lb-check
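
For monitoring, the HTTP status code is often easier to act on than the response body. The sketch below assumes that lb-check answers with a 2xx status when the site is available and with any other status when it is not; the function names classify_http_status and probe_url are hypothetical, and the URL in the usage comment is the same placeholder as above.

```shell
#!/bin/sh
# Hypothetical helper: map an HTTP status code to a health string.
# Assumption: any 2xx code means the endpoint is healthy.
classify_http_status() {
  case "$1" in
    2[0-9][0-9]) echo "HEALTHY" ;;
    *)           echo "UNHEALTHY" ;;
  esac
}

# Fetch only the status code of a URL and classify it.
# curl writes 000 as the status code when the request itself fails.
probe_url() {
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1") || true
  classify_http_status "$code"
}

# Example (placeholder URL):
#   probe_url https://keycloak_site_a_url/lb-check
```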

Infinispan Cache Health

Checks the health of the default cache manager and individual caches in an external Infinispan cluster. This is vital for Keycloak performance and reliability, as Infinispan is often used for distributed caching and session clustering in Keycloak deployments.

This command returns the overall health of the Infinispan cache manager. It is useful because the Admin user doesn’t need to provide user credentials to get the health status.

curl -s https://infinispan_rest_url/rest/v2/cache-managers/default/health/status

For the following health check, the Admin user needs to provide the Infinispan user credentials as part of the request to inspect the overall health of the external Infinispan cluster caches.

curl -u <infinispan_user>:<infinispan_pwd> -s https://infinispan_rest_url/rest/v2/cache-managers/default/health \
 | jq 'if .cluster_health.health_status == "HEALTHY" and (all(.cache_health[].status; . == "HEALTHY")) then "HEALTHY" else "UNHEALTHY" end'

Optionally, run the above command without the jq filter to see the full details. The filter is a convenience that computes the overall health from the cluster health and the health of each individual cache.
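
To see how the filter behaves, it can be applied to a saved response instead of a live cluster. The payload below is an illustrative sample with made-up cache names, shaped after the fields the filter reads (cluster_health.health_status and cache_health[].status); it is not output captured from a real deployment. The helper name overall_cache_health is hypothetical, and jq -r is used here only to print the verdict without quotes.

```shell
#!/bin/sh
# Hypothetical helper: reads a health response on stdin, prints the overall verdict.
overall_cache_health() {
  jq -r 'if .cluster_health.health_status == "HEALTHY"
            and (all(.cache_health[].status; . == "HEALTHY"))
         then "HEALTHY" else "UNHEALTHY" end'
}

# Illustrative sample only: the cluster is healthy but one cache is degraded.
sample='{
  "cluster_health": {"health_status": "HEALTHY"},
  "cache_health": [
    {"cache_name": "sessions", "status": "HEALTHY"},
    {"cache_name": "offlineSessions", "status": "DEGRADED"}
  ]
}'

printf '%s' "$sample" | overall_cache_health   # prints UNHEALTHY
```

A single cache that is not HEALTHY is enough to turn the overall verdict to UNHEALTHY, even when the cluster-level status is HEALTHY.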

Infinispan Cluster Distribution

Assesses the distribution health of the Infinispan cluster, ensuring that the cluster’s nodes are correctly distributing data. This step is essential for the scalability and fault tolerance of the caching layer.

Modify the value passed to --argjson expectedCount (3 in this example) to match the total number of nodes in the cluster. The check reports HEALTHY only when exactly that many nodes report at least one address.

curl -u <infinispan_user>:<infinispan_pwd> -s https://infinispan_rest_url/rest/v2/cluster\?action\=distribution \
 | jq --argjson expectedCount 3 'if map(select(.node_addresses | length > 0)) | length == $expectedCount then "HEALTHY" else "UNHEALTHY" end'
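
The effect of expectedCount can be tried offline against a saved response. The sketch below wraps the same filter in a hypothetical helper, distribution_health, and feeds it an illustrative payload; the node names and addresses are invented, and only the node_addresses field is taken from the filter above.

```shell
#!/bin/sh
# Hypothetical helper: $1 is the expected node count,
# stdin is the cluster distribution response.
distribution_health() {
  jq -r --argjson expectedCount "$1" \
    'if map(select(.node_addresses | length > 0)) | length == $expectedCount
     then "HEALTHY" else "UNHEALTHY" end'
}

# Illustrative sample only: three entries, one of them without any address.
sample='[
  {"node_name": "infinispan-0", "node_addresses": ["10.0.0.1:7800"]},
  {"node_name": "infinispan-1", "node_addresses": ["10.0.0.2:7800"]},
  {"node_name": "infinispan-2", "node_addresses": []}
]'

printf '%s' "$sample" | distribution_health 3   # prints UNHEALTHY
printf '%s' "$sample" | distribution_health 2   # prints HEALTHY
```

Because the comparison is an exact match, a cluster with more addressable nodes than expected is also reported as UNHEALTHY.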

Overall Infinispan System Health

Uses the OpenShift CLI to query the status conditions of the Infinispan clusters in the specified namespace. The check reports HEALTHY only when every condition is True; otherwise it lists the conditions that are not.

oc get infinispan -n <NAMESPACE> -o json  \
| jq '.items[].status.conditions' \
| jq 'map({(.type): .status})' \
| jq 'reduce .[] as $item ([]; . + [$item | to_entries[] | select(.value != "True") | .key]) | if length == 0 then "HEALTHY" else "UNHEALTHY: " + (join(", ")) end'

Keycloak Readiness in OpenShift

Specifically checks for the readiness and rolling update conditions of Keycloak deployments in Red Hat OpenShift, ensuring that the Keycloak instances are fully operational and not undergoing updates that could impact availability.

oc wait --for=condition=Ready --timeout=10s keycloaks.k8s.keycloak.org/keycloak -n <NAMESPACE>
oc wait --for=condition=RollingUpdate=False --timeout=10s keycloaks.k8s.keycloak.org/keycloak -n <NAMESPACE>
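
Because oc wait exits with a non-zero status when the condition is not met within the timeout, its exit code can be turned into a status line for a monitoring system. A minimal sketch follows; run_check is a hypothetical helper name, and the commented usage reuses one of the commands above.

```shell
#!/bin/sh
# Hypothetical helper: run a command quietly and report its outcome by label.
# Usage: run_check <label> <command> [args...]
run_check() {
  label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "$label: HEALTHY"
  else
    echo "$label: UNHEALTHY"
    return 1
  fi
}

# Example (replace <NAMESPACE> with your namespace):
#   run_check keycloak-ready oc wait --for=condition=Ready --timeout=10s \
#     keycloaks.k8s.keycloak.org/keycloak -n "<NAMESPACE>"
```

The helper returns the failure to the caller as well as printing it, so several checks can be chained and the overall exit status still reflects any failed check.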

Optional Bash script

You can use the cross-site-health-checks.sh script and extend it to perform the necessary checks and integrate this into your monitoring architecture.

As a prerequisite for running the script, establish a session from your terminal to the target OCP cluster with a command such as:

oc login --token=sha256~masked-key --server=https://api.gh-keycloak-a.masked.openshiftapps.com:6443

Once you have an active oc session, run the script as in the following example:

./cross-site-health-checks.sh -d gh-keycloak-a-gh-keycloak-b-masked.keycloak-benchmark.com \
-u developer -p masked-password \
-s gh-keycloak-a.masked.openshiftapps.com \
-c 3 -n runner-keycloak

Usage of the script, with details on the available options:

Usage: ./cross-site-health-checks.sh [-d domain] [-u infinispan_user] [-p infinispan_pwd] [-s infinispan_url_suffix] [-c expected_count] [-n namespace]
  -d domain: Keycloak domain
  -u infinispan_user: Infinispan user
  -p infinispan_pwd: Infinispan password
  -s infinispan_url_suffix: Infinispan URL suffix
  -c expected_count: Expected Node Count in the Infinispan cluster
  -n namespace: Kubernetes namespace