Health checks for Keycloak
When customers run the cross-site setup in their environment, they should be able to point their monitoring to specific URLs to see if everything is up and running as expected.
This page attempts to provide an overview of such URLs, Kubernetes resources and Health check endpoints derived from a cross-site setup of Keycloak.
Overview
A Proactive monitoring strategy, aiming to detect and alert on issues before they impact users is key for a highly resilient and highly available Keycloak application.
Health checks across various architectural components (like application health, load balancing, caching, and overall system status) are critical for:
-
Ensuring High Availability: By verifying that all sites and the load balancer are operational, it helps guarantee that the system can handle requests even if one site goes down.
-
Maintaining Performance: Checking the health and distribution of the Infinispan cache ensures that Keycloak can maintain optimal performance by efficiently handling sessions and other temporary data.
-
Operational Resilience: By continuously monitoring the health of both Keycloak and its dependencies within the OpenShift environment, the system can quickly identify and possibly auto-remediate issues, reducing downtime.
How to set it up
Prerequisite:
-
Install jq if it is not available on your OS already.
More on the specific Health Checks
Keycloak Load Balancer and Sites
Verifies the health of the Keycloak application through its load balancer and both primary and backup sites. This ensures that Keycloak is accessible and that the load balancing mechanism is functioning correctly across different geographical or network locations.
This command returns the health status of the Keycloak application’s connection to its configured database, thus confirming the reliability of db connections.
This is only available on the management port, and not available from the external URL.
In a Kubernetes setup, the sub-status health/ready
is checked periodically to make the Pod as ready.
curl -s https://keycloak:managementport/health
This command verifies the lb-check
endpoint of the load balancer and ensures the Keycloak application cluster is up and running.
curl -s https://keycloak-load-balancer-url/lb-check
These commands will return the running status of the Site A and Site B of the Keycloak in a cross-site setup (be it Active/Passive or Active/Active).
curl -s https://keycloak_site_a_url/lb-check
curl -s https://keycloak_site_b_url/lb-check
Infinispan Cache Health
Check the health of the default cache manager and individual caches in an external Infinispan cluster. This is vital for Keycloak performance and reliability, as Infinispan is often used for distributed caching and session clustering in Keycloak deployments.
This command returns the overall health of the Infinispan cache manager, this is useful as the Admin user doesn’t need to provide user credentials to get the health status.
curl -s https://infinispan_rest_url/rest/v2/cache-managers/default/health/status
Whereas for these health checks, Admin user needs to provide the infinispan user credentials as part of the request to peek into the overall health of the external Infinispan cluster caches.
curl -u <infinispan_user>:<infinispan_pwd> -s https://infinispan_rest_url/rest/v2/cache-managers/default/health \
| jq 'if .cluster_health.health_status == "HEALTHY" and (all(.cache_health[].status; . == "HEALTHY")) then "HEALTHY" else "UNHEALTHY" end'
Optionally you can run the above command without the jq
filter 'if .cluster_health.health_status == "HEALTHY" and (all(.cache_health[].status; . == "HEALTHY")) then "HEALTHY" else "UNHEALTHY" end'
to see the full details, this filter is a convenience to compute the overall health based on the individual cache health.
Infinispan Cluster Distribution
Assesses the distribution health of the Infinispan cluster, ensuring that the cluster’s nodes are correctly distributing data. This step is essential for the scalability and fault tolerance of the caching layer.
You can modify the expectedCount 3
argument to match the total nodes in the cluster and validate if they are healthy or not.
curl <infinispan_user>:<infinispan_pwd> -s https://infinispan_rest_url/rest/v2/cluster\?action\=distribution \
| jq --argjson expectedCount 3 'if map(select(.node_addresses | length > 0)) | length == $expectedCount then "HEALTHY" else "UNHEALTHY" end'
Overall Infinispan System Health
Uses OpenShift’s CLI tool to query the health status of Infinispan clusters and the Keycloak service in the specified namespace. This comprehensive check ensures that all components of the Keycloak deployment are operational and correctly configured within the OpenShift environment.
oc get infinispan -n <NAMESPACE> -o json \
| jq '.items[].status.conditions' \
| jq 'map({(.type): .status})' \
| jq 'reduce .[] as $item ([]; . + [keys[] | select($item[.] != "True")]) | if length == 0 then "HEALTHY" else "UNHEALTHY: " + (join(", ")) end'
Keycloak Readiness in Openshift
Specifically, checks for the readiness and rolling update conditions of Keycloak deployments in Red Hat OpenShift, ensuring that the Keycloak instances are fully operational and not undergoing updates that could impact availability.
oc wait --for=condition=Ready --timeout=10s keycloaks.k8s.keycloak.org/keycloak -n <NAMESPACE>
oc wait --for=condition=RollingUpdate=False --timeout=10s keycloaks.k8s.keycloak.org/keycloak -n <NAMESPACE>
Optional Bash script
You can use the cross-site-health-checks.sh script and extend it to perform the necessary checks and integrate this into your monitoring architecture.
To run the script, as a pre-requisite, you need to establish a session from your terminal to the target OCP cluster with a command.
oc login --token=sha256~masked-key --server=https://api.gh-keycloak-a.masked.openshiftapps.com:6443
Also note, it would be necessary to run this script against all the clusters in the cluster group,
so the Administrator would have to repeat the above oc
login command for all the clusters,
which could be automated.
To run the script itself once you have an active oc
session below is an example usage.
./cross-site-health-checks.sh \
-n runner-keycloak \
-l <KEYCLOAK_LB_URL> \
-k <KEYCLOAK_SITE_URL> \
-i <KEYCLOAK_ISPN_REST_URL> \
-u developer \
-p <ISPN_REST_URL_PWD> \
-c 3
Verify the Keycloak Load Balancer health check
Checking health for: KEYCLOAK_LB_URL/lb-check
"HEALTHY"
Verify the Load Balancer health check on the Site
Checking health for: KEYCLOAK_SITE_URL/lb-check
"HEALTHY"
Verify the default cache manager health in external ISPN
Checking health for: KEYCLOAK_ISPN_REST_URL/rest/v2/cache-managers/default/health/status
"HEALTHY"
Verify individual cache health
"HEALTHY"
ISPN Cluster Distribution
"HEALTHY"
ISPN Overall Status
"HEALTHY"
Verify for Keycloak condition in ROSA cluster
keycloak.k8s.keycloak.org/keycloak condition met
keycloak.k8s.keycloak.org/keycloak condition met
Usage of the script with details around the different options
Usage: [-n namespace] [-l keycloak_lb_url] [-k keycloak_site_url]
[-i infinispan_rest_url] [-u infinispan_user] [-p infinispan_pwd]
[-c expected_ispn_count]
-n namespace: Kubernetes namespace
-l keycloak_lb_url: Keycloak Load Balancer URL
-k keycloak_site_url: Keycloak Site URL
-i infinispan_rest_url: Infinispan REST URL
-u infinispan_user: Infinispan user
-p infinispan_pwd: Infinispan password
-c expected_ispn_count: Expected Node Count in the Infinispan cluster