AWS Route 53 as load balancer for ROSA
This guide describes how to achieve automatic client failover for a Keycloak deployment when a given site fails.
Route 53 for Client Failover
To provide client failover, we can leverage AWS Route 53's DNS failover capabilities to automatically re-route traffic when the primary site is down. A health check on AWS checks every 30 seconds if a site is responding, and the DNS as seen by the client will update accordingly.
The route53_create.sh script used in the procedure below generates a subdomain name, with three further host entries, under our root domain keycloak-benchmark.com.
primary.<generated-subdomain>.keycloak-benchmark.com - Subdomain for Keycloak site 1
backup.<generated-subdomain>.keycloak-benchmark.com - Subdomain for Keycloak site 2
client.<generated-subdomain>.keycloak-benchmark.com - Subdomain used by Keycloak clients, which will automatically fail over from site 1 to site 2 in the event of a failure.
Those DNS entries are registered with the OpenShift clusters so that they respond to requests for those host names. After the setup, the Keycloak deployment is updated to use the new hostnames.
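As a rough illustration of how such a failover setup is expressed in Route 53, the following Python sketch builds the two request payloads involved: a health check that probes a site every 30 seconds, and a PRIMARY/SECONDARY pair of failover records for the client hostname. The helper names, HTTPS on port 443, the failure threshold, the CNAME record type, and the 60-second TTL are assumptions made for illustration. The project's own script may create the records differently, for example as alias records.

```python
import json


def health_check_config(hostname: str, path: str = "/lb-check") -> dict:
    """Build a HealthCheckConfig payload for Route 53's CreateHealthCheck call."""
    return {
        "Type": "HTTPS",           # assumption: the sites are served over TLS
        "FullyQualifiedDomainName": hostname,
        "Port": 443,               # assumption
        "ResourcePath": path,
        "RequestInterval": 30,     # seconds between probes (Route 53 accepts 10 or 30)
        "FailureThreshold": 1,     # assumption: one failed probe marks the site unhealthy
    }


def failover_change_batch(domain: str, primary_health_check_id: str, ttl: int = 60) -> dict:
    """Build a ChangeBatch with a PRIMARY/SECONDARY failover pair for client.<domain>."""

    def record(role: str, target: str) -> dict:
        rrset = {
            "Name": f"client.{domain}",
            "Type": "CNAME",
            "SetIdentifier": f"client-{role.lower()}",
            "Failover": role,
            "TTL": ttl,
            "ResourceRecords": [{"Value": target}],
        }
        if role == "PRIMARY":
            # Route 53 answers with the SECONDARY record while this check is unhealthy.
            rrset["HealthCheckId"] = primary_health_check_id
        return {"Action": "UPSERT", "ResourceRecordSet": rrset}

    return {
        "Changes": [
            record("PRIMARY", f"primary.{domain}"),
            record("SECONDARY", f"backup.{domain}"),
        ]
    }


if __name__ == "__main__":
    # Both arguments are placeholders, not values produced by the project.
    batch = failover_change_batch("example.keycloak-benchmark.com", "hc-primary-id")
    print(json.dumps(batch, indent=2))
```

With boto3, payloads of this shape could be passed to `create_health_check(CallerReference=..., HealthCheckConfig=...)` and `change_resource_record_sets(HostedZoneId=..., ChangeBatch=...)` on a `route53` client.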
See below for the newly created elements (green) and the updated elements (yellow).
To ensure that a failed Primary site cannot be marked healthy without user input, e.g. in the event of an automated ROSA cluster restart, we create an AWS SNS topic which fires an event whenever the Primary health check fails. This topic is used to trigger an AWS Lambda function which updates the health check to point to a non-existent endpoint, /lb-check-failed-over.
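The mechanism above could be implemented along the following lines. This is a minimal sketch, not the project's actual Lambda function: the `HEALTH_CHECK_ID` environment variable and the optional `client` parameter are hypothetical, and the handler ignores the content of the SNS message. `update_health_check` is the boto3 Route 53 call that changes a health check's `ResourcePath`.

```python
import os

FAILED_OVER_PATH = "/lb-check-failed-over"  # non-existent endpoint, so the check keeps failing


def pin_health_check_unhealthy(client, health_check_id: str) -> None:
    """Point the health check at a path that never responds successfully."""
    client.update_health_check(
        HealthCheckId=health_check_id,
        ResourcePath=FAILED_OVER_PATH,
    )


def handler(event, context, client=None):
    """Lambda entry point, invoked via the SNS topic when the Primary health check fails."""
    if client is None:
        import boto3  # provided by the AWS Lambda Python runtime

        client = boto3.client("route53")
    # Assumption: the health check ID is supplied through an environment variable.
    health_check_id = os.environ["HEALTH_CHECK_ID"]
    pin_health_check_unhealthy(client, health_check_id)
    return {"healthCheckId": health_check_id, "resourcePath": FAILED_OVER_PATH}
```

Accepting the client as a parameter keeps the handler easy to exercise locally with a stub instead of a real AWS connection.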
In order to fail back from the Backup to the Primary cluster, the health check must be manually updated back to /lb-check.
Set up a new Route 53 failover
Prerequisites
-
A Route 53 Hosted Zone for your domain
Note: A hosted zone already exists for keycloak-benchmark.com.
Procedure
-
Create two ROSA clusters
-
Create subdomain records and health checks
PRIMARY_CLUSTER=<name-rosa-cluster> \
BACKUP_CLUSTER=<name-of-rosa_cluster> \
./provision/aws/route53/route53_create.sh
Note down the domain and URLs generated by the script for the following steps. The generated part of the subdomain name allows for multiple Keycloak instances in the different clusters.
Domain: <generated-subdomain>.keycloak-benchmark.com
Client Site URL: client.<generated-subdomain>.keycloak-benchmark.com
Primary Site URL: primary.<generated-subdomain>.keycloak-benchmark.com
Backup Site URL: backup.<generated-subdomain>.keycloak-benchmark.com
-
Deploy Keycloak as normal, but with the following environment variables set.
-
Primary cluster:
KC_HOSTNAME_OVERRIDE=client.<generated-subdomain>.keycloak-benchmark.com # Hostname used by clients
KC_HEALTH_HOSTNAME=primary.<generated-subdomain>.keycloak-benchmark.com # Hostname used by AWS health checks
-
Backup cluster:
KC_HOSTNAME_OVERRIDE=client.<generated-subdomain>.keycloak-benchmark.com # Hostname used by clients
KC_HEALTH_HOSTNAME=backup.<generated-subdomain>.keycloak-benchmark.com # Hostname used by AWS health checks
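To make the relationship between the two sets of variables explicit, the sketch below derives them from the generated domain. The `site_env` helper is hypothetical and not part of the project. It only shows that both sites share the client hostname, while each site has its own health check hostname.

```python
def site_env(domain: str, site: str) -> dict:
    """Return the Keycloak environment variables for one site ("primary" or "backup")."""
    if site not in ("primary", "backup"):
        raise ValueError(f"unknown site: {site!r}")
    return {
        # Identical on both sites: the hostname used by clients.
        "KC_HOSTNAME_OVERRIDE": f"client.{domain}",
        # Site-specific: the hostname probed by the AWS health check.
        "KC_HEALTH_HOSTNAME": f"{site}.{domain}",
    }


if __name__ == "__main__":
    domain = "example.keycloak-benchmark.com"  # placeholder for the generated domain
    for site in ("primary", "backup"):
        for name, value in site_env(domain, site).items():
            print(f"{site}: {name}={value}")
```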
Testing Failover
To test failover from the primary to the backup site, do the following:
-
Verify that client.<generated-subdomain>.keycloak-benchmark.com connects to the primary site:
./provision/aws/route53/route53_test_primary_used.sh <generated-subdomain>.keycloak-benchmark.com; echo $?
The script returns 0 if the client. subdomain is pointing to the same IP as the primary. subdomain.
This script will fail if the PRIMARY_CLUSTER and BACKUP_CLUSTER are set to the same ROSA cluster.
-
Log in to the primary ROSA cluster and delete the aws-health-route Route from the keycloak namespace.
-
Wait for about 30 seconds for the health checks to determine that primary.<generated-subdomain>.keycloak-benchmark.com is no longer healthy. This can be confirmed by inspecting the health check in the AWS console.
-
Execute the script from the first step again. An exit code of 1 should be returned.
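The check performed in the first and last steps can be approximated in Python as follows. This is a sketch under stated assumptions, not the implementation of route53_test_primary_used.sh: the function names are made up for illustration, "same IP" is interpreted as sharing at least one IPv4 address, and name resolution goes through the local resolver, so cached DNS answers may delay the observed switch.

```python
import socket


def resolve_ips(hostname: str) -> set:
    """Resolve a hostname to the set of IPv4 addresses seen by the local resolver."""
    _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    return set(addresses)


def primary_used(domain: str, resolve=resolve_ips) -> bool:
    """Return True if client.<domain> shares at least one IP with primary.<domain>."""
    client_ips = resolve(f"client.{domain}")
    primary_ips = resolve(f"primary.{domain}")
    return bool(client_ips & primary_ips)
```

Mapping `True` to exit code 0 and `False` to exit code 1 would mirror the convention described above. A comparison of this kind cannot tell the sites apart when both resolve to the same addresses.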
Testing Failback
To fail back from the Backup to the Primary cluster, perform the following steps:
-
Recreate the aws-health-route Route on the PRIMARY_CLUSTER.
-
Update the ResourcePath of the Primary cluster's Route 53 health check to /lb-check.
Once both actions have been taken, the health check will eventually pass and the client. record will revert to routing requests to the primary cluster.
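The health check update can also be made through the Route 53 API rather than the console. The following is a minimal sketch assuming boto3 and a known health check ID. The `restore_health_check_path` helper is hypothetical; `get_health_check` and `update_health_check` are the boto3 Route 53 calls it relies on.

```python
HEALTHY_PATH = "/lb-check"


def restore_health_check_path(client, health_check_id: str) -> bool:
    """Reset the Primary health check's ResourcePath; return True if it was changed."""
    current = client.get_health_check(HealthCheckId=health_check_id)
    config = current["HealthCheck"]["HealthCheckConfig"]
    if config.get("ResourcePath") == HEALTHY_PATH:
        return False  # already probing the real endpoint, nothing to do
    client.update_health_check(
        HealthCheckId=health_check_id,
        ResourcePath=HEALTHY_PATH,
    )
    return True
```

With AWS credentials configured, it could be called as `restore_health_check_path(boto3.client("route53"), "<health-check-id>")`, where the ID is that of the Primary site's health check.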