DEV Community

Mohammed Ammer

Optimizing Keycloak Caches: Best Practices for Embedded and External Infinispan

Although setting up Keycloak is relatively straightforward, regardless of your infrastructure's complexity, optimizing its performance for your specific workload can be challenging.

One common approach is to use an external Infinispan cluster with database persistence to store sessions outside of Keycloak, at least until version 26 makes the user session persistence feature (introduced as a preview in Keycloak 25) a permanent part of Keycloak.

Basic Keycloak Setup with External Infinispan Cluster and Database persistence

In-Memory Cache

The biggest challenge many encounter with "in-memory" caches is managing memory usage. It's crucial for applications to have defined memory limits; otherwise, you risk facing endless issues, including memory overconsumption and frequent application crashes.

At this point, we can identify two types of caches:

  • External Infinispan Cache: This operates independently of Keycloak but is used by Keycloak as a remote cache.
  • Embedded Infinispan Cache: Managed within Keycloak, this cache allows data (e.g. sessions) to be shared across Keycloak nodes/containers, reducing the need for frequent reads from the external cache.

Let's start with the external Infinispan cache, as it's crucial for safeguarding sessions against any unexpected issues in the Keycloak cluster.

External Infinispan Cache

As mentioned earlier, using an external Infinispan cache solves part of the problem, but since it still stores data in memory, there's a risk of data loss. Adding persistent storage (e.g., a database) ensures the data remains safe.

Limit the memory size

With data persisted in the database, you can set a smaller memory limit in the Infinispan cache configuration and run it on smaller, cheaper resources. However, you must size the database for your load, since less cache memory means more reads hitting the database.
See "Configuring maximum count eviction" in the Infinispan documentation.

<distributed-cache>
  <memory max-count="10000" when-full="REMOVE"/>
</distributed-cache>

Avoid cache Preload

When preload is enabled, a starting Infinispan instance loads sessions from the database into the cache, even if the cache is not currently in use, up to the max-count configured for each cache. To get "lazy loading", where Infinispan loads data only when needed, set the preload option to false. Additionally, ensure that the shared flag is set to true, since all Infinispan nodes use the same database.
See "Configuring Persistence Store" in the Infinispan documentation.

<distributed-cache  name="sessions">
    <persistence>
        <string-keyed-jdbc-store
            xmlns="urn:infinispan:config:store:jdbc:15.0" preload="false" shared="true" dialect="POSTGRES">
            <string-keyed-table prefix="EXT" create-on-start="true" drop-on-exit="false">
                <id-column name="id" type="VARCHAR(255)"/>
                <data-column name="data" type="BYTEA"/>
                <timestamp-column name="timestamp" type="BIGINT"/>
                <segment-column name="segment" type="INT"/>
            </string-keyed-table>
        </string-keyed-jdbc-store>
    </persistence>
</distributed-cache>

Disable the statistics

Unfortunately, querying statistics in Infinispan can lead to numerous executions of the following query:

SELECT COUNT(*) FROM "kc_sessions" WHERE timestamp < 0 OR timestamp > $1

As the amount of data in your sessions table increases, this query becomes slower and increasingly CPU-intensive for the database. If you're running on a burstable AWS RDS instance, this can exhaust your CPU credits, leaving the database, and with it your cluster, unable to handle incoming traffic effectively.

Does this mean you won’t be able to monitor metrics for the sessions table? Not at all.

If you enable Keycloak metrics, you can still track the number of entries in the external Infinispan cache database through the vendor_cache_store_number_of_persisted_entries metric.

To disable statistics:

<distributed-cache name="sessions" statistics="false">
</distributed-cache>


Limit owners

To conserve memory in your Infinispan instance, you can keep the number of owners (the copies of each session held across the cluster) to the minimum you need. As long as your database can handle I/O operations within an acceptable timeframe, this approach can be effective.

<distributed-cache name="sessions" owners="2">
</distributed-cache>


State transfer

State transfer is the process by which data is moved between nodes in a distributed cache cluster. This mechanism is essential for ensuring that all nodes in the cluster have a consistent view of the cache's data, especially when nodes are added, removed, or rebalanced.

As long as preload is set to false, the initial state transfer matters little, since there is no preloaded data to consider anyway.

<distributed-cache name="sessions">
    <state-transfer timeout="60000" await-initial-transfer="true"/>
</distributed-cache>
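
Putting the external cache settings together, the "sessions" cache could look roughly like the sketch below. It only combines the snippets shown in the sections above with the same example values. The element order and the omitted JDBC connection settings are assumptions you should check against the schema of your Infinispan version.

```xml
<!-- Sketch only: the external "sessions" cache with the settings from the
     sections above combined. All values are the article's example values. -->
<distributed-cache name="sessions" owners="2" statistics="false">
    <!-- Cap on in-memory entries; evicted entries stay in the database. -->
    <memory max-count="10000" when-full="REMOVE"/>
    <state-transfer timeout="60000" await-initial-transfer="true"/>
    <persistence>
        <!-- preload="false": load sessions lazily instead of at startup.
             shared="true": every Infinispan node uses the same database.
             Connection settings are omitted here, as in the snippet above. -->
        <string-keyed-jdbc-store
            xmlns="urn:infinispan:config:store:jdbc:15.0" preload="false" shared="true" dialect="POSTGRES">
            <string-keyed-table prefix="EXT" create-on-start="true" drop-on-exit="false">
                <id-column name="id" type="VARCHAR(255)"/>
                <data-column name="data" type="BYTEA"/>
                <timestamp-column name="timestamp" type="BIGINT"/>
                <segment-column name="segment" type="INT"/>
            </string-keyed-table>
        </string-keyed-jdbc-store>
    </persistence>
</distributed-cache>
```

Treat the numbers as starting points: a lower max-count saves memory on the Infinispan nodes but shifts more reads to the database.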

Embedded Infinispan Cache

The embedded Infinispan cache in Keycloak is an internal, local cache used to store session and configuration data to improve performance and reduce database load; however, improper configuration can lead to memory issues for Keycloak.

Limit the memory size

By default, the embedded Infinispan cache managed by Keycloak does not impose a limit on the number of stored sessions. This means that if sessions are configured to last for extended periods, such as 3 months or a year, they will remain in Keycloak until memory is exhausted. Once that happens, Keycloak can become unstable or crash.

Consider whether you truly need the embedded Infinispan cache if you already have an external cache in place. While I wouldn’t recommend disabling the embedded cache entirely, setting hard limits can help manage memory usage and prevent problems.

For example, if you're handling around 5,000 sessions at a time, setting a limit of 10,000 or even 50,000 sessions in memory should be reasonable. Once the embedded cache reaches its limit, it will evict sessions, but only from the embedded cache, not the external cache.

When a session is needed and has been evicted from the embedded cache, Keycloak will reload it from the external cache.

To ensure this setup works, you'll need to configure the embedded cache settings in Keycloak appropriately.

<infinispan
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="urn:infinispan:config:15.0 http://www.infinispan.org/schemas/infinispan-config-15.0.xsd"
    xmlns="urn:infinispan:config:15.0">

    <cache-container name="keycloak">
        ...
        <distributed-cache name="sessions">
            <memory max-count="50000"/>
        </distributed-cache>
        ...
    </cache-container>
</infinispan>

Limit owners

Since sessions are stored in the external cache, which typically uses two owners, having just one owner in the embedded cache is usually sufficient. As long as the traffic to the external cache remains manageable for your system, setting the embedded cache to one owner per session should work well.

<infinispan
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="urn:infinispan:config:15.0 http://www.infinispan.org/schemas/infinispan-config-15.0.xsd"
    xmlns="urn:infinispan:config:15.0">
    <cache-container name="keycloak">
        ...
        <distributed-cache name="sessions" owners="1">
        </distributed-cache>
        ...
    </cache-container>
</infinispan>
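
Both embedded cache settings can live on the same cache definition. The following is a minimal sketch that merges the two snippets above using the same example values. Everything else in your Keycloak cache configuration file is assumed to stay unchanged.

```xml
<infinispan
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="urn:infinispan:config:15.0 http://www.infinispan.org/schemas/infinispan-config-15.0.xsd"
    xmlns="urn:infinispan:config:15.0">
    <cache-container name="keycloak">
        <!-- Other caches and settings are left as they are in your existing file. -->
        <distributed-cache name="sessions" owners="1">
            <!-- Upper bound on sessions kept in Keycloak's memory; evicted
                 sessions are reloaded from the external cache when needed. -->
            <memory max-count="50000"/>
        </distributed-cache>
    </cache-container>
</infinispan>
```

If the embedded limit turns out to be too low for your traffic, you would expect to see more reads against the external cache, so watch that load after changing the value.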

Keep an eye on Keycloak API performance

While this topic is a bit advanced, it could be quite relevant depending on your needs.

Consider a scenario where you have a custom user attribute, such as globalUserId, which serves as the primary identifier for users within your ecosystem (distinct from the Keycloak user ID, which is a UUID).

In this case, if you need to perform operations on a Keycloak user, you would first need to retrieve the Keycloak User ID using your globalUserId. This process can introduce performance bottlenecks.

Keycloak uses JPA to manage its entities. When you use the official Keycloak API to query users by attributes with a request like:

GET /{realm}/users?q=globalUserId:value

the underlying database query executed is:

SELECT ue1_0.ID, ue1_0.CREATED_TIMESTAMP, ue1_0.EMAIL, ue1_0.EMAIL_CONSTRAINT, ue1_0.EMAIL_VERIFIED, ue1_0.ENABLED, ue1_0.FEDERATION_LINK, ue1_0.FIRST_NAME, ue1_0.LAST_NAME, ue1_0.NOT_BEFORE, ue1_0.REALM_ID, ue1_0.SERVICE_ACCOUNT_CLIENT_LINK, ue1_0.USERNAME 
FROM USER_ENTITY ue1_0 
LEFT JOIN USER_ATTRIBUTE a1_0 ON ue1_0.ID = a1_0.USER_ID 
WHERE a1_0.NAME = $1 
AND LOWER(a1_0.VALUE) = $2 
AND ue1_0.REALM_ID = $3 
ORDER BY ue1_0.USERNAME 
OFFSET $4 ROWS 
FETCH FIRST $5 ROWS ONLY

As shown, this query joins the USER_ENTITY table with the USER_ATTRIBUTE table to fetch the entire user entity. For a unique custom attribute, even with a small user base (fewer than a thousand users), this query can take up to one second to retrieve user information, including the Keycloak UUID.

Unfortunately, Keycloak’s Admin API does not offer a way to query directly for the UUID from the USER_ATTRIBUTE table.

To address this, you might consider creating a custom Service Provider Interface (SPI) to build an endpoint that queries only the user attributes and returns the Keycloak user IDs associated with the queried attribute.

There are many resources available online for creating a custom API in Keycloak using SPI; RealmResourceProviderFactory is a good place to start. If you’d like a detailed guide on this process, feel free to ask for a blog post!

I hope you find it useful!

Top comments (2)

Carlos Assub

Nice topic.
I had a lot of problems with Keycloak after upgrading from v11 to v22 with the Infinispan local cache. After a while under heavy load, Infinispan hit an error and all the other pods became unhealthy with Infinispan timeouts.

Mohammed Ammer

I hope you were able to get around it and make it stable again. Hoping also my post makes a difference.