Exploring the core.cache api
I was reading through the code of core.cache and wanted to figure out the "correct" way to use it. The protocol for it is a bit large so I wasn't sure what the best way to check/hit/miss was.
(defprotocol CacheProtocol
"This is the protocol describing the basic cache capability."
(lookup [cache e]
[cache e not-found]
"Retrieve the value associated with `e` if it exists, else `nil` in
the 2-arg case. Retrieve the value associated with `e` if it exists,
else `not-found` in the 3-arg case.")
(has? [cache e]
"Checks if the cache contains a value associated with `e`")
(hit [cache e]
"Is meant to be called if the cache is determined to contain a value
associated with `e`")
(miss [cache e ret]
"Is meant to be called if the cache is determined to **not** contain a
value associated with `e`")
(evict [cache e]
"Removes an entry from the cache")
(seed [cache base]
"Is used to signal that the cache should be created with a seed.
The contract is that said cache should return an instance of its
own type."))
The docs in core.cache mention the common usage pattern:
(if (cache/has? C :c) ;; has? checks that the cache contains an item
(cache/hit C :c) ;; hit returns a cache with any relevant internal information updated
(cache/miss C :c 42)) ;; miss returns a new cache with the new item and without evicted entries
;=> {:a 1, :b 2, :c 42}
Using the
has?/hit/miss\
pattern ensures that the thresholding and eviction logic for each implementation works properly. It is built into thethrough\
andthrough-cache\
functions, as well asclojure.core.cache.wrapped/lookup-or-miss\
. *Avoid this pattern at your own risk.*
This usage shows using a cache value but most usage in the wild will be based on some storage, usually an atom (although there are examples using mongo, sqlite, etc). It also shows hitting and missing but not the best way to do all of this and also get the value out.
I'm using the grep.app webapp to find usages in the wild. Search for usages of core.cache with https://grep.app/search?q=clojure.core.cache. The site grep.app is awesome and I want to contribute to the creator. You should use it to search for code if you don't already.
Examples
Stencil
Stencil is a mustache templating library using cache which is what metabase uses for its email templates in the search results from grep.app. It uses an api of get-in
and assoc-in
, notably never calling hit
(miss
is called by assoc
under the hood). This means that any caches which depend on hit
for stats won't work: notably LRUCache, LUCache.
(defn cache
"Given a template name (string), variant key (string), template source
(string), and optionally a parsed AST, and stores that entry in the
template cache. Returns the parsed template."
([template-name template-variant template-src]
(cache template-name template-variant template-src (parse template-src)))
([template-name template-variant template-src parsed-template]
(swap! parsed-template-cache
assoc-in [template-name template-variant]
(template-cache-entry template-src
parsed-template))
parsed-template))
,,,
(defn cache-get
"Given a template name, attempts to fetch the template with that
name from the template cache. If it is not in the cache, nil will
be returned. Single argument version gets the default variant."
([template-name]
(cache-get template-name :default))
([template-name template-variant]
(get-in @parsed-template-cache [template-name template-variant])))
A couple things stand out here. One, this uses get-in
and assoc-in
but the cache's don't seem to work that way. Second, by only using these two functions, hit
and miss
are not called so LU and LRU cache's can't compute frequency of use. This is especially bad since the default cache for stencil is an LRU cache:
;; The parsed template cache maps a template name to its parsed versions.
(def ^{:private true} parsed-template-cache
(atom (try
(require 'clojure.core.cache)
((resolve 'clojure.core.cache/lru-cache-factory) {})
(catch ExceptionInInitializerError _
(CoreCacheUnavailableStub_SeeReadme.))
(catch FileNotFoundException _
(CoreCacheUnavailableStub_SeeReadme.)))))
Some examples of how this caching strategy disobeys the eviction strategies:
(let [cache (atom (cache/lru-cache-factory {} :threshold 2))]
(swap! cache assoc-in [:a :b] 1)
(swap! cache assoc-in [:c :d] 2)
;; make [:a :b] the most recently used entry
(get-in @cache [:a :b])
(swap! cache assoc-in [:e :f] 3)
;; the [:a :b] value is not present
(get-in @cache [:a :b])
)
;; nil
(let [cache (atom (cache/lu-cache-factory {} :threshold 2))]
(swap! cache assoc-in [:a :b] 1)
(swap! cache assoc-in [:c :d] 2)
(get-in @cache [:a :b])
(get-in @cache [:c :d])
(get-in @cache [:c :d])
(get-in @cache [:c :d])
;; the cache now should only have :c and :e based on least used stats
(swap! cache assoc-in [:e :f] 3)
)
;; {:a {:b 1}, :e {:f 3}}
(let [cache (atom (cache/ttl-cache-factory {} :ttl 400))]
(swap! cache assoc-in [:a :b] 1)
(swap! cache assoc-in [:c :d] 2)
(swap! cache assoc-in [:e :f] 3)
(Thread/sleep 300)
(swap! cache assoc-in [:a :new-key] :should-last-500)
(Thread/sleep 300)
;; the [:a :b] key should be expired, the [:a :new-key] should be present
;; but the cache key is actualy just `:a`
(prn [(get-in @cache [:c :d]) (get-in @cache [:a :b]) (get-in @cache [:a :new-key])])
)
;; [nil 1 :should-last-500]
Rieman
Rieman uses a cache in the following way:
(def fun-cache
"Speeds up the compilation of queries by caching map of ASTs to corresponding
functions."
(atom (cache/lru-cache-factory {} :threshold 64)))
(defn fun
[ast]
(if-let [fun (cache/lookup @fun-cache ast)]
; Cache hit
(do (swap! fun-cache cache/hit ast)
fun)
; Cache miss
(let [fun (eval (list 'fn ['event] (clj-ast ast)))]
(swap! fun-cache cache/miss ast fun)
fun)))
This works well with the lru cache but actually NPEs on the lu cache. The problem is that the cache/hit
function for the lu cache expects that the result is in the cache if hit is called and increments the hit count. But with the pattern above, there's no way to guarantee that. The following should npe spectacularly in a repl:
(let [lu-cache (atom (c/lu-cache-factory {} :threshold 4))
thread-count 200
loop-count 20000
latch (java.util.concurrent.CountDownLatch. thread-count)]
(doseq [i (range thread-count)]
(.start (Thread. (fn []
(loop [c 0]
(if (< c loop-count)
(do
(if-let [v (c/lookup @lu-cache i)]
(swap! lu-cache c/hit i)
(swap! lu-cache c/miss i i))
(recur (inc c)))
(.countDown latch)))))))
(.await latch))
Exception in thread "Thread-1664" Exception in thread "Thread-1682" Exception in thread "Thread-1690" java.lang.NullPointerException
at clojure.lang.Numbers.ops(Numbers.java:1068)
at clojure.lang.Numbers.inc(Numbers.java:137)
at clojure.core$inc.invokeStatic(core.clj:927)
at clojure.core$inc.invoke(core.clj:922)
at clojure.lang.AFn.applyToHelper(AFn.java:154)
at clojure.lang.AFn.applyTo(AFn.java:144)
at clojure.core$apply.invokeStatic(core.clj:667)
at clojure.core$update_in$up__6853.invoke(core.clj:6185)
at clojure.core$update_in.invokeStatic(core.clj:6186)
...
Exception in thread "Thread-1851" Exception in thread "Thread-1792" java.lang.NullPointerException
Exception in thread "Thread-1739" java.lang.NullPointerException
Exception in thread "Thread-1863" java.lang.NullPointerException
Exception in thread "Thread-1840" java.lang.NullPointerException
Exception in thread "Thread-1850" java.lang.NullPointerException
Exception in thread "Thread-1825" java.lang.NullPointerException
Exception in thread "Thread-1859" java.lang.NullPointerException
Exception in thread "Thread-1740" java.lang.NullPointerException
Exception in thread "Thread-1743" java.lang.NullPointerException
Exception in thread "Thread-1741" java.lang.NullPointerException
Exception in thread "Thread-1849" Exception in thread "Thread-1742" java.lang.NullPointerException
Exception in thread "Thread-1839" Exception in thread "Thread-1729" java.lang.NullPointerException
Exception in thread "Thread-1862" java.lang.NullPointerException
Exception in thread "Thread-1845" java.lang.NullPointerException
...
This example also dereferences the cache twice, once to check if the value is present with lookup, and then again when doing the hit or miss functionality. This NPEs on the LU cache and seems less than optimal with the other caches.
clj-http
This repo includes an example of a caching middleware
(defn- cached-response
([client req]
(let [cache-key (str (:server-name req) (:uri req) "?" (:query-string req))]
(if (cache/has? @http-cache cache-key)
(do
(println "CACHE HIT")
(reset! http-cache (cache/hit @http-cache cache-key)) ; update cache stats
(cache/lookup @http-cache cache-key)) ; return cached value
; do not invoke further middleware
(do
(println "CACHE MISS")
(let [resp (update (client req) :body slurp-bytes)] ; middleware chain invoked
(when (and (http/success? resp) (= (:request-method req) :get))
(reset! http-cache (cache/miss @http-cache cache-key resp)) ; update cache value
resp)))))))
;; in essence
(if (cache/has? @cache k)
(do (swap! cache cache/hit k)
(cache/lookup @cache k))
(do (let [value (compute k)]
(swap! cache cache/miss k value)
value)))
This has a good aspect on the miss branch and a bad example on the hit branch. I'm omitting the reset!
usage as that's not great usage of the Atom api. The good aspect in the miss branch is that the value is computed, put into the cache, and then returned without looking the value up in the cache container. It is not guaranteed to be there. This error avoided in the miss branch is exemplified in the hit branch. The fact that one derefed cache value cache/has?\
an entry does not guarantee that lookup will succeed. This is seemingly true of all of the cache's besides the basic cache. And the LU bug should NPE here when hit is called and the cache doesn't have the value.
(doseq [[cache-type cache'] [#_[:lu (c/lu-cache-factory {} :threshold 4)]
[:lru (c/lru-cache-factory {} :threshold 4)]
[:ttl (c/ttl-cache-factory {} :ttl 20)]]]
(let [cache (atom cache')
thread-count 20
loop-count 20000
latch (java.util.concurrent.CountDownLatch. thread-count)
has-but-missing (atom 0)]
(doseq [i (range thread-count)]
(.start (Thread. (fn []
(loop [c 0]
(if (< c loop-count)
(do (if (c/has? @cache i)
(do (swap! cache c/hit i)
(if-let [v (c/lookup @cache i)]
::good
(swap! has-but-missing inc)))
(swap! cache c/miss i i))
(recur (inc c)))
(.countDown latch)))))))
(.await latch)
(println cache-type " has? but missing" @has-but-missing)))
:lru has? but missing 74
:ttl has? but missing 89
nil
Each of these is an example where the cache reported that it had a value and then subsequent access came back empty. This is a classic race condition. Using this pattern allows for the scenario where no value to computed and no value is retrieved from the cache.
Other examples of this:
A good example
https://github.com/datacrypt-project/hitchhiker-tree/blob/master/src/hitchhiker/redis.clj
(let [cache (-> {}
(cache/lru-cache-factory :threshold 10000)
atom)]
(defn totally-fetch
[redis-key]
(let [run (delay
(loop [i 0]
(if (= i 1000)
(do (println "total fail") (throw (ex-info "total fail" {:key redis-key})))
(let [x (wcar {} (car/get redis-key))]
(if x
x
(do (Thread/sleep 25) (recur (inc i))))))))
cs (swap! cache (fn [c]
(if (cache/has? c redis-key)
(cache/hit c redis-key)
(cache/miss c redis-key run))))
val (cache/lookup cs redis-key)]
(if val (<?? val) @run))))
The important part is separating when the cache is a value and when the cache is a mutable location. The hit and has?/miss is inside of the the swap! function. The cache returned from the swap is the one used to get the value. There's a nice function that encapsulates this:
(defn through-cache
"The basic hit/miss logic for the cache system. Like through but always has
the cache argument in the first position for easier use with swap! etc."
([cache item] (through-cache cache item default-wrapper-fn identity))
([cache item value-fn] (through-cache cache item default-wrapper-fn value-fn))
([cache item wrap-fn value-fn]
(if (clojure.core.cache/has? cache item)
(clojure.core.cache/hit cache item)
(clojure.core.cache/miss cache item (wrap-fn #(value-fn %) item)))))
I think this is the proper api of core.cache. Use through-cache
to get a new cache value and use that exclusively. The resulting cache mutable container is not guaranteed to have the value and cannot be relied on for lookup. This is of course standard interaction with an atom where multiple dereferences are bad. And of course the lookup-or-miss
function in clojure.core.cache.wrapped is probably what should always be used:
(defn lookup-or-miss
"Retrieve the value associated with `e` if it exists, else compute the
value (using value-fn, and optionally wrap-fn), update the cache for `e`
and then perform the lookup again.
value-fn (and wrap-fn) will only be called (at most) once even in the
case of retries, so there is no risk of cache stampede.
Since lookup can cause invalidation in some caches (such as TTL), we
trap that case and retry (a maximum of ten times)."
([cache-atom e value-fn]
(lookup-or-miss cache-atom e default-wrapper-fn value-fn))
([cache-atom e wrap-fn value-fn]
(let [d-new-value (delay (wrap-fn value-fn e))]
(loop [n 0
v (c/lookup (swap! cache-atom
c/through-cache
e
default-wrapper-fn
(fn [_] @d-new-value))
e
::expired)]
(when (< n 10)
(if (= ::expired v)
(recur (inc n)
(c/lookup (swap! cache-atom
c/through-cache
e
default-wrapper-fn
(fn [_] @d-new-value))
e
::expired))
v))))))
It seems to me that lookup-or-miss
is by far the best way to interact with an atom backed cache and is basically the only function that should be called. The CacheProtocol
is useful but invites the caller to do too much outside of a swap function.
clojure.core.cache.wrapped
The namespace clojure.core.cache.wrapped provides some helper functions for when operating on an atom stored cache. It provides dereferencing functions for has?
, hit
, miss
, evict
, and seed
. I think these functions are footguns and should almost never be used as they invite checking the cache and operating on a cache which doesn't have the value.
Operating on just a cache value (not an atom)
The protocol even invites using has?
and friends on a cache value rather than an atom, but even this is isn't safe. In the above examples, most of the bugs came from other threads hammering a cache in between a has?
check and then lookup
. However, even a value isn't safe in all cases. Most of the caches either have a value or don't so they are safe. But the TTL cache has a temporal aspect that can cause a has?
check to return true and then the subsequent lookup
to fail.
(doseq [[cache-type init] [[:lu (c/lu-cache-factory {} :threshold 4)]
[:ttl (c/ttl-cache-factory {} :ttl 50)]]]
(let [cache-atom (atom init)
thread-count 200
loop-count 20000
latch (java.util.concurrent.CountDownLatch. thread-count)
has-but-no-value (atom 0)]
(doseq [i (range thread-count)]
(.start (Thread. (fn []
(loop [c 0]
(if (< c loop-count)
(do
(let [cache-value @cache-atom]
(if (c/has? cache-value i)
;; this is just a value, so if it has, we can lookup?
(if-let [v (c/lookup cache-value i)]
::good
(swap! has-but-no-value inc))
(swap! cache-atom c/miss i i)))
(recur (inc c)))
(.countDown latch)))))))
(.await latch)
(println "has? but no value: " @has-but-no-value)))
has? but no value: 0
has? but no value: 224
nil
The lu cache always had a value when has?
returns true. But the ttl expired inbetween the has? and lookup checks on the value 224 times. Therefore, just using the CacheProtocol
on an arbitrary cache isn't safe.
So what if we adjust this and use the through-cache
function which uses the helpful logic that the resulting cache has the value?
(doseq [[cache-type init] [[:lu (c/lu-cache-factory {} :threshold 4)]
[:ttl (c/ttl-cache-factory {} :ttl 50)]]]
(let [cache-atom (atom init)
thread-count 200
loop-count 20000
latch (java.util.concurrent.CountDownLatch. thread-count)
has-but-no-value (atom 0)]
(doseq [i (range thread-count)]
(.start (Thread. (fn []
(loop [c 0]
(if (< c loop-count)
(do
(let [cache-value (swap! cache-atom c/through-cache i (constantly i))]
(if-let [v (c/lookup cache-value i)]
::good
(swap! has-but-no-value inc)))
(recur (inc c)))
(.countDown latch)))))))
(.await latch)
(println "has? but no value: " @has-but-no-value)))
has? but no value: 0
has? but no value: 50
nil
When using the lu, we of course always have a value. But the ttl cache is still susceptible to returning a cache that doesn't have the value. It would seem that the only good way to interact with the cache is with clojure.core.cache.wrapped/lookup-or-miss
.
As an addendum, when running the above comparisons and tracking the computations involved, we get the following numbers:
:lu
has? but no value: 0
computations: 19,566,746
:ttl
has? but no value: 45
computations: 565
and when run with the lookup-or-miss:
:lu
has? but no value: 0
computations: 7549
:ttl
has? but no value: 0
computations: 515
When we used the through-cache
function we computed our value 19 million times, but had we not cached we would only have computed it (* 20 20000)
= 400,000 times!. The contention to save the cached value dwarfed the number of times we even needed to check the value. The lookup-or-miss
has a helpful strategy to just give up after 10 times and return the value. I also saw this same story when using core.memoize which uses core.cache with some major modifications under the hood.
The moral of the story
A couple points:
- a core.cache is either a value or a mutable place which can give you a value. Use
clojure.core.cache.wrapped/lookup-or-miss
on an atom and be done with it. This should be the only function you use in most cases. - don't deref a cache (or in general, an atom) more than once. If you are, figure out how to use
lookup-or-miss
orthrough-cache
instead. - the api kinda invites this. I think the functions in
clojure.core.cache.wrapped
for has?, hit, miss, and evict are foot guns and should never be used. The has?, hit, and miss, functions in clojure.core.cache should only be used inside of a single swap! function. However the protocol invites usage like this and the bugs are subtle.
Most of the bugs explored above arise only under high contention on the cache. Most caches probably aren't under this load and the bugs won't manifest. However, as things scale or caching code goes untouched for a while, this source of bugs could be quite nasty to track down and might only occur under production loads.
The above bugs in stencil, rieman, and the clj-http example should be fixed and I'm going to suubmit PRs for them soon. I want to raise an issue about the LU cache NPE but I'm not sure if its a bug or not. The docstring states that it should be called when the cache is known to have the key, but as discussed this is simply not possible. Most caches handle a missing key just find but the LU cache expects the key there and it to have a usage count.
I don't write often so I hope that the tone of this is exploration and thinking, not pointing at bugs. I wrote this on a Sunday when I was reading the core.cache library. Special thanks to Sean Corfield who writes so much amazing software and even more docs.
Top comments (1)
Hey! Thanks for the insightful article! There is an interesting detail about how lookup-or-miss works: to actually get the value, the function performs a lookup after a miss or a hit. In there hides a dangerous assumption that a cache will always accept a new entry through the miss functionality. While that will probably is fine for the average use case, it fails for the edge cases where you may want a cache either of size 0 or a near-zero TTLs or maybe a custom eviction policy that won't always make space for the new entry. In the implementation of lookup-or-miss from the cache.wrapped namespace, if the the cache fails to accept a new entry, the return value of value-fn will never be returned to the caller, because a lookup will return nil.