Where parallels cross

Interesting bits of life

Datomic: a little snippet to analyze what attributes your transactions change most often

Too long; didn't read

Use Datomic's log index to find out which attributes have been transacted most often.

The problem

I picked up an interesting task at work. We use Datomic as our main database. It is one of a kind among databases: you query it with a Prolog-like syntax (Datalog), and it is very powerful even though it is fundamentally based on tuples. Anyway, I am still pretty new to it and my task was to review its performance.

The bottleneck of Datomic is its transactor: a piece of software dedicated to persisting data. If you transact too often, the writes have to queue, slowing down your response time.

So our team's question was: are we using our transactor wisely? In more useful terms, I wanted to discover the groups of Datomic attributes that our transactions changed most often. Knowing those groups of attributes, I would be able to find the bits of code that cause extra writes and look for ways to improve the user experience.

And there is a solution

Luckily, the power of Clojure and the Datomic API made this a simple task to solve. All you need is Datomic's log API. This API accesses Datomic's Log index, which lists all the transactions the transactor has worked on (for a useful article about Datomic internals, see https://tonsky.me/blog/unofficial-guide-to-datomic-internals/).

The API offers a tx-range function that returns all the transactions that happened between two instants. Each transaction carries the datoms it wrote, and from those you can retrieve the attributes it transacted. If you extract a list of those for every transaction, you can calculate a distribution with the built-in Clojure function frequencies.
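The counting step can be tried on its own, without a database. The following is a minimal sketch on made-up data: the attribute names are hypothetical, and each inner list stands for the attributes of one transaction.

```clojure
;; Hypothetical data: one list of attribute keywords per transaction.
(def sample-transactions
  ['(:db/txInstant :user/name)
   '(:db/txInstant :user/name)
   '(:db/txInstant :order/status :order/total)
   '(:db/txInstant :user/name)])

;; frequencies counts identical attribute groups;
;; sorting by the negated count puts the most frequent group first.
(def group-counts
  (->> sample-transactions
       frequencies
       (sort-by (comp - second))))

group-counts
;; => ([(:db/txInstant :user/name) 3] [(:db/txInstant :order/status :order/total) 1])
```

Note that the whole group of attributes is the key here, so two transactions count together only if they touched exactly the same attributes in the same order.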

This is the snippet I used:

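(require '[datomic.api :as d])

(let [conn (d/connect "some-datomic-url")
      ;; take one database value up front and reuse it for every lookup
      db   (d/db conn)
      ;; a datom's :a is an integer id; d/ident resolves it to its keyword
      get-attribute-name (fn [datom] (d/ident db (:a datom)))]
  (->> (d/tx-range (d/log conn) #inst "2022-03-03" #inst "2022-03-04")
       ;; one list of attribute names per transaction
       (map #(map get-attribute-name (:data %)))
       ;; count how often each group of attributes occurs
       frequencies
       ;; most frequent groups first
       (sort-by (comp - second))))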

The d/ident call is necessary to make a transaction's attributes readable: in the transaction's :data field each attribute is kept as an identifying integer, while its corresponding keyword is what I needed (to search our codebase).

Also note that I sort the frequencies in descending order, so that I can see the most transacted (and problematic) groups first.

Using Emacs and the fantastic Cider, I can just run cider-inspect at the end of that snippet (given I have a Cider session running in that buffer) and inspect its result (after a little bit of waiting).

What you would see in Cider looks like the following (in Cider I can click on text to inspect things further).

Class: clojure.lang.ArraySeq
Contents: 
  0. ( :db/txInstant :dlog.checkpoint/tx-t :dlog.checkpoint/tx-t :tx/system ) 259685
  1. ( :db/txInstant :attribute1 :attribute2 ) 69322
  2. ( :db/txInstant :some-attribute1 :some-attribute2 :some-attribute3 ... ) 59239
  3. ( :db/txInstant :some-other-attribute ) 41077
  ...
  Page size: 32, showing page: 1 of 33

In the original of this (highly edited) example, I discovered that :some-other-attribute should not have been updated so often! A simple analysis like this can significantly improve Datomic performance and so reduce latency for users.
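If the groups are too fine-grained, one possible follow-up is to total the counts per individual attribute. This is only a sketch and not part of my original analysis: attribute-totals is a hypothetical helper, and the input below is made-up data in the same shape as the snippet's result (pairs of attribute group and count).

```clojure
;; Hypothetical input: [attribute-group count] pairs with made-up counts.
(def sample-group-counts
  [['(:db/txInstant :attribute1 :attribute2) 5]
   ['(:db/txInstant :some-other-attribute) 3]])

(defn attribute-totals
  "Sums, for each attribute, the counts of all groups that contain it.
  Returns [attribute total] pairs, highest total first."
  [group-counts]
  (->> group-counts
       ;; distinct: an attribute repeated within one group is counted once
       (mapcat (fn [[attrs n]] (for [a (distinct attrs)] [a n])))
       (reduce (fn [totals [a n]] (update totals a (fnil + 0) n)) {})
       (sort-by (comp - val))))

(first (attribute-totals sample-group-counts))
;; => [:db/txInstant 8]
```

Since :db/txInstant is part of every transaction, it always comes out on top; the interesting entries are the ones right after it.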

That's it! Simple, no?

Conclusion

So, after all your querying of Datomic, keep an eye on its performance! You can use the Log API to discover the transactor's update patterns and improve your code.

Happy analysing!
