
Overview of KahaDB

The KahaDB message store is a file-based persistence adapter that is optimized for maximum performance. The main features of KahaDB are as follows:

Example 2.1 shows the basic configuration of the KahaDB message store, where the KahaDB files are stored under the activemq-data directory and the maximum size of a journal file is limited to 32 megabytes.


By default, ActiveMQ uses the KahaDB message store to persist message data. The KahaDB message store is an embeddable, transactional message store that is fast and reliable. It is an evolution of the AMQ message store used by ActiveMQ 5.0 to 5.3. It uses a transactional journal to store message data and a B-tree index to store message locations for quick retrieval.

Figure 2.1 shows a high-level view of the KahaDB message store.


Messages are stored in file-based data logs. When all of the messages in a data log have been successfully consumed, the data log is marked as deletable. At a predetermined clean-up interval, logs marked as deletable are removed from the system.

Note

Message logs can also be archived.
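
The life cycle of a data log described above can be sketched as simple bookkeeping. The following is a minimal illustration, not KahaDB's actual implementation: the class DataLogCleanupSketch and all of its methods are hypothetical names, and it assumes that a log becomes deletable as soon as its count of unconsumed messages reaches zero.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch only: tracks how many unconsumed messages each data log holds. */
class DataLogCleanupSketch {
    // Data log id -> number of messages not yet consumed (hypothetical bookkeeping).
    private final Map<Integer, Integer> liveMessages = new TreeMap<>();

    void messageStored(int logId) {
        liveMessages.merge(logId, 1, Integer::sum);
    }

    void messageConsumed(int logId) {
        liveMessages.computeIfPresent(logId, (id, n) -> n > 0 ? n - 1 : 0);
    }

    /** A known log is deletable once every message in it has been consumed. */
    boolean isDeletable(int logId) {
        Integer remaining = liveMessages.get(logId);
        return remaining != null && remaining == 0;
    }

    /** Run at each clean-up interval: drops deletable logs and returns their ids. */
    List<Integer> cleanup() {
        List<Integer> removed = new ArrayList<>();
        liveMessages.entrySet().removeIf(entry -> {
            if (entry.getValue() == 0) {
                removed.add(entry.getKey());
                return true;
            }
            return false;
        });
        return removed;
    }
}
```

In this sketch, marking a log as deletable and actually removing it are separate steps, which mirrors the text: consumption only changes the log's status, and removal happens later when cleanup() runs.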

An index of message locations is cached in memory to facilitate quick retrieval of message data. At configurable checkpoint intervals, the references are inserted into the metadata store.

Message data is cached in the broker using message cursors, where a cursor instance is associated with each destination (queue or topic). A message cursor represents a batch of messages cached in memory. When necessary, a message cursor retrieves persisted messages through the persistence adapter. The key point is that message cursors are essentially independent of the persistence layer, so they can be described separately; see Message Cursors for details.

The data logs are used to store data in the form of journals, where events of all kinds (such as messages, acknowledgments, subscriptions, subscription cancellations, transaction boundaries, and so on) are stored in a rolling log. Because new events are always appended to the end of the log, a data log file can be updated extremely rapidly.
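
To make the append-only behavior concrete, here is a minimal in-memory sketch of a rolling journal. It is an illustration under stated assumptions, not KahaDB's on-disk format: RollingJournalSketch, Location, and the newline-terminated record encoding are all hypothetical, and byte buffers stand in for the data log files.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only: an in-memory rolling journal, not KahaDB's file format. */
class RollingJournalSketch {
    /** Where a record landed: which data log, and at what byte offset. */
    static final class Location {
        final int logId;
        final int offset;

        Location(int logId, int offset) {
            this.logId = logId;
            this.offset = offset;
        }
    }

    private final int maxLogLength;
    private final List<ByteArrayOutputStream> logs = new ArrayList<>();

    RollingJournalSketch(int maxLogLength) {
        this.maxLogLength = maxLogLength;
        logs.add(new ByteArrayOutputStream());
    }

    /** Appends one event at the end of the current log, rolling over when it is full. */
    Location append(String event) {
        byte[] record = (event + "\n").getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream current = logs.get(logs.size() - 1);
        if (current.size() > 0 && current.size() + record.length > maxLogLength) {
            current = new ByteArrayOutputStream(); // start a new data log
            logs.add(current);
        }
        Location location = new Location(logs.size() - 1, current.size());
        current.write(record, 0, record.length);
        return location;
    }

    int logCount() {
        return logs.size();
    }
}
```

Every write goes to the end of the newest log and earlier bytes are never modified, which is the property that makes journal updates fast. The Location returned by append() is the kind of value an index would need to keep in order to find the record again.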

Implicitly, the data logs contain all of the message data and all of the information about destinations, subscriptions, transactions, and so on; but this data is stored in a raw, unordered form. To allow rapid access to the content of the logs, it is essential to construct metadata that references the data embedded in the logs.

The metadata cache is an in-memory cache consisting mainly of destinations and message references. That is, for each JMS destination, the metadata cache holds a tree of message references, giving the location of every message in the data log files. Each message reference maps a message ID to a particular offset in one of the data log files (there can be multiple data log files). The tree of message references is maintained using a B-tree algorithm, which enables rapid searching, insertion, and deletion operations on an ordered list of messages.
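
The structure of the metadata cache can be sketched as one ordered index per destination. This is a simplified illustration, not KahaDB's code: MetadataCacheSketch and Location are hypothetical names, message IDs are assumed to be strings, and java.util.TreeMap (a red-black tree) is used as a stand-in for the B-tree because it offers the same ordered search, insertion, and deletion operations.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch only: TreeMap stands in for the B-tree index. */
class MetadataCacheSketch {
    /** A message reference: data log file id plus byte offset within that file. */
    static final class Location {
        final int logId;
        final long offset;

        Location(int logId, long offset) {
            this.logId = logId;
            this.offset = offset;
        }
    }

    // Destination name -> ordered index of message ID -> location in the data logs.
    private final Map<String, TreeMap<String, Location>> indexes = new HashMap<>();

    void add(String destination, String messageId, Location location) {
        indexes.computeIfAbsent(destination, d -> new TreeMap<>()).put(messageId, location);
    }

    /** Returns the location of a message, or null if it is not indexed. */
    Location lookup(String destination, String messageId) {
        TreeMap<String, Location> index = indexes.get(destination);
        return index == null ? null : index.get(messageId);
    }

    void remove(String destination, String messageId) {
        TreeMap<String, Location> index = indexes.get(destination);
        if (index != null) {
            index.remove(messageId);
        }
    }
}
```

A lookup returns only a location, never the message body; the body would still have to be read from the data log file at that offset.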

The metadata cache is periodically written to the metadata store on the file system. This procedure is known as checkpointing and the length of time between checkpoints is configurable using the checkpointInterval option. For details of how to configure the metadata cache, see Optimizing the Metadata Cache.
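
Checkpointing can be pictured as a periodic copy from the cache to the store. The sketch below is a conceptual illustration only: CheckpointSketch and its methods are hypothetical, a second in-memory map stands in for the metadata store on disk, the caller supplies the current time in milliseconds, and the interval value used in the test is an arbitrary example.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch only: copies the cache to a stand-in store at a fixed interval. */
class CheckpointSketch {
    private final long checkpointIntervalMillis;
    private final Map<String, Long> cache = new TreeMap<>(); // in-memory metadata cache
    private Map<String, Long> store = new TreeMap<>();       // stand-in for the on-disk store
    private long lastCheckpoint;

    CheckpointSketch(long checkpointIntervalMillis, long now) {
        this.checkpointIntervalMillis = checkpointIntervalMillis;
        this.lastCheckpoint = now;
    }

    /** Updates the cache only; the store does not change until the next checkpoint. */
    void put(String messageId, long offset) {
        cache.put(messageId, offset);
    }

    /** Writes the cache to the store if the interval has elapsed; returns true if it did. */
    boolean maybeCheckpoint(long now) {
        if (now - lastCheckpoint < checkpointIntervalMillis) {
            return false;
        }
        store = new TreeMap<>(cache);
        lastCheckpoint = now;
        return true;
    }

    Map<String, Long> storedSnapshot() {
        return store;
    }
}
```

The sketch shows the trade-off behind the interval setting: between checkpoints the store lags behind the cache, so a longer interval means fewer writes but a staler store.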

The metadata store contains the complete broker metadata, consisting mainly of a B-tree index giving the message locations in the data logs. The metadata store is written to a file called db.data, which is periodically updated from the metadata cache.

In fact, the metadata store duplicates data that is already stored in the data logs (in a raw, unordered form). The presence of the metadata store, however, enables the broker instance to restart rapidly. If the metadata store is damaged or accidentally deleted, the broker can still recover by reading the data logs, but the restart then takes considerably longer.
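
The recovery path can be sketched as a replay of the journal from start to finish. This is a conceptual illustration rather than KahaDB's recovery code: JournalReplaySketch is a hypothetical name, and the "ADD id" and "ACK id" event strings are an assumed format invented for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch only: rebuilds an index by scanning a journal of text events. */
class JournalReplaySketch {
    /** Returns message ID -> journal position for every message added but not yet acknowledged. */
    static Map<String, Integer> rebuildIndex(List<String> journal) {
        Map<String, Integer> index = new TreeMap<>();
        for (int position = 0; position < journal.size(); position++) {
            String[] event = journal.get(position).split(" ", 2);
            if (event.length < 2) {
                continue; // skip records that do not match the assumed format
            }
            if (event[0].equals("ADD")) {
                index.put(event[1], position);
            } else if (event[0].equals("ACK")) {
                index.remove(event[1]);
            }
        }
        return index;
    }
}
```

Because the rebuild has to visit every event in every log, its cost grows with the total size of the journal, which is why a restart without the metadata store takes so much longer than one that simply loads the existing index.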