An introduction to Apache Kafka

The Reactive Manifesto states that responsive systems are resilient and elastic. One way to achieve this is to deploy multiple instances of our applications next to each other.

If one instance goes down, the others take over its work, adding resilience to the system. If more processing power is needed, extra instances can be spun up temporarily to handle the additional workload, adding elasticity to our system.

Nowadays, one of the most popular ways to create this kind of architecture is to build microservices. Instead of building a monolithic application, where our entire system’s code resides in a single application, we split it into smaller applications along its natural and transactional boundaries. This enables us to deploy a number of instances of the same microservice based on current demand. If our marketing microservices are heavily loaded while our account microservices are running almost idle, we can run more marketing microservices while decreasing the number of account microservice instances.

Communication in Microservice architectures

Of course, these microservices need a way to communicate with each other. The Reactive Manifesto suggests asynchronous message passing to accomplish this. Using asynchronous messages, we can be sure a message won’t be lost when a microservice goes down. It also enables us to distribute work or events between multiple microservices of the same type, which gives us elasticity.

Evidently, this requires a component in our system that handles messaging for us: a message broker. The broker acts as a kind of “postal service” for us, receiving and distributing messages within our microservice ecosystem and forming its messaging backbone. The message broker that we’ll be looking into here is Apache Kafka.

Introducing Apache Kafka

Apache Kafka is a well-known distributed streaming platform. It’s designed to be linearly scalable and fault-tolerant, and to support low-latency, high-throughput messaging. Kafka is used to build real-time data pipelines and streaming applications. This means it can offer a lot in microservice architectures in terms of event and data distribution and eventual consistency, and it can form the platform these microservices are built on.

In a Reactive System, each component should be resilient and elastic. The messaging system should of course be no exception to this rule.

Kafka organizes messages into topics: for example, a topic “unconfirmed-transactions” to which unconfirmed transactions can be sent. These unconfirmed transactions could then be read by an application that applies them. Each topic is subdivided into a number of partitions.
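As an illustration, here is a minimal Python sketch of a topic with partitions. This is a toy model, not Kafka’s actual implementation (Kafka’s default partitioner uses a murmur2 hash of the key; Python’s built-in `hash()` stands in for it here, and the class and names are made up for this example):

```python
# Toy model of a Kafka topic: a set of partitions, with keyed messages
# routed to a partition by hashing the key. Messages with the same key
# therefore always land on the same partition.

class Topic:
    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, key, value):
        # Stand-in for Kafka's default partitioner (murmur2 hash of the key).
        index = hash(key) % len(self.partitions)
        self.partitions[index].append((key, value))
        return index

topic = Topic("unconfirmed-transactions", num_partitions=3)
p1 = topic.send("account-42", {"amount": 10})
p2 = topic.send("account-42", {"amount": 25})
assert p1 == p2  # same key -> same partition, so per-key order is preserved
```

Because routing is deterministic per key, all messages for one key stay on one partition, which is the basis for Kafka’s ordering guarantee discussed below.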

New messages are simply appended to the partition’s log, never updated. We can configure Kafka to retain these messages for a certain amount of time, which can be infinite in case we want to keep them forever.
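The append-only log with time-based retention can be sketched as follows (an illustrative model, loosely analogous to Kafka’s `log.retention.*` settings; the class and parameter names are invented for this example):

```python
import time

# Toy model of a partition log: entries are only ever appended, and old
# entries expire after a configurable retention period (None = keep forever).

class PartitionLog:
    def __init__(self, retention_seconds=None):
        self.retention_seconds = retention_seconds
        self.entries = []  # (timestamp, offset, message), append-only

    def append(self, message, now=None):
        now = time.time() if now is None else now
        offset = self.entries[-1][1] + 1 if self.entries else 0
        self.entries.append((now, offset, message))  # never updated in place
        return offset

    def expire(self, now=None):
        if self.retention_seconds is None:
            return  # infinite retention: messages are kept forever
        now = time.time() if now is None else now
        cutoff = now - self.retention_seconds
        self.entries = [e for e in self.entries if e[0] >= cutoff]

log = PartitionLog(retention_seconds=60)
log.append("tx-1", now=0)
log.append("tx-2", now=50)
log.expire(now=100)  # tx-1 (written at t=0) falls outside the 60 s window
```

Note that consuming a message does not delete it; only retention does, which is why multiple consumer groups can each read the same log independently.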

In Kafka, consumers are subdivided into consumer groups (typically we’d use one consumer group per microservice). Within a group, each partition of a topic is consumed by at most one consumer at a time, which gives us an in-order processing guarantee within a single partition. In case we want to use more consumers for a topic, we add extra partitions into the equation.
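A rough sketch of that assignment logic (illustrative only; Kafka’s real range and round-robin assignors, plus the rebalance protocol, are more involved):

```python
# Toy partition assignment within one consumer group: each partition is
# owned by exactly one consumer, distributed round-robin.

def assign(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# 4 partitions, 2 consumers: each consumer owns 2 partitions.
a = assign([0, 1, 2, 3], ["consumer-a", "consumer-b"])

# More consumers than partitions: the extras sit idle, which is why
# adding partitions is what actually lets consumption scale out.
b = assign([0, 1], ["c1", "c2", "c3"])
```

The second call shows the practical limit mentioned above: a group can never have more active consumers than the topic has partitions.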

Kafka achieves both resilience and elasticity through clustering. A topic’s partitions are spread over the different broker instances within the cluster, which means topics can grow in a linear fashion.
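A toy placement sketch, assuming simple round-robin placement of partition replicas over brokers (real Kafka’s replica placement and leader election are considerably more sophisticated):

```python
# Toy model of spreading a topic's partitions over a cluster of brokers,
# with each partition replicated onto `replication_factor` brokers.

def place(num_partitions, brokers, replication_factor=2):
    placement = {b: [] for b in brokers}
    for p in range(num_partitions):
        for r in range(replication_factor):
            # Replicas of one partition go to distinct brokers, so a
            # single broker failure never loses a partition entirely.
            placement[brokers[(p + r) % len(brokers)]].append(p)
    return placement

# 6 partitions, replication factor 2, over a 3-broker cluster:
# each broker hosts 4 partition copies, and every partition exists twice.
layout = place(6, ["broker-1", "broker-2", "broker-3"])
```

Replication is what provides the resilience, while spreading partitions over more brokers is what lets throughput grow linearly as the cluster grows.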

Kafka provides a number of APIs to help us build applications that use it as their messaging backbone.

  • The general Producer and Consumer API, which enables applications to publish and consume messages through Kafka topics.
  • The connector API, which enables linking data storage systems to the Kafka clusters through pull and push mechanisms, for example to feed a database.
  • The Kafka Streams API which enables building applications that transform and analyse real-time streams that run through Kafka. We’ll be diving deeper into this API in a future article.
  • There is also the Reactor Kafka library, which enables an integration of Kafka with Project Reactor. This means we can have real-time streams of events flowing in from Kafka topics to use as Reactive Streams in our applications. We’ll be looking into this library in a future article as well.
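To give a feel for the streams idea, that is, continuously transforming one stream of records into another, here is a plain-Python sketch using a generator. This is only a conceptual stand-in: the actual Kafka Streams API is a Java DSL, and the record fields here are made up for the example:

```python
# Conceptual streams-style pipeline: read records from an input "topic",
# filter and transform them, and emit them to an output "topic".
# A plain generator stands in for the Kafka Streams filter/map operators.

def confirmed(transactions):
    for tx in transactions:
        if tx["valid"]:                       # filter step
            yield {**tx, "status": "confirmed"}  # map step

incoming = [
    {"id": 1, "valid": True},
    {"id": 2, "valid": False},
    {"id": 3, "valid": True},
]
out = list(confirmed(incoming))  # only the valid transactions, now confirmed
```

The real API applies the same filter/map style of pipeline continuously and distributed over partitions, rather than once over a finite list.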