I’ve been taking a closer look at Apache Pulsar and how it relates to Apache Kafka. In case you are curious, here are ten of my findings:

  1. Pulsar’s brokers are stateless. The state is kept in a separate storage layer (Apache BookKeeper). This means you can add a new broker without having to move existing data, whereas Kafka needs a partition reassignment before a new broker takes over any existing data.

  2. Pulsar’s storage layer is organized into segments, which are spread across all storage nodes. Segments can be written to the main storage or offloaded to a different type of storage. This allows Pulsar to offer tiered storage, which Kafka does not support yet at the time of writing.

  3. For replication, Pulsar uses a quorum-based algorithm, as opposed to a leader/follower-based approach in Kafka. The guarantees are the same, but the quorum approach tends to yield lower and more consistent latencies.

  4. Pulsar includes support for multi-tenancy which allows multiple user groups to share the same cluster, either via access control, or in entirely different namespaces. In Kafka, this is still under discussion.

  5. Pulsar offers full end-to-end encryption from the client to the storage nodes. Kafka currently does not have end-to-end encryption.

  6. Pulsar can speak other protocols, such as AMQP (the protocol used by RabbitMQ) or even Kafka (!), through pluggable protocol handlers, which makes it easy to integrate Pulsar with existing applications. Further, topics can be queried with SQL through Presto.

  7. Pulsar Functions is a way to do lightweight stream processing on top of Pulsar, conceptually similar to Kafka Streams. What I found interesting is that Pulsar Functions can be deployed directly on the broker nodes, whereas Kafka Streams applications run as separate processes.

  8. The Pulsar community has been very open about the limitations of Pulsar Functions, e.g. state management and DAG flows. In case Pulsar Functions doesn’t do it for you, there is an actively maintained Pulsar <> Apache Flink connector.

  9. It’s not all sunshine and rainbows: Pulsar requires two additional systems, Apache BookKeeper and Apache ZooKeeper. Kafka just requires ZooKeeper. More systems could increase the operational complexity. On the other hand, the separate storage layer is also the reason why Pulsar provides additional flexibility.

  10. Pulsar is not new. It was originally developed and used at Yahoo, then open-sourced in 2016 and later donated to the Apache Software Foundation. It’s used by Tencent, Splunk, and many others at large scale.
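
To make point 3 a bit more tangible, here is a toy Python model of why acknowledging a write after a quorum of storage nodes tends to smooth out latency. It is a sketch under loud assumptions: the latency distribution, the 1% slow-outlier rate, and all function names are made up for illustration, and "wait for every replica" is a deliberately simplified stand-in for a leader/follower setup, not a model of Kafka's actual replication protocol.

```python
import random


def quorum_ack_latency(latencies, ack_quorum):
    """Time until `ack_quorum` replicas have confirmed a write sent to all of them in parallel."""
    if not 1 <= ack_quorum <= len(latencies):
        raise ValueError("ack_quorum must be between 1 and the number of replicas")
    # The write is acknowledged as soon as the ack_quorum-th fastest replica answers.
    return sorted(latencies)[ack_quorum - 1]


def wait_for_all_latency(latencies):
    """Time until every replica has confirmed the write (simplified stand-in)."""
    return max(latencies)


def simulate(trials=10_000, replicas=3, ack_quorum=2, seed=42):
    """Return two lists of per-write latencies: quorum ack and wait-for-all."""
    rng = random.Random(seed)
    quorum, wait_all = [], []
    for _ in range(trials):
        # Hypothetical per-replica latency in ms: about 2 ms on average,
        # plus a 50 ms penalty in roughly 1% of cases (an assumed slow outlier).
        sample = [
            rng.expovariate(1 / 2.0) + (50.0 if rng.random() < 0.01 else 0.0)
            for _ in range(replicas)
        ]
        quorum.append(quorum_ack_latency(sample, ack_quorum))
        wait_all.append(wait_for_all_latency(sample))
    return quorum, wait_all


def percentile(values, p):
    """Nearest-rank style percentile, good enough for a rough comparison."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(p / 100 * len(ordered)))
    return ordered[index]


if __name__ == "__main__":
    quorum, wait_all = simulate()
    for p in (50, 99):
        print(f"p{p}: quorum={percentile(quorum, p):.2f} ms, "
              f"wait-for-all={percentile(wait_all, p):.2f} ms")
```

Comparing the 99th percentiles shows the effect: with an ack quorum of 2 out of 3, a single slow node no longer delays the acknowledgement, whereas waiting for every replica is always as slow as the slowest one.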

Obviously, this is not a full comparison of Apache Pulsar and Apache Kafka, but rather a compilation of the things I was surprised to find out about Pulsar, coming from the Kafka landscape.
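
As a footnote to point 7, this is roughly what "lightweight" means in practice. The sketch below follows the language-native style of Pulsar Functions for Python, where a plain function named `process` receives each message and its return value is published to the output topic. The exclamation-mark logic is only a placeholder, and deployment (for example via `pulsar-admin functions create`) is left out here.

```python
def process(input):
    """Append an exclamation mark to every incoming message."""
    # `input` is the message payload handed over by the Pulsar Functions
    # runtime; the returned value is what gets written to the output topic.
    return "{}!".format(input)
```

Because the function has no dependency on the Pulsar client library, it can be tried out locally by calling `process("hello")` directly.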
