Zilla Plus on Confluent Cloud: Aklivity Zilla Benchmark Series (Part 2)

Zilla Plus delivers near-zero latency overhead on Confluent Cloud at every operationally relevant percentile, and actively improves tail latency where it matters most.

Download PDF

Authors

Ankit Kumar

Team Aklivity

Introduction

In Part 1 of this benchmark series, we showed that Zilla Plus in front of a self-managed Apache Kafka cluster on AWS EC2 delivers up to 2x lower p99 latency compared to bare Kafka. Part 2 raises a harder question: what happens when you place Zilla Plus in front of Confluent Cloud, a fully managed, already-optimised Kafka service? Can a proxy still add value without adding meaningful overhead?

Our finding: At p99, Zilla Plus adds fewer than 3 ms on publish and 4 ms end-to-end. Beyond p99.9, the proxy actually reduces latency by 6–10%, smoothing out the tail spikes that Confluent Cloud alone exhibits under sustained load.

Why Confluent Cloud? & What Does Zilla Unlock?

Confluent Cloud is the most widely adopted managed Kafka offering.

Placing Zilla Plus in front of it unlocks capabilities the managed service alone does not provide:

Secure Public Access: expose Kafka topics via a hardened, externally reachable endpoint without opening cluster credentials to the internet.
Multi-Tenant Virtual Clusters: partition a single Confluent Cloud cluster into isolated virtual clusters per team, product, or customer.
Multi-Protocol Gateway: serve Kafka, MQTT, gRPC, HTTP/SSE, and WebSocket clients from the same broker without protocol adapters or connectors.
Fine-Grained Access Control: API key management, rate limiting, and topic-level RBAC at the edge, independent of Confluent's IAM.

The question is whether these capabilities come with a latency cost on top of an already-optimised managed service. This benchmark answers that directly.

Benchmark Setup

We use the same OpenMessaging Benchmark (OMB) framework and fork from Part 1, with Confluent Cloud replacing the self-managed Kafka cluster as the backend.

Deployment Configurations

Confluent Cloud (baseline): benchmark clients connect directly to Confluent Cloud brokers.
Zilla Plus + Confluent Cloud: benchmark clients connect through Zilla Plus instances on AWS EC2, which proxy traffic to the same Confluent Cloud cluster.

Infrastructure

Component	Detail
Kafka backend	Confluent Cloud, 2 CKU (Basic cluster)
Zilla Plus	AWS EC2 · m5.xlarge · 3 instances
Benchmark clients	AWS EC2 · m5.xlarge · 6 instances
Controller	AWS EC2 · m5.xlarge · 1 instance

Workload Configuration

Parameter	Value
Message size	2 KB
Target throughput	100 MB/s
Topics	45
Partitions per topic	200
Producers per topic	1
Consumer groups per topic	3 (1 consumer per group)
Acknowledgement mode	acks=all
Warm-up duration	10 minutes
Test duration	20 minutes

Results

Publish Latency

A positive Difference means Zilla adds latency relative to direct Confluent Cloud; a negative value means Zilla is faster.

Percentile	Confluent Cloud (ms)	Zilla Plus (ms)	Difference
p50	29.4	31.3	+1.9 ms (+6.5%)
p75	43.4	46.0	+2.6 ms (+6.1%)
p95	78.7	82.7	+4.0 ms (+5.0%)
p99	127.2	130.0	+2.8 ms (+2.2%)
p99.9	245.6	230.8	−14.8 ms (−6.0%) [Zilla 6.0% faster]
p99.99	414.6	374.6	−40.0 ms (−9.7%) [Zilla 9.7% faster]
Max	725.0	663.0	−62.0 ms (−8.6%) [Zilla 8.6% faster]

The overhead at p50 through p99 reflects the additional network hop through Zilla, and stays within 3 - 4 ms in absolute terms. The more significant story is the crossover at p99.9 and above: Zilla Plus absorbs the latency spikes that Confluent Cloud alone exhibits under sustained load, delivering 6 - 10% lower tail latency. Rather than amplifying tail behaviour, Zilla's lock-free, stateless data path actively dampens it.

Publish p99 latency over time

Both configurations track closely throughout the 20-minute window. Zilla Plus shows no progressive drift, confirming the proxy introduces no systematic temporal overhead. The Confluent Cloud line exhibits sharper periodic spikes while Zilla stays in a tighter band.

End-to-End Latency

Percentile	Confluent Cloud (ms)	Zilla Plus (ms)	Difference
p50	29.7	32.1	+2.4 ms (+8.3%)
p75	44.2	47.1	+2.9 ms (+6.7%)
p95	80.8	86.0	+5.2 ms (+6.4%)
p99	133.8	137.9	+4.1 ms (+3.0%)
p99.9	262.2	247.2	−15.0 ms (−5.7%) [Zilla 5.7% faster]
p99.99	444.3	404.5	−39.8 ms (−9.0%) [Zilla 9.0% faster]
Max	836.0	854.1	+18.1 ms (+2.2%)

The end-to-end pattern mirrors publish latency: a small overhead at the median and p99, followed by a clear advantage at the tail. At p99.9, Zilla Plus is 15 ms faster end-to-end than direct Confluent Cloud.

End-to-end p99 latency over time

Confluent Cloud alone exhibits sharper, more frequent spikes, some reaching 220 - 230 ms, while Zilla Plus consistently stays closer to the 120 - 150 ms band, reducing tail variance across the test window.

Key Observations

Zilla effective throughput is 2x the workload rate. As a bidirectional proxy, Zilla Plus handles both the inbound client traffic and the outbound broker traffic simultaneously. While the benchmark targets 100 MB/s, each Zilla instance is effectively moving ~200 MB/s of aggregate network traffic. This makes the latency results even more notable: Zilla is processing double the data volume while still matching or beating direct Confluent Cloud latency at the tail.

Zilla CPU utilisation is balanced and well within headroom. Across the three m5.xlarge Zilla Plus instances, CPU load balanced across all four cores at approximately 65%, leaving substantial room before saturation. Zilla's one-worker-per-core architecture ensures this balance by design.

Latency spikes in Confluent Cloud correlate with cluster CPU spikes. The periodic latency spikes visible in the Confluent Cloud time-series align directly with CPU load events in Confluent Cloud's cluster metrics.

Confluent Cloud limitation at the 30/60 scenario. When connecting directly to Confluent Cloud, we consistently observed significant performance degradation after approximately 39 minutes under sustained 100 MB/s load. This instability motivated the 10/20 scenario used throughout this benchmark.

Conclusion

	Publish Latency	End-to-End Latency
p99 overhead	+2.8 ms (+2.2%)	+4.1 ms (+3.0%)
p99.9 improvement	−14.8 ms (−6.0%)	−15.0 ms [Zilla 5.7% faster]
p99.99 improvement	−40.0 ms (−9.7%)	−39.8 ms [Zilla 9.0% faster]

Zilla Plus delivers near-zero latency overhead on Confluent Cloud at every operationally relevant percentile, and actively improves tail latency where it matters most. Teams can deploy Zilla Plus in front of Confluent Cloud to gain secure public access, multi-tenant virtual clusters, and multi-protocol support without sacrificing their latency profile, and in many cases improving it.