hughevans.dev

Get Kafka-Nated Podcast (Episode 1) Apache Kafka®'s Evolution - 14 Yrs of Streaming

Check out the first episode of Get Kafka-Nated! Filip Yonov and I had a great chat, exploring everything from Kafka’s journey from on-prem to the cloud and this year’s major Kafka Improvement Proposals to what we’re excited about for the future of Kafka.

Tune in next time for a conversation with Josep Prat about life as a Kafka contributor.

Original release

read more

Getting Started with Diskless Kafka - A Beginner's Guide

Diskless topics are proposed in KIP-1150, which is currently under community review. The examples in this article use “Inkless”, Aiven’s implementation of KIP-1150 that lets you run it in production.

I joined Aiven as a Developer Advocate in May, shortly after the Kafka Improvement Proposal KIP-1150: Diskless Topics was announced, which reduces the total cost of ownership of Kafka by up to 80%! It was very exciting to join Aiven just as the streaming team were making this major contribution to open source, but I wanted to take my time to understand the KIP before sharing my thoughts.

In this article I’ll share my first impressions of Diskless Kafka, walk you through a simple example you can use to experiment with Diskless, and highlight some of the great resources that are out there for learning about the topic. First though, what actually is Diskless Kafka?

What is Classic Kafka?

To understand Diskless Kafka, you first need to understand how Apache Kafka® works today. Kafka data is stored and replicated across multiple broker servers using local disks. A designated leader broker handles all writes to a given partition, while follower brokers maintain copies of the data. To ensure high availability, Kafka clusters are often deployed with cross-zone replication, where data is duplicated across different cloud availability zones - but this creates a significant cost problem. Up to 80% of Kafka’s total cost of ownership comes from expensive cross-zone network traffic, with cloud providers like AWS charging per GB for data transfer between zones.
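To see why this adds up, here’s a rough, illustrative calculation (exact pricing varies by provider and region): AWS bills inter-AZ traffic at about $0.01/GB in each direction, or roughly $0.02 per GB that crosses a zone boundary. With a replication factor of 3 spread across three zones, every gigabyte produced is copied to two other zones, so a workload producing 100 MB/s pushes on the order of 500 TB across zone boundaries per month for replication alone - roughly $10,000 in transfer charges before counting the producer and consumer traffic that also crosses zones.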

a diagram showing the flow of messages in a classic Kafka cloud deployment with the charges at AZ boundaries marked

What is Diskless Kafka?

Diskless Kafka fundamentally reimagines this architecture by delegating replication directly to object storage services like Amazon S3, eliminating the need for cross-zone disk replication entirely. Instead of storing user data on local broker disks, diskless topics write data straight to object storage, adopting a leaderless design where any broker can handle any partition. This is as opposed to tiered storage, which still relies on local disk replication for recent data before moving older segments to object storage.

a diagram showing the flow of messages in diskless kafka avoiding cross az charges by using object storage

The trade-off of Diskless is that reads and writes from object storage are slower than those from local disk. To mitigate this, KIP-1150 has been engineered such that you can run both traditional low-latency topics (sub-100ms) and cost-optimized diskless topics (200-400ms) in the same cluster, allowing you to choose the right performance profile for each workload. KIP-1150 maintains all existing Kafka APIs and client compatibility. Many use cases, such as logging, tolerate the higher latency of Diskless topics and are a natural fit, but others, like high-frequency trading or gaming, are latency critical.

a chart comparing the latency of classic kafka to diskless kafka
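To make that per-topic choice concrete, here’s a minimal sketch of creating a classic topic and a diskless topic side by side, using the kafka-python admin client against a local Inkless broker. The broker address and the inkless.enable topic config are assumptions based on the Inkless quickstart - check the repo README for the exact config key in your build.

# A minimal sketch: create one classic and one diskless topic in the same cluster.
# Assumes a local Inkless broker on localhost:9092 and the kafka-python client;
# the "inkless.enable" config key is an assumption taken from the Inkless quickstart.
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
admin.create_topics([
    # Classic topic: low latency, replicated on broker disks as usual
    NewTopic(name="orders-classic", num_partitions=3, replication_factor=1),
    # Diskless topic: data goes straight to object storage (higher latency, lower cost)
    NewTopic(
        name="logs-diskless",
        num_partitions=3,
        replication_factor=1,
        topic_configs={"inkless.enable": "true"},
    ),
])
admin.close()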

Another snag with “Diskless” Kafka is that the name is somewhat of a misnomer. While “Diskless” implies complete elimination of disk usage, brokers still require local disks for Kafka metadata, batch coordination, temporary operations like compaction, and optional caching. The term “Diskless” specifically refers to topic data storage for Diskless topics - the user messages and logs that traditionally consume the vast majority of disk space and I/O resources. It’s therefore more accurate to describe the changes in KIP-1150 as adding Diskless Topics within classic Kafka than as creating a new “Diskless Kafka”.

TL;DR: Naming things is hard. Speaking of naming things -

What is Inkless Kafka?

Inkless is the name the team behind KIP-1150 gave to the temporary GitHub repository that contains the implementation of KIP-1150, so you can use Diskless Kafka before it is merged into the Apache Kafka main branch. You can find the Inkless repo here.

a screenshot of the aiven inkless repo

Run Diskless Kafka Locally with Inkless and MinIO

When I first got hands-on with Diskless I wanted to experiment with running it locally to see what made it tick. Running Inkless locally also requires object storage, so I decided to use MinIO, a performant object store that you can deploy locally in a Docker container. You can try running Diskless Kafka yourself by following the steps below:

git clone https://github.com/Aiven-Labs/diskless-docker-quickstart.git
cd diskless-docker-quickstart
docker compose up -d
docker compose ps

The quickstart’s docker-compose.yml points the Inkless storage backend at MinIO:

      # Inkless Storage Configuration
      - KAFKA_INKLESS_STORAGE_BACKEND_CLASS=io.aiven.inkless.storage_backend.s3.S3Storage
      - KAFKA_INKLESS_STORAGE_S3_PATH_STYLE_ACCESS_ENABLED=true
      - KAFKA_INKLESS_STORAGE_S3_BUCKET_NAME=kafka-diskless-data
      - KAFKA_INKLESS_STORAGE_S3_REGION=us-east-1
      - KAFKA_INKLESS_STORAGE_S3_ENDPOINT_URL=http://minio:9000
      - KAFKA_INKLESS_STORAGE_AWS_ACCESS_KEY_ID=minioadmin
      - KAFKA_INKLESS_STORAGE_AWS_SECRET_ACCESS_KEY=minioadmin
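
Once the stack is up, you can sanity-check that ordinary Kafka clients work unchanged against a diskless topic. The snippet below is a minimal sketch, assuming the quickstart exposes the broker on localhost:9092 and that a topic named logs-diskless has been created with the diskless flag enabled:

# Smoke test: produce and consume against a diskless topic with ordinary clients.
# Assumes the quickstart broker is reachable on localhost:9092 and that the topic
# "logs-diskless" already exists with the Inkless diskless flag enabled.
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
for i in range(10):
    producer.send("logs-diskless", value=f"hello diskless {i}".encode())
producer.flush()  # batches are written to the MinIO-backed object storage

consumer = KafkaConsumer(
    "logs-diskless",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=10000,  # stop iterating after 10s with no new messages
)
for record in consumer:
    print(record.offset, record.value.decode())

If everything is wired up correctly you should also start to see objects landing in the kafka-diskless-data bucket configured above.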
read more

From Radio Waves to Kafka Topics - Building a Real-Time Aircraft Data Pipeline

Flight radar talk photo collage

If you want to showcase real-time data architectures you need a data source that’s live, high-volume, varied, and messy enough to surface real-world challenges. This is an issue I’ve run into several times over the last year whilst giving talks about real-time analytics using Kafka, Druid, ClickHouse, and Grafana in various combinations. You could use a data generator like ShadowTraffic, but when trying to bring the sometimes dry topic of data engineering to life, nothing beats real data. So when I’m building demos I’ve consistently turned to the same compelling dataset: ADS-B aircraft transmissions.

I was introduced to ADS-B (Automatic Dependent Surveillance–Broadcast) by my former colleague at Imply, Hellmar Becker. It’s one of the technologies aircraft use to relay data, including their position, heading, and speed, to air traffic controllers and to other aircraft. This creates a continuous stream of real-time data that’s publicly accessible and rich with analytical possibilities. The dataset perfectly illustrates the complexities of streaming analytics - it arrives at high velocity, contains mixed data types, requires deduplication and enrichment, and benefits from both real-time alerting and historical analysis.

What makes ADS-B particularly valuable for demonstrations is its combination of technical complexity and intuitive appeal. Everyone understands aircraft movement, making it easy to visualize concepts like windowing, aggregation, and anomaly detection. Yet underneath this accessibility lie genuine engineering challenges: handling bursty traffic patterns, dealing with incomplete or duplicate messages, and correlating position data with aircraft metadata.

In this article, I’ll walk through building a complete ADS-B ingestion pipeline—from setting up a simple antenna to producing clean, structured data to Kafka topics ready for real-time analysis. By the end, you’ll have both the technical foundation and a rich dataset to explore your own streaming analytics architectures.


Understanding ADS-B Data

Flight radar 24 gif

ADS-B transmissions use a standardized message format called SBS (BaseStation format), which arrives as comma-separated text lines. Each message contains different types of aircraft information, for example:

Position Messages (MSG,3): Location, altitude, and identification data

MSG,3,1,1,40756A,1,2025/06/01,17:42:30.733,2025/06/01,17:42:30.776,,35000,,,40.1234,-74.5678,,,0,0,0,0

Velocity Messages (MSG,4): Speed, heading, and vertical rate

MSG,4,1,1,40756A,1,2025/06/01,17:42:31.233,2025/06/01,17:42:31.276,,,450,275,,,256,,,,,0

ADS-B data has a high velocity, with anywhere from 100 to 2,000 messages a second produced by a receiver depending on location. It also has some properties that present a barrier to real-time analytics: the data contains duplicate messages because the same aircraft can be tracked by multiple receivers (as many as 20-30% of messages will be duplicates), there are missing fields because not all messages contain complete information, and traffic varies by time of day and geographic location.

This real-world messiness makes ADS-B data perfect for demonstrating streaming analytics challenges like de-duplication, windowing, and real-time aggregation.
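
To give a flavour of what handling this format looks like in practice, here’s a small illustrative sketch in Python that parses SBS lines and drops near-duplicates. The field positions follow the BaseStation format shown above; the dedup key and five-second window are just assumptions for illustration.

# Illustrative sketch: parse SBS (BaseStation) lines and drop near-duplicate messages.
# Field positions follow the SBS format shown above: index 1 = transmission type
# (3 = position, 4 = velocity), 4 = hex ident, 11 = altitude, 14/15 = lat/lon.
import time

def parse_sbs(line: str) -> dict | None:
    fields = line.strip().split(",")
    if len(fields) < 16 or fields[0] != "MSG":
        return None  # not a complete SBS message
    return {
        "msg_type": fields[1],
        "hex_ident": fields[4],        # 24-bit ICAO aircraft address
        "altitude": fields[11] or None,
        "lat": fields[14] or None,
        "lon": fields[15] or None,
    }

# Naive dedup: remember keys seen within the last few seconds.
_seen: dict[tuple, float] = {}

def is_duplicate(msg: dict, window_s: float = 5.0) -> bool:
    now = time.monotonic()
    key = (msg["hex_ident"], msg["msg_type"], msg["lat"], msg["lon"], msg["altitude"])
    # evict stale entries so the cache doesn't grow without bound
    for k, ts in list(_seen.items()):
        if now - ts > window_s:
            del _seen[k]
    if key in _seen:
        return True
    _seen[key] = now
    return False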


Hardware Setup and Data Collection

You can be receiving live ADS-B data for around £95 (or less, if you already have some of these parts or can pick them up second hand) and have it running in 15 minutes. Here’s my exact setup that’s been reliably collecting ADS-B data for months.

Hardware shopping list

Setup

1) Install a supported OS on your Pi. I’m using the Lite version (without a UI) of the official Debian Bookworm build; for details on how to do this, follow the steps in the guide on the Raspberry Pi website.

2) Install Docker on your Pi and add your user to the docker group to run docker without sudo. Important: Log out and back in for group changes to take effect.

curl -sSL https://get.docker.com | sh
sudo usermod -aG docker pi
# Log out and back in for group changes to take effect

3) Create a new Docker Compose file called docker-compose.yml and define an ultrafeeder service as below. Note: this is a very basic ultrafeeder configuration; you may wish to consult the setup guide in the ADS-B Ultrafeeder repo for a more in-depth guide to setting up this part.

services:
  ultrafeeder:
    image: ghcr.io/sdr-enthusiasts/docker-adsb-ultrafeeder
    container_name: ultrafeeder
    restart: unless-stopped
    device_cgroup_rules:
      - "c 189:* rwm"
    ports:
      - 8080:80     # Web interface
      - 30003:30003 # SBS output (for Kafka)
    environment:
      - READSB_DEVICE_TYPE=rtlsdr
      - READSB_RTLSDR_DEVICE=00000001  # Usually 00000001
      - READSB_LAT=51.4074             # Your antenna latitude
      - READSB_LON=-0.1278             # Your antenna longitude
      - READSB_ALT=52                  # The altitude of your antenna
    volumes:
      - /dev/bus/usb:/dev/bus/usb

4) Deploy your ultrafeeder service:

docker-compose up -d

5) Optional: Add FlightRadar24 Integration

Flight radar 24

Adding FR24 gives you two immediate benefits: a professional flight tracking interface and confirmation that your data quality meets commercial standards. Plus, contributing data gets you free access to FR24’s premium features. Register via the FlightRadar24 site to get your sharing key; you should then be able to find it in your Account Settings under “My data sharing”.

Add the fr24feed service to your Docker Compose file to start sending data to FR24.

# Add to existing services
  fr24feed:
    image: ghcr.io/sdr-enthusiasts/docker-flightradar24:latest
    container_name: fr24feed
    restart: always
    ports:
      - 8754:8754
    environment:
      - BEASTHOST=ultrafeeder
      - FR24KEY={your flight radar 24 key}
    dns_search: . # prevents rare connection issues related to a bug in docker and fr24feed

Redeploy with:

docker-compose up -d

Once set up, your station should appear on their coverage map within 10-15 minutes.

6) Validate ADS-B Data Reception

Test that you are receiving ADS-B data correctly:

nc localhost 30003

You should see continuous messages like:

MSG,8,1,1,40756A,1,2025/06/01,17:42:30.733,2025/06/01,17:42:30.776,,,,,,,,,,,,0
MSG,3,1,1,40756A,1,2025/06/01,17:42:33.003,2025/06/01,17:42:33.015,,35000,,,40.1234,-74.5678,,,0,0,0,0
MSG,4,1,1,40756A,1,2025/06/01,17:42:35.120,2025/06/01,17:42:35.156,,,450,275,,,256,,,,,0

If your antenna has a good view of the sky you can expect around 100-2,000 messages/second (depending on your location), with CPU usage sitting comfortably under 20% on a Pi 3.

Quick Troubleshooting

No aircraft? Check your antenna USB connection:

lsusb | grep RTL

You should see something like:

Bus 001 Device 033: ID 0bda:2838 Realtek Semiconductor Corp. RTL2838 DVB-T

If not, your antenna may not be connected correctly. Verify your antenna connection is secure or try a different USB port (preferably USB 2.0+), then try restarting ultrafeeder:

docker-compose down && docker-compose up -d

Tracking very few aircraft? Try placing your antenna higher and away from electronics; for best results, try to get an unobstructed view of the sky.


Kafka Integration

Now that we have ADS-B data streaming on port 30003 let’s produce it to Kafka to allow us to work with it as an event stream. We’ll add Kafka to our Docker stack and build a producer that can handle thousands of aircraft updates per second.
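
As a preview of where this is heading, below is a minimal sketch of such a producer: it reads SBS lines from the ultrafeeder TCP feed on port 30003 and forwards each one to a Kafka topic, keyed by aircraft. The broker address, the adsb-raw topic name, and the use of kafka-python are assumptions for illustration.

# Minimal sketch: forward raw SBS lines from the ultrafeeder feed into a Kafka topic.
# Assumptions for illustration: ultrafeeder is reachable on localhost:30003, a Kafka
# broker is on localhost:9092, and the topic is called "adsb-raw".
import socket
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    linger_ms=50,  # small batching window to absorb bursts of messages
    acks=1,
)

with socket.create_connection(("localhost", 30003)) as sock:
    buffer = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break  # feed closed
        buffer += chunk
        # SBS messages are newline-delimited; keep any partial line for the next read
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            line = line.strip()
            if not line.startswith(b"MSG"):
                continue
            # key by the aircraft hex ident (field 4) so updates for the same
            # aircraft land in the same partition
            fields = line.split(b",")
            key = fields[4] if len(fields) > 4 else None
            producer.send("adsb-raw", key=key, value=line)

producer.flush()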

read more

What do community organisers need? Not another pizza.

stack of pizza boxes

Part 1: If I never see another pizza again, it’ll be too soon.

The year is 2022, I am a know-nothing twenty-something cloud consultant. ChatGPT has just been launched, the company I work for has created a brand-new practice within the business to focus on AI, and a colleague approaches me to see if I would be interested in helping to host the AI meetup the company has just adopted.

Now three years later I’m still a know-nothing twenty-something, but with a little more experience under my belt. I’ve organised 22 AI and Deep Learning for Enterprise events, been a committee member for three conferences, and had a hand in helping to host and organise several other tech-focused events in the capital. Participation in these communities has led me to all kinds of interesting places like speaking at events, learning new skills, running AV, and doing video editing; it’s even got me a job or two. Community organising has also left me more burnt out than I’ve ever been in my life.

Organising and hosting community events (particularly solo as I have done on occasion) is exhausting work. If you are putting on an event once a month, which for a while is what we were doing at AIDLE, the work is essentially unending. As soon as we finish hosting one event we immediately move on to planning the next (if we hadn’t started already). Hosting can also be very physically demanding: moving furniture, running to grab stacks of beer and soft drinks, and lugging an AV kit on the tube on top of the late nights can leave you physically exhausted - particularly whilst also having to put on a friendly face and look after attendees and speakers.

Needless to say, I would not still be involved in organising events without a lot of support. Support from other organisers who’ve given advice, promoted and collaborated on events, from friends who have come along to help out running AV and taking pictures, and from sponsors and hosts who provide funding and venues. Co-organisers are perhaps the greatest source of support, as the difference in the feeling of doing something alone versus being part of a team cannot be overstated. Something I wish I had starting out was a way to find other organisers to collaborate with and share resources; I’ve met some great organisers over the last three years, but it took a long time.


a swag table at a Grafana x Confluent event

Part 2: It was never about the Pizza (sorry)

After the last three years of being in technical communities, both as an organiser and as a regular member, I’ve eaten a LOT of pizza. I used to really like pizza, but now not so much, and I know I’m not alone in this. Not Another Pizza is a fun little name for a group for community organisers, but (and you’ll have to forgive me for this) the pizza isn’t pizza, it’s an analogy.

Pizza is a stand-in for all the bare-minimum requirements of hosting an event: the food, yes, but also venues, AV, community event platforms (on which I have thoughts), and so on. Of course community organisers need these things, but often as an organiser you’ll find yourself reaching for something more.

As an organiser I’ve benefited greatly from meeting collaborators, being given advice on tracking community growth and retention, tools to help automate away the drudgery of managing guest lists, resources on managing codes of conduct and code of conduct violations, and maybe a sympathetic ear from someone from time to time. Think soul food, not cold pizza.

Some of the biggest challenges I ran into early on in organising were burnout, managing relationships with sponsors, and finding speakers.

I helped beat my organiser burnout both by cutting down on the frequency of events I was organising and attending, and by avoiding solo efforts where possible. It’s not about shifting the work onto other people but about working together as part of a team - helping other people tackle their obstacles as much as they help you tackle yours. For developer advocates this may be a familiar feeling: being the glue between different people helps make you a very useful colleague, but it also means people are more willing to help you out when you need it.

Finding speakers was made easier by emulating the best practices of other groups and organisers, for example opening a call for papers so that speakers could apply to speak at AIDLE rather than having to constantly go out chasing them. This was a lesson I only learned because I was attending other people’s events, asking for advice, and shifting my perspective by becoming a speaker myself.

So, the vision for Not Another Pizza is a support network for dev advocates, community managers, and event organisers that can help provide some of these things. There are some great communities and resources out there already doing this like the Developer Marketing Alliance, but I’ve found that resources and collaboration opportunities can be very dependent on geography, so I wanted to set something up more focused on organising technical communities in London and the UK in general.

The London community landscape is highly concentrated around the City, with many smaller communities relying on access to free or subsidised event spaces rather than paying through the nose to rent a venue. It also feels like we are rapidly approaching “peak meetup”, with several events on the same topic on any given night - there were 4 different AI and data meetups on the same night as the last AIDLE meetup alone. I don’t see the large number of organisers in London as a problem though - we as organisers don’t need to compete for limited resources and venues if instead organisers and groups collaborate to pool resources, shouldering the work of organising events together. London then is the perfect environment for building this kind of meta-community of community organisers.

My hope is that Not Another Pizza can foster genuine connections between organisers and serve to facilitate meaningful collaboration beyond being just another networking event.


apache druid summit 2024

Part 3: Something better than another pizza

It’s very early days yet, but some friends and I are working on growing our community by inviting other dev rel professionals and organisers to the Discord community and building out the website. We’ve also been putting together a catalogue of projects on the site that members of the group have been working on, including tools made by community members designed to help make managing communities easier.

One project we’ve recently started working on is bridge, an API written in Python which will eventually enable the publishing and updating of events across multiple platforms simultaneously. I’ve previously spoken about how I think applying the POSSE (Post Own Site Syndicate Everywhere) content management strategy to community events could potentially help tackle the degradation of platforms like Meetup, and I hope bridge will help enable more organisers to experiment with that.

Going forward I’d love for us to create spaces for authentic collaboration: enabling organisers to share success stories of collaborations born through Not Another Pizza, providing match-making for mentorship opportunities, hosting a library of useful resources and content, and becoming a really great place to get help with community organising. Perhaps in future we could even have some in-person events, who knows. For now our focus will be on growing the community and continuing development on projects like bridge. I think we’ll know if this community has been successful if we can see concrete examples of members meeting collaborators through Not Another Pizza and going on to run events.

If Not Another Pizza sounds like a useful resource you can join the Discord community over at discord.notanother.pizza to meet other organisers who can help your community thrive. Don’t worry if you aren’t based in London either; there are members in the group all over the world, and some community organiser experiences really do feel universal at times - like there being nothing you want less than another slice of cold pizza.

read more

Does AI actually make you more productive? Speed running my job with Cursor.

Speed running my job with cursor thumbnail

Last week at TurinTech we had a workshop on agentic AI with an unusual challenge: complete a work task without writing any code manually. The goal was to see if we could boost our productivity using these tools. I’m fairly skeptical of using agentic tools at work - my concern was that I would spend as much (or even more) time vetting AI-generated work than I would have spent doing it manually - and even setting that aside, could I really be productive with my hands tied behind my back? I decided to try speedrunning a common task in my day-to-day work to see if I could really improve my productivity.

The Speedrun Setup

Cursor%

Any%

I specifically chose the task of adding an example to this repo as it is both a task I have to do frequently and one I thought would be simple for an agentic tool to complete, since it just involves copying data from one place to another and reformatting it as a basic report. It was my hope that the simplicity of the sample task would help avoid an automation rabbit hole and give the agent the best possible chance to be genuinely useful.

Usually, once I complete an optimisation or someone shares a PR with an optimisation with me, I manually copy the template directory, write a short summary of the project, copy the data across, and, where appropriate, add a demo which you can use to show the difference in performance before and after the optimisation. For my speedrun I would see if I could complete the same task as a tool-assisted speedrun, using only Cursor and without interacting with anything other than the terminal and chat.

The specific example I was documenting in both cases was a DataStax Langflow optimisation by Jordan Frazier (see the PR here), who used our tool Artemis to improve the performance of the Langflow Python library.

read more