hughevans.dev

Get Kafka-Nated (Episode 2) Life of a Kafka contributor with Josep Prat

Check out the second episode of Get Kafka-Nated! I had a great time chatting with Josep Prat about his experiences as a Kafka contributor and Apache Kafka PMC member. We covered everything from first open source contributions to how to properly manage a major open source project.

Tune in next time for a conversation with Greg Harris about KIP-1150 which introduces Diskless topics to Apache Kafka.

Original release

read more

How can I weigh things with a Raspberry Pi? - Using a HX711 ADC and load cell with a Raspberry Pi

An image of robins eating bird seed off a wii fit balance board with the caption meanwhile in suburban south london

There is a very large robin that often visits the bird feeder on my office window. It’s clear this robin is much heavier than other robins because when he lands the impact makes a loud thwack sound. I decided to see if I could build a simple setup to figure out exactly how heavy this robin is and, in predictable fashion, got carried away - this will be the first article in a three part series exploring: building a smart bird feeder that can weigh visiting birds, using AI to identify birds automatically, and bringing it all together with Kafka and Iceberg.

In order to get the weight of birds on my bird feeder I would need to add a load cell to the feeder platform. Whenever I’m building something like this I tend to start with a Raspberry Pi as that’s what I’m most familiar with. There are a lot of great guides online on how to use Arduinos and other microcontrollers with load cells and amplifiers, but there isn’t a huge amount out there on Raspberry Pis other than this great tutorial from Tutorials for Raspberry Pi from several years ago. I was able to get a working setup with a cheap 5kg rated load cell and HX711 ADC as explained in the tutorial, but I encountered a few snags along the way, so I thought in addition to documenting my bird feeder project I would write an updated version of the Tutorials for Raspberry Pi guide to help anyone else looking to work with load cells and the Raspberry Pi.

The guide below will walk you through, step by step, everything you need to weigh an object of up to 5kg with a Raspberry Pi, including selecting components, assembly, and calibration.


What is an HX711?

First though, why do we need an HX711 at all? Load cells convert forces applied to them into analog electrical signals via strain gauges (resistors that change their resistance when bent or stretched) that we can use to measure weight, but these signals are both analog and too small to be detected by the Raspberry Pi’s GPIO (General Purpose Input Output) pins. The HX711 is an ADC (Analog to Digital Converter) which takes the weak analog signal from the load cell and outputs a digital signal (as a 24-bit integer) the Raspberry Pi can read.

HX711 converts analog signals to digital signals
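To make the numbers concrete, here is a minimal sketch of the conversion the calibration steps later in this guide work out for you. The offset and reference unit values below are placeholders; yours will come from taring and calibrating your own load cell:

# Hypothetical calibration values - the guide below shows how to measure these
OFFSET = 8349360        # raw 24-bit reading from the HX711 with nothing on the scale (the tare value)
REFERENCE_UNIT = 417.47 # raw counts per gram, found by weighing a known weight

def raw_to_grams(raw_reading: int) -> float:
    """Convert a raw HX711 reading into grams."""
    return (raw_reading - OFFSET) / REFERENCE_UNIT

print(f"{raw_to_grams(8370233):.1f}g")  # prints roughly 50.0g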


Hardware Setup

Setting up your HX711 will require some soldering. Don’t worry if you’ve not soldered before; this is a particularly simple soldering job (even I could do it!). If you follow the method I used you’ll need to cut and drill some parts to install your load cell - if you’d rather not do this you can buy a load cell and HX711 kit with these parts pre-made, for example this kit with laser-cut wooden sheets with mounting holes. If you already have a soldering iron, all the parts for this project bought new should set you back no more than £85, but you could save a fair bit if you pick up the Raspberry Pi second hand (or already have one lying around) and scavenge your bolts and rigid sheets rather than buying them new.

Hardware shopping list

Tools

Essential:

Handy for shaping your rigid sheets and making mounting holes:

Setup

1) Cut your sheets to size and drill two holes in each sheet so you can bolt the load cell into place. Your sheets should look something like the diagram below, with the holes for mounting the top sheet roughly centered and the holes for mounting the base towards the edge:

A rough schematic of what your rigid sheets should look like with mounting holes

The positioning of the holes is important! We want one end of the load cell to sit roughly in the middle of the top sheet so the arrow on the end is oriented correctly.

2) Bolt the load cell sandwiched between both rigid sheets as in the diagram below. You may need to add some washers between the load cell and the rigid sheets to stop the strain gauges in the white blob around the middle from getting pinched when weight is added to the top sheet - only the mounting surfaces of the load cell should make contact with the rigid sheets.

Diagram showing how the scale should be assembled with the load cell sandwiched between both rigid plates, the end with the arrow in the center pointed down, and washers between the load cell and rigid sheets

If everything is assembled correctly, each of the rigid sheets should be parallel to the load cell. If things are askew, or the rigid sheets are resting on the epoxy in the middle of the load cell which covers the strain gauges, try adding more washers between the load cell and the rigid sheet to free things up.

3) Solder the leads from the load cell to the correct pads on the HX711 as follows: Red to E+, Black to E-, Green to A-, and White to A+ (the pins labeled B+/B- remain empty).

A schematic showing a red wire connected to E+, a black wire connected to E-, a green wire connected to A-, and a white wire connected to A+ on the terminal of a HX711

4) Cut off a strip of four of the included header pins, press them short end first into the holes in the board marked GND, DT, SCK, and VCC, and solder them from the reverse of the board. This can be fiddly! I usually use a big blob of Blu Tack to hold my headers in place when soldering them, but anything that can hold the headers square (e.g. a second pair of hands!) can be really helpful here.

A diagram showing how to correctly orient the headers with the black part and long pins on the top of the board and the short pins poking through the holes

5) Tear off a strip of four female-to-female DuPont wires (keeping the four stuck together can help keep your wiring tidy, but it helps to tease the ends apart a bit so they’re easier to plug into your headers) and use them to connect the headers on the HX711 to the headers on your Raspberry Pi as follows: VCC to Raspberry Pi Pin 2 (5V), GND to Raspberry Pi Pin 6 (GND), DT to Raspberry Pi Pin 29 (GPIO 5), and SCK to Raspberry Pi Pin 31 (GPIO 6). The pinout of your Raspberry Pi may vary slightly depending on model; for reference check out this awesome resource over on pinout.xyz.

A diagram showing how to correctly connect hx711 to the raspberry pi with dupont connectors

6) Flash your SD card and set up your Raspberry Pi. For instructions on how to do this properly check out this guide on the Raspberry Pi website.

7) Get the library we need to control the HX711 with Python and navigate into the directory:

git clone https://github.com/tatobari/hx711py
cd hx711py
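The hx711py library drives the GPIO pins via the RPi.GPIO package, which ships with Raspberry Pi OS. If your image doesn’t include it (an assumption worth checking before running the scripts below), something like this should install it on Raspberry Pi OS/Debian:

sudo apt install python3-rpi.gpio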

8) Finally, we’re ready to calibrate the load cell. Create a script called calibration.py with the following code and run it:

import time
import RPi.GPIO as GPIO
from hx711 import HX711

# Setup HX711
hx = HX711(5, 6)
hx.set_reading_format("MSB", "MSB")
hx.set_reference_unit(1)
hx.reset()
hx.tare()

# Configuration
num_samples = 15

print(f"Place known weight on scale and enter it's weight in grams:",end="")
known_weight = int(input())

# Collect samples
print("Collecting samples...")
samples = []
for i in range(num_samples):
    reading = hx.get_weight(1)
    samples.append(reading)
    print(f"{i+1}: {reading}")
    time.sleep(0.2)

# Remove outliers (simple method: remove top and bottom 20%)
samples.sort()
clean_samples = samples[3:-3]  # Remove 3 highest and 3 lowest

# Calculate reference unit
average = sum(clean_samples) / len(clean_samples)
reference_unit = average / known_weight

print(f"\nAverage reading: {average:.1f}")
print(f"Reference unit: {reference_unit:.2f}")
print(f"\nAdd this to your script:")
print(f"hx.set_reference_unit({reference_unit:.2f})")

GPIO.cleanup()

When prompted, place one of your calibration weights (or another known weight) on top of your scale, type its weight in grams into the script, and hit enter:

Place known weight on scale and enter its weight in grams:50

Keep a note of the reference unit, which the script calculates as reference_unit = average_reading / known_weight, where average_reading is the 24-bit reading from the HX711 with the tare offset subtracted (in the example below, 20873.4 / 50 ≈ 417.47).

Average reading: 20873.4
Reference unit: 417.47

Add this to your script:
hx.set_reference_unit(417.47)

9) Remove your test weight from the scale and create a new script called scale.py with the code below (update the reference unit with the value from the step above).

import time
import RPi.GPIO as GPIO
from hx711 import HX711

# Setup HX711
hx = HX711(5, 6)
hx.set_reading_format("MSB", "MSB")
hx.set_reference_unit(417.47)  # Use your calculated reference unit here
hx.reset()
hx.tare()

print("Scale ready! Place items to weigh...")
print("Press Ctrl+C to exit")

try:
    while True:
        weight = hx.get_weight(3)  # Average of 3 readings
        print(f"Weight: {weight:.1f}g")
        
        hx.power_down()
        hx.power_up()
        time.sleep(0.5)
        
except KeyboardInterrupt:
    print("\nExiting...")
    GPIO.cleanup()

Run the script and add the test weight again; you should see its weight accurately reported in grams.
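With a 50g test weight and the reference unit from the calibration step, the output should look something like the below (illustrative values; your readings will wobble by a gram or so):

Scale ready! Place items to weigh...
Press Ctrl+C to exit
Weight: 0.2g
Weight: 49.8g
Weight: 50.1g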

read more

Get Kafka-Nated Podcast (Episode 1) Apache Kafka®'s Evolution - 14 Yrs of Streaming

Check out the first episode of Get Kafka-Nated! Filip Yonov and I had a great chat exploring everything from Kafka’s journey from on-prem to the cloud and this year’s major Kafka Improvement Proposals to what we’re excited about for the future of Kafka.

Tune in next time for a conversation with Josep Prat about life as a Kafka contributor.

Original release

read more

Getting Started with Diskless Kafka - A Beginner's Guide

Diskless topics are proposed in KIP-1150, which is currently under community review. The examples in this article use “Inkless”, Aiven’s implementation of KIP-1150 that lets you run it in production.

I joined Aiven as a Developer Advocate in May, shortly after the announcement of KIP-1150: Diskless Topics, a Kafka Improvement Proposal that reduces the total cost of ownership of Kafka by up to 80%! It was very exciting to join Aiven just as the streaming team were making this major contribution to open source, but I wanted to take my time to understand the KIP before sharing my thoughts.

In this article I’ll share my first impressions of Diskless Kafka, walk you through a simple example you can use to experiment with Diskless, and highlight some of the great resources that are out there for learning about the topic. First though, what actually is Diskless Kafka?

What is Classic Kafka?

To understand Diskless Kafka, you first need to understand how Apache Kafka® works today. Kafka data is stored and replicated across multiple broker servers using local disks. A designated leader broker handles all writes to a given partition, while follower brokers maintain copies of the data. To ensure high availability Kafka clusters are often deployed with cross-zone replication, where data is duplicated across different cloud availability zones, but this creates a significant cost problem. Up to 80% of Kafka’s total cost of ownership comes from expensive cross-zone network traffic, with cloud providers like AWS charging per GB for data transfer between zones.

a diagram showing the flow of messages in a clasic kafka cloud deployment with the charges at az boundaries marked

What is Diskless Kafka?

Diskless Kafka fundamentally reimagines this architecture by delegating replication directly to object storage services like Amazon S3, eliminating the need for cross-zone disk replication entirely. Instead of storing user data on local broker disks, diskless topics write data straight to object storage, adopting a leaderless design where any broker can handle any partition. This is as opposed to tiered storage, which still relies on local disk replication for recent data before moving older segments to object storage.

a diagram showing the flow of messages in diskless kafka avoiding cross az charges by using object storage

The trade-off of Diskless is that reads and writes from object storage are slower than those from local disk. To mitigate this, KIP-1150 has been engineered so that you can run both traditional low-latency topics (sub-100ms) and cost-optimized diskless topics (200-400ms) in the same cluster, allowing you to choose the right performance profile for each workload. KIP-1150 maintains all existing Kafka APIs and client compatibility. Many use cases, such as logging, tolerate the higher latency of Diskless topics and are thus a natural fit, but others, like high frequency trading or gaming, are latency critical.

a chart comparing the latency of classic kafka to diskless kafka

Another snag with “Diskless” Kafka is that the name is somewhat of a misnomer. While “Diskless” implies complete elimination of disk usage, brokers still require local disks for Kafka metadata, batch coordination, temporary operations like compaction, and optional caching. The term “Diskless” specifically refers to topic data storage for Diskless topics - user messages and logs that traditionally consume the vast majority of disk space and I/O resources. Therefore it’s more accurate to describe the changes in KIP-1150 as adding Diskless Topics within classic Kafka than creating a new “Diskless Kafka”.

TL;DR: Naming things is hard. Speaking of naming things -

What is Inkless Kafka?

Inkless is the name the team behind KIP-1150 gave to the temporary GitHub repository that contains the implementation of KIP-1150, so you can use Diskless Kafka before it is merged into the Apache Kafka main branch. You can find the Inkless repo here.

a screenshot of the aiven inkless repo

Run Diskless Kafka Locally with Inkless and MinIO

When I first got hands-on with Diskless I wanted to experiment with running it locally to see what made it tick. Running Inkless locally also requires object storage, so I decided to use MinIO, a performant object store that you can deploy locally in a Docker container. You can try running Diskless Kafka yourself by following the steps below:

git clone https://github.com/Aiven-Labs/diskless-docker-quickstart.git
cd diskless-docker-quickstart
docker compose up -d
docker compose ps

The docker-compose.yml in the quickstart points the Inkless brokers at the local MinIO container as their S3-compatible object storage backend:

      # Inkless Storage Configuration
      - KAFKA_INKLESS_STORAGE_BACKEND_CLASS=io.aiven.inkless.storage_backend.s3.S3Storage
      - KAFKA_INKLESS_STORAGE_S3_PATH_STYLE_ACCESS_ENABLED=true
      - KAFKA_INKLESS_STORAGE_S3_BUCKET_NAME=kafka-diskless-data
      - KAFKA_INKLESS_STORAGE_S3_REGION=us-east-1
      - KAFKA_INKLESS_STORAGE_S3_ENDPOINT_URL=http://minio:9000
      - KAFKA_INKLESS_STORAGE_AWS_ACCESS_KEY_ID=minioadmin
      - KAFKA_INKLESS_STORAGE_AWS_SECRET_ACCESS_KEY=minioadmin
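Once the containers are up you can try creating a Diskless topic and producing to it. The sketch below makes a couple of assumptions: that you have the Kafka CLI tools available and the quickstart exposes a broker on localhost:9092, and that the topic-level config key for enabling diskless storage in the Inkless build is inkless.enable (check the Inkless repo README for the exact key in your version; the KIP itself discusses a diskless.enable config):

kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic diskless-demo \
  --partitions 3 \
  --config inkless.enable=true

kafka-console-producer.sh --bootstrap-server localhost:9092 --topic diskless-demo

Messages produced to the topic should land as objects in the kafka-diskless-data bucket in MinIO rather than on the broker’s local disk.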
read more

From Radio Waves to Kafka Topics - Building a Real-Time Aircraft Data Pipeline

Flight radar talk photo collage

If you want to showcase real-time data architectures you need a data source that’s live, high-volume, varied, and messy enough to reflect real-world challenges. This is an issue I’ve run into several times over the last year whilst giving talks about real-time analytics using Kafka, Druid, ClickHouse, and Grafana in various combinations. You could use a data generator like ShadowTraffic, but when trying to bring the sometimes dry topic of data engineering to life nothing beats real data. So when I’m building demos I’ve consistently turned to the same compelling dataset: ADS-B aircraft transmissions.

I was introduced to ADS-B (Automatic Dependent Surveillance–Broadcast) by my former colleague at Imply, Hellmar Becker. ADS-B is one of the technologies aircraft use to relay data including their position, heading, and speed to air traffic controllers and to other aircraft. This creates a continuous stream of real-time data that’s publicly accessible and rich with analytical possibilities. The dataset perfectly illustrates the complexities of streaming analytics: it arrives at high velocity, contains mixed data types, requires deduplication and enrichment, and benefits from both real-time alerting and historical analysis.

What makes ADS-B particularly valuable for demonstrations is its combination of technical complexity and intuitive appeal. Everyone understands aircraft movement, making it easy to visualize concepts like windowing, aggregation, and anomaly detection. Yet underneath this accessibility lie genuine engineering challenges: handling bursty traffic patterns, dealing with incomplete or duplicate messages, and correlating position data with aircraft metadata.

In this article, I’ll walk through building a complete ADS-B ingestion pipeline—from setting up a simple antenna to producing clean, structured data to Kafka topics ready for real-time analysis. By the end, you’ll have both the technical foundation and a rich dataset to explore your own streaming analytics architectures.


Understanding ADS-B Data

Flight radar 24 gif

ADS-B transmissions use a standardized message format called SBS (BaseStation format), which arrives as comma-separated text lines. Each message contains different types of aircraft information, for example:

Position Messages (MSG,3): Location, altitude, and identification data

MSG,3,1,1,40756A,1,2025/06/01,17:42:30.733,2025/06/01,17:42:30.776,,35000,,,40.1234,-74.5678,,,0,0,0,0

Velocity Messages (MSG,4): Speed, heading, and vertical rate

MSG,4,1,1,40756A,1,2025/06/01,17:42:31.233,2025/06/01,17:42:31.276,,,450,275,,,256,,,,,0

ADS-B data has a high data velocity, with anywhere from 100 to 2000 messages a second produced by a receiver depending on location. There are some problems with ADS-B data that present a barrier to real-time analytics: the data contains duplicate messages because the same aircraft can be tracked by multiple receivers (as many as 20-30% of messages will be duplicates), there are missing fields because not all messages contain complete information, and traffic varies by time of day and geographic location.

This real-world messiness makes ADS-B data perfect for demonstrating streaming analytics challenges like de-duplication, windowing, and real-time aggregation.
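To make those challenges concrete, here is a minimal sketch (my own illustration, not the pipeline built later in this series) of parsing an SBS position message and dropping duplicates seen within a short window. The field positions follow the MSG,3 example above:

# Minimal sketch: parse SBS (BaseStation) position lines and drop near-duplicates.
seen = {}  # (icao, lat, lon, alt) -> time the reading was last seen

def parse_position(line: str):
    """Return (icao, lat, lon, altitude) for MSG,3 lines, or None for anything else."""
    fields = line.strip().split(",")
    if len(fields) < 16 or fields[0] != "MSG" or fields[1] != "3":
        return None
    icao, alt, lat, lon = fields[4], fields[11], fields[14], fields[15]
    if not (lat and lon):
        return None  # incomplete message - not every MSG,3 carries a position
    return icao, float(lat), float(lon), int(alt or 0)

def is_duplicate(position, now: float, window_s: float = 2.0) -> bool:
    """Treat an identical position report from the same aircraft within window_s as a duplicate."""
    last = seen.get(position)
    seen[position] = now
    return last is not None and now - last < window_s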


Hardware Setup and Data Collection

You can be receiving live ADS-B data for around £95 (or less, if you already have some of these parts or can pick them up second hand) and have it running in 15 minutes. Here’s my exact setup that’s been reliably collecting ADS-B data for months.

Hardware shopping list

Setup

1) Install a supported OS on your Pi. I’m using a Lite version (without a UI) of the official Debian Bookworm build; for details on how to do this, follow the steps in the guide on the Raspberry Pi website.

2) Install Docker on your Pi and add your user to the docker group to run docker without sudo. Important: Log out and back in for group changes to take effect.

curl -sSL https://get.docker.com | sh
sudo usermod -aG docker pi
# Log out and back in for group changes to take effect

3) Create a new Docker Compose file called docker-compose.yml and define an ultrafeeder service as below. Note: this is a very basic ultrafeeder configuration; you may wish to consult the setup guide in the ADS-B Ultrafeeder repo for a more in-depth guide to setting up this part.

services:
  ultrafeeder:
    image: ghcr.io/sdr-enthusiasts/docker-adsb-ultrafeeder
    container_name: ultrafeeder
    restart: unless-stopped
    device_cgroup_rules:
      - "c 189:* rwm"
    ports:
      - 8080:80     # Web interface
      - 30003:30003 # SBS output (for Kafka)
    environment:
      - READSB_DEVICE_TYPE=rtlsdr
      - READSB_RTLSDR_DEVICE=00000001  # Usually 00000001
      - READSB_LAT=51.4074             # Your antenna latitude
      - READSB_LON=-0.1278             # Your antenna longitude
      - READSB_ALT=52                  # The altitude of your antenna
    volumes:
      - /dev/bus/usb:/dev/bus/usb

4) Deploy your ultrafeeder service:

docker-compose up -d

5) Optional: Add FlightRadar24 Integration

Flight radar 24

Adding FR24 gives you two immediate benefits: a professional flight tracking interface and confirmation that your data quality meets commercial standards. Plus, contributing data gets you free access to FR24’s premium features. Register via the FlightRadar24 site to get your sharing key; you should then be able to find your key in your Account Settings under “My data sharing”.

Add the fr24feed service to your Docker Compose file to start sending data to FR24.

# Add to existing services
  fr24feed:
    image: ghcr.io/sdr-enthusiasts/docker-flightradar24:latest
    container_name: fr24feed
    restart: always
    ports:
      - 8754:8754
    environment:
      - BEASTHOST=ultrafeeder
      - FR24KEY={your flight radar 24 key}
    dns_search: . # prevents rare connection issues related to a bug in docker and fr24feed

Redeploy with:

docker-compose up -d

Once set up, your station should appear on the FR24 coverage map within 10-15 minutes.

6) Validate ADS-B Data Reception

Test that you are receiving ADS-B data correctly:

nc localhost 30003

You should see continuous messages like:

MSG,8,1,1,40756A,1,2025/06/01,17:42:30.733,2025/06/01,17:42:30.776,,,,,,,,,,,,0
MSG,3,1,1,40756A,1,2025/06/01,17:42:33.003,2025/06/01,17:42:33.015,,35000,,,40.1234,-74.5678,,,0,0,0,0
MSG,4,1,1,40756A,1,2025/06/01,17:42:35.120,2025/06/01,17:42:35.156,,,450,275,,,256,,,,,0

If your antenna has a good view of the sky you can expect around 100-2000 messages/second (depending on your location) with CPU usage sitting comfortably under 20% on a Pi 3.

Quick Troubleshooting

No aircraft? Check your antenna USB connection:

lsusb | grep RTL

You should see something like:

Bus 001 Device 033: ID 0bda:2838 Realtek Semiconductor Corp. RTL2838 DVB-T

If not, your antenna may not be connected correctly. Verify your antenna connection is secure or try a different USB port (preferably USB 2.0+), then try restarting ultrafeeder:

docker-compose down && docker-compose up -d

Tracking very few aircraft? Try placing your antenna higher and away from electronics; for best results, try to get an unobstructed view of the sky.


Kafka Integration

Now that we have ADS-B data streaming on port 30003 let’s produce it to Kafka to allow us to work with it as an event stream. We’ll add Kafka to our Docker stack and build a producer that can handle thousands of aircraft updates per second.
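As a preview of where this is headed, here’s a minimal sketch of such a producer, assuming the confluent-kafka Python client, a broker reachable at localhost:9092, and a topic called adsb-raw (all of these are my placeholder choices, not necessarily what the rest of the series uses):

import socket
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

# Connect to ultrafeeder's SBS output and forward each line to Kafka, keyed by
# the aircraft's ICAO hex code (field 4) so all updates for one aircraft stay in order.
with socket.create_connection(("localhost", 30003)) as sock:
    buffer = b""
    while True:
        data = sock.recv(4096)
        if not data:
            break  # connection closed
        buffer += data
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            fields = line.decode(errors="ignore").split(",")
            if len(fields) > 4 and fields[0] == "MSG":
                producer.produce("adsb-raw", key=fields[4], value=line)
        producer.poll(0)  # serve delivery callbacks and keep the internal queue moving

producer.flush()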

read more