hughevans.dev

Get Kafka-Nated (Episode 5) Uber Staff Engineer and Kafka Tiered Storage Contributor Satish Duggana

Check out the latest episode of Get Kafka-Nated! I had a fantastic conversation with Satish D from Uber, who was a key contributor to KIP 405 that brought tiered storage to Kafka.

We explored what tiered storage actually is, the specific storage challenges at Uber that led to KIP 405, and the biggest technical hurdles in implementing such a fundamental change to Kafka. Satish shared insights on working with the Apache community process including why it can take so long to get major KIPs into open-source Kafka.

I really enjoyed getting to interview another member of the community who was directly involved in architecting one of Kafka’s most significant features. Check out the recording below.

Original release

You can find all the past episodes of Get Kafka-Nated as well as Kafka news and technical deep dives over at getkafkanated.substack.com


Get Kafka-Nated (Episode 4) Alex Merced, Co-Author of the Iceberg Definitive Guide

Check out the latest episode of Get Kafka-Nated! I had a fantastic conversation with Alex Merced, Head of DevRel at Dremio and co-author of “Apache Iceberg: The Definitive Guide,” about why Iceberg is becoming essential for streaming data architectures.

We explored how Apache Iceberg is blurring the lines between streaming and analytics, discussed the exciting new “Iceberg Topics” development that makes Kafka topics directly queryable as tables, and dove into real-world patterns for integrating Kafka with modern data lakehouses.

Alex shared insights from his work with enterprises adopting lakehouse architectures, explained how schema evolution works in streaming scenarios, and gave practical advice for teams looking to add Iceberg to their Kafka-based data pipelines.

Original release

You can find all the past episodes of Get Kafka-Nated as well as Kafka news and technical deep dives over at getkafkanated.substack.com


Smart Bird Feeder Part 2 - How can I automatically identify bird species from an image? - Using TensorFlow and a webcam to spot birds

An image of robins eating bird seed with the text this robin weighs 26.3g

In the previous entry in this series I built a smart bird feeder that could weigh birds with the goal of figuring out how heavy a particularly portly-looking robin was. This only got me part of the way to my goal of once and for all answering the question: is this an abnormally huge robin?

The next step is to collect pictures of birds that visit my bird feeder and automatically label them with the species, so I can check whether each image is of a robin or not. This will let me track just the weights of robins so I can easily spot any abnormally heavy birds.

The guide below will walk you through, step by step, everything you need to do to take a picture of a bird using a cheap webcam and a Raspberry Pi, and then use an image classifier model to identify the bird species.


What is an image classifier model?

Why do we need an image classifier model at all? Our bird feeder can now weigh visiting birds, but weight alone doesn’t tell us the species: a 60g bird could be an enormous robin or a tiny pigeon. An image classifier model can analyze a photo from our webcam and automatically identify the bird species so we can track weights by species.

The model works by analyzing the mathematical patterns in the image data that distinguish one bird species from another. Rather than training our own model (which would require thousands of labeled bird photos), we’ll use a pre-trained model that can already recognize several British garden bird and non-bird species.
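
Under the hood, all the classifier gives us back is a score for every class it knows about, and we just pick the highest one. Here’s a tiny sketch of that idea with made-up labels and scores (these are purely illustrative, not the real model’s class list or output):

import numpy as np

# Made-up labels and scores purely to illustrate the idea -
# the real model returns one score for every class it was trained on
labels = ["robin", "blackbird", "blue tit", "not a bird"]
scores = np.array([0.82, 0.09, 0.06, 0.03])

# The predicted species is simply the class with the highest score
print(labels[int(np.argmax(scores))])  # -> robin

Later in the guide we’ll do exactly this with the real model’s output.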


Hardware Setup

If you tried setting up your own bird feeder from the first part of this series, you’ll already have everything you need apart from the camera; if not, you can get everything you need from the list below.

Hardware shopping list

Setup

1) Flash your SD card and set up your Raspberry Pi. For instructions on how to do this properly check out this guide on the Raspberry Pi website. Connect your webcam to a USB port on your Raspberry Pi.

Diagram showing webcam connected to USB port on a raspberry pi

2) Screw one of the suction cups into the threaded insert in your webcam - this will make it easy to position and adjust your webcam in your window.

Diagram showing a suction cup with a threaded bolt being inserted into a threaded brass insert in the base of a webcam

3) Stick your webcam somewhere with a good view of your bird feeder; the closer the lens is to the glass, the less glare you’ll have in your images. Camera positioning is crucial for accurate bird identification.

Remember that the model was trained on a variety of lighting conditions and angles, so don’t worry about getting perfect shots every time - even a blurry robin in motion can still be classified correctly!

A diagram showing a view of a window from the outside with a webcam stuck facing a bird feeder

4) Now that we’ve got a nice little bird photo booth set up we can start taking some pictures (if you’re following along from part 1, you can update your code to take a photo when a bird is detected - see my source code on GitHub for reference). Let’s install OpenCV for capturing and processing pictures from the webcam:

python3 -m pip install opencv-python-headless==4.8.1.78

5) Create a new script called take_picture.py with the following Python code:

import os
from datetime import datetime
import cv2

def take_photo():
    """Take a single photo from the webcam and save it to the images directory"""
    cap = cv2.VideoCapture(0)
    if not cap.isOpened():
        print("Could not open the webcam - check the USB connection")
        return
    # Let the camera adjust its exposure by discarding a few frames
    for i in range(5):
        ret, frame = cap.read()
    ret, frame = cap.read()
    if ret:
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        filename = f"bird_{timestamp}.jpg"
        cv2.imwrite(os.path.join("images", filename), frame)
        print(f"📸 Photo: {filename}")
    cap.release()

take_photo()

This script will take a picture and save it to the images directory. Let’s create that directory now and test the script out:

mkdir images
python take_picture.py

You should end up with a picture like the example below in the images directory (for those following on from part 1, your images will also include the weight measured when the photo was taken).

A picture of a robin on a bird feeder images/bird_20250801_120750.jpg
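
If you’re following on from part 1, one simple way to keep the weight reading together with each photo is to bake it into the filename. Here’s a minimal sketch of that idea - read_weight() is a hypothetical stand-in for however your part 1 scale code reports a weight, so adapt it to your own setup:

import os
from datetime import datetime
import cv2

def take_photo_with_weight(weight_grams):
    """Save a photo whose filename records the capture time and the measured weight"""
    cap = cv2.VideoCapture(0)
    if not cap.isOpened():
        return
    # Let the camera adjust its exposure by discarding a few frames
    for i in range(5):
        ret, frame = cap.read()
    ret, frame = cap.read()
    if ret:
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        filename = f"bird_{timestamp}_{weight_grams:.1f}g.jpg"
        cv2.imwrite(os.path.join("images", filename), frame)
    cap.release()

# e.g. take_photo_with_weight(read_weight())  # read_weight() is hypothetical - use your part 1 scale code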

6) Now that we have an image of a bird we can use a classifier model to predict the species of the bird in the image.

A warning triangle with a broken raspberry pi in it

It is unlikely that your Raspberry Pi will be able to run the model, as it can be computationally intensive - I suggest copying your images directory from the previous step to your laptop or a more powerful computer! I tried running the model on my Raspberry Pi on a hot day and it got so hot it was permanently damaged; by default the Pi has no active cooling, unlike your PC or laptop, so this is surprisingly easy to do.

For this we’ll use the pre-trained UK garden birds model from secretbatcave. Download the saved model (the .pb extension stands for protobuf, the Protocol Buffers format) and the class labels with:

mkdir models
curl -o models/ukGardenModel.pb https://raw.githubusercontent.com/secretbatcave/Uk-Bird-Classifier/master/models/ukGardenModel.pb
curl -o models/ukGardenModel_labels.txt https://raw.githubusercontent.com/secretbatcave/Uk-Bird-Classifier/master/models/ukGardenModel_labels.txt

7) Install TensorFlow and its dependencies. TensorFlow is a software library for machine learning that was used to produce the model we’re working with here; we’ll use it now to run the model and make a bird species prediction.

pip install tensorflow "numpy<2" protobuf==5.28.3

8) Create a new Python script called identify_bird.py with the following Python code:

import os
import sys

# Suppress TensorFlow's C++ logging - this must be set before TensorFlow is imported
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

# Load the frozen model graph
with tf.io.gfile.GFile('models/ukGardenModel.pb', 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')

# Load the class labels (one label per line)
with open('models/ukGardenModel_labels.txt', 'r') as f:
    labels = [line.strip() for line in f.readlines()]

# Read the image passed on the command line as raw JPEG bytes
image_path = sys.argv[1]
with open(image_path, 'rb') as f:
    image_data = f.read()

# Run inference and print the label with the highest score
with tf.Session() as sess:
    predictions = sess.run('final_result:0', {'DecodeJpeg/contents:0': image_data})
    bird_class = labels[np.argmax(predictions)]
    print(bird_class)

Note the use of tensorflow.compat.v1: this is an older model (from 7+ years ago), so we’re using TensorFlow’s version 1 compatibility module rather than the current API to make sure everything works correctly (this is also why we’re pinning "numpy<2" and protobuf==5.28.3). There are better models out there, but this one is lightweight, free to use, and does not require API access.
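
If you’d also like to see how confident the model is, you can print the highest-scoring labels instead of just the winner. Here’s a minimal sketch - the top-3 formatting is my own addition rather than part of the original script, and it expects the predictions array and labels list from identify_bird.py:

import numpy as np

def top_predictions(predictions, labels, k=3):
    """Return the k highest-scoring (label, score) pairs from the model output"""
    scores = np.squeeze(predictions)        # (1, num_classes) -> (num_classes,)
    best = np.argsort(scores)[-k:][::-1]    # indices of the k largest scores
    return [(labels[i], float(scores[i])) for i in best]

# Inside the tf.Session() block you could then print:
#     for label, score in top_predictions(predictions, labels):
#         print(f"{label}: {score:.2%}")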

Let’s try making a prediction with one of your photos to see if everything is working correctly:

python identify_bird.py images/bird_20250801_120750.jpg

You should see a result like:

WARNING:tensorflow:From /Users/hugh/test/.venv/lib/python3.13/site-packages/tensorflow/python/compat/v2_compat.py:98: disable_resource_variables (from tensorflow.python.ops.resource_variables_toggle) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1754516598.102893 5536073 mlir_graph_optimization_pass.cc:437] MLIR V1 optimization pass is not enabled
robin

You should see a predicted bird species on the last line of the output.
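
Once a single image classifies correctly you’ll probably want to run every photo in the images directory in one go. Here’s a minimal sketch that just shells out to identify_bird.py for each file - the batch loop is my own addition and assumes the script and directory names used above:

import glob
import subprocess

# Classify every saved photo by running identify_bird.py on each one
for image_path in sorted(glob.glob("images/bird_*.jpg")):
    result = subprocess.run(
        ["python", "identify_bird.py", image_path],
        capture_output=True,
        text=True,
    )
    # The predicted species is the last line the script prints to stdout
    lines = result.stdout.strip().splitlines()
    species = lines[-1] if lines else "unknown"
    print(f"{image_path}: {species}")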

Quick Troubleshooting

If the script can’t open your webcam, first check that the Pi actually detects it by running lsusb - your camera should show up in the list of USB devices:

hugh@bird:~/bird-kafka-demo $ lsusb
Bus 001 Device 004: ID 328f:003f EMEET HD Webcam eMeet C960
Bus 001 Device 003: ID 0424:ec00 Microchip Technology, Inc. (formerly SMSC) SMSC9512/9514 Fast Ethernet Adapter
Bus 001 Device 002: ID 0424:9514 Microchip Technology, Inc. (formerly SMSC) SMC9514 Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

Try unplugging and reconnecting the USB cable, or try a different USB port. Some cameras need a moment to initialize after being plugged in. Check the manufacturer’s website to see if your webcam requires any specific drivers to work with the Pi.

If your photos come out too dark or washed out, the camera may need longer to adjust - try increasing the number of warm-up frames in take_picture.py:

        for i in range(5): # Try increasing this value
            ret, frame = cap.read()

Conclusion and Next Steps

You now have two separate systems: one that detects and photographs birds (and weighs them if you’re following on from part 1), and another that identifies species. These systems can’t run on the same hardware because of the performance limitations of the Raspberry Pi, so right now the workflow requires periodically transferring the bird photos to my laptop to run species identification. With this setup I now have some pictures of heavy robins, but without storing and analyzing lots of images of birds labeled with species and weight I still can’t answer my original question: is this robin abnormally heavy?

In the third and final entry in this bird feeder series I’ll use Kafka and Iceberg to bridge the gap between my laptop and the bird feeder, analyze all my collected data, and once and for all figure out just how heavy this robin is.


Get Kafka-Nated (Episode 3) Greg Harris on Contributing to KIP-1150 Diskless Topics

Check out the latest episode of Get Kafka-Nated! I had a fantastic conversation with Greg Harris about his work on KIP-1150 and life as an open source software engineer.

We dove deep into the technical architecture, explored the challenges of implementing this game-changing feature, and discussed what Diskless topics mean for the future of real-time data streaming.

Original release

You can find all the past episodes of Get Kafka-Nated as well as Kafka news and technical deep dives over at getkafkanated.substack.com


Get Kafka-Nated (Episode 2) Life of a Kafka contributor with Josep Prat

Check out the second episode of Get Kafka-Nated! I had a great time chatting with Josep Prat about his experiences as a Kafka contributor and Apache Kafka PMC member. We covered everything from first open source contributions to how to properly manage a major open source project.

Tune in next time for a conversation with Greg Harris about KIP-1150, which introduces Diskless topics to Apache Kafka.

Original release
