hughevans.dev

Data from Smart Bird Feeder Project

Update: After some feedback I manually relabeled my data and changed my approach slightly for better graphs

By popular request, here’s all the data I collected from my smart bird feeder, in histogram form:

This is my first time trying Marimo and I really liked using it for visualising this data. I’ve embedded the Marimo notebook below so you can explore the data further, should you wish:

Go fullscreen here.

read more

Is Meetup Terrible or is *my* meetup terrible?

I’ve spent a lot of time and energy this year complaining about Meetup (at the CodeBar Festival Fringe, at PyCon UK, and on my blog). Meetup is becoming more expensive, less reliable, and generally ‘enshittified’, which came to a head at the beginning of this year when AI Signals, a community I organise through Meetup, saw a marked drop in attendance.

The thesis of my “Meetup is Terrible Now” talk was that reducing reliance on Meetup could help with declining attendance and the burden of organising events. Rather than just speculating about this, though, I followed my own advice to see if it would work in practice.

Meetup as a platform provides network effects that drive attendance and tooling for managing RSVPs and community membership, and it has good SEO for discoverability via search (in part because it has captured the “Meetup” namespace). Any solution to my problem must still offer these things to be a meaningful Meetup alternative.

Some potential solutions I proposed were:

  1. Building our own website and publishing there first (POSSE-style syndication out to other platforms)
  2. Running event registration in parallel on Luma
  3. Collaborating and cross-promoting with other communities

Each has its own merits and issues, but my hope was that in combination these three solutions might be enough to break our dependence on Meetup.

What I didn’t cover in my talk was how to measure the success of these solutions, which is an important part of any initiative like this. In the end I tracked the following metrics before and after embarking on the experiment: attendance at each event, the split of RSVPs between platforms, and membership numbers on each platform.

Putting it into practice

Working with my friend and former colleague Steve Morris, we rebranded AIDLE (AI and Deep Learning for Enterprise) as AI Signals and launched our own website. We were hugely lucky to have Steve’s support (you can read more about the strategy and brand work over on his blog here), and as you’ll see this was one of the most impactful actions in the initiative. I ran out of steam before implementing POSSE for AI Signals, though: starting my new role at Aiven meant I no longer had time for larger infrastructure efforts, so setting up POSSE was put on hold.

We set up a Luma calendar and started cross-promoting events from Meetup there. We also embedded Luma as the registration system for events on our website, with a view to encouraging our audience to move from Meetup to Luma.

Additionally, we worked with other technical communities in London, like Women in Data, to co-host and cross-promote events.

The result: What worked

The rebrand was enormously successful - the first event after the rebrand had a turnout of 55 people, more than double the 20 who attended the previous event.

This increase in attendance has been lasting - we now see an average of 53 attendees at each post-rebrand event. We also had several new organisers join, growing our team from 3 to 10 people; anecdotally, some cited the new brand as sparking their interest in getting involved. We saw an increase in submissions to our call for papers too, and three new sponsors engaged with us.

Promoting events in parallel on Luma contributed some net new RSVPs in this time, though it’s hard to say exactly how many: we don’t register attendees on the door, so we don’t know how many people at a given event came from Meetup versus Luma. That said, since we started promoting events on Luma, 32% of our total RSVPs have come from it.

Our event in collaboration with Women in Data had above-average attendance compared to other post-rebrand events. I don’t have good data on referrals from cross-promotion with other communities, but we have seen a lot of new faces since we started collaborating with other groups. We also worked with the Civo team to promote their Civo Navigate event, and in exchange they helped us out with some free venue use.

The result: What didn’t

Both our audience and audience growth remain concentrated on Meetup. We now have 4,084 members there versus only 236 on Luma. I had hoped to pull the plug on our Meetup page at the end of this year but as it stands I won’t be able to do that - the network effects there remain too strong to leave on the table for now.

Whilst the new brand and our website were extremely impactful, building the site on Squarespace added costs: a £204-a-year Squarespace subscription and £16 a year for domain registration. Squarespace is a powerful tool, but many of the features that would help us move off Meetup, such as managing mailing lists and sending mass emails, are paywalled. In the long run I’d like us to migrate off Squarespace to avoid trading one bad platform for another.

We didn’t have as many opportunities for collaboration and cross-promotion with other communities as I would have liked. We started offering a slot at our events for other communities to share announcements, but take-up was virtually zero after the first time we did it.

Conclusion

Meetup is still Terrible: expensive, unreliable, and plagued by bad user experience. Still, there was a lot we could do better to make our own Meetup presence a success and avoid dispiriting events with very low attendance and engagement.

We haven’t yet been able to execute a full move off Meetup, so it’s unclear whether leaving entirely would have been beneficial - although the continued growth of our community on the platform suggests otherwise. Adding Luma as a platform didn’t hurt, but it did add to our organisers’ workload, with effort replicated for each additional platform. The rebrand drove new engagement, although without directly surveying our volunteers or attendees it’s hard to conclusively link the spike in growth to the new brand.

My approach to measuring success made it hard to decouple the effect of each specific action, but overall the result has been positive. The rebrand and parallel promotion on Luma definitely helped revive our community, but did not measurably reduce our reliance on Meetup. Collaboration was great for bringing new energy into the community, though the impact of those efforts was hard to measure.

Yes, Meetup is terrible, but by doing things differently our community got back to growing and hosting great AI talks.

Next Steps

I’m working with the awesome team at PyData London to learn how to coordinate our growing team of volunteers at AI Signals. Volunteering with PyData has taught me a lot about building teams around trust, and I may write up some of this work in future.

At AI Signals we’re figuring out how to make the most of our regained momentum: considering setting up an LLC so we can transact, building new partnerships with sponsors, and scaling the number of events we’re able to deliver.


If you’re a community organiser, I hope this article offered some useful insights from our experience trying to move away from Meetup. I’d love to hear about your own experiences as an organiser - head over to notanother.pizza and join the conversation.

read more

Get Kafka-Nated (Episode 7): Redpanda vs Kafka with Tristan Stevens

Check out the latest episode of Get Kafka-Nated! I had a great conversation with Tristan about the merits of Kafka versus Kafka-compatible solutions. You can watch the recording below.

Original release

You can find all the past episodes of Get Kafka-Nated as well as Kafka news and technical deep dives over at getkafkanated.substack.com

read more

My PyCon UK 2025 Highlights

PyCon UK 2025

I had a great time at my first PyCon UK over the weekend: the community was really friendly, and I feel like I missed as many good talks as I saw, which feels like the mark of a good conference to me.

Day One: Highlights 🐍

I started the day with a nice wander around Manchester and a great breakfast at Brunchos before heading over to the Contact Theatre for the welcome session.

There was an awesome keynote from Hynek Schlawack about Python’s Super Power followed by David Seddon’s great talk on why other collection types are often better than Python lists.

Hannah Hazi’s talk about recovering from Long Covid as a programmer was full of useful information about the ongoing impact of Covid, along with a first-hand account of living with Long Covid.

I had a lot of fun at David Asboth’s talk about using exec to put Python in your Python so you can code while you code (all the best Python features have big red warning boxes in the docs).

CJ Shearwood’s brilliant talk “I’m a Luddite, Why Aren’t You” was an insightful and moving history of the intersection of technology and labour organising, with lessons as relevant today as they were in the 1800s.

We had some awesome lightning talks from Eli Holderness, Lydia Cordell, Anthony Harrison, Jyoti Bhogal, Perla Godinez Castillo, Alex Willmer, and Sheena O’Connell. I took the opportunity to speak a bit about notanother.pizza and community organising in another “Meetup is terrible now” lightning talk.

My favourite talk of the day was a lightning talk from Daniele Procida, the unassumingly titled “What is your favourite film?”. It was as delightful as it was thought-provoking - I won’t spoil it; watch the recording if you get the chance!

Day Two: Highlights 🐍

In the morning I had the pleasure of facilitating a session on making ASCII Art with Python for the young coders track alongside some awesome volunteers including Ekaterina Savenya and Nese Dincer. The volunteers who facilitate the young coders track are beyond awesome.

It was really interesting hearing from Kristian Glass about the work of the UK Python Association and how instrumental they are in supporting the Python community (and not just in the UK!).

The tale of PEP 765 as told by Irit Katriel was a great story of how to find and fix a problem in open-source even when not everyone agrees that the problem should be fixed.

Aivars Kalvans’ talk on solving a Python mystery took me back to my time in the DevOps trenches and had some great pointers for troubleshooting Python applications in production.

Naturally, my favourite talk of the day was the presentation from the young coders sharing what they’d learned. To say these kids were awesome is an understatement: I do public speaking for a living and I don’t think I’m half as confident in front of a conference audience as these young people!

There were some more brilliant lightning talks covering everything from pencil preferences, to building a crow army, to getting started self-hosting, to exciting news from open-source.

Also there was a badge maker 🤩

Day Three: Highlights 🐍

The morning keynote [”Playing the long game”](https://youtu.be/b0GqRDfumR8?si=Qo6FE8ql9yJVc4s9) by Sheena O’Connell was an insightful session about development in a world of Language Models with some great advice for developers regardless of experience.

Deb Nicholson’s talk about coping with your project becoming popular was full of extremely good advice for both software projects and community-facing projects writ large. I particularly liked the points about letting people know what you need help with and politely pushing back on companies that think you work for them. If you’re in any kind of community organising role, this talk is essential viewing!

John Carney and I helped facilitate a hallway-track conversation with some community organising folks, which surfaced really useful insights into volunteer experience, keeping institutional knowledge somewhere other than in your head, and avoiding (or not avoiding, as it happens) burnout.

I’m super excited for t-strings to come to Python after Dr. Philip Jones shared his work on using them to dynamically build SQL queries. Then of course there was the awesome talk from Tibs, my wonderful colleague at Aiven, about building an app using CLIP, PostgreSQL® and pgvector. It’s always fun nerd-sniping Tibs with a question in the Q&A!

There was another round of awesome lightning talks including from the amazing Dawn Gibson Wages with a hero’s journey of Python development.

Day Four: Sprints and Contributions 🐍

I had a great time participating in the BeeWare sprints and earning my challenge coin with some tiny docs changes. It was also great helping Ekaterina Savenya and Dan Taylor make some contributions, although I feel I might have been more of a hindrance than a help when it came to updating Towncrier 😅 A huge thanks to Russell Keith-Magee for running the session and generally being an exemplary maintainer of such a cool project!


A huge thank you to the UK Python Association and the organisers and volunteers who put on PyCon UK this year. I had a wonderful time, and it was such a treat to spend the weekend with so many interesting people. I’m already looking forward to the next one!

All recordings are now available on the PyCon UK YouTube channel.

read more

Getting Started with Iceberg Topics - A Beginner's Guide

Understand how Kafka integrates with Apache Iceberg™ and experiment locally with Docker and Spark

The streaming data landscape is evolving rapidly, and one of the most exciting developments is the integration between Apache Kafka and Apache Iceberg. While Kafka excels at real-time data streaming, organizations often struggle with the complexity of moving streaming data into analytical systems. Iceberg Topics for Apache Kafka promises to bridge this gap by enabling direct integration between Kafka streams and Iceberg tables, creating a seamless path from real-time ingestion to analytical workloads.

In this article, I’ll share what Iceberg Topics are, walk you through a hands-on example you can run locally, and explore the potential this integration holds for modern data architectures. But first, let’s understand what we’re working with.

What is Apache Iceberg?

Apache Iceberg is an open table format designed for huge analytic datasets. Unlike traditional data formats, Iceberg provides features that make it ideal for data lakes and analytical workloads. It’s become increasingly popular because it solves many of the pain points associated with managing large-scale analytical data, including:

  1. Schema evolution without rewriting the whole table
  2. Hidden partitioning, so queries don’t need to know the physical layout
  3. ACID transactions and snapshot isolation over files in object storage
  4. Time travel to query earlier versions of a table

What are Iceberg Topics?

Iceberg Topics represent a powerful integration between Kafka’s streaming capabilities and Iceberg’s analytical features. Instead of requiring complex ETL pipelines to move data from Kafka into analytical systems, Iceberg Topics allow Kafka to write data directly into Iceberg table format in object storage like S3 - all zero-copy, without unnecessary data replication across brokers, sink connectors, and sinks.

Before: streaming data is replicated across brokers, sink connectors, and sinks on its way into analytical storage.

This integration leverages Kafka’s Remote Storage Manager (RSM) plugin architecture to seamlessly transform streaming data into Iceberg tables. When you create a topic with Iceberg integration enabled, Kafka automatically:

  1. Streams data through standard Kafka topics as usual
  2. Transforms messages into Iceberg table format using schema registry integration
  3. Writes data directly to object storage as Iceberg tables
  4. Enables seamless querying through Spark, Trino, or other Iceberg-compatible engines once segments are written to the Iceberg table

After: brokers write topic data directly to object storage as Iceberg tables, ready to query.

The beauty of this approach is that it maintains full Kafka API compatibility while adding analytical capabilities. Your existing producers and consumers continue to work unchanged, but now your streaming data is simultaneously available for real-time processing and analytical queries.
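
To make that concrete, here is a minimal Python sketch (using the confluent-kafka client) of creating an Iceberg-enabled topic and producing to it exactly as you would to any other topic. The broker address and topic name are assumptions for illustration, and beyond Kafka’s standard remote.storage.enable flag the exact Iceberg-related settings depend on the RSM plugin you deploy, so check its documentation rather than treating this as the definitive configuration.

from confluent_kafka import Producer
from confluent_kafka.admin import AdminClient, NewTopic

BOOTSTRAP = "localhost:9092"  # assumption: address of your local broker

admin = AdminClient({"bootstrap.servers": BOOTSTRAP})

# "remote.storage.enable" is Kafka's standard per-topic tiered-storage switch;
# any further Iceberg-specific keys exposed by the RSM plugin are left out here
# because they vary by plugin version.
topic = NewTopic(
    "people",
    num_partitions=1,
    replication_factor=1,
    config={"remote.storage.enable": "true"},
)
admin.create_topics([topic])

# The producer side is a completely ordinary Kafka produce call - no Iceberg awareness.
producer = Producer({"bootstrap.servers": BOOTSTRAP})
producer.produce("people", key=b"1", value=b'{"name": "Ada", "age": 36}')
producer.flush()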

The Benefits of Iceberg Topics

Traditional architectures require separate systems for streaming and analytics, creating operational complexity and data duplication. With Iceberg Topics, you get:

Simplified Architecture: Eliminate complex ETL pipelines between streaming and analytical systems. Data flows directly from Kafka into queryable Iceberg tables.

Unified Data Model: Use the same schema for both streaming and analytical workloads, reducing inconsistencies and maintenance overhead.

Real-time Analytics: Query streaming data without waiting for batch processes to complete.

Cost Efficiency: Reduce infrastructure costs by eliminating duplicate storage and processing systems.

Operational Simplicity: Manage one system instead of coordinating between streaming platforms and data lakes.

Note: Iceberg Topics integration is still evolving in the Kafka ecosystem. The example in this article demonstrates the concept using Aiven’s Remote Storage Manager plugin, which provides Iceberg integration capabilities for experimentation and development.

Run Iceberg Topics Locally with Docker

To understand how Iceberg Topics work, let’s set up a complete local environment with Kafka, MinIO (for S3-compatible storage), Apache Iceberg REST catalog, and Spark for querying. This setup will let you see the entire data flow from Kafka streams to Iceberg tables.

Prerequisites

Before getting started, ensure you have the following installed:

  1. Docker and Docker Compose
  2. A JDK (version 17 or later - see the troubleshooting notes below)
  3. Git, to clone the demo repository

Setting Up the Environment

First, you’ll need to clone the repository containing the Iceberg demo:

git clone https://github.com/Aiven-Open/tiered-storage-for-apache-kafka.git

Build the Remote Storage Manager plugin that handles the Iceberg integration:

cd tiered-storage-for-apache-kafka/demo/iceberg
make plugin

This command compiles the necessary components that enable Kafka to write data directly to Iceberg format.

Next, start all the required services using Docker Compose:

docker compose -f docker-compose.yml up -d

This command starts several interconnected services:

  1. A Kafka broker with the tiered-storage RSM plugin installed
  2. Karapace, the schema registry, at http://localhost:8081
  3. MinIO for S3-compatible object storage (console at http://localhost:9001)
  4. An Apache Iceberg REST catalog
  5. Spark with a Jupyter notebook at http://localhost:8888

Wait for all containers to start completely. You can monitor the startup process by watching the Docker logs.

Creating and Populating Iceberg Topics

Once your environment is running, create a topic and populate it with sample data:

clients/gradlew run -p clients

This demo script performs several important operations:

  1. Creates the people topic with Iceberg integration enabled
  2. Generates sample Avro records representing person data
  3. Produces messages to the Kafka topic using standard Kafka APIs
  4. Triggers automatic conversion of streaming data to Iceberg format

The magic happens behind the scenes - while your application produces and consumes data using standard Kafka APIs, the Remote Storage Manager plugin automatically converts the streaming data into Iceberg table format and stores it in MinIO.
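
The demo’s client is a small Java/Gradle application, but the flow it performs maps onto any Kafka client. Below is a rough Python sketch of the same idea using confluent-kafka and its schema-registry support (Karapace speaks the Confluent schema-registry API). The broker address and the shape of the Avro schema are illustrative assumptions, not the demo’s actual schema.

from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import SerializationContext, MessageField

# Karapace (http://localhost:8081 in this compose setup) is Confluent-API compatible.
schema_registry = SchemaRegistryClient({"url": "http://localhost:8081"})

# Illustrative schema only - the demo defines its own person record in Avro.
schema_str = """
{
  "type": "record",
  "name": "Person",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"}
  ]
}
"""
serialize = AvroSerializer(schema_registry, schema_str)

producer = Producer({"bootstrap.servers": "localhost:9092"})  # assumption: broker address
for person in [{"name": "Ada", "age": 36}, {"name": "Grace", "age": 45}]:
    producer.produce(
        "people",
        value=serialize(person, SerializationContext("people", MessageField.VALUE)),
    )
producer.flush()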

Exploring Your Data

After the demo runs and Kafka uploads segments to remote storage, you can explore your data in multiple ways:

Query with Spark: Visit the Spark notebook at http://localhost:8888/notebooks/notebooks/Demo.ipynb to run SQL queries against your Iceberg tables. You’ll be able to perform analytical queries on the streaming data using familiar SQL syntax.

Inspect Storage: Browse the MinIO interface at http://localhost:9001/browser/warehouse to see the actual Iceberg table files and metadata stored in object storage.
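
If you would rather query from your own PySpark session instead of the bundled notebook, the sketch below shows roughly what the wiring looks like. The catalog name, REST-catalog port, MinIO S3 endpoint and credentials, and the namespace/table name the plugin writes to are all assumptions here - check the demo’s notebook and docker-compose.yml for the real values.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-topics-demo")
    # Iceberg Spark runtime; pick the build matching your Spark/Scala versions.
    .config("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2")
    .config("spark.sql.catalog.iceberg", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.iceberg.type", "rest")
    .config("spark.sql.catalog.iceberg.uri", "http://localhost:8181")  # assumed REST catalog port
    .config("spark.sql.catalog.iceberg.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .config("spark.sql.catalog.iceberg.warehouse", "s3://warehouse/")
    .config("spark.sql.catalog.iceberg.s3.endpoint", "http://localhost:9000")  # assumed MinIO API port
    .config("spark.sql.catalog.iceberg.s3.path-style-access", "true")
    .config("spark.sql.catalog.iceberg.s3.access-key-id", "admin")
    .config("spark.sql.catalog.iceberg.s3.secret-access-key", "password")
    .getOrCreate()
)

# Hypothetical namespace and table name - the plugin decides how topics map to tables.
spark.sql("SELECT * FROM iceberg.default.people LIMIT 10").show()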

What Makes This Powerful

This local setup demonstrates several key capabilities:

Immediate Querying: As soon as data is produced to Kafka, it becomes available for analytical queries through Spark - no batch processing delays.

Storage Efficiency: Iceberg’s columnar format and compression provide efficient storage for analytical workloads while maintaining streaming performance.

ACID Compliance: Your streaming data benefits from Iceberg’s ACID transaction support, ensuring consistency even with high-throughput streams.

Troubleshooting Common Issues

If you encounter problems during setup:

Build Issues: Ensure you have JDK 17+ installed and that your JAVA_HOME is set correctly before running make plugin.

Container Startup: Check Docker logs with docker compose logs [service-name] to identify startup issues. Services have dependencies, so ensure Kafka is healthy before other services start.

Schema Registry Connection: If you see schema-related errors, verify that Karapace is running and accessible at http://localhost:8081.

Storage Access: MinIO credentials are admin/password by default. If you see S3 access errors, check the MinIO service status and credentials.
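
If you want to poke at the bucket programmatically rather than through the console, a small boto3 check works too. The S3 API is assumed to be on port 9000 here (the console on 9001 is a separate listener), and the bucket name comes from the console URL above.

import boto3

# Assumed MinIO S3 API endpoint plus the default demo credentials mentioned above.
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",
    aws_access_key_id="admin",
    aws_secret_access_key="password",
)

# List the Iceberg data and metadata files the RSM plugin has written so far.
for obj in s3.list_objects_v2(Bucket="warehouse").get("Contents", []):
    print(obj["Key"])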

Plugin Version Mismatch: If you see ClassNotFoundException: io.aiven.kafka.tieredstorage.RemoteStorageManager, the Makefile version doesn’t match your build output. Check what version was built:

ls -la ../../core/build/distributions/

If you see a SNAPSHOT.tgz with a different version instead of core-0.0.1-SNAPSHOT.tgz, update the Makefile to match the version from the command above, for example:

sed -i '' 's/0\.0\.1-SNAPSHOT/1.1.0-SNAPSHOT/g' Makefile
make plugin

What Do Iceberg Topics Mean for Kafka?

The integration between Kafka and Iceberg represents a fundamental shift toward unified streaming and analytical architectures. Instead of maintaining separate systems for real-time and analytical workloads, organizations can now use Kafka as a single platform that serves both use cases.

For Stream Processing Teams: Continue using familiar Kafka APIs while automatically generating analytical datasets for data science and business intelligence teams.

For Data Engineering Teams: Eliminate complex ETL pipelines and reduce the operational overhead of maintaining separate streaming and analytical systems.

For Analytics Teams: Access streaming data immediately for real-time analytics without waiting for batch processes or dealing with data freshness issues.

For Organizations: Reduce total cost of ownership by consolidating infrastructure and eliminating data duplication across systems.

Ready to Explore Further?

The local example in this article provides a foundation for understanding Iceberg Topics, but the real value comes from experimenting with your own data and use cases. Consider how eliminating the boundary between streaming and analytical systems could simplify your data architecture and enable new capabilities.

The streaming analytics landscape is evolving rapidly, and integrations like Iceberg Topics are leading the way toward more unified, efficient, and capable data platforms. Whether you’re processing IoT sensor data, financial transactions, or user activity streams, the ability to seamlessly bridge real-time and analytical workloads opens up exciting possibilities for your data-driven applications.


Learn More

Explore these resources to deepen your understanding of Kafka and Iceberg integration:

Apache Iceberg Documentation

Kafka Remote Storage Manager

Aiven’s Kafka Tiered Storage

Confluent’s Iceberg Integration

The future of streaming data is here - start building with Iceberg Topics today!

read more