Home | hughevans.dev
A photo of me in front of the Golden Gate Bridge during a visit to San Francisco for Apache Druid Summit
I made a lot of things this year
including mistakes - I started the year out of a job having been let go from a Senior Platform Engineer Role because I’d pursued a Senior role before I was ready. I’m now in a better place with work but it’s been a challenging year and one thing that’s helped me stay grounded and keep moving forward is making.
Life for me has always been divided into time spent working on one project or another: a cardboard city encompassed much of my childhood bedroom, and as a teen I spent the summer tinkering with 3D printers and Raspberry Pis. The process of coming up with an idea for a project, building, and iterating until I have an end product that I’m (mostly) happy with never fails to hold my attention and recharge my creative batteries.
In this blog I’d like to revisit some of the projects I worked on this year and share some of what I learned from each one.
This was a super scrappy project: I’ve been running a tool library out of my garage for the past couple of years and was rapidly running out of space for donated tools so I decided to cobble something together from some grocery crates I picked up on Olio and some parts of an old fitted kitchen.
I’m still using these drawers to store things so they work but they needed a bit of tweaking with shims as the drawers initially kept falling through the runners and getting stuck - should have triple checked my measurements before taking a circular saw to my offcuts of kitchen counter!
ASCII Art Photo Booth for EMF Camp
This year for EMF camp I built an exhibit that would take your picture and print it out as ASCII art on receipt paper. This has been done lots of times before but I fancied trying my hand at it because I thought it would be a fun project.
I got this working at home but once I actually got to EMF I couldn’t figure out how to light the subjects of the photos properly in that environment to get nice ASCII images, which was a shame. Next time around better lighting, a backdrop, and potentially a more flexible algorithm for generating ASCII art would make for better results.
Anne Fehres and Luke Conroy & AI4Media / Better Images of AI / Hidden Labour of Internet Browsing / Licensed by CC-BY 4.0
Post Own Site Syndicate Everywhere
POSSE (Post [on your] Own Site Syndicate Everywhere) is a simple concept: when you create digital content like articles, guides, or videos, upload it to your own website first and then post links back to that content on third party sites like Medium or YouTube. One key advantage of this approach is that it provides a degree of independence from third party platforms: all your content is preserved on your site, so the loss of a social media account doesn’t mean losing years of work.
I first learned about POSSE in the excellent article of the same title by Molly White and have since adopted the approach for my own work.
An issue often cited with POSSE is that posting content across multiple channels can be labour intensive, either because of the work required to post manually via several third party platforms or because of the tooling you need to maintain to automate the process. There are some ongoing efforts to help simplify cross posting across multiple channels, including Ryan Barrett’s Bridgy Fed project which serves as a bridge between decentralized social networks.
I wanted to build POSSE into the existing CI/CD for my site, which I have configured as GitHub Actions. I decided to (perhaps foolishly) write a few quick scripts to post links to new articles on my website to my various social media accounts, which proved trickier than I expected! In this article I’ll talk you through my attempt at automating the common POSSE task of reposting links, with the hope that elements of it may be useful to your own projects. All my code is available on my GitHub.
Posting on your own site…
I’m using Jekyll, a Ruby based static website generator, along with the awesome Contrast theme by Niklas Buschmann for my personal site. I use GitHub Actions workflows to build the static HTML from my Jekyll project and then copy it across to an Nginx server running on a Raspberry Pi in my Homelab.
A screenshot of a successfully completed github action for building and deploying a jekyll site
I like being able to write my posts in both HTML and Markdown because of the flexibility it provides so this approach works well for me - it also makes it easy to edit content already published to my site. I can also test what articles will look like by running Jekyll locally which is really helpful for catching mistakes prior to publishing.
…and syndicate everywhere!
Once I’ve written a new article I want to link to it from all the other places I have a presence on the internet like Bluesky, LinkedIn, and Mastodon. Frustratingly, at time of writing LinkedIn doesn’t appear to provide any method for posting to your own LinkedIn account with their API, instead only allowing you to post to a company page. For now I’m limited to Bluesky and Mastodon (still two great options!).
Getting new posts from a PR
Before I can post any links to articles I need to get new articles from PRs merged to my repo. All the articles on my site are stored in a directory called _posts so it’s easy enough to get a list of articles by running the handy changed-files action against that directory.
- name: Get changed files
  id: changed-files
  uses: tj-actions/changed-files@<version>
  with:
    files: |
      _posts/**
Once I have a list of all the new articles I can easily iterate over them with a Bash loop. At the moment though this list is of file paths rather than links to the actual articles but this is easy enough to fix by extracting the name of the articles from the file path and appending it to the site url.
- name: Post all new posts
  if: steps.changed-files.outputs.any_changed == 'true'
  env:
    ALL_NEW_POSTS: $
  run: |
    for file in ${ALL_NEW_POSTS}; do
      echo $file
      # Get new post URLs from diff
      blog_url="https://hughevans.dev/${file:18:-3}"
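The fixed offsets in ${file:18:-3} assume every path looks exactly like _posts/YYYY-MM-DD-slug.md. As a minimal sketch, the same conversion can be done by pattern instead, which should also cope with .html posts. The helper name post_url is made up for illustration, and it assumes posts are served at the site root under their slug, as in the snippet above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a Jekyll post path into its public URL.
# Assumes file names follow _posts/YYYY-MM-DD-slug.<ext> and that
# posts are served at <site>/<slug>.
post_url() {
  local file="$1"
  local site="${2:-https://hughevans.dev}"
  local name="${file##*/}"      # drop the directory part
  name="${name%.*}"             # drop the file extension
  name="${name#????-??-??-}"    # drop the date prefix
  printf '%s/%s\n' "$site" "$name"
}
```

Inside the loop this would be used as blog_url="$(post_url "$file")".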
Now that I can get a URL for each new article it’s relatively simple to just POST that URL in the text field of a post to the social media platform of my choice.
Mastodon
Posting a link to Mastodon and getting a nice embedded card is really easy as Mastodon pulls out all the OpenGraph information and renders it automatically for you into a nice preview card.
All you need to POST via the Mastodon API is your user token which can be found under Preferences > Development (see the Mastodon docs for more information).
With my list of new articles I can easily use the below POST request via curl to create a post on Mastodon with a nice preview card for my articles.
# Post to Mastodon
curl -X POST -d "{\"status\":\"$blog_url\", \
\"media_ids\":null,\"poll\":null}" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $" \
"https://hachyderm.io/api/v1/statuses"
Ta-dah!
A post on mastodon with a link to a blog about me speaking at the Barcelona Aerospike Meetup
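The curl request above builds its JSON body by hand, which works for plain URLs but would break if the status text ever contained a quote or a backslash. As a rough sketch, the body can be built by a small function that escapes those two characters first. Both function names here are hypothetical, and the escaping is deliberately minimal (it does not handle newlines or other control characters).

```shell
#!/usr/bin/env bash
# Escape backslashes and double quotes so a string can sit inside a
# JSON string literal. Minimal on purpose: control characters such as
# newlines are not handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: build the same JSON body as the curl request
# above, with the status text escaped.
mastodon_payload() {
  printf '{"status":"%s","media_ids":null,"poll":null}' "$(json_escape "$1")"
}
```

The result can then be passed to curl with -d "$(mastodon_payload "$blog_url")".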
Bluesky
I found posting an embed card to Bluesky much trickier than posting via the Mastodon API (as anyone unfortunate enough to be following me on Bluesky whilst I was writing this article probably noticed!)
I found these examples shared by Felicitas Pojtinger really helpful but my main stumbling block was that I needed to manually include the OpenGraph information as fields in the body of my post requests as Bluesky won’t detect this automatically yet.
Before posting via the API you need to create a new app password under Settings > Privacy and Security > App Passwords. I learned that posting to Bluesky isn’t as simple as just making a POST request with my app password. Because Bluesky runs on the decentralized AT Protocol, posting via the API required that I find the Decentralized Identifier (or DID) for my handle and use it to get a session API key - I did both of these with the pair of curl commands in the Bash snippet below.
# Post to Bluesky
export APP_PASSWORD='$'
HANDLE='hevansdev.bsky.social'
DID_URL="https://bsky.social/xrpc/com.atproto.identity.resolveHandle"
export DID=$(curl -G \
--data-urlencode "handle=$HANDLE" \
"$DID_URL" | jq -r .did)
# Get API key with the app password
API_KEY_URL='https://bsky.social/xrpc/com.atproto.server.createSession'
POST_DATA="{ \"identifier\": \"${DID}\", \"password\": \"${APP_PASSWORD}\" }"
export API_KEY=$(curl -X POST \
-H 'Content-Type: application/json' \
-d "$POST_DATA" \
"$API_KEY_URL" | jq -r .accessJwt)
Once I had an API key posting text to Bluesky was simple. However, I wanted to create a preview card with an embedded image and an article title. I couldn’t find any neat way to do this directly in the Bluesky API - as a workaround I used the snippet below to pull the Open Graph data from the blog post automatically so I can pass it in the Bluesky POST body.
# Get page og image
og_img_url=$(curl -L $blog_url | grep 'og.image' | grep -oE "(http|https)://[a-zA-Z0-9./?=_%:-]*")
curl -O $og_img_url
# Get page title
page_title=$(curl $blog_url -so - | grep -o "<title>[^<]*" | tail -c+8)
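The grep for 'og.image' above will also match related tags such as og:image:width if a page has them. A slightly stricter variant, sketched below, matches only the og:image meta tag itself and reads HTML from stdin so it can be tried without a network call. The function names are illustrative, and the sketch assumes each tag sits on a single line with double-quoted attributes.

```shell
#!/usr/bin/env bash
# Print the first og:image URL found in HTML read from stdin.
# Assumes the meta tag is on one line with double-quoted attributes.
og_image() {
  grep -o '<meta[^>]*property="og:image"[^>]*>' \
    | grep -oE 'https?://[^"]+' \
    | head -n 1
}

# Print the text of the first <title> element in HTML read from stdin.
page_title() {
  grep -o '<title>[^<]*' | head -n 1 | sed 's/^<title>//'
}
```

In the workflow these would be fed from curl, for example og_img_url="$(curl -sL "$blog_url" | og_image)".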
So with all that done I should be ready to POST the link right? No such luck. The embed image first needs to be uploaded as a blob and the blob link and size recorded for use in the POST.
```
# Upload embed image blob
blob=$(curl -X POST \
  -H "Authorization: Bearer ${API_KEY}" \
  -H 'Content-Type: image/jpeg' \
  --data-binary @$(ls *.jpg) \
  "https://bsky.social/xrpc/com.atproto.repo.uploadBlob")
```
My last week at Imply was spent at Druid Summit. It was an absolute pleasure working alongside my former colleagues Reena Leone, Peter Marshall, and Dave Klein to run the event, and getting to see so many excellent presentations from members of the Druid community. I organized all the panels for the 2024 event and you can see the recording of the Ops and Optimization panel I hosted below.
I had an amazing time speaking at the Aerospike Barcelona Data Management Community Meetup this week about working with flight radar data in Apache Druid. The team at Criteo were amazing hosts, super welcoming and friendly and the audience were really engaged with great questions after talks wrapped up. I’m looking forward to speaking at another Aerospike event later this year in Copenhagen.
If you’d like to check out my talk you can watch the recording below.
You’re making a change to your OpenSearch managed service and it’s all going great - right up until you make a mistake, destroying your cluster and causing you to lose all your indices. If only you had a snapshot you could restore your cluster from? Too bad you didn’t create any.
Taking OpenSearch snapshots is relatively easy but may require making some configuration changes to your IAM roles. It’s definitely worth doing because once you’ve successfully taken a snapshot you can use it to restore the indices in deleted, destroyed, or corrupted OpenSearch clusters or even create a duplicate cluster with the same data.
Prerequisites
In order to manually take snapshots you’ll need admin access to your OpenSearch service API either via curl or OpenSearch devtools. In this guide I’ll be using the latter method.
Before taking a snapshot you will need to create a role that will allow your OpenSearch service to write the snapshot to an S3 bucket and grant permission to the OpenSearch service to use that role. The Terraform for your IAM config should look something like the below. For more details see the AWS documentation.
IAM role
resource "aws_iam_role" "es_snapshot" {
  name                = "es-snapshot"
  managed_policy_arns = [aws_iam_policy.es_snapshot.arn]
  assume_role_policy  = <<EOF
{
  "Version" : "2012-10-17",
  "Statement" : [{
    "Sid" : "",
    "Effect" : "Allow",
    "Principal" : {
      "Service" : "es.amazonaws.com"
    },
    "Condition" : {
      "StringEquals" : {
        "aws:SourceAccount" : "<your aws account id>"
      },
      "ArnLike" : {
        "aws:SourceArn" : "<the arn for your opensearch cluster>"
      }
    },
    "Action" : "sts:AssumeRole"
  }]
}
EOF
}
Note the condition in the above Terraform statement: this limits access to this role to a specific OpenSearch service within your AWS account. Without it any OpenSearch service could assume this role.
IAM policy
resource "aws_iam_policy" "es_snapshot" {
  name = "es-snapshot-policy"
  policy = jsonencode({
    "Version" : "2012-10-17",
    "Statement" : [{
      "Action" : [
        "s3:ListBucket"
      ],
      "Effect" : "Allow",
      "Resource" : [
        "<arn of the s3 bucket you want to store your snapshots in>"
      ]
      },
      {
        "Action" : [
          "s3:GetObject",
          "s3:PutObject",
          "s3:DeleteObject"
        ],
        "Effect" : "Allow",
        "Resource" : [
          "<arn of the s3 bucket you want to store your snapshots in>/*"
        ]
      }
    ]
  })
}
Register a snapshot repository
In order to take a snapshot you first need to configure a snapshot repository to store your snapshots. In this guide I’ll be covering how to do this using an S3 bucket.
First, if there isn’t one already you will need to register a snapshot repository. You can list any existing repositories with a GET _snapshot request (do not use cs-automated-enc, it is reserved by OpenSearch for automated snapshots).
If needed, register a new snapshot repository like so (note the use of the role we created in the previous section).
PUT _snapshot/opensearch-snapshots
{
"type": "s3",
"settings": {
"bucket": "<your s3 bucket name>",
"region": "eu-west-1",
"role_arn": "<arn of your snapshot role>",
"server_side_encryption": true
}
}
Manually taking a snapshot
Check for any ongoing snapshots with GET _snapshot/_status. You cannot take a snapshot if one is already in progress, and OpenSearch automatically takes snapshots periodically.
Take a snapshot. Adding the date to the end of the snapshot name is optional, but I’d recommend adding the correct date and time here so you can easily find the snapshot if you need to restore from it later.
PUT _snapshot/opensearch-snapshots/snapshot-2023-03-13-1135
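To avoid typing the timestamp by hand, the snapshot name can be generated. This is only a sketch: the helper name snapshot_name is made up for illustration, and using UTC is my assumption rather than anything OpenSearch requires.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a snapshot name such as
# snapshot-2023-03-13-1135 from the current date and time.
# Uses UTC so names sort consistently regardless of who runs it.
snapshot_name() {
  local prefix="${1:-snapshot}"
  printf '%s-%s\n' "$prefix" "$(date -u +%Y-%m-%d-%H%M)"
}
```

The output can be pasted into the devtools request above, or used with curl as "_snapshot/opensearch-snapshots/$(snapshot_name)".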
Check snapshot progress with the first GET request below and then view it with the second once complete. The “pretty” query parameter is not required but helps to make the output more readable.
GET _snapshot/_status
GET _snapshot/opensearch-snapshots/_all?pretty
You should see your snapshot listed alongside any pre-existing snapshots. Congratulations, you’re now ready to restore from a snapshot should you ever need to. Don’t stop here though: I recommend that you continue with the next section to familiarise yourself with the process of restoring from a snapshot. You should also take snapshots regularly to help reduce the risk of data loss.
Restoring from a snapshot
Keycloak is an open source Identity and Access Management tool with features such as Single-Sign-On (SSO), Identity Brokering and Social Login, User Federation, Client Adapters, an Admin Console, and an Account Management Console.
Why use Keycloak?
There are several factors to deciding whether or not to use Keycloak or a SaaS IAM Service like AWS SSO. SaaS IAM services are typically easier to implement, better supported, and do not require manual deployment but Keycloak is free to use, feature rich, and flexible.
Pre-requisites
This guide assumes you already have at least one Keycloak instance with a Postgres database configured. If this is the case your keycloak.conf should include a section that looks something like the example below.
db=postgres
db-password=<your db password>
db-username=keycloak
db-pool-initial-size=1
db-pool-max-size=10
db-schema=public
db-url-database=keycloak
db-url-host=<url of your db>
db-url-port=5432
If you do not yet have your database configured please refer to the documentation on configuring relational databases for Keycloak.
Configuring JDBC Ping
In order for Keycloak instances to cluster they must discover each other. This can be achieved with JDBC Ping, which allows nodes to discover each other via your existing database. JDBC Ping is a convenient discovery method because it does not require the creation of additional AWS resources and is compatible with AWS, unlike the default discovery method (multicast), which AWS does not permit.
In order to use JDBC Ping we first need to define a transport stack. This can be achieved by adding the element below to the infinispan tag in your cache-ispn.xml file and replacing the placeholder values (these should match the db-password and db-url-host from your keycloak.conf file).
<jgroups>
  <stack name="jdbc-ping-tcp" extends="tcp">
    <JDBC_PING connection_driver="org.postgresql.Driver"
               connection_username="keycloak"
               connection_password="<your database password>"
               connection_url="jdbc:postgresql://<url of your database>:5432/keycloak"
               initialize_sql="CREATE TABLE IF NOT EXISTS JGROUPSPING (own_addr varchar(200) NOT NULL, cluster_name varchar(200) NOT NULL, ping_data BYTEA, constraint PK_JGROUPSPING PRIMARY KEY (own_addr, cluster_name));"
               info_writer_sleep_time="500"
               remove_all_data_on_view_change="true"
               stack.combine="REPLACE"
               stack.position="MPING" />
  </stack>
</jgroups>
We have now defined a new JGroups stack which will create a table in your database, if one doesn’t already exist, that Keycloak instances can use to discover each other. When you start a new Keycloak instance it will write its name as a new record into this table. To use this stack simply amend the transport element as shown below to reference the newly defined stack.
<transport lock-timeout="60000" stack="jdbc-ping-tcp"/>
Configuring Security Groups
Keycloak uses Infinispan to cache data both locally to the Keycloak instance and for remote caches. Infinispan by default uses port 7800 so we need to configure the Security Group our Keycloak instances are deployed to in order to permit both ingress and egress via port 7800. This can be done in a number of ways, such as via the AWS Console. Below is an example of configuring ports for Keycloak using Terraform.
## keycloak cluster egress
resource "aws_security_group_rule" "keycloak_cluster_egress_to_keycloak" {
  description              = "keycloak cluster"
  from_port                = 7800
  protocol                 = "tcp"
  security_group_id        = aws_security_group.keycloak.id
  source_security_group_id = aws_security_group.keycloak.id
  to_port                  = 7800
  type                     = "egress"
}
## keycloak cluster ingress
resource "aws_security_group_rule" "keycloak_cluster_ingress_to_keycloak" {
  description              = "keycloak cluster"
  from_port                = 7800
  protocol                 = "tcp"
  security_group_id        = aws_security_group.keycloak.id
  source_security_group_id = aws_security_group.keycloak.id
  to_port                  = 7800
  type                     = "ingress"
}
Restarting Keycloak
Keycloak does not automatically apply changes made to its configuration so you will need to restart your Keycloak instance(s) for clustering to work. First run kc.sh build from the terminal to rebuild your Keycloak instance and register the changes we made to your configuration.
Once you have rebuilt Keycloak restart your Keycloak service by running the following (alternatively you can restart your Keycloak instance).
systemctl restart keycloak
Your Keycloak instances should now be running in a clustered state.
Testing your Keycloak cluster
To check that your Keycloak cluster is functioning correctly, check your database to see whether the JGROUPSPING table exists and includes the names of all instances currently in the cluster. Your table should look something like the below.
| own_addr | cluster_name | ping_data |
| --- | --- | --- |
| ***** | ISPN | ***** |
| ***** | ISPN | ***** |
If you terminate a Keycloak instance or start a new instance you should see the records in this table change.
Troubleshooting
Changes made to config files aren’t applied after building Keycloak
Ensure that the config files you have changed match those configured in keycloak.conf. This guide, for example, assumes that you have your Infinispan config file set as cache-ispn.xml in your keycloak.conf file.
cache-config-file: cache-ispn.xml
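A quick way to confirm which cache file Keycloak is actually pointed at is to read the setting back out of the conf file. The sketch below is a hypothetical helper, not part of Keycloak. It accepts both = and : as separators since both styles appear in this guide, and the path in the usage line is an assumption about where Keycloak is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of a key from a
# keycloak.conf-style file. Accepts "key=value" and "key: value",
# skips comment lines, and prints the last match if a key repeats.
conf_value() {
  local key="$1" file="$2"
  sed -n -E "s/^[[:space:]]*${key}[[:space:]]*[=:][[:space:]]*(.*)$/\1/p" "$file" | tail -n 1
}
```

For example, conf_value cache-config-file /opt/keycloak/conf/keycloak.conf should print cache-ispn.xml if the file used in this guide is the one configured.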
Keycloak services don’t start after changing config files
Check the Keycloak logs and ensure your database access details (password and host url) are set correctly: if these values are incorrect the Keycloak service will fail to start.
Resources
Use of JDBC_PING with Keycloak 17 (Quarkus distro)
Embedding Infinispan caches in Java applications
Keycloak Server caching
Clustered Keycloak SSO Deployment in AWS was originally published on the Daemon Insights blog
Cheap and easy IoT with AWS
3D render of DALL-E-2 making art in an open office on a red brick background, digital art
What is DALL·E 2?
You’ve done the hard part and added instrumentation to your application to gather metrics, now you just need to expose those metrics to Prometheus so you can alert on them and monitor them, easy right?