hughevans.dev

Adventures in POSSE

A collage picturing a chaotic intersection filled with reCAPTCHA items like crosswalks, fire hydrants and traffic lights, representing the unseen labor in data labelling. Anne Fehres and Luke Conroy & AI4Media / Better Images of AI / Hidden Labour of Internet Browsing / Licensed by CC-BY 4.0

Post Own Site Syndicate Everywhere

POSSE (Post [on your] Own Site, Syndicate Everywhere) is a simple concept: when you create digital content like articles, guides, or videos, upload it to your own website first, then post links back to that content on third-party sites like Medium or YouTube. One key advantage of this approach is a degree of independence from third-party platforms: all your content is preserved on your site, so losing a social media account doesn’t mean losing years of work.

I first learned about POSSE in the excellent article of the same title by Molly White and have since adopted the approach for my own work.

An issue often cited with POSSE is that posting content across multiple channels can be labour intensive, whether from the work of manually posting to several third-party platforms or from maintaining the tooling required to automate the process. There are ongoing efforts to simplify cross-posting, including Ryan Barrett’s Bridgy Fed project, which serves as a bridge between decentralized social networks.

I wanted to build POSSE into the existing CI/CD for my site, which runs on GitHub Actions. I decided (perhaps foolishly) to write a few quick scripts to post links to new articles on my website to my various social media accounts, which proved trickier than I expected! In this article I’ll talk you through my attempt at automating the common POSSE task of reposting links, in the hope that elements of it may be useful for your own projects. All my code is available on my GitHub.

Posting on your own site…

I’m using Jekyll, a Ruby-based static website generator, along with the awesome Contrast theme by Niklas Buschmann for my personal site. I use GitHub Actions workflows to build the static HTML from my Jekyll project and then copy it across to an Nginx server running on a Raspberry Pi in my homelab.

A screenshot of a successfully completed GitHub Action for building and deploying a Jekyll site

I like being able to write my posts in both HTML and Markdown because of the flexibility it provides so this approach works well for me - it also makes it easy to edit content already published to my site. I can also test what articles will look like by running Jekyll locally which is really helpful for catching mistakes prior to publishing.

…and syndicate everywhere!

Once I’ve written a new article I want to link to it from all the other places I have a presence on the internet, like Bluesky, LinkedIn, and Mastodon. Frustratingly, at the time of writing LinkedIn doesn’t appear to provide any way to post to your own LinkedIn account with their API, only to a company page, so for now I’m limited to Bluesky and Mastodon (still two great options!).

Getting new posts from a PR

Before I can post any links to articles I need to get new articles from PRs merged to my repo. All the articles on my site are stored in a directory called _posts so it’s easy enough to get a list of articles by running the handy changed-files action against that directory.

- name: Get changed files
  id: changed-files
  uses: tj-actions/changed-files@<version>
  with:
    files: |
      _posts/**

Once I have a list of all the new articles I can easily iterate over them with a Bash loop. At the moment though this list is of file paths rather than links to the actual articles but this is easy enough to fix by extracting the name of the articles from the file path and appending it to the site url.

- name: Post all new posts
  if: steps.changed-files.outputs.any_changed == 'true'
  env:
    ALL_NEW_POSTS: $
  run: |
    for file in ${ALL_NEW_POSTS}; do
        echo "$file"
        # Get new post URLs from diff: skip the 18-character
        # "_posts/YYYY-MM-DD-" prefix and drop the trailing ".md"
        blog_url="https://hughevans.dev/${file:18:-3}"

Now that I can get a url for each new article it’s relatively simple to just POST that url in the text field of a post to the social media platform of my choice.
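To make the substring expansion above a little more concrete, here is a minimal sketch of the same idea wrapped in a function. The helper name post_url and the example file name are made up for illustration, and it assumes every post follows Jekyll's _posts/YYYY-MM-DD-slug.md naming (negative lengths in the expansion also need Bash 4.2 or newer).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the workflow above): turn a Jekyll post
# path into its public URL. Assumes paths look like _posts/YYYY-MM-DD-slug.md.
post_url() {
    local file="$1"
    # "_posts/" is 7 characters and "YYYY-MM-DD-" is 11, so the slug starts
    # at offset 18; the -3 drops the trailing ".md".
    printf 'https://hughevans.dev/%s\n' "${file:18:-3}"
}

# Example with an invented file name
post_url "_posts/2024-06-01-hello-world.md"
```

A post with a longer extension (for example .markdown or .html) would need a different trailing offset, so this only holds if all posts use .md.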

Mastodon

Posting a link to Mastodon and getting a nice embedded card is really easy as Mastodon pulls out all the OpenGraph information and renders it automatically for you into a nice preview card.

All you need to POST via the Mastodon API is your user token which can be found under Preferences > Development (see the Mastodon docs for more information).

With my list of new articles I can easily use the below POST request via curl to create a post on Mastodon with a nice preview card for my articles.

# Post to Mastodon
curl -X POST -d "{\"status\":\"$blog_url\", \
\"media_ids\":null,\"poll\":null}" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $" \
"https://hachyderm.io/api/v1/statuses"

Ta-dah!

A post on Mastodon with a link to a blog about me speaking at the Barcelona Aerospike Meetup

Bluesky

I found posting an embed card to Bluesky much trickier than posting via the Mastodon API (as anyone unfortunate enough to be following me on Bluesky whilst I was writing this article probably noticed!)

I found these examples shared by Felicitas Pojtinger really helpful but my main stumbling block was that I needed to manually include the OpenGraph information as fields in the body of my post requests as Bluesky won’t detect this automatically yet.

Before posting via the API you need to create a new app password under Settings > Privacy and Security > App Passwords. I learned that posting to Bluesky isn’t as simple as making a POST request with my app password. Because Bluesky runs on the decentralized AT Protocol, posting via the API required that I find the Decentralized Identifier (or DID) for my handle and then use it to get a session API key. I did both of these with the pair of curl commands in the Bash snippet below.

# Post to Bluesky
export APP_PASSWORD='$'

HANDLE='hevansdev.bsky.social'
DID_URL="https://bsky.social/xrpc/com.atproto.identity.resolveHandle"
export DID=$(curl -G \
    --data-urlencode "handle=$HANDLE" \
    "$DID_URL" | jq -r .did)

# Get API key with the app password
API_KEY_URL='https://bsky.social/xrpc/com.atproto.server.createSession'
POST_DATA="{ \"identifier\": \"${DID}\", \"password\": \"${APP_PASSWORD}\" }"
export API_KEY=$(curl -X POST \
    -H 'Content-Type: application/json' \
    -d "$POST_DATA" \
    "$API_KEY_URL" | jq -r .accessJwt)
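One thing to watch for with lookups like these: when the requested key is missing from the response, jq -r prints the literal string null rather than failing, so a bad handle or password can slip through unnoticed. The guard below is only a sketch of how that could be caught; require_value is a hypothetical helper name and the example DID is invented.

```shell
#!/usr/bin/env bash
# Hypothetical guard (an illustration, not part of the snippet above):
# fail when a looked-up value is empty or the literal string "null".
require_value() {
    local name="$1" value="$2"
    if [ -z "$value" ] || [ "$value" = "null" ]; then
        echo "Could not resolve ${name}" >&2
        return 1
    fi
}

# Example with an invented DID
require_value "DID" "did:plc:example123" && echo "DID looks usable"
```

In a workflow step this could be called as require_value DID "$DID" || exit 1 straight after each curl command, so the job stops before trying to post with an unusable value.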

Once I had an API key, posting text to Bluesky was simple. However, I wanted to create a preview card with an embedded image and an article title. I couldn’t find any neat way to do this directly in the Bluesky API, so as a workaround I used the snippet below to pull the Open Graph data from the blog post automatically and pass it in the Bluesky POST body.

# Get page og image
og_img_url=$(curl -L "$blog_url" | grep 'og.image' | grep -oE "(http|https)://[a-zA-Z0-9./?=_%:-]*")
curl -O "$og_img_url"

# Get page title
page_title=$(curl "$blog_url" -so - | grep -o "<title>[^<]*" | tail -c+8)
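To see what those two pipelines pull out without hitting the network, here is a small sketch that runs the same grep patterns over a sample page. The function names and the sample HTML (including the example.com image URL) are invented for illustration, and it assumes the og:image tag and the title each sit on a single line.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: the same grep pipelines as above, reading HTML on stdin.
og_image_url() {
    # Keep the og:image line, then extract the first thing that looks like a URL
    grep 'og.image' | grep -oE "(http|https)://[a-zA-Z0-9./?=_%:-]*"
}

page_title() {
    # "<title>" is 7 bytes long, so printing from byte 8 leaves just the title
    grep -o "<title>[^<]*" | tail -c +8
}

# Invented sample page
sample_html='<html><head>
<title>Adventures in POSSE</title>
<meta property="og:image" content="https://example.com/assets/card.jpg" />
</head></html>'

printf '%s\n' "$sample_html" | og_image_url
printf '%s\n' "$sample_html" | page_title
```

Scraping HTML with grep is fragile: a title containing an encoded entity, or a meta tag split across lines, would need a proper HTML parser.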

So with all that done I should be ready to POST the link right? No such luck. The embed image first needs to be uploaded as a blob and the blob link and size recorded for use in the POST.

# Upload embed image blob
blob=$(curl -X POST \
    -H "Authorization: Bearer ${API_KEY}" \
    -H 'Content-Type: image/jpeg' \
    --data-binary @$(ls *.jpg) \
    "https://bsky.social/xrpc/com.atproto.repo.uploadBlob")


Aerospike Barcelona Data Management Community Meetup

I had an amazing time speaking at the Aerospike Barcelona Data Management Community Meetup this week about working with flight radar data in Apache Druid. The team at Criteo were amazing hosts, super welcoming and friendly and the audience were really engaged with great questions after talks wrapped up. I’m looking forward to speaking at another Aerospike event later this year in Copenhagen.

If you’d like to check out my talk you can watch the recording below.


Backup your OpenSearch indices with manual snapshots

You’re making a change to your OpenSearch managed service and it’s all going great - right up until you make a mistake, destroying your cluster and causing you to lose all your indices. If only you had a snapshot you could restore your cluster from? Too bad you didn’t create any. 

Kermit the frog makes a rookie devops error

Taking OpenSearch snapshots is relatively easy but may require making some configuration changes to your IAM roles. It’s definitely worth doing because once you’ve successfully taken a snapshot you can use it to restore the indices in deleted, destroyed, or corrupted OpenSearch clusters or even create a duplicate cluster with the same data.

Prerequisites

In order to manually take snapshots you’ll need admin access to your OpenSearch service API, either via curl or OpenSearch Dev Tools. In this guide I’ll be using the latter.

Before taking a snapshot you will need to create a role that will allow your OpenSearch service to write the snapshot to an S3 bucket and grant permission to the OpenSearch service to use that role. The Terraform for your IAM config should look something like the below. For more details see the AWS documentation.

IAM role

resource "aws_iam_role" "es_snapshot" {
  name                 = "es-snapshot"
  managed_policy_arns  = [aws_iam_policy.es_snapshot.arn]
  assume_role_policy   = <<EOF
{
"Version" : "2012-10-17",
"Statement" : [{
    "Sid" : "",
    "Effect" : "Allow",
    "Principal" : {
    "Service" : "es.amazonaws.com"
    },
    "Condition" : {
    "StringEquals" : {
        "aws:SourceAccount" : "<your aws account id>"
    },
    "ArnLike" : {
        "aws:SourceArn" : "<the arn for your opensearch cluster>"
    }
    },
    "Action" : "sts:AssumeRole"
}]

}
  EOF
}

Note the condition in the above Terraform statement: it limits access to this role to a specific OpenSearch service within your AWS account. Without it, any OpenSearch service could assume this role.

IAM policy

resource "aws_iam_policy" "es_snapshot" {
  name = "es-snapshot-policy"
  policy = jsonencode({
    "Version" : "2012-10-17",
    "Statement" : [{
      "Action" : [
        "s3:ListBucket"
      ],
      "Effect" : "Allow",
      "Resource" : [
        "<arn of the s3 bucket you want to store your snapshots in>"
      ]
      },
      {
        "Action" : [
          "s3:GetObject",
          "s3:PutObject",
          "s3:DeleteObject"
        ],
        "Effect" : "Allow",
        "Resource" : [
          "<arn of the s3 bucket you want to store your snapshots in>/*"
        ]
      }
    ]
  })
}

Register a snapshot repository

In order to take a snapshot you first need to configure a snapshot repository to store your snapshots. In this guide I’ll be covering how to do this using an S3 bucket.

First, if there isn’t one already, you will need to register a snapshot repository. You can use the GET request below to list any existing repositories (do not use cs-automated-enc, it is reserved by OpenSearch for automated snapshots).

GET _cat/repositories

If needed, register a new snapshot repository like so (note the use of the role we created in the previous section).

PUT _snapshot/opensearch-snapshots
{
  "type": "s3",
  "settings": {
    "bucket": "<your s3 bucket name>",
    "region": "eu-west-1",
    "role_arn": "<arn of your snapshot role>",
    "server_side_encryption": true
  }
}

Manually taking a snapshot

Check for any ongoing snapshots first: you cannot take a snapshot if one is already in progress, and OpenSearch automatically takes snapshots periodically.

GET _snapshot/_status

Take a snapshot. Adding the date to the end of the snapshot name is optional, but I’d recommend including the current date and time so you can easily find the snapshot if you need to restore from it later.

PUT _snapshot/opensearch-snapshots/snapshot-2023-03-13-1135
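If you take snapshots often, the timestamped name can be generated rather than typed. The sketch below only builds and prints the request line in the same format as the example above; snapshot_name is a hypothetical helper, and using UTC is my assumption here, so swap in local time if that is what your team expects.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a name in the snapshot-YYYY-MM-DD-HHMM format.
# Uses UTC (an assumption) so names sort the same whoever takes the snapshot.
snapshot_name() {
    date -u +"snapshot-%Y-%m-%d-%H%M"
}

repo="opensearch-snapshots"   # the repository registered earlier in this guide

# Print the request to paste into Dev Tools; nothing is sent to the cluster
echo "PUT _snapshot/${repo}/$(snapshot_name)"
```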

Check snapshot progress with the first GET request below and then view it with the second once complete. The “pretty” query parameter is not required but makes the output more readable.

GET _snapshot/_status
GET _snapshot/opensearch-snapshots/_all?pretty

You should see your snapshot listed alongside any pre-existing snapshots. Congratulations, you’re now ready to restore from a snapshot should you ever need to. Don’t stop here though: I recommend that you continue with the next section to familiarise yourself with the process of restoring from a snapshot. You should also take snapshots regularly to help reduce the risk of data loss.

Restoring from a snapshot


Clustered Keycloak SSO Deployment in AWS

 Keycloak is an open source Identity and Access Management tool with features such as Single-Sign-On (SSO), Identity Brokering and Social Login, User Federation, Client Adapters, an Admin Console, and an Account Management Console.

Why use Keycloak?

There are several factors to weigh when deciding whether to use Keycloak or a SaaS IAM service like AWS SSO. SaaS IAM services are typically easier to implement, better supported, and do not require manual deployment, but Keycloak is free to use, feature rich, and flexible.

Pre-requisites

This guide assumes you already have at least one Keycloak instance with a Postgres database configured. If this is the case, your keycloak.conf should include a section that looks something like the example below.

db=postgres
db-password=<your db password>
db-username=keycloak
db-pool-initial-size=1
db-pool-max-size=10
db-schema=public
db-url-database=keycloak
db-url-host=<url of your db>
db-url-port=5432

If you do not yet have your database configured please refer to the documentation on configuring relational databases for Keycloak.

Configuring JDBC Ping

In order for Keycloak instances to cluster they must discover each other. This can be achieved with JDBC Ping, which allows nodes to discover each other via your existing database. JDBC Ping is a convenient discovery method because it does not require the creation of additional AWS resources and is compatible with AWS, unlike the default discovery method (multicast), which is not permitted by AWS.

In order to use JDBC Ping we first need to define a transport stack. This can be achieved by adding the element below inside the infinispan tag in your cache-ispn.xml file and replacing the placeholder values (these should match the db-password and db-url-host from your keycloak.conf file).

<jgroups>
    <stack name="jdbc-ping-tcp" extends="tcp">
        <JDBC_PING connection_driver="org.postgresql.Driver"
                    connection_username="keycloak"
                    connection_password="<your database password>"
                    connection_url="jdbc:postgresql://<url of your database>:5432/keycloak"
                    initialize_sql="CREATE TABLE IF NOT EXISTS JGROUPSPING (own_addr varchar(200) NOT NULL, cluster_name varchar(200) NOT NULL, ping_data BYTEA, constraint PK_JGROUPSPING PRIMARY KEY (own_addr, cluster_name));"
                    info_writer_sleep_time="500"
                    remove_all_data_on_view_change="true"
                    stack.combine="REPLACE"
                    stack.position="MPING" />
    </stack>
</jgroups>

We have now defined a new JGroups stack that creates a table in your database, if one doesn’t already exist, which Keycloak instances can use to discover each other. When you start a new Keycloak instance it will write its name as a new record into this table. To use this stack, simply amend the transport element as shown below to reference the newly defined stack.

<transport lock-timeout="60000" stack="jdbc-ping-tcp"/>

Configuring Security Groups

Keycloak uses Infinispan to cache data both locally to the Keycloak instance and for remote caches. Infinispan by default uses port 7800, so we need to configure the Security Group our Keycloak instances are deployed to in order to permit both ingress and egress via port 7800. This can be done in a number of ways, such as via the AWS Console. Below is an example of configuring ports for Keycloak using Terraform.

## keycloak cluster egress
resource "aws_security_group_rule" "keycloak_cluster_egress_to_keycloak" {
    description              = "keycloak cluster"
    from_port                = 7800
    protocol                 = "tcp"
    security_group_id        = aws_security_group.keycloak.id
    source_security_group_id = aws_security_group.keycloak.id
    to_port                  = 7800
    type                     = "egress"
}

## keycloak cluster ingress
resource "aws_security_group_rule" "keycloak_cluster_ingress_to_keycloak" {
    description              = "keycloak cluster"
    from_port                = 7800
    protocol                 = "tcp"
    security_group_id        = aws_security_group.keycloak.id
    source_security_group_id = aws_security_group.keycloak.id
    to_port                  = 7800
    type                     = "ingress"
}
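Once rules like these are applied, one quick way to sanity-check them is to try opening a TCP connection to port 7800 from one Keycloak instance to another. The sketch below is an illustration only: port_open is a hypothetical helper, PEER_HOST is a made-up environment variable for the other instance's private address, and it relies on Bash's /dev/tcp feature and the coreutils timeout command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to host:port can be opened.
port_open() {
    local host="$1" port="$2"
    # /dev/tcp is a Bash feature; timeout stops a silently dropped connection
    # (typical for a blocked security group) from hanging indefinitely.
    timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# PEER_HOST is an assumed variable holding another instance's private address;
# it falls back to localhost so the example runs anywhere.
if port_open "${PEER_HOST:-127.0.0.1}" 7800; then
    echo "port 7800 reachable"
else
    echo "port 7800 not reachable"
fi
```

A successful connection only shows that the port is open and something is listening; it does not prove the instances have actually joined the same cluster.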

Restarting Keycloak

Keycloak does not automatically apply changes made to its configuration so you will need to restart your Keycloak instance/instances for clustering to work. First run the following from the terminal to rebuild your Keycloak instance to register the changes we made to your configuration.

/bin/kc.sh build

Once you have rebuilt Keycloak restart your Keycloak service by running the following (alternatively you can restart your Keycloak instance).

systemctl restart keycloak

Your Keycloak instances should now be running in a clustered state.

Testing your Keycloak cluster

To check that your Keycloak cluster is functioning correctly, check your database and confirm that the JGROUPSPING table exists and includes the names of all instances currently in the cluster. Your table should look something like the below.

own_addr cluster_name ping_data
***** ISPN *****
***** ISPN *****

If you terminate a Keycloak instance or start a new instance you should see the records in this table change.

Troubleshooting

Changes made to config files aren’t applied after building Keycloak

Ensure that the config files you have changed match those configured in keycloak.conf. This guide, for example, assumes that you have your Infinispan config file set as cache-ispn.xml in your keycloak.conf file.

cache-config-file=cache-ispn.xml
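A quick way to confirm which file Keycloak is pointed at is to read the option straight out of keycloak.conf. The sketch below assumes options are written as key=value, like the database settings earlier in this guide; conf_value is a hypothetical helper and the demo file is a throwaway created just for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of a key=value option in a config file.
conf_value() {
    local key="$1" file="$2"
    # Match only lines starting with "key=", take the last one if repeated,
    # and keep everything after the first "=".
    grep -E "^${key}=" "$file" | tail -n 1 | cut -d= -f2-
}

# Demo against a throwaway file standing in for keycloak.conf
demo_conf=$(mktemp)
printf 'db=postgres\ncache-config-file=cache-ispn.xml\n' > "$demo_conf"
conf_value cache-config-file "$demo_conf"
rm -f "$demo_conf"
```

Against a real install you would pass the path to your own keycloak.conf instead, and compare the result with the name of the file you edited.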

Keycloak services don’t start after changing config files

Check the Keycloak logs and ensure your database access details (password and host url) are set correctly: if these values are incorrect the Keycloak service will fail to start.

Resources

Use of JDBC_PING with Keycloak 17 (Quarkus distro)

Embedding Infinispan caches in Java applications

Keycloak Server caching

Clustered Keycloak SSO Deployment in AWS was originally published on the Daemon Insights blog


Turn a Raspberry Pi into an IoT device with AWS

Cheap and easy IoT with AWS


DALL·E 2 - what happens when machines make art?

3D render of DALL-E-2 making art in an open office on a red brick background, digital art

What is DALL·E 2?


Exposing metrics to Prometheus with Service Monitors

You’ve done the hard part and added instrumentation to your application to gather metrics, now you just need to expose those metrics to Prometheus so you can alert on them and monitor them, easy right?
