The Blog

Thoughts on data engineering, software, and the occasional rabbit hole.

Featured

First Look: Phonics Journey Running on Android

2 min read

Last week I wrote about building Phonics Journey. A lot of people asked to see what it actually looks like.

6 min read

The Accidental App Dev: When a Phonics Request Becomes a Full-Stack Journey

I'm a Data Engineer. My day usually involves wrangling SQLMesh models, optimising ETL pipelines, and making sure data stays intact across distributed systems. I think in batches, streams, and schemas.

6 min read

Thinking About Switching from uv + pipx to Pixi? Read This First.

I've been running uv and pipx as my Python toolchain for a while now. It's a good setup — fast, clean, and uv in particular has earned its reputation as the current gold standard for Python package management. So when Pixi came up in conversation recently, I spent some time actually thinking through whether the switch makes sense.

8 min read

ETL vs ELT: Knowing Which One to Reach For

ETL and ELT are two of those terms that get used interchangeably in job descriptions and architecture documents, as if the letter order doesn't matter. It does. They represent genuinely different approaches to moving and transforming data, and picking the wrong one for your situation creates problems that compound over time.

5 min read

Why I Created a GitHub Organisation (And Why You Probably Should Too)

I've had a personal GitHub account for years. It does what it needs to do — it holds my repos, tracks my contributions, and occasionally embarrasses me with commits from 2018 that I'd rather not think about.

6 min read

AI Agents: The Good, the Frustrating, and the Genuinely Useful

Full transparency upfront: this post was written by a GitHub Copilot agent on my behalf. I gave it the brief, it wrote the words. My thoughts, its keyboard. Make of that what you will.

4 min read

How I Used an AI Agent to Modernize My Portfolio

I recently handed the keys of my portfolio site over to a GitHub Copilot coding agent and asked it to modernize the whole thing — version upgrades, UI polish, dependency management, the lot. Here's what that actually looked like in practice.

6 min read

Dimensional Modeling 101 - Design Better Data Warehouses

If you're building a data warehouse, dimensional modeling is your best friend. It makes data easier to query, understand, and analyze. Let's break down this powerful technique!

6 min read

Data Quality Validation - Ensuring Your Data is Trustworthy

Bad data leads to bad decisions. As data engineers, one of our most important jobs is ensuring data quality. Let's explore how to validate and maintain high-quality data!

5 min read

Getting Started with Apache Airflow - Orchestrate Your Data Pipelines

Apache Airflow has become the go-to tool for orchestrating data workflows. If you've ever needed to run tasks in a specific order, on a schedule, and with dependencies, Airflow is your friend!