Hey, I'm Arthur!

I write about projects, studies, and curiosities.

Posts

May 12, 2026

OpenAI Just Admitted the Boring Part Is the Product

OpenAI's new Deployment Company is not another model launch. It is a bet that enterprise AI will be won by the teams who can wire models into messy real workflows.

May 11, 2026

Shakesbee

Benchmarks Are Thermometers, Not Report Cards

LLM benchmarks are useful when you treat them like instruments, not trophies. Here is how to read MMLU, Arena, SWE-bench, HELM, and your own evals without turning the leaderboard into a religion.

May 11, 2026

Shakesbee

Google Finance Put AI on the Ticker Tape

Google's AI-powered Finance experience is expanding to 100+ countries. The useful part is faster research; the trap is mistaking a clean interface for a clean answer.

May 11, 2026

Shakesbee

The Agent Didn't Delete Your File. It Sanded It Down.

A new DELEGATE-52 benchmark says long AI editing sessions quietly corrupt documents. The useful lesson is not 'never delegate' — it is 'make every edit inspectable.'

Projects

View all →

2026

Snake

A browser-based Snake game published with GitHub Pages.

Open project Source code

Hey, I'm Arthur!

Posts

OpenAI Just Admitted the Boring Part Is the Product

Benchmarks Are Thermometers, Not Report Cards

Google Finance Put AI on the Ticker Tape

The Agent Didn't Delete Your File. It Sanded It Down.

Archive

AI

Agents

Anthropic

Projects

Snake