About

It came out of the day job.

Hi, I'm Damien. I run Homemade Software, an AI software consultancy. We've been building software for a couple of decades, and a lot of what we do these days is AI-enhanced apps: LLMs doing real work for clients.

The same thing kept happening on client work. You put an LLM in front of real people, and then you kind of need to know whether it's any good, and whether the last change you made helped or quietly made things worse. That turns out to be a genuinely hard question, and honestly this whole area is new for pretty much everyone. We looked around, and nothing quite fit what we needed at the time, so we built something small for ourselves and used it on our own projects first. That became CompletionKit.

I've tried to keep it approachable. You jot down what good looks like, either as a simple 1-to-5 rubric or a plain pass/fail check, and it scores your outputs so you can see what got better and what broke before you ship. I really wanted the whole team to be able to use it, not just the engineers, so it's a flat price with no per-seat tax. It's source-available under BSL 1.1, so you're welcome to read the code or run it yourself.

None of us have this fully worked out, me included. Evaluating AI is still pretty young, and we're all learning it as we go. I've had a ton of help getting here, and I'm grateful for it. Good people poked at the early versions, found the rough edges and told me what was missing, and it's a lot better for that.

If you've got an LLM in front of real users, whether you've done this a hundred times or you're just getting started, I'd genuinely love to hear how it's going. You can reach me at damien@homemade.software.