
Please don't ship the vibes

#engineering

Good agentic programming is still engineering.

You break the work down into small steps, keep the context small, and review the diff before you ship.

The agent writes code.

The engineer designs the system that produces the code.

The point is to make the work small enough that the agent can do something useful and the human can still understand the result.

Context should be tight

The context stack matters, but it does not have to be complicated.

AGENTS.md = baseline project context
Rules = persistent guardrails
Skills = task-specific playbooks
MCP = access to tools and data

AGENTS.md is the baseline. It should answer, “What does the agent need to know every time it works in this part of the repo?”

Not everything.

Just the stuff that should always be true.

A root AGENTS.md can give the high-level project overview. A nested AGENTS.md can give local context for a package, feature, app, or service.
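Here is a minimal sketch of that root-plus-nested idea. It walks from the repo root down to the file being edited and collects every AGENTS.md on the way, broadest context first. The function name `collect_agents_files` and the root-first ordering are assumptions for illustration. How a given agent actually discovers and merges these files depends on the tool.

```python
from pathlib import Path


def collect_agents_files(repo_root: Path, target: Path, name: str = "AGENTS.md") -> list[Path]:
    """Return every `name` file between repo_root and target, root first.

    Hypothetical helper: real agents may discover or merge these files differently.
    """
    repo_root = repo_root.resolve()
    target = target.resolve()
    # Treat anything that is not an existing directory as a file path.
    directory = target if target.is_dir() else target.parent
    # Raises ValueError if the target lives outside the repo.
    relative = directory.relative_to(repo_root)

    # Build the chain of directories: root, root/a, root/a/b, ...
    current = repo_root
    directories = [current]
    for part in relative.parts:
        current = current / part
        directories.append(current)

    # Keep only the directories that actually contain a context file.
    return [d / name for d in directories if (d / name).is_file()]
```

For a file under `packages/billing/`, this would return the root AGENTS.md followed by `packages/billing/AGENTS.md` if both exist. The billing context never loads when the agent works somewhere else, which is the point of keeping it nested.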

A quick note for Claude Code users: Claude Code reads CLAUDE.md, not AGENTS.md.

Rules are guardrails. Things like “do not introduce dependencies without asking” or “prefer existing shared components.”

Skills are playbooks. They are what you reach for when the agent needs to do a specific kind of work well, like scaffolding a feature, adding tests, or drafting a PR description.

MCP is access. It lets the agent reach real systems: files, databases, tickets, docs, design tools.

Small PRs are the real vibe

Every agent step should produce a reviewable diff.

Google’s engineering practices on small CLs still apply here. Smaller changes are easier to reason about, easier to review, easier to roll back, and less likely to hide problems.

Good:

PR 1: add feature contract and types
PR 2: add state model
PR 3: add data service
PR 4: add UI component
PR 5: wire feature entry point

Bad:

PR: build feature with AI
Files changed: 87
Lines added: 6,400

Nobody reviews that properly.

They skim it. They look for obvious weirdness.

Now the team owns a black box.
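One way to keep yourself honest is a size check on the diff before it goes up for review. The sketch below parses the output of `git diff --numstat` and compares it to a budget. The names `DiffStats`, `parse_numstat`, and `is_reviewable` are hypothetical, and the default limits of 15 files and 400 changed lines are placeholders I picked for illustration, not numbers from Google's guidance. Tune them to what your team can actually review.

```python
from dataclasses import dataclass


@dataclass
class DiffStats:
    files_changed: int
    lines_added: int
    lines_deleted: int


def parse_numstat(numstat: str) -> DiffStats:
    """Parse `git diff --numstat` output: 'added<TAB>deleted<TAB>path' per line."""
    files = added = deleted = 0
    for line in numstat.splitlines():
        if not line.strip():
            continue
        add, delete, _path = line.split("\t", 2)
        files += 1
        # Binary files report "-" for both counts: count the file, not the lines.
        added += int(add) if add != "-" else 0
        deleted += int(delete) if delete != "-" else 0
    return DiffStats(files, added, deleted)


def is_reviewable(stats: DiffStats, max_files: int = 15, max_lines: int = 400) -> bool:
    """Return True when the diff fits the (assumed, adjustable) review budget."""
    changed = stats.lines_added + stats.lines_deleted
    return stats.files_changed <= max_files and changed <= max_lines
```

Feeding it the output of something like `git diff --numstat main...HEAD` after each agent step gives a cheap signal. The 87-file, 6,400-line PR above fails this check immediately, which is your cue to split the work before anyone is asked to skim it.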

Agents can help review code. They can catch typos, missing tests, and other mechanical problems.

But they should not be the final reviewer for agent-written code. Humans need to be the final gate.

GitHub’s own guidance on reviewing AI-generated code lands in the same place. Human oversight, tests, and review stay in the loop.

LLMs are not engineers

LLMs are powerful, but they are not engineers.

They generate plausible output from patterns.

That is not the same thing as understanding your product, your users, your weird legacy decisions, or why a certain edge case matters.

And when models fail, they can fail in strange ways.

OpenAI’s goblin writeup is funny, but it is also a perfect warning. A reward signal made some models overuse goblins, gremlins, raccoons, trolls, ogres, and pigeons in places where they did not belong.

That is harmless when it shows up as weird language. It is less harmless when the same kind of drift shows up as confident code.

AGI is right around the corner, folks ;)

-Armin