SECTION 01 / OPENING
Notes on harness engineering
Where reliable AI agent capability actually comes from — and how to build it.
In engineering, AI capability does not live in the model alone. It is distributed across tools, verifiers, control flow, and the design of the operating environment. The Harness is a working notebook on that problem, written from the intersection of agentic AI and architecture, engineering, and construction (AEC).
SECTION 02 / TOPICS
Harness Engineering
The tools, verifiers, orchestration, and process design that make agentic work reliable. Not the model — the system around it.
7 articles
02 Agent Evaluation
Benchmarking agents on real engineering tasks. What to measure, how to measure it, and why most evals miss what matters.
4 articles
03 AI in AEC
Applying agentic AI to architecture, engineering, and construction — where the work is instance-bound, constraint-heavy, and intolerant of generic answers.
8 articles
SECTION 04 / RECENT
- aec-bench: Making aec-bench Trainable with Prime Lab. How aec-bench and Prime Intellect's Lab turn engineering benchmarks into verifier-backed RL environments, adapter training runs, and inspectable traces.
- autoformalisation: Executable Standards. Better tools and verifiers are not enough. The next harness boundary is the clause itself: turning standards, briefs, and codes into versioned predicates and replayable certificates.
- self-improvement: The Third Axis. What happens when you let the harness improve itself: two experiments in feedback-driven harness evolution, and an honest look at how rough the trajectory actually is.
- recursive-language-models: Recursive by Design. Building Recursive Language Model agents for real engineering tasks (from 1.5M tokens to 53K with Lambda-RLM) and what we learned about agent harness design along the way.
- aec: The Harness Is All You Need. Why domain-specific agent harnesses, not bigger models, are what close the AI performance gap on real engineering tasks, and why the AEC industry needs proper benchmarks to prove it.