Topic

Harness Engineering

9 pieces in this thread.

2026-07-08

Mediation, Not Intermediation

Why the 'fix your foundations before AI' message has it backwards: agentic workflows are the way out of legacy data, and governance worth having is co-designed from practice, not committees.

agentic-ai, harness-engineering, data-engineering, governance

2026-06-24

Task Worlds and Meta-Harnesses

How task worlds, Badiou, Plasticity, and the AEC-Bench meta-harness turn task prose, evidence, review, governance, and repair into runnable machinery.

agentic-ai, harness-engineering, aec-bench, task-worlds

2026-06-02

Plausible Answers, Failed Workflows

An AEC-Bench release evaluation read as workflow reliability, not prose quality. Chapter by chapter: why a model can produce a plausible answer and still fail the durable record a project has to audit.

harness-engineering, agentic-ai, ai-in-aec, ai-benchmarks

2026-05-16

Making aec-bench Trainable with Prime Lab

How aec-bench and Prime Intellect's Lab turn engineering benchmarks into verifier-backed RL environments, adapter training runs, and inspectable traces.

aec-bench, prime-lab, reinforcement-learning, agent-evaluation

2026-05-03

Executable Standards

Better tools and verifiers are not enough. The next harness boundary is the clause itself — turning standards, briefs, and codes into versioned predicates and replayable certificates.

harness-engineering, autoformalisation, ai-in-aec, formal-methods

2026-04-18

The Third Axis

What happens when you let the harness improve itself — two experiments in feedback-driven harness evolution, and an honest look at how rough the trajectory actually is.

harness-engineering, agentic-ai, self-improvement, autoresearch

2026-03-14

What If the Harness Could Improve Itself?

Applying the autoresearch pattern to self-improve an engineering agent harness. Automated prompt optimisation across HVAC audit tasks on Claude and GPT-4.1-mini, showing how harness engineering compounds when the improvement loop runs itself.

harness-engineering, autoresearch, agentic-ai, ai-in-aec

2026-03-12

Benchmarking Agents on Real Engineering Work Is Already Teaching Us Something Important

Benchmarking AI agents on real HVAC engineering tasks across Claude and GPT models. Results on harness-dependent capability, agent evaluation design, and why AEC-domain benchmarks reveal what general benchmarks miss.

harness-engineering, agentic-ai, ai-in-aec, ai-benchmarks

2026-03-10

Where Capability Actually Lives in Agentic Engineering

In AEC and domain-specific engineering, AI agent capability lives not in the model alone but in harness engineering — the tools, verifiers, orchestration, and process design that make agentic work reliable.

harness-engineering, agentic-ai, ai-in-aec, engineering-ai