Topic

Harness Engineering

6 pieces in this thread.

  1. Making aec-bench Trainable with Prime Lab

    How aec-bench and Prime Intellect's Lab turn engineering benchmarks into verifier-backed RL environments, adapter training runs, and inspectable traces.

    aec-bench, prime-lab, reinforcement-learning, agent-evaluation
  2. Executable Standards

    Better tools and verifiers are not enough. The next harness boundary is the clause itself — turning standards, briefs, and codes into versioned predicates and replayable certificates.

    harness-engineering, autoformalisation, ai-in-aec, formal-methods
  3. The Third Axis

    What happens when you let the harness improve itself — two experiments in feedback-driven harness evolution, and an honest look at how rough the trajectory actually is.

    harness-engineering, agentic-ai, self-improvement, autoresearch
  4. What If the Harness Could Improve Itself?

    Applying the autoresearch pattern to self-improve an engineering agent harness. Automated prompt optimisation across HVAC audit tasks on Claude and GPT-4.1-mini, showing how harness engineering compounds when the improvement loop runs itself.

    harness-engineering, autoresearch, agentic-ai, ai-in-aec
  5. Benchmarking Agents on Real Engineering Work Is Already Teaching Us Something Important

    Benchmarking AI agents on real HVAC engineering tasks across Claude and GPT models. Results on harness-dependent capability, agent evaluation design, and why AEC-domain benchmarks reveal what general benchmarks miss.

    harness-engineering, agentic-ai, ai-in-aec, ai-benchmarks
  6. Where Capability Actually Lives in Agentic Engineering

    In AEC and domain-specific engineering, AI agent capability lives not in the model alone but in harness engineering — the tools, verifiers, orchestration, and process design that make agentic work reliable.

    harness-engineering, agentic-ai, ai-in-aec, engineering-ai