Xplore
Deploy

Only what passes goes live. Everything else waits.

CI/CD for agents. Promotion gates, version control, rollback. No more "let's try this in production and hope it works."

For development teams.

  • Gated promotion — automated or manual

    Autopromote best, threshold gate, or manual review. Candidates sit on a branch until they earn promotion.

  • Every version tracked

    v1 through v6 — all available. Compare any two. Roll back in one click if production scores drop.

  • Pre-deploy evaluation gate

    The agent must pass your eval suite before promotion. The same evaluators run in training and at the gate, so there is no gap between dev and prod.

Promote policy configuration — autopromote_best, threshold, manual
Agent overview — 6 versions, 148 runs, performance over time

Three promotion policies — pick the risk model that fits.

Every version carries a full evaluation snapshot. The same scores that drove training are the scores that gate deployment.

Autopromote best

The highest-scoring iteration goes live automatically. Zero delay between improvement and deployment.

Threshold gate

Agents go live only when they meet your minimum score. Nothing below the bar reaches production.

Manual approval

Every promotion requires a human sign-off. Full control for regulated environments.
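
The three policies above can be sketched as a single decision function. This is a minimal illustration, not the product's API: the policy names come from the configuration screen, while `Version`, `should_promote`, the `approved` flag, and the default `min_score` of 0.8 are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    """Hypothetical record of one agent version and its eval snapshot."""
    name: str
    score: float            # aggregate eval score, assumed to lie in 0.0-1.0
    approved: bool = False  # human sign-off, consulted only by "manual"


def should_promote(policy: str, candidate: Version,
                   live: Optional[Version], min_score: float = 0.8) -> bool:
    """Return True if the candidate may go live under the given policy."""
    if policy == "autopromote_best":
        # Promote only when the candidate beats whatever is live now.
        return live is None or candidate.score > live.score
    if policy == "threshold":
        # Promote when the candidate meets the configured minimum score.
        return candidate.score >= min_score
    if policy == "manual":
        # Scores inform the reviewer; only the sign-off promotes.
        return candidate.approved
    raise ValueError(f"unknown policy: {policy}")


v5 = Version("v5", score=0.82)
v6 = Version("v6", score=0.86)
print(should_promote("autopromote_best", v6, v5))
print(should_promote("threshold", v6, v5, min_score=0.90))
print(should_promote("manual", v6, v5))
```

In this sketch the same candidate is promoted under autopromote best, held back by a 0.90 threshold, and held back under manual approval until someone signs off.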


Regression certification — a deployment gate, not a test step.

Agent behavior is non-deterministic. A version that scored well in training may behave differently under production load. Running the full evaluation suite at promotion time catches regressions that static testing misses.
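
Because scores vary from run to run, one way to picture a promotion-time gate is to run the suite several times and compare the average against the live version's baseline. The sketch below assumes exactly that; `certify`, `run_suite`, the run count, and the tolerance are hypothetical and not taken from the product.

```python
import random
from statistics import mean
from typing import Callable


def certify(run_suite: Callable[[], float], baseline: float,
            runs: int = 5, tolerance: float = 0.02) -> bool:
    """Run the eval suite `runs` times and pass only if the mean score
    stays within `tolerance` of the baseline (or exceeds it)."""
    scores = [run_suite() for _ in range(runs)]
    return mean(scores) >= baseline - tolerance


# Stand-in for a real eval suite: a fixed quality level plus run-to-run noise.
rng = random.Random(0)

def noisy_suite() -> float:
    return 0.85 + rng.gauss(0, 0.01)

# Gate the candidate against the score of the version currently in production.
passed = certify(noisy_suite, baseline=0.84)
print("certified" if passed else "blocked")
```

Averaging over several runs keeps a single lucky or unlucky run from deciding the promotion.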

Agent overview showing 6 versions and 148 evaluation runs
Agent structure diff showing exact configuration changes between versions

Full version diffs — a governance requirement.

Regulated industries need to answer 'what changed and why.' Every promotion carries the config diff and the score delta. Auditors see exactly which tools were added, which rules were modified, and how each change affected scores.
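
As an illustration of what such a promotion record might contain, the sketch below diffs two version configurations and computes the score delta. The config layout (`tools`, `rules`, `score`) and the function `diff_versions` are assumptions made for this example, not the product's actual schema.

```python
from typing import Any, Dict


def diff_versions(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize what changed between two versions and how the score moved."""
    old_tools, new_tools = set(old["tools"]), set(new["tools"])
    shared_rules = old["rules"].keys() & new["rules"].keys()
    return {
        "tools_added": sorted(new_tools - old_tools),
        "tools_removed": sorted(old_tools - new_tools),
        # Map each modified rule to its (before, after) values.
        "rules_modified": {
            key: (old["rules"][key], new["rules"][key])
            for key in sorted(shared_rules)
            if old["rules"][key] != new["rules"][key]
        },
        "score_delta": round(new["score"] - old["score"], 4),
    }


v5 = {"tools": ["search"],
      "rules": {"tone": "formal", "max_steps": 5},
      "score": 0.82}
v6 = {"tools": ["search", "calculator"],
      "rules": {"tone": "formal", "max_steps": 8},
      "score": 0.86}

print(diff_versions(v5, v6))
```

A record like this answers both halves of the auditor's question: the diff shows what changed, and the score delta shows the effect it had.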

For the business.

Zero downtime from bad releases

Agents that don't pass evaluation never reach users. No more "we pushed a bad prompt" incidents.

Audit trail for regulators

Every version, every promotion decision, every score — logged. Show compliance officers exactly what's running and why.

Instant rollback

If v6 degrades after a week, roll back to v5 in one click. No re-engineering needed.
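
One way to think about rollback is as moving a "live" pointer back through the promotion history rather than rebuilding anything. The class below is a hypothetical sketch of that idea; `Deployment` and its methods are illustrative names only.

```python
from typing import List


class Deployment:
    """Keeps every promoted version and a pointer to the live one."""

    def __init__(self) -> None:
        self.history: List[str] = []  # promoted versions, oldest first
        self.live_index: int = -1     # -1 means nothing is live yet

    def promote(self, version: str) -> None:
        self.history.append(version)
        self.live_index = len(self.history) - 1

    @property
    def live(self) -> str:
        if self.live_index < 0:
            raise RuntimeError("nothing has been promoted yet")
        return self.history[self.live_index]

    def rollback(self) -> str:
        """Point production at the previous version; history is kept."""
        if self.live_index <= 0:
            raise RuntimeError("no earlier version to roll back to")
        self.live_index -= 1
        return self.live


deploy = Deployment()
deploy.promote("v5")
deploy.promote("v6")
deploy.rollback()
print(deploy.live)
```

Because the rolled-back version stays in the history, it remains available for comparison and for the audit trail.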