Add agent docs eval: test that AI can build transfer scripts #73
Summary
Adds an end-to-end eval that uses the Amp SDK to test whether AI agents can successfully build working code using Tempo docs.
What it does
Why
Per discussion in #product-docs - we're seeing agents (like Opus 4.5) get confused about:
This eval will help us iterate on docs until agents succeed consistently.
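One way to make "succeed consistently" measurable is to repeat the same prompt several times and track the pass rate. The sketch below is only an illustration under stated assumptions: `runAgent`, `check`, `measurePassRate`, and `isConsistent` are hypothetical names that do not come from this PR or from `@sourcegraph/amp-sdk`, and the agent call is injected so that nothing is assumed about the SDK's API.

```typescript
// Hypothetical shapes for illustration; not taken from the PR or the Amp SDK.
type AgentRun = (prompt: string) => Promise<string>;
type Check = (output: string) => boolean;

interface EvalSummary {
  passes: number;
  runs: number;
  passRate: number;
}

// Run the same prompt `runs` times and count how often the output passes `check`.
async function measurePassRate(
  runAgent: AgentRun,
  check: Check,
  prompt: string,
  runs: number,
): Promise<EvalSummary> {
  let passes = 0;
  for (let i = 0; i < runs; i++) {
    try {
      const output = await runAgent(prompt);
      if (check(output)) passes++;
    } catch {
      // A run that throws is counted as a failure.
    }
  }
  return { passes, runs, passRate: runs === 0 ? 0 : passes / runs };
}

// Decide whether a summary meets a chosen consistency threshold (for example 0.9).
function isConsistent(summary: EvalSummary, threshold: number): boolean {
  return summary.runs > 0 && summary.passRate >= threshold;
}
```

In a real eval, `runAgent` would wrap the agent invocation and `check` would verify that the generated script actually works; the threshold is a judgment call for the team.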
Files changed
e2e/agent-transfer-funds.test.ts - The eval test
package.json - Added @sourcegraph/amp-sdk dependency
Manual step needed
After merging, add this to .github/workflows/verify.yml to run the eval on schedule:
Also add AMP_API_KEY to repository secrets.