fix: save trace as agent run suspends and resumes #1350
Merged
Chibionos
requested changes
Feb 20, 2026
src/uipath/_cli/_evals/_runtime.py
Outdated
self._log_handlers.clear()

def _serialize_span(span: ReadableSpan) -> dict[str, Any]:
Contributor
move these to helpers, keep the file clean and add tests for all code added here.
Chibionos
reviewed
Feb 20, 2026
if self.context.resume:
    saved_spans = await self._load_execution_spans(eval_item.id)
    if saved_spans:
        spans = saved_spans + spans
Contributor
check if there are duplicates
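One way to address the duplicate concern is to merge the two lists keyed on span identity. This is a minimal sketch, assuming each span can be identified by its (trace_id, span_id) pair. `SpanRecord` and `merge_spans` are hypothetical stand-ins for illustration, not names from this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanRecord:
    """Hypothetical stand-in for a span; only the identifying fields matter here."""

    trace_id: int
    span_id: int
    name: str


def merge_spans(saved: list[SpanRecord], current: list[SpanRecord]) -> list[SpanRecord]:
    """Prepend saved spans to current ones, keeping the first copy of each span."""
    seen: set[tuple[int, int]] = set()
    merged: list[SpanRecord] = []
    for span in saved + current:
        key = (span.trace_id, span.span_id)
        if key in seen:
            continue  # an earlier copy of this span is already in the result
        seen.add(key)
        merged.append(span)
    return merged
```

With this shape, `spans = merge_spans(saved_spans, spans)` would replace the plain concatenation and keep the saved-first ordering.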
Chibionos
approved these changes
Feb 20, 2026
f3b089c to 5842582
The Problem
When running evaluations locally on agents that use sub-agent tools (like "Backwards-String-Generator"), the evaluation process involves a suspend/resume cycle: the first process suspends and exits, and a second process resumes the run.
The trajectory evaluator (which grades how well the agent followed expected steps by examining the trace) was scoring 0 for these agents, even though the agent was running as expected.
Root Cause 1: Spans lost across process boundary
OpenTelemetry spans are stored in memory by the ExecutionSpanExporter. When the first process suspends and exits, all 32+ spans from that run are lost due to the line self.span_exporter.clear(execution_id). The second process starts fresh with zero spans. The trajectory evaluator only sees the resume-phase spans and has no record of the agent's actual work.
Fix: We added span persistence to SQLite. On suspend, all collected spans are serialized to JSON and saved to the existing __uipath/state.db database (which was already used for storing resume triggers). On resume, the saved spans are loaded back and prepended to the new spans. This required writing _serialize_span() and _deserialize_span() helper functions to convert OpenTelemetry ReadableSpan objects to/from JSON-compatible dicts.
Result: Trajectory evaluator went from 0 → 50.
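The save-on-suspend, load-on-resume round trip can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `SpanRecord` is a simplified, hypothetical stand-in for ReadableSpan, and the `eval_spans` table name and its columns are invented for the example.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class SpanRecord:
    """Hypothetical, simplified stand-in for an OpenTelemetry ReadableSpan."""

    name: str
    trace_id: int
    span_id: int
    parent_span_id: Optional[int]
    start_time_ns: int
    end_time_ns: int
    attributes: dict[str, Any] = field(default_factory=dict)


def serialize_span(span: SpanRecord) -> dict[str, Any]:
    """Convert a span to a JSON-compatible dict."""
    return asdict(span)


def deserialize_span(data: dict[str, Any]) -> SpanRecord:
    """Rebuild a span from the dict produced by serialize_span."""
    return SpanRecord(**data)


def _ensure_table(conn: sqlite3.Connection) -> None:
    # Assumed schema: one JSON blob of spans per eval item.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS eval_spans ("
        "eval_item_id TEXT PRIMARY KEY, spans_json TEXT NOT NULL)"
    )


def save_spans(conn: sqlite3.Connection, eval_item_id: str, spans: list[SpanRecord]) -> None:
    """On suspend: persist all collected spans for this eval item."""
    _ensure_table(conn)
    payload = json.dumps([serialize_span(s) for s in spans])
    conn.execute(
        "INSERT OR REPLACE INTO eval_spans (eval_item_id, spans_json) VALUES (?, ?)",
        (eval_item_id, payload),
    )
    conn.commit()


def load_spans(conn: sqlite3.Connection, eval_item_id: str) -> list[SpanRecord]:
    """On resume: load previously saved spans, or an empty list if none exist."""
    _ensure_table(conn)
    row = conn.execute(
        "SELECT spans_json FROM eval_spans WHERE eval_item_id = ?", (eval_item_id,)
    ).fetchone()
    if row is None:
        return []
    return [deserialize_span(d) for d in json.loads(row[0])]
```

In a resume run, the loaded list would be prepended to the newly collected spans. Attribute values must themselves be JSON-compatible for the round trip to be lossless.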
Root Cause 2: Resume-phase spans invisible to evaluator (exec_id=None)
Even with span persistence working, the trajectory evaluator scored 50 instead of 100. It could see the tool call (from saved first-run spans) but not the successful tool result (from resume-phase spans).
Every span needs an execution.id attribute to be collected by the ExecutionSpanExporter. This ID is propagated from parent spans to child spans by UiPathExecutionTraceProcessorMixin.on_start() in the tracing infrastructure. However, this propagation requires parent_span.is_recording() to return True.
On resume, the "Evaluation" span is restored as a NonRecordingSpan. NonRecordingSpan.is_recording() returns False, so execution.id propagation breaks at this boundary. Resume-phase spans never get execution.id, so the exporter silently drops them.
The tracing infrastructure code (uipath.core.tracing.processors) is in a separate installed package we can't modify. But the eval-specific ExecutionSpanProcessor (which extends it) is in our code.
Fix: Added a fallback in ExecutionSpanProcessor.on_start(): after the parent's propagation attempt, if execution.id is still missing, read it from the execution_id_context ContextVar (which is already set before the runtime executes). This ensures all spans during eval execution get tagged correctly, regardless of whether the parent span is recording.
Result: Trajectory evaluator went from 50 → 100.
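The fallback logic can be illustrated with a small stand-alone sketch. It does not use the real OpenTelemetry or uipath classes: `FakeSpan`, `propagate_from_parent`, and this `on_start` are hypothetical stand-ins that only mimic the behaviour described above, and the ContextVar here is a local substitute for execution_id_context.

```python
from contextvars import ContextVar
from typing import Any, Optional

# Local stand-in for the execution_id_context ContextVar mentioned above.
execution_id_context: ContextVar[Optional[str]] = ContextVar("execution_id", default=None)


class FakeSpan:
    """Hypothetical minimal span: an attribute dict plus an is_recording flag."""

    def __init__(self, recording: bool = True) -> None:
        self.attributes: dict[str, Any] = {}
        self._recording = recording

    def is_recording(self) -> bool:
        return self._recording

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value


def propagate_from_parent(span: FakeSpan, parent: Optional[FakeSpan]) -> None:
    """Mimics the base behaviour: copy execution.id only from a recording parent."""
    if parent is not None and parent.is_recording():
        exec_id = parent.attributes.get("execution.id")
        if exec_id is not None:
            span.set_attribute("execution.id", exec_id)


def on_start(span: FakeSpan, parent: Optional[FakeSpan]) -> None:
    """Try parent propagation first, then fall back to the ContextVar."""
    propagate_from_parent(span, parent)
    if "execution.id" not in span.attributes:
        exec_id = execution_id_context.get()
        if exec_id is not None:
            span.set_attribute("execution.id", exec_id)
```

A child started under a non-recording parent still receives execution.id as long as the ContextVar was set beforehand. If neither source provides an ID, the span is left untagged.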
Development Package
Run uipath pack --nolock to get the latest dev build from this PR (requires version range).