fix: save trace as agent run suspends and resumes #1350

Merged
mathurk merged 5 commits into main from fix/suspend_resume_spans
Feb 20, 2026

@mathurk (Collaborator) commented Feb 19, 2026

The Problem

When running evaluations locally on agents that use sub-agent tools (like "Backwards-String-Generator"), the evaluation process involves a suspend/resume cycle:

  1. Initial run: The agent calls a sub-agent tool, which creates a remote job.
  2. Resume run: A second CLI invocation picks up where the first left off, gets the job result, and completes the agent execution.

The trajectory evaluator (which grades how well the agent followed expected steps by examining the trace) was scoring 0 for these agents, even though the agent was running as expected.


Root Cause 1: Spans lost across process boundary

OpenTelemetry spans are stored in memory by the ExecutionSpanExporter. When the first process suspends and exits, all 32+ spans from that run are lost due to the line self.span_exporter.clear(execution_id). The second process starts fresh with zero spans. The trajectory evaluator only sees the resume-phase spans and has no record of the agent's actual work.

Fix: We added span persistence to SQLite. On suspend, all collected spans are serialized to JSON and saved to the existing __uipath/state.db database (which was already used for storing resume triggers). On resume, the saved spans are loaded back and prepended to the new spans. This required writing _serialize_span() and _deserialize_span() helper functions to convert OpenTelemetry ReadableSpan objects to/from JSON-compatible dicts.
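
A minimal sketch of the save-on-suspend / load-on-resume round trip described above, not the PR's actual code. SpanRecord is a hypothetical stand-in for OpenTelemetry's ReadableSpan, the execution_spans table name and schema are assumptions, and the function names only mirror the _serialize_span() / _deserialize_span() helpers, whose bodies are not shown in this thread.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class SpanRecord:
    """Hypothetical stand-in for the ReadableSpan fields worth persisting."""
    name: str
    trace_id: int
    span_id: int
    parent_span_id: Optional[int]
    start_time_ns: int
    end_time_ns: int
    attributes: dict[str, Any] = field(default_factory=dict)


def serialize_span(span: SpanRecord) -> dict[str, Any]:
    """Convert a span to a JSON-compatible dict (IDs become hex strings)."""
    data = asdict(span)
    data["trace_id"] = format(span.trace_id, "032x")
    data["span_id"] = format(span.span_id, "016x")
    if span.parent_span_id is not None:
        data["parent_span_id"] = format(span.parent_span_id, "016x")
    return data


def deserialize_span(data: dict[str, Any]) -> SpanRecord:
    """Inverse of serialize_span()."""
    parent = data["parent_span_id"]
    return SpanRecord(
        name=data["name"],
        trace_id=int(data["trace_id"], 16),
        span_id=int(data["span_id"], 16),
        parent_span_id=int(parent, 16) if parent is not None else None,
        start_time_ns=data["start_time_ns"],
        end_time_ns=data["end_time_ns"],
        attributes=dict(data["attributes"]),
    )


def _ensure_table(conn: sqlite3.Connection) -> None:
    # Assumed schema: one JSON blob of spans per execution id.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS execution_spans ("
        "execution_id TEXT PRIMARY KEY, spans_json TEXT NOT NULL)"
    )


def save_spans(conn: sqlite3.Connection, execution_id: str, spans: list[SpanRecord]) -> None:
    """Called on suspend: persist every span collected so far."""
    _ensure_table(conn)
    payload = json.dumps([serialize_span(s) for s in spans])
    conn.execute(
        "INSERT OR REPLACE INTO execution_spans VALUES (?, ?)",
        (execution_id, payload),
    )
    conn.commit()


def load_spans(conn: sqlite3.Connection, execution_id: str) -> list[SpanRecord]:
    """Called on resume: return saved spans, or an empty list if none exist."""
    _ensure_table(conn)
    row = conn.execute(
        "SELECT spans_json FROM execution_spans WHERE execution_id = ?",
        (execution_id,),
    ).fetchone()
    return [] if row is None else [deserialize_span(d) for d in json.loads(row[0])]
```

On resume, the caller would prepend the loaded list to the spans collected in the second process, e.g. spans = load_spans(conn, execution_id) + spans.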

Result: Trajectory evaluator score went from 0 → 50.


Root Cause 2: Resume-phase spans invisible to evaluator (exec_id=None)

Even with span persistence working, the trajectory evaluator scored 50 instead of 100. It could see the tool call (from saved first-run spans) but not the successful tool result (from resume-phase spans).

Every span needs an execution.id attribute to be collected by the ExecutionSpanExporter. This ID is propagated from parent spans to child spans by UiPathExecutionTraceProcessorMixin.on_start() in the tracing infrastructure. However, this propagation requires parent_span.is_recording() to return True.

On resume, the "Evaluation" span is restored as a NonRecordingSpan. NonRecordingSpan.is_recording() returns False, so execution.id propagation breaks at this boundary. Resume-phase spans never get execution.id, so the exporter silently drops them.

The tracing infrastructure code (uipath.core.tracing.processors) is in a separate installed package we can't modify. But the eval-specific ExecutionSpanProcessor (which extends it) is in our code.

Fix: Added a fallback in ExecutionSpanProcessor.on_start(): after the parent's propagation attempt, if execution.id is still missing, read it from the execution_id_context ContextVar (which is already set before the runtime executes). This ensures all spans during eval execution get tagged correctly, regardless of whether the parent span is recording.
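
A simplified sketch of the fallback, under stated assumptions: FakeSpan and BaseProcessor are hypothetical stand-ins for the OpenTelemetry span and for UiPathExecutionTraceProcessorMixin, and on_start() here takes the parent span directly rather than an OpenTelemetry context. Only the names execution_id_context, ExecutionSpanProcessor, and the execution.id attribute come from the description above.

```python
from contextvars import ContextVar
from typing import Any, Optional

# Stand-in for the ContextVar that is set before the runtime executes.
execution_id_context: ContextVar[Optional[str]] = ContextVar(
    "execution_id", default=None
)


class FakeSpan:
    """Hypothetical minimal span: attributes plus a recording flag."""

    def __init__(self, name: str, recording: bool = True) -> None:
        self.name = name
        self.attributes: dict[str, Any] = {}
        self._recording = recording

    def is_recording(self) -> bool:
        return self._recording

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value


class BaseProcessor:
    """Models parent-to-child propagation that requires a recording parent."""

    def on_start(self, span: FakeSpan, parent: Optional[FakeSpan] = None) -> None:
        if parent is not None and parent.is_recording():
            exec_id = parent.attributes.get("execution.id")
            if exec_id is not None:
                span.set_attribute("execution.id", exec_id)


class ExecutionSpanProcessor(BaseProcessor):
    """Adds the fallback: read execution.id from the ContextVar if still missing."""

    def on_start(self, span: FakeSpan, parent: Optional[FakeSpan] = None) -> None:
        super().on_start(span, parent)
        if "execution.id" not in span.attributes:
            exec_id = execution_id_context.get()
            if exec_id is not None:
                span.set_attribute("execution.id", exec_id)
```

With a non-recording parent (the restored "Evaluation" span), BaseProcessor leaves the child untagged, while ExecutionSpanProcessor tags it from the ContextVar so the exporter can collect it.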

Result: Trajectory evaluator score went from 50 → 100.

[Screenshot, 2026-02-19 at 4:24:45 PM]

Development Package

  • Use uipath pack --nolock to get the latest dev build from this PR (requires version range).
  • Add this package as a dependency in your pyproject.toml:
[project]
dependencies = [
  # Exact version:
  "uipath==2.8.47.dev1013504954",

  # Any version from PR
  "uipath>=2.8.47.dev1013500000,<2.8.47.dev1013510000"
]

[[tool.uv.index]]
name = "testpypi"
url = "https://test.pypi.org/simple/"
publish-url = "https://test.pypi.org/legacy/"
explicit = true

[tool.uv.sources]
uipath = { index = "testpypi" }

[tool.uv]
override-dependencies = [
    "uipath>=2.8.47.dev1013500000,<2.8.47.dev1013510000",
]

@github-actions bot added the labels test:uipath-langchain (Triggers tests in the uipath-langchain-python repository) and test:uipath-llamaindex (Triggers tests in the uipath-llamaindex-python repository) on Feb 19, 2026
@mathurk added the build:dev label (Create a dev build from the pr) on Feb 19, 2026
@Chibionos (Contributor) left a comment

need tests

self._log_handlers.clear()


def _serialize_span(span: ReadableSpan) -> dict[str, Any]:

move these to helpers, keep the file clean and add tests for all code added here.

if self.context.resume:
    saved_spans = await self._load_execution_spans(eval_item.id)
    if saved_spans:
        spans = saved_spans + spans

check if there are duplicates
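
One possible way to address the duplicate concern, sketched under assumptions: spans are represented here as plain dicts with trace_id and span_id keys (hypothetical, matching a serialized form), and a span is treated as a duplicate when that pair has already been seen. Whether the merged change deduplicates this way is not shown in this thread.

```python
from typing import Any


def merge_spans(
    saved: list[dict[str, Any]], new: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Concatenate saved and new spans, keeping the first copy of each span.

    A span is identified by its (trace_id, span_id) pair; order is preserved,
    so saved first-run spans still come before resume-phase spans.
    """
    seen: set[tuple[Any, Any]] = set()
    merged: list[dict[str, Any]] = []
    for span in [*saved, *new]:
        key = (span["trace_id"], span["span_id"])
        if key in seen:
            continue  # already present, e.g. re-exported after resume
        seen.add(key)
        merged.append(span)
    return merged
```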

@mathurk force-pushed the fix/suspend_resume_spans branch from f3b089c to 5842582 on February 20, 2026 20:25
@mathurk merged commit 18493f0 into main on Feb 20, 2026
95 checks passed
@mathurk deleted the fix/suspend_resume_spans branch on February 20, 2026 20:46