feat: map legacy conversational eval inputs and outputs by shannonsuhendra · Pull Request #1276 · UiPath/uipath-python

shannonsuhendra · 2026-02-06T21:13:43Z

Demo Video

Screen.Recording.2026-02-19.at.2.45.10.PM.mov

Summary

This PR is one of three PRs that together support end-to-end evaluations for Conversational Agents. The approach first defines the output of the low-code Conversational Agent graph and modifies the graph to produce that output. It then converts the inputs and outputs of legacy conversational-agent evaluation files into the same format as the Conversational Agent graph's inputs and outputs.

In this approach, there is no conversational-agent-specific eval-wrapper code. Low-code Conversational Agents are run and evaluated exactly like Autonomous Agents (input -> output), so they should benefit automatically from future evaluation changes and improvements. For more details on the chosen approach, see this document.

Implementation details

https://github.com/UiPath/uipath-agents-python/pull/274
- Defines the Conversational Agent output schema as uipath__agent_response_messages: list[ConversationMessageData]. This output contains the messages and tool calls produced by the agent run, in the UiPath CAS data-type format. The ...Data format omits timestamps, message IDs, etc.; see the document above for context.
- Note that the input schema is already defined to contain messages: list[UiPathConversationMessage], the input chat history, which matches the CAS format (including timestamps, message IDs, etc.).
feat: output conversational message data in terminate-node uipath-langchain-python#601
- Updates the low-code Conversational Agent graph to emit its output as uipath__agent_response_messages: list[ConversationMessageData].
- Modifies terminate_node, which previously returned nothing for conversational agents, to convert the agent-produced LangGraph messages into the ConversationMessageData format and output them.
feat: map legacy conversational eval inputs and outputs #1276 (this one)
- Introduces _conversational_utils.py (with tests in test_conversational_utils.py), which adds the legacy types for the agent.json's conversationalInputs and conversationalExpectedOutput. These are the legacy evaluation fields saved by the design-time evaluation experience.
- In migrate_evaluation_item, these legacy evaluation types are mapped to the proper Conversational Agent input (list[UiPathConversationMessage]) and output (list[UiPathConversationMessageData]) types.
- They are then set as evaluation.inputs["messages"] and evaluation.expected_output["uipath__agent_response_messages"], so the final evaluation inputs and expected output carry the same fields and types as the Conversational Agent's actual inputs and outputs.
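The migration step in the last group can be pictured with a minimal sketch that works on plain dictionaries instead of the real pydantic models. Only the legacy field names (conversationHistory, currentUserPrompt, agentResponse, text, toolCalls) and the target keys messages and uipath__agent_response_messages come from this PR; the helper names, the simplified contentParts message shape, keeping roles unchanged, and the empty-text handling are assumptions for illustration, not the actual migrate_evaluation_item code.

```python
from typing import Any

# Hypothetical, simplified stand-in for the real message models.
Message = dict[str, Any]


def _to_message(legacy: Message) -> Message:
    """Convert one legacy {role, text, toolCalls} entry into a simplified message dict."""
    text = legacy.get("text", "")
    message: Message = {
        "role": legacy["role"],  # kept as-is; the real mapping may differ
        # Assumption: empty text yields no content parts rather than an empty part.
        "contentParts": [{"text": text}] if text else [],
    }
    tool_calls = legacy.get("toolCalls") or []
    if tool_calls:
        # Copy each call (name, arguments, and result when the legacy entry has one).
        message["toolCalls"] = [dict(call) for call in tool_calls]
    return message


def flatten_legacy_inputs(conversational_inputs: Message) -> list[Message]:
    """Flatten the history (a list of exchanges, each a list of messages), then append the current prompt."""
    messages = [
        _to_message(entry)
        for exchange in conversational_inputs.get("conversationHistory", [])
        for entry in exchange
    ]
    messages.append(_to_message(conversational_inputs["currentUserPrompt"]))
    return messages


def map_legacy_expected_output(expected: Message) -> list[Message]:
    """Map the legacy agentResponse list onto the expected response messages."""
    return [_to_message(entry) for entry in expected.get("agentResponse", [])]


def migrate_conversational_item(item: Message) -> Message:
    """Hypothetical helper: populate inputs/expectedOutput from the legacy conversational fields."""
    migrated = dict(item)  # shallow copy so the caller's item is left untouched
    if "conversationalInputs" in item:
        migrated["inputs"] = {
            **item.get("inputs", {}),
            "messages": flatten_legacy_inputs(item["conversationalInputs"]),
        }
    if "conversationalExpectedOutput" in item:
        migrated["expectedOutput"] = {
            **item.get("expectedOutput", {}),
            "uipath__agent_response_messages": map_legacy_expected_output(
                item["conversationalExpectedOutput"]
            ),
        }
    return migrated
```

Applied to the first evaluation in the example set below, this sketch would produce a single user message under inputs["messages"] and two agent messages under uipath__agent_response_messages, the first carrying the expected Web Search tool call. The real code builds typed UiPathConversationMessage objects, which also carry IDs and timestamps.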

Next Steps:

Map attachments from legacy conversational inputs (@norman-le will add this).

How has this been tested?

Example legacy evaluation set, as downloaded from Studio Web:

{
    "fileName":"evaluation-set-1769099981427.json",
    "id":"e1db7a8c-3ff2-4774-adb4-5e2fb59195a3",
    "name":"Web Search Capabilities Evaluation Set",
    "batchSize":10,
    "evaluatorRefs":[
      "d52815b1-c9f4-4ba5-86ca-14285367fd32",
      "08b73637-b789-4f7f-a004-76a437e1a6d3"
    ],
    "evaluations":[
      {
        "id":"eda1f1bc-5fd4-4094-b0ba-afe7ec3189d4",
        "name":"Simulated Web Search, No Chat History",
        "inputs":{
          
        },
        "expectedOutput":{
          
        },
        "simulationInstructions":"Web Search tool returns accurate information about Paris being the capital of France, and information about Paris.",
        "expectedAgentBehavior":"The agent should use the Web Search tool to find information about the capital of France and provide a clear, concise answer mentioning Paris.",
        "simulateTools":true,
        "toolsToSimulate":[
          {
            "name":"Web Search"
          }
        ],
        "evalSetId":"1442afe8-1e00-4d0e-b20f-620a678c5ece",
        "createdAt":"2026-01-22T16:39:41.427Z",
        "updatedAt":"2026-01-22T20:41:02.151Z",
        "conversationalInputs":{
          "conversationHistory":[
            
          ],
          "currentUserPrompt":{
            "role":"user",
            "text":"Search for the capital of France and tell me about the city."
          }
        },
        "conversationalExpectedOutput":{
          "agentResponse":[
            {
              "role":"agent",
              "text":"I'll search for the capital of France and information about it.",
              "toolCalls":[
                {
                  "name":"Web Search",
                  "arguments":{
                    "provider":"GoogleCustomSearch",
                    "query":"capital of france and information about the city"
                  }
                }
              ]
            },
            {
              "role":"agent",
              "text":"The capital of France is Paris. Paris is known for its rich history, culture, art, and architecture. It is home to iconic landmarks such as the Eiffel Tower, Louvre Museum, and Notre-Dame Cathedral. The city is also famous for its cuisine, fashion, and vibrant atmosphere."
            }
          ]
        }
      },
      {
        "id":"551c928a-7f5d-4b08-8bd7-c28a1f9acf77",
        "name":"Simulated Web Search, With Chat-History",
        "inputs":{
          
        },
        "expectedOutput":{
          
        },
        "simulationInstructions":"Web Search tool returns recent statistics on the number of successful Everest climbs, mentioning that it's an estimate that changes annually.",
        "expectedAgentBehavior":"The agent should maintain context from the previous exchanges about Mount Everest, use the Web Search tool to find current statistics on successful climbs, and provide a clear answer while noting that the number is an estimate and subject to change.",
        "simulateTools":true,
        "toolsToSimulate":[
          {
            "name":"Web Search"
          }
        ],
        "evalSetId":"1442afe8-1e00-4d0e-b20f-620a678c5ece",
        "createdAt":"2026-01-22T16:39:41.427Z",
        "updatedAt":"2026-01-22T16:39:41.427Z",
        "conversationalInputs":{
          "conversationHistory":[
            [
              {
                "role":"user",
                "text":"What's the tallest mountain in the world?"
              },
              {
                "role":"agent",
                "text":"I'll search for that information for you.",
                "toolCalls":[
                  {
                    "name":"Web Search",
                    "arguments":{
                      "provider":"GoogleCustomSearch",
                      "query":"tallest mountain in the world"
                    },
                    "result":{
                      "value":{
                        "formattedResults":"Mount Everest is the tallest mountain in the world, with a height of 8,848 meters (29,029 feet) above sea level."
                      }
                    }
                  }
                ]
              },
              {
                "role":"agent",
                "text":"The tallest mountain in the world is Mount Everest, which stands at 8,848 meters (29,029 feet) above sea level."
              }
            ],
            [
              {
                "role":"user",
                "text":"What country is it located in?"
              },
              {
                "role":"agent",
                "text":"I'll search for the location of Mount Everest.",
                "toolCalls":[
                  {
                    "name":"Web Search",
                    "arguments":{
                      "provider":"GoogleCustomSearch",
                      "query":"Mount Everest location country"
                    },
                    "result":{
                      "value":{
                        "formattedResults":"Mount Everest is located on the border between Nepal and Tibet (an autonomous region of China)."
                      }
                    }
                  }
                ]
              },
              {
                "role":"agent",
                "text":"Mount Everest is located on the border between Nepal and Tibet, which is an autonomous region of China. So, it's shared between two countries: Nepal and China."
              }
            ]
          ],
          "currentUserPrompt":{
            "role":"user",
            "text":"How many people have successfully climbed it?"
          }
        },
        "conversationalExpectedOutput":{
          "agentResponse":[
            {
              "role":"agent",
              "text":"I'll search for how many people have successfully climbed Mount Everest.",
              "toolCalls":[
                {
                  "name":"Web Search",
                  "arguments":{
                    "provider":"GoogleCustomSearch",
                    "query":"number of successful Mount Everest climbers"
                  }
                }
              ]
            },
            {
              "role":"agent",
              "text":"As of 2026, it is estimated that over 6,000 people have successfully climbed Mount Everest. This number changes each year as new climbers reach the summit during the climbing season. It's important to note that this figure is an estimate and may vary due to ongoing expeditions and record-keeping challenges."
            }
          ]
        }
      }
    ],
    "modelSettings":[
      
    ],
    "createdAt":"2026-01-22T16:39:41.428Z",
    "updatedAt":"2026-01-22T20:41:44.938Z",
    "agentMemoryEnabled":false,
    "agentMemorySettings":[
      
    ],
    "lineByLineEvaluation":false
}
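To make the nesting of the legacy fields easier to see, a small inspection helper can walk a set like the one above and report, per evaluation, how many prior exchanges and messages the history holds and which tools the expected response calls. The function names are illustrative only; the helper reads just the field names visible in the JSON.

```python
import json
from typing import Any


def summarize_legacy_set(eval_set: dict[str, Any]) -> list[dict[str, Any]]:
    """Report the shape of each legacy conversational evaluation (illustrative helper)."""
    summaries = []
    for evaluation in eval_set.get("evaluations", []):
        inputs = evaluation.get("conversationalInputs", {})
        # conversationHistory is a list of exchanges; each exchange is a list of messages.
        history = inputs.get("conversationHistory", [])
        expected = evaluation.get("conversationalExpectedOutput", {}).get("agentResponse", [])
        summaries.append(
            {
                "name": evaluation.get("name"),
                "exchanges": len(history),
                "history_messages": sum(len(exchange) for exchange in history),
                "current_prompt": inputs.get("currentUserPrompt", {}).get("text"),
                "expected_tool_calls": [
                    call["name"]
                    for message in expected
                    for call in message.get("toolCalls", [])
                ],
            }
        )
    return summaries


def summarize_file(path: str) -> list[dict[str, Any]]:
    """Load a downloaded evaluation-set JSON file and summarize it."""
    with open(path, encoding="utf-8") as f:
        return summarize_legacy_set(json.load(f))
```

For the set above, the first evaluation reports 0 exchanges and 0 history messages, the second reports 2 exchanges and 6 history messages, and both expect a single Web Search call.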

After running uv run uipath eval agent.json evaluations/eval-sets/evaluation-set-web-search.json --output-file evaluation-set-web-search-output.json:

{
  "evaluationSetName": "Web Search Capabilities Evaluation Set",
  "evaluationSetResults": [
    {
      "evaluationName": "Simulated Web Search, No Chat History",
      "evaluationRunResults": [
        {
          "evaluatorName": "Default Evaluator",
          "evaluatorId": "d52815b1-c9f4-4ba5-86ca-14285367fd32",
          "result": {
            "score": 92.0,
            "details": "The semantic similarity between the ExpectedOutput and ActualOutput is very high. Both outputs contain two agent response messages: the first initiates a web search for the capital of France, and the second provides a detailed answer about Paris, including its status as the capital, cultural significance, landmarks, and reputation. The ActualOutput includes additional details (e.g., location on the Seine River, global hub, creativity and innovation) and cites sources, which enhances completeness. Minor differences include the search query wording and the presence of empty contentParts in the first message, but these do not significantly affect the meaning. Overall, the contextual equivalence and accuracy are well maintained, justifying a score of 92.",
            "evaluationTime": 2.122053861618042
          }
        },
        {
          "evaluatorName": "Default Trajectory Evaluator",
          "evaluatorId": "08b73637-b789-4f7f-a004-76a437e1a6d3",
          "result": {
            "score": 100.0,
            "details": "The agent successfully used the Web Search tool to find information about the capital of France and provided a clear, concise answer mentioning Paris. The response included additional relevant details about Paris, such as its cultural significance and famous landmarks, which aligns with the user's request to \"tell me about the city.\" The answer was accurate, complete, and cited multiple authoritative sources, meeting high standards for accuracy and completeness. No errors or omissions were observed, and the agent's behavior matched the expected output, allowing for reasonable variations in language and expression.",
            "evaluationTime": 4.094098091125488
          }
        }
      ],
      "agentExecutionOutput": null
    },
    {
      "evaluationName": "Simulated Web Search, With Chat-History",
      "evaluationRunResults": [
        {
          "evaluatorName": "Default Evaluator",
          "evaluatorId": "d52815b1-c9f4-4ba5-86ca-14285367fd32",
          "result": {
            "score": 95.0,
            "details": "The ActualOutput and ExpectedOutput are semantically very similar. Both contain two messages: the first indicates a web search for the number of successful Mount Everest climbers, and the second provides an estimate (over 6,000 as of 2026) with context that the number changes yearly and is an estimate. Minor differences include: (1) the ActualOutput's search query wording ('how many people have successfully climbed Mount Everest' vs. 'number of successful Mount Everest climbers'), which is equivalent in meaning; (2) the ActualOutput includes citations in the response, which adds value but does not change the core meaning; (3) some fields in ActualOutput are empty or have placeholder values, but this does not affect the semantic content. Overall, the meaning and context are preserved, with only minor variations. Therefore, a score of 95 is appropriate.",
            "evaluationTime": 2.608504056930542
          }
        },
        {
          "evaluatorName": "Default Trajectory Evaluator",
          "evaluatorId": "08b73637-b789-4f7f-a004-76a437e1a6d3",
          "result": {
            "score": 95.0,
            "details": "The agent successfully maintained context about Mount Everest throughout the conversation, used the Web Search tool to retrieve current statistics, and provided a clear, accurate answer: \"As of 2026, over 6,000 people have successfully climbed Mount Everest. This number is an estimate and increases each year as more climbers reach the summit.\" The agent explicitly noted the number is an estimate and subject to change, which aligns with the expected behavior. The response included proper citations from multiple reputable sources. The language was clear, concise, and appropriately hedged. The only minor improvement would be to briefly mention the range (6,000\u20136,500) seen in the sources, but the answer is fully correct and meets high standards for accuracy and completeness. Therefore, a score of 95 is appropriate.",
            "evaluationTime": 4.9156341552734375
          }
        }
      ],
      "agentExecutionOutput": null
    }
  ]
}
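When comparing runs, an output file like the one above can be reduced to one mean score per evaluator. This is a minimal sketch that relies only on the keys visible in the output (evaluationSetResults, evaluationRunResults, evaluatorName, result.score); mean_scores_by_evaluator is a hypothetical helper, not part of the uipath CLI.

```python
import json
from collections import defaultdict
from statistics import mean
from typing import Any


def mean_scores_by_evaluator(output: dict[str, Any]) -> dict[str, float]:
    """Average each evaluator's score across all evaluations in an eval output file."""
    scores: dict[str, list[float]] = defaultdict(list)
    for evaluation in output.get("evaluationSetResults", []):
        for run in evaluation.get("evaluationRunResults", []):
            scores[run["evaluatorName"]].append(float(run["result"]["score"]))
    return {name: mean(values) for name, values in scores.items()}


def load_mean_scores(path: str) -> dict[str, float]:
    """Read an --output-file JSON and compute per-evaluator means."""
    with open(path, encoding="utf-8") as f:
        return mean_scores_by_evaluator(json.load(f))
```

For the results above, this gives 93.5 for the Default Evaluator ((92 + 95) / 2) and 97.5 for the Default Trajectory Evaluator ((100 + 95) / 2).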

Are there any breaking changes?

None
Under Feature Flag
UiPath CLI Runtime changes

maxduu · 2026-02-19T21:11:25Z

src/uipath/_cli/_evals/_conversational_utils.py

+        # TODO Add attachments if present
+        # if eval_input.current_user_prompt.attachments:
+        #     for attachment in eval_input.current_user_prompt.attachments:
+        #         content_parts.append(
+        #             UiPathConversationContentPart(...)
+        #         )


cc @norman-le to add after this PR.

feat: get conversational output and map to eval output

c4d83ff

github-actions bot added the test:uipath-langchain (triggers tests in the uipath-langchain-python repository) and test:uipath-llamaindex (triggers tests in the uipath-llamaindex-python repository) labels Feb 6, 2026

maxduu added 8 commits February 12, 2026 10:14

Merge branch 'main' into feat/conversational-evals

4e0872c

Merge branch 'main' into feat/conversational-evals

344dc58

Merge branch 'main' into feat/conversational-evals

f9fa507

feat(temp): preliminary eval mapper changes

24ac544

Merge branch 'main' into feat/conversational-evals

64fdd48

feat: map legacy conversation eval inputs and outputs

c225c79

fix: remove logs

7086fe7

Merge branch 'main' into feat/conversational-evals

b271a94

maxduu changed the title from "feat: get conversational output and map to eval output" to "feat: map conversational eval inputs and outputs" Feb 18, 2026

maxduu added 7 commits February 19, 2026 00:17

Merge branch 'main' into feat/conversational-evals

81aa04b

fix: attachment aliases

10cf34f

feat: add tests for conversational_utils

9c6cd46

fix: test mypy issues

27df9df

fix: add tests and handle empty content

dc93263

Merge branch 'main' into feat/conversational-evals

dd669d6

chore: update version

fe731c0

maxduu reviewed Feb 19, 2026


maxduu mentioned this pull request Feb 19, 2026

feat: output conversational message data in terminate-node UiPath/uipath-langchain-python#601 (open)

maxduu requested review from GabrielVasilescu04, JoshParkSJ, cristipufu, mike-deem-uipath and norman-le February 19, 2026 21:48

maxduu changed the title from "feat: map conversational eval inputs and outputs" to "feat: map legacy conversational eval inputs and outputs" Feb 19, 2026

shannonsuhendra wants to merge 16 commits into main from feat/conversational-evals

shannonsuhendra commented Feb 6, 2026 (edited by maxduu)
