feat: map legacy conversational eval inputs and outputs#1276

Open
shannonsuhendra wants to merge 16 commits into main from feat/conversational-evals

Conversation

@shannonsuhendra commented on Feb 6, 2026

Demo Video

Screen.Recording.2026-02-19.at.2.45.10.PM.mov

Summary

This PR is one of three that together support end-to-end evaluations for Conversational Agents. The approach first defines the output of the low-code Conversational Agent graph and modifies the graph to produce that output. It then converts the inputs and outputs in the legacy conversational-agent evaluation file into the same format as the graph's inputs and outputs.

With this approach, there is no conversational-agent-specific eval-wrapper code: low-code Conversational Agents are run and evaluated identically to Autonomous Agents (input -> output). This should let us pick up future evaluation changes and improvements automatically. For more details on the approach chosen, see this document.

Implementation details

  • https://github.com/UiPath/uipath-agents-python/pull/274

    • Defines the output schema of Conversational Agents as uipath__agent_response_messages: list[ConversationMessageData]. This output contains the messages and tool calls produced by the agent run, in the UiPath CAS data-type format. The ...Data format omits timestamps, message IDs, etc.; see the document linked above for context.
    • Note that the input schema is already defined to contain messages: list[UiPathConversationMessage], the input chat history in the CAS format (which does include timestamps, message IDs, etc.).
  • feat: output conversational message data in terminate-node uipath-langchain-python#601

    • Updates the low-code Conversational-Agent graph to emit its output as uipath__agent_response_messages: list[ConversationMessageData].
    • Modifies the terminate_node, which previously returned nothing for conversational agents, to convert the agent-produced LangGraph messages into the ConversationMessageData format and output them.
  • feat: map legacy conversational eval inputs and outputs #1276 (this one)

    • Introduces test_conversational_utils.py, which adds the legacy types for the agent.json's conversationalInputs and conversationalExpectedOutput. These are the legacy evaluation fields saved by the design-time evaluation experience.
    • In migrate_evaluation_item, these legacy evaluation types are mapped into the proper input (list[UiPathConversationMessage]) and output (list[UiPathConversationMessageData]) types for Conversational Agents.
    • They are then set as evaluation.inputs["messages"] and evaluation.expected_output["uipath__agent_response_messages"], so the final evaluation inputs and expected output contain the fields and typing that the conversational agent's inputs and outputs require.
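The mapping described in the last three bullets can be sketched roughly as follows. This is a minimal illustration on plain dictionaries, not the code in this PR: the function names are hypothetical, and the target message shape (`contentParts`, `mimeType`, `data`, `toolCalls`) is a simplified assumption rather than the real UiPathConversationMessage schema. Roles are copied through unchanged, and attachments are ignored, matching the open TODO.

```python
from typing import Any


def to_message_sketch(legacy: dict[str, Any]) -> dict[str, Any]:
    """Convert one legacy {role, text, toolCalls?} entry.

    The output shape is a simplified assumption, not the real CAS type.
    """
    return {
        "role": legacy["role"],
        "contentParts": [{"mimeType": "text/plain", "data": legacy.get("text", "")}],
        # Legacy tool calls carry name/arguments and, in history only, a result.
        "toolCalls": [dict(call) for call in legacy.get("toolCalls", [])],
    }


def map_legacy_inputs_sketch(conversational_inputs: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten conversationHistory (a list of exchanges) and append the current prompt."""
    messages = [
        to_message_sketch(entry)
        for exchange in conversational_inputs.get("conversationHistory", [])
        for entry in exchange
    ]
    messages.append(to_message_sketch(conversational_inputs["currentUserPrompt"]))
    return messages


def migrate_legacy_item_sketch(evaluation: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a legacy evaluation item with the mapped fields set."""
    migrated = dict(evaluation)
    inputs = dict(evaluation.get("inputs") or {})
    expected = dict(evaluation.get("expectedOutput") or {})
    if "conversationalInputs" in evaluation:
        inputs["messages"] = map_legacy_inputs_sketch(evaluation["conversationalInputs"])
    if "conversationalExpectedOutput" in evaluation:
        responses = evaluation["conversationalExpectedOutput"].get("agentResponse", [])
        expected["uipath__agent_response_messages"] = [
            to_message_sketch(entry) for entry in responses
        ]
    migrated["inputs"] = inputs
    migrated["expectedOutput"] = expected
    return migrated
```

Working on copies keeps the legacy item untouched, which makes it easy to compare the item before and after migration when debugging.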

Next Steps:

  • Map attachments from legacy conversational inputs; @norman-le will add this.

How has this been tested?

Example Legacy Evaluation Set, as downloaded from Studio Web.

{
    "fileName":"evaluation-set-1769099981427.json",
    "id":"e1db7a8c-3ff2-4774-adb4-5e2fb59195a3",
    "name":"Web Search Capabilities Evaluation Set",
    "batchSize":10,
    "evaluatorRefs":[
      "d52815b1-c9f4-4ba5-86ca-14285367fd32",
      "08b73637-b789-4f7f-a004-76a437e1a6d3"
    ],
    "evaluations":[
      {
        "id":"eda1f1bc-5fd4-4094-b0ba-afe7ec3189d4",
        "name":"Simulated Web Search, No Chat History",
        "inputs":{
          
        },
        "expectedOutput":{
          
        },
        "simulationInstructions":"Web Search tool returns accurate information about Paris being the capital of France, and information about Paris.",
        "expectedAgentBehavior":"The agent should use the Web Search tool to find information about the capital of France and provide a clear, concise answer mentioning Paris.",
        "simulateTools":true,
        "toolsToSimulate":[
          {
            "name":"Web Search"
          }
        ],
        "evalSetId":"1442afe8-1e00-4d0e-b20f-620a678c5ece",
        "createdAt":"2026-01-22T16:39:41.427Z",
        "updatedAt":"2026-01-22T20:41:02.151Z",
        "conversationalInputs":{
          "conversationHistory":[
            
          ],
          "currentUserPrompt":{
            "role":"user",
            "text":"Search for the capital of France and tell me about the city."
          }
        },
        "conversationalExpectedOutput":{
          "agentResponse":[
            {
              "role":"agent",
              "text":"I'll search for the capital of France and information about it.",
              "toolCalls":[
                {
                  "name":"Web Search",
                  "arguments":{
                    "provider":"GoogleCustomSearch",
                    "query":"capital of france and information about the city"
                  }
                }
              ]
            },
            {
              "role":"agent",
              "text":"The capital of France is Paris. Paris is known for its rich history, culture, art, and architecture. It is home to iconic landmarks such as the Eiffel Tower, Louvre Museum, and Notre-Dame Cathedral. The city is also famous for its cuisine, fashion, and vibrant atmosphere."
            }
          ]
        }
      },
      {
        "id":"551c928a-7f5d-4b08-8bd7-c28a1f9acf77",
        "name":"Simulated Web Search, With Chat-History",
        "inputs":{
          
        },
        "expectedOutput":{
          
        },
        "simulationInstructions":"Web Search tool returns recent statistics on the number of successful Everest climbs, mentioning that it's an estimate that changes annually.",
        "expectedAgentBehavior":"The agent should maintain context from the previous exchanges about Mount Everest, use the Web Search tool to find current statistics on successful climbs, and provide a clear answer while noting that the number is an estimate and subject to change.",
        "simulateTools":true,
        "toolsToSimulate":[
          {
            "name":"Web Search"
          }
        ],
        "evalSetId":"1442afe8-1e00-4d0e-b20f-620a678c5ece",
        "createdAt":"2026-01-22T16:39:41.427Z",
        "updatedAt":"2026-01-22T16:39:41.427Z",
        "conversationalInputs":{
          "conversationHistory":[
            [
              {
                "role":"user",
                "text":"What's the tallest mountain in the world?"
              },
              {
                "role":"agent",
                "text":"I'll search for that information for you.",
                "toolCalls":[
                  {
                    "name":"Web Search",
                    "arguments":{
                      "provider":"GoogleCustomSearch",
                      "query":"tallest mountain in the world"
                    },
                    "result":{
                      "value":{
                        "formattedResults":"Mount Everest is the tallest mountain in the world, with a height of 8,848 meters (29,029 feet) above sea level."
                      }
                    }
                  }
                ]
              },
              {
                "role":"agent",
                "text":"The tallest mountain in the world is Mount Everest, which stands at 8,848 meters (29,029 feet) above sea level."
              }
            ],
            [
              {
                "role":"user",
                "text":"What country is it located in?"
              },
              {
                "role":"agent",
                "text":"I'll search for the location of Mount Everest.",
                "toolCalls":[
                  {
                    "name":"Web Search",
                    "arguments":{
                      "provider":"GoogleCustomSearch",
                      "query":"Mount Everest location country"
                    },
                    "result":{
                      "value":{
                        "formattedResults":"Mount Everest is located on the border between Nepal and Tibet (an autonomous region of China)."
                      }
                    }
                  }
                ]
              },
              {
                "role":"agent",
                "text":"Mount Everest is located on the border between Nepal and Tibet, which is an autonomous region of China. So, it's shared between two countries: Nepal and China."
              }
            ]
          ],
          "currentUserPrompt":{
            "role":"user",
            "text":"How many people have successfully climbed it?"
          }
        },
        "conversationalExpectedOutput":{
          "agentResponse":[
            {
              "role":"agent",
              "text":"I'll search for how many people have successfully climbed Mount Everest.",
              "toolCalls":[
                {
                  "name":"Web Search",
                  "arguments":{
                    "provider":"GoogleCustomSearch",
                    "query":"number of successful Mount Everest climbers"
                  }
                }
              ]
            },
            {
              "role":"agent",
              "text":"As of 2026, it is estimated that over 6,000 people have successfully climbed Mount Everest. This number changes each year as new climbers reach the summit during the climbing season. It's important to note that this figure is an estimate and may vary due to ongoing expeditions and record-keeping challenges."
            }
          ]
        }
      }
    ],
    "modelSettings":[
      
    ],
    "createdAt":"2026-01-22T16:39:41.428Z",
    "updatedAt":"2026-01-22T20:41:44.938Z",
    "agentMemoryEnabled":false,
    "agentMemorySettings":[
      
    ],
    "lineByLineEvaluation":false
}
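In the example above, conversationHistory is a list of exchanges, each of which is itself a list of messages, and only the history tool calls carry a result. A small helper like the one below can sanity-check a downloaded legacy file before running the evaluation. It is a hypothetical sketch: the function names are not part of this PR, and it relies only on the field names visible in the example.

```python
import json
from typing import Any


def load_eval_set(path: str) -> dict[str, Any]:
    """Read a legacy evaluation set JSON file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_legacy_eval_set(eval_set: dict[str, Any]) -> list[dict[str, Any]]:
    """Count history exchanges, history messages, and expected responses per evaluation."""
    summary = []
    for evaluation in eval_set.get("evaluations", []):
        conv_inputs = evaluation.get("conversationalInputs", {})
        history = conv_inputs.get("conversationHistory", [])
        expected = evaluation.get("conversationalExpectedOutput", {}).get("agentResponse", [])
        summary.append({
            "name": evaluation.get("name"),
            "history_exchanges": len(history),
            "history_messages": sum(len(exchange) for exchange in history),
            "expected_messages": len(expected),
            "expected_tool_calls": sum(len(m.get("toolCalls", [])) for m in expected),
        })
    return summary
```

For the evaluation set shown above, calling `summarize_legacy_eval_set(load_eval_set(...))` on the saved file should report 0 exchanges for "Simulated Web Search, No Chat History" and 2 exchanges with 6 history messages for "Simulated Web Search, With Chat-History", each with 2 expected messages and 1 expected tool call.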

After running uv run uipath eval agent.json evaluations/eval-sets/evaluation-set-web-search.json --output-file evaluation-set-web-search-output.json:

{
  "evaluationSetName": "Web Search Capabilities Evaluation Set",
  "evaluationSetResults": [
    {
      "evaluationName": "Simulated Web Search, No Chat History",
      "evaluationRunResults": [
        {
          "evaluatorName": "Default Evaluator",
          "evaluatorId": "d52815b1-c9f4-4ba5-86ca-14285367fd32",
          "result": {
            "score": 92.0,
            "details": "The semantic similarity between the ExpectedOutput and ActualOutput is very high. Both outputs contain two agent response messages: the first initiates a web search for the capital of France, and the second provides a detailed answer about Paris, including its status as the capital, cultural significance, landmarks, and reputation. The ActualOutput includes additional details (e.g., location on the Seine River, global hub, creativity and innovation) and cites sources, which enhances completeness. Minor differences include the search query wording and the presence of empty contentParts in the first message, but these do not significantly affect the meaning. Overall, the contextual equivalence and accuracy are well maintained, justifying a score of 92.",
            "evaluationTime": 2.122053861618042
          }
        },
        {
          "evaluatorName": "Default Trajectory Evaluator",
          "evaluatorId": "08b73637-b789-4f7f-a004-76a437e1a6d3",
          "result": {
            "score": 100.0,
            "details": "The agent successfully used the Web Search tool to find information about the capital of France and provided a clear, concise answer mentioning Paris. The response included additional relevant details about Paris, such as its cultural significance and famous landmarks, which aligns with the user's request to \"tell me about the city.\" The answer was accurate, complete, and cited multiple authoritative sources, meeting high standards for accuracy and completeness. No errors or omissions were observed, and the agent's behavior matched the expected output, allowing for reasonable variations in language and expression.",
            "evaluationTime": 4.094098091125488
          }
        }
      ],
      "agentExecutionOutput": null
    },
    {
      "evaluationName": "Simulated Web Search, With Chat-History",
      "evaluationRunResults": [
        {
          "evaluatorName": "Default Evaluator",
          "evaluatorId": "d52815b1-c9f4-4ba5-86ca-14285367fd32",
          "result": {
            "score": 95.0,
            "details": "The ActualOutput and ExpectedOutput are semantically very similar. Both contain two messages: the first indicates a web search for the number of successful Mount Everest climbers, and the second provides an estimate (over 6,000 as of 2026) with context that the number changes yearly and is an estimate. Minor differences include: (1) the ActualOutput's search query wording ('how many people have successfully climbed Mount Everest' vs. 'number of successful Mount Everest climbers'), which is equivalent in meaning; (2) the ActualOutput includes citations in the response, which adds value but does not change the core meaning; (3) some fields in ActualOutput are empty or have placeholder values, but this does not affect the semantic content. Overall, the meaning and context are preserved, with only minor variations. Therefore, a score of 95 is appropriate.",
            "evaluationTime": 2.608504056930542
          }
        },
        {
          "evaluatorName": "Default Trajectory Evaluator",
          "evaluatorId": "08b73637-b789-4f7f-a004-76a437e1a6d3",
          "result": {
            "score": 95.0,
            "details": "The agent successfully maintained context about Mount Everest throughout the conversation, used the Web Search tool to retrieve current statistics, and provided a clear, accurate answer: \"As of 2026, over 6,000 people have successfully climbed Mount Everest. This number is an estimate and increases each year as more climbers reach the summit.\" The agent explicitly noted the number is an estimate and subject to change, which aligns with the expected behavior. The response included proper citations from multiple reputable sources. The language was clear, concise, and appropriately hedged. The only minor improvement would be to briefly mention the range (6,000\u20136,500) seen in the sources, but the answer is fully correct and meets high standards for accuracy and completeness. Therefore, a score of 95 is appropriate.",
            "evaluationTime": 4.9156341552734375
          }
        }
      ],
      "agentExecutionOutput": null
    }
  ]
}
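To compare runs at a glance, the output file can be reduced to one average score per evaluator. The sketch below is an illustration only: the function name is hypothetical, and it assumes the output keeps the structure shown above (evaluationSetResults, evaluationRunResults, evaluatorName, result.score).

```python
from collections import defaultdict
from statistics import mean
from typing import Any


def average_scores_by_evaluator(results: dict[str, Any]) -> dict[str, float]:
    """Average result.score per evaluatorName across all evaluations in the set."""
    scores: dict[str, list[float]] = defaultdict(list)
    for evaluation in results.get("evaluationSetResults", []):
        for run in evaluation.get("evaluationRunResults", []):
            scores[run["evaluatorName"]].append(run["result"]["score"])
    return {name: mean(values) for name, values in scores.items()}
```

Applied to the output above, this gives 93.5 for the Default Evaluator (92.0 and 95.0) and 97.5 for the Default Trajectory Evaluator (100.0 and 95.0).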

Are there any breaking changes?

  • None
  • Under Feature Flag
  • UiPath CLI Runtime changes

@github-actions bot added the labels test:uipath-langchain (triggers tests in the uipath-langchain-python repository) and test:uipath-llamaindex (triggers tests in the uipath-llamaindex-python repository) on Feb 6, 2026
@maxduu changed the title from "feat: get conversational output and map to eval output" to "feat: map conversational eval inputs and outputs" on Feb 18, 2026
Comment on lines +230 to +235
# TODO Add attachments if present
# if eval_input.current_user_prompt.attachments:
#     for attachment in eval_input.current_user_prompt.attachments:
#         content_parts.append(
#             UiPathConversationContentPart(...)
#         )
cc @norman-le to add after this PR.

@maxduu changed the title from "feat: map conversational eval inputs and outputs" to "feat: map legacy conversational eval inputs and outputs" on Feb 19, 2026