Conversation
There was a problem hiding this comment.
Pull request overview
Adds additional Node.js E2E coverage by introducing new Vitest suites and the corresponding replay-proxy snapshots, extending coverage around tool results, streaming/event fidelity, lifecycle APIs, hooks, and built-in tool behaviors. It also updates the replaying CAPI proxy to support “request-only” snapshots used for timeout scenarios.
Changes:
- Added multiple new Node.js E2E test suites (tool results, streaming fidelity, lifecycle, hooks, built-in tools, multi-turn, error resilience) and their snapshot YAMLs under
test/snapshots/**. - Updated the replaying proxy to hang indefinitely when a request matches a “request-only” snapshot (for timeout tests).
- Updated a handful of existing snapshots and skipped the Node compaction E2E suite.
Reviewed changes
Copilot reviewed 51 out of 51 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| test/harness/replayingCapiProxy.ts | Adds “request-only snapshot” detection to support timeout test scenarios. |
| nodejs/test/e2e/builtin_tools.test.ts | New E2E suite covering built-in tools (bash/view/edit/create/grep/glob). |
| nodejs/test/e2e/client_lifecycle.test.ts | New E2E coverage for last-session-id and lifecycle event emission. |
| nodejs/test/e2e/compaction.test.ts | Marks compaction suite skipped. |
| nodejs/test/e2e/error_resilience.test.ts | New E2E coverage for error behaviors (destroyed sessions, abort, resume invalid). |
| nodejs/test/e2e/event_fidelity.test.ts | New E2E coverage validating ordering and required fields on emitted events. |
| nodejs/test/e2e/hooks_extended.test.ts | New E2E coverage for extended session hooks. |
| nodejs/test/e2e/multi_turn.test.ts | New E2E coverage for multi-turn behavior using prior tool/file results. |
| nodejs/test/e2e/session_config.test.ts | New E2E coverage for workingDirectory, provider config, and attachments. |
| nodejs/test/e2e/session_lifecycle.test.ts | New E2E coverage for list/delete sessions and getMessages behavior. |
| nodejs/test/e2e/streaming_fidelity.test.ts | New E2E coverage ensuring deltas appear only when streaming is enabled. |
| nodejs/test/e2e/tool_results.test.ts | New E2E coverage for ToolResultObject handling and Zod param validation. |
| test/snapshots/ask_user/should_invoke_user_input_handler_when_model_uses_ask_user_tool.yaml | Updates expected assistant text for ask_user flow. |
| test/snapshots/builtin_tools/should_capture_exit_code_in_output.yaml | New snapshot for bash output/exit code behavior. |
| test/snapshots/builtin_tools/should_capture_stderr_output.yaml | New snapshot for bash stderr capture behavior. |
| test/snapshots/builtin_tools/should_create_a_new_file.yaml | New snapshot for create tool behavior. |
| test/snapshots/builtin_tools/should_edit_a_file_successfully.yaml | New snapshot for edit tool behavior. |
| test/snapshots/builtin_tools/should_find_files_by_pattern.yaml | New snapshot for glob tool behavior. |
| test/snapshots/builtin_tools/should_handle_nonexistent_file_gracefully.yaml | New snapshot for view error handling. |
| test/snapshots/builtin_tools/should_read_file_with_line_range.yaml | New snapshot for ranged view behavior. |
| test/snapshots/builtin_tools/should_search_for_patterns_in_files.yaml | New snapshot for grep behavior. |
| test/snapshots/client_lifecycle/should_emit_session_lifecycle_events.yaml | New snapshot for lifecycle-events test. |
| test/snapshots/client_lifecycle/should_return_last_session_id_after_sending_a_message.yaml | New snapshot for last-session-id test. |
| test/snapshots/compaction/should_trigger_compaction_with_low_threshold_and_emit_events.yaml | Updates snapshot content for compaction scenario. |
| test/snapshots/event_fidelity/should_emit_assistant_message_with_messageid.yaml | New snapshot for assistant messageId behavior. |
| test/snapshots/event_fidelity/should_emit_events_in_correct_order_for_tool_using_conversation.yaml | New snapshot for event ordering in tool-using flow. |
| test/snapshots/event_fidelity/should_emit_tool_execution_events_with_correct_fields.yaml | New snapshot for tool execution events. |
| test/snapshots/event_fidelity/should_include_valid_fields_on_all_events.yaml | New snapshot for general event field validity. |
| test/snapshots/hooks/should_invoke_both_pretooluse_and_posttooluse_hooks_for_a_single_tool_call.yaml | Updates expected assistant text for hook scenario. |
| test/snapshots/hooks/should_invoke_pretooluse_hook_when_model_runs_a_tool.yaml | Updates expected assistant text for hook scenario. |
| test/snapshots/hooks_extended/should_invoke_onerroroccurred_hook_when_error_occurs.yaml | New snapshot for extended hook coverage. |
| test/snapshots/hooks_extended/should_invoke_onsessionend_hook_when_session_is_destroyed.yaml | New snapshot for extended hook coverage. |
| test/snapshots/hooks_extended/should_invoke_onsessionstart_hook_on_new_session.yaml | New snapshot for extended hook coverage. |
| test/snapshots/hooks_extended/should_invoke_onuserpromptsubmitted_hook_when_sending_a_message.yaml | New snapshot for extended hook coverage. |
| test/snapshots/multi_turn/should_handle_file_creation_then_reading_across_turns.yaml | New snapshot for create+read multi-turn flow. |
| test/snapshots/multi_turn/should_use_tool_results_from_previous_turns.yaml | New snapshot for multi-turn state usage. |
| test/snapshots/permissions/should_deny_permission_when_handler_returns_denied.yaml | Updates expected assistant text in permissions flow. |
| test/snapshots/permissions/should_handle_permission_handler_errors_gracefully.yaml | Updates shell tool description text in permissions flow. |
| test/snapshots/session/should_abort_a_session.yaml | Updates expected assistant response in abort scenario. |
| test/snapshots/session_config/should_accept_message_attachments.yaml | New snapshot for attachments request content. |
| test/snapshots/session_config/should_use_workingdirectory_for_tool_execution.yaml | New snapshot validating workingDirectory paths. |
| test/snapshots/session_lifecycle/should_delete_session_permanently.yaml | New snapshot for delete-session lifecycle behavior. |
| test/snapshots/session_lifecycle/should_list_created_sessions_after_sending_a_message.yaml | New snapshot for session listing after activity. |
| test/snapshots/session_lifecycle/should_return_events_via_getmessages_after_conversation.yaml | New snapshot for getMessages lifecycle behavior. |
| test/snapshots/session_lifecycle/should_support_multiple_concurrent_sessions.yaml | New snapshot for concurrent sessions scenario. |
| test/snapshots/skills/should_load_and_apply_skill_from_skilldirectories.yaml | Updates expected assistant text for skills scenario. |
| test/snapshots/streaming_fidelity/should_not_produce_deltas_when_streaming_is_disabled.yaml | New snapshot for non-streaming case. |
| test/snapshots/streaming_fidelity/should_produce_delta_events_when_streaming_is_enabled.yaml | New snapshot for streaming-enabled case. |
| test/snapshots/tool_results/should_handle_structured_toolresultobject_from_custom_tool.yaml | New snapshot for structured ToolResultObject success case. |
| test/snapshots/tool_results/should_handle_tool_result_with_failure_resulttype.yaml | New snapshot for ToolResultObject failure case. |
| test/snapshots/tool_results/should_pass_validated_zod_parameters_to_tool_handler.yaml | New snapshot for Zod-validated tool params flow. |
Comments suppressed due to low confidence (2)
nodejs/test/e2e/tool_results.test.ts:13
openAiEndpointis destructured fromcreateSdkTestContext()but never used. With the repo's ESLint config (@typescript-eslint/no-unused-vars), this will fail lint; remove it or rename to_openAiEndpointif intentionally unused.
describe("Tool Results", async () => {
const { copilotClient: client } = await createSdkTestContext();
nodejs/test/e2e/hooks_extended.test.ts:16
workDiris destructured fromcreateSdkTestContext()but never used. This will fail ESLint@typescript-eslint/no-unused-vars; remove it or rename to_workDirif intentionally unused.
describe("Extended session hooks", async () => {
const { copilotClient: client } = await createSdkTestContext();
test/harness/replayingCapiProxy.ts
Outdated
| const headers = { | ||
| "content-type": "text/event-stream", | ||
| ...commonResponseHeaders, | ||
| }; | ||
| options.onResponseStart(200, headers); | ||
| // Never call onResponseEnd - hang indefinitely for timeout tests | ||
| await new Promise(() => {}); |
There was a problem hiding this comment.
The request-only snapshot branch hangs by awaiting a never-resolving Promise. This keeps an async stack alive indefinitely and is unnecessary to keep the HTTP response open. Consider returning immediately after onResponseStart (and simply never calling onResponseEnd) to simulate a hang without leaking a pending Promise; also consider matching the response content-type (SSE vs JSON) to the request's stream flag, as the non-streaming path uses application/json.
| const headers = { | |
| "content-type": "text/event-stream", | |
| ...commonResponseHeaders, | |
| }; | |
| options.onResponseStart(200, headers); | |
| // Never call onResponseEnd - hang indefinitely for timeout tests | |
| await new Promise(() => {}); | |
| const streamingIsRequested = | |
| options.body && | |
| (JSON.parse(options.body) as { stream?: boolean }).stream === | |
| true; | |
| const headers = { | |
| "content-type": streamingIsRequested | |
| ? "text/event-stream" | |
| : "application/json", | |
| ...commonResponseHeaders, | |
| }; | |
| options.onResponseStart(200, headers); | |
| // Never call onResponseEnd - hang indefinitely for timeout tests. | |
| // Returning here keeps the HTTP response open without leaking a pending Promise. |
nodejs/test/e2e/tool_results.test.ts
Outdated
| }); | ||
|
|
||
| expect(assistantMessage).not.toBeNull(); | ||
| expect(assistantMessage?.data.content).toBeTruthy(); |
There was a problem hiding this comment.
This test is intended to validate failure handling (prompt: "If it fails, say 'service is down'"), but the assertions only check that some content exists. This can pass even if the SDK ignores resultType: "failure" entirely; assert that the assistant response includes the expected fallback text (e.g., "service is down").
| expect(assistantMessage?.data.content).toBeTruthy(); | |
| const failureContent = assistantMessage?.data.content ?? ""; | |
| expect(failureContent).toMatch(/service is down/i); |
| it("should return undefined for getLastSessionId with no sessions", async () => { | ||
| // On a fresh client this may return undefined or an older session ID | ||
| const lastSessionId = await client.getLastSessionId(); | ||
| expect(() => lastSessionId).not.toThrow(); |
There was a problem hiding this comment.
This assertion is a no-op: expect(() => lastSessionId).not.toThrow() cannot fail because it just returns a value. If the intent is to validate behavior when no sessions exist, assert on the return value (e.g., undefined) or explicitly document and check the allowed shapes (string | undefined).
| expect(() => lastSessionId).not.toThrow(); | |
| expect(lastSessionId === undefined || typeof lastSessionId === "string").toBe(true); |
| import { createSdkTestContext } from "./harness/sdkTestContext.js"; | ||
|
|
||
| describe("Compaction", async () => { | ||
| describe.skip("Compaction", async () => { |
There was a problem hiding this comment.
The entire Compaction E2E suite is now skipped via describe.skip, which disables coverage for compaction behavior. If this is due to flakiness, consider describe.skipIf(...) with a tracked condition, or leave the suite enabled and mark only the flaky test(s) as skipped with a TODO + issue link.
| describe.skip("Compaction", async () => { | |
| describe("Compaction", async () => { |
Cross-SDK Consistency Review ✅I've reviewed this PR's additions of comprehensive E2E tests for the Node.js SDK against the Python, Go, and .NET implementations. 📊 Test Coverage Gap (Not a Consistency Issue)This PR adds extensive test coverage for Node.js that doesn't yet exist in other SDKs:
Good news: The underlying SDK features exist in all languages—this is purely a test coverage gap, not an API inconsistency. The new tests provide excellent patterns that could be ported to Python/Go/.NET test suites in future work. 🔍 One API Inconsistency Found
This is a minor utility method, but should be added to Python and Go for complete feature parity. ✅ RecommendationFor this PR: ✅ Approve—this adds valuable test coverage and doesn't introduce inconsistencies. Follow-up work suggested:
Note: All other features tested here (streaming, list/delete sessions, extended hooks, tool results, etc.) already exist across all SDKs—this PR just demonstrates best practices for testing them thoroughly.
|
| await session.sendAndWait({ prompt: "Say hello" }); | ||
|
|
||
| // Wait for session data to flush to disk | ||
| await new Promise((r) => setTimeout(r, 500)); |
There was a problem hiding this comment.
Can you make sure none of the tests rely on timings like this? Should poll until the desired condition is met.
|
@friggeri This looks very useful! Eventually we should make sure we have the same e2e tests across the other languages as well, but I'm personally fine with this PR doing it for Node first and then getting Copilot to copy the same set of tests to other languages as a later follow-up. |
Cross-SDK Test Coverage AnalysisThis PR adds excellent 11 new E2E test files to the Node.js SDK, significantly improving test coverage. However, this creates a test parity gap with the other SDK implementations (Python, Go, and .NET). New Test Coverage Added (Node.js only)The following test scenarios are now only tested in the Node.js SDK:
Key Findings
RecommendationsOption 1: Add equivalent tests to other SDKs (Preferred for feature parity)
Option 2: Add missing APIs (For identified gaps)
Option 3: Document this as Node.js-first testing (If intentional)
What This PR Does Well✅ Comprehensive test coverage of critical scenarios Suggested Next StepsSince this is a test-only PR (no API changes), I recommend:
Would you like me to help create tracking issues for porting these tests to the other SDKs?
|
…romise leak - Add onPermissionRequest: approveAll to all createSession()/resumeSession() calls in new E2E test files (CI blocker) - Fix no-op assertion in client_lifecycle.test.ts (was testing lambda return) - Fix weak assertion in tool_results.test.ts (now checks 'service is down') - Fix resumeSession in error_resilience.test.ts (was missing required arg) - Fix hanging Promise leak in replayingCapiProxy.ts (return without await) - Add TODO comment on skipped Compaction test suite Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Cross-SDK Consistency ReviewI've reviewed PR #466 for cross-SDK consistency. This PR adds extensive E2E test coverage to the Node.js/TypeScript SDK. ✅ Positive: Strong existing test parityMost of the scenarios being tested already have equivalent coverage in Python and .NET:
📝 Consideration: Some new test scenariosThe following test files appear to add more granular or new test coverage that may not have direct equivalents in other SDKs:
🤔 Question for considerationWould it be valuable to add equivalent tests to the Python, Go, and .NET SDKs for these more granular scenarios? This would help ensure:
Alternatively, if these tests are primarily validating CLI behavior (not SDK-specific logic), they may only need to exist in one SDK's test suite. Note on .NETThe .NET SDK has a streaming delta test that's currently skipped ( Overall assessment: This PR maintains consistency well—it's adding tests that either already have equivalents in other SDKs, or are testing new scenarios that could potentially benefit all SDKs.
|
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
SDK Consistency Review - PR #466This PR adds comprehensive e2e test coverage for the Node.js/TypeScript SDK. While this significantly improves Node.js test quality, it creates test coverage gaps across SDKs. ✅ Good: Tests for existing featuresThe following test categories cover features that already have tests in Python, Go, and .NET:
|
…pshot match) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
On Windows, glob returns files in alphabetical order (app.ts before index.ts) while macOS/Linux return them differently. Add a second conversation variant to handle both orderings. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Cross-SDK Consistency Review ✅I've reviewed PR #466 for cross-language SDK consistency. This PR adds comprehensive E2E test coverage for the Node.js/TypeScript SDK without introducing new features or modifying any SDK client code. Summary✅ No consistency issues detected. This PR only adds test files and test infrastructure for Node.js. It does not introduce new SDK features or modify public APIs. Test Coverage AnalysisThe PR adds the following new Node.js E2E tests:
Cross-SDK Test Parity CheckI verified that the functionality being tested already exists and is tested in the other SDKs:
Notes
Recommendation✅ Approve. This PR improves Node.js test coverage to match or exceed the test coverage in other SDKs. No SDK consistency issues were found.
|
As titled