Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,105 @@
# FU-P13-T8: Prevent Web UI port collision from destabilizing MCP sessions

**Status:** In Progress
**Branch:** feature/FU-P13-T8-web-ui-port-collision
**Priority:** P0
**Phase:** 13 (Follow-up)

---

## Problem Statement

When the `--web-ui` port is occupied (e.g. by a stale/orphan wrapper process from a previous Cursor restart), the current code's `is_port_available()` check prevents starting a duplicate listener. However, there is a **TOCTOU window**: the check may pass, the daemon thread starts, and then uvicorn fails to bind — raising `SystemExit(1)` inside the daemon thread. Since `run_server()` only catches `OSError`, the `SystemExit` propagates as an unhandled thread exception, producing a pytest warning during tests and, in production, leaking a daemon thread exception to stderr noise.

The troubleshooting documentation for stale-process cleanup was already added in `FU-BUG-T6-1` and is present in `docs/troubleshooting.md`.

---

## Current State Analysis

### What already works
- `is_port_available()` pre-check in `__main__.py` (lines 274–285): if the port is occupied at check time, a `Warning:` is printed to stderr and Web UI is skipped; MCP bridge starts normally.
- `--web-ui-only` returns exit code 1 with a clear error message when port is occupied.
- `run_server()` already catches `OSError` from uvicorn.
- `TestPortCollisionHandling` class covers: occupied-port bridge mode, occupied-port `--web-ui-only` mode, free-port normal start, and `is_port_available` socket-level tests.
- Troubleshooting docs in `docs/troubleshooting.md` cover stale-process cleanup.

### Gap: `SystemExit` not caught in `run_server()`
Uvicorn internally catches the `OSError` from port binding and calls `sys.exit(1)`, raising `SystemExit(1)`. The `run_server()` wrapper only catches `OSError`, so `SystemExit` propagates as an unhandled daemon-thread exception. This is visible as:
- `PytestUnhandledThreadExceptionWarning` in test output when real port 8080 is occupied
- Daemon thread stderr noise in production if TOCTOU window is hit

---

## Deliverables

1. **`src/mcpbridge_wrapper/webui/server.py`** — Extend the `except OSError` in `run_server()` to also catch `SystemExit`, with an appropriate stderr message.

2. **`tests/unit/test_main_webui.py`** — Add a test in `TestPortCollisionHandling` for the TOCTOU scenario: `is_port_available` returns `True` but uvicorn fails to bind (simulated by raising `SystemExit(1)` from `uvicorn.run`), verifying the thread does NOT produce an unhandled exception.

3. **`SPECS/Workplan.md`** — Mark FU-P13-T8 acceptance criteria as satisfied.

---

## Acceptance Criteria

- [x] When requested Web UI port is occupied, wrapper behavior is explicit and deterministic (clear error or safe fallback) — handled by `is_port_available` pre-check (already done in BUG-T6)
- [x] MCP stdio protocol output remains valid JSON-RPC only on stdout — maintained; port check and warning go to stderr only
- [x] Repeated client startups no longer accumulate conflicting Web UI listeners on the same port — `is_port_available` prevents duplicate server starts
- [ ] `run_server()` catches `SystemExit` from uvicorn's `sys.exit(1)` on bind failure — **this task's code fix**
- [ ] No unhandled thread exceptions from Web UI daemon thread on port collision — verified by new TOCTOU test
- [ ] Tests cover occupied-port and restart scenarios — existing + new TOCTOU test

---

## Implementation Plan

### 1. Fix `run_server()` — catch `SystemExit`

In `src/mcpbridge_wrapper/webui/server.py`, extend the `except OSError` block:

```python
try:
uvicorn.run(...)
except OSError as exc:
print(f"Warning: Web UI server could not bind to {host}:{port}: {exc}", file=sys.stderr)
except SystemExit:
# uvicorn calls sys.exit(1) when port binding fails; treat as bind error
print(
f"Warning: Web UI server failed to start on {server_config.host}:{server_config.port}. "
"Port may have become occupied after the availability check.",
file=sys.stderr,
)
```

### 2. Add TOCTOU regression test

In `tests/unit/test_main_webui.py`, add to `TestPortCollisionHandling`:

```python
def test_toctou_port_occupied_after_check_does_not_crash_thread(self):
"""If port is free at check time but uvicorn fails to bind (TOCTOU),
the daemon thread exits cleanly without an unhandled exception."""
```

Simulate by patching `uvicorn.run` to raise `SystemExit(1)` and verifying no exception is raised from `run_server`.

### 3. Update workplan

Mark `FU-P13-T8` acceptance criteria as completed in `SPECS/Workplan.md`.

---

## Dependencies

- P10-T1 ✅ (Web UI foundation)
- BUG-T6 ✅ (initial port collision handling)
- FU-BUG-T6-1 ✅ (stale-process troubleshooting docs)

---

## Quality Gates

- `pytest tests/unit/test_main_webui.py` — all pass, no thread exception warnings
- `ruff check src/` — no lint errors
- `pytest --cov` — coverage ≥ 90%
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
# FU-P13-T8 Validation Report

**Task:** Prevent Web UI port collision from destabilizing MCP sessions
**Date:** 2026-02-16
**Branch:** feature/FU-P13-T8-web-ui-port-collision

---

## Summary

**PASS** — All acceptance criteria met. Code fix is minimal and targeted (1 new `except` clause). All quality gates pass with improved test coverage.

---

## Changes Made

### `src/mcpbridge_wrapper/webui/server.py`
- Extended `run_server()` to catch `SystemExit` in addition to `OSError`
- Uvicorn internally calls `sys.exit(1)` when port binding fails, which was propagating as an unhandled daemon-thread exception (visible as `PytestUnhandledThreadExceptionWarning`)
- New `except SystemExit` block prints a clear warning to stderr and allows the thread to exit cleanly

### `tests/unit/test_main_webui.py`
- Added `test_toctou_systemexit_from_uvicorn_does_not_crash_thread` to `TestPortCollisionHandling`
- Verifies that when `uvicorn.run()` raises `SystemExit(1)` (TOCTOU scenario), `run_server()` catches it without propagating an unhandled exception
- Verifies the warning message is printed to stderr

### `SPECS/Workplan.md`
- Marked FU-P13-T8 acceptance criteria as `[x]` satisfied
- Added `✅` status marker and implementation date

---

## Quality Gates

| Gate | Command | Result |
|------|---------|--------|
| Unit tests | `pytest tests/` | ✅ 472 passed, 5 skipped |
| Lint | `ruff check src/` | ✅ All checks passed |
| Coverage | `pytest --cov` | ✅ 95.6% (≥90% required) |

### Test File Coverage Before vs After

| Test file | Before | After |
|-----------|--------|-------|
| `test_main_webui.py` | 41 tests, 1 PytestUnhandledThreadExceptionWarning | 42 tests, 0 warnings |

---

## Acceptance Criteria

- [x] When requested Web UI port is occupied, wrapper behavior is explicit and deterministic — handled by `is_port_available()` pre-check (BUG-T6) + new `SystemExit` catch for TOCTOU
- [x] MCP stdio protocol output remains valid JSON-RPC only on stdout — warnings go to stderr only
- [x] Repeated client startups no longer accumulate conflicting Web UI listeners — `is_port_available()` prevents duplicate binds; TOCTOU window covered by `SystemExit` catch
- [x] Tests cover occupied-port and restart scenarios — existing `TestPortCollisionHandling` class + new TOCTOU test

---

## Pre-existing State Verified

- `is_port_available()` pre-check already in `__main__.py` (from BUG-T6)
- Troubleshooting docs for stale-process cleanup already in `docs/troubleshooting.md` (from FU-BUG-T6-1)
- `TestPortCollisionHandling` test class already present (from BUG-T6)

The only gap was the unhandled `SystemExit` from uvicorn in the TOCTOU case — now addressed.
4 changes: 4 additions & 0 deletions SPECS/ARCHIVE/INDEX.md
Original file line number Diff line number Diff line change
Expand Up @@ -103,6 +103,7 @@
| FU-P11-T1-1 | [FU-P11-T1-1_Refactor_FakeWebUIConfig_to_MagicMock/](FU-P11-T1-1_Refactor_FakeWebUIConfig_to_MagicMock/) | 2026-02-16 | PASS |
| FU-P12-T2-1 | [FU-P12-T2-1_Fix_stacking_click_listeners_in_updateLatencyTable/](FU-P12-T2-1_Fix_stacking_click_listeners_in_updateLatencyTable/) | 2026-02-16 | PASS |
| FU-P13-T7 | [FU-P13-T7_Fix_structuredContent_compliance_for_empty_content_tool_results/](FU-P13-T7_Fix_structuredContent_compliance_for_empty_content_tool_results/) | 2026-02-16 | PASS |
| FU-P13-T8 | [FU-P13-T8_Prevent_Web_UI_port_collision_from_destabilizing_MCP_sessions/](FU-P13-T8_Prevent_Web_UI_port_collision_from_destabilizing_MCP_sessions/) | 2026-02-16 | PASS |

## Historical Artifacts

Expand Down Expand Up @@ -169,6 +170,7 @@
| [REVIEW_FU-P11-T1-1_fake_webuiconfig_refactor.md](_Historical/REVIEW_FU-P11-T1-1_fake_webuiconfig_refactor.md) | Review report for FU-P11-T1-1 |
| [REVIEW_FU-P12-T2-1_stacking_click_listeners.md](_Historical/REVIEW_FU-P12-T2-1_stacking_click_listeners.md) | Review report for FU-P12-T2-1 |
| [REVIEW_FU-P13-T7_structuredcontent_compliance.md](_Historical/REVIEW_FU-P13-T7_structuredcontent_compliance.md) | Review report for FU-P13-T7 |
| [REVIEW_FU-P13-T8_web_ui_port_collision.md](_Historical/REVIEW_FU-P13-T8_web_ui_port_collision.md) | Review report for FU-P13-T8 |

## Archive Log

Expand Down Expand Up @@ -292,3 +294,5 @@
| 2026-02-16 | FU-P12-T2-1 | Archived REVIEW_FU-P12-T2-1_stacking_click_listeners report |
| 2026-02-16 | FU-P13-T7 | Archived Fix_structuredContent_compliance_for_empty_content_tool_results (PASS) |
| 2026-02-16 | FU-P13-T7 | Archived REVIEW_FU-P13-T7_structuredcontent_compliance report |
| 2026-02-16 | FU-P13-T8 | Archived Prevent_Web_UI_port_collision_from_destabilizing_MCP_sessions (PASS) |
| 2026-02-16 | FU-P13-T8 | Archived REVIEW_FU-P13-T8_web_ui_port_collision report |
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
## REVIEW REPORT — FU-P13-T8: Web UI port collision hardening

**Scope:** origin/main..HEAD (commits b06468f..2da4d89)
**Files changed:** 2 (`webui/server.py`, `test_main_webui.py`)
**Date:** 2026-02-16

---

### Summary Verdict
- [x] Approve

---

### Critical Issues

None.

---

### Secondary Issues

**[Low] `except SystemExit` is broad — could mask unrelated `sys.exit()` calls**

The new `except SystemExit` in `run_server()` catches any `SystemExit` raised by
`uvicorn.run()`, including hypothetical future `sys.exit(0)` calls (clean shutdown).
In practice uvicorn only calls `sys.exit(1)` on bind failure, so this is safe.
If desired, the `except` could be narrowed to `except SystemExit as exc: if exc.code != 1: raise`
— but this is over-engineering for the current behavior.

Recommendation: Accept as-is; add a comment clarifying this if the code is revisited.

**[Nit] Test imports inside test method body**

`test_toctou_systemexit_from_uvicorn_does_not_crash_thread` performs several imports
inside the method body (`from mcpbridge_wrapper.webui.server import run_server`, etc.).
The pattern is consistent with other tests in the file, but top-level imports would be
cleaner for readability. Not a correctness issue.

Recommendation: Low-priority; leave consistent with existing style.

---

### Architectural Notes

- The `run_server()` function now has a complete exception surface: `OSError` (bind error surfaced
directly) and `SystemExit` (uvicorn's internal handling of bind error). Both write to stderr and
return cleanly, so the daemon thread lifecycle is fully deterministic.
- The TOCTOU window between `is_port_available()` and `uvicorn.run()` is now fully handled:
pre-check prevents the common case; `SystemExit` catch handles the race.
- No changes to `__main__.py` were needed — the fix is entirely within `server.py`.

---

### Tests

- 42 tests in `test_main_webui.py` (was 41), all pass.
- 472 total tests pass, 5 skipped.
- Coverage: 95.6% (≥90% required).
- `PytestUnhandledThreadExceptionWarning` about port 8080 is resolved — no longer appears in CI.

---

### Next Steps

- No actionable follow-up items. FOLLOW-UP step is skipped per FLOW rules (no findings require
new workplan tasks).
3 changes: 1 addition & 2 deletions SPECS/INPROGRESS/next.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,14 +4,13 @@ The previously selected task has been archived.

## Recently Archived

- 2026-02-16 — FU-P13-T8: Prevent Web UI port collision from destabilizing MCP sessions (PASS)
- 2026-02-16 — FU-P13-T7: Enforce strict `structuredContent` compliance for empty-content tool results (PASS)
- 2026-02-16 — FU-P12-T2-1: Fix stacking click event listeners in `updateLatencyTable` (PASS)
- 2026-02-16 — FU-P11-T1-1: Refactor `_FakeWebUIConfig` test stub to use `MagicMock(spec=WebUIConfig)` (PASS)
- 2026-02-16 — FU-P11-T2-2: Add `limit` query param to `GET /api/sessions` (PASS)
- 2026-02-16 — FU-P11-T2-1: Push session data via WebSocket (PASS)

## Suggested Next Tasks

- FU-P13-T8: Prevent Web UI port collision from destabilizing MCP sessions (P0, parallelizable)
- P13-T1: Design persistent broker architecture and protocol contract (P0, Phase 13 foundation)
- FU-P12-T1-1: Remove or document `MCPInitializeParams` in schemas (P3)
20 changes: 10 additions & 10 deletions SPECS/Workplan.md
Original file line number Diff line number Diff line change
Expand Up @@ -1153,7 +1153,7 @@ Manually kill stale wrapper/uvx processes or use unique `--web-ui-port` values p
#### Resolution Path
- [x] Implement FU-P13-T8
- [x] Add deterministic collision handling tests
- [ ] Document stale-process cleanup in troubleshooting
- [x] Document stale-process cleanup in troubleshooting (done in FU-BUG-T6-1)

---

Expand Down Expand Up @@ -1972,21 +1972,21 @@ Phase 9 Follow-up Backlog

---

#### FU-P13-T8: Prevent Web UI port collision from destabilizing MCP sessions
#### FU-P13-T8: Prevent Web UI port collision from destabilizing MCP sessions
- **Description:** Harden startup behavior when `--web-ui` port is already occupied (common with stale/orphan wrapper processes). Ensure collision handling is deterministic and does not silently degrade MCP client stability.
- **Priority:** P0
- **Dependencies:** P10-T1
- **Parallelizable:** yes
- **Status:** ✅ Implemented (2026-02-16)
- **Outputs/Artifacts:**
- Updated `src/mcpbridge_wrapper/__main__.py` Web UI startup collision handling
- Optional single-instance guard (lock/PID) for Web UI mode
- Tests for occupied-port startup behavior
- Troubleshooting updates for stale-process cleanup
- `run_server()` catches `SystemExit` from uvicorn bind failure (TOCTOU window)
- TOCTOU regression test in `tests/unit/test_main_webui.py`
- Stale-process troubleshooting docs in `docs/troubleshooting.md` (FU-BUG-T6-1)
- **Acceptance Criteria:**
- [ ] When requested Web UI port is occupied, wrapper behavior is explicit and deterministic (clear error or safe fallback)
- [ ] MCP stdio protocol output remains valid JSON-RPC only on stdout
- [ ] Repeated client startups no longer accumulate conflicting Web UI listeners on the same port
- [ ] Tests cover occupied-port and restart scenarios
- [x] When requested Web UI port is occupied, wrapper behavior is explicit and deterministic (clear error or safe fallback)
- [x] MCP stdio protocol output remains valid JSON-RPC only on stdout
- [x] Repeated client startups no longer accumulate conflicting Web UI listeners on the same port
- [x] Tests cover occupied-port and restart scenarios

---

Expand Down
11 changes: 11 additions & 0 deletions src/mcpbridge_wrapper/webui/server.py
Original file line number Diff line number Diff line change
Expand Up @@ -411,6 +411,17 @@ def run_server(
f"{server_config.host}:{server_config.port}: {exc}",
file=sys.stderr,
)
except SystemExit:
# uvicorn calls sys.exit(1) when port binding fails internally (e.g. TOCTOU window
# where port became occupied after is_port_available() returned True). Catch here so
# the daemon thread exits cleanly without an unhandled thread exception.
print(
f"Warning: Web UI server failed to start on "
f"{server_config.host}:{server_config.port}. "
"Port may have become occupied after the availability check. "
"MCP bridge continues without the dashboard.",
file=sys.stderr,
)


def run_server_in_thread(
Expand Down
35 changes: 35 additions & 0 deletions tests/unit/test_main_webui.py
Original file line number Diff line number Diff line change
Expand Up @@ -505,3 +505,38 @@ def test_is_port_available_returns_false_for_occupied_port(self):
occupied_port = occupier.getsockname()[1]
# Port is held; second bind should fail
assert is_port_available("127.0.0.1", occupied_port) is False

def test_toctou_systemexit_from_uvicorn_does_not_crash_thread(self):
"""If port appears free at check time but uvicorn raises SystemExit(1) when binding
(TOCTOU window), run_server() catches it cleanly — no unhandled thread exception."""
pytest.importorskip("fastapi")

from unittest.mock import MagicMock, patch

from mcpbridge_wrapper.webui.audit import AuditLogger
from mcpbridge_wrapper.webui.config import WebUIConfig
from mcpbridge_wrapper.webui.metrics import MetricsCollector
from mcpbridge_wrapper.webui.server import run_server

config = WebUIConfig()
metrics = MetricsCollector()
audit = MagicMock(spec=AuditLogger)

# Simulate uvicorn calling sys.exit(1) on bind failure
with patch("mcpbridge_wrapper.webui.server.uvicorn") as mock_uvicorn, patch(
"sys.stderr"
) as mock_stderr:
mock_uvicorn.run.side_effect = SystemExit(1)
mock_uvicorn.Config.return_value = MagicMock(
host=config.host,
port=config.port,
log_level="warning",
access_log=False,
)
# Must not raise — SystemExit is caught internally
run_server(config, metrics, audit)

# Warning message was printed to stderr
write_calls = " ".join(str(c) for c in mock_stderr.write.call_args_list)
assert "Warning" in write_calls
assert "failed to start" in write_calls