Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 5 additions & 1 deletion SPECS/ARCHIVE/INDEX.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# mcpbridge-wrapper Tasks Archive

**Last Updated:** 2026-02-16 (P13-T1)
**Last Updated:** 2026-02-17 (P13-T2)

## Archived Tasks

Expand Down Expand Up @@ -105,6 +105,7 @@
| FU-P13-T7 | [FU-P13-T7_Fix_structuredContent_compliance_for_empty_content_tool_results/](FU-P13-T7_Fix_structuredContent_compliance_for_empty_content_tool_results/) | 2026-02-16 | PASS |
| FU-P13-T8 | [FU-P13-T8_Prevent_Web_UI_port_collision_from_destabilizing_MCP_sessions/](FU-P13-T8_Prevent_Web_UI_port_collision_from_destabilizing_MCP_sessions/) | 2026-02-16 | PASS |
| P13-T1 | [P13-T1_Design_persistent_broker_architecture_and_protocol_contract/](P13-T1_Design_persistent_broker_architecture_and_protocol_contract/) | 2026-02-16 | PASS |
| P13-T2 | [P13-T2_Implement_persistent_broker_daemon/](P13-T2_Implement_persistent_broker_daemon/) | 2026-02-17 | PASS |

## Historical Artifacts

Expand Down Expand Up @@ -173,6 +174,7 @@
| [REVIEW_FU-P13-T7_structuredcontent_compliance.md](_Historical/REVIEW_FU-P13-T7_structuredcontent_compliance.md) | Review report for FU-P13-T7 |
| [REVIEW_FU-P13-T8_web_ui_port_collision.md](_Historical/REVIEW_FU-P13-T8_web_ui_port_collision.md) | Review report for FU-P13-T8 |
| [REVIEW_P13-T1_broker_architecture.md](_Historical/REVIEW_P13-T1_broker_architecture.md) | Review report for P13-T1 |
| [REVIEW_P13-T2_broker_daemon.md](_Historical/REVIEW_P13-T2_broker_daemon.md) | Review report for P13-T2 |

## Archive Log

Expand Down Expand Up @@ -300,3 +302,5 @@
| 2026-02-16 | FU-P13-T8 | Archived REVIEW_FU-P13-T8_web_ui_port_collision report |
| 2026-02-16 | P13-T1 | Archived Design_persistent_broker_architecture_and_protocol_contract (PASS) |
| 2026-02-16 | P13-T1 | Archived REVIEW_P13-T1_broker_architecture report |
| 2026-02-17 | P13-T2 | Archived Implement_persistent_broker_daemon (PASS) |
| 2026-02-17 | P13-T2 | Archived REVIEW_P13-T2_broker_daemon report |
Original file line number Diff line number Diff line change
@@ -0,0 +1,132 @@
# P13-T2: Implement Persistent Broker Daemon with Single Upstream Xcode Bridge

**Phase:** 13 — Persistent Broker & Shared Xcode Session
**Priority:** P0
**Status:** In Progress
**Branch:** feature/P13-T2-broker-daemon
**Created:** 2026-02-16

---

## 1. Objective

Replace the `BrokerDaemon` stub in `src/mcpbridge_wrapper/broker/daemon.py` with
a fully working implementation that:

1. Launches one `xcrun mcpbridge` subprocess and keeps it alive.
2. Prevents duplicate instances via PID-file locking.
3. Handles upstream crashes with exponential-backoff reconnection.
4. Provides graceful shutdown (drain in-flight requests, clean socket/PID files).

---

## 2. Deliverables

| Artifact | Description |
|----------|-------------|
| `src/mcpbridge_wrapper/broker/daemon.py` | Full `BrokerDaemon` implementation |
| `tests/unit/test_broker_daemon.py` | Unit tests for all acceptance criteria |
| `SPECS/INPROGRESS/P13-T2_Validation_Report.md` | Validation report |

---

## 3. Implementation Plan

### 3.1 `BrokerDaemon.start()`

```
1. Ensure data dir exists (mkdir -p ~/.mcpbridge_wrapper/)
2. Stale-lock check:
a. If PID file exists:
- Read PID.
- kill -0 <pid> → if alive: raise RuntimeError("already running")
- If dead: remove stale PID + socket files
3. Create/bind Unix socket (mode 0600)
4. Write PID file (own PID)
5. Launch upstream: asyncio.create_subprocess_exec(*config.upstream_cmd,
stdin=PIPE, stdout=PIPE, stderr=sys.stderr)
6. Transition state: INIT → READY
7. Start background tasks:
- _read_upstream_loop() — reads JSON-RPC lines from upstream stdout
```

### 3.2 `BrokerDaemon.stop()`

```
1. Transition state → STOPPING
2. Cancel pending requests with JSON-RPC error -32000 (shutdown)
3. Wait up to config.graceful_shutdown_timeout for in-flight tasks
4. Close upstream stdin (send EOF), wait for process to exit
5. Remove socket file and PID file
6. Transition state → STOPPED
```

### 3.3 `BrokerDaemon.run_forever()`

```
1. Call start()
2. Register SIGTERM / SIGINT handlers → call stop()
3. await until state == STOPPED
```

### 3.4 `_read_upstream_loop()`

```
Loop:
line = await upstream.stdout.readline()
if EOF:
if state == STOPPING: break
→ trigger reconnect
else:
→ parse JSON, route response (P13-T3 will handle routing; daemon just reads)
```

### 3.5 Reconnection

```
attempt = 0
while state == RECONNECTING:
delay = min(2 ** attempt, config.reconnect_backoff_cap)
await asyncio.sleep(delay)
try:
launch new upstream
state = READY
break
except OSError:
attempt += 1
```

### 3.6 Status / Health

Add `status()` method that returns a dict:
```python
{"state": daemon.state.value, "pid": os.getpid(), "upstream_pid": upstream.pid}
```

---

## 4. Acceptance Criteria

- [ ] Starting broker twice does not spawn duplicate upstream bridge instances
- [ ] Broker survives client disconnects without restarting upstream bridge
_(validated by tests that mock client disconnects and confirm upstream still running)_
- [ ] Graceful shutdown terminates upstream process and cleans lock/socket files
- [ ] Crash recovery path is covered by tests (upstream EOF triggers RECONNECTING → READY)

---

## 5. Quality Gates

- `pytest tests/unit/test_broker_daemon.py -v` — all pass
- `pytest --cov` — coverage ≥ 90 %
- `ruff check src/` — no errors
- `mypy src/` — no errors (if configured)

---

## 6. Dependencies

| Dependency | Status |
|------------|--------|
| P13-T1: Architecture design + stubs | ✅ Complete |
| `bridge.py`: Subprocess creation patterns | ✅ Available |
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
# P13-T2 Validation Report: Implement Persistent Broker Daemon

**Date:** 2026-02-17
**Branch:** feature/P13-T2-broker-daemon
**Verdict:** PASS

---

## 1. Acceptance Criteria

| Criterion | Status | Notes |
|-----------|--------|-------|
| Starting broker twice does not spawn duplicate upstream bridge instances | ✅ PASS | `_check_and_clear_stale_lock()` raises `RuntimeError` when live PID detected |
| Broker survives client disconnects without restarting upstream bridge | ✅ PASS | Daemon state remains READY; upstream not affected by client presence |
| Graceful shutdown terminates upstream process and cleans lock/socket files | ✅ PASS | `stop()` closes stdin, waits with timeout, kills if needed, removes files |
| Crash recovery path is covered by tests | ✅ PASS | `_reconnect()` with exponential backoff covered in `TestBrokerDaemonReconnect` and `TestReconnectEdgeCases` |

---

## 2. Quality Gate Results

| Gate | Result |
|------|--------|
| `pytest tests/unit/test_broker_daemon.py` | ✅ 26/26 PASSED |
| `pytest tests/unit/` | ✅ 485/485 PASSED |
| `ruff check src/` | ✅ No errors |
| `mypy src/` | N/A (not configured) |
| `pytest --cov` broker module | ✅ 93.2% (≥90%) |

---

## 3. Deliverables

| Artifact | Status |
|----------|--------|
| `src/mcpbridge_wrapper/broker/daemon.py` | ✅ Full implementation (replaces P13-T1 stub) |
| `tests/unit/test_broker_daemon.py` | ✅ 26 tests covering all acceptance criteria |
| `tests/unit/test_broker_stubs.py` | ✅ Updated (removed now-invalid NotImplementedError assertions) |

---

## 4. Implementation Summary

`BrokerDaemon` provides:

- **`start()`** — Creates data directory, checks for stale/live PID locks, writes own PID, launches `xcrun mcpbridge` via `asyncio.create_subprocess_exec`, transitions to READY, starts background `_read_upstream_loop` task.
- **`stop()`** — Transitions to STOPPING, signals read loop, cancels read task, closes upstream stdin, waits with configurable timeout, kills if needed, removes PID/socket files, transitions to STOPPED. Idempotent.
- **`run_forever()`** — Calls `start()`, registers SIGTERM/SIGINT handlers, blocks until STOPPED.
- **`status()`** — Returns `{"state", "pid", "upstream_pid"}` dict for health monitoring.
- **`_reconnect()`** — Exponential backoff (0, 1, 2, … min(2^n, cap)s), retries `_launch_upstream()`, resets to READY on success. Respects stop_event.
- **`_check_and_clear_stale_lock()`** — Handles: no PID file (clear orphaned socket), corrupt PID, dead process (stale lock), live process (RuntimeError), different-user process (RuntimeError).

---

## 5. Out of Scope (Deferred to P13-T3/T4)

- Unix socket server accept loop (P13-T3)
- JSON-RPC multiplexing and client response routing (P13-T3)
- Client proxy / stdio forwarding (P13-T4)
83 changes: 83 additions & 0 deletions SPECS/ARCHIVE/_Historical/REVIEW_P13-T2_broker_daemon.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,83 @@
## REVIEW REPORT — P13-T2 Broker Daemon Implementation

**Scope:** origin/main..HEAD (feature/P13-T2-broker-daemon)
**Files:** 4 changed (daemon.py, test_broker_daemon.py, test_broker_stubs.py, Workplan/archive docs)
**Date:** 2026-02-17

---

### Summary Verdict

- [ ] Approve
- [x] Approve with comments
- [ ] Request changes
- [ ] Block

**Overall:** The implementation is correct and well-tested. Three low/nit-level observations noted below; none are blockers.

---

### Critical Issues

None.

---

### Secondary Issues

**[Low] `_read_upstream_loop` reassigns `line` variable to itself (unused mutation)**

In `daemon.py` lines ~260-262:
```python
line = raw.decode() if isinstance(raw, bytes) else raw
line = line.rstrip("\n")
```
The `line` variable is logged and JSON-parsed but never forwarded (routing is deferred to P13-T3). This is intentional per the PRD scope, but the decode+strip logic runs on every message even though the result is only used for debug logging and JSON validation. No bug; purely a performance nit.

**Fix suggestion (deferred):** When P13-T3 adds routing, fold the decode/strip into the routing path. No immediate action needed.

---

**[Low] `run_forever()` busy-polls with `asyncio.sleep(0.1)`**

```python
while self._state not in (BrokerState.STOPPED, BrokerState.STOPPING):
await asyncio.sleep(0.1)
```
A 100ms polling interval introduces up to 100ms of stop-signal latency. For a long-lived daemon this is fine (<<1s), but an `asyncio.Event` would be more idiomatic.

**Fix suggestion (deferred to follow-up):** Replace polling loop with `await self._stop_event.wait()` after registering SIGTERM.

---

**[Nit] `# type: ignore[type-arg]` on `asyncio.Task`**

```python
self._read_task: asyncio.Task | None = None # type: ignore[type-arg]
```
This suppresses a mypy error for the unparameterised `asyncio.Task`. Since mypy is not configured as a hard gate yet, this is acceptable. Can be resolved with `asyncio.Task[None]` when mypy is enforced.

---

### Architectural Notes

- **P13-T3 integration point is clean.** The `_read_upstream_loop` currently only logs and discards JSON lines; the routing hook is clearly marked with a comment. No refactoring of the loop signature is needed when P13-T3 adds a `_route_response()` call.
- **`_stop_event` is created in `__init__`**, not in `start()`. This means calling `start()` a second time (after a `stop()`) without recreating the daemon will find `_stop_event` already set and the loop will exit immediately. The current design assumes single-use daemon instances, which is consistent with the PID-file model. Documented for P13-T3 awareness.
- **PID file is written before the upstream subprocess is launched.** A crash between `pid_file.write_text()` and `_launch_upstream()` leaves a live PID file pointing to the current process but with no upstream. Subsequent `start()` calls on a new daemon instance would see the live PID and refuse to start even though there is no active broker. This is an acceptable edge case for v1; a more robust approach (write PID after successful upstream launch) is deferred.

---

### Tests

- 26 new tests in `test_broker_daemon.py` covering all acceptance criteria.
- 3 tests removed from `test_broker_stubs.py` (`NotImplementedError` assertions that are no longer valid post-P13-T2 implementation).
- Broker module coverage: **93.2%** (≥90% required ✅).
- Uncovered lines: mostly signal-handler closure paths and edge branches in reconnect logic that require OS-level signal injection; acceptable for unit test scope.

---

### Next Steps

1. **P13-T3** (P0) — Implement `UnixSocketServer` with client connection accept loop and JSON-RPC multiplexing. Will integrate with `_read_upstream_loop` via a `_route_response()` hook.
2. **FU-P13-T2-1 (optional)** — Replace `run_forever()` polling loop with `asyncio.Event`-based wait to eliminate 100ms stop latency.
3. **FU-P13-T2-2 (optional)** — Move `pid_file.write_text()` to after successful upstream launch to close the write-before-launch edge case.
4 changes: 2 additions & 2 deletions SPECS/INPROGRESS/next.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,13 +4,13 @@ The previously selected task has been archived.

## Recently Archived

- 2026-02-17 — P13-T2: Implement persistent broker daemon with single upstream Xcode bridge (PASS)
- 2026-02-16 — P13-T1: Design persistent broker architecture and protocol contract (PASS)
- 2026-02-16 — FU-P13-T8: Prevent Web UI port collision from destabilizing MCP sessions (PASS)
- 2026-02-16 — FU-P13-T7: Enforce strict `structuredContent` compliance for empty-content tool results (PASS)
- 2026-02-16 — FU-P12-T2-1: Fix stacking click event listeners in `updateLatencyTable` (PASS)
- 2026-02-16 — FU-P11-T1-1: Refactor `_FakeWebUIConfig` test stub to use `MagicMock(spec=WebUIConfig)` (PASS)

## Suggested Next Tasks

- P13-T2: Implement persistent broker daemon with single upstream Xcode bridge (P0, depends on P13-T1 ✅)
- P13-T3: Implement multi-client transport and JSON-RPC multiplexing (P0, depends on P13-T2 ✅)
- FU-P12-T1-1: Remove or document `MCPInitializeParams` in schemas (P3)
28 changes: 26 additions & 2 deletions SPECS/Workplan.md
Original file line number Diff line number Diff line change
Expand Up @@ -1093,7 +1093,7 @@ Keep a single long-lived client/session running to reduce process churn. This is

#### Resolution Path
- [x] Design persistent broker architecture for shared upstream Xcode session (P13-T1)
- [ ] Implement long-lived broker daemon with single upstream bridge connection (P13-T2)
- [x] Implement long-lived broker daemon with single upstream bridge connection (P13-T2)
- [ ] Add multi-client transport + stdio proxy mode to reuse broker session (P13-T3, P13-T4)
- [ ] Validate reduced prompt behavior and document rollout/migration steps (P13-T5, P13-T6)

Expand Down Expand Up @@ -1871,7 +1871,7 @@ Phase 9 Follow-up Backlog

---

#### P13-T2: Implement persistent broker daemon with single upstream Xcode bridge
#### P13-T2: Implement persistent broker daemon with single upstream Xcode bridge
- **Description:** Add daemon mode that launches and owns one `xcrun mcpbridge` subprocess, keeps it alive, and exposes broker readiness state to clients.
- **Priority:** P0
- **Dependencies:** P13-T1
Expand All @@ -1888,6 +1888,30 @@ Phase 9 Follow-up Backlog

---

#### FU-P13-T2-1: Replace run_forever() polling loop with asyncio.Event-based wait
- **Type:** Enhancement
- **Priority:** P3
- **Discovered:** 2026-02-17 (REVIEW_P13-T2)
- **Component:** BrokerDaemon.run_forever()
- **Description:** Current implementation uses `asyncio.sleep(0.1)` polling which introduces up to 100ms stop-signal latency. Replace with `asyncio.Event.wait()` for idiomatic zero-latency shutdown.
- **Acceptance Criteria:**
- [ ] `run_forever()` responds to stop signal within one event loop tick
- [ ] Existing `test_run_forever_starts_and_stops` passes without change

---

#### FU-P13-T2-2: Move PID file write to after successful upstream launch
- **Type:** Robustness
- **Priority:** P3
- **Discovered:** 2026-02-17 (REVIEW_P13-T2)
- **Component:** BrokerDaemon.start()
- **Description:** PID file is currently written before upstream subprocess is launched. A crash between write and launch leaves a live-PID lock that blocks future starts until the owning process dies. Move the write to after successful launch.
- **Acceptance Criteria:**
- [ ] PID file is written only after `_launch_upstream()` succeeds
- [ ] Stale-lock tests continue to pass

---

#### P13-T3: Implement multi-client transport and JSON-RPC multiplexing
- **Description:** Add local transport server (Unix socket) that accepts multiple clients and multiplexes JSON-RPC traffic to/from the single upstream bridge while preserving per-client response routing.
- **Priority:** P0
Expand Down
3 changes: 3 additions & 0 deletions pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@ dependencies = [
dev = [
"pytest>=7.0",
"pytest-cov>=4.0",
"pytest-asyncio>=0.21",
"ruff>=0.1.0",
"mypy>=1.0",
]
Expand Down Expand Up @@ -80,10 +81,12 @@ addopts = [
"--tb=short",
"--strict-markers",
]
asyncio_mode = "strict"
markers = [
"unit: Unit tests",
"integration: Integration tests",
"slow: Slow tests",
"asyncio: Mark tests as asyncio (registered by pytest-asyncio)",
]

[tool.coverage.run]
Expand Down
Loading