
SuperagenticAI/supercodemode

SuperCodeMode

Requires Python 3.10+.

SuperCodeMode is a Python CLI and demo harness for optimizing Code Mode style client behavior in MCP workflows with GEPA.

Optimize Code Mode with GEPA. Run anywhere.

It improves the text and routing policy around a small tool surface (typically discovery + execution), so agents make better tool choices and produce more reliable results without backend lock-in.

🧭 What You Need To Use This In Your Workflow

SuperCodeMode does not replace your MCP server. It optimizes the client-side Code Mode behavior that uses your server.

For real usage, bring:

  • an MCP server or Code Mode runtime (Cloudflare, local MCP, UTCP, internal)
  • a small dataset of real tasks
  • a scoring metric (what counts as success)
  • a client config where you can apply optimized prompts / Code Mode text
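As a rough sketch of the last three items (all names here are illustrative, not SuperCodeMode's actual interfaces), a task dataset and scoring metric can be as simple as:

```python
# Hypothetical sketch only: the field names below are illustrative,
# not SuperCodeMode's actual dataset/metric interfaces.

dataset = [
    {"task": "Find the docs page for rate limits", "expect_tool": "search"},
    {"task": "Compute 6 * 7 with the execute tool",
     "expect_tool": "execute", "expect_substring": "42"},
]

def metric(example: dict, result: dict) -> float:
    """Score 1.0 when the expected tool was chosen and the expected
    substring (if any) appears in the final answer."""
    score = 0.0
    if result.get("selected_tool") == example["expect_tool"]:
        score += 0.5
    expected = example.get("expect_substring")
    if expected is None or expected in result.get("answer", ""):
        score += 0.5
    return score
```

A metric like this pairs with a client config that lets you apply the winning prompt text once an optimization run finishes.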

Start with the built-in smoke tests, then follow the docs guide for real usage.


✨ What This Project Solves

Many tool systems fail because the client logic is weak even when the tools are good.

Typical failures:

  • execution tool is used too early
  • discovery step is skipped
  • execution instructions are vague
  • final answers are noisy or inconsistent

SuperCodeMode gives you a repeatable GEPA-driven optimization loop to improve those behaviors.

👥 Who This Is For

  • Cloudflare Code Mode MCP users
  • MCP users running discovery + execution style tool patterns
  • platform engineers and evaluation teams
  • teams experimenting with Code Mode style agent behavior before changing server code

βœ… What Is Included

  • MCP stdio runner for local workflows
  • MCP streamable HTTP runner for direct Cloudflare MCP
  • HTTP bridge runner for custom runtime bridges
  • local, Docker, and Monty execution backends in the demo MCP server
  • scm doctor preflight checks
  • artifact saving for showcase/optimization runs
  • observability output (JSONL and OTLP)

🧩 What "Code Mode" Means Here

Code Mode here means a code-first MCP orchestration pattern where the model uses a small tool surface and generates code for multi-step work.

📦 Install

From PyPI:

pip install supercodemode

With uv (tool install, recommended for CLI usage):

uv tool install supercodemode

With uv (current environment):

uv pip install supercodemode

Optional Monty executor backend:

pip install "supercodemode[monty]"

With uv:

uv pip install "supercodemode[monty]"

Optional observability integrations (LangSmith, Logfire, MLflow, Langfuse):

pip install "supercodemode[observability]"

With uv:

uv pip install "supercodemode[observability]"

Then verify install:

scm --help

For local development:

pip install -e .

With uv:

uv pip install -e .

⚡ Quick Start

Check your environment:

scm doctor

Run a Cloudflare MCP showcase (defaults to https://mcp.cloudflare.com/mcp):

scm showcase --runner mcp-http

Cloudflare MCP usually requires auth:

scm showcase --runner mcp-http --auth-bearer "$CODEMODE_TOKEN"

Run a local MCP showcase (demo server over stdio):

scm showcase --runner mcp-stdio


🎯 What You Can Optimize

SuperCodeMode uses GEPA to optimize Code Mode client-side text such as:

  • system prompt text
  • Code Mode description / routing guidance
  • tool alias mappings
  • tool description overrides

This improves client behavior without requiring server/runtime code changes.
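Concretely, such a candidate is just a bundle of text the optimizer can mutate. A hedged sketch (the key names are illustrative, not the exact SuperCodeMode schema):

```python
# Illustrative shape of a client-side text candidate that the optimizer
# mutates. Keys mirror the optimizable surfaces listed above; the exact
# schema used by SuperCodeMode may differ.

candidate = {
    "system_prompt": "You are a careful agent. Discover tools before executing.",
    "codemode_description": "Use search first; only call execute with concrete code.",
    "tool_aliases": {"lookup": "search"},
    "tool_description_overrides": {
        "execute": "Runs code. Call only after discovery confirmed the inputs.",
    },
}

def apply_overrides(tool_descriptions: dict, cand: dict) -> dict:
    """Return tool descriptions with candidate overrides applied."""
    merged = dict(tool_descriptions)
    for name, desc in cand["tool_description_overrides"].items():
        if name in merged:
            merged[name] = desc
    return merged
```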

🧠 How It Works (High Level)

SuperCodeMode demonstrates a GEPA-centric adapter approach where:

  1. GEPA optimizes client text policy
  2. runners execute tools on MCP or HTTP runtimes
  3. the same optimization logic can be reused across local and remote transports

This keeps GEPA optimization logic separate from runtime transport details.
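The three steps above can be sketched as a generic candidate loop. This is a simplification (GEPA's actual Pareto-based search is richer, and the function names here are placeholders), but it shows the separation: the loop never touches transport details, which live entirely inside the runner.

```python
# Minimal sketch of the separation described above: a proposer mutates the
# text policy, a runner executes it over some transport, and the metric
# feeds scores back. All names are illustrative placeholders.

def optimize(propose, run, metric, dataset, max_metric_calls=10):
    """Generic candidate loop: keep the best-scoring text policy."""
    best, best_score, calls = None, -1.0, 0
    while calls < max_metric_calls:
        candidate = propose(best)             # mutation step (GEPA's role)
        total = 0.0
        for example in dataset:
            result = run(candidate, example)  # MCP/HTTP runner, transport-agnostic
            total += metric(example, result)
            calls += 1
            if calls >= max_metric_calls:
                break
        if total > best_score:
            best, best_score = candidate, total
    return best, best_score
```

Swapping the stdio runner for the HTTP one changes only `run`, which is why the same optimization logic reuses across local and remote transports.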

For the full practical workflow (dataset, metric, optimize, apply the best candidate), see the documentation.

🛠️ Common Commands

Preflight

scm doctor
scm doctor --json
scm doctor --strict

Showcase runs (baseline vs tuned)

scm showcase --runner mcp-http
scm showcase --runner mcp-stdio
scm showcase --runner mcp-stdio --executor-backend monty
scm showcase --runner mcp-stdio --executor-backend docker
scm showcase --runner http --endpoint http://localhost:8080/run-codemode

Note: showcase is an active CLI command. The removed showcase/ directory was an older repo layout, not the scm showcase command.

Optimization runs

scm optimize --runner mcp-http --max-metric-calls 10
scm optimize --runner mcp-stdio --max-metric-calls 10
scm optimize --runner mcp-stdio --executor-backend monty --max-metric-calls 10
scm optimize --runner mcp-stdio --executor-backend docker --max-metric-calls 10
scm optimize --runner http --endpoint http://localhost:8080/run-codemode --max-metric-calls 10

Save artifacts:

scm showcase --runner mcp-stdio --save-artifact
scm optimize --runner mcp-http --max-metric-calls 10 --save-artifact

When --save-artifact is enabled, SuperCodeMode also writes compact summary files:

  • showcase: comparison_summary, baseline_run_summary, tuned_run_summary
  • optimize: run_summary
  • benchmark: benchmark_summary + per-variant run_summary

Direct MCP connectivity checks

scm mcp-client
scm mcp-client --executor-backend monty
scm mcp-client --executor-backend docker

Strategy benchmark (tool-call vs Code Mode)

scm benchmark --runner mcp-stdio
scm benchmark --runner mcp-stdio --executor-backend monty
scm benchmark --runner mcp-http

This compares three policy profiles on the same runner/dataset:

  • tool_call (naive execution-first policy)
  • codemode_baseline
  • codemode_optimized

🧪 Examples

All runnable examples are under examples/.

Recommended starting points:

python examples/showcase_mcp_cloudflare.py
python examples/showcase_mcp_stdio.py
python examples/optimize_mcp_cloudflare.py --max-metric-calls 10
python examples/optimize_mcp_stdio.py --max-metric-calls 10

Real LLM optimization demo (Gemini, low-cost settings):

export GOOGLE_API_KEY=your_key_here
python examples/optimize_gemini_flash.py --max-metric-calls 4

See the examples/ directory for the full list.

☁️ Cloudflare MCP Notes

  • mcp-http runner defaults to https://mcp.cloudflare.com/mcp
  • Cloudflare MCP may require auth for your usage:
scm showcase --runner mcp-http --auth-bearer "$CODEMODE_TOKEN"
  • Demo scoring can show 0.5 even when the integration works: if Cloudflare returns structured JSON for search, the metric's literal keyword match misses it

In that case, the primary success signal is:

  • case 1 selects search
  • case 2 selects execute
  • case 2 returns 42
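The keyword-match caveat is easy to see in a tiny illustration (this is not the actual demo metric, just the failure mode it describes):

```python
import json

# A literal keyword check scores a plain-text answer but misses the same
# information when the server returns structured JSON, which is why a
# working integration can still show a partial score.

def keyword_score(answer: str, keyword: str) -> float:
    return 1.0 if keyword in answer else 0.0

plain = "The rate limit docs are at developers.cloudflare.com"
structured = json.dumps({"results": [{"url": "developers.cloudflare.com"}]})
```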

🧱 Local, Docker, and Monty Execution

Use Monty for a Python-native sandboxed execution path in demo MCP flows:

scm showcase --runner mcp-stdio --executor-backend monty

Requirements:

  • install pydantic-monty (or pip install "supercodemode[monty]")

Use Docker for safer local code execution in demo MCP flows:

scm showcase --runner mcp-stdio --executor-backend docker

Requirements:

  • Docker daemon running
  • your user can run docker run

📈 Observability

JSONL:

scm --obs-backend jsonl --obs-jsonl-path artifacts/obs.jsonl showcase --runner mcp-stdio

OTLP:

scm --obs-backend otlp --obs-otlp-endpoint http://localhost:4318/v1/traces showcase --runner mcp-stdio

Optional SDK backends (same event schema, best-effort adapters):

scm --obs-backend logfire showcase --runner mcp-stdio
scm --obs-backend mlflow showcase --runner mcp-stdio
scm --obs-backend langsmith showcase --runner mcp-stdio
scm --obs-backend langfuse showcase --runner mcp-stdio

Install optional integrations:

pip install "supercodemode[observability]"

Environment variables (alternative to CLI flags):

  • SCM_OBS_BACKEND=none|jsonl|otlp|logfire|mlflow|langsmith|langfuse
  • SCM_OBS_JSONL_PATH=artifacts/obs.jsonl
  • SCM_OBS_OTLP_ENDPOINT=http://localhost:4318/v1/traces
  • SCM_RUN_ID=demo-run-001
  • SCM_OBS_DATASET_NAME=two_tool_dataset (optional)
  • SCM_OBS_TAGS_JSON='{"env":"dev","team":"research"}' (optional)

Event payloads include GEPA/Code Mode run fields such as selected tool, tool call count, score, and error state. The saved summary artifacts provide compact rollups for comparisons and quick benchmarking.

CLI commands also stamp command context into events (for example cli_command, cli_runner, and cli_executor_backend) to make JSONL/OTLP filtering easier.

Benchmark and run summaries also include:

  • runtime capability hints (for example local vs docker vs monty constraints)
  • error taxonomy rollups (error_categories) for quick failure analysis
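A rollup of the second kind amounts to a counter over per-case results. In this sketch the field name `error_category` and the category labels are invented for illustration; the real artifact keys may differ:

```python
from collections import Counter

# Illustrative error-taxonomy rollup: count failure categories across
# per-case results. Field and category names here are invented.

def error_categories(results: list[dict]) -> dict[str, int]:
    return dict(Counter(
        r["error_category"] for r in results if r.get("error_category")
    ))

runs = [
    {"score": 1.0},
    {"score": 0.0, "error_category": "tool_selection"},
    {"score": 0.0, "error_category": "tool_selection"},
    {"score": 0.0, "error_category": "timeout"},
]
```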

🧠 Relationship to GEPA

This repo is the end-to-end GEPA optimization demo and experimentation harness for the GEPA Code Mode adapter work (examples, CLI, docs, local/docker/monty execution, observability).

GEPA docs (main site): https://gepa-ai.github.io/gepa/

GEPA PR (status may change):

Whether the adapter lands in GEPA mainline now or later, SuperCodeMode can be used directly for GEPA-based optimization of Code Mode behavior.

🚫 What Is Not Included by Default

  • automatic server code mutation
  • automatic deploy pipelines for MCP servers
  • provider-specific server-side optimization logic

This project is focused on client-side behavior optimization and runnable demos.

📚 Documentation

🧰 Development Notes

  • scm uses installed gepa and mcp from your environment
  • a vendored GEPA contribution snapshot exists in vendor/gepa_new_files
  • refresh vendor snapshot with:
    • GEPA_SOURCE_DIR=/path/to/gepa ./scripts/sync_gepa_vendor.sh