LLM Interactive Proxy

A Swiss Army knife proxy that sits between your LLM client, such as a coding agent, and any supported provider, giving you a universal adapter, cost optimization, and full visibility with zero code changes.

Quick Start

1. Installation

git clone https://github.com/matdev83/llm-interactive-proxy.git
cd llm-interactive-proxy
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -e .[dev]

2. Start the Proxy

export OPENAI_API_KEY="your-key-here"
python -m src.core.cli --default-backend openai:gpt-4o

3. Point Your Client at the Proxy

# Instead of direct API calls:
from openai import OpenAI
client = OpenAI(api_key="your-key")

# Use the proxy (base_url only):
from openai import OpenAI
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="dummy-key"  # Proxy handles real authentication
)

# Now use normally - requests go through the proxy
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

That's it. All your existing code works unchanged—the proxy handles routing, translation, and monitoring transparently.

See Quick Start Guide for detailed configuration.

Why Use LLM Interactive Proxy?

One configuration. Any client. Any provider.

Stop rewriting your code every time you want to try a different LLM. Stop managing API keys in a dozen different tools. Stop wondering why your agent is stuck in an infinite loop or why your API bill suddenly spiked.

Solve Real Problems

Tired of juggling multiple LLM subscriptions?
Connect all your premium accounts—GPT Plus/Pro, Gemini Advanced, Qwen, GLM Code, and more—through one endpoint. Use them all without switching tools.

Worried about agent misbehavior?
Fix stuck agents with automatic loop detection. Reduce token costs with intelligent context compression. Get a second opinion mid-conversation by switching models seamlessly.

Need more control over what LLMs actually do?
Rewrite prompts and responses on-the-fly without touching client code. Block dangerous git commands before they execute. Add a "guardian angel" model that monitors and helps when your primary model drifts off track.

Want visibility into what's happening?
Capture every request and response in CBOR format. Debug issues, audit usage, and understand exactly what your LLM apps are doing.

Zero changes to your client code. Just point it at the proxy and gain control.

Key Capabilities

Universal Connectivity

  • Protocol Translation — Use OpenAI SDK with Anthropic, Claude client with Gemini, any combination
  • Subscription Consolidation — Leverage all your premium LLM accounts through one endpoint
  • Flexible Deployment — Single-user mode for development, multi-user mode for production

Cost & Performance Optimization

  • Smart Routing — Rotate API keys to maximize free tiers and automatically fall back to cheaper models
  • Context Window Compression — Reduce token usage and improve inference speed without losing quality
  • Full Observability — Wire capture, usage tracking, token counting, performance metrics

Intelligent Session Control

  • Loop Detection — Automatically detect and resolve infinite loops and repetitive patterns
  • Dynamic Model Switching — Change models mid-conversation for diverse perspectives without losing context
  • Quality Verifier — Deploy a secondary model to verify responses when the primary model struggles

Behavioral Customization

  • Prompt & Response Rewriting — Modify content on-the-fly to fine-tune agent behavior
  • Tool Call Reactors — Override and intercept tool calls to suppress unwanted behaviors
  • Usage Limits — Enforce quotas and control resource consumption

Security & Safety

  • Key Isolation — Configure API keys once, never expose them to clients
  • Directory Sandboxing — Restrict LLM tool access to designated safe directories
  • Command Protection — Block harmful operations like aggressive git commands
  • Tool Access Control — Fine-grained control over which tools LLMs can invoke

Enterprise Features

  • B2BUA Session Isolation — Internal session identity generation and strict trust boundaries (enabled by default; use --disable-b2bua-session-handling to opt out)

See User Guide for the complete feature list.

Routing Selector Semantics

  • backend:model selects an explicit backend family.
  • backend-instance:model (for example openai.1:gpt-4o) targets a concrete backend instance.
  • model and vendor/model are model-only selectors.
  • vendor/model:variant remains model-only (the : suffix is part of the model payload unless : appears before the first /).
  • URI-style parameters in selectors (for example model?temperature=0.5) are parsed and propagated through routing metadata.
  • Explicit-backend configuration and command surfaces (for example --static-route, replacement targets, and one-off routing) require strict backend:model format.
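
As an illustration, the sketch below supplies each selector style through the model field of an ordinary request against the proxy. This is a minimal sketch: the concrete backend and model names are placeholders, and whether a given selector resolves depends on which backends you have configured.

# Illustrative only: backend and model names below are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy-key")

selectors = [
    "openai:gpt-4o",           # backend:model - explicit backend family
    "openai.1:gpt-4o",         # backend-instance:model - concrete instance
    "gpt-4o",                  # model-only selector
    "openai/gpt-4o",           # vendor/model - still model-only
    "gpt-4o?temperature=0.5",  # URI-style parameter, propagated via routing metadata
]

for selector in selectors:
    response = client.chat.completions.create(
        model=selector,
        messages=[{"role": "user", "content": "Say hello."}],
    )
    print(selector, "->", response.choices[0].message.content)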

Architecture

graph TD
    subgraph "Clients"
        A[OpenAI Client]
        B[OpenAI Responses API Client]
        C[Anthropic Client]
        D[Gemini Client]
        E[Any LLM App]
    end

    subgraph "LLM Interactive Proxy"
        FE["Front-end APIs<br/>(OpenAI, Anthropic, Gemini)"]
        Core["Core Proxy Logic<br/>(Routing, Translation, Safety)"]
        BE["Back-end Connectors<br/>(OpenAI, Anthropic, Gemini, etc.)"]
        FE --> Core --> BE
    end

    subgraph "Providers"
        P1[OpenAI API]
        P2[Anthropic API]
        P3[Google Gemini API]
        P4[OpenRouter API]
    end

    A --> FE
    B --> FE
    C --> FE
    D --> FE
    BE --> P1
    BE --> P2
    BE --> P3
    BE --> P4

Documentation

Supported Front-end Interfaces

The proxy exposes multiple standard API surfaces, allowing you to use your favorite clients with any backend:

  • OpenAI Chat Completions (/v1/chat/completions) - Compatible with OpenAI SDKs and most tools.
  • Reasoning-model token floor guard - For reasoning-first models (e.g. openrouter:stepfun/step-3.5-flash:free, kimi-code:kimi/kimi-for-coding), explicit low max_tokens/max_completion_tokens values are raised to a configurable minimum (default 512) to prevent empty assistant messages. Configure via reasoning_model_token_floor in app config.
  • OpenAI Responses (/v1/responses) - Optimized for structured output generation.
  • OpenAI Models (/v1/models) - Canonical backend-agnostic model discovery from the capability index (canonical vendor/model IDs only).
  • Anthropic Messages (/anthropic/v1/messages) - Native support for Claude clients/SDKs.
  • Dedicated Anthropic Server (http://host:8001/v1/messages) - Drop-in replacement for Anthropic API on a separate port (default: 8001).
  • Google Gemini v1beta (/v1beta/models, :generateContent) - Native support for Gemini tools.
  • Routing Error Parity - Dynamic routing failures are emitted in protocol-native error envelopes while preserving canonical details.code and details.retryable semantics across OpenAI, Anthropic, and Gemini surfaces.

See Front-End APIs Overview for more details.
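
For example, the official anthropic Python SDK can talk to the proxy's Anthropic Messages surface directly. This is a minimal sketch, assuming a proxy running locally on port 8000 and that the SDK appends /v1/messages to its base URL; the model name is a placeholder.

# Minimal sketch: point the Anthropic SDK at the proxy's /anthropic prefix
# so requests land on /anthropic/v1/messages. Model name is a placeholder.
from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:8000/anthropic",
    api_key="dummy-key",  # the proxy holds the real provider credentials
)

message = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello from the proxy!"}],
)
print(message.content[0].text)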

  • Diagnostics Endpoint (/v1/diagnostics) includes bounded routing metadata: availability status per backend instance (active, rate_limited, disabled), canonical model-to-eligible-instance summaries, preference/tie-set diagnostics, and deterministic truncation metadata.
  • Reactivation Control Endpoint (/v1/diagnostics/backends/{backend_instance}/reactivate) explicitly reactivates disabled backend instances and can optionally clear permanent unsupported (instance, model) state.
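
A quick way to inspect that routing metadata is to query the diagnostics endpoint directly. This is a minimal sketch, assuming a single-user proxy on localhost with authentication disabled; it simply pretty-prints whatever the endpoint returns.

# Minimal sketch: fetch routing/diagnostics metadata from a locally running proxy.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:8000/v1/diagnostics") as resp:
    data = json.load(resp)

# The exact payload shape is documented elsewhere; just pretty-print it here.
print(json.dumps(data, indent=2))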

Supported Backends

See Backends Overview for full details and configuration.

Access Modes

The proxy supports two operational modes to enforce appropriate security boundaries:

  • Single User Mode (default): For local development. Allows OAuth connectors, optional authentication, localhost-only binding.
  • Multi User Mode: For production/shared deployments. Blocks OAuth connectors, requires authentication for remote access, allows any IP binding.

Quick Examples

# Single User Mode (default) - local development
./.venv/Scripts/python.exe -m src.core.cli

# Multi User Mode - production deployment
./.venv/Scripts/python.exe -m src.core.cli --multi-user-mode --host=0.0.0.0 --api-keys key1,key2
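
On the client side, multi-user deployments authenticate with one of the configured proxy keys instead of a placeholder value. This is a minimal sketch, assuming the proxy accepts a configured key wherever an OpenAI-compatible client normally sends its API key; host, port, and key are placeholders.

# Minimal sketch: an OpenAI-compatible client authenticating against a
# multi-user proxy started with --api-keys key1,key2. Host and key are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://proxy.example.internal:8000/v1",
    api_key="key1",  # one of the keys passed via --api-keys
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "ping"}],
)
print(response.choices[0].message.content)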

See Access Modes User Guide for detailed documentation.

Support

License

This project is licensed under the GNU AGPL v3.0 or later.

Development

# Run tests
python -m pytest

# Run linter
python -m ruff check --fix .

# Format code
python -m black .

# Validate unified outbound routing compliance (same check as CI gate)
python dev/scripts/check_routing_unification_compliance.py

See Development Guide for more details.