Documentation
Outgate AI is a platform for routing AI traffic through a managed gateway with rate limiting, authentication, guardrail enforcement, and full observability.
- Getting Started
- Agent and CLI Usage
- API Usage
- How the Gateway Works
- Router Types
  - Failover Router
  - Load Balancer (Weighted)
  - Smart Router
  - Combined (Multi-Layer)
- How Guardrails Work
  - Risk Categories
  - Anonymization and Rehydration
  - Policy Configuration
- Architecture
  - Communication Modes
  - Stack Components
  - Guardrail LLM Configuration
  - Data Residency
- Troubleshooting
Getting Started
Set up the gateway in three steps: create a provider, configure authentication, and call the endpoint.
1. Create a provider
Add a provider in the console by selecting OpenAI, Anthropic, or entering a custom URL. Set rate limits and optionally attach a guardrail policy.
Forward Caller Auth — when enabled, the gateway passes the caller's Authorization header straight to the upstream provider. Each user supplies their own API key; no shared secret is stored on the gateway.
2. Create a policy and API key (optional)
If Forward Caller Auth is off, create an access policy (full access or restricted to specific providers) and then create an API key linked to that policy. Copy the key — it will not be shown again.
Skip this step if the provider uses Forward Caller Auth — the caller's own key is used instead.
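The two auth modes boil down to which key ends up in the Authorization header. A minimal sketch (the header construction follows standard Bearer auth; the function name and parameters are illustrative, not part of the product API):

```python
def build_headers(forward_caller_auth: bool, caller_key: str = "", gateway_key: str = "") -> dict:
    """Pick which key goes in the Authorization header.

    With Forward Caller Auth the caller's own provider key is sent and
    the gateway forwards it upstream unchanged; otherwise the
    gateway-issued API key (tied to an access policy) is used.
    """
    key = caller_key if forward_caller_auth else gateway_key
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",
    }
```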
3. Call the endpoint
Every provider gets a unique gateway endpoint. Copy it from the console and point your client at it.
Agent & CLI Usage
Point any AI tool that supports a custom base URL at your gateway endpoint.
OpenAI Codex
OPENAI_BASE_URL={your-endpoint} codex

Claude Code

ANTHROPIC_BASE_URL={your-endpoint} claude

Any OpenAI-compatible tool
Set the base URL environment variable to your gateway endpoint. Works with Cursor, Continue, Aider, and any tool that reads OPENAI_BASE_URL.
If the provider does not use Forward Caller Auth, add the API key via OPENAI_API_KEY or ANTHROPIC_API_KEY accordingly.
API Usage
curl (OpenAI)
curl {your-endpoint}/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer {your-api-key}" \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Hello"}]
}'

curl (Anthropic)
curl {your-endpoint}/messages \
-H "Content-Type: application/json" \
-H "Authorization: Bearer {your-api-key}" \
-d '{
"model": "claude-sonnet-4-20250514",
"max_tokens": 1024,
"messages": [{"role": "user", "content": "Hello"}]
}'

OpenAI SDK
from openai import OpenAI
client = OpenAI(
base_url="{your-endpoint}",
api_key="{your-api-key}",
)
response = client.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Hello"}],
)

Anthropic SDK
import anthropic
client = anthropic.Anthropic(
base_url="{your-endpoint}",
api_key="{your-api-key}",
)
message = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=1024,
messages=[{"role": "user", "content": "Hello"}],
)

How It Works
Client ──▶ Gateway ──▶ Rate Limiter ──▶ Auth Check ──▶ Guardrail ──▶ Upstream Provider
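Rate-limited requests surface to the client as HTTP 429. A client-side retry sketch; note the X-RateLimit-Reset name and its seconds-until-reset meaning are assumptions, since the docs only mention the X-RateLimit-* header family:

```python
import time

def call_with_retry(send, max_attempts: int = 3):
    """Retry a gateway call on 429, sleeping until the advertised reset.

    `send` is any zero-argument callable returning (status, headers, body).
    X-RateLimit-Reset is assumed to hold seconds until the window resets.
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        # Back off for the advertised reset window (fallback: 1 second).
        time.sleep(float(headers.get("X-RateLimit-Reset", 1)))
    return status, body
```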
Requests that exceed the rate limit receive a 429 with X-RateLimit-* headers.

Troubleshooting

401: Invalid or missing API key. Check the Authorization header.
403: Policy doesn't allow access to this provider. Check the API key's linked policy.
429: Rate limit exceeded. Wait for the reset window or increase limits in provider settings.
400 / 403 (guardrail): Request blocked by guardrail policy. The response body includes the reason and detected categories.
503: Provider is inactive. Enable it in the console.
502: Upstream provider unreachable. Verify the provider URL and that the upstream service is healthy.

Router Types
Routers sit in front of multiple upstream providers and direct each request to the right destination. Create a router in the console by selecting two or more providers as upstreams, then choose a routing strategy.
Failover Router
Sends every request to the primary upstream. If the primary returns an error (5xx) or times out, the gateway automatically retries with the next upstream in priority order until one succeeds. If all upstreams fail, the gateway returns 503.
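The failover behavior described above can be sketched as a simple loop; upstreams here are plain callables, a stand-in for real HTTP calls:

```python
def failover(upstreams, request):
    """Try upstreams in priority order; first non-5xx response wins.

    Each upstream is a callable returning (status, body) or raising
    TimeoutError. A 5xx status or a timeout moves on to the next
    upstream; if every upstream fails, return 503 as the gateway does.
    """
    for upstream in upstreams:
        try:
            status, body = upstream(request)
        except TimeoutError:
            continue  # a timeout counts as an upstream failure
        if status < 500:
            return status, body  # success, or a 4xx worth surfacing
    return 503, "all upstreams failed"
```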
Load Balancer (Weighted Router)
Distributes requests across upstreams using weighted random selection. Each upstream has a weight value (1–100); the probability of selection equals the upstream's weight divided by the total weight of all upstreams.
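The selection rule (probability proportional to weight over total weight) can be written directly; the function shape is illustrative:

```python
import random

def pick_upstream(weights: dict, rng=random) -> str:
    """Weighted random selection: P(u) = weight(u) / sum of all weights.

    `weights` maps upstream name -> weight (1-100), as configured for a
    weighted router. `rng` is injectable so the choice can be tested.
    """
    total = sum(weights.values())
    roll = rng.uniform(0, total)
    cumulative = 0.0
    for upstream, weight in weights.items():
        cumulative += weight
        if roll <= cumulative:
            return upstream
    return upstream  # float edge case: fall back to the last upstream
```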
Smart Router
An LLM evaluates each incoming request and selects the optimal upstream based on quality, speed, and cost scores you define per model. The guardrail service analyzes the request content and picks the best match considering the configured scores.
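A deterministic stand-in for the LLM's judgement: the real smart router has the guardrail LLM weigh the request content against these scores, but the score-combination idea can be sketched as a weighted sum. All field names are illustrative, and "higher score is better" (including for cost) is an assumption:

```python
def score_upstream(scores: dict, priorities: dict) -> float:
    """Combine per-model quality/speed/cost scores into one number."""
    return sum(scores[k] * priorities.get(k, 1.0) for k in ("quality", "speed", "cost"))

def smart_route(upstreams: dict, priorities: dict) -> str:
    """Pick the upstream whose combined score is highest."""
    return max(upstreams, key=lambda u: score_upstream(upstreams[u], priorities))
```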
Combined (Multi-Layer Routing)
Chain routers together by selecting an existing router as an upstream of another router. For example, a smart router can pick between two failover routers, each with its own set of providers. This creates a multi-layer routing topology where each tier applies its own strategy independently.
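The composition can be sketched with plain callables: a router wrapped as a function has the same shape as a provider, so routers nest. Failover strategy shown; all names are illustrative:

```python
def make_failover(upstreams):
    """Wrap a list of upstreams into a single callable.

    Because the wrapper has the same (request) -> (status, body) shape
    as a plain upstream, it can itself be listed as an upstream of
    another router: that is the multi-layer composition.
    """
    def route(request):
        for upstream in upstreams:
            status, body = upstream(request)
            if status < 500:
                return status, body
        return 503, "all upstreams failed"
    return route

# Two-tier topology: an outer failover over two inner failovers.
tier_a = make_failover([lambda r: (500, "A1 down"), lambda r: (200, "from A2")])
tier_b = make_failover([lambda r: (200, "from B1")])
outer = make_failover([tier_a, tier_b])
```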
How It Works
Incoming Request
│
▼
Guardrail Service (LLM analyzes body for all 5 categories)
│
├── BLOCK (blocking category detected)
│ └── Return 403/422/400 error to caller
│
└── ALLOW
│
▼
Anonymize request body (replace sensitive text with tokens)
│
▼
Forward to upstream provider (tokens only, no raw PII)
│
▼
Receive upstream response
│
▼
Rehydrate response body (replace tokens back with original text)
│
▼
Return response to caller (original text restored)

If a detected category has blocking: true in the policy, the entire request is blocked. Otherwise it is allowed through. If the category has anonymization: true, detected text is replaced with tokens before forwarding, and the response is de-anonymized before returning to the client.

Risk Categories
Every request is scanned against five categories, each independently configurable in a policy:
- Personal Information
- Credentials
- Prompt Injection
- Malicious Content
- Sensitive Data
Anonymization & Rehydration
When a category has blocking: off and anonymization: on, detected sensitive text is replaced with tokens before the request reaches the upstream provider. The response is then rehydrated (de-anonymized) before returning to the caller. The caller and the upstream never see each other's sensitive data in raw form.
Request Phase (Anonymization)
The guardrail service returns an anonymization map — a list of detected text and their replacement tokens. The gateway replaces every occurrence in the request body before forwarding.
# Original request body
"Contact me at john@acme.com, my key is sk-abc123"

# After anonymization (forwarded to upstream)
"Contact me at pii_8f3a2b, my key is cred_e7c4d1"

# Anonymization map (stored internally)
john@acme.com → pii_8f3a2b
sk-abc123     → cred_e7c4d1
Response Phase (Rehydration)
When the upstream responds, the gateway reverses the anonymization map — replacing tokens back with the original text before returning the response to the caller.
# Upstream response (contains tokens)
"Sure, I'll contact pii_8f3a2b regarding..."

# After rehydration (returned to caller)
"Sure, I'll contact john@acme.com regarding..."
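Both phases are straightforward text substitution driven by the anonymization map, one direction per phase. A minimal sketch using the example values above (plain string replacement; the real service's matching rules are not documented here):

```python
def anonymize(text: str, mapping: dict) -> str:
    """Request phase: replace each detected value with its token."""
    for original, token in mapping.items():
        text = text.replace(original, token)
    return text

def rehydrate(text: str, mapping: dict) -> str:
    """Response phase: reverse the map, tokens back to original text."""
    for original, token in mapping.items():
        text = text.replace(token, original)
    return text
```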
Policy Configuration
A policy defines how each risk category is handled. Every policy must configure all 5 categories with three settings:
- Severity: low, medium, high, or critical. Affects the HTTP status code when blocking (low → 400, medium → 422, high/critical → 403).
- Blocking: when on, a detection in this category rejects the entire request.
- Anonymization: when on, detected text is replaced with tokens before forwarding and restored in the response.

Default Policy
Auto-created per region. Blocks prompt injection, malicious content, and sensitive data. Anonymizes personal information and credentials without blocking.
Category               Severity   Blocking   Anonymization
─────────────────────  ─────────  ─────────  ─────────────
Personal Information   low        off        on
Credentials            low        off        on
Prompt Injection       high       on         off
Malicious Content      critical   on         off
Sensitive Data         medium     on         off
Custom Policies
Create custom policies to adjust the behavior per category. For example, a "Strict" policy could block all categories, while a "Permissive" policy could anonymize everything without blocking.
# Example: Strict — block everything
Personal Information   high       on         off
Credentials            critical   on         off
Prompt Injection       critical   on         off
Malicious Content      critical   on         off
Sensitive Data         high       on         off

# Example: Permissive — anonymize, never block
Personal Information   low        off        on
Credentials            low        off        on
Prompt Injection       medium     off        on
Malicious Content      medium     off        on
Sensitive Data         low        off        on
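The severity-to-status mapping for blocking categories can be written out directly (the function name is illustrative; the mapping itself comes from the policy settings above):

```python
def blocking_status(severity: str) -> int:
    """HTTP status returned when a blocking category fires:
    low -> 400, medium -> 422, high/critical -> 403."""
    return {"low": 400, "medium": 422, "high": 403, "critical": 403}[severity]
```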
Troubleshooting
403: Request blocked by a high/critical severity category. The response body includes the reason and detected categories.
422: Request blocked by a medium severity category. Review the policy to adjust severity or switch to anonymization.
400: Request blocked by a low severity category. Consider switching the category to non-blocking with anonymization.
False positives: If legitimate requests are being blocked, lower the severity or disable blocking for that category. Use anonymization as a middle ground.
Guardrail timeout: The guardrail LLM is slow or unreachable. Check the LLM endpoint in your region's configuration. Requests fail open if the service is down.

Architecture
Each region is a self-contained Docker stack that communicates with the central console. The console sends configuration commands (create provider, update rate limits, sync policies) and the region executes them locally.
Console (Global) Region Stack (Your Infrastructure)
┌──────────────┐ ┌─────────────────────────────────┐
│ │ commands (SQS/HTTP) │ Region Agent │
│ Global │ ──────────────────────▶ │ ├── configures API gateway │
│ Stack │ │ ├── syncs policies to Redis │
│ │ ◀────────────────────── │ └── reports health │
│ │ callbacks + heartbeat│ │
└──────────────┘ │ API Gateway │
│ └── routes AI traffic │
│ │
│ Log Manager + Redis │
│ └── metrics, logs, policies │
│ │
│ Guardrail Service │
│ └── content validation (LLM) │
└─────────────────────────────────┘

Communication Modes
Regions support two connectivity types. Choose based on whether the region has a public endpoint or runs behind a firewall.
Private (SQS)
The region polls an AWS SQS FIFO queue for commands. No inbound network access required — the region only makes outbound connections. Ideal for regions behind firewalls or NAT.
Public (HTTP)
The console sends commands directly to the region's endpoint via HTTPS. The region agent registers itself as an API gateway service with key-auth and rate limiting. Requires the region to be reachable from the internet.
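Commands to a public region carry an HMAC signature that the region verifies before executing. A sketch of both sides; SHA-256 and canonical-JSON encoding are assumptions, since the docs state only that an HMAC signature is used:

```python
import hashlib
import hmac
import json

def sign_command(payload: dict, secret: bytes) -> str:
    """Console side: sign a command body so the region can verify it.

    The payload is serialized as canonical JSON (sorted keys, no extra
    whitespace) so both sides hash identical bytes.
    """
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify_command(payload: dict, secret: bytes, signature: str) -> bool:
    """Region side: constant-time comparison against the sent signature."""
    return hmac.compare_digest(sign_command(payload, secret), signature)
```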
Commands are delivered to the region's /api/commands endpoint with an HMAC signature. The response is synchronous.

Heartbeat
Both modes send periodic heartbeats (default: every hour) with stack health, service versions, and uptime. The console uses heartbeats to display region status and trigger automatic credential rotation.
Stack Components
Guardrail LLM Configuration
The guardrail service needs an LLM to analyze request content. You can use a local Ollama instance or any OpenAI-compatible endpoint. This is configured during region creation.
Local Ollama
Run Ollama on the same machine as the region stack. The guardrail service reaches it via host.docker.internal.
# Install and start Ollama
ollama serve

# Pull a model
ollama pull gpt-oss:120b

# When creating the region, configure:
#   Provider: Ollama
#   Endpoint: http://host.docker.internal:11434
#   Model: gpt-oss:120b
OpenAI-compatible endpoint
Use any provider that exposes an OpenAI-compatible chat completions API. Requires an API token.
# When creating the region, configure:
#   Provider: OpenAI-compatible
#   Endpoint: https://api.openai.com/v1
#   Model: gpt-4o-mini
#   Token: sk-...
The guardrail LLM is only used for content moderation within the guardrail service. It is separate from the AI providers you route traffic to through the gateway.
Data Residency
Logs, audit trails, and metrics stay entirely within the region stack. The global stack acts as a pass-through proxy for the console to visualize this data — it does not store, cache, or retain any of it. When you query logs or metrics in the console, the request is forwarded to the region's log manager in real time and the response is streamed back without persistence.
Troubleshooting
offline: No heartbeat received. Check that the region stack is running and can reach the console backend.
degraded: Heartbeat received but one or more services are unhealthy. Check the Stack tab in region details for per-service status.
SQS errors: For private regions, verify AWS credentials and queue URL. Credentials are auto-rotated; restart the stack if rotation was missed.
Guardrail down: Ensure the LLM endpoint is reachable from inside Docker. For Ollama, use host.docker.internal (not localhost).
Gateway unhealthy: Check that PostgreSQL is running and the migration completed. View logs with docker compose logs.