Documentation

Outgate AI is a platform for routing AI traffic through a managed gateway with rate limiting, authentication, guardrail enforcement, and full observability.

Getting Started

Set up the gateway in three steps: create a provider, configure authentication, and call the endpoint.

1. Create a provider

Add a provider in the console by selecting OpenAI, Anthropic, or entering a custom URL. Set rate limits and optionally attach a guardrail policy.

Forward Caller Auth — when enabled, the gateway passes the caller's Authorization header straight to the upstream provider. Each user supplies their own API key; no shared secret is stored on the gateway.

2. Create a policy and API key (optional)

If Forward Caller Auth is off, create an access policy (full access or restricted to specific providers) and then create an API key linked to that policy. Copy the key — it will not be shown again.

Skip this step if the provider uses Forward Caller Auth — the caller's own key is used instead.

3. Call the endpoint

Every provider gets a unique gateway endpoint. Copy it from the console and point your client at it.

Agent & CLI Usage

Point any AI tool that supports a custom base URL at your gateway endpoint.

OpenAI Codex

OPENAI_BASE_URL={your-endpoint} codex

Claude Code

ANTHROPIC_BASE_URL={your-endpoint} claude

Any OpenAI-compatible tool

Set the base URL environment variable to your gateway endpoint. Works with Cursor, Continue, Aider, and any tool that reads OPENAI_BASE_URL.

If the provider does not use Forward Caller Auth, add the API key via OPENAI_API_KEY or ANTHROPIC_API_KEY accordingly.

API Usage

curl (OpenAI)

curl {your-endpoint}/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer {your-api-key}" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

curl (Anthropic)

curl {your-endpoint}/messages \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer {your-api-key}" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello"}]
  }'

OpenAI SDK

from openai import OpenAI

client = OpenAI(
    base_url="{your-endpoint}",
    api_key="{your-api-key}",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

Anthropic SDK

import anthropic

client = anthropic.Anthropic(
    base_url="{your-endpoint}",
    api_key="{your-api-key}",
)

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}],
)

How It Works

Client ──▶ Gateway ──▶ Rate Limiter ──▶ Auth Check ──▶ Guardrail ──▶ Upstream Provider
Auth — The API key identifies the consumer. The linked policy determines which providers the key can reach. An API key can be added to the provider or forwarded from the caller.
Rate Limits — Hourly and daily caps per consumer. When exceeded, the gateway returns 429 with X-RateLimit-* headers.
Guardrails — Optional, per-provider. When attached, request content is scanned for PII, prompt injection, credentials, and malicious content before reaching the upstream.
Logging — Enable per-provider request/response body logging.
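When the gateway returns 429, clients should back off and retry. Here is a minimal retry sketch in Python (the Retry-After header name is illustrative; consult the X-RateLimit-* headers your gateway actually returns for the real reset window):

```python
import time

def call_with_backoff(send_request, max_retries=3):
    """Retry a gateway call on 429, backing off between attempts.

    `send_request` is any zero-argument callable returning an object
    with `.status_code` and `.headers` (e.g. a wrapped requests call).
    """
    for attempt in range(max_retries + 1):
        response = send_request()
        if response.status_code != 429:
            return response
        # Illustrative header; check the X-RateLimit-* headers the
        # gateway actually returns for the real reset window.
        wait = float(response.headers.get("Retry-After", 2 ** attempt))
        time.sleep(wait)
    return response  # still rate limited after all retries
```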

Troubleshooting

401 — Invalid or missing API key. Check the Authorization header.
403 — Policy doesn't allow access to this provider. Check the API key's linked policy.
429 — Rate limit exceeded. Wait for the reset window or increase limits in provider settings.
400 / 403 (guardrail) — Request blocked by guardrail policy. The response body includes the reason and detected categories.
503 — Provider is inactive. Enable it in the console.
502 — Upstream provider unreachable. Verify the provider URL and that the upstream service is healthy.

Router Types

Routers sit in front of multiple upstream providers and direct each request to the right destination. Create a router in the console by selecting two or more providers as upstreams, then choose a routing strategy.

Failover Router

Sends every request to the primary upstream. If the primary returns an error (5xx) or times out, the gateway automatically retries with the next upstream in priority order until one succeeds. If all upstreams fail, the gateway returns 503.

Use case — High availability and disaster recovery. Keep your service running even when a provider has an outage.
Configuration — Order upstreams by priority (first = primary). Optionally set a timeout per upstream in seconds.
Plan — Available on all plans (Free, Plus, Pro).
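The retry-in-priority-order behavior can be sketched in a few lines of Python; the `send` callable and return shape are illustrative, not a gateway API:

```python
def failover_route(upstreams, send):
    """Try each upstream in priority order; return the first success.

    `upstreams` is ordered (first = primary). `send(upstream)` returns
    an HTTP status code or raises TimeoutError. Mirrors the behavior
    described above; names and shapes are illustrative.
    """
    for upstream in upstreams:
        try:
            status = send(upstream)
        except TimeoutError:
            continue  # a timeout counts as a failed upstream
        if status < 500:
            return upstream, status  # success, or a client error passed through
    return None, 503  # every upstream failed
```

With upstreams ordered ["a", "b", "c"], a timeout on "a" and a 502 from "b" fall through to "c"; only when every upstream errors does the caller see 503.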

Load Balancer (Weighted Router)

Distributes requests across upstreams using weighted random selection. Each upstream has a weight value (1–100); the probability of selection equals the upstream's weight divided by the total weight of all upstreams.

Use case — Gradual model rollouts, cost optimization across providers, or A/B testing different models on live traffic.
Configuration — Assign a weight (1–100) to each upstream. Default weight is 50. The console shows the calculated percentage.
Plan — Available on all plans (Free, Plus, Pro).
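The selection rule above (probability = weight / total weight) can be sketched as follows; `pick_upstream` and the injectable `rng` are illustrative names, not gateway APIs:

```python
import random

def pick_upstream(weights, rng=random):
    """Weighted random selection: P(upstream) = weight / total weight.

    `weights` maps upstream name to its configured weight (1-100).
    `rng` is injectable so the choice can be made deterministic in tests.
    """
    total = sum(weights.values())
    point = rng.uniform(0, total)
    cumulative = 0
    for upstream, weight in weights.items():
        cumulative += weight
        if point <= cumulative:
            return upstream
    return upstream  # guard against floating-point edge cases
```

For example, weights of {"primary": 75, "canary": 25} send roughly 75% of traffic to "primary" and 25% to "canary".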

Smart Router

An LLM evaluates each incoming request and selects the optimal upstream based on the quality, speed, and cost scores you define per model. The routing decision is made by the guardrail service, which analyzes the request content and weighs it against the configured scores.

Use case — Intelligent routing where simple coding tasks go to a fast, cheap model while complex reasoning tasks go to a high-quality model — automatically.
Configuration — Set three scores per upstream: Quality (0–10), Speed (0–10), and Cost (0–10). Higher values indicate better performance in that dimension.
Requirement — Requires a guardrail-enabled region. The guardrail service handles the routing decision.
Plan — Available on Plus and Pro plans.
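A simplified view of the scoring step: in the real router the guardrail LLM derives the per-request preference from the request content, whereas this sketch takes the preference as an argument. All names are illustrative:

```python
def score_upstreams(upstreams, preference):
    """Rank upstreams by a weighted sum of their configured scores.

    `upstreams` maps name to {"quality": q, "speed": s, "cost": c},
    each 0-10 with higher meaning better, as configured in the console.
    `preference` weights the three dimensions for a given request.
    """
    def total(name):
        scores = upstreams[name]
        return sum(scores[dim] * weight for dim, weight in preference.items())
    return max(upstreams, key=total)
```

A request classified as complex reasoning would carry a high quality weight and land on the frontier model; a simple coding request weighted toward speed and cost lands on the cheap one.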

Combined (Multi-Layer Routing)

Chain routers together by selecting an existing router as an upstream of another router. For example, a smart router can pick between two failover routers, each with its own set of providers. This creates a multi-layer routing topology where each tier applies its own strategy independently.

Use case — Enterprise architectures that need both intelligent selection and high availability — the smart router picks the best tier, and each tier has its own failover chain.
Configuration — Create the inner routers first (e.g. two failover routers), then create an outer router that uses them as upstreams.

How It Works

Incoming Request
    │
    ▼
Guardrail Service (LLM analyzes body for all 5 categories)
    │
    ├── BLOCK (blocking category detected)
    │     └── Return 403/422/400 error to caller
    │
    └── ALLOW
          │
          ▼
    Anonymize request body (replace sensitive text with tokens)
          │
          ▼
    Forward to upstream provider (tokens only, no raw PII)
          │
          ▼
    Receive upstream response
          │
          ▼
    Rehydrate response body (replace tokens back with original text)
          │
          ▼
    Return response to caller (original text restored)
Detection — The guardrail LLM analyzes the request body against all 5 risk categories simultaneously. Each detection includes the matched text and its category.
Decision — If any detected category has blocking: true in the policy, the entire request is blocked. Otherwise it is allowed through.
Anonymization — For non-blocking categories with anonymization: true, detected text is replaced with tokens before forwarding. The response is de-anonymized before returning to the client.
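The decision rule can be sketched as a small function (the category names and policy shape here are illustrative):

```python
def guardrail_decision(detections, policy):
    """Block if any detected category is blocking; otherwise allow,
    returning the categories whose text should be anonymized.

    `detections` lists the category names the LLM flagged;
    `policy` maps category to {"blocking": bool, "anonymization": bool}.
    """
    blocked = [c for c in detections if policy[c]["blocking"]]
    if blocked:
        return ("block", blocked)
    to_anonymize = [c for c in detections if policy[c]["anonymization"]]
    return ("allow", to_anonymize)
```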

Risk Categories

Every request is scanned against these 5 categories. Each can be independently configured in a policy.

Personal Information — Names, email addresses, phone numbers, and other PII. Default: not blocking, anonymized — sensitive text is replaced before reaching the upstream.
Credentials — API keys, passwords, auth tokens, and secrets. Default: not blocking, anonymized — credentials are masked in the forwarded request.
Prompt Injection — Jailbreak attempts, role-switching, prompt leaks, filter bypasses. Default: blocking — request is rejected immediately.
Malicious Content — SQL injection, XSS, command injection, path traversal. Default: blocking — request is rejected immediately.
Sensitive Data — General sensitive data exposure. Default: blocking — request is rejected immediately.

Anonymization & Rehydration

When a category has blocking: off and anonymization: on, detected sensitive text is replaced with tokens before the request reaches the upstream provider. The response is then rehydrated (de-anonymized) before returning to the caller. The caller and the upstream never see each other's sensitive data in raw form.

Request Phase (Anonymization)

The guardrail service returns an anonymization map — a list of detected text and their replacement tokens. The gateway replaces every occurrence in the request body before forwarding.

# Original request body
"Contact me at john@acme.com, my key is sk-abc123"

# After anonymization (forwarded to upstream)
"Contact me at pii_8f3a2b, my key is cred_e7c4d1"

# Anonymization map (stored internally)
john@acme.com  →  pii_8f3a2b
sk-abc123      →  cred_e7c4d1

Response Phase (Rehydration)

When the upstream responds, the gateway reverses the anonymization map — replacing tokens back with the original text before returning the response to the caller.

# Upstream response (contains tokens)
"Sure, I'll contact pii_8f3a2b regarding..."

# After rehydration (returned to caller)
"Sure, I'll contact john@acme.com regarding..."
Scope — Only applies to categories with anonymization enabled. Blocking categories skip anonymization entirely — the request never reaches the upstream.
Transparency — The caller receives the original response with all text restored. The upstream only ever sees tokenized content — no raw sensitive data is exposed.
Streaming — Rehydration requires buffering the response body. When anonymization is active, streaming is disabled for that request.
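Both phases reduce to applying the anonymization map in opposite directions. A minimal sketch using the example map above (plain string replacement; the actual service may match spans more carefully):

```python
def anonymize(body, anonymization_map):
    """Request phase: replace each detected span with its token."""
    for original, token in anonymization_map.items():
        body = body.replace(original, token)
    return body

def rehydrate(body, anonymization_map):
    """Response phase: put the original text back in place of tokens."""
    for original, token in anonymization_map.items():
        body = body.replace(token, original)
    return body
```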

Policy Configuration

A policy defines how each risk category is handled. Every policy must configure all 5 categories with three settings:

Severity — Priority level: low, medium, high, or critical. Affects the HTTP status code when blocking (low → 400, medium → 422, high/critical → 403).
Blocking — When enabled, any detection in this category rejects the entire request. When disabled, the request is allowed through (with optional anonymization).
Anonymization — When enabled (and blocking is off), detected text is replaced with tokens in the forwarded request. The response is de-anonymized before returning to the caller. Not available when blocking is on.
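The severity-to-status rule can be captured in a single mapping (a sketch, not the gateway's code):

```python
def blocking_status(severity):
    """Map a category's severity to the HTTP status used when blocking:
    low -> 400, medium -> 422, high/critical -> 403."""
    return {"low": 400, "medium": 422, "high": 403, "critical": 403}[severity]
```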

Default Policy

Auto-created per region. Blocks prompt injection, malicious content, and sensitive data. Anonymizes personal information and credentials without blocking.

Category               Severity   Blocking   Anonymization
─────────────────────  ─────────  ─────────  ─────────────
Personal Information   low        off        on
Credentials            low        off        on
Prompt Injection       high       on         off
Malicious Content      critical   on         off
Sensitive Data         medium     on         off

Custom Policies

Create custom policies to adjust the behavior per category. For example, a "Strict" policy could block all categories, while a "Permissive" policy could anonymize everything without blocking.

# Example: Strict — block everything
Personal Information   high       on         off
Credentials            critical   on         off
Prompt Injection       critical   on         off
Malicious Content      critical   on         off
Sensitive Data         high       on         off

# Example: Permissive — anonymize, never block
Personal Information   low        off        on
Credentials            low        off        on
Prompt Injection       medium     off        on
Malicious Content      medium     off        on
Sensitive Data         low        off        on

Troubleshooting

403 — Request blocked by a high/critical severity category. The response body includes the reason and detected categories.
422 — Request blocked by a medium severity category. Review the policy to adjust severity or switch to anonymization.
400 — Request blocked by a low severity category. Consider switching the category to non-blocking with anonymization.
False positives — If legitimate requests are being blocked, lower the severity or disable blocking for that category. Use anonymization as a middle ground.
Guardrail timeout — The guardrail LLM is slow or unreachable. Check the LLM endpoint in your region's configuration. Requests fail open if the service is down.

Architecture

Each region is a self-contained Docker stack that communicates with the central console. The console sends configuration commands (create provider, update rate limits, sync policies) and the region executes them locally.

Console (Global)                          Region Stack (Your Infrastructure)
┌──────────────┐                         ┌─────────────────────────────────┐
│              │    commands (SQS/HTTP)  │  Region Agent                   │
│   Global     │ ──────────────────────▶ │    ├── configures API gateway   │
│   Stack      │                         │    ├── syncs policies to Redis  │
│              │ ◀────────────────────── │    └── reports health           │
│              │    callbacks + heartbeat│                                 │
└──────────────┘                         │  API Gateway                    │
                                         │    └── routes AI traffic        │
                                         │                                 │
                                         │  Log Manager + Redis            │
                                         │    └── metrics, logs, policies  │
                                         │                                 │
                                         │  Guardrail Service              │
                                         │    └── content validation (LLM) │
                                         └─────────────────────────────────┘

Communication Modes

Regions support two connectivity types. Choose based on whether the region has a public endpoint or runs behind a firewall.

Private (SQS)

The region polls an AWS SQS FIFO queue for commands. No inbound network access required — the region only makes outbound connections. Ideal for regions behind firewalls or NAT.

Commands — Console pushes to the SQS queue. The region agent polls with 20s long-polling, processes, then sends a signed callback.
Credentials — Scoped IAM credentials provisioned automatically. Rotated every 30 days via heartbeat response.

Public (HTTP)

The console sends commands directly to the region's endpoint via HTTPS. The region agent registers itself as an API gateway service with key-auth and rate limiting. Requires the region to be reachable from the internet.

Commands — Console POSTs to the region's /api/commands endpoint with an HMAC signature. The response is synchronous.
Security — Requests authenticated via HMAC-SHA256 signature and the key-auth plugin. Rate limited to 120 req/min.
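Signing and verifying a command with HMAC-SHA256 looks roughly like this; the exact signed payload and header name are deployment details, so treat it as a sketch:

```python
import hashlib
import hmac

def sign_command(secret: bytes, body: bytes) -> str:
    """HMAC-SHA256 over the raw request body, hex-encoded."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def verify_command(secret: bytes, body: bytes, signature: str) -> bool:
    """Receiving side: compare in constant time to avoid timing leaks."""
    return hmac.compare_digest(sign_command(secret, body), signature)
```

Verifying with `hmac.compare_digest` rather than `==` prevents timing attacks against the signature check.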

Heartbeat

Both modes send periodic heartbeats (default: every hour) with stack health, service versions, and uptime. The console uses heartbeats to display region status and trigger automatic credential rotation.

Stack Components

API Gateway — Routes AI traffic. Handles rate limiting, authentication, request transformation, and guardrail enforcement. PostgreSQL-backed.
Region Agent — Receives commands from the console and configures the API gateway accordingly — creates services, routes, upstreams, plugins. Also manages guardrail script injection.
Log Manager — Collects request/response logs from the API gateway via the http-log plugin. Stores them in Redis streams. Serves metrics, log queries, and guardrail policy lookups.
Guardrail — Validates request content using an LLM. Detects PII, credentials, prompt injection, and malicious content. Returns anonymization maps for sensitive data replacement.
Redis — Backing store for the log manager — holds log streams, metrics aggregations, and guardrail policy configs. Persistent volume.

Guardrail LLM Configuration

The guardrail service needs an LLM to analyze request content. You can use a local Ollama instance or any OpenAI-compatible endpoint. This is configured during region creation.

Local Ollama

Run Ollama on the same machine as the region stack. The guardrail service reaches it via host.docker.internal.

# Install and start Ollama
ollama serve

# Pull a model
ollama pull gpt-oss:120b

# When creating the region, configure:
#   Provider: Ollama
#   Endpoint: http://host.docker.internal:11434
#   Model:    gpt-oss:120b

OpenAI-compatible endpoint

Use any provider that exposes an OpenAI-compatible chat completions API. Requires an API token.

# When creating the region, configure:
#   Provider: OpenAI-compatible
#   Endpoint: https://api.openai.com/v1
#   Model:    gpt-4o-mini
#   Token:    sk-...

The guardrail LLM is only used for content moderation within the guardrail service. It is separate from the AI providers you route traffic to through the gateway.

Data Residency

Logs, audit trails, and metrics stay entirely within the region stack. The global stack acts as a pass-through proxy that lets the console visualize this data — it does not store, cache, or retain any of it. When you query logs or metrics in the console, the request is forwarded to the region's log manager in real time and the response is streamed back without persistence.

Storage — All request logs, response bodies, guardrail decisions, and metrics are stored in the region's Redis instance — never in the global stack.
Visualization — The console proxies queries through the global stack to the region's log manager. No data is retained at the global level.
Retention — Data retention is controlled by the region's Redis configuration. The global stack has zero data retention for logs, trails, or metrics.

Troubleshooting

offline — No heartbeat received. Check that the region stack is running and can reach the console backend.
degraded — Heartbeat received but one or more services are unhealthy. Check the Stack tab in region details for per-service status.
SQS errors — For private regions, verify the AWS credentials and queue URL. Credentials are auto-rotated — restart the stack if rotation was missed.
Guardrail down — Ensure the LLM endpoint is reachable from inside Docker. For Ollama, use host.docker.internal (not localhost).
Gateway unhealthy — Check that PostgreSQL is running and the migration completed. View logs with docker compose logs.