Model Governance

Model Governance lets you control which AI models are available to your organization, set usage policies, configure routing strategies, and monitor consumption. Access these features from Agents Controls in the sidebar.

Overview

The Agents Controls section has four tabs:
| Tab | Description |
| --- | --- |
| Models | Configure allowed models and policies |
| Usage | Monitor consumption against quotas |
| Service Accounts | Machine-to-machine authentication |
| Agents | Per-agent model restrictions |

Models Configuration

Allowed Models

By default, organizations can use all models enabled by the platform administrator. You can restrict this to a specific list.
  1. Go to Agents Controls > Models
  2. Toggle Restrict Models
  3. Select which models to allow
  4. Click Save

Default Models

Set the default models used when agents don’t specify one:
| Setting | Description |
| --- | --- |
| Default Completion Model | Used for chat and text generation |
| Default Embedding Model | Used for vector embeddings |

Quota Policy

Configure what happens when quota limits are reached:
| Policy | Behavior |
| --- | --- |
| Hard Block | Requests fail with a quota exceeded error |
| Soft Downgrade | Requests fall back to a cheaper model |

Downgrade Mapping

When using soft downgrade, configure which models to substitute:
  • claude-3-opus → claude-3-sonnet
  • gpt-4 → gpt-3.5-turbo
This ensures continuity when expensive models hit quota limits.
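
Conceptually, the substitution is a simple lookup: when quota is exhausted under Soft Downgrade, the mapped model is used instead of the requested one. A minimal sketch (function and dictionary names are illustrative, not the gateway's actual API; the behavior for unmapped models here is an assumption):

```python
# Illustrative sketch of soft-downgrade substitution; names are hypothetical.
DOWNGRADE_MAP = {
    "claude-3-opus": "claude-3-sonnet",
    "gpt-4": "gpt-3.5-turbo",
}

def resolve_model(requested: str, quota_exceeded: bool) -> str:
    """Return the model to actually call under the Soft Downgrade policy."""
    if quota_exceeded:
        # Fall back to the mapped cheaper model; assumption: an unmapped
        # model is passed through unchanged.
        return DOWNGRADE_MAP.get(requested, requested)
    return requested
```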

Failover Mapping

Configure automatic failover when a model is unavailable:
  • claude-3-sonnet → claude-3-haiku
  • gpt-4 → gpt-4-turbo
Failover activates when the primary model returns errors, not for quota limits. Failover behavior:
  • 5xx errors: switches to the failover model from the mapping, or falls back to the default completion model (with linear backoff 1s, 2s, 3s…)
  • 429 rate limits: retries the same model after a 5s backoff
  • Other 4xx errors: returned immediately without retry
  • Up to 3 attempts (configurable, hard cap of 10)
  • Failover models are validated against governance access controls before use
  • Failover only applies to non-streaming requests
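
The retry rules above can be sketched as a simplified loop (illustrative only; `call_model` and `ModelError` are assumed stand-ins, not the gateway's internals):

```python
import time

MAX_ATTEMPTS = 3  # configurable, hard cap of 10 per the rules above

class ModelError(Exception):
    """Stand-in for an HTTP-status-coded provider error."""
    def __init__(self, status):
        super().__init__(f"model error {status}")
        self.status = status

def call_with_failover(call_model, model, failover_map, default_model,
                       max_attempts=MAX_ATTEMPTS, sleep=time.sleep):
    """Sketch: 5xx -> switch to the failover model with linear backoff,
    429 -> retry the same model after 5s, other 4xx -> raise immediately."""
    current = model
    for attempt in range(1, max_attempts + 1):
        try:
            return call_model(current)
        except ModelError as err:
            if attempt == max_attempts:
                raise  # attempts exhausted
            if err.status == 429:
                sleep(5)             # rate limit: retry same model after 5s
            elif 500 <= err.status < 600:
                sleep(attempt)       # linear backoff: 1s, 2s, 3s...
                # Use the mapped failover model, else the default completion model.
                current = failover_map.get(current, default_model)
            else:
                raise                # other 4xx: returned without retry
```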

Model Routing

The LLM Gateway supports intelligent model routing — selecting the best model for a request based on configurable strategies. Use model: "auto" in API calls to trigger routing.
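
For example, a chat-completion request that triggers routing is an ordinary payload with the model set to "auto" (the endpoint URL and auth headers are omitted here and depend on your deployment):

```python
# Hypothetical request body for the gateway's OpenAI-compatible endpoint.
payload = {
    "model": "auto",  # "auto" asks the gateway to pick a model via routing
    "messages": [
        {"role": "user", "content": "Summarize this support ticket."},
    ],
}
```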

Routing Strategies

| Strategy | Description |
| --- | --- |
| Disabled | No automatic routing; the specified model is used |
| Rules | Rule-based routing; the first matching rule wins |
| LLM Classifier | A cheap LLM classifies the request and maps the category to a model |
| Capabilities | Queries the model catalog for enabled models matching required tags |
| Cost Optimized | Same as Capabilities, but iterates cost tiers (low → medium → high) to pick the cheapest match |
| Hybrid | Tries rules first, falls back to the LLM classifier if no rule matches |

Rule-Based Routing

Define conditions to route requests to different models:
```yaml
routing:
  strategy: rules
  rules:
    - condition: messages_count
      operator: "<"
      threshold: 5
      model: claude-3-haiku
    - condition: messages_count
      operator: ">="
      threshold: 5
      model: claude-3-sonnet
```
The only supported condition type is messages_count with operators <, >, <=, >=, =. If the selected model is blocked by governance, routing falls back to the default.
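
First-match-wins evaluation of those rules can be sketched as follows (illustrative code, not the gateway's implementation):

```python
import operator

# The five operators the docs list for the messages_count condition.
OPS = {"<": operator.lt, ">": operator.gt, "<=": operator.le,
       ">=": operator.ge, "=": operator.eq}

def route(messages_count, rules, default_model):
    """Return the model from the first matching rule, else the default
    (used here as a stand-in for the governance fallback)."""
    for rule in rules:
        if OPS[rule["operator"]](messages_count, rule["threshold"]):
            return rule["model"]
    return default_model

rules = [
    {"condition": "messages_count", "operator": "<", "threshold": 5,
     "model": "claude-3-haiku"},
    {"condition": "messages_count", "operator": ">=", "threshold": 5,
     "model": "claude-3-sonnet"},
]
```

With the two rules from the YAML above, a short conversation routes to claude-3-haiku and a longer one to claude-3-sonnet.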

Usage Monitoring

The Usage tab shows consumption against your subscription quotas.

Tracked Metrics

| Metric | Type | Description |
| --- | --- | --- |
| llm.requests.rpm | Rate | Requests per minute |
| llm.requests.daily | Rate | Requests per day |
| llm.tokens.monthly | Cumulative | Total tokens this month |
| llm.cost.monthly | Cumulative | Total cost this month |

Understanding Quotas

Quotas are defined in your subscription:
  • Rate limits reset after the time window (minute, hour, day)
  • Cumulative limits accumulate until the billing period resets
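
The two reset semantics can be expressed as a small sketch (names are illustrative, not the platform's API):

```python
# Illustrative sketch of the two quota reset semantics.
def counter_after_tick(kind: str, value: int, window_elapsed: bool,
                       billing_period_reset: bool) -> int:
    """Rate counters reset when their time window elapses; cumulative
    counters only reset when the billing period rolls over."""
    if kind == "rate" and window_elapsed:
        return 0
    if kind == "cumulative" and billing_period_reset:
        return 0
    return value
```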

Usage Display

Each metric shows:
  • Current value vs. limit
  • Percentage consumed
  • Visual progress bar (yellow at 80%, red at 95%)
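
The color thresholds amount to a simple percentage check (a sketch; the neutral color below 80% and inclusive thresholds are assumptions):

```python
def usage_level(current: float, limit: float) -> str:
    """Map consumption to the progress-bar color: yellow at 80%, red at 95%."""
    pct = 100 * current / limit
    if pct >= 95:
        return "red"
    if pct >= 80:
        return "yellow"
    return "normal"  # assumption: neutral color below 80%
```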

Per-Agent Model Restrictions

The Agents tab lets you restrict which models specific agents can use.

Why Restrict Agents?

  • Cost control: Limit expensive model usage to specific agents
  • Compliance: Ensure sensitive agents only use approved models
  • Testing: Restrict test agents to cheaper models

Configuring Agent Models

  1. Go to Agents Controls > Agents
  2. Find the agent to configure
  3. Click Configure Models
  4. Select allowed models (or leave empty for org defaults)
  5. Save changes

Service Accounts

Service accounts provide machine-to-machine authentication. See Identity & Access for details. In the context of Model Governance:
  • Link service accounts to specific agents
  • Track which service accounts consume LLM resources
  • Control model access per service account

Model Access Control

When a request arrives, the LLM Gateway validates the requested model against three sequential allowlists:
  1. Org allowlist — Is the model in the organization’s allowed models? (if the list exists and has entries)
  2. Agent allowlist — Is the model in the agent’s allowed models? (passed by agent-factory)
  3. API key scopes — Is the model allowed by the API key’s scopes?
Each check is skipped if the corresponding list is empty or absent. If any check fails, the behavior depends on the quota policy:
  • Soft Downgrade: silently swaps to the default completion model
  • Hard Block (default): returns a 403 error with MODEL_NOT_ALLOWED
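
The three sequential checks can be sketched like this (illustrative code; an empty or missing list is skipped, as described above):

```python
def check_model_access(model, org_allowed=None, agent_allowed=None,
                       key_scopes=None):
    """Return True if the model passes every non-empty allowlist, in the
    order: org allowlist, agent allowlist, API key scopes."""
    for allowlist in (org_allowed, agent_allowed, key_scopes):
        if allowlist and model not in allowlist:
            return False
    return True

def resolve(model, default_model, policy="hard_block", **allowlists):
    """Apply the quota policy when a check fails: Soft Downgrade swaps to
    the default completion model, Hard Block rejects the request."""
    if check_model_access(model, **allowlists):
        return model
    if policy == "soft_downgrade":
        return default_model
    raise PermissionError("MODEL_NOT_ALLOWED")  # surfaced as a 403
```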

Carbon Footprint Tracking

Every LLM call includes an estimated environmental impact in the response:
```json
{
  "usage": {
    "carbon": {
      "energy": { "value": 0.000012, "unit": "kWh" },
      "gwp": { "value": 0.0000057, "unit": "kgCO2eq" }
    }
  }
}
```
The calculation considers GPU energy per token, server overhead, datacenter PUE (Power Usage Effectiveness), and regional emission factors. Results include uncertainty margins (+/- 20%).
| PUE Profile | Multiplier |
| --- | --- |
| Efficient | 1.1 |
| Average (default) | 1.58 |
| Inefficient | 2.0 |
Carbon data is included in analytics events and powers the observability dashboards.
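
As a rough illustration of how the PUE multiplier scales the estimate (the per-token energy coefficient below is a made-up placeholder, not the platform's actual figure):

```python
# PUE multipliers from the table above.
PUE = {"efficient": 1.1, "average": 1.58, "inefficient": 2.0}

def energy_kwh(tokens, kwh_per_token, pue_profile="average",
               server_overhead=1.0):
    """Facility energy ~= GPU energy * server overhead * datacenter PUE.
    kwh_per_token and server_overhead are hypothetical inputs."""
    return tokens * kwh_per_token * server_overhead * PUE[pue_profile]
```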

Supported Providers

The LLM Gateway abstracts multiple providers behind a unified API:
| Provider | Models | Notes |
| --- | --- | --- |
| OpenAI | GPT-5, GPT-4o, o3-mini, embeddings, DALL-E 3 | Direct API |
| Azure OpenAI | GPT-5, GPT-4o, embeddings, Claude (via Azure AI) | Multiple resource configs |
| OpenAI-compatible | Gemini, DeepSeek, Mistral, Cerebras, OVH, Linagora | Via openailike provider type |
| Anthropic | Claude Sonnet 4.5, Claude 3.7 Sonnet, Claude 3.5 Sonnet | Native API |
| Google Vertex AI | Gemini 2.5/3, Imagen 4.0, text-embedding-005 | Via model aliases + service account |
| AWS Bedrock | Claude, Titan, Cohere, Nova, Llama | Multiple region/credential sets |
All providers are normalized to the OpenAI API format for chat completions and streaming (SSE).

Best Practices

Start Restrictive

Begin with a limited model list and expand based on need

Use Soft Downgrade

Prefer soft downgrade to maintain service during quota limits

Monitor Usage

Set alerts before hitting quota limits

Configure Failover

Ensure critical workflows have failover models

Common Scenarios

To minimize costs while maintaining quality:
  1. Enable Soft Downgrade policy
  2. Configure downgrade mapping:
    • claude-3-opus → claude-3-sonnet
    • gpt-4 → gpt-3.5-turbo
  3. Set conservative monthly token limits
  4. Use rule-based routing to prefer cheaper models for simple requests

Next Steps

Capabilities

Manage tools, MCP servers, and guardrails

Observability

Monitor model costs and performance