Model Governance

Model Governance lets you control which AI models are available to your organization, set usage policies, configure routing strategies, and monitor consumption. Access these features from Agents Controls in the sidebar.

Overview

The Agents Controls section has four tabs:
| Tab | Description |
| --- | --- |
| Models | Configure allowed models and policies |
| Usage | Monitor consumption against quotas |
| Service Accounts | Machine-to-machine authentication |
| Agents | Per-agent model restrictions |

Models Configuration

Allowed Models

By default, organizations can use all models enabled by the platform administrator. You can restrict this to a specific list.
  1. Go to Agents Controls > Models
  2. Toggle Restrict Models
  3. Select which models to allow
  4. Click Save

Default Models

Set the default models used when agents don’t specify one:
| Setting | Description |
| --- | --- |
| Default Completion Model | Used for chat and text generation |
| Default Embedding Model | Used for vector embeddings |

Quota Policy

Configure what happens when quota limits are reached:
| Policy | Behavior |
| --- | --- |
| Hard Block | Requests fail with a quota exceeded error |
| Soft Downgrade | Requests fall back to a cheaper model |

Downgrade Mapping

When using soft downgrade, configure which models to substitute:
  • claude-3-opus → claude-3-sonnet
  • gpt-4 → gpt-3.5-turbo
This ensures continuity when expensive models hit quota limits.
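
Conceptually, the substitution is a simple lookup: when quota is exhausted under Soft Downgrade, the mapped model is used instead of the requested one. A minimal sketch (function and dictionary names are illustrative, not the gateway's actual API; the behavior for unmapped models here is an assumption):

```python
# Illustrative sketch of soft-downgrade substitution; names are hypothetical.
DOWNGRADE_MAP = {
    "claude-3-opus": "claude-3-sonnet",
    "gpt-4": "gpt-3.5-turbo",
}

def resolve_model(requested: str, quota_exceeded: bool) -> str:
    """Return the model to actually call under the Soft Downgrade policy."""
    if quota_exceeded:
        # Fall back to the mapped cheaper model; assumption: an unmapped
        # model is passed through unchanged.
        return DOWNGRADE_MAP.get(requested, requested)
    return requested
```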

Failover Mapping

Configure automatic failover when a model is unavailable:
  • claude-3-sonnet → claude-3-haiku
  • gpt-4 → gpt-4-turbo
Failover activates when the primary model returns errors, not for quota limits. Failover behavior:
  • 5xx errors: switches to the failover model from the mapping, or falls back to the default completion model (with linear backoff 1s, 2s, 3s…)
  • 429 rate limits: retries the same model after a 5s backoff
  • Other 4xx errors: returned immediately without retry
  • Up to 3 attempts (configurable, hard cap of 10)
  • Failover models are validated against governance access controls before use
  • Failover only applies to non-streaming requests
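
The retry rules above can be sketched as a simplified loop (illustrative only; `call_model` and `ModelError` are assumed stand-ins, not the gateway's internals):

```python
import time

MAX_ATTEMPTS = 3  # configurable, hard cap of 10 per the rules above

class ModelError(Exception):
    """Stand-in for an HTTP-status-coded provider error."""
    def __init__(self, status):
        super().__init__(f"model error {status}")
        self.status = status

def call_with_failover(call_model, model, failover_map, default_model,
                       max_attempts=MAX_ATTEMPTS, sleep=time.sleep):
    """Sketch: 5xx -> switch to the failover model with linear backoff,
    429 -> retry the same model after 5s, other 4xx -> raise immediately."""
    current = model
    for attempt in range(1, max_attempts + 1):
        try:
            return call_model(current)
        except ModelError as err:
            if attempt == max_attempts:
                raise  # attempts exhausted
            if err.status == 429:
                sleep(5)             # rate limit: retry same model after 5s
            elif 500 <= err.status < 600:
                sleep(attempt)       # linear backoff: 1s, 2s, 3s...
                # Use the mapped failover model, else the default completion model.
                current = failover_map.get(current, default_model)
            else:
                raise                # other 4xx: returned without retry
```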

Model Routing

The LLM Gateway supports intelligent model routing — selecting the best model for a request based on configurable strategies. Use model: "auto" in API calls to trigger routing.
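
For example, a chat-completion request that triggers routing is an ordinary payload with the model set to "auto" (the endpoint URL and auth headers are omitted here and depend on your deployment):

```python
# Hypothetical request body for the gateway's OpenAI-compatible endpoint.
payload = {
    "model": "auto",  # "auto" asks the gateway to pick a model via routing
    "messages": [
        {"role": "user", "content": "Summarize this support ticket."},
    ],
}
```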

Routing Strategies

| Strategy | Description |
| --- | --- |
| Disabled | No automatic routing; the specified model is used |
| Rules | Rule-based routing; the first matching rule wins |
| LLM Classifier | A cheap LLM classifies the request and maps the category to a model |
| Capabilities | Queries the model catalog for enabled models matching required tags |
| Cost Optimized | Same as Capabilities, but iterates cost tiers (low → medium → high) to pick the cheapest match |
| Hybrid | Tries rules first, falls back to the LLM classifier if no rule matches |

Rule-Based Routing

Define conditions to route requests to different models:
```yaml
routing:
  strategy: rules
  rules:
    - condition: messages_count
      operator: "<"
      threshold: 5
      model: claude-3-haiku
    - condition: messages_count
      operator: ">="
      threshold: 5
      model: claude-3-sonnet
```
The only supported condition type is messages_count with operators <, >, <=, >=, =. If the selected model is blocked by governance, routing falls back to the default.
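
First-match-wins evaluation of those rules can be sketched as follows (illustrative code, not the gateway's implementation):

```python
import operator

# The five operators the docs list for the messages_count condition.
OPS = {"<": operator.lt, ">": operator.gt, "<=": operator.le,
       ">=": operator.ge, "=": operator.eq}

def route(messages_count, rules, default_model):
    """Return the model from the first matching rule, else the default
    (used here as a stand-in for the governance fallback)."""
    for rule in rules:
        if OPS[rule["operator"]](messages_count, rule["threshold"]):
            return rule["model"]
    return default_model

rules = [
    {"condition": "messages_count", "operator": "<", "threshold": 5,
     "model": "claude-3-haiku"},
    {"condition": "messages_count", "operator": ">=", "threshold": 5,
     "model": "claude-3-sonnet"},
]
```

With the two rules from the YAML above, a short conversation routes to claude-3-haiku and a longer one to claude-3-sonnet.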

Usage Monitoring

The Usage tab shows consumption against your subscription quotas.

Tracked Metrics

| Metric | Type | Description |
| --- | --- | --- |
| llm.requests.rpm | Rate | Requests per minute |
| llm.requests.daily | Rate | Requests per day |
| llm.tokens.monthly | Cumulative | Total tokens this month |
| llm.cost.monthly | Cumulative | Total cost this month |

Understanding Quotas

Quotas are defined in your subscription:
  • Rate limits reset after the time window (minute, hour, day)
  • Cumulative limits accumulate until the billing period resets
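
The two reset semantics can be expressed as a small sketch (names are illustrative, not the platform's API):

```python
# Illustrative sketch of the two quota reset semantics.
def counter_after_tick(kind: str, value: int, window_elapsed: bool,
                       billing_period_reset: bool) -> int:
    """Rate counters reset when their time window elapses; cumulative
    counters only reset when the billing period rolls over."""
    if kind == "rate" and window_elapsed:
        return 0
    if kind == "cumulative" and billing_period_reset:
        return 0
    return value
```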

Usage Display

Each metric shows:
  • Current value vs. limit
  • Percentage consumed
  • Visual progress bar (yellow at 80%, red at 95%)
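
The color thresholds amount to a simple percentage check (a sketch; the neutral color below 80% and inclusive thresholds are assumptions):

```python
def usage_level(current: float, limit: float) -> str:
    """Map consumption to the progress-bar color: yellow at 80%, red at 95%."""
    pct = 100 * current / limit
    if pct >= 95:
        return "red"
    if pct >= 80:
        return "yellow"
    return "normal"  # assumption: neutral color below 80%
```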

Per-Agent Model Restrictions

The Agents tab lets you restrict which models specific agents can use.

Why Restrict Agents?

  • Cost control: Limit expensive model usage to specific agents
  • Compliance: Ensure sensitive agents only use approved models
  • Testing: Restrict test agents to cheaper models

Configuring Agent Models

  1. Go to Agents Controls > Agents
  2. Find the agent to configure
  3. Click Configure Models
  4. Select allowed models (or leave empty for org defaults)
  5. Save changes

Service Accounts

Service accounts provide machine-to-machine authentication. See Identity & Access for details. In the context of Model Governance:
  • Link service accounts to specific agents
  • Track which service accounts consume LLM resources
  • Control model access per service account

Model Access Control

When a request arrives, the LLM Gateway validates the requested model against three sequential allowlists:
  1. Org allowlist — Is the model in the organization’s allowed models? (if the list exists and has entries)
  2. Agent allowlist — Is the model in the agent’s allowed models? (passed by agent-factory)
  3. API key scopes — Is the model allowed by the API key’s scopes?
Each check is skipped if the corresponding list is empty or absent. If any check fails, the behavior depends on the quota policy:
  • Soft Downgrade: silently swaps to the default completion model
  • Hard Block (default): returns a 403 error with MODEL_NOT_ALLOWED
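
The three sequential checks can be sketched like this (illustrative code; an empty or missing list is skipped, as described above):

```python
def check_model_access(model, org_allowed=None, agent_allowed=None,
                       key_scopes=None):
    """Return True if the model passes every non-empty allowlist, in the
    order: org allowlist, agent allowlist, API key scopes."""
    for allowlist in (org_allowed, agent_allowed, key_scopes):
        if allowlist and model not in allowlist:
            return False
    return True

def resolve(model, default_model, policy="hard_block", **allowlists):
    """Apply the quota policy when a check fails: Soft Downgrade swaps to
    the default completion model, Hard Block rejects the request."""
    if check_model_access(model, **allowlists):
        return model
    if policy == "soft_downgrade":
        return default_model
    raise PermissionError("MODEL_NOT_ALLOWED")  # surfaced as a 403
```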

Carbon Footprint Tracking

Every LLM call includes an estimated environmental impact in the response:
```json
{
  "usage": {
    "carbon": {
      "energy": { "value": 0.000012, "unit": "kWh" },
      "gwp": { "value": 0.0000057, "unit": "kgCO2eq" }
    }
  }
}
```
The calculation considers GPU energy per token, server overhead, datacenter PUE (Power Usage Effectiveness), and regional emission factors. Results include uncertainty margins (+/- 20%).
| PUE Profile | Multiplier |
| --- | --- |
| Efficient | 1.1 |
| Average (default) | 1.58 |
| Inefficient | 2.0 |
Carbon data is included in analytics events and powers the observability dashboards.
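
As a rough illustration of how the PUE multiplier scales the estimate (the per-token energy coefficient below is a made-up placeholder, not the platform's actual figure):

```python
# PUE multipliers from the table above.
PUE = {"efficient": 1.1, "average": 1.58, "inefficient": 2.0}

def energy_kwh(tokens, kwh_per_token, pue_profile="average",
               server_overhead=1.0):
    """Facility energy ~= GPU energy * server overhead * datacenter PUE.
    kwh_per_token and server_overhead are hypothetical inputs."""
    return tokens * kwh_per_token * server_overhead * PUE[pue_profile]
```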

Supported Providers

The LLM Gateway abstracts multiple providers behind a unified API:
| Provider | Models | Notes |
| --- | --- | --- |
| OpenAI | GPT-5, GPT-4o, o3-mini, embeddings, DALL-E 3 | Direct API |
| Azure OpenAI | GPT-5, GPT-4o, embeddings, Claude (via Azure AI) | Multiple resource configs |
| OpenAI-compatible | Gemini, DeepSeek, Mistral, Cerebras, OVH, Linagora | Via openailike provider type |
| Anthropic | Claude Sonnet 4.5, Claude 3.7 Sonnet, Claude 3.5 Sonnet | Native API |
| Google Vertex AI | Gemini 2.5/3, Imagen 4.0, text-embedding-005 | Via model aliases + service account |
| AWS Bedrock | Claude, Titan, Cohere, Nova, Llama | Multiple region/credential sets |
All providers are normalized to the OpenAI API format for chat completions and streaming (SSE).

Best Practices

Start Restrictive

Begin with a limited model list and expand based on need

Use Soft Downgrade

Prefer soft downgrade to maintain service during quota limits

Monitor Usage

Set alerts before hitting quota limits

Configure Failover

Ensure critical workflows have failover models

Common Scenarios

To minimize costs while maintaining quality:
  1. Enable Soft Downgrade policy
  2. Configure downgrade mapping:
    • claude-3-opus → claude-3-sonnet
    • gpt-4 → gpt-3.5-turbo
  3. Set conservative monthly token limits
  4. Use rule-based routing to prefer cheaper models for simple requests

Next Steps

Capabilities

Manage tools, MCP servers, and guardrails

Observability

Monitor model costs and performance