March

March brings stronger enterprise controls: vault-backed credentials, gateway limits that match how teams plan spend, and guardrails that align with your existing security stack. Alongside those themes, we've shipped upgrades across the platform, gateway, observability, guardrails, and provider ecosystem. See what's new:

Summary

Area	Updates
Platform	Secret References; weekly rate and budget windows (rpw) and endpoint-scoped rate limits
Observability	GCS log storage via GCP WIF from AWS; analytics for archived workspaces and workspace slugs in filters
Guardrails	Zscaler AI Guard; Akto Agentic Security; Bedrock Guardrails `customHost`; required metadata key–value guardrails
Models and providers	DeepInfra; DeepSeek; Vertex metadata labels, enterprise web search, AWS–GCP WIF; Azure AI Foundry rerank; Bedrock batch embeddings

Platform

Secret References

Instead of entering keys directly in Portkey, use Secret References to point Portkey at credentials stored in your external vault (AWS Secrets Manager, Azure Key Vault, or HashiCorp Vault). Map integrations and virtual keys with secret_mappings so Portkey fetches values at runtime.

This keeps sensitive material in infrastructure you already control and audit. See how to configure Secret References
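To make the runtime lookup concrete, here is a minimal sketch of the idea, not Portkey's implementation. The `SecretReference` shape, the `resolve` helper, and the in-memory `FAKE_VAULT` are hypothetical stand-ins for illustration; the actual `secret_mappings` schema is described in the Secret References documentation.

```python
from dataclasses import dataclass

# Hypothetical stand-in for an external vault such as AWS Secrets Manager.
FAKE_VAULT = {"prod/openai": {"api_key": "sk-example"}}


@dataclass(frozen=True)
class SecretReference:
    """Illustrative pointer to a value held in an external vault (assumed shape)."""
    secret_name: str  # name or path of the secret in the vault
    secret_key: str   # field inside that secret


def resolve(mappings: dict, vault: dict) -> dict:
    """Fetch each mapped credential at call time instead of storing it."""
    resolved = {}
    for target_field, ref in mappings.items():
        try:
            resolved[target_field] = vault[ref.secret_name][ref.secret_key]
        except KeyError as exc:
            raise LookupError(f"secret not found for {target_field!r}") from exc
    return resolved


# Hypothetical mapping: the integration's api_key field points at the vault entry.
secret_mappings = {"api_key": SecretReference("prod/openai", "api_key")}
credentials = resolve(secret_mappings, FAKE_VAULT)
```

The point of the pattern is that only the reference is stored with the integration; the secret value itself is read from the vault when it is needed.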

Weekly and endpoint-scoped rate limits

You can now set budget and usage limits on weekly windows (rpw), so caps align with how teams plan and review spend week over week, not just minute-by-minute or monthly aggregates.

You can also scope limits by endpoint type, so different API surfaces (for example, chat completions, embeddings, or admin-style routes) can carry different limits instead of one global rule across everything. See Budget & rate limit policies
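As a rough illustration of how a weekly window and endpoint scoping combine, the sketch below counts requests per endpoint type per week. It assumes fixed ISO calendar weeks in UTC, which may differ from how the gateway defines its weekly window; the class name, endpoint labels, and limits are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def week_key(ts: datetime) -> tuple:
    """Identify the ISO week (year, week number) a timestamp falls in."""
    iso = ts.astimezone(timezone.utc).isocalendar()
    return (iso[0], iso[1])


class WeeklyEndpointLimiter:
    """Fixed-window counter keyed by (endpoint type, ISO week)."""

    def __init__(self, limits: dict):
        self.limits = limits  # requests per week, per endpoint type
        self.counts = defaultdict(int)

    def allow(self, endpoint: str, ts: datetime) -> bool:
        limit = self.limits.get(endpoint)
        if limit is None:
            return True  # no limit configured for this endpoint type
        key = (endpoint, week_key(ts))
        if self.counts[key] >= limit:
            return False
        self.counts[key] += 1
        return True


limiter = WeeklyEndpointLimiter({"chat": 2, "embeddings": 1})
monday = datetime(2026, 3, 2, tzinfo=timezone.utc)
next_monday = datetime(2026, 3, 9, tzinfo=timezone.utc)
results = [
    limiter.allow("chat", monday),        # first chat request this week
    limiter.allow("chat", monday),        # second chat request this week
    limiter.allow("chat", monday),        # third one exceeds the chat limit
    limiter.allow("embeddings", monday),  # embeddings has its own counter
    limiter.allow("chat", next_monday),   # new week, counter starts over
]
```

Each endpoint type keeps its own counter, and the counter resets when the week changes, so a burst on one surface does not consume another surface's allowance.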

Observability

Log storage: GCP workload identity from AWS

When the gateway runs in AWS but you write logs to Google Cloud Storage, configure GCP_WIF_AUDIENCE and GCP_WIF_SERVICE_ACCOUNT_EMAIL so the gateway authenticates through GCP Workload Identity Federation (gcs_assume style flows), without long-lived GCP keys sitting in AWS. This keeps cross-cloud log delivery out of static secrets in config or images. See hybrid GCP deployment & gcs_assume log storage

Analytics for archived workspaces

Organization admins and owners can include archived workspaces in analytics graphs, groups, and summaries. Saved filters also accept workspace slugs alongside IDs. This keeps reporting and automation stable as teams wind down or rename workspaces. See analytics export

Guardrails

Zscaler AI Guard

Connect Zscaler AI Guard so Zscaler Detections Policies apply to LLM inputs and outputs through beforeRequestHook and afterRequestHook, with a required policyId and optional timeout (default 10000 ms). This reuses the same policy class your security org already operates. See how to connect Zscaler AI Guard

Akto Agentic Security

Add Akto as a guardrails partner to scan LLM inputs and outputs for threats such as prompt injection and sensitive data leakage, with hooks and a configurable timeout (default 5000 ms). This aligns agentic traffic with how you scan other production services. See how to add Akto

Bedrock Guardrails custom host

Set customHost on the Bedrock guardrail plugin so checks run against private or regional Bedrock-compatible endpoints that your network and security policies already trust, rather than only the default public Bedrock URLs. See how to configure Bedrock Guardrails

Required metadata key–value guardrails

You can configure guardrails to enforce required metadata on every request. If any required field is missing or invalid, the gateway blocks the request before it ever reaches the model. Learn more
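The check itself can be pictured with a short sketch. This is an illustration of the rule, not the gateway's code: the function name, the `required` convention (a set of allowed values, or `None` for "any non-empty value"), and the sample keys are assumptions.

```python
def check_required_metadata(metadata: dict, required: dict) -> list:
    """Return a list of problems; an empty list means the request may proceed.

    `required` maps each mandatory key to a set of allowed values,
    or to None when any non-empty value is acceptable (assumed convention).
    """
    problems = []
    for key, allowed in required.items():
        value = metadata.get(key)
        if value is None or value == "":
            problems.append(f"missing: {key}")
        elif allowed is not None and value not in allowed:
            problems.append(f"invalid: {key}={value!r}")
    return problems


# Hypothetical policy: every request needs a team, and env must be prod or staging.
required = {"team": None, "env": {"prod", "staging"}}
ok = check_required_metadata({"team": "search", "env": "prod"}, required)
blocked = check_required_metadata({"env": "dev"}, required)
```

A request with any entry in the returned list would be rejected before a provider call is made.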

Models and providers

DeepInfra
- Tool calling with tools, tool_choice, and parallel_tool_calls.
- Completions and embeddings endpoints alongside chat.
DeepSeek
- deepseek-chat: tools, tool_choice, and stream_options.
- deepseek-reasoner: maps reasoning_effort to thinking mode and returns reasoning_content in streams.
- Streaming usage honors stream_options for reporting.
Bedrock
- Batch inference supports embeddings as well as chat completions, so you can run large embedding jobs with the same batch patterns you use for chat.
Vertex AI
- Portkey metadata maps to Vertex resource labels.
- Enterprise search grounding via enterpriseWebSearch / enterprise_web_search (cost attribution separate from standard Search grounding).
- AWS workloads reach Vertex with AWS–GCP WIF (GCP_WIF_AUDIENCE, GCP_WIF_SERVICE_ACCOUNT_EMAIL).
Azure AI Foundry rerank
- Cohere rerank models (e.g. cohere.Cohere-rerank-v4.0-pro).
- Gateway strips the cohere. prefix for the provider.
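The prefix handling for Azure AI Foundry rerank models amounts to a small string transformation. The helper below is a hypothetical sketch of that step, not the gateway's actual function.

```python
def to_provider_model(model: str, prefix: str = "cohere.") -> str:
    """Drop the leading routing prefix, if present, before calling the provider."""
    return model[len(prefix):] if model.startswith(prefix) else model


provider_model = to_provider_model("cohere.Cohere-rerank-v4.0-pro")
unchanged = to_provider_model("Cohere-rerank-v4.0-pro")
```

Only a leading `cohere.` is removed, so a model name that already lacks the prefix passes through unchanged.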

Bug fixes and improvements

OpenTelemetry: GenAI semantic spans follow semconv 1.40.0 for inference and embeddings, with OTEL exporter support for guardrail flows and custom resource attributes—making downstream APM and tracing easier to standardize on.
Header forwarding: the gateway no longer forwards x-portkey-forward-headers, preventing header-forwarding loops and obscured provenance in chained setups.
Streaming usage: usage metadata is passed through for the Responses API and DeepSeek (and related routes) so streaming responses stay consistent for cost and usage reporting.
Together AI: cost logging for video generation requests.
Anthropic / OpenAI-style image routes: strict tool parameters and response_format handling for non–DALL·E image models where applicable.
Budget tracking: fixes to avoid double-counting and data loss in the budget pipeline (where applicable in this release window).

Resources

Which AI Model are companies actually Paying For in 2026?

Over 1 trillion AI tokens pass through Portkey every day. The Neon Show talks with Rohit Agarwal (Portkey) about which models enterprises actually pay for in production and what changes after the prototype ships.

Blog: LLM Deployment Pipeline Explained Step by Step
Blog: What is AI lifecycle management?
Blog: MCP vs Function Calling
Blog: 1 Trillion Tokens and the Death of the Chatbot

Community Contributors

Shoutout to Pinji Chen (Tsinghua University) for identifying an edge case with custom host and header forwarding. We're grateful for contributors who help us improve!
