Skip to main content
March brings stronger enterprise controls: vault-backed credentials, gateway limits that match how teams plan spend, and guardrails that align with your existing security stack. Alongside those themes, we’ve shipped significant upgrades across the platform, gateway, observability, guardrails, and provider ecosystem, empowering teams with more robust, enterprise-ready infrastructure. See what’s new:

Summary

AreaUpdates
PlatformSecret References; weekly rate and budget windows (rpw) and endpoint-scoped rate limits
ObservabilityGCS log storage via GCP WIF from AWS; analytics for archived workspaces and workspace slugs in filters
GuardrailsZscaler AI Guard; Akto Agentic Security; Bedrock Guardrails customHost; required metadata key–value guardrails
Models and providersDeepInfra; DeepSeek; Vertex metadata labels, enterprise web search, AWS–GCP WIF; Azure AI Foundry rerank; Bedrock batch embeddings

Platform

Secret References

Instead of entering keys directly in Portkey, use Secret References to point Portkey at credentials stored in your external vault (AWS Secrets Manager, Azure Key Vault, or HashiCorp Vault). Map integrations and virtual keys with secret_mappings so Portkey fetches values at runtime.
Creating Secret References
This keeps sensitive material in infrastructure you already control and audit. See how to configure Secret References

Weekly and endpoint-scoped rate limits

You can now set budget and usage limits on weekly windows (rpw), so caps align with how teams plan and review spend week over week, not just minute-by-minute or monthly aggregates.
Weekly policies
You can also scope limits by endpoint type, so different API surfaces (for example chat completions, embeddings, or admin-style routes) can carry different limits instead of one global rule across everything. Budget & rate limit policies

Observability

Log storage: GCP workload identity from AWS

When the gateway runs in AWS but you write logs to Google Cloud Storage, configure GCP_WIF_AUDIENCE and GCP_WIF_SERVICE_ACCOUNT_EMAIL so the gateway authenticates through GCP Workload Identity Federation (gcs_assume style flows), without long-lived GCP keys sitting in AWS. This keeps cross-cloud log delivery out of static secrets in config or images. See hybrid GCP deployment & gcs_assume log storage

Analytics for archived workspaces

Organization admins and owners can include archived workspaces in analytics graphs, groups, and summaries. Saved filters also accept workspace slugs alongside IDs. This keeps reporting and automation stable as teams wind down or rename workspaces. See analytics export

Guardrails

Zscaler AI Guard

Connect Zscaler AI Guard so Zscaler Detections Policies apply to LLM inputs and outputs through beforeRequestHook and afterRequestHook, with a required policyId and optional timeout (default 10000 ms). This reuses the same policy class your security org already operates. See how to connect Zscaler AI Guard

Akto Agentic Security

Add Akto as a guardrails partner to scan LLM inputs and outputs for threats such as prompt injection and sensitive data leakage, with hooks and a configurable timeout (default 5000 ms). This aligns agentic traffic with how you scan other production services. See how to add Akto

Bedrock Guardrails custom host

Set customHost on the Bedrock guardrail plugin so checks hit private or regional Bedrock-compatible endpoints, not only default public URLs. This keeps guardrail evaluation on private or regional endpoints your network and security policies already trust, instead of the default public Bedrock URLs. See how to configure Bedrock Guardrails

Required metadata key–value guardrails

You can configure guardrails to enforce required metadata on every request. If any required field is missing or invalid, the gateway blocks the request before it ever reaches the model. Learn more

Why customers choose Portkey!

Weekly policies

Models and providers

  • DeepInfra
    • Tool calling with tools, tool_choice, and parallel_tool_calls.
    • Completions and embeddings endpoints alongside chat.
  • DeepSeek
    • deepseek-chat: tools, tool_choice, and stream_options.
    • deepseek-reasoner: maps reasoning_effort to thinking mode and returns reasoning_content in streams.
    • Streaming usage honors stream_options for reporting.
  • Bedrock: Batch inference supports embeddings as well as chat completions, so you can run large embedding jobs with the same batch patterns you use for chat.
  • Vertex AI
    • Portkey metadata maps to Vertex resource labels.
    • Enterprise search grounding via enterpriseWebSearch / enterprise_web_search (cost attribution separate from standard Search grounding).
    • AWS workloads reach Vertex with AWS–GCP WIF (GCP_WIF_AUDIENCE, GCP_WIF_SERVICE_ACCOUNT_EMAIL).
  • Azure AI Foundry rerank
    • Cohere rerank models (e.g. cohere.Cohere-rerank-v4.0-pro).
    • Gateway strips the cohere. prefix for the provider.

Bug fixes and improvements

  • OpenTelemetry: GenAI semantic spans follow semconv 1.40.0 for inference and embeddings, with OTEL exporter support for guardrail flows and custom resource attributes—making downstream APM and tracing easier to standardize on.
  • Header forwarding: the gateway no longer forwards x-portkey-forward-headxers, preventing header-forwarding loops and obscured provenance in chained setups.
  • Streaming usage: usage metadata is passed through for the Responses API and DeepSeek (and related routes) so streaming responses stay consistent for cost and usage reporting.
  • Together AI: cost logging for video generation requests.
  • Anthropic / OpenAI-style image routes: strict tool parameters and response_format handling for non–DALL·E image models where applicable.
  • Budget tracking: fixes to avoid double-counting and data loss in the budget pipeline (where applicable in this release window).

Resources

Which AI Model are companies actually Paying For in 2026?

Over 1 trillion AI tokens pass through Portkey every day, The Neon Show talks with Rohit Agarwal (Portkey) about which models enterprises actually pay for in production and what changes after the prototype ships.

Community Contributors

Shoutout to Pinji Chen (Tsinghua University) for identifying an edge case with custom host and header forwarding;grateful for contributors who help us improve!

Support

Need Help?

Open an issue on GitHub

Join Us

Get support in our Discord
Last modified on March 31, 2026