Prompt caching on Anthropic lets you cache individual messages in your request for repeated use. Caching frees up tokens to include more context in your prompt, and delivers responses significantly faster and cheaper. You can use this feature on our OpenAI-compatible universal API as well as with our prompt templates.

API Support

Just set the cache_control parameter in the relevant message body:
import Portkey from 'portkey-ai'

const portkey = new Portkey({
    apiKey: "PORTKEY_API_KEY", // defaults to process.env["PORTKEY_API_KEY"]
    provider:"@PROVIDER"
})

const chatCompletion = await portkey.chat.completions.create({
    messages: [
        {
            role: "system",
            content: [
                { type: "text", text: "You are a helpful assistant" },
                {
                    type: "text",
                    text: "<TEXT_TO_CACHE>",
                    cache_control: { type: "ephemeral" }
                }
            ]
        },
        { role: "user", content: "Summarize the above story for me in 20 words" }
    ],
    model: "claude-3-5-sonnet-20240620",
    max_tokens: 250 // Required field for Anthropic
});

console.log(chatCompletion.choices[0].message.content);

Prompt Templates Support

Set any message in your prompt template to be cached by just toggling the Cache Control setting in the UI:

Cache TTL Options

By default, the cache has a 5-minute lifetime that refreshes each time cached content is used. You can optionally specify a 1-hour TTL by adding the ttl field to cache_control:
{
  "cache_control": { "type": "ephemeral", "ttl": "1h" }
}
| TTL | Write Cost | Best For |
| --- | --- | --- |
| 5m (default) | 1.25x base input price | Prompts used more frequently than every 5 minutes |
| 1h | 2x base input price | Agentic workflows, long conversations where follow-ups may exceed 5 minutes |

Cache reads cost 0.1x the base input token price regardless of TTL.
  • The message you are caching must meet a minimum length for caching to activate (1024 tokens for Claude 3.5 Sonnet and Claude 3 Opus; 2048 tokens for Claude 3 Haiku)
  • You can mix both TTLs in the same request, but 1-hour entries must appear before 5-minute entries
  • Up to 4 cache breakpoints per request
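To illustrate the ordering rule, here is a hypothetical content array for a system message that mixes both TTLs. The placeholder texts and the inline type are illustrative only, not part of Portkey's SDK types:

```typescript
// Hypothetical system-message content mixing both TTLs.
// Per the rule above, the 1-hour entry must come before the 5-minute entry.
const systemContent: Array<{
    type: string;
    text: string;
    cache_control?: { type: string; ttl?: string };
}> = [
    { type: "text", text: "You are a helpful assistant" },
    {
        type: "text",
        text: "<LONG_STABLE_CONTEXT>", // e.g. reference docs reused across a long session
        cache_control: { type: "ephemeral", ttl: "1h" }
    },
    {
        type: "text",
        text: "<RECENT_CONTEXT>", // refreshed often, so the default 5-minute TTL fits
        cache_control: { type: "ephemeral", ttl: "5m" }
    }
];
```

This array would be passed as the `content` of the system message, just as in the API example above.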
For more details, refer to Anthropic's prompt caching documentation.

Seeing Cache Results in Portkey

Portkey automatically calculates the correct pricing for your prompt caching requests & responses based on Anthropic's pricing calculations.
In the individual log for any request, you can also see the exact status of the request and verify whether it wrote to the cache or was served from the cache, using two usage parameters:
  • cache_creation_input_tokens: Number of tokens written to the cache when creating a new entry.
  • cache_read_input_tokens: Number of tokens retrieved from the cache for this request.
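As a rough sketch of how to interpret these two counters, you could classify a request's cache status like this. The `CacheUsage` shape and `cacheStatus` helper are hypothetical, not part of the Portkey SDK:

```typescript
// Hypothetical helper: classify a request's cache status from the two usage
// counters described above. Field names follow Anthropic's API; where they
// surface on the response object may vary, so this takes a plain object.
interface CacheUsage {
    cache_creation_input_tokens: number;
    cache_read_input_tokens: number;
}

function cacheStatus(u: CacheUsage): string {
    const wrote = u.cache_creation_input_tokens > 0;
    const read = u.cache_read_input_tokens > 0;
    if (wrote && read) return "partial hit + cache write";
    if (wrote) return "cache write";
    if (read) return "cache hit";
    return "no cache";
}
```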
Understanding Token Counts with Caching

Portkey normalizes Anthropic's response to the OpenAI format. In this format, prompt_tokens includes the cached tokens:
prompt_tokens = inputTokens + cache_read_input_tokens + cache_creation_input_tokens
This differs from Anthropic’s native format where inputTokens excludes cached tokens. Portkey’s pricing calculation accounts for this by:
  1. Subtracting cached tokens from prompt_tokens to get the base input token count
  2. Applying the standard input token rate to base tokens
  3. Applying the discounted cache read rate to cache_read_input_tokens
  4. Applying the cache write rate to cache_creation_input_tokens
This ensures accurate cost calculation even though the token format is normalized.
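The four steps above can be sketched as a small cost function. The function, its parameters, and the rates passed in are illustrative placeholders, not Anthropic's actual prices:

```typescript
// Illustrative sketch of the pricing steps above. Token counts follow Portkey's
// OpenAI-style response, where prompt_tokens already includes cached tokens.
function inputCost(
    promptTokens: number,
    cacheReadTokens: number,
    cacheCreationTokens: number,
    baseRatePerToken: number,
    writeMultiplier: number = 1.25 // 1.25x for the 5m TTL, 2x for the 1h TTL
): number {
    // 1. Subtract cached tokens to recover the base input token count
    const baseTokens = promptTokens - cacheReadTokens - cacheCreationTokens;
    return (
        baseTokens * baseRatePerToken +                          // 2. standard input rate
        cacheReadTokens * 0.1 * baseRatePerToken +               // 3. cache reads at 0.1x
        cacheCreationTokens * writeMultiplier * baseRatePerToken // 4. cache writes at 1.25x or 2x
    );
}
```

For example, a request with 2000 prompt_tokens of which 1000 were read from cache is billed as 1000 base tokens plus 1000 tokens at the 0.1x read rate.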
Last modified on March 31, 2026