Prompt Auto Caching

Written by Quentin
Updated over 3 months ago

Understanding Prompt Caching

Prompt Caching allows users to make repeated API calls more efficiently by reusing context from recent prompts, resulting in a reduction in input token costs and faster response times.


While prompt caching is widely supported by most mainstream model providers, the detailed caching rules vary between providers.


For OpenAI models, prompt caching is enabled automatically, and when a cached prompt is hit, the cached input tokens are charged at half the normal input token price. [Docs of OpenAI Prompt Caching](https://platform.openai.com/docs/guides/prompt-caching)
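If you want to verify that a request benefited from OpenAI's automatic caching, the cached token count is reported in the response usage. Below is a minimal sketch using the official openai Python SDK; the model name and prompt text are placeholders, and caching only applies once the prompt exceeds a minimum length (1,024 tokens at the time of writing).

```python
# A minimal sketch (Python, official openai SDK) of checking whether a request
# hit OpenAI's automatic prompt cache. Model name and prompt text are placeholders.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        # A long, stable system prompt is the part most likely to be cached.
        {"role": "system", "content": "<long, reusable instructions>"},
        {"role": "user", "content": "Summarize today's report."},
    ],
)

# Cached prompt tokens are reported in the usage details and billed at a discount.
usage = response.usage
print("prompt tokens:", usage.prompt_tokens)
print("cached tokens:", usage.prompt_tokens_details.cached_tokens)
```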
For Anthropic, prompt caching is not enabled by default, and users need to manually mark which prompts should be cached. Cached prompts can give you up to 85% faster response times and reduce costs by up to 90%, but creating the initial cache entry incurs a 25% surcharge on input tokens. [Docs of Anthropic Prompt Caching](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching)
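For reference, manual caching with Anthropic means attaching a cache_control marker to the blocks you want cached (ConsoleX AI's auto caching, described below, does this for you). Here is a minimal sketch using the official anthropic Python SDK, with a placeholder model name and prompt:

```python
# A minimal sketch (Python, official anthropic SDK) of manually marking a large,
# stable block for caching with cache_control.
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set in your environment

response = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "<large, reusable instructions or reference material>",
            # Everything up to and including this block becomes the cached prefix.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Answer using the reference above."}],
)

# Cache activity is reported in usage: tokens written to the cache (billed with a
# ~25% surcharge) and tokens read from it (billed at a steep discount).
print("cache writes:", response.usage.cache_creation_input_tokens)
print("cache reads: ", response.usage.cache_read_input_tokens)
```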


  • For OpenAI: cached prefixes generally remain active for 5 to 10 minutes of inactivity. However, during off-peak periods, caches may persist for up to one hour.

  • For Claude: the cache has a 5-minute lifetime, refreshed each time the cached content is used.


How ConsoleX AI Supports Prompt Caching

ConsoleX AI supports prompt auto caching, currently mainly for Claude models, to help you maximize cost savings and improve performance while saving you the trouble of manually configuring cached prompts.

To enable prompt auto caching, you can follow the steps below:

  1. Click the "Configuration" link in the top right corner of the page to open the configuration side panel.

  2. Turn on the "Auto caching" switch to enable prompt auto caching.

When auto caching is enabled and you are using Anthropic models, ConsoleX AI will automatically cache the longest prompts and tool lists to maximize cost savings.
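To illustrate what this means in practice, the sketch below shows roughly how the largest stable blocks of an Anthropic request could be marked for caching. It is a hypothetical illustration, not ConsoleX AI's actual implementation, and the min_chars threshold is an arbitrary example value:

```python
# Hypothetical illustration only: roughly what auto caching does on your behalf
# for Anthropic requests. This is NOT ConsoleX AI's actual code.
def auto_mark_for_cache(system_blocks, tools, min_chars=4000):
    """Attach cache_control to the longest system block and to the tool list,
    so the bulkiest, most stable parts of the request are cached."""
    if system_blocks:
        longest = max(system_blocks, key=lambda b: len(b.get("text", "")))
        # Only cache blocks that are large enough to be worth the write surcharge.
        if len(longest.get("text", "")) >= min_chars:
            longest["cache_control"] = {"type": "ephemeral"}
    if tools:
        # Marking the last tool caches the entire tool list as part of the prefix.
        tools[-1]["cache_control"] = {"type": "ephemeral"}
    return system_blocks, tools
```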


Specifically, when you are using the shared model endpoints, you can view the cached prompt tokens and their cost on the credits usage page, which provides transparency and control over your usage. If you are using private model endpoints, ConsoleX AI will still cache prompts and tool lists according to the same rules, but you need to track the cached prompt tokens and cost yourself.


Best Practices for Using Auto Caching

  • In repetitive scenarios, prompt auto caching can save you significant costs, but in scenarios with low cache hit rates it may increase costs instead, because every cache write carries a surcharge. Make sure to enable auto caching only where the cache hit rate is high; see the rough cost sketch after this list.

  • For model providers that enable prompt caching automatically, such as OpenAI, you don't need to enable the auto caching feature in ConsoleX AI. If a prompt cache is hit and you are using the shared OpenAI endpoints on ConsoleX AI, you will likewise only be charged half the input token price for the cached tokens.
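As a rough illustration of the break-even point mentioned in the first bullet, the sketch below uses the figures cited earlier for Anthropic models (about 1.25x normal input price for cache writes and about 0.1x for cache reads); the exact multipliers depend on the model and provider:

```python
# Back-of-the-envelope sketch: average cost of the cached part of a prompt,
# relative to normal input token pricing, at a given cache hit rate.
# Assumed multipliers: cache write ~1.25x, cache read ~0.1x (up to 90% off).
def avg_cost_multiplier(hit_rate, write_mult=1.25, read_mult=0.10):
    """Expected cost per cached input token, relative to uncached (1.0)."""
    return hit_rate * read_mult + (1 - hit_rate) * write_mult

for rate in (0.0, 0.2, 0.5, 0.8, 1.0):
    print(f"hit rate {rate:.0%}: {avg_cost_multiplier(rate):.2f}x normal input cost")

# With these assumptions, caching pays off once roughly 22% or more of requests
# hit the cache (1.25 - 1.15 * hit_rate drops below 1.0 around hit_rate ≈ 0.22).
```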
