Caching

Ultra AI can cache responses to AI API calls, which reduces response times and the cost of repeated or near-identical requests.

Note: Similarity caching requires at least one OpenAI key for embedding generation. You will incur charges for embedding generation.

Cache Configuration

Caching is configured per request. Pass a JSON-encoded object as the model field, with a cache entry alongside the models list:

import OpenAI from "openai"

const openai = new OpenAI({
  apiKey: "your-ultraai-api-key",
  baseURL: "https://api.ultraai.app/v1",
})

const completion = await openai.chat.completions.create({
  model: JSON.stringify({
    models: ["openai:gpt-4o"],
    cache: {
      type: "similarity", // or "exact"
      maxAge: 3600, // 1 hour
      threshold: 0.8, // 80% similarity
    },
  }),
  messages: [{ role: "user", content: "What is the capital of France?" }],
})
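Because the configuration travels as a JSON string in the model field, a small typed helper can keep it consistent across calls. The following is a minimal sketch, not part of Ultra AI or the OpenAI SDK: `CacheConfig` and `buildModelField` are hypothetical names, and only the field names (`models`, `cache`, `type`, `maxAge`, `threshold`) come from the example above. Leaving `threshold` out of the exact variant is an assumption; this page does not say how it is treated for exact caching.

```typescript
// Hypothetical helper types; the field names mirror the request example above.
type CacheConfig =
  | { type: "exact"; maxAge: number } // assumption: no threshold for exact matching
  | { type: "similarity"; maxAge: number; threshold: number };

// Serializes the model list and cache options into the string passed as `model`.
function buildModelField(models: string[], cache: CacheConfig): string {
  return JSON.stringify({ models, cache });
}

// Example: exact-match caching with a 10-minute maximum age.
const modelField = buildModelField(["openai:gpt-4o"], {
  type: "exact",
  maxAge: 600,
});
```

The resulting string can then be passed as `model` in `openai.chat.completions.create`, in place of the inline `JSON.stringify` call.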

Cache Types

Exact: Returns a cached response only when the input exactly matches a previous request.
Similarity: Uses semantic similarity between inputs, so a cached response can be returned for a request that is similar but not identical.
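To make the similarity idea concrete, the sketch below compares two embedding vectors and checks the score against a threshold. It is illustrative only: this page does not state which similarity metric Ultra AI uses, so cosine similarity is an assumption here, and `cosineSimilarity` and `isSimilarityHit` are hypothetical names.

```typescript
// Cosine similarity between two equal-length embedding vectors.
// Assumption: cosine similarity stands in for whatever metric the service uses.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as matching nothing.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// A cached entry counts as a hit when its similarity reaches the threshold.
function isSimilarityHit(
  query: number[],
  cached: number[],
  threshold: number,
): boolean {
  return cosineSimilarity(query, cached) >= threshold;
}
```

Under this model, a higher threshold means fewer but closer matches, while a lower one returns cached responses more often at the risk of answering a slightly different question.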

Cache Parameters

type: "exact" or "similarity"
maxAge: Maximum age of a cached result, in seconds (for example, 3600 for one hour)
threshold: Similarity threshold for a cache hit, from 0.0 to 1.0 (for example, 0.8)
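The parameters above can be checked on the client before a request is sent. The sketch below is a hedged illustration: `CacheParams`, `validateCacheParams`, and `isFresh` are hypothetical names, rejecting a non-positive `maxAge` is an assumption (the page only says the value is in seconds), and `isFresh` shows one plausible reading of "maximum age", not the service's documented behavior.

```typescript
// Hypothetical shape mirroring the parameters listed above.
interface CacheParams {
  type: "exact" | "similarity";
  maxAge: number; // seconds
  threshold?: number; // 0.0 to 1.0
}

// Returns a list of problems; an empty list means the values look plausible.
function validateCacheParams(p: CacheParams): string[] {
  const problems: string[] = [];
  // Assumption: a zero or negative maxAge is treated as a mistake.
  if (!Number.isFinite(p.maxAge) || p.maxAge <= 0) {
    problems.push("maxAge must be a positive number of seconds");
  }
  if (p.threshold !== undefined && (p.threshold < 0 || p.threshold > 1)) {
    problems.push("threshold must be between 0.0 and 1.0");
  }
  return problems;
}

// True while an entry stored at storedAtMs is younger than maxAgeSeconds.
function isFresh(
  storedAtMs: number,
  maxAgeSeconds: number,
  nowMs: number,
): boolean {
  return nowMs - storedAtMs < maxAgeSeconds * 1000;
}
```

For instance, with `maxAge: 3600` an entry stored at time 0 would be considered fresh at 3,599,999 ms and stale at 3,600,000 ms under this reading.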
