Ultra AI can cache responses to improve response times and reduce the cost of repeated AI API calls.

Note: Similarity caching requires at least one OpenAI key for embedding generation. You will incur charges for embedding generation.

Cache Configuration

Configure caching per request by adding a cache object to the JSON-encoded model field:

import OpenAI from "openai"

// Point the OpenAI SDK at the Ultra AI endpoint.
const openai = new OpenAI({
  apiKey: "your-ultraai-api-key",
  baseURL: "https://api.ultraai.app/v1",
})

const completion = await openai.chat.completions.create({
  // The model field carries a JSON string with the models and the cache settings.
  model: JSON.stringify({
    models: ["openai:gpt-4o"],
    cache: {
      type: "similarity", // or "exact"
      maxAge: 3600, // seconds (1 hour)
      threshold: 0.8, // minimum similarity for a cache hit (80%)
    },
  }),
  messages: [{ role: "user", content: "What is the capital of France?" }],
})

Cache Types

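  • Exact: Returns a cached response only when the input exactly matches a previously cached input
  • Similarity: Uses embeddings to measure semantic similarity and returns a cached response when the input is similar enough to a previously cached input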
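
To make the difference between the two cache types concrete, here is a minimal conceptual sketch. It is not Ultra AI's implementation, which this page does not describe. The CacheEntry type, the function names, and the use of cosine similarity over embedding vectors are all assumptions chosen for illustration.

```typescript
// Illustrative only: a conceptual model of exact vs. similarity lookup.
// CacheEntry, exactLookup, and similarityLookup are hypothetical names.
type CacheEntry = { prompt: string; embedding: number[]; response: string };

// Cosine similarity between two equal-length vectors (assumed metric).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact: a hit only when the prompt text is identical.
function exactLookup(entries: CacheEntry[], prompt: string): string | undefined {
  return entries.find((entry) => entry.prompt === prompt)?.response;
}

// Similarity: a hit when the closest cached embedding scores at or above the threshold.
function similarityLookup(
  entries: CacheEntry[],
  queryEmbedding: number[],
  threshold: number,
): string | undefined {
  let best: CacheEntry | undefined;
  let bestScore = -Infinity;
  for (const entry of entries) {
    const score = cosineSimilarity(entry.embedding, queryEmbedding);
    if (score > bestScore) {
      best = entry;
      bestScore = score;
    }
  }
  return best !== undefined && bestScore >= threshold ? best.response : undefined;
}
```

In this model, a rephrased question misses the exact cache but can still hit the similarity cache if its embedding is close enough to a cached one.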

Cache Parameters

  • type: "exact" or "similarity"
  • maxAge: Maximum age of a cached response, in seconds (for example, 3600 for one hour)
  • threshold: Similarity threshold for cache hits, from 0.0 to 1.0 (for example, 0.8 requires 80% similarity)
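
As a sketch of how these parameters might be assembled and checked on the client before a request is sent, the helper below builds the JSON string for the model field. The CacheConfig type and the buildModelField function are hypothetical names, not part of Ultra AI or the OpenAI SDK. The rule that maxAge must be positive is an assumption; the 0.0 to 1.0 range for threshold comes from the parameter list above.

```typescript
// Hypothetical client-side helper; not part of any Ultra AI SDK.
type CacheConfig = {
  type: "exact" | "similarity";
  maxAge: number; // seconds
  threshold?: number; // 0.0 to 1.0
};

// Validates the cache settings and returns the JSON string for the model field.
function buildModelField(models: string[], cache: CacheConfig): string {
  // Assumption: a non-positive or non-finite maxAge is treated as a mistake.
  if (!Number.isFinite(cache.maxAge) || cache.maxAge <= 0) {
    throw new RangeError("maxAge must be a positive number of seconds");
  }
  // The documented range for threshold is 0.0 to 1.0.
  if (cache.threshold !== undefined && (cache.threshold < 0 || cache.threshold > 1)) {
    throw new RangeError("threshold must be between 0.0 and 1.0");
  }
  return JSON.stringify({ models, cache });
}
```

The returned string can be passed as the model value in openai.chat.completions.create, in place of the inline JSON.stringify call shown under Cache Configuration.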