Caching
Smart caching for AI responses
Ultra AI can cache AI responses and return a stored result instead of repeating the API call, which improves response times and reduces costs.
Note: Similarity caching requires at least one OpenAI key for embedding generation. You will incur charges for embedding generation.
Cache Configuration
You configure caching per request by adding a cache object to the JSON-encoded model parameter:
```typescript
import OpenAI from "openai"

// Point the OpenAI SDK at the Ultra AI endpoint.
const openai = new OpenAI({
  apiKey: "your-ultraai-api-key",
  baseURL: "https://api.ultraai.app/v1",
})

// Cache settings travel inside the JSON-encoded `model` parameter.
const completion = await openai.chat.completions.create({
  model: JSON.stringify({
    models: ["openai:gpt-4o"],
    cache: {
      type: "similarity", // or "exact"
      maxAge: 3600, // 1 hour
      threshold: 0.8, // 80% similarity
    },
  }),
  messages: [{ role: "user", content: "What is the capital of France?" }],
})

console.log(completion.choices[0].message.content)
```
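Because the cache settings are serialized into the `model` string, a small typed helper can keep them consistent across calls. The sketch below is illustrative only: `CacheConfig` and `buildModelParam` are hypothetical names, not part of the Ultra AI or OpenAI SDKs. It leaves `threshold` out of the exact-match example on the assumption that the threshold only matters for similarity caching.

```typescript
// Hypothetical types mirroring the cache object used in the request above.
type CacheType = "exact" | "similarity";

interface CacheConfig {
  type: CacheType;
  maxAge: number; // seconds
  threshold?: number; // 0.0 - 1.0, assumed relevant only for similarity caching
}

// Hypothetical helper (not part of any SDK): serializes the model list and
// cache settings into the string passed as the `model` request field.
function buildModelParam(models: string[], cache: CacheConfig): string {
  return JSON.stringify({ models, cache });
}

// An exact-match cache that keeps results for 10 minutes.
const model = buildModelParam(["openai:gpt-4o"], { type: "exact", maxAge: 600 });
console.log(model);
// prints: {"models":["openai:gpt-4o"],"cache":{"type":"exact","maxAge":600}}
```

The returned string can be passed as `model` to `openai.chat.completions.create` in place of the inline `JSON.stringify` call.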
Cache Types
- Exact: Returns a cached response only when the input exactly matches a previously cached input
- Similarity: Uses semantic similarity to return cached responses for similar inputs
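To make the difference between the two cache types concrete, here is a minimal in-memory sketch. It is not Ultra AI's implementation: the `CacheEntry` shape, `lookup` function, and use of cosine similarity are assumptions chosen for illustration, and embeddings are supplied as plain number arrays rather than generated by an embedding model (the step that requires an OpenAI key in the real service). Entries older than `maxAge` seconds are ignored, exact mode compares input strings, and similarity mode returns the most similar entry whose score meets `threshold`.

```typescript
// Illustrative in-memory cache; NOT Ultra AI's actual implementation.
// Embeddings are passed in directly instead of being generated by a model.
interface CacheEntry {
  input: string;
  embedding: number[];
  response: string;
  createdAt: number; // milliseconds since epoch
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function lookup(
  entries: CacheEntry[],
  input: string,
  embedding: number[],
  opts: { type: "exact" | "similarity"; maxAge: number; threshold: number },
  now: number = Date.now(),
): string | undefined {
  // Drop entries older than maxAge (given in seconds).
  const fresh = entries.filter((e) => now - e.createdAt <= opts.maxAge * 1000);
  if (opts.type === "exact") {
    return fresh.find((e) => e.input === input)?.response;
  }
  // Similarity: keep the best-scoring fresh entry at or above the threshold.
  let best: CacheEntry | undefined;
  let bestScore = -Infinity;
  for (const e of fresh) {
    const score = cosineSimilarity(e.embedding, embedding);
    if (score >= opts.threshold && score > bestScore) {
      best = e;
      bestScore = score;
    }
  }
  return best?.response;
}

const now = Date.now();
const cache: CacheEntry[] = [
  { input: "What is the capital of France?", embedding: [1, 0, 0], response: "Paris", createdAt: now - 60 * 1000 },
];
const opts = { maxAge: 3600, threshold: 0.8 };
console.log(lookup(cache, "What's France's capital?", [0.9, 0.1, 0], { type: "exact", ...opts }, now)); // prints: undefined
console.log(lookup(cache, "What's France's capital?", [0.9, 0.1, 0], { type: "similarity", ...opts }, now)); // prints: Paris
```

The reworded question misses the exact cache because the strings differ, but its (made-up) embedding scores about 0.99 against the stored one, which clears the 0.8 threshold.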
Cache Parameters
- type: "exact" or "similarity"
- maxAge: Maximum age of cached results, in seconds
- threshold: Similarity threshold for cache hits (0.0 - 1.0)
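These parameter constraints can be checked on the client before a request is sent. The function below is a hypothetical sketch: `validateCacheParams` is not part of any SDK, it encodes only the ranges listed above, and the rule that `maxAge` must be positive is an assumption. Ultra AI's own server-side validation may differ.

```typescript
// Hypothetical client-side check of the cache parameters listed above.
interface CacheParams {
  type: string;
  maxAge: number; // seconds
  threshold?: number; // 0.0 - 1.0
}

function validateCacheParams(p: CacheParams): string[] {
  const errors: string[] = [];
  if (p.type !== "exact" && p.type !== "similarity") {
    errors.push(`type must be "exact" or "similarity", got "${p.type}"`);
  }
  // Assumption: a cache entry must live for a positive number of seconds.
  if (!Number.isFinite(p.maxAge) || p.maxAge <= 0) {
    errors.push("maxAge must be a positive number of seconds");
  }
  // The negated range check also rejects NaN.
  if (p.threshold !== undefined && !(p.threshold >= 0 && p.threshold <= 1)) {
    errors.push("threshold must be between 0.0 and 1.0");
  }
  return errors;
}

console.log(validateCacheParams({ type: "similarity", maxAge: 3600, threshold: 0.8 })); // prints: []
console.log(validateCacheParams({ type: "fuzzy", maxAge: 0, threshold: 1.5 }).length); // prints: 3
```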