Working with LLMs
Embabel supports any LLM supported by Spring AI. In practice, this is just about any LLM.
Choosing an LLM
Embabel encourages you to think about LLM choice for every LLM invocation.
The PromptRunner interface makes this easy.
Because Embabel lets you break agentic flows into multiple action steps, each step can use a smaller, focused prompt with fewer tools.
As a result, an individual step may be able to use a smaller LLM.
Considerations:
- Consider the complexity of the return type you expect from the LLM. This is typically a good proxy for determining required LLM quality. A small LLM is likely to struggle with a deeply nested return structure.
- Consider the nature of the task. LLMs have different strengths; review any available documentation. You don’t necessarily need a huge, expensive model that is good at nearly everything; such models are costly for both your wallet and the environment.
- Consider the sophistication of tool calling required. Simple tool calls are fine, but complex orchestration is another indicator you’ll need a strong LLM. (It may also be an indication that you should create a more sophisticated flow using Embabel GOAP.)
- Consider trying a local LLM running under Ollama or Docker.
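The first consideration, return-type complexity, can be made concrete. The following is a minimal, hypothetical sketch (not part of Embabel) that measures how deeply a Java record return type nests other records and maps that depth to a model role name such as "cheapest" or "best". The helper name, the role names and the depth threshold of 2 are all assumptions you would tune for your own models.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.RecordComponent;
import java.lang.reflect.Type;
import java.util.Collection;
import java.util.List;

// Hypothetical helper: estimate return-type complexity to pick a model role.
final class ReturnTypeComplexity {

    private ReturnTypeComplexity() {}

    // Depth of nested records: a record with only simple fields has depth 1,
    // a non-record type (String, Integer, ...) has depth 0.
    static int nestingDepth(Class<?> type) {
        if (!type.isRecord()) {
            return 0;
        }
        int deepestChild = 0;
        for (RecordComponent component : type.getRecordComponents()) {
            deepestChild = Math.max(deepestChild, depthOf(component));
        }
        return 1 + deepestChild;
    }

    // Looks inside List<T>, Set<T>, etc. so that collections of records count too.
    private static int depthOf(RecordComponent component) {
        Class<?> raw = component.getType();
        if (Collection.class.isAssignableFrom(raw)
                && component.getGenericType() instanceof ParameterizedType pt) {
            Type arg = pt.getActualTypeArguments()[0];
            if (arg instanceof Class<?> elementClass) {
                return nestingDepth(elementClass);
            }
        }
        return nestingDepth(raw);
    }

    // Assumed threshold: shallow types go to a cheap model, deeper ones to a strong one.
    static String suggestedRole(Class<?> returnType) {
        return nestingDepth(returnType) <= 2 ? "cheapest" : "best";
    }

    // Example return types of increasing complexity.
    record Joke(String setup, String punchline) {}
    record Address(String street, String city) {}
    record Person(String name, Address address) {}
    record Company(String name, List<Person> employees) {}
}
```

Here Joke has depth 1, Person has depth 2 and Company has depth 3, so only Company would be routed to the stronger model. The resulting role name could then be passed to ai.withLlmByRole(...).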
Tuning for Smaller and Local Models
A core goal of Embabel is to make agentic flows work well across the full range of LLMs, so you can choose the cheapest, smallest, or most private model that does the job, rather than always reaching for a frontier model. Smaller chat models behave differently from frontier models in ways that the framework can compensate for:
- Silent failures after tool calls. Weaker open-weights models (e.g. gpt-oss-20b, some Qwen variants) sometimes return blank text with no further tool calls when they don’t know how to proceed. Without intervention, the tool loop exits with empty content. Set embabel.agent.platform.toolloop.empty-response.max-retries: 1 to feed a synthetic nudge back to the model and give it one more chance; see Empty-Response Handling.
- Tool-name confusion. Smaller models more frequently call tools by approximate names. The default AutoCorrectionPolicy handles this by feeding back a “did you mean X?” suggestion; tune embabel.agent.platform.toolloop.tool-not-found.max-retries if your model needs more attempts.
- Iteration headroom. Recovery costs LLM calls. If you enable retry policies, raise embabel.agent.platform.toolloop.max-iterations so that a turn needing an extra round trip doesn’t run out of budget.
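To see why retries and iteration headroom interact, here is a minimal, self-contained simulation. It is not Embabel's tool loop: the Model interface, the nudge wording and the loop structure are assumptions chosen to illustrate one idea, namely that each recovery attempt consumes an iteration from the same budget.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical simulation of empty-response retries inside an iteration budget.
final class EmptyResponseRetrySim {

    // Stand-in for one LLM call: takes the conversation so far, returns the reply text.
    interface Model {
        String reply(List<String> conversation);
    }

    // Outcome of a turn: final text (possibly empty) and the number of LLM calls used.
    record Result(String text, int iterationsUsed) {}

    static Result runTurn(Model model, String userPrompt, int maxIterations, int emptyResponseMaxRetries) {
        List<String> conversation = new ArrayList<>(List.of(userPrompt));
        int retriesLeft = emptyResponseMaxRetries;
        String last = "";
        for (int iteration = 1; iteration <= maxIterations; iteration++) {
            last = model.reply(conversation);
            if (!last.isBlank()) {
                return new Result(last, iteration);
            }
            if (retriesLeft == 0) {
                // No retries configured (or none left): the turn ends with empty content.
                return new Result(last, iteration);
            }
            retriesLeft--;
            // Synthetic nudge fed back to the model (wording is an assumption).
            conversation.add("Your last reply was empty. Please continue or answer.");
        }
        // Budget exhausted before a non-blank reply arrived.
        return new Result(last, maxIterations);
    }

    // A scripted model that returns canned replies in order, then empty strings.
    static Model scripted(String... replies) {
        Iterator<String> it = List.of(replies).iterator();
        return conversation -> it.hasNext() ? it.next() : "";
    }
}
```

With a model that first replies with blank text and then answers, zero retries yields empty content after one call, one retry yields the answer after two calls, and one retry with a budget of a single iteration still yields empty content, because the turn has no room left for the extra round trip.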
The empty-response retry is off by default, and the other settings keep their usual defaults, so existing deployments using strong models behave exactly as before. Turn on or tune these settings per deployment when the model you’ve picked benefits from them.
Advanced: Custom LLM Integration
Embabel’s tool loop is framework-agnostic, allowing you to integrate any LLM provider by implementing the LlmMessageSender interface.
This is useful when:
- You want to use an LLM provider not supported by Spring AI
- You need custom request/response handling
- You’re integrating with a proprietary or internal LLM service
The LlmMessageSender Interface
The core abstraction is the LlmMessageSender functional interface:
```java
@FunctionalInterface
public interface LlmMessageSender {
    LlmMessageResponse call(
        List<Message> messages,
        List<Tool> tools
    );
}
```

```kotlin
fun interface LlmMessageSender {
    fun call(
        messages: List<Message>,
        tools: List<Tool>,
    ): LlmMessageResponse
}
```

An implementation makes a single LLM inference call and returns the response.
Importantly, it does not execute tools—it only returns any tool call requests from the LLM.
Tool execution is handled by Embabel’s DefaultToolLoop.
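To make this division of labor concrete, here is a toy loop. Its Sender, Reply and ToolCall types are simplified, hypothetical stand-ins rather than Embabel's classes, and messages are plain strings. The sender only reports which tools the model wants; the loop executes them, appends the results and calls the sender again until a plain text answer comes back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Toy illustration of the sender / tool-loop split (hypothetical types, not Embabel's API).
final class ToyToolLoop {

    record ToolCall(String id, String name, String arguments) {}

    // A reply is either plain text (no tool calls) or a list of tool requests.
    record Reply(String text, List<ToolCall> toolCalls) {}

    // Plays the role of LlmMessageSender: one inference call, no tool execution.
    interface Sender {
        Reply call(List<String> messages);
    }

    static String run(Sender sender, Map<String, Function<String, String>> tools,
                      String prompt, int maxIterations) {
        List<String> messages = new ArrayList<>(List.of("user: " + prompt));
        for (int i = 0; i < maxIterations; i++) {
            Reply reply = sender.call(messages);
            if (reply.toolCalls().isEmpty()) {
                return reply.text(); // plain text answer ends the loop
            }
            // Record the model's request, then execute each tool and append its result.
            messages.add("assistant: requested " + reply.toolCalls().size() + " tool call(s)");
            for (ToolCall call : reply.toolCalls()) {
                Function<String, String> tool = tools.get(call.name());
                if (tool == null) {
                    // In Embabel, this is where a ToolNotFoundPolicy decides what happens.
                    throw new IllegalArgumentException("Unknown tool: " + call.name());
                }
                messages.add("tool[" + call.id() + "]: " + tool.apply(call.arguments()));
            }
        }
        throw new IllegalStateException("Exceeded " + maxIterations + " iterations");
    }
}
```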
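Tool loop behavior is configured through properties under embabel.agent.platform.toolloop, for example:

```properties
embabel.agent.platform.toolloop.type=parallel
```

For the full list of tool loop configuration parameters, refer to ToolLoopConfiguration.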
Tool-Not-Found Recovery Policy
When the LLM calls a tool by a name that doesn’t exist in the available set, the behavior is controlled by ToolNotFoundPolicy.
Two built-in policies are provided:
- AutoCorrectionPolicy (default): feeds the error back to the LLM so it can self-correct. It uses case-insensitive fuzzy matching to suggest corrections for hallucinated tool names (e.g., ragbot_vectorSearch → suggests vectorSearch). When multiple candidates match, all are listed. It throws ToolNotFoundException after 3 consecutive failures.
- ImmediateThrowPolicy: throws ToolNotFoundException immediately.
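The kind of suggestion described above can be sketched in a few lines. This is not Embabel's matching code: it is a hypothetical helper that treats a known tool as a candidate when one lowercased name contains the other. That is enough to map ragbot_vectorSearch to vectorSearch and to list every candidate when several match.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical "did you mean?" helper; Embabel's actual matching may differ.
final class ToolNameSuggester {

    private ToolNameSuggester() {}

    // Returns every available tool whose name, ignoring case, contains or is
    // contained in the requested name. An empty list means no suggestion.
    static List<String> suggest(String requested, List<String> available) {
        String wanted = requested.toLowerCase(Locale.ROOT);
        return available.stream()
                .filter(name -> {
                    String candidate = name.toLowerCase(Locale.ROOT);
                    return wanted.contains(candidate) || candidate.contains(wanted);
                })
                .toList();
    }

    // Builds the kind of feedback message a self-correcting policy could send back.
    static String feedback(String requested, List<String> available) {
        List<String> candidates = suggest(requested, available);
        return candidates.isEmpty()
                ? "Tool '" + requested + "' does not exist. Available tools: " + available
                : "Tool '" + requested + "' does not exist. Did you mean: "
                        + String.join(", ", candidates) + "?";
    }
}
```

Substring matching is a deliberately simple choice; an edit-distance measure would also catch typos such as vectorSerch, at the cost of tuning a distance threshold.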
The system-wide default is AutoCorrectionPolicy, provided as a Spring bean with @ConditionalOnMissingBean.
To override it globally, define your own ToolNotFoundPolicy bean.
For per-interaction control, use withToolNotFoundPolicy() on PromptRunner:
```java
promptRunner
    .withToolNotFoundPolicy(new AutoCorrectionPolicy(5))
    .creating(MyOutput.class)
    .create(messages);
```

```kotlin
promptRunner
    .withToolNotFoundPolicy(AutoCorrectionPolicy(maxRetries = 5))
    .creating(MyOutput::class.java)
    .create(messages)
```

To write a custom policy, implement the ToolNotFoundPolicy interface:
```kotlin
class MyEditDistancePolicy : ToolNotFoundPolicy {
    override fun handle(requestedName: String, availableTools: List<Tool>): ToolNotFoundAction {
        // Custom recovery logic, e.g. edit-distance matching
        ...
    }
}
```

Response Types
The LlmMessageResponse contains:
- message: The LLM’s response as an Embabel Message
- textContent: Text content from the response
- usage: Optional token usage information
For responses that include tool calls, return an AssistantMessageWithToolCalls carrying a list of ToolCall values:
```java
public record ToolCall(
    String id,        // Unique identifier for the tool call
    String name,      // Name of the tool to invoke
    String arguments  // JSON arguments for the tool
) {}
```

```kotlin
data class ToolCall(
    val id: String,        // Unique identifier for the tool call
    val name: String,      // Name of the tool to invoke
    val arguments: String, // JSON arguments for the tool
)
```

Example: Custom LLM Provider
Here’s an example of implementing LlmMessageSender for a hypothetical HTTP-based LLM API:
```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
// Embabel imports omitted

public class MyCustomLlmMessageSender implements LlmMessageSender {

    // HttpClient stands for your preferred HTTP client; its fluent API here is illustrative
    private final HttpClient httpClient;
    private final String apiKey;
    private final String model;

    public MyCustomLlmMessageSender(HttpClient httpClient, String apiKey, String model) {
        this.httpClient = httpClient;
        this.apiKey = apiKey;
        this.model = model;
    }

    @Override
    public LlmMessageResponse call(List<Message> messages, List<Tool> tools) {
        // Convert Embabel messages to your API's format
        // (Map.of rejects null values, so missing text becomes "")
        List<Map<String, Object>> apiMessages = messages.stream()
            .map(message -> Map.<String, Object>of(
                "role", message.getRole().name().toLowerCase(),
                "content", Objects.requireNonNullElse(message.getTextContent(), "")
            ))
            .toList();

        // Convert tool definitions to your API's format
        List<Map<String, Object>> apiTools = tools.stream()
            .map(tool -> Map.<String, Object>of(
                "name", tool.getDefinition().getName(),
                "description", tool.getDefinition().getDescription(),
                "parameters", tool.getDefinition().getInputSchema().jsonSchema()
            ))
            .toList();

        // Build the request body; only include "tools" when there are any
        Map<String, Object> requestBody = new HashMap<>();
        requestBody.put("model", model);
        requestBody.put("messages", apiMessages);
        if (!apiTools.isEmpty()) {
            requestBody.put("tools", apiTools);
        }

        // Make API request (using your preferred HTTP client)
        MyApiResponse responseBody = httpClient.post("https://api.my-llm.com/chat")
            .header("Authorization", "Bearer " + apiKey)
            .body(requestBody)
            .execute(MyApiResponse.class);

        // Check if LLM requested tool calls
        List<ToolCall> toolCalls = List.of();
        if (responseBody.getToolCalls() != null) {
            toolCalls = responseBody.getToolCalls().stream()
                .map(call -> new ToolCall(
                    call.getId(),
                    call.getFunction().getName(),
                    call.getFunction().getArguments()
                ))
                .toList();
        }

        String content = responseBody.getContent() != null ? responseBody.getContent() : "";
        Message embabelMessage;
        if (toolCalls.isEmpty()) {
            embabelMessage = new AssistantMessage(content);
        } else {
            embabelMessage = new AssistantMessageWithToolCalls(content, toolCalls);
        }

        Usage usage = null;
        if (responseBody.getUsage() != null) {
            usage = new Usage(
                responseBody.getUsage().getPromptTokens(),
                responseBody.getUsage().getCompletionTokens()
            );
        }

        return new LlmMessageResponse(embabelMessage, content, usage);
    }
}
```

```kotlin
class MyCustomLlmMessageSender(
    private val httpClient: HttpClient,
    private val apiKey: String,
    private val model: String,
) : LlmMessageSender {

    override fun call(
        messages: List<Message>,
        tools: List<Tool>,
    ): LlmMessageResponse {
        // Convert Embabel messages to your API's format
        val apiMessages = messages.map { message ->
            mapOf(
                "role" to message.role.name.lowercase(),
                "content" to message.textContent
            )
        }

        // Convert tool definitions to your API's format
        val apiTools = tools.map { tool ->
            mapOf(
                "name" to tool.definition.name,
                "description" to tool.definition.description,
                "parameters" to tool.definition.inputSchema.jsonSchema()
            )
        }

        // Make API request (HttpClient stands for your preferred HTTP client)
        val response = httpClient.post("https://api.my-llm.com/chat") {
            header("Authorization", "Bearer $apiKey")
            body = mapOf(
                "model" to model,
                "messages" to apiMessages,
                "tools" to apiTools.ifEmpty { null }
            )
        }

        // Parse response and convert to Embabel types
        val responseBody = response.body<MyApiResponse>()

        // Check if LLM requested tool calls
        val toolCalls = responseBody.toolCalls?.map { call ->
            ToolCall(
                id = call.id,
                name = call.function.name,
                arguments = call.function.arguments
            )
        }

        val embabelMessage = if (toolCalls.isNullOrEmpty()) {
            AssistantMessage(responseBody.content ?: "")
        } else {
            AssistantMessageWithToolCalls(
                content = responseBody.content ?: "",
                toolCalls = toolCalls
            )
        }

        return LlmMessageResponse(
            message = embabelMessage,
            textContent = responseBody.content ?: "",
            usage = responseBody.usage?.let { u ->
                Usage(
                    inputTokens = u.promptTokens,
                    outputTokens = u.completionTokens,
                )
            }
        )
    }
}
```

Creating an LlmService
To make your custom LLM available through Embabel’s ModelProvider, implement the LlmService interface:
```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
// Embabel imports omitted

public class MyCustomLlmService implements LlmService<MyCustomLlmService> {

    private final String name;
    private final String provider;
    private final HttpClient httpClient;
    private final String apiKey;
    private final LocalDate knowledgeCutoffDate;
    private final List<PromptContributor> promptContributors;
    private final PricingModel pricingModel;

    public MyCustomLlmService(
            String name,
            String provider,
            HttpClient httpClient,
            String apiKey,
            LocalDate knowledgeCutoffDate,
            PricingModel pricingModel) {
        this(name, provider, httpClient, apiKey, knowledgeCutoffDate,
            knowledgeCutoffDate != null
                ? List.<PromptContributor>of(new KnowledgeCutoffDate(knowledgeCutoffDate))
                : List.<PromptContributor>of(),
            pricingModel);
    }

    // Full constructor, used by the with* methods to carry prompt contributors over
    private MyCustomLlmService(
            String name,
            String provider,
            HttpClient httpClient,
            String apiKey,
            LocalDate knowledgeCutoffDate,
            List<PromptContributor> promptContributors,
            PricingModel pricingModel) {
        this.name = name;
        this.provider = provider;
        this.httpClient = httpClient;
        this.apiKey = apiKey;
        this.knowledgeCutoffDate = knowledgeCutoffDate;
        this.promptContributors = List.copyOf(promptContributors);
        this.pricingModel = pricingModel;
    }

    @Override
    public String getName() { return name; }

    @Override
    public String getProvider() { return provider; }

    @Override
    public LocalDate getKnowledgeCutoffDate() { return knowledgeCutoffDate; }

    @Override
    public List<PromptContributor> getPromptContributors() { return promptContributors; }

    @Override
    public PricingModel getPricingModel() { return pricingModel; }

    @Override
    public LlmMessageSender createMessageSender(LlmOptions options) {
        return new MyCustomLlmMessageSender(
            httpClient,
            apiKey,
            options.getModel() != null ? options.getModel() : name
        );
    }

    @Override
    public MyCustomLlmService withKnowledgeCutoffDate(LocalDate date) {
        // Keep existing contributors and add the new cutoff date
        var newContributors = new ArrayList<>(promptContributors);
        newContributors.add(new KnowledgeCutoffDate(date));
        return new MyCustomLlmService(
            name, provider, httpClient, apiKey, date, newContributors, pricingModel
        );
    }

    @Override
    public MyCustomLlmService withPromptContributor(PromptContributor promptContributor) {
        var newContributors = new ArrayList<>(promptContributors);
        newContributors.add(promptContributor);
        return new MyCustomLlmService(
            name, provider, httpClient, apiKey, knowledgeCutoffDate, newContributors, pricingModel
        );
    }
}
```

```kotlin
data class MyCustomLlmService(
    override val name: String,
    override val provider: String,
    private val httpClient: HttpClient,
    private val apiKey: String,
    override val knowledgeCutoffDate: LocalDate? = null,
    override val promptContributors: List<PromptContributor> = buildList {
        knowledgeCutoffDate?.let { add(KnowledgeCutoffDate(it)) }
    },
    override val pricingModel: PricingModel? = null,
) : LlmService<MyCustomLlmService> {

    override fun createMessageSender(options: LlmOptions): LlmMessageSender {
        return MyCustomLlmMessageSender(
            httpClient = httpClient,
            apiKey = apiKey,
            model = options.model ?: name,
        )
    }

    override fun withKnowledgeCutoffDate(date: LocalDate): MyCustomLlmService =
        copy(
            knowledgeCutoffDate = date,
            promptContributors = promptContributors + KnowledgeCutoffDate(date)
        )

    override fun withPromptContributor(promptContributor: PromptContributor): MyCustomLlmService =
        copy(promptContributors = promptContributors + promptContributor)
}
```

Then register it as a Spring bean:
```java
@Configuration
public class MyLlmConfiguration {

    @Bean
    public LlmService<?> myCustomLlm(
            HttpClient httpClient,
            @Value("${my-llm.api-key}") String apiKey) {
        return new MyCustomLlmService(
            "my-custom-model",
            "MyProvider",
            httpClient,
            apiKey,
            LocalDate.of(2024, 12, 1),
            null
        );
    }
}
```

```kotlin
@Configuration
class MyLlmConfiguration {

    @Bean
    fun myCustomLlm(
        httpClient: HttpClient,
        @Value("\${my-llm.api-key}") apiKey: String,
    ): LlmService<*> = MyCustomLlmService(
        name = "my-custom-model",
        provider = "MyProvider",
        httpClient = httpClient,
        apiKey = apiKey,
        knowledgeCutoffDate = LocalDate.of(2024, 12, 1),
    )
}
```

The bean will be automatically discovered and made available through the ModelProvider.
How Model Discovery and Selection Works
When your application starts, ConfigurableModelProvider collects all LlmService beans from the Spring application context.
Your custom LLM is matched by the name property you set on your LlmService implementation.
By name: Use the name from your LlmService directly.
This works with @LlmCall, ai.withLlm(), and AgenticTool.withLlm():
```java
// In a declarative action
@LlmCall(llm = "my-custom-model")
String myAction();

// In an imperative action
ai.withLlm("my-custom-model")
    .create("Tell me a joke", String.class);
```

```kotlin
// In a declarative action
@LlmCall(llm = "my-custom-model")
fun myAction(): String

// In an imperative action
ai.withLlm("my-custom-model")
    .create<String>("Tell me a joke")
```

By role: Map a role name to your model name in configuration, then reference it with the # prefix:
```yaml
embabel:
  models:
    default-llm: my-custom-model   # (1)
    llms:
      best: my-custom-model        # (2)
      cheapest: my-small-model     # (3)
```

1. Sets the default LLM used when no explicit model is specified
2. Maps the best role to your custom model
3. Maps the cheapest role to a different model
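The selection rules described in this section (an explicit name, a #-prefixed role, or the configured default when nothing is specified) can be summarized in a small resolver. This is a hypothetical illustration of the lookup, not ConfigurableModelProvider's implementation; the class, its methods and its error handling are assumptions.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical resolver mirroring the name / #role / default lookup.
final class LlmNameResolver {

    private final Set<String> registeredNames; // names of discovered LlmService beans
    private final Map<String, String> roles;    // role -> model name, as in the llms: section
    private final String defaultLlm;            // as in default-llm

    LlmNameResolver(Set<String> registeredNames, Map<String, String> roles, String defaultLlm) {
        this.registeredNames = registeredNames;
        this.roles = roles;
        this.defaultLlm = defaultLlm;
    }

    String resolve(String requested) {
        String name;
        if (requested == null || requested.isBlank()) {
            name = defaultLlm;                    // no explicit model: use the default
        } else if (requested.startsWith("#")) {
            String role = requested.substring(1); // "#best" -> role "best"
            name = roles.get(role);
            if (name == null) {
                throw new IllegalArgumentException("No model mapped to role: " + role);
            }
        } else {
            name = requested;                     // plain string: treat as a model name
        }
        if (!registeredNames.contains(name)) {
            throw new IllegalArgumentException("No LlmService named: " + name);
        }
        return name;
    }
}
```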
Then reference roles with #:
```java
// By role
@LlmCall(llm = "#best")
String myAction();

// Or programmatically
ai.withLlmByRole("best")
    .create("Tell me a joke", String.class);
```

```kotlin
// By role
@LlmCall(llm = "#best")
fun myAction(): String

// Or programmatically
ai.withLlmByRole("best")
    .create<String>("Tell me a joke")
```

Using Your Custom Implementation (Alternative)
If you need more control over the LLM operations layer itself, you can extend ToolLoopLlmOperations:
```java
public class MyCustomLlmOperations extends ToolLoopLlmOperations {

    private final HttpClient httpClient;
    private final String apiKey;

    public MyCustomLlmOperations(
            HttpClient httpClient,
            String apiKey,
            ModelProvider modelProvider,
            ToolDecorator toolDecorator,
            Validator validator) {
        super(modelProvider, toolDecorator, validator);
        this.httpClient = httpClient;
        this.apiKey = apiKey;
    }

    @Override
    protected LlmMessageSender createMessageSender(LlmService<?> llm, LlmOptions options) {
        return new MyCustomLlmMessageSender(
            httpClient,
            apiKey,
            options.getModel() != null ? options.getModel() : "default-model"
        );
    }
}
```

```kotlin
class MyCustomLlmOperations(
    private val httpClient: HttpClient,
    private val apiKey: String,
    modelProvider: ModelProvider,
    toolDecorator: ToolDecorator,
    validator: Validator,
) : ToolLoopLlmOperations(
    modelProvider = modelProvider,
    toolDecorator = toolDecorator,
    validator = validator,
) {
    override fun createMessageSender(
        llm: LlmService<*>,
        options: LlmOptions,
    ): LlmMessageSender {
        return MyCustomLlmMessageSender(
            httpClient = httpClient,
            apiKey = apiKey,
            model = options.model ?: "default-model",
        )
    }
}
```

The ToolLoopLlmOperations base class provides several extension points:
- createMessageSender(): Create the LLM communication layer
- createOutputConverter(): Parse LLM responses into typed objects
- sanitizeStringOutput(): Clean up raw text responses
- emitCallEvent(): Emit observability events
Key Implementation Notes
- Tool calls are not executed by your sender. Just return the tool call requests; Embabel’s tool loop handles execution and continuation.
- Handle both tool and non-tool responses. Return AssistantMessage for plain text and AssistantMessageWithToolCalls when the LLM wants to invoke tools.
- Include usage information when available. This enables cost tracking and observability.
- Message types matter. The tool loop expects specific message types:
  - UserMessage: User input
  - SystemMessage: System prompts
  - AssistantMessage: LLM text response
  - AssistantMessageWithToolCalls: LLM response with tool requests
  - ToolResultMessage: Result returned to LLM after tool execution
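Because the tool loop depends on these message types appearing in a consistent order, it can help to check transcripts in tests. The sketch below uses its own minimal sealed hierarchy that mirrors the names listed above; it is not Embabel's message API. The rule it enforces, that every ToolResultMessage answers a tool call id requested earlier by an AssistantMessageWithToolCalls, is an assumption based on how tool-calling protocols typically pair requests and results.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Minimal stand-ins named after the message types above (not Embabel's classes).
final class TranscriptCheck {

    sealed interface Msg {}
    record SystemMessage(String text) implements Msg {}
    record UserMessage(String text) implements Msg {}
    record AssistantMessage(String text) implements Msg {}
    record AssistantMessageWithToolCalls(String text, List<String> toolCallIds) implements Msg {}
    record ToolResultMessage(String toolCallId, String result) implements Msg {}

    // True if every tool result answers a tool call that was requested earlier
    // and has not already been answered.
    static boolean toolResultsMatchCalls(List<Msg> transcript) {
        Set<String> pending = new HashSet<>();
        for (Msg msg : transcript) {
            if (msg instanceof AssistantMessageWithToolCalls calls) {
                pending.addAll(calls.toolCallIds());
            } else if (msg instanceof ToolResultMessage result) {
                if (!pending.remove(result.toolCallId())) {
                    return false;
                }
            }
        }
        return true;
    }
}
```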
Advanced: Custom Embedding Service
Just as you can integrate a custom LLM, you can implement a custom embedding service that doesn’t depend on Spring AI. This is useful when:
- You want to use an embedding provider not supported by Spring AI
- You need custom pre/post-processing of embeddings
- You’re integrating with a proprietary or internal embedding API
The EmbeddingService Interface
The EmbeddingService interface is framework-agnostic.
Unlike SpringAiEmbeddingService, a custom implementation does not need to wrap a Spring AI EmbeddingModel:
```java
public interface EmbeddingService {
    float[] embed(String text);
    List<float[]> embed(List<String> texts);
    int getDimensions();
    String getName();
    String getProvider();
}
```

```kotlin
interface EmbeddingService : EmbeddingServiceMetadata, HasInfoString {
    fun embed(text: String): FloatArray
    fun embed(texts: List<String>): List<FloatArray>
    val dimensions: Int
}
```

Example: Custom Embedding Provider
Here’s an example of implementing EmbeddingService for an HTTP-based embedding API:
```java
public class MyCustomEmbeddingService implements EmbeddingService {

    private final String name;
    private final String provider;
    private final int dimensions;
    private final HttpClient httpClient;
    private final String apiKey;

    public MyCustomEmbeddingService(
            String name,
            String provider,
            int dimensions,
            HttpClient httpClient,
            String apiKey) {
        this.name = name;
        this.provider = provider;
        this.dimensions = dimensions;
        this.httpClient = httpClient;
        this.apiKey = apiKey;
    }

    @Override
    public String getName() { return name; }

    @Override
    public String getProvider() { return provider; }

    @Override
    public int getDimensions() { return dimensions; }

    @Override
    public float[] embed(String text) {
        return embed(List.of(text)).get(0);
    }

    @Override
    public List<float[]> embed(List<String> texts) {
        MyEmbeddingResponse response = httpClient
            .post("https://api.my-embeddings.com/embed")
            .header("Authorization", "Bearer " + apiKey)
            .body(Map.of("texts", texts, "model", name))
            .execute(MyEmbeddingResponse.class);
        return response.getEmbeddings();
    }
}
```

```kotlin
class MyCustomEmbeddingService(
    override val name: String,
    override val provider: String,
    override val dimensions: Int,
    private val httpClient: HttpClient,
    private val apiKey: String,
) : EmbeddingService {

    override fun embed(text: String): FloatArray = embed(listOf(text)).first()

    override fun embed(texts: List<String>): List<FloatArray> {
        val response = httpClient.post("https://api.my-embeddings.com/embed") {
            header("Authorization", "Bearer $apiKey")
            body = mapOf("texts" to texts, "model" to name)
        }
        return response.body<MyEmbeddingResponse>().embeddings
    }
}
```

Registering as a Spring Bean
Register your custom embedding service as a Spring bean and it will be automatically discovered:
```java
@Configuration
public class MyEmbeddingConfiguration {

    @Bean
    public EmbeddingService myCustomEmbeddings(
            HttpClient httpClient,
            @Value("${my-embeddings.api-key}") String apiKey) {
        return new MyCustomEmbeddingService(
            "my-custom-embeddings",
            "MyProvider",
            384,
            httpClient,
            apiKey
        );
    }
}
```

```kotlin
@Configuration
class MyEmbeddingConfiguration {

    @Bean
    fun myCustomEmbeddings(
        httpClient: HttpClient,
        @Value("\${my-embeddings.api-key}") apiKey: String,
    ): EmbeddingService = MyCustomEmbeddingService(
        name = "my-custom-embeddings",
        provider = "MyProvider",
        dimensions = 384,
        httpClient = httpClient,
        apiKey = apiKey,
    )
}
```

Discovery and Selection
Custom embedding services follow the same discovery and selection pattern as LLMs (see How Model Discovery and Selection Works).
By name: Use ai.withEmbeddingService() with the name from your implementation:
```java
ai.withEmbeddingService("my-custom-embeddings")
    .embed("Hello world");
```

```kotlin
ai.withEmbeddingService("my-custom-embeddings")
    .embed("Hello world")
```

By role: Map a role name to your embedding service in configuration:
```yaml
embabel:
  models:
    default-embedding-model: my-custom-embeddings   # (1)
    embedding-services:
      cheapest: my-custom-embeddings                # (2)
```

1. Sets the default embedding service
2. Maps the cheapest role to your custom embedding service
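When testing code that depends on a custom embedding service, a deterministic stand-in can be handy. The hypothetical sketch below mirrors the method shape of the EmbeddingService interface but does not implement it, so it compiles without Embabel. Its byte-hashing scheme is an arbitrary assumption and its vectors carry no semantic meaning. It exhibits two properties worth checking in any implementation: every vector has the advertised number of dimensions, and embedding one text agrees with embedding it as part of a batch.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical deterministic embedder with the same method shape as EmbeddingService.
final class HashingEmbedder {

    private final int dimensions;

    HashingEmbedder(int dimensions) {
        this.dimensions = dimensions;
    }

    int getDimensions() {
        return dimensions;
    }

    float[] embed(String text) {
        return embed(List.of(text)).get(0);
    }

    List<float[]> embed(List<String> texts) {
        return texts.stream().map(this::embedOne).toList();
    }

    // Spreads the UTF-8 bytes of the text over a fixed-size vector, then L2-normalizes it.
    private float[] embedOne(String text) {
        float[] vector = new float[dimensions];
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < bytes.length; i++) {
            vector[i % dimensions] += (bytes[i] & 0xFF) / 255f;
        }
        double norm = 0;
        for (float v : vector) {
            norm += v * v;
        }
        if (norm > 0) {
            float scale = (float) (1.0 / Math.sqrt(norm));
            for (int i = 0; i < dimensions; i++) {
                vector[i] *= scale;
            }
        }
        return vector;
    }
}
```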
Advanced Caching with Anthropic
While many providers have implicit caching managed internally, Anthropic exposes public APIs for explicit prompt caching control. This allows you to optimize costs and latency for applications with long prompts, many tools, or extended conversations.
Motivation
Anthropic’s prompt caching feature provides significant benefits:
- Cost savings: Cache reads cost 90% less than regular input tokens
- Latency improvements: Cached content is processed faster
- Ideal for: Long system prompts, extensive tool definitions, multi-turn conversations
Without caching, every API call processes the entire prompt from scratch. With caching, repeated content (system prompts, tools, conversation history) can be cached and reused across requests.
How It Works
Anthropic caches a prefix of your prompt, up to a cache breakpoint, in the order tools, system prompt, then messages. A cache hit requires an exact match of that prefix, so any change to the cached portion invalidates the cache.
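The exact-match rule can be illustrated with a toy model. The sketch below is purely hypothetical: it does not call Anthropic and ignores TTLs and minimum sizes. It keys a cache on a SHA-256 digest of the content up to a cache breakpoint, so any edit to that content, however small, produces a miss.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.HexFormat;
import java.util.List;
import java.util.Set;

// Toy model of exact-match prompt caching (no TTL, no size minimum, no API calls).
final class ToyPromptCache {

    private final Set<String> cachedDigests = new HashSet<>();

    // Returns true on a cache hit; on a miss, the content is "written" to the cache.
    boolean lookupOrStore(List<String> blocksUpToBreakpoint) {
        String digest = digest(String.join("\u0000", blocksUpToBreakpoint));
        return !cachedDigests.add(digest);
    }

    private static String digest(String content) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(sha.digest(content.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

In this model, sending the same tools and system prompt twice produces a miss and then a hit, while changing a single character of the system prompt produces a fresh miss.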
Key concepts:
- Cache creation: The first time content is seen, it is written to the cache at a 25% premium over regular input tokens (for the 5-minute TTL)
- Cache reads: Subsequent requests with matching content read from cache at 10% of regular input token cost
- Cache TTL: 5 minutes (default) or 1 hour (premium, higher creation cost)
- Minimum size: 1024 tokens for older models, 4096 tokens for Claude Sonnet 4.5 and newer.
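The multipliers above determine when caching pays off. As a back-of-the-envelope sketch, assume a prefix of P tokens reused across N requests that all arrive within the 5-minute TTL, and express cost in units of the regular input price per token. The first request writes the cache at 1.25 times that price, and each later request reads it at 0.1 times. Output tokens and any uncached suffix are ignored; the class is a hypothetical helper, not part of Embabel.

```java
// Back-of-the-envelope cost model for a cached prefix (5-minute TTL multipliers).
final class CacheCostEstimate {

    static final double WRITE_MULTIPLIER = 1.25; // cache creation: 25% premium
    static final double READ_MULTIPLIER = 0.10;  // cache read: 10% of base price

    private CacheCostEstimate() {}

    // Cost, in base-price token units, of sending the prefix N times without caching.
    static double uncached(long prefixTokens, int requests) {
        return (double) prefixTokens * requests;
    }

    // One cache write followed by (N - 1) cache reads, all within the TTL.
    static double cached(long prefixTokens, int requests) {
        if (requests <= 0) {
            return 0;
        }
        return prefixTokens * (WRITE_MULTIPLIER + READ_MULTIPLIER * (requests - 1));
    }
}
```

With these multipliers, a single request costs more with caching (1.25P versus P), two requests already cost less (1.35P versus 2P), and each further reuse costs 0.1P instead of P.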
Cache Strategies
Embabel provides several caching strategies through AnthropicCachingConfig:
System Prompt Caching
Cache the system prompt for reuse across multiple requests:
```java
AnthropicCachingConfig cachingConfig = new AnthropicCachingConfig();
cachingConfig.setSystemPrompt(true);

LlmOptions options = LlmOptions.withDefaultLlm();
options = withAnthropicCaching(options, cachingConfig);
```

```kotlin
val options = LlmOptions.withDefaultLlm()
    .withAnthropicCaching(systemPrompt = true)
```

Tools Caching
Cache tool definitions when using many tools or tools with large schemas:
```java
AnthropicCachingConfig cachingConfig = new AnthropicCachingConfig();
cachingConfig.setTools(true);

LlmOptions options = LlmOptions.withDefaultLlm();
options = withAnthropicCaching(options, cachingConfig);
```

```kotlin
val options = LlmOptions.withDefaultLlm()
    .withAnthropicCaching(tools = true)
```

System + Tools Caching
Combine both strategies:
```java
AnthropicCachingConfig cachingConfig = new AnthropicCachingConfig();
cachingConfig.setSystemPrompt(true);
cachingConfig.setTools(true);

LlmOptions options = LlmOptions.withDefaultLlm();
options = withAnthropicCaching(options, cachingConfig);
```

```kotlin
val options = LlmOptions.withDefaultLlm()
    .withAnthropicCaching(
        systemPrompt = true,
        tools = true
    )
```

Conversation History Caching
Cache conversation history for long multi-turn conversations:
```java
AnthropicCachingConfig cachingConfig = new AnthropicCachingConfig();
cachingConfig.setConversationHistory(true);

LlmOptions options = LlmOptions.withDefaultLlm();
options = withAnthropicCaching(options, cachingConfig);
```

```kotlin
val options = LlmOptions.withDefaultLlm()
    .withAnthropicCaching(conversationHistory = true)
```

Advanced Configuration
Message Type Minimum Content Length
Control which messages are eligible for caching based on their content length:
```java
AnthropicCachingConfig cachingConfig = new AnthropicCachingConfig();
cachingConfig.setSystemPrompt(true);
cachingConfig.messageTypeMinContentLength(MessageRole.SYSTEM, 1024);
cachingConfig.messageTypeMinContentLength(MessageRole.USER, 512);

LlmOptions options = LlmOptions.withDefaultLlm();
options = withAnthropicCaching(options, cachingConfig);
```

```kotlin
val options = LlmOptions.withDefaultLlm()
    .withAnthropicCaching(
        AnthropicCachingConfig(systemPrompt = true)
            .messageTypeMinContentLength(MessageRole.SYSTEM, 1024)
            .messageTypeMinContentLength(MessageRole.USER, 512)
    )
```

Message Type TTL
Set cache TTL per message type (default is 5 minutes):
```java
AnthropicCachingConfig cachingConfig = new AnthropicCachingConfig();
cachingConfig.setSystemPrompt(true);
cachingConfig.messageTypeTtl(MessageRole.SYSTEM, AnthropicCacheTtl.ONE_HOUR);

LlmOptions options = LlmOptions.withDefaultLlm();
options = withAnthropicCaching(options, cachingConfig);
```

```kotlin
val options = LlmOptions.withDefaultLlm()
    .withAnthropicCaching(
        AnthropicCachingConfig(systemPrompt = true)
            .messageTypeTtl(MessageRole.SYSTEM, AnthropicCacheTtl.ONE_HOUR)
    )
```

Accessing Cache Metrics
Embabel provides extension methods to access Anthropic-specific cache metrics from the Usage object:
```java
import static com.embabel.agent.config.models.anthropic.AnthropicUsage.*;

AssistantMessage response = promptRunner.respond(messages);
Usage usage = response.getUsage();

// Check if cache was created or read
boolean cacheCreated = hasAnthropicCacheCreation(usage);
boolean cacheRead = hasAnthropicCacheRead(usage);

// Get token counts
Integer creationTokens = anthropicCacheCreationTokens(usage);
Integer readTokens = anthropicCacheReadTokens(usage);

// Get summary string for logging
String summary = anthropicCacheSummary(usage);
// Example output: "cache[creation=1061, read=0]"
```

```kotlin
val response = promptRunner.respond(messages)
val usage = response.usage

// Check if cache was created or read
val cacheCreated = usage.hasAnthropicCacheCreation()
val cacheRead = usage.hasAnthropicCacheRead()

// Get token counts
val creationTokens = usage.anthropicCacheCreationTokens()
val readTokens = usage.anthropicCacheReadTokens()

// Get summary string for logging
val summary = usage.anthropicCacheSummary()
// Example output: "cache[creation=1061, read=0]"
```

Best Practices
- Cache long, stable content: System prompts and tool definitions that don’t change frequently are ideal candidates
- Mind the minimum size: Content must meet the minimum token requirement (1024 or 4096 depending on model)
- Monitor cache metrics: Use the cache extension methods to track cache hit rates and validate savings
- Consider TTL vs cost: A 1-hour TTL has a higher creation cost but suits longer sessions better
- Test before deploying: Cache behavior can vary based on prompt structure and usage patterns
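To track cache hit rates across many calls, you can accumulate the per-response token counts. The following hypothetical accumulator takes the nullable Integer values returned by anthropicCacheCreationTokens(usage) and anthropicCacheReadTokens(usage), treating null as zero. Both metrics it reports are assumed definitions chosen for illustration, not Embabel or Anthropic terminology: the read share (reads divided by reads plus creations) and the fraction of responses that read anything from the cache.

```java
// Hypothetical accumulator for Anthropic cache token counts across many responses.
final class CacheStats {

    private long creationTokens;
    private long readTokens;
    private int responses;
    private int responsesWithRead;

    // Arguments may be null when no cache activity was reported.
    void record(Integer creation, Integer read) {
        long c = creation == null ? 0 : creation;
        long r = read == null ? 0 : read;
        creationTokens += c;
        readTokens += r;
        responses++;
        if (r > 0) {
            responsesWithRead++;
        }
    }

    // Fraction of cache-related tokens that were served from the cache.
    double readShare() {
        long total = creationTokens + readTokens;
        return total == 0 ? 0.0 : (double) readTokens / total;
    }

    // Fraction of responses that read anything from the cache.
    double responseHitRate() {
        return responses == 0 ? 0.0 : (double) responsesWithRead / responses;
    }
}
```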
Reference
For complete details on Anthropic’s prompt caching, see: https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching