Working with LLMs

Embabel supports any LLM supported by Spring AI. In practice, this is just about any LLM.

Choosing an LLM

Embabel encourages you to think about LLM choice for every LLM invocation. The PromptRunner interface makes this easy. Because Embabel enables you to break agentic flows up into multiple action steps, each step can use a smaller, focused prompt with fewer tools. This means it may be able to use a smaller LLM.

Considerations:

Consider the complexity of the return type you expect from the LLM. This is typically a good proxy for determining required LLM quality. A small LLM is likely to struggle with a deeply nested return structure.
Consider the nature of the task. LLMs have different strengths; review any available documentation. You don’t necessarily need a huge, expensive model that is good at nearly everything, at the cost of your wallet and the environment.
Consider the sophistication of tool calling required. Simple tool calls are fine, but complex orchestration is another indicator you’ll need a strong LLM. (It may also be an indication that you should create a more sophisticated flow using Embabel GOAP.)
Consider trying a local LLM running under Ollama or Docker.

Tuning for Smaller and Local Models

A core goal of Embabel is to make agentic flows work well across the full range of LLMs, so you can choose the cheapest, smallest, or most private model that does the job — rather than always reaching for a frontier model. Smaller chat models behave differently from frontier models in ways that the framework can compensate for:

Silent failures after tool calls. Weaker open-weights models (e.g. gpt-oss-20b, some Qwen variants) sometimes return blank text with no further tool calls when they don’t know how to proceed. Without intervention, the tool loop exits with empty content. Set embabel.agent.platform.toolloop.empty-response.max-retries to 1 to feed a synthetic nudge back to the model and give it one more chance; see Empty-Response Handling.
Tool-name confusion. Smaller models more frequently call tools by approximate names. The default AutoCorrectionPolicy handles this by feeding back a “did you mean X?” suggestion; tune embabel.agent.platform.toolloop.tool-not-found.max-retries if your model needs more attempts.
Iteration headroom. Recovery costs LLM calls. If you enable retry policies, raise embabel.agent.platform.toolloop.max-iterations so a turn that needs an extra round trip doesn’t run out of budget.

These settings are off by default, so existing deployments using strong models behave exactly as before. Turn them on per deployment when the model you’ve picked benefits from them.
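
As a rough illustration, the three settings described above might be combined like this for a small local model. The property names come from the text above; the values are illustrative assumptions rather than recommended defaults, so check ToolLoopConfiguration for what your version actually supports.

```properties
# One extra attempt when the model returns blank text with no tool calls
embabel.agent.platform.toolloop.empty-response.max-retries=1
# Allow more "did you mean X?" rounds for approximate tool names (value is an assumption)
embabel.agent.platform.toolloop.tool-not-found.max-retries=5
# Leave headroom for the extra round trips that recovery costs (value is an assumption)
embabel.agent.platform.toolloop.max-iterations=30
```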

Advanced: Custom LLM Integration

Embabel’s tool loop is framework-agnostic, allowing you to integrate any LLM provider by implementing the LlmMessageSender interface. This is useful when:

You want to use an LLM provider not supported by Spring AI
You need custom request/response handling
You’re integrating with a proprietary or internal LLM service

The LlmMessageSender Interface

The core abstraction is the LlmMessageSender functional interface:

Java
Kotlin

@FunctionalInterface
public interface LlmMessageSender {
    LlmMessageResponse call(
        List<Message> messages,
        List<Tool> tools
    );
}

fun interface LlmMessageSender {
    fun call(
        messages: List<Message>,
        tools: List<Tool>,
    ): LlmMessageResponse
}

The implementation makes a single LLM inference call and returns the response. Importantly, it does not execute tools—it only returns any tool call requests from the LLM. Tool execution is handled by Embabel’s DefaultToolLoop.

The tool loop type is set through configuration, for example:

embabel.agent.platform.toolloop.type=parallel

For the full list of tool loop configuration parameters, refer to ToolLoopConfiguration.

Tool-Not-Found Recovery Policy

When the LLM calls a tool by a name that doesn’t exist in the available set, the behavior is controlled by ToolNotFoundPolicy.

Two built-in policies are provided:

AutoCorrectionPolicy (default) — feeds the error back to the LLM so it can self-correct. Uses case-insensitive fuzzy matching to suggest corrections for hallucinated tool names (e.g., ragbot_vectorSearch → suggests vectorSearch). When multiple candidates match, all are listed. Throws ToolNotFoundException after 3 consecutive failures.
ImmediateThrowPolicy — throws ToolNotFoundException immediately.

The system-wide default is AutoCorrectionPolicy, provided as a Spring bean with @ConditionalOnMissingBean. To override it globally, define your own ToolNotFoundPolicy bean.

For per-interaction control, use withToolNotFoundPolicy() on PromptRunner:

Java
Kotlin

promptRunner
    .withToolNotFoundPolicy(new AutoCorrectionPolicy(5))
    .creating(MyOutput.class)
    .create(messages);

promptRunner
    .withToolNotFoundPolicy(AutoCorrectionPolicy(maxRetries = 5))
    .creating(MyOutput::class.java)
    .create(messages)

To create a custom policy, implement the ToolNotFoundPolicy interface:

class MyEditDistancePolicy : ToolNotFoundPolicy {
    override fun handle(requestedName: String, availableTools: List<Tool>): ToolNotFoundAction {
        // Custom recovery logic, e.g. edit-distance matching
        ...
    }
}
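
The edit-distance matching mentioned in the comment above could look something like the following standalone sketch. It deliberately avoids Embabel types: ToolNameMatcher, editDistance and closest are hypothetical names, and wiring the result into ToolNotFoundPolicy.handle() is left out because the ToolNotFoundAction API is not shown here.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper, not part of Embabel: finds the nearest tool name by edit distance.
final class ToolNameMatcher {

    private ToolNameMatcher() {}

    /** Levenshtein distance computed with two rolling rows. */
    static int editDistance(String a, String b) {
        int[] previous = new int[b.length() + 1];
        int[] current = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            current[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int substitution = previous[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int deletion = previous[j] + 1;
                int insertion = current[j - 1] + 1;
                current[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.length()];
    }

    /** Returns the closest available name within maxDistance, ignoring case. */
    static Optional<String> closest(String requested, List<String> available, int maxDistance) {
        String needle = requested.toLowerCase(Locale.ROOT);
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String candidate : available) {
            int distance = editDistance(needle, candidate.toLowerCase(Locale.ROOT));
            if (distance < bestDistance) {
                best = candidate;
                bestDistance = distance;
            }
        }
        return bestDistance <= maxDistance ? Optional.of(best) : Optional.empty();
    }
}
```

A small maxDistance (for example 2) catches typos such as vectorSerch without suggesting unrelated tools. Prefixed names such as ragbot_vectorSearch are far from vectorSearch in edit distance, so they would need a different heuristic.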

Response Types

The LlmMessageResponse contains:

message: The LLM’s response as an Embabel Message
textContent: Text content from the response
usage: Optional token usage information

For responses that include tool calls, return an AssistantMessageWithToolCalls. Each requested call is represented as a ToolCall:

Java
Kotlin

public record ToolCall(
    String id,         // Unique identifier for the tool call
    String name,       // Name of the tool to invoke
    String arguments   // JSON arguments for the tool
) {}

data class ToolCall(
    val id: String,      // Unique identifier for the tool call
    val name: String,    // Name of the tool to invoke
    val arguments: String, // JSON arguments for the tool
)

Example: Custom LLM Provider

Here’s an example of implementing LlmMessageSender for a hypothetical HTTP-based LLM API:

Java
Kotlin

public class MyCustomLlmMessageSender implements LlmMessageSender {

    private final HttpClient httpClient;
    private final String apiKey;
    private final String model;

    public MyCustomLlmMessageSender(HttpClient httpClient, String apiKey, String model) {
        this.httpClient = httpClient;
        this.apiKey = apiKey;
        this.model = model;
    }

    @Override
    public LlmMessageResponse call(List<Message> messages, List<Tool> tools) {
        // Convert Embabel messages to your API's format
        List<Map<String, Object>> apiMessages = messages.stream()
            .map(message -> Map.<String, Object>of(
                "role", message.getRole().name().toLowerCase(),
                "content", message.getTextContent()
            ))
            .toList();

        // Convert tool definitions to your API's format
        List<Map<String, Object>> apiTools = tools.stream()
            .map(tool -> Map.<String, Object>of(
                "name", tool.getDefinition().getName(),
                "description", tool.getDefinition().getDescription(),
                "parameters", tool.getDefinition().getInputSchema().jsonSchema()
            ))
            .toList();

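        // Make API request (using your preferred HTTP client).
        // Map.of() rejects null values, so build the body with a HashMap
        // (java.util.HashMap) and add "tools" only when there are any.
        Map<String, Object> requestBody = new HashMap<>();
        requestBody.put("model", model);
        requestBody.put("messages", apiMessages);
        if (!apiTools.isEmpty()) {
            requestBody.put("tools", apiTools);
        }
        MyApiResponse responseBody = httpClient.post("https://api.my-llm.com/chat")
            .header("Authorization", "Bearer " + apiKey)
            .body(requestBody)
            .execute(MyApiResponse.class);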

        // Check if LLM requested tool calls
        List<ToolCall> toolCalls = null;
        if (responseBody.getToolCalls() != null) {
            toolCalls = responseBody.getToolCalls().stream()
                .map(call -> new ToolCall(
                    call.getId(),
                    call.getFunction().getName(),
                    call.getFunction().getArguments()
                ))
                .toList();
        }

        Message embabelMessage;
        if (toolCalls == null || toolCalls.isEmpty()) {
            embabelMessage = new AssistantMessage(
                responseBody.getContent() != null ? responseBody.getContent() : ""
            );
        } else {
            embabelMessage = new AssistantMessageWithToolCalls(
                responseBody.getContent() != null ? responseBody.getContent() : "",
                toolCalls
            );
        }

        Usage usage = null;
        if (responseBody.getUsage() != null) {
            usage = new Usage(
                responseBody.getUsage().getPromptTokens(),
                responseBody.getUsage().getCompletionTokens()
            );
        }

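        // Fall back to "" when the API returns no content, matching the message above
        return new LlmMessageResponse(
            embabelMessage,
            responseBody.getContent() != null ? responseBody.getContent() : "",
            usage
        );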
    }
}

class MyCustomLlmMessageSender(
    private val httpClient: HttpClient,
    private val apiKey: String,
    private val model: String,
) : LlmMessageSender {

    override fun call(
        messages: List<Message>,
        tools: List<Tool>,
    ): LlmMessageResponse {
        // Convert Embabel messages to your API's format
        val apiMessages = messages.map { message ->
            mapOf(
                "role" to message.role.name.lowercase(),
                "content" to message.textContent
            )
        }

        // Convert tool definitions to your API's format
        val apiTools = tools.map { tool ->
            mapOf(
                "name" to tool.definition.name,
                "description" to tool.definition.description,
                "parameters" to tool.definition.inputSchema.jsonSchema()
            )
        }

        // Make API request
        val response = httpClient.post("https://api.my-llm.com/chat") {
            header("Authorization", "Bearer $apiKey")
            body = mapOf(
                "model" to model,
                "messages" to apiMessages,
                "tools" to apiTools.ifEmpty { null }
            )
        }

        // Parse response and convert to Embabel types
        val responseBody = response.body<MyApiResponse>()

        // Check if LLM requested tool calls
        val toolCalls = responseBody.toolCalls?.map { call ->
            ToolCall(
                id = call.id,
                name = call.function.name,
                arguments = call.function.arguments
            )
        }

        val embabelMessage = if (toolCalls.isNullOrEmpty()) {
            AssistantMessage(responseBody.content ?: "")
        } else {
            AssistantMessageWithToolCalls(
                content = responseBody.content ?: "",
                toolCalls = toolCalls
            )
        }

        return LlmMessageResponse(
            message = embabelMessage,
            textContent = responseBody.content ?: "",
            usage = responseBody.usage?.let { u ->
                Usage(
                    inputTokens = u.promptTokens,
                    outputTokens = u.completionTokens,
                )
            }
        )
    }
}

Creating an LlmService

To make your custom LLM available through Embabel’s ModelProvider, implement the LlmService interface:

Java
Kotlin

public class MyCustomLlmService implements LlmService<MyCustomLlmService> {

    private final String name;
    private final String provider;
    private final HttpClient httpClient;
    private final String apiKey;
    private final LocalDate knowledgeCutoffDate;
    private final List<PromptContributor> promptContributors;
    private final PricingModel pricingModel;

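    public MyCustomLlmService(
            String name,
            String provider,
            HttpClient httpClient,
            String apiKey,
            LocalDate knowledgeCutoffDate,
            PricingModel pricingModel) {
        this(name, provider, httpClient, apiKey, knowledgeCutoffDate,
            knowledgeCutoffDate != null
                ? List.of(new KnowledgeCutoffDate(knowledgeCutoffDate))
                : List.of(),
            pricingModel);
    }

    // Used by withPromptContributor() to carry an explicit contributor list
    private MyCustomLlmService(
            String name,
            String provider,
            HttpClient httpClient,
            String apiKey,
            LocalDate knowledgeCutoffDate,
            List<PromptContributor> promptContributors,
            PricingModel pricingModel) {
        this.name = name;
        this.provider = provider;
        this.httpClient = httpClient;
        this.apiKey = apiKey;
        this.knowledgeCutoffDate = knowledgeCutoffDate;
        this.promptContributors = List.copyOf(promptContributors);
        this.pricingModel = pricingModel;
    }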

    @Override
    public String getName() { return name; }

    @Override
    public String getProvider() { return provider; }

    @Override
    public LocalDate getKnowledgeCutoffDate() { return knowledgeCutoffDate; }

    @Override
    public List<PromptContributor> getPromptContributors() { return promptContributors; }

    @Override
    public PricingModel getPricingModel() { return pricingModel; }

    @Override
    public LlmMessageSender createMessageSender(LlmOptions options) {
        return new MyCustomLlmMessageSender(
            httpClient,
            apiKey,
            options.getModel() != null ? options.getModel() : name
        );
    }

    @Override
    public MyCustomLlmService withKnowledgeCutoffDate(LocalDate date) {
        return new MyCustomLlmService(name, provider, httpClient, apiKey, date, pricingModel);
    }

    @Override
    public MyCustomLlmService withPromptContributor(PromptContributor promptContributor) {
        var newContributors = new ArrayList<>(promptContributors);
        newContributors.add(promptContributor);
        return new MyCustomLlmService(
            name, provider, httpClient, apiKey, knowledgeCutoffDate,
            newContributors, pricingModel
        );
    }
}

data class MyCustomLlmService(
    override val name: String,
    override val provider: String,
    private val httpClient: HttpClient,
    private val apiKey: String,
    override val knowledgeCutoffDate: LocalDate? = null,
    override val promptContributors: List<PromptContributor> =
        buildList { knowledgeCutoffDate?.let { add(KnowledgeCutoffDate(it)) } },
    override val pricingModel: PricingModel? = null,
) : LlmService<MyCustomLlmService> {

    override fun createMessageSender(options: LlmOptions): LlmMessageSender {
        return MyCustomLlmMessageSender(
            httpClient = httpClient,
            apiKey = apiKey,
            model = options.model ?: name,
        )
    }

    override fun withKnowledgeCutoffDate(date: LocalDate): MyCustomLlmService =
        copy(
            knowledgeCutoffDate = date,
            promptContributors = promptContributors + KnowledgeCutoffDate(date)
        )

    override fun withPromptContributor(promptContributor: PromptContributor): MyCustomLlmService =
        copy(promptContributors = promptContributors + promptContributor)
}

Then register it as a Spring bean:

Java
Kotlin

@Configuration
public class MyLlmConfiguration {

    @Bean
    public LlmService<?> myCustomLlm(
            HttpClient httpClient,
            @Value("${my-llm.api-key}") String apiKey) {
        return new MyCustomLlmService(
            "my-custom-model",
            "MyProvider",
            httpClient,
            apiKey,
            LocalDate.of(2024, 12, 1),
            null
        );
    }
}

@Configuration
class MyLlmConfiguration {

    @Bean
    fun myCustomLlm(
        httpClient: HttpClient,
        @Value("\${my-llm.api-key}") apiKey: String,
    ): LlmService<*> = MyCustomLlmService(
        name = "my-custom-model",
        provider = "MyProvider",
        httpClient = httpClient,
        apiKey = apiKey,
        knowledgeCutoffDate = LocalDate.of(2024, 12, 1),
    )
}

The bean will be automatically discovered and made available through the ModelProvider.

How Model Discovery and Selection Works

When your application starts, ConfigurableModelProvider collects all LlmService beans from the Spring application context. Your custom LLM is matched by the name property you set on your LlmService implementation. You can then select it by name or by role.

By name: Use the name from your LlmService directly. This works with @LlmCall, ai.withLlm(), and AgenticTool.withLlm():

Java
Kotlin

// In a declarative action
@LlmCall(llm = "my-custom-model")
String myAction();

// In an imperative action
ai.withLlm("my-custom-model")
    .create("Tell me a joke", String.class);

// In a declarative action
@LlmCall(llm = "my-custom-model")
fun myAction(): String

// In an imperative action
ai.withLlm("my-custom-model")
    .create<String>("Tell me a joke")

By role: Map a role name to your model name in configuration, then reference it with the # prefix:

embabel:
  models:
    default-llm: my-custom-model  # default LLM used when no explicit model is specified
    llms:
      best: my-custom-model       # maps the best role to your custom model
      cheapest: my-small-model    # maps the cheapest role to a different model

Then reference roles with #:

Java
Kotlin

// By role
@LlmCall(llm = "#best")
String myAction();

// Or programmatically
ai.withLlmByRole("best")
    .create("Tell me a joke", String.class);

// By role
@LlmCall(llm = "#best")
fun myAction(): String

// Or programmatically
ai.withLlmByRole("best")
    .create<String>("Tell me a joke")
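
The two selection styles can be pictured as a single lookup. The following is a standalone illustration of the idea only, not the logic of ConfigurableModelProvider: ModelSelectorSketch and resolve are hypothetical names, and the real resolution rules may differ.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of "by name" versus "#role" selection.
final class ModelSelectorSketch {

    private ModelSelectorSketch() {}

    /**
     * Resolves a selector to a model name.
     * A leading '#' means "look up this role first"; anything else is used as the name.
     */
    static String resolve(String selector, Map<String, String> roles, Set<String> modelNames) {
        String name = selector;
        if (selector.startsWith("#")) {
            String role = selector.substring(1);
            name = roles.get(role);
            if (name == null) {
                throw new IllegalArgumentException("No model mapped to role '" + role + "'");
            }
        }
        if (!modelNames.contains(name)) {
            throw new IllegalArgumentException("No LLM named '" + name + "'");
        }
        return name;
    }
}
```

With the role mapping shown in the configuration above, "#best" and "my-custom-model" resolve to the same model, which is what lets you swap models in configuration without touching calling code.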

Using Your Custom Implementation (Alternative)

If you need more control over the LLM operations layer itself, you can extend ToolLoopLlmOperations:

Java
Kotlin

public class MyCustomLlmOperations extends ToolLoopLlmOperations {

    private final HttpClient httpClient;
    private final String apiKey;

    public MyCustomLlmOperations(
            HttpClient httpClient,
            String apiKey,
            ModelProvider modelProvider,
            ToolDecorator toolDecorator,
            Validator validator) {
        super(modelProvider, toolDecorator, validator);
        this.httpClient = httpClient;
        this.apiKey = apiKey;
    }

    @Override
    protected LlmMessageSender createMessageSender(LlmService<?> llm, LlmOptions options) {
        return new MyCustomLlmMessageSender(
            httpClient,
            apiKey,
            options.getModel() != null ? options.getModel() : "default-model"
        );
    }
}

class MyCustomLlmOperations(
    private val httpClient: HttpClient,
    private val apiKey: String,
    modelProvider: ModelProvider,
    toolDecorator: ToolDecorator,
    validator: Validator,
) : ToolLoopLlmOperations(
    modelProvider = modelProvider,
    toolDecorator = toolDecorator,
    validator = validator,
) {
    override fun createMessageSender(
        llm: LlmService<*>,
        options: LlmOptions,
    ): LlmMessageSender {
        return MyCustomLlmMessageSender(
            httpClient = httpClient,
            apiKey = apiKey,
            model = options.model ?: "default-model",
        )
    }
}

The ToolLoopLlmOperations base class provides several extension points:

createMessageSender(): Create the LLM communication layer
createOutputConverter(): Parse LLM responses into typed objects
sanitizeStringOutput(): Clean up raw text responses
emitCallEvent(): Emit observability events

Key Implementation Notes

Tool calls are not executed by your sender. Just return the tool call requests—Embabel’s tool loop handles execution and continuation.
Handle both tool and non-tool responses. Return AssistantMessage for plain text, AssistantMessageWithToolCalls when the LLM wants to invoke tools.
Include usage information when available. This enables cost tracking and observability.
Message types matter. The tool loop expects specific message types:

UserMessage: User input
SystemMessage: System prompts
AssistantMessage: LLM text response
AssistantMessageWithToolCalls: LLM response with tool requests
ToolResultMessage: Result returned to LLM after tool execution
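
The division of labour in these notes can be pictured with a simplified, self-contained loop. This is only a sketch of the idea, not Embabel's DefaultToolLoop: ToolLoopSketch, Reply, Sender and the string-based transcript are hypothetical stand-ins for the real message types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in types; the real loop works with Embabel Message objects.
final class ToolLoopSketch {

    private ToolLoopSketch() {}

    /** A tool call request as returned by the model (id, name, JSON-like arguments). */
    record ToolCall(String id, String name, String arguments) {}

    /** One model reply: text plus zero or more tool call requests. */
    record Reply(String text, List<ToolCall> toolCalls) {}

    /** Stand-in for a sender: performs one inference call and never executes tools. */
    interface Sender {
        Reply call(List<String> transcript);
    }

    /** Alternates between sender and tools until the model answers without tool calls. */
    static String run(Sender sender, Map<String, Function<String, String>> tools,
                      String userInput, int maxIterations) {
        List<String> transcript = new ArrayList<>();
        transcript.add("user: " + userInput);
        for (int i = 0; i < maxIterations; i++) {
            Reply reply = sender.call(List.copyOf(transcript));
            if (reply.toolCalls().isEmpty()) {
                return reply.text(); // a plain assistant message ends the loop
            }
            transcript.add("assistant: requested " + reply.toolCalls().size() + " tool call(s)");
            for (ToolCall call : reply.toolCalls()) {
                // The loop, not the sender, executes each requested tool
                Function<String, String> tool = tools.get(call.name());
                String result = tool != null
                    ? tool.apply(call.arguments())
                    : "error: unknown tool " + call.name();
                transcript.add("tool[" + call.id() + "]: " + result);
            }
        }
        throw new IllegalStateException("No final answer after " + maxIterations + " iterations");
    }
}
```

The sender only ever sees the transcript and returns a reply, so a custom provider integration stays small, while retries, tool execution and iteration limits live in one place.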

Advanced: Custom Embedding Service

Just as you can integrate a custom LLM, you can implement a custom embedding service that doesn’t depend on Spring AI. This is useful when:

You want to use an embedding provider not supported by Spring AI
You need custom pre/post-processing of embeddings
You’re integrating with a proprietary or internal embedding API

The EmbeddingService Interface

The EmbeddingService interface is framework-agnostic. Unlike SpringAiEmbeddingService, a custom implementation does not need to wrap a Spring AI EmbeddingModel:

Java
Kotlin

public interface EmbeddingService {
    float[] embed(String text);
    List<float[]> embed(List<String> texts);
    int getDimensions();
    String getName();
    String getProvider();
}

interface EmbeddingService : EmbeddingServiceMetadata, HasInfoString {
    fun embed(text: String): FloatArray
    fun embed(texts: List<String>): List<FloatArray>
    val dimensions: Int
}

Example: Custom Embedding Provider

Here’s an example of implementing EmbeddingService for an HTTP-based embedding API:

Java
Kotlin

public class MyCustomEmbeddingService implements EmbeddingService {

    private final String name;
    private final String provider;
    private final int dimensions;
    private final HttpClient httpClient;
    private final String apiKey;

    public MyCustomEmbeddingService(
            String name,
            String provider,
            int dimensions,
            HttpClient httpClient,
            String apiKey) {
        this.name = name;
        this.provider = provider;
        this.dimensions = dimensions;
        this.httpClient = httpClient;
        this.apiKey = apiKey;
    }

    @Override
    public String getName() { return name; }

    @Override
    public String getProvider() { return provider; }

    @Override
    public int getDimensions() { return dimensions; }

    @Override
    public float[] embed(String text) {
        return embed(List.of(text)).get(0);
    }

    @Override
    public List<float[]> embed(List<String> texts) {
        MyEmbeddingResponse response = httpClient
            .post("https://api.my-embeddings.com/embed")
            .header("Authorization", "Bearer " + apiKey)
            .body(Map.of("texts", texts, "model", name))
            .execute(MyEmbeddingResponse.class);
        return response.getEmbeddings();
    }
}

class MyCustomEmbeddingService(
    override val name: String,
    override val provider: String,
    override val dimensions: Int,
    private val httpClient: HttpClient,
    private val apiKey: String,
) : EmbeddingService {

    override fun embed(text: String): FloatArray =
        embed(listOf(text)).first()

    override fun embed(texts: List<String>): List<FloatArray> {
        val response = httpClient.post("https://api.my-embeddings.com/embed") {
            header("Authorization", "Bearer $apiKey")
            body = mapOf("texts" to texts, "model" to name)
        }
        return response.body<MyEmbeddingResponse>().embeddings
    }
}

Then register it as a Spring bean:

@Configuration
public class MyEmbeddingConfiguration {

    @Bean
    public EmbeddingService myCustomEmbeddings(
            HttpClient httpClient,
            @Value("${my-embeddings.api-key}") String apiKey) {
        return new MyCustomEmbeddingService(
            "my-custom-embeddings",
            "MyProvider",
            384,
            httpClient,
            apiKey
        );
    }
}

@Configuration
class MyEmbeddingConfiguration {

    @Bean
    fun myCustomEmbeddings(
        httpClient: HttpClient,
        @Value("\${my-embeddings.api-key}") apiKey: String,
    ): EmbeddingService = MyCustomEmbeddingService(
        name = "my-custom-embeddings",
        provider = "MyProvider",
        dimensions = 384,
        httpClient = httpClient,
        apiKey = apiKey,
    )
}
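
Once a service like this is registered, the float[] vectors it returns are typically compared using cosine similarity. The helper below is a minimal, standalone sketch of that comparison; EmbeddingMath is a hypothetical name and is not part of Embabel.

```java
// Hypothetical helper for comparing two embedding vectors.
final class EmbeddingMath {

    private EmbeddingMath() {}

    /**
     * Cosine similarity of two vectors of equal length.
     * Returns 0.0 when either vector has zero magnitude.
     */
    static double cosineSimilarity(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                "Dimension mismatch: " + a.length + " vs " + b.length);
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

The dimension check matters in practice: vectors produced by services with different dimensions values are not comparable, so mixing embedding services against one index is usually a mistake.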

Discovery and Selection

Custom embedding services follow the same discovery and selection pattern as LLMs (see How Model Discovery and Selection Works).

By name: Use ai.withEmbeddingService() with the name from your implementation:

Java
Kotlin

ai.withEmbeddingService("my-custom-embeddings")
    .embed("Hello world");

ai.withEmbeddingService("my-custom-embeddings")
    .embed("Hello world")

By role: Map a role name to your embedding service in configuration:

embabel:
  models:
    default-embedding-model: my-custom-embeddings  # default embedding service
    embedding-services:
      cheapest: my-custom-embeddings               # maps the cheapest role to your custom embedding service

Advanced Caching with Anthropic

While many providers have implicit caching managed internally, Anthropic exposes public APIs for explicit prompt caching control. This allows you to optimize costs and latency for applications with long prompts, many tools, or extended conversations.

Motivation

Anthropic’s prompt caching feature provides significant benefits:

Cost savings: Cache reads cost 90% less than regular input tokens
Latency improvements: Cached content is processed faster
Ideal for: Long system prompts, extensive tool definitions, multi-turn conversations

Without caching, every API call processes the entire prompt from scratch. With caching, repeated content (system prompts, tools, conversation history) can be cached and reused across requests.

How It Works

Anthropic caches the leading portion (prefix) of your prompt, up to a cache breakpoint. A cache hit requires the hash of that content to match exactly, so any change to the cached portion invalidates the cache.

Key concepts:

Cache creation: The first time content is seen, it’s written to the cache at a 25% premium over regular input tokens (for the 5-minute TTL)
Cache reads: Subsequent requests with matching content read from cache at 10% of regular input token cost
Cache TTL: 5 minutes (default) or 1 hour (premium, higher creation cost)
Minimum size: Model-dependent. Anthropic documents minimums ranging from 1024 to 4096 tokens depending on the model, so check the current figure for the model you use.
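
To make the pricing rules above concrete, here is a small back-of-the-envelope calculator. It is a sketch under stated assumptions: every request after the first hits the cache within the TTL, the multipliers are the 5-minute-TTL figures quoted above, and costs are expressed in units of one regular input token. PromptCacheCost and its methods are hypothetical names.

```java
// Hypothetical calculator for the input-token cost of a repeated, cacheable prefix.
final class PromptCacheCost {

    /** Cache write: 25% premium over regular input tokens (5-minute TTL). */
    static final double WRITE_MULTIPLIER = 1.25;

    /** Cache read: 10% of the regular input token cost. */
    static final double READ_MULTIPLIER = 0.10;

    private PromptCacheCost() {}

    /** Cost when the first request writes the cache and all later requests read it. */
    static double withCaching(long prefixTokens, int requests) {
        if (requests <= 0) {
            return 0.0;
        }
        return prefixTokens * (WRITE_MULTIPLIER + READ_MULTIPLIER * (requests - 1));
    }

    /** Cost when every request processes the full prefix at the regular rate. */
    static double withoutCaching(long prefixTokens, int requests) {
        return (double) prefixTokens * Math.max(requests, 0);
    }
}
```

For a 2,000-token prefix sent 10 times, this works out to about 4,300 token-units with caching versus 20,000 without. For a single request, caching costs more (2,500 versus 2,000) because only the write premium applies, which is why caching pays off only for content that is actually reused.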

Cache Strategies

Embabel provides several caching strategies through AnthropicCachingConfig:

System Prompt Caching

Cache the system prompt for reuse across multiple requests:

Java
Kotlin

AnthropicCachingConfig cachingConfig = new AnthropicCachingConfig();
cachingConfig.setSystemPrompt(true);

LlmOptions options = LlmOptions.withDefaultLlm();
options = withAnthropicCaching(options, cachingConfig);

val options = LlmOptions.withDefaultLlm()
    .withAnthropicCaching(systemPrompt = true)

Tools Caching

Cache tool definitions when using many tools or tools with large schemas:

Java
Kotlin

AnthropicCachingConfig cachingConfig = new AnthropicCachingConfig();
cachingConfig.setTools(true);

LlmOptions options = LlmOptions.withDefaultLlm();
options = withAnthropicCaching(options, cachingConfig);

val options = LlmOptions.withDefaultLlm()
    .withAnthropicCaching(tools = true)

System + Tools Caching

Combine both strategies:

Java
Kotlin

AnthropicCachingConfig cachingConfig = new AnthropicCachingConfig();
cachingConfig.setSystemPrompt(true);
cachingConfig.setTools(true);

LlmOptions options = LlmOptions.withDefaultLlm();
options = withAnthropicCaching(options, cachingConfig);

val options = LlmOptions.withDefaultLlm()
    .withAnthropicCaching(
        systemPrompt = true,
        tools = true
    )

Conversation History Caching

Cache conversation history for long multi-turn conversations:

Java
Kotlin

AnthropicCachingConfig cachingConfig = new AnthropicCachingConfig();
cachingConfig.setConversationHistory(true);

LlmOptions options = LlmOptions.withDefaultLlm();
options = withAnthropicCaching(options, cachingConfig);

val options = LlmOptions.withDefaultLlm()
    .withAnthropicCaching(conversationHistory = true)

Advanced Configuration

Message Type Minimum Content Length

Control which messages are eligible for caching based on their content length:

Java
Kotlin

AnthropicCachingConfig cachingConfig = new AnthropicCachingConfig();
cachingConfig.setSystemPrompt(true);
cachingConfig.messageTypeMinContentLength(MessageRole.SYSTEM, 1024);
cachingConfig.messageTypeMinContentLength(MessageRole.USER, 512);

LlmOptions options = LlmOptions.withDefaultLlm();
options = withAnthropicCaching(options, cachingConfig);

val options = LlmOptions.withDefaultLlm()
    .withAnthropicCaching(
        AnthropicCachingConfig(systemPrompt = true)
            .messageTypeMinContentLength(MessageRole.SYSTEM, 1024)
            .messageTypeMinContentLength(MessageRole.USER, 512)
    )

Message Type TTL

Set cache TTL per message type (default is 5 minutes):

Java
Kotlin

AnthropicCachingConfig cachingConfig = new AnthropicCachingConfig();
cachingConfig.setSystemPrompt(true);
cachingConfig.messageTypeTtl(MessageRole.SYSTEM, AnthropicCacheTtl.ONE_HOUR);

LlmOptions options = LlmOptions.withDefaultLlm();
options = withAnthropicCaching(options, cachingConfig);

val options = LlmOptions.withDefaultLlm()
    .withAnthropicCaching(
        AnthropicCachingConfig(systemPrompt = true)
            .messageTypeTtl(MessageRole.SYSTEM, AnthropicCacheTtl.ONE_HOUR)
    )

Accessing Cache Metrics

Embabel provides extension methods to access Anthropic-specific cache metrics from the Usage object:

Java
Kotlin

import static com.embabel.agent.config.models.anthropic.AnthropicUsage.*;

AssistantMessage response = promptRunner.respond(messages);
Usage usage = response.getUsage();

// Check if cache was created or read
boolean cacheCreated = hasAnthropicCacheCreation(usage);
boolean cacheRead = hasAnthropicCacheRead(usage);

// Get token counts
Integer creationTokens = anthropicCacheCreationTokens(usage);
Integer readTokens = anthropicCacheReadTokens(usage);

// Get summary string for logging
String summary = anthropicCacheSummary(usage);
// Example output: "cache[creation=1061, read=0]"

val response = promptRunner.respond(messages)
val usage = response.usage

// Check if cache was created or read
val cacheCreated = usage.hasAnthropicCacheCreation()
val cacheRead = usage.hasAnthropicCacheRead()

// Get token counts
val creationTokens = usage.anthropicCacheCreationTokens()
val readTokens = usage.anthropicCacheReadTokens()

// Get summary string for logging
val summary = usage.anthropicCacheSummary()
// Example output: "cache[creation=1061, read=0]"

Best Practices

Cache long, stable content: System prompts and tool definitions that don’t change frequently are ideal candidates
Mind the minimum size: Content must meet the minimum token requirement (1024 or 4096 depending on model)
Monitor cache metrics: Use the cache extension methods to track cache hit rates and validate savings
Consider TTL vs cost: The 1-hour TTL has a higher creation cost but suits longer sessions
Test before deploying: Cache behavior can vary based on prompt structure and usage patterns
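
One way to act on the monitoring advice above is to accumulate the per-call cache token counts and derive a hit rate. The class below is a minimal sketch, assuming you pass in the creation and read token counts obtained for each call (null when a count is not reported); CacheStats and its methods are hypothetical names, not Embabel APIs.

```java
// Hypothetical accumulator for per-call cache token counts.
final class CacheStats {

    private long creationTokens;
    private long readTokens;
    private int calls;
    private int callsWithRead;

    /** Records the counts for one call; null means the count was not reported. */
    void record(Integer creation, Integer read) {
        calls++;
        if (creation != null) {
            creationTokens += creation;
        }
        if (read != null && read > 0) {
            readTokens += read;
            callsWithRead++;
        }
    }

    /** Fraction of recorded calls that read at least one token from the cache. */
    double hitRate() {
        return calls == 0 ? 0.0 : (double) callsWithRead / calls;
    }

    long totalCreationTokens() {
        return creationTokens;
    }

    long totalReadTokens() {
        return readTokens;
    }
}
```

A hit rate that stays near zero usually means the cached content is changing between calls or falls below the minimum cacheable size, in which case the creation premium is being paid without any benefit.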

Reference

For complete details on Anthropic’s prompt caching, see: https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching