Tracking LLM Cost and Usage

Embabel emits an event for every LLM and embedding call your agent makes. Subscribe to those events to see, as each call completes, how much it cost, which model handled it, and which agent process it belongs to.

The events

Two events are available:

LlmInvocationEvent — emitted once per LLM call.
EmbeddingInvocationEvent — emitted once per embedding call.

Each event exposes:

invocation.llmMetadata (or embeddingMetadata) — model name and provider
invocation.usage — token counts
invocation.cost() — computed cost for that call
interactionId — identifier of the originating interaction
agentProcess — the agent process that triggered the call (use agentProcess.id to group, agentProcess.agent.name to label)

Subscribing to cost events

Implement AgenticEventListener and react to the events you care about. The listener is registered like any other Embabel event listener.

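Java

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.DoubleAdder;
// Embabel imports omitted; take them from the Embabel version you use.

public class OrganizationCostTracker implements AgenticEventListener {

    // Running cost per agent name; DoubleAdder tolerates concurrent updates.
    private final ConcurrentMap<String, DoubleAdder> costPerAgent = new ConcurrentHashMap<>();

    @Override
    public void onProcessEvent(AgentProcessEvent event) {
        // Only LLM calls are counted here; every other event is ignored.
        if (event instanceof LlmInvocationEvent llm) {
            costPerAgent
                .computeIfAbsent(llm.getAgentProcess().getAgent().getName(), k -> new DoubleAdder())
                .add(llm.getInvocation().cost());
        }
    }
}

Kotlin

import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.DoubleAdder
// Embabel imports omitted; take them from the Embabel version you use.

class OrganizationCostTracker : AgenticEventListener {

    // Running cost per agent name; DoubleAdder tolerates concurrent updates.
    private val costPerAgent = ConcurrentHashMap<String, DoubleAdder>()

    override fun onProcessEvent(event: AgentProcessEvent) {
        // Only LLM calls are counted here; every other event is ignored.
        if (event is LlmInvocationEvent) {
            costPerAgent
                .computeIfAbsent(event.agentProcess.agent.name) { DoubleAdder() }
                .add(event.invocation.cost())
        }
    }
}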

The same pattern works for EmbeddingInvocationEvent.
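The listener above keys its totals by agent name. To group by agentProcess.id and keep the agent name only as a label, the bookkeeping could be moved into a small helper that the listener calls. The following is a minimal sketch in plain Java with no Embabel types: ProcessCostLedger and its methods are hypothetical names, and the process id is assumed to be usable as a String key.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.DoubleAdder;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical helper, not an Embabel type: running totals per agent process.
final class ProcessCostLedger {

    // Totals for one agent process; the agent name is kept only as a display label.
    static final class Totals {
        final String agentName;
        final DoubleAdder cost = new DoubleAdder();
        final LongAdder calls = new LongAdder();

        Totals(String agentName) {
            this.agentName = agentName;
        }
    }

    private final Map<String, Totals> byProcessId = new ConcurrentHashMap<>();

    // Intended to be called from the listener with the process id,
    // the agent name, and the cost of the call that just completed.
    void record(String processId, String agentName, double cost) {
        Totals totals = byProcessId.computeIfAbsent(processId, id -> new Totals(agentName));
        totals.cost.add(cost);
        totals.calls.increment();
    }

    // Total cost recorded for a process, or 0.0 if nothing was recorded.
    double costOf(String processId) {
        Totals totals = byProcessId.get(processId);
        return totals == null ? 0.0 : totals.cost.sum();
    }

    // Number of calls recorded for a process, or 0 if nothing was recorded.
    long callsOf(String processId) {
        Totals totals = byProcessId.get(processId);
        return totals == null ? 0L : totals.calls.sum();
    }

    // Agent name stored with the first recorded call, or null if unknown.
    String labelOf(String processId) {
        Totals totals = byProcessId.get(processId);
        return totals == null ? null : totals.agentName;
    }
}
```

Inside onProcessEvent, the listener would pass the process id, the agent name, and invocation.cost() to record. Keeping the ledger free of Embabel types also lets one instance be shared between the LLM and embedding branches of the listener.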

Blocking spending: the Budget Guardrail pattern

Cost events fire after the call completes, so they cannot stop the call that just ran. What they can do is stop the next one.

The pattern combines two pieces you already know:

A listener that counts. Subscribe to LlmInvocationEvent and accumulate cost or tokens against the key you care about — agent process id, tenant, end user.
A guardrail that blocks. A UserInputGuardRail reads the counter before the next LLM call. If the budget is exceeded, the guardrail returns a CRITICAL validation error and the call never happens.

                       LLM call ───► LlmInvocationEvent ─┐
                                                          ▼
                                            counter (per agent / tenant / user)
                                                          │
   next call ──► UserInputGuardRail reads counter ────────┘
                                │
                       over budget? ──► CRITICAL ──► call blocked

The counter lives in your listener; the decision lives in your guardrail. Embabel wires both into the agent process for you. See Working with Guardrails for how to register a UserInputGuardRail and how CRITICAL validation errors stop execution.
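The shared state between the two pieces can be sketched as a small thread-safe counter. This is an illustration in plain Java under stated assumptions, not Embabel API: BudgetCounter, record, spent, and isOverBudget are hypothetical names, the budget is assumed to be a single cost limit applied to every key, and the guardrail itself is left out because its interface is described in Working with Guardrails.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.DoubleAdder;

// Hypothetical helper, not an Embabel type: state shared by the listener and the guardrail.
final class BudgetCounter {

    private final double limit;
    private final ConcurrentMap<String, DoubleAdder> spentByKey = new ConcurrentHashMap<>();

    // limit is the maximum cost allowed per key (agent process id, tenant, end user).
    BudgetCounter(double limit) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must be non-negative");
        }
        this.limit = limit;
    }

    // Listener side: add the cost of a call that has already completed.
    void record(String key, double cost) {
        spentByKey.computeIfAbsent(key, k -> new DoubleAdder()).add(cost);
    }

    // Cost recorded so far for a key, or 0.0 if nothing was recorded.
    double spent(String key) {
        DoubleAdder adder = spentByKey.get(key);
        return adder == null ? 0.0 : adder.sum();
    }

    // Guardrail side: true once the recorded cost is above the limit.
    boolean isOverBudget(String key) {
        return spent(key) > limit;
    }
}
```

The listener would call record when an LlmInvocationEvent arrives, and the guardrail would return its CRITICAL validation error when isOverBudget is true for the same key. Because costs are recorded only after a call completes, spending can exceed the limit by the cost of the last call, or by more if several calls are in flight at once, so it is worth setting the limit with that margin in mind.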