
Tracking LLM Cost and Usage

Embabel emits an event for every LLM and embedding call your agent makes. Subscribe to those events to know, in real time, how much each call cost, which model handled it, and which agent process it belongs to.

Two events are available:

  • LlmInvocationEvent — emitted once per LLM call.
  • EmbeddingInvocationEvent — emitted once per embedding call.

Each event exposes:

  • invocation.llmMetadata (or embeddingMetadata) — model name and provider
  • invocation.usage — token counts
  • invocation.cost() — computed cost for that call
  • interactionId — identifier of the originating interaction
  • agentProcess — the agent process that triggered the call (use agentProcess.id to group, agentProcess.agent.name to label)

Implement AgenticEventListener and react to the events you care about. The listener is registered like any other Embabel event listener.

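import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.DoubleAdder;
// Embabel imports (AgenticEventListener, AgentProcessEvent, LlmInvocationEvent) omitted.

public class OrganizationCostTracker implements AgenticEventListener {

    // Running cost total per agent name; DoubleAdder keeps concurrent updates cheap.
    private final ConcurrentMap<String, DoubleAdder> costPerAgent = new ConcurrentHashMap<>();

    @Override
    public void onProcessEvent(AgentProcessEvent event) {
        if (event instanceof LlmInvocationEvent llm) {
            costPerAgent
                .computeIfAbsent(llm.getAgentProcess().getAgent().getName(), k -> new DoubleAdder())
                .add(llm.getInvocation().cost());
        }
    }
}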

The same pattern works for EmbeddingInvocationEvent.
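As a rough sketch of how one ledger could serve both event types, the example below totals cost per agent process and per model. It does not use Embabel's classes: `CostEvent`, `LlmCall`, `EmbeddingCall`, and `ProcessCostLedger` are hypothetical stand-ins, and the model names in the usage note are placeholders. In a real listener the three values would presumably come from `agentProcess.id`, the model name in `llmMetadata` or `embeddingMetadata`, and `invocation.cost()`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.DoubleAdder;

// Hypothetical stand-ins for the two Embabel events; they carry only the
// fields this sketch needs and are NOT the library's own types.
interface CostEvent {
    String processId();
    String model();
    double cost();
}

record LlmCall(String processId, String model, double cost) implements CostEvent {}

record EmbeddingCall(String processId, String model, double cost) implements CostEvent {}

// Accumulates cost along two dimensions: agent process id and model name.
final class ProcessCostLedger {
    private final ConcurrentMap<String, DoubleAdder> byProcess = new ConcurrentHashMap<>();
    private final ConcurrentMap<String, DoubleAdder> byModel = new ConcurrentHashMap<>();

    // LLM and embedding calls are recorded the same way.
    void record(CostEvent event) {
        byProcess.computeIfAbsent(event.processId(), k -> new DoubleAdder()).add(event.cost());
        byModel.computeIfAbsent(event.model(), k -> new DoubleAdder()).add(event.cost());
    }

    double costForProcess(String processId) {
        DoubleAdder total = byProcess.get(processId);
        return total == null ? 0.0 : total.sum();
    }

    double costForModel(String model) {
        DoubleAdder total = byModel.get(model);
        return total == null ? 0.0 : total.sum();
    }
}
```

A listener would call `ledger.record(...)` from `onProcessEvent` for each event type it handles. Keeping the two maps separate lets you answer "what did this run cost?" and "what does this model cost us?" without re-scanning events.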

Blocking spending: the Budget Guardrail pattern


Cost events fire after the call completes, so they cannot stop the call that just ran. What they can do is stop the next one.

The pattern combines two pieces you already know:

  1. A listener that counts. Subscribe to LlmInvocationEvent and accumulate cost or tokens against the key you care about — agent process id, tenant, end user.
  2. A guardrail that blocks. A UserInputGuardRail reads the counter before the next LLM call. If the budget is exceeded, the guardrail returns a CRITICAL validation error and the call never happens.

  LLM call ───► LlmInvocationEvent ───► counter (per agent / tenant / user)
                                                      │
  next call ──► UserInputGuardRail reads counter ◄────┘
                        │
                        └─► over budget? ──► CRITICAL ──► call blocked

The counter lives in your listener; the decision lives in your guardrail. Embabel wires both into the agent process for you. See Working with Guardrails for how to register a UserInputGuardRail and how CRITICAL validation errors stop execution.
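The shared counter at the center of this pattern might look like the following minimal sketch. `BudgetCounter` and its methods are hypothetical names, not part of Embabel, and the sketch deliberately leaves out the UserInputGuardRail interface itself, whose exact signature is covered in Working with Guardrails. The assumption is that the listener calls `recordCost` on each LlmInvocationEvent and the guardrail calls `isExceeded` before the next call.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.DoubleAdder;

// Hypothetical shared state between the listener (writer) and the guardrail (reader).
final class BudgetCounter {
    private final ConcurrentMap<String, DoubleAdder> spent = new ConcurrentHashMap<>();
    private final double limit;

    // limit is the maximum spend allowed per key, in the same unit as the recorded costs.
    BudgetCounter(double limit) {
        this.limit = limit;
    }

    // Listener side: add the cost of a completed call to the key's running total.
    void recordCost(String key, double cost) {
        spent.computeIfAbsent(key, k -> new DoubleAdder()).add(cost);
    }

    // Current total for a key; keys with no recorded calls report 0.
    double spent(String key) {
        DoubleAdder total = spent.get(key);
        return total == null ? 0.0 : total.sum();
    }

    // Guardrail side: true once the key's total has gone past the limit.
    boolean isExceeded(String key) {
        return spent(key) > limit;
    }
}
```

Because cost is recorded only after a call completes, the total can overshoot the limit by up to one call before the guardrail starts blocking. If that margin matters, set the limit slightly below the real budget.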