RAG (Retrieval-Augmented Generation)
Retrieval-Augmented Generation (RAG) is a technique that enhances LLM responses by retrieving relevant information from a knowledge base before generating answers. This grounds LLM outputs in specific, verifiable sources rather than relying solely on training data.
For more background on RAG concepts, see:
Embabel Agent provides RAG support through the LlmReference interface, which allows you to attach references (including RAG stores) to LLM calls.
The key classes are ToolishRag for exposing search operations as LLM tools, and SearchOperations for the underlying search functionality.
Agentic RAG Architecture
Unlike traditional RAG implementations that perform a single retrieval step, Embabel Agent’s RAG is entirely agentic and tool-based. The LLM has full control over the retrieval process:
- Autonomous Search: The LLM decides when to search, what queries to use, and how many results to retrieve
- Iterative Refinement: The LLM can perform multiple searches with different queries until it finds relevant information
- Cross-Reference Discovery: The LLM can follow references, expand chunks to see surrounding context, and zoom out to parent sections
- HyDE Support: The LLM can generate hypothetical documents (HyDE queries) to improve semantic search results
This agentic approach produces better results than single-shot RAG because the LLM can:
- Start with a broad search and narrow down
- Try different phrasings if initial queries return poor results
- Expand promising results to get more context
- Combine information from multiple chunks
Facade Pattern for Safe Tool Exposure
Embabel Agent uses a facade pattern to expose RAG capabilities safely and consistently across different store implementations.
The ToolishRag class acts as a facade that:
- Inspects Store Capabilities: Examines which SearchOperations subinterfaces the store implements
- Exposes Appropriate Tools: Only creates tool wrappers for supported operations
- Provides Consistent Interface: All tools use the same parameter patterns regardless of underlying store
Java:

    @Override
    public List<Tool> tools() {
        List<Object> toolObjects = new ArrayList<>();
        if (searchOperations instanceof VectorSearch) {
            toolObjects.add(new VectorSearchTools((VectorSearch) searchOperations));
        }
        if (searchOperations instanceof TextSearch) {
            toolObjects.add(new TextSearchTools((TextSearch) searchOperations));
        }
        if (searchOperations instanceof ResultExpander) {
            toolObjects.add(new ResultExpanderTools((ResultExpander) searchOperations));
        }
        if (searchOperations instanceof RegexSearchOperations) {
            toolObjects.add(new RegexSearchTools((RegexSearchOperations) searchOperations));
        }
        return toolObjects.stream()
            .flatMap(obj -> Tool.fromInstance(obj).stream())
            .toList();
    }

Kotlin:

    override fun tools(): List<Tool> {
        val toolObjects = buildList {
            if (searchOperations is VectorSearch) {
                add(VectorSearchTools(searchOperations))
            }
            if (searchOperations is TextSearch) {
                add(TextSearchTools(searchOperations))
            }
            if (searchOperations is ResultExpander) {
                add(ResultExpanderTools(searchOperations))
            }
            if (searchOperations is RegexSearchOperations) {
                add(RegexSearchTools(searchOperations))
            }
        }
        return toolObjects.flatMap { Tool.fromInstance(it) }
    }

This means:
- A Lucene store exposes vector search, text search, regex search, AND result expansion tools
- A Spring AI VectorStore adapter exposes only vector search tools
- A basic text-only store exposes only text search tools
- A directory-based text search exposes text search and regex search
The LLM sees only the tools that actually work with the configured store, preventing runtime errors from unsupported operations.
Getting Started
To use RAG in your Embabel Agent application, add the rag-core module and a store implementation to your pom.xml:

    <dependency>
        <groupId>com.embabel.agent</groupId>
        <artifactId>embabel-agent-rag-lucene</artifactId>
        <version>${embabel-agent.version}</version>
    </dependency>

    <dependency>
        <groupId>com.embabel.agent</groupId>
        <artifactId>embabel-agent-rag-tika</artifactId>
        <version>${embabel-agent.version}</version>
    </dependency>

The embabel-agent-rag-lucene module provides Lucene-based vector and text search.
The embabel-agent-rag-tika module provides Apache Tika integration for parsing various document formats.
Our Model
Embabel Agent uses a hierarchical content model that goes beyond traditional flat chunk storage:

    Datum (sealed interface)
    │   Core: id, uri, metadata, labels()
    │
    ├── ContentElement: structural content (not embedded)
    │   ├── ContentRoot / NavigableDocument: documents with URI and title
    │   └── ContainerSection / LeafSection: hierarchical document sections
    │
    └── Retrievable: embeddable/searchable content
        ├── Chunk: text, parentId, embedding; primary unit for vector search
        └── NamedEntity: domain entity contract (Person, Product); name, description + domain properties
            └── NamedEntityData: storage format with properties map; hydration via toTypedInstance()

Key Design Points:
- Datum is the root sealed interface for all data objects
- ContentElement branch contains structural content (documents, sections) that is NOT embedded
- Retrievable branch contains searchable content with embeddings (chunks, entities)
- NamedEntity is the domain contract for typed entities
- NamedEntityData is the storage format with generic properties map and hydration support
Content Elements
The ContentElement interface is the supertype for all content in the RAG system.
Key subtypes include:
- ContentRoot / NavigableDocument: The root of a document hierarchy, with a required URI and title
- Section: A hierarchical division of content with a title
- ContainerSection: A section containing other sections
- LeafSection: A section containing actual text content
- Chunk: Traditional RAG text chunks, created by splitting LeafSection content
Chunks
Chunk is the primary unit for vector search.
Each chunk:
- Contains a text field with the content
- Has a parentId linking to its source section
- Includes metadata with information about its origin (root document, container section, leaf section)
- Can compute its pathFromRoot through the document hierarchy
This hierarchical model enables advanced RAG capabilities like “zoom out” to parent sections or expansion to adjacent chunks.
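As a rough illustration of how parentId links make "zoom out" and pathFromRoot possible, here is a minimal, self-contained sketch. The Element record and the pathFromRoot helper are hypothetical stand-ins written for this example; they are not Embabel's classes, and the real implementation may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class PathFromRootSketch {
    // Hypothetical minimal element: only an id and an optional parent id.
    record Element(String id, String parentId) {}

    // Walk parentId links upward, then reverse so the path starts at the root.
    static List<String> pathFromRoot(String id, Map<String, Element> byId) {
        List<String> path = new ArrayList<>();
        Element current = byId.get(id);
        while (current != null) {
            path.add(current.id());
            current = current.parentId() == null ? null : byId.get(current.parentId());
        }
        Collections.reverse(path);
        return path;
    }

    public static void main(String[] args) {
        Map<String, Element> byId = Map.of(
            "doc", new Element("doc", null),
            "sec", new Element("sec", "doc"),
            "chunk-1", new Element("chunk-1", "sec"));
        System.out.println(pathFromRoot("chunk-1", byId)); // [doc, sec, chunk-1]
    }
}
```

Zooming out from a chunk amounts to looking up the element named by its parentId; the path is the same walk repeated until no parent remains.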
SearchOperations
SearchOperations is the tag interface for search functionality.
Concrete implementations implement one or more subinterfaces based on their capabilities.
This design allows stores to implement only what’s natural and efficient for them—a vector database need not pretend to support full-text search, and a text search engine need not fake vector similarity.
VectorSearch
Classic semantic vector search:

Java:

    public interface VectorSearch extends SearchOperations {
        <T extends Retrievable> List<SimilarityResult<T>> vectorSearch(
            TextSimilaritySearchRequest request,
            Class<T> clazz
        );
    }

Kotlin:

    interface VectorSearch : SearchOperations {
        fun <T : Retrievable> vectorSearch(
            request: TextSimilaritySearchRequest,
            clazz: Class<T>,
        ): List<SimilarityResult<T>>
    }

TextSearch
Full-text search using Lucene query syntax:

Java:

    public interface TextSearch extends SearchOperations {
        <T extends Retrievable> List<SimilarityResult<T>> textSearch(
            TextSimilaritySearchRequest request,
            Class<T> clazz
        );
    }

Kotlin:

    interface TextSearch : SearchOperations {
        fun <T : Retrievable> textSearch(
            request: TextSimilaritySearchRequest,
            clazz: Class<T>,
        ): List<SimilarityResult<T>>
    }

Supported query syntax includes:
- +term - term must appear
- -term - term must not appear
- "phrase" - exact phrase match
- term* - prefix wildcard
- term~ - fuzzy match
ResultExpander
Expand search results to surrounding context:

Java:

    public interface ResultExpander extends SearchOperations {
        List<ContentElement> expandResult(
            String id,
            Method method,
            int elementsToAdd
        );
    }

Kotlin:

    interface ResultExpander : SearchOperations {
        fun expandResult(
            id: String,
            method: Method,
            elementsToAdd: Int,
        ): List<ContentElement>
    }

Expansion methods:
- SEQUENCE - expand to previous and next chunks
- ZOOM_OUT - expand to enclosing section
RegexSearchOperations
Pattern-based search across content:

Java:

    public interface RegexSearchOperations extends SearchOperations {
        <T extends Retrievable> List<SimilarityResult<T>> regexSearch(
            Pattern regex,
            int topK,
            Class<T> clazz
        );
    }

Kotlin:

    interface RegexSearchOperations : SearchOperations {
        fun <T : Retrievable> regexSearch(
            regex: Regex,
            topK: Int,
            clazz: Class<T>,
        ): List<SimilarityResult<T>>
    }

Useful for finding specific patterns like error codes, identifiers, or structured content that doesn’t match well with semantic or keyword search.
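To make the error-code use case concrete, the following sketch runs a regular expression over a small in-memory list of chunks. The Chunk record and the regexSearch helper here are hypothetical and only mimic the shape of the interface above; a real store would search its own index and return scored results.

```java
import java.util.List;
import java.util.regex.Pattern;

public class RegexSearchSketch {
    // Hypothetical chunk shape for illustration only.
    record Chunk(String id, String text) {}

    // Return up to topK chunks whose text contains a match for the pattern.
    static List<Chunk> regexSearch(Pattern regex, int topK, List<Chunk> chunks) {
        return chunks.stream()
            .filter(c -> regex.matcher(c.text()).find())
            .limit(topK)
            .toList();
    }

    public static void main(String[] args) {
        List<Chunk> chunks = List.of(
            new Chunk("c1", "Startup failed with ERR-1042 after timeout"),
            new Chunk("c2", "Caching improves latency"),
            new Chunk("c3", "See ERR-2001 for quota errors"));
        Pattern errorCode = Pattern.compile("ERR-\\d{4}");
        // Prints c1 and c3: the two chunks that mention an error code.
        regexSearch(errorCode, 5, chunks).forEach(c -> System.out.println(c.id()));
    }
}
```

A semantic query for "quota error code" might rank these chunks poorly, whereas the pattern matches the identifier exactly.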
CoreSearchOperations
A convenience interface combining the most common search capabilities:

Java:

    public interface CoreSearchOperations extends VectorSearch, TextSearch { }

Kotlin:

    interface CoreSearchOperations : VectorSearch, TextSearch

Stores that support both vector and text search can implement this single interface for convenience.
ToolishRag
ToolishRag is an LlmReference that exposes SearchOperations as LLM tools.
This gives the LLM fine-grained control over RAG searches.
Configuration
Create a ToolishRag by wrapping your SearchOperations:

Java:

    public ChatActions(SearchOperations searchOperations) {
        this.toolishRag = new ToolishRag(
            "sources",
            "Sources for answering user questions",
            searchOperations
        );
    }

Kotlin:

    class ChatActions(searchOperations: SearchOperations) {
        private val toolishRag = ToolishRag(
            "sources",
            "Sources for answering user questions",
            searchOperations
        )
    }

Using with LLM Calls
Attach ToolishRag to an LLM call using .withReference():

Java:

    @Action(canRerun = true, trigger = UserMessage.class)
    void respond(Conversation conversation, ActionContext context) {
        var assistantMessage = context.ai()
            .withLlm(properties.chatLlm())
            .withReference(toolishRag)
            .rendering("ragbot")
            .respondWithSystemPrompt(conversation, Map.of(
                "properties", properties
            ));
        context.sendMessage(conversation.addMessage(assistantMessage));
    }

Kotlin:

    @Action(canRerun = true, trigger = UserMessage::class)
    fun respond(conversation: Conversation, context: ActionContext) {
        val assistantMessage = context.ai()
            .withLlm(properties.chatLlm())
            .withReference(toolishRag)
            .rendering("ragbot")
            .respondWithSystemPrompt(conversation, mapOf(
                "properties" to properties
            ))
        context.sendMessage(conversation.addMessage(assistantMessage))
    }

Based on the capabilities of the underlying SearchOperations, ToolishRag exposes:
- VectorSearchTools: vectorSearch(query, topK, threshold) - semantic similarity search
- TextSearchTools: textSearch(query, topK, threshold) - BM25 full-text search with Lucene syntax
- RegexSearchTools: regexSearch(regex, topK) - pattern-based search using regular expressions
- ResultExpanderTools: broadenChunk(chunkId, chunksToAdd) - expand to adjacent chunks; zoomOut(id) - expand to parent section
The LLM autonomously decides when to use these tools based on user queries.
Eager Search
By default, ToolishRag is entirely agentic: the LLM decides when to search and what queries to use.
However, when the topic of the conversation is already known, you can preload relevant results before the LLM starts, giving it a head start and reducing the number of tool calls needed.
ToolishRag implements the EagerSearch interface, which provides withEagerSearchAbout():
Java:

    // Preload results about the user's topic before the LLM starts
    ToolishRag eagerRag = toolishRag
        .withEagerSearchAbout("Kotlin coroutines", 10);

    context.ai()
        .withReference(eagerRag)
        .respondWithSystemPrompt(conversation, Map.of());

Kotlin:

    // Preload results about the user's topic before the LLM starts
    val eagerRag = toolishRag
        .withEagerSearchAbout("Kotlin coroutines", 10)

    context.ai()
        .withReference(eagerRag)
        .respondWithSystemPrompt(conversation, emptyMap())

The preloaded results are included in the prompt as hints. The LLM still has access to all the usual search tools and can perform additional searches as needed.
For more control over the search parameters, pass a TextSimilaritySearchRequest directly:

Java:

    var request = new TextSimilaritySearchRequest("Kotlin coroutines", 0.7, 10);
    ToolishRag eagerRag = toolishRag.withEagerSearchAbout(request);

Kotlin:

    val request = TextSimilaritySearchRequest("Kotlin coroutines", 0.7, 10)
    val eagerRag = toolishRag.withEagerSearchAbout(request)

ToolishRag lifecycle
It is safe to create a ToolishRag instance and reuse it across many LLM calls.
However, instances are not expensive to create, so you can also create a new instance per LLM call.
You might choose to do this if you provide a ResultListener that collects queries and results for logging or analysis: for example, to track which queries were most useful for answering user questions, and how many searches each answer required.
This can be useful for implementing a learning feedback loop, for example to identify queries that performed badly, which suggests that content such as documentation needs to be enhanced.
Result Filtering
In multi-tenant applications or scenarios where searches should be scoped to specific data subsets, ToolishRag supports result filtering.
Filters are applied transparently to all searches. The LLM does not see or control them, which ensures security and data isolation.
Embabel Agent provides two types of filters:
- Metadata Filters: Filter on the metadata map of Datum objects (chunks, sections, etc.)
- Property Filters: Filter on object properties of typed entities (e.g., fields of NamedEntityData or custom entity classes)
Both use the same PropertyFilter type but are applied at different levels.
Motivation
Consider a document management system where:
- Each document belongs to an owner (user or organization)
- Some documents are shared reference data accessible to all users
- The LLM should only search documents the current user is authorized to access
Without filtering, you would need separate RAG stores per user or risk data leakage.
With filtering, a single ToolishRag instance can be scoped per-request to the current user’s data.
Filter API
Embabel Agent provides two filter interfaces for RAG searches:
- PropertyFilter: Filters on map-based properties (metadata, entity properties)
- EntityFilter: Extends PropertyFilter to add entity-specific filtering, particularly label-based filtering
PropertyFilter
The PropertyFilter sealed class hierarchy provides type-safe filter expressions for map-based properties:
| Filter Type | Description | Example |
| --- | --- | --- |
| Eq | Equals | PropertyFilter.eq("owner", "alice") |
| Ne | Not equals | PropertyFilter.ne("status", "deleted") |
| Gt, Gte | Greater than (or equal) | PropertyFilter.gte("score", 0.8) |
| Lt, Lte | Less than (or equal) | PropertyFilter.lt("priority", 5) |
| In | Value in list | PropertyFilter.in("category", "tech", "science") |
| Nin | Value not in list | PropertyFilter.nin("status", "deleted", "archived") |
| Contains | String contains substring | PropertyFilter.contains("tags", "important") |
| And | Logical AND | PropertyFilter.and(filter1, filter2) |
| Or | Logical OR | PropertyFilter.or(filter1, filter2) |
| Not | Logical NOT | PropertyFilter.not(filter) |
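To show how a sealed hierarchy of this kind can be evaluated against a property map, here is a minimal sketch covering a subset of the operators in the table. The Filter interface, its record types, and the matches function are hypothetical and written only for this example; they are not Embabel's PropertyFilter classes.

```java
import java.util.List;
import java.util.Map;

public class FilterSketch {
    // Hypothetical stand-in for a filter hierarchy; not Embabel's PropertyFilter.
    sealed interface Filter permits Eq, Gte, In, And, Or, Not {}
    record Eq(String key, Object value) implements Filter {}
    record Gte(String key, double value) implements Filter {}
    record In(String key, List<Object> values) implements Filter {}
    record And(Filter left, Filter right) implements Filter {}
    record Or(Filter left, Filter right) implements Filter {}
    record Not(Filter inner) implements Filter {}

    // Recursively evaluate a filter expression against one property map.
    static boolean matches(Filter f, Map<String, Object> props) {
        if (f instanceof Eq eq) return eq.value().equals(props.get(eq.key()));
        if (f instanceof Gte gte) {
            return props.get(gte.key()) instanceof Number n && n.doubleValue() >= gte.value();
        }
        if (f instanceof In in) {
            Object actual = props.get(in.key());
            return actual != null && in.values().contains(actual);
        }
        if (f instanceof And and) return matches(and.left(), props) && matches(and.right(), props);
        if (f instanceof Or or) return matches(or.left(), props) || matches(or.right(), props);
        if (f instanceof Not not) return !matches(not.inner(), props);
        throw new IllegalArgumentException("Unknown filter: " + f);
    }

    public static void main(String[] args) {
        Map<String, Object> doc = Map.of("owner", "alice", "status", "active", "score", 0.9);
        Filter filter = new And(new Eq("owner", "alice"), new Not(new Eq("status", "deleted")));
        System.out.println(matches(filter, doc)); // true
    }
}
```

Because the hierarchy is sealed, the set of filter types is closed, so an evaluator or a query translator can handle every case explicitly.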
EntityFilter
EntityFilter extends PropertyFilter to add entity-specific filtering. Currently, it adds label-based filtering via HasAnyLabel:
| Filter Type | Description | Example |
| --- | --- | --- |
| HasAnyLabel | Matches entities with any of the specified labels | EntityFilter.hasAnyLabel("Person", "Organization") |
HasAnyLabel is particularly useful for:
- Type-safe entity searches: Filter results to only include specific entity types
- Multi-type queries: Search across multiple entity types in one query
Java:

    import com.embabel.agent.rag.filter.EntityFilter;
    import com.embabel.agent.rag.filter.PropertyFilter;

    // Filter by single label
    EntityFilter personFilter = EntityFilter.hasAnyLabel("Person");

    // Filter by multiple labels (OR semantics - entity must have ANY of these labels)
    EntityFilter entityFilter = EntityFilter.hasAnyLabel("Person", "Organization");

    // Combine HasAnyLabel with property filters using fluent API
    PropertyFilter simpleCombo = EntityFilter.hasAnyLabel("Person")
        .and(PropertyFilter.eq("status", "active"));

    // Multiple conditions
    PropertyFilter complexFilter = EntityFilter.hasAnyLabel("Person")
        .and(PropertyFilter.eq("status", "active"))
        .and(PropertyFilter.gte("score", 0.8));

    // OR combinations
    PropertyFilter orFilter = EntityFilter.hasAnyLabel("Person")
        .or(PropertyFilter.eq("fallback", true));

    // With negation
    PropertyFilter notDeleted = EntityFilter.hasAnyLabel("Person")
        .and(PropertyFilter.not(PropertyFilter.eq("status", "deleted")));

    // Complex grouping
    PropertyFilter accessFilter = PropertyFilter.or(
        PropertyFilter.and(
            EntityFilter.hasAnyLabel("Person", "Employee"),
            PropertyFilter.eq("active", true)
        ),
        PropertyFilter.eq("role", "admin")
    );

Kotlin:

    import com.embabel.agent.rag.filter.EntityFilter
    import com.embabel.agent.rag.filter.PropertyFilter.Companion.eq
    import com.embabel.agent.rag.filter.PropertyFilter.Companion.gte

    // Filter by single label
    val personFilter = EntityFilter.hasAnyLabel("Person")

    // Filter by multiple labels (OR semantics - entity must have ANY of these labels)
    val entityFilter = EntityFilter.hasAnyLabel("Person", "Organization")

    // Combine HasAnyLabel with property filters using fluent API
    val simpleCombo = EntityFilter.hasAnyLabel("Person") and eq("status", "active")

    // Multiple conditions
    val complexFilter = EntityFilter.hasAnyLabel("Person") and eq("status", "active") and gte("score", 0.8)

    // OR combinations
    val orFilter = EntityFilter.hasAnyLabel("Person") or eq("fallback", true)

    // With negation
    val notDeleted = EntityFilter.hasAnyLabel("Person") and !eq("status", "deleted")

    // Complex grouping
    val accessFilter = (EntityFilter.hasAnyLabel("Person", "Employee") and eq("active", true)) or eq("role", "admin")

Since EntityFilter extends PropertyFilter, all filter types share the same and, or, not operators and can be freely combined.
Kotlin Operator Syntax
Kotlin users can use operator and infix functions for a more natural DSL syntax:

Java:

    import com.embabel.agent.rag.filter.PropertyFilter;

    // Simple filter with not operator
    PropertyFilter notDeleted = PropertyFilter.not(PropertyFilter.eq("status", "deleted"));

    // Combine with 'and' and 'or'
    PropertyFilter userAccess = PropertyFilter.and(
        PropertyFilter.eq("owner", userId),
        PropertyFilter.gte("confidenceScore", 0.7)
    );

    // Complex expressions with grouping
    PropertyFilter accessFilter = PropertyFilter.or(
        PropertyFilter.and(
            PropertyFilter.eq("owner", userId),
            PropertyFilter.ne("status", "deleted")
        ),
        PropertyFilter.eq("role", "admin")
    );

Kotlin:

    import com.embabel.agent.rag.filter.PropertyFilter.Companion.eq
    import com.embabel.agent.rag.filter.PropertyFilter.Companion.gte
    import com.embabel.agent.rag.filter.PropertyFilter.Companion.ne

    // Simple filter with not operator
    val notDeleted = !eq("status", "deleted")

    // Combine with infix 'and' and 'or'
    val userAccess = eq("owner", userId) and gte("confidenceScore", 0.7)

    // Complex expressions with grouping
    val accessFilter = (eq("owner", userId) and ne("status", "deleted")) or eq("role", "admin")

Metadata vs Entity Filters
ToolishRag accepts two separate filter parameters:
- metadataFilter: A PropertyFilter that filters on the metadata map of Datum objects. Metadata is typically ingestion-time information like source URI, ingestion date, owner ID, etc.
- entityFilter: An EntityFilter that filters on entity properties and labels. For NamedEntityData, this filters on the properties map and labels(). For typed entities, reflection is used to access top-level fields.

Java:

    // Filter on metadata (e.g., which user owns the document)
    PropertyFilter metadataFilter = PropertyFilter.eq("ownerId", currentUserId);

    // Filter on entity labels and properties
    EntityFilter entityFilter = EntityFilter.hasAnyLabel("Person");

    // Apply both filters
    ToolishRag scopedRag = toolishRag
        .withMetadataFilter(metadataFilter)
        .withEntityFilter(entityFilter);

Kotlin:

    // Filter on metadata (e.g., which user owns the document)
    val metadataFilter = PropertyFilter.eq("ownerId", currentUserId)

    // Filter on entity labels and properties (combine label filtering with property filtering)
    val entityFilter = EntityFilter.hasAnyLabel("Person") and PropertyFilter.eq("status", "active")

    // Apply both filters
    val scopedRag = toolishRag
        .withMetadataFilter(metadataFilter)
        .withEntityFilter(entityFilter)

In most cases, you’ll use metadata filters for access control and entity filters for type-based and business logic filtering.
Neo4j Cypher Filtering
When using Neo4j via the Drivine module, metadata filters are automatically converted to Cypher WHERE clauses using CypherFilterConverter:

Java:

    // The filter is converted to Cypher WHERE clause automatically
    PropertyFilter filter = PropertyFilter.and(
        PropertyFilter.eq("owner", "alice"),
        PropertyFilter.gte("confidenceScore", 0.7)
    );

    // In DrivineNamedEntityDataRepository:
    List<SimilarityResult<T>> results = repository.vectorSearch(request, filter);
    // Generates: WHERE (e.owner = $_filter_0) AND (e.confidenceScore >= $_filter_1) AND ...

Kotlin:

    // The filter is converted to Cypher WHERE clause automatically
    val filter = eq("owner", "alice") and gte("confidenceScore", 0.7)

    // In DrivineNamedEntityDataRepository:
    val results = repository.vectorSearch(request, metadataFilter = filter)
    // Generates: WHERE (e.owner = $_filter_0) AND (e.confidenceScore >= $_filter_1) AND ...

The converter produces parameterized queries for safety and handles all filter types including nested logical expressions.
For both DrivineStore (chunks) and DrivineNamedEntityDataRepository (named entities), both metadata and property filters are translated to native Cypher WHERE clauses. This is because Neo4j stores all data as node properties - metadata is simply the set of properties that aren’t core fields like id, text, parentId, etc. This provides optimal performance by filtering at the database level rather than in-memory.
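The general idea of turning a filter tree into a parameterized WHERE clause can be sketched as follows. This is not the CypherFilterConverter implementation: the Filter records, the toWhere function, and the node alias argument are hypothetical, and only the `$_filter_N` parameter naming is borrowed from the generated example above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CypherWhereSketch {
    // Hypothetical filter types covering just enough cases for the example.
    sealed interface Filter permits Eq, Gte, And {}
    record Eq(String key, Object value) implements Filter {}
    record Gte(String key, Object value) implements Filter {}
    record And(Filter left, Filter right) implements Filter {}

    // Build the clause text; values go into 'params' and never into the query string.
    static String toWhere(Filter f, String alias, Map<String, Object> params) {
        if (f instanceof Eq eq) return comparison(alias, eq.key(), "=", eq.value(), params);
        if (f instanceof Gte gte) return comparison(alias, gte.key(), ">=", gte.value(), params);
        if (f instanceof And and) {
            String left = toWhere(and.left(), alias, params);
            String right = toWhere(and.right(), alias, params);
            return "(" + left + ") AND (" + right + ")";
        }
        throw new IllegalArgumentException("Unsupported filter: " + f);
    }

    // Register the value under the next free parameter name and reference it by name.
    private static String comparison(String alias, String key, String op,
                                     Object value, Map<String, Object> params) {
        String name = "_filter_" + params.size();
        params.put(name, value);
        return alias + "." + key + " " + op + " $" + name;
    }

    public static void main(String[] args) {
        Map<String, Object> params = new LinkedHashMap<>();
        Filter filter = new And(new Eq("owner", "alice"), new Gte("confidenceScore", 0.7));
        // WHERE (e.owner = $_filter_0) AND (e.confidenceScore >= $_filter_1)
        System.out.println("WHERE " + toWhere(filter, "e", params));
        // {_filter_0=alice, _filter_1=0.7}
        System.out.println(params);
    }
}
```

Note that this sketch still concatenates property keys into the query text, so a production converter would also need to validate or escape key names.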
Basic Usage
Apply a metadata filter to scope all searches to a specific owner:

Java:

    // Create a filter for the current user
    PropertyFilter ownerFilter = PropertyFilter.eq("ownerId", currentUserId);

    // Apply to ToolishRag - all searches will be filtered
    ToolishRag scopedRag = toolishRag.withMetadataFilter(ownerFilter);

    // Use in LLM call - LLM cannot see or bypass the filter
    context.ai()
        .withReference(scopedRag)
        .respondWithSystemPrompt(conversation, Map.of());

Kotlin:

    // Create a filter for the current user
    val ownerFilter = PropertyFilter.eq("ownerId", currentUserId)

    // Apply to ToolishRag - all searches will be filtered
    val scopedRag = toolishRag.withMetadataFilter(ownerFilter)

    // Use in LLM call - LLM cannot see or bypass the filter
    context.ai()
        .withReference(scopedRag)
        .respondWithSystemPrompt(conversation, emptyMap())

Complex Filters
Combine filters for more sophisticated access control:

Java:

    // User can access their own documents OR documents in their departments
    PropertyFilter accessFilter = PropertyFilter.or(
        PropertyFilter.eq("ownerId", currentUserId),
        PropertyFilter.in("departmentId", userDepartmentIds)
    );

    ToolishRag scopedRag = toolishRag.withMetadataFilter(accessFilter);

    // Organization-scoped with status restriction
    PropertyFilter orgFilter = PropertyFilter.and(
        PropertyFilter.eq("orgId", currentOrgId),
        PropertyFilter.ne("status", "deleted"),
        PropertyFilter.gte("confidenceScore", 0.7)
    );

    ToolishRag scopedRag2 = toolishRag.withMetadataFilter(orgFilter);

Kotlin:

    // User can access their own documents OR documents in their departments
    val accessFilter = eq("ownerId", currentUserId) or PropertyFilter.`in`("departmentId", *userDepartmentIds.toTypedArray())

    val scopedRag = toolishRag.withMetadataFilter(accessFilter)

    // Organization-scoped with status restriction
    val orgFilter = eq("orgId", currentOrgId) and ne("status", "deleted") and gte("confidenceScore", 0.7)

    val scopedRag2 = toolishRag.withMetadataFilter(orgFilter)

Per-Request Scoping Pattern
A common pattern is to create a scoped ToolishRag per request in a web application:

Java:

    @Action(trigger = UserMessage.class)
    void respond(Conversation conversation, ActionContext context) {
        // Get current user from security context
        String userId = SecurityContextHolder.getContext()
            .getAuthentication().getName();

        // Create user-scoped RAG for this request
        ToolishRag userScopedRag = toolishRag.withMetadataFilter(
            PropertyFilter.eq("ownerId", userId)
        );

        context.ai()
            .withReference(userScopedRag)
            .rendering("assistant")
            .respondWithSystemPrompt(conversation, Map.of());
    }

Kotlin:

    @Action(trigger = UserMessage::class)
    fun respond(conversation: Conversation, context: ActionContext) {
        // Get current user from security context
        val userId = SecurityContextHolder.getContext()
            .authentication.name

        // Create user-scoped RAG for this request
        val userScopedRag = toolishRag.withMetadataFilter(
            PropertyFilter.eq("ownerId", userId)
        )

        context.ai()
            .withReference(userScopedRag)
            .rendering("assistant")
            .respondWithSystemPrompt(conversation, emptyMap())
    }

Backend Implementation
Filters are applied at different levels depending on the backend:
- Spring AI VectorStore: Metadata filters are translated to Filter.Expression for native filtering; entity filters (including HasAnyLabel) are applied in-memory
- Neo4j (Drivine): Both metadata and entity filters (including HasAnyLabel) are translated to native Cypher WHERE clauses and label predicates (optimal performance)
- Lucene: Both filter types are applied as post-filters with inflated topK to compensate for filtered-out results
- Custom stores: Can implement FilteringVectorSearch / FilteringTextSearch for native translation, or fall back to in-memory filtering
The InMemoryPropertyFilter utility class provides fallback filtering for any store implementation:
Java:

    // In your SearchOperations implementation
    List<SimilarityResult<T>> results = performSearch(request);
    return InMemoryPropertyFilter.filterResults(results, metadataFilter, entityFilter);

Kotlin:

    // In your SearchOperations implementation
    val results = performSearch(request)
    return InMemoryPropertyFilter.filterResults(results, metadataFilter, entityFilter)

For EntityFilter.HasAnyLabel, the in-memory filter checks if the entity has any of the specified labels via NamedEntityData.labels().
This ensures filtering works across all backends, with native optimization for metadata filters where available.
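The post-filtering strategy with an inflated topK, as described for Lucene above, can be sketched in a few lines. The Hit record, the rawSearch stand-in, and the inflation factor are assumptions made for this example; the factor Embabel actually uses is not stated here.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

public class PostFilterSketch {
    // Hypothetical search hit carrying one metadata value (owner).
    record Hit(String id, double score, String owner) {}

    // Hypothetical stand-in for an unfiltered index query: best-scoring hits first.
    static List<Hit> rawSearch(List<Hit> index, int limit) {
        return index.stream()
            .sorted(Comparator.comparingDouble(Hit::score).reversed())
            .limit(limit)
            .toList();
    }

    // Over-fetch by an assumed factor, apply the filter, then trim back to topK.
    static List<Hit> filteredSearch(List<Hit> index, int topK, int inflation, Predicate<Hit> filter) {
        return rawSearch(index, topK * inflation).stream()
            .filter(filter)
            .limit(topK)
            .toList();
    }

    public static void main(String[] args) {
        List<Hit> index = List.of(
            new Hit("a", 0.9, "bob"), new Hit("b", 0.8, "alice"),
            new Hit("c", 0.7, "bob"), new Hit("d", 0.6, "alice"));
        Predicate<Hit> ownedByAlice = h -> h.owner().equals("alice");
        System.out.println(filteredSearch(index, 2, 1, ownedByAlice).size()); // 1: no inflation loses a result
        System.out.println(filteredSearch(index, 2, 2, ownedByAlice).size()); // 2: inflation recovers it
    }
}
```

Without inflation, the filter runs over only the top two raw hits and returns a single result even though two matching documents exist. Inflation trades extra retrieval work for a fuller result list, and it can still fall short when most documents are filtered out.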
Ingestion
Document Parsing with Tika
Embabel Agent uses Apache Tika for document parsing. TikaHierarchicalContentReader reads various formats (Markdown, HTML, PDF, Word, etc.) and extracts a hierarchical structure:

Java:

    @ShellMethod("Ingest URL or file path")
    String ingest(@ShellOption(defaultValue = "./data/document.md") String location) {
        var uri = location.startsWith("http://") || location.startsWith("https://")
            ? location
            : Path.of(location).toAbsolutePath().toUri().toString();
        var ingested = NeverRefreshExistingDocumentContentPolicy.INSTANCE
            .ingestUriIfNeeded(
                luceneSearchOperations,
                new TikaHierarchicalContentReader(),
                uri
            );
        return ingested != null
            ? "Ingested document with ID: " + ingested
            : "Document already exists, no ingestion performed.";
    }

Kotlin:

    @ShellMethod("Ingest URL or file path")
    fun ingest(@ShellOption(defaultValue = "./data/document.md") location: String): String {
        val uri = if (location.startsWith("http://") || location.startsWith("https://")) {
            location
        } else {
            Path.of(location).toAbsolutePath().toUri().toString()
        }
        val ingested = NeverRefreshExistingDocumentContentPolicy.INSTANCE
            .ingestUriIfNeeded(
                luceneSearchOperations,
                TikaHierarchicalContentReader(),
                uri
            )
        return if (ingested != null) {
            "Ingested document with ID: $ingested"
        } else {
            "Document already exists, no ingestion performed."
        }
    }

Chunking Configuration
Content is split into chunks with configurable parameters:

    ragbot:
      chunker-config:
        max-chunk-size: 800
        overlap-size: 100

Configuration options:
- maxChunkSize - Maximum characters per chunk (default: 1500)
- overlapSize - Character overlap between consecutive chunks (default: 200)
- includeSectionTitleInChunk - Include section title in chunk text (default: true)
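To illustrate how maxChunkSize and overlapSize interact, here is a deliberately naive character-window chunker. It is a sketch under simple assumptions, not Embabel's chunker, which may well split on section or sentence boundaries; the class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class OverlapChunkerSketch {
    // Split text into windows of at most maxChunkSize characters, where each
    // window starts overlapSize characters before the previous window ended.
    static List<String> chunk(String text, int maxChunkSize, int overlapSize) {
        if (maxChunkSize <= 0 || overlapSize < 0 || overlapSize >= maxChunkSize) {
            throw new IllegalArgumentException("Require 0 <= overlapSize < maxChunkSize");
        }
        List<String> chunks = new ArrayList<>();
        int step = maxChunkSize - overlapSize;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + maxChunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window already reached the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        // With size 4 and overlap 1, consecutive chunks share one character.
        System.out.println(chunk("abcdefghij", 4, 1)); // [abcd, defg, ghij]
    }
}
```

A larger overlap reduces the chance that a sentence is cut off from its context at a chunk boundary, at the cost of indexing more duplicated text.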
Chunk Transformation
When chunks are created from documents, they often lack the context needed for effective retrieval.
A chunk containing “This approach improves performance by 40%” is not useful unless the reader knows what “this approach” refers to.
The ChunkTransformer interface allows you to enrich chunks with additional context before they are indexed.
The urtext Field
Every Chunk has two text fields:
- text - The indexed content, which may be transformed with additional context
- urtext - The original, unmodified chunk text
The urtext field preserves the original content for accurate citations.
When displaying search results to users, use urtext to show exactly what appeared in the source document, while using the enriched text for vector embeddings and search.
AddTitlesChunkTransformer
The recommended default transformer is AddTitlesChunkTransformer, which prepends document and section titles to each chunk:

Java:

    @Bean
    ChunkTransformer chunkTransformer() {
        return AddTitlesChunkTransformer.INSTANCE;
    }

Kotlin:

    @Bean
    fun chunkTransformer(): ChunkTransformer {
        return AddTitlesChunkTransformer.INSTANCE
    }

This transforms a chunk like:

    This approach improves performance by 40% compared to the baseline.

Into:

    # Title: Performance Optimization Guide
    # URI: https://docs.example.com/performance
    # Section: Caching Strategies

    This approach improves performance by 40% compared to the baseline.

Now the chunk carries its context, improving both retrieval accuracy and LLM understanding.
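The relationship between the enriched text and the preserved urtext can be sketched like this. The Chunk record and the addTitles helper are hypothetical and only reproduce the header layout shown above; they are not the AddTitlesChunkTransformer implementation.

```java
public class AddTitlesSketch {
    // Hypothetical chunk shape: 'text' is what gets indexed, 'urtext' is the untouched original.
    record Chunk(String text, String urtext) {}

    // Prepend a title header to the indexed text while leaving urtext unchanged.
    static Chunk addTitles(Chunk chunk, String title, String uri, String section) {
        String header = "# Title: " + title + "\n"
            + "# URI: " + uri + "\n"
            + "# Section: " + section + "\n\n";
        return new Chunk(header + chunk.text(), chunk.urtext());
    }

    public static void main(String[] args) {
        String original = "This approach improves performance by 40% compared to the baseline.";
        Chunk enriched = addTitles(
            new Chunk(original, original),
            "Performance Optimization Guide",
            "https://docs.example.com/performance",
            "Caching Strategies");
        System.out.println(enriched.text());   // header followed by the original sentence
        System.out.println(enriched.urtext()); // the original sentence only, suitable for citations
    }
}
```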
Custom Transformers
You can create custom transformers by implementing ChunkTransformer or extending AbstractChunkTransformer:

Java:

    public class MetadataEnrichingTransformer extends AbstractChunkTransformer {

        @Override
        public Map<String, Object> additionalMetadata(
                Chunk chunk, ChunkTransformationContext context) {
            return Map.of(
                "documentType", context.getDocument().getMetadata().get("type"),
                "lastModified", Instant.now().toString()
            );
        }

        @Override
        public String newText(Chunk chunk, ChunkTransformationContext context) {
            // Optionally modify the text
            return chunk.getText();
        }
    }

Kotlin:

    class MetadataEnrichingTransformer : AbstractChunkTransformer() {

        override fun additionalMetadata(
            chunk: Chunk,
            context: ChunkTransformationContext
        ): Map<String, Any> {
            return mapOf(
                "documentType" to context.document?.metadata?.get("type"),
                "lastModified" to Instant.now().toString()
            )
        }

        override fun newText(chunk: Chunk, context: ChunkTransformationContext): String {
            // Optionally modify the text
            return chunk.text
        }
    }

The ChunkTransformationContext provides access to:
- section - The Section containing this chunk
- document - The ContentRoot (may be null for orphan sections)
Chaining Transformers
Use ChainedChunkTransformer to apply multiple transformations in sequence:

Java:

    @Bean
    ChunkTransformer chunkTransformer() {
        return new ChainedChunkTransformer(List.of(
            AddTitlesChunkTransformer.INSTANCE,
            new MetadataEnrichingTransformer(),
            new CustomCleanupTransformer()
        ));
    }

Kotlin:

    @Bean
    fun chunkTransformer(): ChunkTransformer {
        return ChainedChunkTransformer(listOf(
            AddTitlesChunkTransformer.INSTANCE,
            MetadataEnrichingTransformer(),
            CustomCleanupTransformer()
        ))
    }

Transformers are applied in order, with each receiving the output of the previous transformer.
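The ordering rule can be illustrated with a small, self-contained sketch. It models each transformer as a plain `UnaryOperator<String>` rather than the real ChunkTransformer interface, so `ChainSketch` and `applyInOrder` are hypothetical names used only for illustration.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical model of chaining; the real ChainedChunkTransformer works on chunks, not strings.
final class ChainSketch {

    /** Applies each transformer in list order, feeding each one the previous output. */
    static String applyInOrder(List<UnaryOperator<String>> transformers, String text) {
        String current = text;
        for (UnaryOperator<String> transformer : transformers) {
            current = transformer.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        UnaryOperator<String> cleanup = String::trim;
        UnaryOperator<String> addHeader = s -> "# Section: Caching Strategies\n\n" + s;

        // Cleanup runs first, so the header is attached to already-trimmed text.
        System.out.println(applyInOrder(List.of(cleanup, addHeader), "  Cache hot keys.  "));
    }
}
```

Because each step sees the previous step's output, order matters: placing a cleanup step after a header step would make it operate on the header as well as the body.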
Configuring the Store
Pass your ChunkTransformer to the store implementation:

Java:

    @DependsOn("onnxEmbeddingInitializer") // (1)
    @Bean
    DrivineStore drivineStore(
            PersistenceManager persistenceManager,
            EmbeddingService embeddingService,
            ChunkTransformer chunkTransformer, // (2)
            MyProperties properties) {
        return new DrivineStore(
            persistenceManager,
            properties.neoRag(),
            properties.chunkerConfig(),
            chunkTransformer, // (3)
            embeddingService,
            platformTransactionManager,
            new DrivineCypherSearch(persistenceManager)
        );
    }

Kotlin:

    @DependsOn("onnxEmbeddingInitializer") // (1)
    @Bean
    fun drivineStore(
        persistenceManager: PersistenceManager,
        embeddingService: EmbeddingService,
        chunkTransformer: ChunkTransformer, // (2)
        properties: MyProperties
    ): DrivineStore {
        return DrivineStore(
            persistenceManager,
            properties.neoRag(),
            properties.chunkerConfig(),
            chunkTransformer, // (3)
            embeddingService,
            platformTransactionManager,
            DrivineCypherSearch(persistenceManager)
        )
    }

(1) Ensure the EmbeddingService bean is registered before this configuration is wired (see note below)
(2) Inject the ChunkTransformer bean
(3) Pass it to the store constructor
Using Docling for Markdown Conversion
While we believe that you should write your Gen AI applications in Java or Kotlin, ingestion is more in the realm of data science, and Python is indisputably strong in this area.
For complex documents like PDFs, consider using Docling to convert to Markdown first:

    docling https://example.com/document.pdf --from pdf --to md --output ./data

Markdown is easier to parse hierarchically and produces better chunks than raw PDF extraction.
Supported Stores
Embabel Agent provides several RAG store implementations:
Lucene (embabel-agent-rag-lucene)
Full-featured store with vector search, text search, and result expansion. Supports both in-memory and file-based persistence:

Java:

    @Bean
    LuceneSearchOperations luceneSearchOperations(
            ModelProvider modelProvider,
            RagbotProperties properties) {
        var embeddingService = modelProvider.getEmbeddingService(
            DefaultModelSelectionCriteria.INSTANCE);
        return LuceneSearchOperations
            .withName("docs")
            .withEmbeddingService(embeddingService)
            .withChunkerConfig(properties.chunkerConfig())
            .withIndexPath(Paths.get("./.lucene-index")) // File persistence
            .buildAndLoadChunks();
    }

Kotlin:

    @Bean
    fun luceneSearchOperations(
        modelProvider: ModelProvider,
        properties: RagbotProperties
    ): LuceneSearchOperations {
        val embeddingService = modelProvider.getEmbeddingService(
            DefaultModelSelectionCriteria.INSTANCE
        )
        return LuceneSearchOperations
            .withName("docs")
            .withEmbeddingService(embeddingService)
            .withChunkerConfig(properties.chunkerConfig())
            .withIndexPath(Paths.get("./.lucene-index")) // File persistence
            .buildAndLoadChunks()
    }

Omit .withIndexPath() for in-memory only storage.
Graph database store for RAG (available in separate modules embabel-agent-rag-neo-drivine and embabel-agent-rag-neo-ogm).
Ideal when you need graph relationships between content elements.
PostgreSQL pgvector (embabel-rag-pgvector)
PostgreSQL-based RAG store using the pgvector extension (available in the separate embabel/embabel-rag-pgvector repository).
Supports hybrid search combining vector similarity, full-text search via tsvector/tsquery, and fuzzy matching via pg_trgm.
Ideal when you already use PostgreSQL and want a familiar, battle-tested database for RAG.
Spring AI VectorStore (SpringVectorStoreVectorSearch)
Adapter that wraps any Spring AI VectorStore, enabling use of any vector database Spring AI supports:

Java:

    public class SpringVectorStoreVectorSearch implements VectorSearch {
        private final VectorStore vectorStore;

        public SpringVectorStoreVectorSearch(VectorStore vectorStore) {
            this.vectorStore = vectorStore;
        }

        @Override
        public <T extends Retrievable> List<SimilarityResult<T>> vectorSearch(
                TextSimilaritySearchRequest request, Class<T> clazz) {
            SearchRequest searchRequest = SearchRequest
                .builder()
                .query(request.getQuery())
                .similarityThreshold(request.getSimilarityThreshold())
                .topK(request.getTopK())
                .build();
            List<Document> results = vectorStore.similaritySearch(searchRequest);
            // ... convert results
        }
    }

Kotlin:

    class SpringVectorStoreVectorSearch(
        private val vectorStore: VectorStore,
    ) : VectorSearch {
        override fun <T : Retrievable> vectorSearch(
            request: TextSimilaritySearchRequest,
            clazz: Class<T>,
        ): List<SimilarityResult<T>> {
            val searchRequest = SearchRequest
                .builder()
                .query(request.query)
                .similarityThreshold(request.similarityThreshold)
                .topK(request.topK)
                .build()
            val results = vectorStore.similaritySearch(searchRequest)
            // ... convert results
        }
    }

This allows integration with Pinecone, Weaviate, Milvus, Chroma, and other stores via Spring AI.
Implementing Your Own RAG Store
To implement a custom RAG store, implement only the SearchOperations subinterfaces that are natural and efficient for your store.
This is a key design principle: stores should only implement what they can do well.
For example:
- A vector database like Pinecone might implement only VectorSearch since that’s its strength
- A full-text search engine might implement TextSearch and RegexSearchOperations
- A hierarchical document store might add ResultExpander for context expansion
- A full-featured store like Lucene can implement all interfaces
The ToolishRag facade automatically exposes only the tools that your store supports.
This means you don’t need to provide stub implementations or throw “not supported” exceptions. Simply don’t implement interfaces that don’t fit your store’s capabilities.
Java:

    // A store that only supports vector search
    public class MyVectorOnlyStore implements VectorSearch {
        @Override
        public <T extends Retrievable> List<SimilarityResult<T>> vectorSearch(
                TextSimilaritySearchRequest request, Class<T> clazz) {
            // Implement vector similarity search
        }
    }

    // A store that supports both vector and text search
    public class MyFullTextStore implements VectorSearch, TextSearch {
        @Override
        public <T extends Retrievable> List<SimilarityResult<T>> vectorSearch(
                TextSimilaritySearchRequest request, Class<T> clazz) {
            // Implement vector similarity search
        }

        @Override
        public <T extends Retrievable> List<SimilarityResult<T>> textSearch(
                TextSimilaritySearchRequest request, Class<T> clazz) {
            // Implement full-text search
        }

        @Override
        public String getLuceneSyntaxNotes() {
            return "Full Lucene syntax supported";
        }
    }

Kotlin:

    // A store that only supports vector search
    class MyVectorOnlyStore : VectorSearch {
        override fun <T : Retrievable> vectorSearch(
            request: TextSimilaritySearchRequest,
            clazz: Class<T>,
        ): List<SimilarityResult<T>> {
            // Implement vector similarity search
        }
    }

    // A store that supports both vector and text search
    class MyFullTextStore : VectorSearch, TextSearch {
        override fun <T : Retrievable> vectorSearch(
            request: TextSimilaritySearchRequest,
            clazz: Class<T>,
        ): List<SimilarityResult<T>> {
            // Implement vector similarity search
        }

        override fun <T : Retrievable> textSearch(
            request: TextSimilaritySearchRequest,
            clazz: Class<T>,
        ): List<SimilarityResult<T>> {
            // Implement full-text search
        }

        override val luceneSyntaxNotes: String = "Full Lucene syntax supported"
    }

For ingestion support, extend ChunkingContentElementRepository to handle document storage and chunking.
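The idea that a facade exposes only the tools a store supports can be sketched with plain interface checks. This is an illustration under stated assumptions, not ToolishRag's code: the marker interfaces, store classes, and tool names below are all hypothetical stand-ins for the real SearchOperations subinterfaces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of capability-based tool exposure; not the ToolishRag implementation.
final class CapabilitySketch {

    // Stand-ins for SearchOperations and its subinterfaces.
    interface SearchOps {}
    interface VectorOps extends SearchOps {}
    interface TextOps extends SearchOps {}
    interface ExpanderOps extends SearchOps {}

    // A store implements only what it can do well.
    static final class VectorOnlyStore implements VectorOps {}
    static final class FullStore implements VectorOps, TextOps, ExpanderOps {}

    /** Returns (made-up) tool names only for the capabilities the store actually implements. */
    static List<String> toolsFor(SearchOps store) {
        List<String> tools = new ArrayList<>();
        if (store instanceof VectorOps) {
            tools.add("vectorSearch");
        }
        if (store instanceof TextOps) {
            tools.add("textSearch");
        }
        if (store instanceof ExpanderOps) {
            tools.add("expandResult");
        }
        return tools;
    }

    public static void main(String[] args) {
        System.out.println(toolsFor(new VectorOnlyStore()));
        System.out.println(toolsFor(new FullStore()));
    }
}
```

With this shape, a store never needs placeholder methods: an unsupported capability is simply absent from its type, and the facade never offers the corresponding tool to the LLM.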
Complete Example
See the rag-demo project for a complete working example including:
- Lucene-based RAG store configuration
- Document ingestion via Tika
- Chatbot with RAG-powered responses
- Jinja prompt templates for system prompts
- Spring Shell commands for interactive testing