Working with Multiple AI Models in Spring AI — @Qualifier, mutate(), and Model Routing
Spring AI Complete Course — Lecture 3 of 12
Previous: Lecture 2 — ChatClient API | Next: Lecture 4 — Advisors API
In production AI applications, you rarely use just one model. GPT-4o handles creative generation, Claude excels at reasoning, and Groq delivers sub-second inference with open-source models. Spring AI makes it straightforward to wire all of them into a single Spring Boot application using patterns you already know.
This article covers the @Qualifier pattern for multi-model configuration, the mutate() pattern for runtime flexibility, OpenAI-compatible endpoints for providers like Groq, and a model router strategy for production systems.
The Problem: Bean Conflicts with Multiple Starters
When you add a single Spring AI starter — say spring-ai-openai-spring-boot-starter — Spring Boot auto-configures an OpenAiChatModel bean that implements the ChatModel interface. A ChatClient is built on top of it. Everything works.
The moment you add a second starter like spring-ai-anthropic-spring-boot-starter, you now have two ChatModel beans in the application context: OpenAiChatModel and AnthropicChatModel. Any injection point that asks for ChatModel by type will fail with:
NoUniqueBeanDefinitionException: No qualifying bean of type 'ChatModel':
expected single matching bean but found 2: openAiChatModel, anthropicChatModel
This is standard Spring behavior — nothing specific to Spring AI. But it means you need explicit configuration.
Solution: The @Qualifier Pattern
The cleanest approach is to create dedicated ChatClient beans that inject the concrete model types directly.
Configuration Class
@Configuration
public class AiConfig {

    @Bean
    public ChatClient openAiChatClient(OpenAiChatModel openAiChatModel) {
        return ChatClient.builder(openAiChatModel)
                .defaultSystem("You are a creative writing assistant.")
                .build();
    }

    @Bean
    public ChatClient anthropicChatClient(AnthropicChatModel anthropicChatModel) {
        return ChatClient.builder(anthropicChatModel)
                .defaultSystem("You are a precise summarization engine.")
                .build();
    }
}
By injecting OpenAiChatModel and AnthropicChatModel directly (concrete types, not the ChatModel interface), there's zero ambiguity. Each ChatClient bean gets its own model and system prompt.
Using Qualified Beans in Services
@Service
public class AiService {

    private final ChatClient openAiClient;
    private final ChatClient anthropicClient;

    public AiService(
            @Qualifier("openAiChatClient") ChatClient openAiClient,
            @Qualifier("anthropicChatClient") ChatClient anthropicClient) {
        this.openAiClient = openAiClient;
        this.anthropicClient = anthropicClient;
    }

    public String generateCreativeContent(String prompt) {
        return openAiClient.prompt().user(prompt).call().content();
    }

    public String summarize(String text) {
        return anthropicClient.prompt().user(text).call().content();
    }
}
The @Qualifier value matches the bean method name. Each method delegates to the right model for its task.
The mutate() Pattern: Runtime Flexibility
Sometimes you need a variation of an existing ChatClient — different system prompt, different temperature — without creating a new bean. The mutate() method returns a new builder pre-filled with the current client's configuration.
ChatClient customClient = openAiClient.mutate()
        .defaultSystem("You are a technical documentation writer.")
        .build();

String result = customClient.prompt()
        .user("Explain the Circuit Breaker pattern")
        .call()
        .content();
Key characteristics:
- The original ChatClient is not modified; mutate() creates a copy
- The returned builder inherits all defaults (model, system prompt, advisors)
- You override only what you need
- The mutated client is typically ephemeral: used and discarded
This is particularly useful when you have a base client configured with retry logic, rate limiting, and observability, and you need task-specific variations at runtime.
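Model options such as temperature can be overridden the same way. A sketch, assuming the Spring AI 1.x API where ChatClient.Builder exposes defaultOptions(...) and OpenAiChatOptions.builder() exposes temperature(...) (older milestones used withTemperature(...)):

```java
// Reuse the qualified openAiClient bean, but with a low temperature for
// deterministic, extraction-style tasks. The original client keeps its
// own options; only the copy is changed.
ChatClient deterministicClient = openAiClient.mutate()
        .defaultOptions(OpenAiChatOptions.builder()
                .temperature(0.1)
                .build())
        .build();
```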
OpenAI-Compatible Endpoints: One Starter, Many Providers
Many AI providers expose OpenAI-compatible REST APIs: Groq, Together AI, Ollama, Perplexity, and others. Instead of adding a separate starter for each, you can reuse the OpenAI starter and override the base URL.
Application Properties
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o
Programmatic Groq Client via OpenAI API
@Bean
public ChatClient groqChatClient(
        @Value("${spring.ai.openai-groq.api-key}") String apiKey,
        @Value("${spring.ai.openai-groq.base-url}") String baseUrl) {

    var openAiApi = new OpenAiApi(baseUrl, apiKey);
    var chatOptions = OpenAiChatOptions.builder()
            .model("llama-3.3-70b-versatile")
            .build();
    var chatModel = new OpenAiChatModel(openAiApi, chatOptions);

    return ChatClient.builder(chatModel)
            .defaultSystem("You are a fast inference assistant.")
            .build();
}
With custom properties in application.yml:
spring:
  ai:
    openai-groq:
      base-url: https://api.groq.com/openai/v1
      api-key: ${GROQ_API_KEY}
Now you have OpenAI, Anthropic, and Groq in one application — three models, three ChatClients, one codebase.
Model Router: Production-Grade Selection
Hardcoding model selection in your service layer doesn't scale. A router pattern decouples the "which model" decision from business logic.
@Service
public class ModelRouter {

    private final Map<String, ChatClient> clients;

    public ModelRouter(
            @Qualifier("openAiChatClient") ChatClient openAi,
            @Qualifier("anthropicChatClient") ChatClient anthropic,
            @Qualifier("groqChatClient") ChatClient groq) {
        this.clients = Map.of(
                "creative", openAi,
                "reasoning", anthropic,
                "fast", groq);
    }

    public ChatClient route(String taskType) {
        return clients.getOrDefault(taskType, clients.get("fast"));
    }
}
Usage becomes trivial:
ChatClient client = modelRouter.route("creative");
String result = client.prompt().user(prompt).call().content();
You can extend this pattern with:
- Cost-aware routing — track token usage per model and route based on budget
- Fallback chains — if the primary model is down, fall back to the next
- A/B testing — randomly route a percentage of traffic to a new model
- Latency-based routing — choose the fastest available model
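As an illustration of the fallback-chain idea, here is a minimal plain-Java sketch that tries a list of model calls in order. The Function<String, String> entries stand in for ChatClient invocations; in a real service each would wrap client.prompt().user(prompt).call().content():

```java
import java.util.List;
import java.util.function.Function;

// Minimal fallback chain: try each model call in order until one succeeds.
// Each Function<String, String> stands in for a ChatClient invocation.
public class FallbackChain {

    private final List<Function<String, String>> models;

    public FallbackChain(List<Function<String, String>> models) {
        this.models = models;
    }

    public String call(String prompt) {
        RuntimeException last = null;
        for (Function<String, String> model : models) {
            try {
                return model.apply(prompt); // first success wins
            } catch (RuntimeException e) {
                last = e;                   // remember it, try the next model
            }
        }
        throw new IllegalStateException("All models failed", last);
    }
}
```

A production version would layer in the other strategies above: consult circuit-breaker state before each attempt, and record which model served the request for cost tracking.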
Production Considerations
Dependency Setup
<!-- OpenAI -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>

<!-- Anthropic -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-anthropic-spring-boot-starter</artifactId>
</dependency>

Note: these are the pre-1.0 milestone artifact IDs. Spring AI 1.0 GA renamed the starters (for example, spring-ai-starter-model-openai and spring-ai-starter-model-anthropic), so check the artifact names against the Spring AI version you are using.
API Key Management
Store keys in environment variables or a secrets manager. Never commit them to source control.
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
    anthropic:
      api-key: ${ANTHROPIC_API_KEY}
Error Handling
Each model can fail independently. Wrap calls in try-catch and consider implementing circuit breakers (Resilience4j) per model to prevent cascading failures.
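To make the circuit-breaker idea concrete, here is a toy per-model breaker in plain Java. It is a sketch of the concept only; the failure counting, threshold, and message are invented for illustration, and a real deployment should use Resilience4j's CircuitBreaker instead:

```java
import java.util.function.Supplier;

// Toy per-model circuit breaker: after `threshold` consecutive failures it
// "opens" and fails fast instead of hitting the model again. Illustration
// only; use Resilience4j in production.
public class ModelBreaker {

    private final int threshold;
    private int consecutiveFailures = 0;

    public ModelBreaker(int threshold) {
        this.threshold = threshold;
    }

    public String call(Supplier<String> modelCall) {
        if (consecutiveFailures >= threshold) {
            throw new IllegalStateException("circuit open: failing fast");
        }
        try {
            String result = modelCall.get();
            consecutiveFailures = 0; // a success closes the breaker again
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;   // count the failure, rethrow to the caller
            throw e;
        }
    }
}
```

One breaker instance per model keeps a slow or failing provider from dragging down calls that could be served by the others.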
Observability
Spring AI integrates with Micrometer. Each ChatModel emits metrics for token usage, latency, and error rates — critical when running multiple models in production.
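For example, the standard Spring Boot Actuator configuration below exposes the metrics endpoint those meters are published to. This is plain Actuator setup, nothing Spring AI-specific, and the exact Spring AI meter names depend on the version you run:

```yaml
management:
  endpoints:
    web:
      exposure:
        include: health, metrics
```

With this in place, the model-level meters appear under /actuator/metrics alongside the standard JVM and HTTP metrics.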
Key Takeaways
- Multiple starters = bean conflicts. Use @Qualifier with concrete model types to resolve them.
- mutate() creates ephemeral ChatClient variations without polluting the application context.
- OpenAI-compatible endpoints let you use Groq, Together AI, and others through the OpenAI starter.
- A model router decouples model selection from business logic and enables cost-aware, fallback, and A/B strategies.
Resources
- Spring AI Reference Documentation
- Spring AI GitHub Repository
- Groq API Documentation
- Anthropic API Documentation
Spring AI Complete Course — Lecture 3 of 12
Previous: Lecture 2 — ChatClient API | Next: Lecture 4 — Advisors API
Tags: #spring-ai #java #spring-boot #ai #multi-model