Working with Multiple AI Models in Spring AI — @Qualifier, mutate(), and Model Routing
Spring AI Complete Course — Lecture 3 of 12
Previous: Lecture 2 — ChatClient API | Next: Lecture 4 — Advisors API
In production AI applications, you rarely use just one model. GPT-4o handles creative generation, Claude excels at reasoning, and Groq delivers sub-second inference with open-source models. Spring AI makes it straightforward to wire all of them into a single Spring Boot application using patterns you already know.
This article covers the @Qualifier pattern for multi-model configuration, the mutate() pattern for runtime flexibility, OpenAI-compatible endpoints for providers like Groq, and a model router strategy for production systems.
The Problem: Bean Conflicts with Multiple Starters
When you add a single Spring AI starter — say spring-ai-openai-spring-boot-starter — Spring Boot auto-configures an OpenAiChatModel bean that implements the ChatModel interface. A ChatClient is built on top of it. Everything works.
The moment you add a second starter like spring-ai-anthropic-spring-boot-starter, you now have two ChatModel beans in the application context: OpenAiChatModel and AnthropicChatModel. Any injection point that asks for ChatModel by type will fail with:
NoUniqueBeanDefinitionException: No qualifying bean of type 'ChatModel':
expected single matching bean but found 2: openAiChatModel, anthropicChatModel
This is standard Spring behavior — nothing specific to Spring AI. But it means you need explicit configuration.
Solution: The @Qualifier Pattern
The cleanest approach is to create dedicated ChatClient beans that inject the concrete model types directly.
Configuration Class
@Configuration
public class AiConfig {

    @Bean
    public ChatClient openAiChatClient(OpenAiChatModel openAiChatModel) {
        return ChatClient.builder(openAiChatModel)
                .defaultSystem("You are a creative writing assistant.")
                .build();
    }

    @Bean
    public ChatClient anthropicChatClient(AnthropicChatModel anthropicChatModel) {
        return ChatClient.builder(anthropicChatModel)
                .defaultSystem("You are a precise summarization engine.")
                .build();
    }
}
By injecting OpenAiChatModel and AnthropicChatModel directly (concrete types, not the ChatModel interface), there's zero ambiguity. Each ChatClient bean gets its own model and system prompt.
Using Qualified Beans in Services
@Service
public class AiService {

    private final ChatClient openAiClient;
    private final ChatClient anthropicClient;

    public AiService(
            @Qualifier("openAiChatClient") ChatClient openAiClient,
            @Qualifier("anthropicChatClient") ChatClient anthropicClient) {
        this.openAiClient = openAiClient;
        this.anthropicClient = anthropicClient;
    }

    public String generateCreativeContent(String prompt) {
        return openAiClient.prompt().user(prompt).call().content();
    }

    public String summarize(String text) {
        return anthropicClient.prompt().user(text).call().content();
    }
}
The @Qualifier value matches the bean method name. Each method delegates to the right model for its task.
The mutate() Pattern: Runtime Flexibility
Sometimes you need a variation of an existing ChatClient — different system prompt, different temperature — without creating a new bean. The mutate() method returns a new builder pre-filled with the current client's configuration.
ChatClient customClient = openAiClient.mutate()
        .defaultSystem("You are a technical documentation writer.")
        .build();

String result = customClient.prompt()
        .user("Explain the Circuit Breaker pattern")
        .call()
        .content();
Key characteristics:
- The original ChatClient is not modified; mutate() creates a copy
- The returned builder inherits all defaults (model, system prompt, advisors)
- You override only what you need
- The mutated client is typically ephemeral: used and discarded
This is particularly useful when you have a base client configured with retry logic, rate limiting, and observability, and you need task-specific variations at runtime.
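Model options such as temperature can be overridden the same way. A sketch, assuming the Spring AI 1.x API where ChatClient.Builder exposes defaultOptions(...) and OpenAiChatOptions.builder() exposes temperature(...) (older milestones used withTemperature(...)):

```java
// Reuse the qualified openAiClient bean, but with a low temperature for
// deterministic, extraction-style tasks. The original client keeps its
// own options; only the copy is changed.
ChatClient deterministicClient = openAiClient.mutate()
        .defaultOptions(OpenAiChatOptions.builder()
                .temperature(0.1)
                .build())
        .build();
```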
OpenAI-Compatible Endpoints: One Starter, Many Providers
Many AI providers expose OpenAI-compatible REST APIs: Groq, Together AI, Ollama, Perplexity, and others. Instead of adding a separate starter for each, you can reuse the OpenAI starter and override the base URL.
Application Properties
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o
Programmatic Groq Client via OpenAI API
@Bean
public ChatClient groqChatClient(
        @Value("${spring.ai.openai-groq.api-key}") String apiKey,
        @Value("${spring.ai.openai-groq.base-url}") String baseUrl) {

    var openAiApi = new OpenAiApi(baseUrl, apiKey);
    var chatOptions = OpenAiChatOptions.builder()
            .model("llama-3.3-70b-versatile")
            .build();
    var chatModel = new OpenAiChatModel(openAiApi, chatOptions);

    return ChatClient.builder(chatModel)
            .defaultSystem("You are a fast inference assistant.")
            .build();
}
With custom properties in application.yml:
spring:
  ai:
    openai-groq:
      base-url: https://api.groq.com/openai/v1
      api-key: ${GROQ_API_KEY}
Now you have OpenAI, Anthropic, and Groq in one application — three models, three ChatClients, one codebase.
Model Router: Production-Grade Selection
Hardcoding model selection in your service layer doesn't scale. A router pattern decouples the "which model" decision from business logic.
@Service
public class ModelRouter {

    private final Map<String, ChatClient> clients;

    public ModelRouter(
            @Qualifier("openAiChatClient") ChatClient openAi,
            @Qualifier("anthropicChatClient") ChatClient anthropic,
            @Qualifier("groqChatClient") ChatClient groq) {
        this.clients = Map.of(
                "creative", openAi,
                "reasoning", anthropic,
                "fast", groq);
    }

    public ChatClient route(String taskType) {
        return clients.getOrDefault(taskType, clients.get("fast"));
    }
}
Usage becomes trivial:
ChatClient client = modelRouter.route("creative");
String result = client.prompt().user(prompt).call().content();
You can extend this pattern with:
- Cost-aware routing — track token usage per model and route based on budget
- Fallback chains — if the primary model is down, fall back to the next
- A/B testing — randomly route a percentage of traffic to a new model
- Latency-based routing — choose the fastest available model
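As an illustration of the fallback-chain idea, here is a minimal plain-Java sketch that tries a list of model calls in order. The Function<String, String> entries stand in for ChatClient invocations; in a real service each would wrap client.prompt().user(prompt).call().content():

```java
import java.util.List;
import java.util.function.Function;

// Minimal fallback chain: try each model call in order until one succeeds.
// Each Function<String, String> stands in for a ChatClient invocation.
public class FallbackChain {

    private final List<Function<String, String>> models;

    public FallbackChain(List<Function<String, String>> models) {
        this.models = models;
    }

    public String call(String prompt) {
        RuntimeException last = null;
        for (Function<String, String> model : models) {
            try {
                return model.apply(prompt); // first success wins
            } catch (RuntimeException e) {
                last = e;                   // remember it, try the next model
            }
        }
        throw new IllegalStateException("All models failed", last);
    }
}
```

A production version would layer in the other strategies above: consult circuit-breaker state before each attempt, and record which model served the request for cost tracking.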
Production Considerations
Dependency Setup
<!-- OpenAI -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>

<!-- Anthropic -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-anthropic-spring-boot-starter</artifactId>
</dependency>

Note: these are the pre-1.0 milestone artifact IDs. Spring AI 1.0 GA renamed the starters (for example, spring-ai-starter-model-openai and spring-ai-starter-model-anthropic), so check the artifact names against the Spring AI version you are using.
API Key Management
Store keys in environment variables or a secrets manager. Never commit them to source control.
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
    anthropic:
      api-key: ${ANTHROPIC_API_KEY}
Error Handling
Each model can fail independently. Wrap calls in try-catch and consider implementing circuit breakers (Resilience4j) per model to prevent cascading failures.
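To make the circuit-breaker idea concrete, here is a toy per-model breaker in plain Java. It is a sketch of the concept only; the failure counting, threshold, and message are invented for illustration, and a real deployment should use Resilience4j's CircuitBreaker instead:

```java
import java.util.function.Supplier;

// Toy per-model circuit breaker: after `threshold` consecutive failures it
// "opens" and fails fast instead of hitting the model again. Illustration
// only; use Resilience4j in production.
public class ModelBreaker {

    private final int threshold;
    private int consecutiveFailures = 0;

    public ModelBreaker(int threshold) {
        this.threshold = threshold;
    }

    public String call(Supplier<String> modelCall) {
        if (consecutiveFailures >= threshold) {
            throw new IllegalStateException("circuit open: failing fast");
        }
        try {
            String result = modelCall.get();
            consecutiveFailures = 0; // a success closes the breaker again
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;   // count the failure, rethrow to the caller
            throw e;
        }
    }
}
```

One breaker instance per model keeps a slow or failing provider from dragging down calls that could be served by the others.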
Observability
Spring AI integrates with Micrometer. Each ChatModel emits metrics for token usage, latency, and error rates — critical when running multiple models in production.
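For example, the standard Spring Boot Actuator configuration below exposes the metrics endpoint those meters are published to. This is plain Actuator setup, nothing Spring AI-specific, and the exact Spring AI meter names depend on the version you run:

```yaml
management:
  endpoints:
    web:
      exposure:
        include: health, metrics
```

With this in place, the model-level meters appear under /actuator/metrics alongside the standard JVM and HTTP metrics.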
Key Takeaways
- Multiple starters = bean conflicts. Use @Qualifier with concrete model types to resolve them.
- mutate() creates ephemeral ChatClient variations without polluting the application context.
- OpenAI-compatible endpoints let you use Groq, Together AI, and others through the OpenAI starter.
- A model router decouples model selection from business logic and enables cost-aware, fallback, and A/B strategies.
Resources
- Spring AI Reference Documentation
- Spring AI GitHub Repository
- Groq API Documentation
- Anthropic API Documentation
Spring AI Complete Course — Lecture 3 of 12
Previous: Lecture 2 — ChatClient API | Next: Lecture 4 — Advisors API
Tags: #spring-ai #java #spring-boot #ai #multi-model