# Working with Multiple AI Models in Spring AI — @Qualifier, mutate(), and Model Routing

> **Spring AI Complete Course — Lecture 3 of 12**  
> Previous: [Lecture 2 — ChatClient API](#) | Next: [Lecture 4 — Advisors API](#)

In production AI applications, you rarely rely on a single model. A typical split is GPT-4o for creative generation, Claude for reasoning-heavy tasks, and Groq for low-latency inference with open-source models. Spring AI lets you wire all of them into a single Spring Boot application using patterns you already know.

This article covers the **@Qualifier pattern** for multi-model configuration, the **mutate() pattern** for runtime flexibility, **OpenAI-compatible endpoints** for providers like Groq, and a **model router** strategy for production systems.

* * *

## The Problem: Bean Conflicts with Multiple Starters

When you add a single Spring AI starter, say `spring-ai-starter-model-openai` (published as `spring-ai-openai-spring-boot-starter` in early milestone releases), Spring Boot auto-configures an `OpenAiChatModel` bean that implements the `ChatModel` interface. A `ChatClient` is built on top of it. Everything works.

The moment you add a second starter like `spring-ai-starter-model-anthropic` (formerly `spring-ai-anthropic-spring-boot-starter`), the application context holds two `ChatModel` beans: `OpenAiChatModel` and `AnthropicChatModel`. Any injection point that asks for `ChatModel` by type will fail with:

```plaintext
NoUniqueBeanDefinitionException: No qualifying bean of type
'org.springframework.ai.chat.model.ChatModel' available:
expected single matching bean but found 2: openAiChatModel,anthropicChatModel
```

This is standard Spring behavior — nothing specific to Spring AI. But it means you need explicit configuration.

* * *

## Solution: The @Qualifier Pattern

The cleanest approach is to create dedicated `ChatClient` beans that inject the concrete model types directly.

### Configuration Class

```java
@Configuration
public class AiConfig {

    @Bean
    public ChatClient openAiChatClient(OpenAiChatModel openAiChatModel) {
        return ChatClient.builder(openAiChatModel)
                .defaultSystem("You are a creative writing assistant.")
                .build();
    }

    @Bean
    public ChatClient anthropicChatClient(AnthropicChatModel anthropicChatModel) {
        return ChatClient.builder(anthropicChatModel)
                .defaultSystem("You are a precise summarization engine.")
                .build();
    }
}
```

By injecting `OpenAiChatModel` and `AnthropicChatModel` directly (concrete types, not the `ChatModel` interface), there's zero ambiguity. Each `ChatClient` bean gets its own model and system prompt.

### Using Qualified Beans in Services

```java
@Service
public class AiService {

    private final ChatClient openAiClient;
    private final ChatClient anthropicClient;

    public AiService(
            @Qualifier("openAiChatClient") ChatClient openAiClient,
            @Qualifier("anthropicChatClient") ChatClient anthropicClient) {
        this.openAiClient = openAiClient;
        this.anthropicClient = anthropicClient;
    }

    public String generateCreativeContent(String prompt) {
        return openAiClient.prompt().user(prompt).call().content();
    }

    public String summarize(String text) {
        return anthropicClient.prompt().user(text).call().content();
    }
}
```

The `@Qualifier` value is the bean name, which defaults to the `@Bean` method name. Each service method delegates to the model suited to its task.

* * *

## The mutate() Pattern: Runtime Flexibility

Sometimes you need a variation of an existing `ChatClient` (a different system prompt, a different temperature) without creating a new bean. The `mutate()` method returns a new `ChatClient.Builder` pre-filled with the current client's configuration.

```java
ChatClient customClient = openAiClient.mutate()
        .defaultSystem("You are a technical documentation writer.")
        .build();

String result = customClient.prompt()
        .user("Explain the Circuit Breaker pattern")
        .call()
        .content();
```

**Key characteristics:**

*   The original `ChatClient` is **not modified** — `mutate()` creates a copy
    
*   The returned builder inherits all defaults (model, system prompt, advisors)
    
*   You override only what you need
    
*   The mutated client is typically ephemeral — used and discarded
    

This is particularly useful when you have a base client configured with retry logic, rate limiting, and observability, and you need task-specific variations at runtime.
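
The copy-on-mutate behaviour listed above can be illustrated with a toy builder in plain Java. This is a sketch of the pattern only, not Spring AI's implementation: `SketchClient`, its `Builder`, and the string-based `advisors` list are hypothetical stand-ins, chosen so the example compiles without any dependencies.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a configured client; NOT Spring AI's ChatClient. */
final class SketchClient {
    private final String model;
    private final String systemPrompt;
    private final List<String> advisors;

    private SketchClient(Builder b) {
        this.model = b.model;
        this.systemPrompt = b.systemPrompt;
        this.advisors = List.copyOf(b.advisors); // immutable snapshot of the builder state
    }

    static Builder builder(String model) {
        return new Builder(model);
    }

    /** Returns a new builder pre-filled with this client's settings; this client is untouched. */
    Builder mutate() {
        Builder b = new Builder(model);
        b.systemPrompt = systemPrompt;
        b.advisors.addAll(advisors);
        return b;
    }

    String model() { return model; }
    String systemPrompt() { return systemPrompt; }
    List<String> advisors() { return advisors; }

    static final class Builder {
        private final String model;
        private String systemPrompt = "";
        private final List<String> advisors = new ArrayList<>();

        private Builder(String model) { this.model = model; }

        Builder defaultSystem(String text) { this.systemPrompt = text; return this; }
        Builder advisor(String name) { this.advisors.add(name); return this; }
        SketchClient build() { return new SketchClient(this); }
    }
}
```

Calling `base.mutate().defaultSystem("...").build()` on such a client yields a second instance with the same model and advisors but a different system prompt, while `base` keeps its original prompt.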

* * *

## OpenAI-Compatible Endpoints: One Starter, Many Providers

Many AI providers expose OpenAI-compatible REST APIs: **Groq**, **Together AI**, **Ollama**, **Perplexity**, and others. Instead of adding a separate starter for each, you can reuse the OpenAI starter and override the base URL.

### Application Properties

```yaml
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o
```

### Programmatic Groq Client via OpenAI API

```java
@Bean
public ChatClient groqChatClient(
        @Value("${spring.ai.openai-groq.api-key}") String apiKey,
        @Value("${spring.ai.openai-groq.base-url}") String baseUrl) {

    // Reuse the OpenAI API client, pointed at Groq's OpenAI-compatible endpoint
    var openAiApi = OpenAiApi.builder()
            .apiKey(apiKey)
            .baseUrl(baseUrl)
            .build();
    // Model ID as listed by Groq, not a model hosted by OpenAI
    var chatOptions = OpenAiChatOptions.builder()
            .model("openai/gpt-oss-20b")
            .build();
    var chatModel = OpenAiChatModel.builder()
            .openAiApi(openAiApi)
            .defaultOptions(chatOptions)
            .build();

    return ChatClient.builder(chatModel)
            .defaultSystem("You are a fast inference assistant.")
            .build();
}
```

With custom properties in `application.yml`:

```yaml
spring:
  ai:
    openai-groq:
      # Spring AI appends /v1/chat/completions itself, so omit the /v1 suffix
      base-url: https://api.groq.com/openai
      api-key: ${GROQ_API_KEY}
```

Now you have OpenAI, Anthropic, and Groq in one application — three models, three ChatClients, one codebase.

* * *

## Model Router: Production-Grade Selection

Hardcoding model selection in your service layer doesn't scale. A router pattern decouples the "which model" decision from business logic.

```java
@Service
public class ModelRouter {

    private final Map<String, ChatClient> clients;

    public ModelRouter(
            @Qualifier("openAiChatClient") ChatClient openAi,
            @Qualifier("anthropicChatClient") ChatClient anthropic,
            @Qualifier("groqChatClient") ChatClient groq) {
        this.clients = Map.of(
            "creative", openAi,
            "reasoning", anthropic,
            "fast", groq
        );
    }

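    public ChatClient route(String taskType) {
        // Map.of(...) throws on null lookups, so guard before calling getOrDefault
        ChatClient fallback = clients.get("fast");
        return taskType == null ? fallback : clients.getOrDefault(taskType, fallback);
    }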
}
```

Usage becomes trivial:

```java
ChatClient client = modelRouter.route("creative");
String result = client.prompt().user(prompt).call().content();
```

You can extend this pattern with:

*   **Cost-aware routing** — track token usage per model and route based on budget
    
*   **Fallback chains** — if the primary model is down, fall back to the next
    
*   **A/B testing** — randomly route a percentage of traffic to a new model
    
*   **Latency-based routing** — choose the fastest available model
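
As one illustration of these extensions, a fallback chain can be sketched in plain Java. This is a minimal sketch under stated assumptions: `FallbackChain` is a hypothetical name, and each `Function<String, String>` stands in for a model call so the example has no Spring dependency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Tries each model call in order and returns the first successful answer. */
final class FallbackChain {
    // Each Function is a hypothetical stand-in for "send prompt, get content" on one model
    private final List<Function<String, String>> models;

    FallbackChain(List<Function<String, String>> models) {
        if (models.isEmpty()) {
            throw new IllegalArgumentException("at least one model is required");
        }
        this.models = List.copyOf(models);
    }

    String call(String prompt) {
        List<RuntimeException> failures = new ArrayList<>();
        for (Function<String, String> model : models) {
            try {
                return model.apply(prompt);
            } catch (RuntimeException e) {
                failures.add(e); // remember the failure and try the next model
            }
        }
        // Every model failed: surface all causes as suppressed exceptions
        IllegalStateException all =
                new IllegalStateException("all " + models.size() + " models failed");
        failures.forEach(all::addSuppressed);
        throw all;
    }
}
```

In a Spring AI application, each function would wrap a client call such as `p -> openAiClient.prompt().user(p).call().content()`, ordered from preferred to last-resort model.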
    

* * *

## Production Considerations

### Dependency Setup

```xml
<!-- OpenAI (artifact was spring-ai-openai-spring-boot-starter in early milestones) -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>

<!-- Anthropic (artifact was spring-ai-anthropic-spring-boot-starter in early milestones) -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-anthropic</artifactId>
</dependency>
```

### API Key Management

Store keys in environment variables or a secrets manager. Never commit them to source control.

```yaml
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
    anthropic:
      api-key: ${ANTHROPIC_API_KEY}
```

### Error Handling

Each model can fail independently. Wrap calls in try-catch and consider implementing circuit breakers (Resilience4j) per model to prevent cascading failures.
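
To make the per-model circuit breaker idea concrete, here is a hand-rolled sketch in plain Java. It is not Resilience4j's API, and a production system would more likely use that library. `SimpleCircuitBreaker`, its threshold, and its cool-down parameters are hypothetical names chosen for illustration.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hand-rolled circuit breaker sketch (NOT Resilience4j's API).
 * Opens after N consecutive failures and allows a trial call after a cool-down.
 */
final class SimpleCircuitBreaker {
    private final int failureThreshold;
    private final long coolDownMillis;
    private final LongSupplier clock;      // injected so the cool-down can be tested
    private int consecutiveFailures = 0;
    private long openedAt = -1;            // -1 means the circuit is closed

    SimpleCircuitBreaker(int failureThreshold, long coolDownMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.coolDownMillis = coolDownMillis;
        this.clock = clock;
    }

    /** Runs modelCall unless the circuit is open; any failure is answered by the fallback. */
    <T> T call(Supplier<T> modelCall, Supplier<T> fallback) {
        if (isOpen()) {
            return fallback.get();         // skip the failing model entirely
        }
        try {
            T result = modelCall.get();    // runs outside the lock so calls are not serialized
            recordSuccess();
            return result;
        } catch (RuntimeException e) {
            recordFailure();
            return fallback.get();
        }
    }

    synchronized boolean isOpen() {
        return openedAt >= 0 && clock.getAsLong() - openedAt < coolDownMillis;
    }

    private synchronized void recordSuccess() {
        consecutiveFailures = 0;
        openedAt = -1;                     // a success closes the circuit
    }

    private synchronized void recordFailure() {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            openedAt = clock.getAsLong();  // (re)start the cool-down
        }
    }
}
```

The intent is one breaker instance per model, constructed with `System::currentTimeMillis` as the clock, with the first supplier wrapping a call such as `() -> openAiClient.prompt().user(prompt).call().content()` and the second supplier delegating to another model or returning a canned response.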

### Observability

Spring AI integrates with Micrometer. Each `ChatModel` emits metrics for token usage, latency, and error rates — critical when running multiple models in production.

* * *

## Key Takeaways

1.  **Multiple starters = bean conflicts.** Use `@Qualifier` with concrete model types to resolve them.
    
2.  `mutate()` **creates ephemeral ChatClient variations** without polluting the bean context.
    
3.  **OpenAI-compatible endpoints** let you use Groq, Together AI, and others through the OpenAI starter.
    
4.  **A model router** decouples model selection from business logic and enables cost-aware, fallback, and A/B strategies.
    

* * *

## Resources

*   [Spring AI Reference Documentation](https://docs.spring.io/spring-ai/reference/)
    
*   [Spring AI GitHub Repository](https://github.com/spring-projects/spring-ai)
    
*   [Groq API Documentation](https://console.groq.com/docs)
    
*   [Anthropic API Documentation](https://docs.anthropic.com/)
    

* * *


*Tags: #spring-ai #java #spring-boot #ai #multi-model*
