Working with Multiple AI Models in Spring AI — @Qualifier, mutate(), and Model Routing


Spring AI Complete Course — Lecture 3 of 12
Previous: Lecture 2 — ChatClient API | Next: Lecture 4 — Advisors API

In production AI applications, you rarely use just one model. GPT-4o handles creative generation, Claude excels at reasoning, and Groq delivers sub-second inference with open-source models. Spring AI makes it straightforward to wire all of them into a single Spring Boot application using patterns you already know.

This article covers the @Qualifier pattern for multi-model configuration, the mutate() pattern for runtime flexibility, OpenAI-compatible endpoints for providers like Groq, and a model router strategy for production systems.


The Problem: Bean Conflicts with Multiple Starters

When you add a single Spring AI starter — say spring-ai-openai-spring-boot-starter — Spring Boot auto-configures an OpenAiChatModel bean that implements the ChatModel interface. A ChatClient is built on top of it. Everything works.

The moment you add a second starter like spring-ai-anthropic-spring-boot-starter, you now have two ChatModel beans in the application context: OpenAiChatModel and AnthropicChatModel. Any injection point that asks for ChatModel by type will fail with:

NoUniqueBeanDefinitionException: No qualifying bean of type 'ChatModel':
expected single matching bean but found 2: openAiChatModel, anthropicChatModel

This is standard Spring behavior — nothing specific to Spring AI. But it means you need explicit configuration.
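The ambiguity is easy to see outside Spring. Here is a toy illustration (plain Java, not Spring's actual container) of why by-type lookup fails with two implementations while by-name lookup, the idea behind @Qualifier, does not. The ChatModel interface and bean names below are simplified stand-ins:

```java
import java.util.List;
import java.util.Map;

// Toy illustration of by-type vs by-name lookup. Two "beans" implement the
// same interface, so asking for the interface by type is ambiguous, while
// asking by name (the @Qualifier idea) is not. Not Spring's real container.
public class BeanLookupDemo {
    interface ChatModel { String call(String prompt); }

    static final Map<String, ChatModel> context = Map.of(
        "openAiChatModel", prompt -> "openai:" + prompt,
        "anthropicChatModel", prompt -> "anthropic:" + prompt
    );

    // Mimics injection by type: fails when more than one candidate matches.
    static ChatModel getByType() {
        List<ChatModel> matches = List.copyOf(context.values());
        if (matches.size() > 1) {
            throw new IllegalStateException(
                "expected single matching bean but found " + matches.size());
        }
        return matches.get(0);
    }

    public static void main(String[] args) {
        try {
            getByType(); // ambiguous, like injecting ChatModel directly
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        // Lookup by name is unambiguous, like @Qualifier("openAiChatModel")
        System.out.println(context.get("openAiChatModel").call("hi"));
    }
}
```

Spring resolves the conflict the same way: once you name the bean you want, the second candidate stops mattering.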


Solution: The @Qualifier Pattern

The cleanest approach is to create dedicated ChatClient beans that inject the concrete model types directly.

Configuration Class

@Configuration
public class AiConfig {

    @Bean
    public ChatClient openAiChatClient(OpenAiChatModel openAiChatModel) {
        return ChatClient.builder(openAiChatModel)
                .defaultSystem("You are a creative writing assistant.")
                .build();
    }

    @Bean
    public ChatClient anthropicChatClient(AnthropicChatModel anthropicChatModel) {
        return ChatClient.builder(anthropicChatModel)
                .defaultSystem("You are a precise summarization engine.")
                .build();
    }
}

By injecting OpenAiChatModel and AnthropicChatModel directly (concrete types, not the ChatModel interface), there's zero ambiguity. Each ChatClient bean gets its own model and system prompt.

Using Qualified Beans in Services

@Service
public class AiService {

    private final ChatClient openAiClient;
    private final ChatClient anthropicClient;

    public AiService(
            @Qualifier("openAiChatClient") ChatClient openAiClient,
            @Qualifier("anthropicChatClient") ChatClient anthropicClient) {
        this.openAiClient = openAiClient;
        this.anthropicClient = anthropicClient;
    }

    public String generateCreativeContent(String prompt) {
        return openAiClient.prompt().user(prompt).call().content();
    }

    public String summarize(String text) {
        return anthropicClient.prompt().user(text).call().content();
    }
}

The @Qualifier value matches the bean method name. Each method delegates to the right model for its task.


The mutate() Pattern: Runtime Flexibility

Sometimes you need a variation of an existing ChatClient — different system prompt, different temperature — without creating a new bean. The mutate() method returns a new builder pre-filled with the current client's configuration.

ChatClient customClient = openAiClient.mutate()
        .defaultSystem("You are a technical documentation writer.")
        .build();

String result = customClient.prompt()
        .user("Explain the Circuit Breaker pattern")
        .call()
        .content();

Key characteristics:

  • The original ChatClient is not modified; mutate() creates a copy
  • The returned builder inherits all defaults (model, system prompt, advisors)
  • You override only what you need
  • The mutated client is typically ephemeral — used and discarded

This is particularly useful when you have a base client configured with retry logic, rate limiting, and observability, and you need task-specific variations at runtime.
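The copy-on-mutate idea can be sketched outside Spring entirely. The ClientConfig class below is a hypothetical stand-in (not Spring AI's internals): mutate() returns a builder pre-filled with the current values, build() produces a new object, and the original is left untouched:

```java
// Toy sketch of the copy-on-mutate builder idea. ClientConfig and its
// fields are hypothetical, not Spring AI's actual internals.
public class ClientConfig {
    private final String systemPrompt;
    private final double temperature;

    private ClientConfig(String systemPrompt, double temperature) {
        this.systemPrompt = systemPrompt;
        this.temperature = temperature;
    }

    public static Builder builder() { return new Builder(); }

    // Pre-fill a fresh builder with this instance's state
    public Builder mutate() {
        return new Builder().systemPrompt(systemPrompt).temperature(temperature);
    }

    public String systemPrompt() { return systemPrompt; }
    public double temperature() { return temperature; }

    public static class Builder {
        private String systemPrompt = "";
        private double temperature = 0.7;

        public Builder systemPrompt(String s) { this.systemPrompt = s; return this; }
        public Builder temperature(double t) { this.temperature = t; return this; }
        public ClientConfig build() { return new ClientConfig(systemPrompt, temperature); }
    }

    public static void main(String[] args) {
        ClientConfig base = builder().systemPrompt("creative writer").temperature(0.9).build();
        ClientConfig variant = base.mutate().systemPrompt("technical writer").build();
        System.out.println(base.systemPrompt());    // unchanged: creative writer
        System.out.println(variant.systemPrompt()); // overridden: technical writer
        System.out.println(variant.temperature());  // inherited: 0.9
    }
}
```

Overridden settings replace the inherited ones; everything else carries over, which is exactly the behavior you rely on when mutating a fully configured base client.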


OpenAI-Compatible Endpoints: One Starter, Many Providers

Many AI providers expose OpenAI-compatible REST APIs: Groq, Together AI, Ollama, Perplexity, and others. Instead of adding a separate starter for each, you can reuse the OpenAI starter and override the base URL.

Application Properties

spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      chat:
        options:
          model: gpt-4o

Programmatic Groq Client via OpenAI API

@Bean
public ChatClient groqChatClient(
        @Value("${spring.ai.openai-groq.api-key}") String apiKey,
        @Value("${spring.ai.openai-groq.base-url}") String baseUrl) {

    var openAiApi = new OpenAiApi(baseUrl, apiKey);
    var chatOptions = OpenAiChatOptions.builder()
            .model("llama-3.3-70b-versatile")
            .build();
    var chatModel = new OpenAiChatModel(openAiApi, chatOptions);

    return ChatClient.builder(chatModel)
            .defaultSystem("You are a fast inference assistant.")
            .build();
}

With custom properties in application.yml:

spring:
  ai:
    openai-groq:
      base-url: https://api.groq.com/openai/v1
      api-key: ${GROQ_API_KEY}

Now you have OpenAI, Anthropic, and Groq in one application — three models, three ChatClients, one codebase.


Model Router: Production-Grade Selection

Hardcoding model selection in your service layer doesn't scale. A router pattern decouples the "which model" decision from business logic.

@Service
public class ModelRouter {

    private final Map<String, ChatClient> clients;

    public ModelRouter(
            @Qualifier("openAiChatClient") ChatClient openAi,
            @Qualifier("anthropicChatClient") ChatClient anthropic,
            @Qualifier("groqChatClient") ChatClient groq) {
        this.clients = Map.of(
            "creative", openAi,
            "reasoning", anthropic,
            "fast", groq
        );
    }

    public ChatClient route(String taskType) {
        return clients.getOrDefault(taskType, clients.get("fast"));
    }
}

Usage becomes trivial:

ChatClient client = modelRouter.route("creative");
String result = client.prompt().user(prompt).call().content();

You can extend this pattern with:

  • Cost-aware routing — track token usage per model and route based on budget
  • Fallback chains — if the primary model is down, fall back to the next
  • A/B testing — randomly route a percentage of traffic to a new model
  • Latency-based routing — choose the fastest available model
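A fallback chain, for instance, can be sketched with plain Java. ModelCall below is a stand-in for a ChatClient invocation (the names are illustrative, not Spring AI API); the router tries each model in order and returns the first successful response:

```java
import java.util.List;
import java.util.function.Function;

// Sketch of a fallback chain. ModelCall is a stand-in for a ChatClient
// call; names are illustrative, not Spring AI API.
public class FallbackChain {
    @FunctionalInterface
    interface ModelCall extends Function<String, String> {}

    private final List<ModelCall> chain;

    public FallbackChain(List<ModelCall> chain) { this.chain = chain; }

    // Try each model in order; return the first successful response.
    public String call(String prompt) {
        RuntimeException last = null;
        for (ModelCall model : chain) {
            try {
                return model.apply(prompt);
            } catch (RuntimeException e) {
                last = e; // remember the failure, fall through to the next model
            }
        }
        throw new IllegalStateException("all models failed", last);
    }

    public static void main(String[] args) {
        ModelCall primary = p -> { throw new RuntimeException("primary down"); };
        ModelCall backup = p -> "backup:" + p;
        FallbackChain router = new FallbackChain(List.of(primary, backup));
        System.out.println(router.call("hello")); // backup:hello
    }
}
```

In a real application each ModelCall would wrap one of the qualified ChatClient beans from the router above.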

Production Considerations

Dependency Setup

<!-- OpenAI -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>

<!-- Anthropic -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-anthropic-spring-boot-starter</artifactId>
</dependency>

API Key Management

Store keys in environment variables or a secrets manager. Never commit them to source control.

spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
    anthropic:
      api-key: ${ANTHROPIC_API_KEY}

Error Handling

Each model can fail independently. Wrap calls in try-catch and consider implementing circuit breakers (Resilience4j) per model to prevent cascading failures.
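A minimal version of that wrapping, with the model call abstracted as a Supplier (a real setup would put a Resilience4j circuit breaker per model behind this), might look like:

```java
import java.util.Optional;
import java.util.function.Supplier;

// Minimal sketch of wrapping a per-model call so one provider's failure
// doesn't fail the whole request. The Supplier stands in for a ChatClient
// call; a production setup would add a circuit breaker per model.
public class SafeCall {
    static Optional<String> callSafely(Supplier<String> modelCall) {
        try {
            return Optional.of(modelCall.get());
        } catch (RuntimeException e) {
            // log the failure and degrade gracefully instead of propagating
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Optional<String> ok = callSafely(() -> "fine");
        Optional<String> failed = callSafely(() -> { throw new RuntimeException("timeout"); });
        System.out.println(ok.orElse("<fallback>"));     // fine
        System.out.println(failed.orElse("<fallback>")); // <fallback>
    }
}
```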

Observability

Spring AI integrates with Micrometer. Each ChatModel emits metrics for token usage, latency, and error rates — critical when running multiple models in production.


Key Takeaways

  1. Multiple starters = bean conflicts. Use @Qualifier with concrete model types to resolve them.
  2. mutate() creates ephemeral ChatClient variations without polluting the bean context.
  3. OpenAI-compatible endpoints let you use Groq, Together AI, and others through the OpenAI starter.
  4. A model router decouples model selection from business logic and enables cost-aware, fallback, and A/B strategies.

Spring AI Complete Course — Lecture 3 of 12
Previous: Lecture 2 — ChatClient API | Next: Lecture 4 — Advisors API


Tags: #spring-ai #java #spring-boot #ai #multi-model
