Blossoming Intelligence: How to Run Spring AI Locally with Ollama

#spring #ai #ollama #llama3

Nobody can dispute that AI is here to stay. Among its many benefits, developers are using it to boost their productivity.
It is also expected to become available for a fee, as SaaS or another kind of service, once it has gained the necessary trust from enterprises.
Still, we can run pre-trained models locally and incorporate them into our existing apps.

In this short article, we'll look at how easy it is to create a chatbot backend powered by Spring and Ollama using the Llama 3 model.

Tech Stack

This project is built using:

Java 21.
Spring Boot 3.2.5 with WebFlux.
Spring AI 0.8.x (the release line that provides the ChatClient and OllamaChatClient classes used below).
Ollama 0.1.36.

Ollama Setup

To install Ollama locally, head to https://ollama.com/download and install it using the proper installer for your OS.

You can check that it is installed by running the following command:

ollama --version

You can pull a model directly from Ollama Models and run it using the ollama CLI. In my case, I used the llama3 model:

ollama pull llama3 # Should take a while.
ollama run llama3

Let's test it out with a simple prompt:

To exit, use the command:

/bye

Talking Spring

The Spring application will have the following properties:

spring.ai.ollama.base-url=http://localhost:11434
spring.ai.ollama.chat.model=llama3
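
Before wiring Spring in, it can help to confirm that the base URL and model name above actually answer. The following is a minimal sketch, not part of the project: it assumes Ollama is running locally, uses only the JDK's HTTP client, and posts to Ollama's /api/generate endpoint. The class name OllamaPing, the helper methods, and the sample prompt are made up for illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OllamaPing {

    // Assumed values: they mirror the two properties shown above.
    static final String BASE_URL = "http://localhost:11434";
    static final String MODEL = "llama3";

    // Builds the JSON body for /api/generate.
    // "stream": false asks for one JSON document instead of a stream of chunks.
    static String requestBody(String model, String prompt) {
        return "{\"model\":\"" + escape(model)
                + "\",\"prompt\":\"" + escape(prompt)
                + "\",\"stream\":false}";
    }

    // Very small JSON string escaper (backslash, quote, newline), enough for this sketch only.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(BASE_URL + "/api/generate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(requestBody(MODEL, "Say hello in five words.")))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // A 200 status with a JSON body suggests the server is up and the model is available.
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

If this request fails, the Spring application will most likely fail in the same way, so it is a cheap first check.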

Next, our chat package will have a chat config bean that builds the chat client:

package io.daasrattale.generalknowledge.chat;

import org.springframework.ai.chat.ChatClient;
import org.springframework.ai.ollama.OllamaChatClient;
import org.springframework.ai.ollama.api.OllamaApi;
import org.springframework.ai.ollama.api.OllamaOptions;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

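@Configuration
public class ChatConfig {

    // Builds the Ollama-backed chat client by hand.
    // new OllamaApi() targets the default base URL, http://localhost:11434.
    @Bean
    public ChatClient chatClient() {
        return new OllamaChatClient(new OllamaApi())
                .withDefaultOptions(OllamaOptions.create()
                        // Set explicitly: a client created with "new" does not read spring.ai.ollama.chat.model.
                        .withModel("llama3")
                        .withTemperature(0.9f));
    }
}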

FYI, model temperature is a parameter that controls how random a language model's output is.
Here the temperature is set to 0.9, which makes the model more random and more willing to take risks in its answers.
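
To make the effect of temperature concrete, here is a small self-contained sketch of the usual textbook picture: the model's raw scores (logits) are divided by the temperature before being turned into probabilities with a softmax. This is an illustration only, not Ollama's actual sampling code; the class name TemperatureDemo and the three logit values are made up.

```java
import java.util.Arrays;

public class TemperatureDemo {

    // Softmax over logits divided by the temperature.
    // Subtracting the max first keeps Math.exp from overflowing and does not change the result.
    static double[] softmax(double[] logits, double temperature) {
        double max = Arrays.stream(logits).max().orElseThrow();
        double[] probs = new double[logits.length];
        double sum = 0.0;
        for (int i = 0; i < logits.length; i++) {
            probs[i] = Math.exp((logits[i] - max) / temperature);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) {
            probs[i] /= sum;
        }
        return probs;
    }

    public static void main(String[] args) {
        // Hypothetical scores for three candidate next tokens.
        double[] logits = {2.0, 1.0, 0.1};

        // Low temperature: probability mass concentrates on the top-scoring token.
        System.out.println(Arrays.toString(softmax(logits, 0.2)));
        // Temperature 0.9: the distribution is flatter, so less likely tokens get picked more often.
        System.out.println(Arrays.toString(softmax(logits, 0.9)));
    }
}
```

Running it shows the top token taking a much larger share at 0.2 than at 0.9, which is why higher values read as more "creative" and lower values as more predictable.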

The last step is to create a simple chat REST controller:

package io.daasrattale.generalknowledge.chat;

import org.springframework.ai.ollama.OllamaChatClient;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
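import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

@RestController
@RequestMapping("/v1/chat")
public class ChatController {

    private final OllamaChatClient chatClient;

    public ChatController(OllamaChatClient chatClient) {
        this.chatClient = chatClient;
    }

    @GetMapping
    public Mono<ResponseEntity<String>> generate(@RequestParam(defaultValue = "Tell me to add a proper prompt in a funny way") String prompt) {
        // chatClient.call(...) blocks until the model has answered, so defer it
        // and run it on a worker thread instead of the WebFlux event loop.
        return Mono.fromCallable(() -> chatClient.call(prompt))
                .subscribeOn(Schedulers.boundedElastic())
                .map(ResponseEntity::ok);
    }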


}

Let's try calling GET /v1/chat with an empty prompt:
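
One way to make that call from Java is sketched below, using only the JDK's HTTP client. It assumes the application listens on Spring Boot's default port 8080; the class name ChatEndpointCall and the chatUri helper are hypothetical. Passing no prompt leaves the query parameter out, so the controller falls back to its default value.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class ChatEndpointCall {

    // Assumption: the Spring Boot app runs locally on the default port.
    static final String APP_URL = "http://localhost:8080";

    // Builds the endpoint URI; a null or blank prompt omits the query parameter entirely.
    static URI chatUri(String prompt) {
        if (prompt == null || prompt.isBlank()) {
            return URI.create(APP_URL + "/v1/chat");
        }
        // URLEncoder encodes spaces as '+', so switch them to %20 for a plain URL query.
        String encoded = URLEncoder.encode(prompt, StandardCharsets.UTF_8).replace("+", "%20");
        return URI.create(APP_URL + "/v1/chat?prompt=" + encoded);
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Empty prompt: the controller's defaultValue is used.
        HttpRequest request = HttpRequest.newBuilder(chatUri("")).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

Swapping chatUri("") for something like chatUri("What is Spring?") sends a real question instead. Since the model runs locally, the first answer may take a few seconds.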