Build models into containers¶
In this guide we will preload a LLM into a custom built Ollama serving image. You can follow the same steps for other models and other serving engines.
Define some values
export MODEL_URL=ollama://qwen2:0.5b
# Customize with your own image repo.
export IMAGE=us-central1-docker.pkg.dev/substratus-dev/default/ollama-builtin-qwen2-05b:latest
Build and push image. Note: building (downloading base image & model) and pushing (uploading image & model) can take a while depending on the size of the model.
git clone https://github.com/substratusai/kubeai
cd ./kubeai/examples/ollama-builtin
docker build --build-arg MODEL_URL=$MODEL_URL -t $IMAGE .
docker push $IMAGE
Create a model manifest & apply into a cluster with KubeAI installed. NOTE: The only difference between an built-in model image and otherwise is the addition of the image:
field.
kubectl apply -f - << EOF
apiVersion: kubeai.org/v1
kind: Model
metadata:
name: builtin-model-example
spec:
features: ["TextGeneration"]
owner: alibaba
image: $IMAGE # <-- The image with model built-in
url: "$MODEL_URL"
engine: OLlama
resourceProfile: cpu:1
EOF