Configure Embedding Models

KubeAI supports the following engines for text embedding models:

  • Infinity
  • vLLM
  • Ollama

Infinity supports any Hugging Face model tagged as text-embedding. For reference, see the text-embedding, reranking, or CLIP models listed on Hugging Face.

Install the BAAI/bge-small-en-v1.5 model using Infinity

Create a file named kubeai-models.yaml with the following content:

catalog:
  bge-embed-text-cpu:
    enabled: true
    features: ["TextEmbedding"]
    owner: baai
    url: "hf://BAAI/bge-small-en-v1.5"
    engine: Infinity
    resourceProfile: cpu:1
    minReplicas: 1

Install the kubeai/models Helm chart with this values file:

helm install kubeai-models kubeai/models -f ./kubeai-models.yaml
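Before calling the API, wait for the model pod to become ready and forward the KubeAI service to your machine. The service name `kubeai` and service port `80` below are assumptions based on a default KubeAI installation; adjust them to match your deployment:

```
# Watch pods until the bge-embed-text-cpu pod reports Ready.
kubectl get pods -w

# Forward the KubeAI service (assumed name "kubeai", port 80) to localhost:8000.
kubectl port-forward svc/kubeai 8000:80
```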

Once the pod is ready, you can use the OpenAI Python SDK to interact with the model:

from openai import OpenAI

# Assumes a port-forward of the kubeai service to localhost:8000.
client = OpenAI(api_key="ignored", base_url="http://localhost:8000/openai/v1")
response = client.embeddings.create(
    input="Your text goes here.",
    model="bge-embed-text-cpu"
)
# Each input produces one embedding vector in response.data.
print(response.data[0].embedding[:5])  # First few dimensions of the vector.
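Embedding vectors are typically compared with cosine similarity to measure how semantically close two texts are. A minimal sketch in plain Python follows; the short vectors here are hypothetical stand-ins for real API output (bge-small-en-v1.5 actually returns 384-dimensional vectors):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embedding vectors standing in for response.data[i].embedding.
v1 = [0.1, 0.2, 0.3]
v2 = [0.1, 0.2, 0.25]
print(cosine_similarity(v1, v2))  # Close to 1.0 for similar texts.
```

In practice you would embed two inputs in one `client.embeddings.create` call and compare the resulting `response.data[0].embedding` and `response.data[1].embedding`.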