Backend Servers
KubeAI serves ML models by launching Pods on Kubernetes. The configuration and lifecycle of these Pods are managed by the KubeAI controller. Every model server Pod loads exactly one model on startup.
In a Model manifest you can define which server to use for inference (VLLM, OLlama). Any model-specific settings can be passed to the server process via the args and env fields.
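For example, a Model manifest selecting the VLLM engine might look like the sketch below. The model name, URL, resource profile, flags, and environment variable are illustrative assumptions; consult the Model CRD for the authoritative schema.

```yaml
# A hypothetical Model manifest — the specific values below are
# illustrative, not prescriptive.
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: llama-3.1-8b-instruct
spec:
  features: [TextGeneration]
  url: hf://meta-llama/Llama-3.1-8B-Instruct
  engine: VLLM                      # or OLlama
  resourceProfile: nvidia-gpu-l4:1  # assumed profile name; depends on your install
  args:                             # extra flags passed to the server process
    - --max-model-len=8192
  env:                              # environment variables set on the server process
    VLLM_LOGGING_LEVEL: INFO
```

Applying a manifest like this (e.g. kubectl apply -f model.yaml) prompts the KubeAI controller to create a server Pod that loads this one model on startup.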
Next
Read about how to install models.