Kubernetes API

Packages

kubeai.org/v1

Package v1 contains API schema definitions for the kubeai v1 API group.

Resource Types

Adapter

Appears in: - ModelSpec

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `name` _string_ | Name must be a lowercase string with no spaces. | | MaxLength: 63; Pattern: `^[a-z0-9-]+$`; Required: {} |
| `url` _string_ | | | |
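
The validation rules for `name` can be checked client-side before a Model is submitted. The following is a minimal sketch, assuming only the MaxLength and Pattern constraints listed above; the function name `is_valid_adapter_name` is hypothetical and not part of KubeAI.

```python
import re

# Constraints copied from the validation column of Adapter.name.
ADAPTER_NAME_PATTERN = re.compile(r"[a-z0-9-]+")
ADAPTER_NAME_MAX_LENGTH = 63


def is_valid_adapter_name(name: str) -> bool:
    """Return True if name satisfies the documented MaxLength and Pattern rules."""
    if not name or len(name) > ADAPTER_NAME_MAX_LENGTH:
        return False
    # fullmatch anchors both ends of the string, mirroring ^...$ without
    # Python's leniency for a trailing newline before "$".
    return ADAPTER_NAME_PATTERN.fullmatch(name) is not None
```

With this check, a name such as `my-adapter-1` passes, while `My Adapter` (uppercase and a space) and any name longer than 63 characters are rejected.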

LoadBalancing

Appears in: - ModelSpec

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `strategy` _LoadBalancingStrategy_ | | LeastLoad | Enum: [LeastLoad PrefixHash]; Optional: {} |
| `prefixHash` _PrefixHash_ | | { } | Optional: {} |

LoadBalancingStrategy

Underlying type: string

Validation: - Enum: [LeastLoad PrefixHash]

Appears in: - LoadBalancing

| Field | Description |
| --- | --- |
| `LeastLoad` | |
| `PrefixHash` | |

Model

Model resources define the ML models that will be served by KubeAI.

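| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `kubeai.org/v1` | | |
| `kind` _string_ | `Model` | | |
| `metadata` _ObjectMeta_ | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `spec` _ModelSpec_ | | | |
| `status` _ModelStatus_ | | | |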
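
To show how these top-level fields fit together, here is a minimal sketch that builds a Model manifest as a Python dictionary and serializes it to JSON, a format `kubectl apply -f` accepts alongside YAML. The resource name and the `url` value are hypothetical placeholders, and the resource profile is the example from the ModelSpec description; whether it exists depends on the system config.

```python
import json

# Hypothetical values for illustration: "example-model" and the url are
# placeholders, not names taken from a real cluster or model registry.
model = {
    "apiVersion": "kubeai.org/v1",
    "kind": "Model",
    "metadata": {"name": "example-model"},
    "spec": {
        "url": "ollama://example-model",
        "engine": "OLlama",
        "features": ["TextGeneration"],
        "resourceProfile": "nvidia-gpu-l4:2",
        "minReplicas": 0,
        "maxReplicas": 3,
    },
    # status is left out: it describes the observed state of the Model,
    # not the desired state.
}

manifest_json = json.dumps(model, indent=2)
print(manifest_json)
```

Only `url` and `engine` are marked as required in ModelSpec; the other spec fields here are included to illustrate their shape.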

ModelFeature

Underlying type: string

Validation: - Enum: [TextGeneration TextEmbedding SpeechToText]

Appears in: - ModelSpec

ModelSpec

ModelSpec defines the desired state of Model.

Appears in: - Model

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `url` _string_ | URL of the model to be served. Currently the following formats are supported. For VLLM, FasterWhisper, Infinity engines: `"hf://<repo>/<model>"`, `"pvc://<pvcName>"`, `"pvc://<pvcName>/<pvcSubpath>"`, `"gs://<bucket>/<path>"` (only with cacheProfile), `"oss://<bucket>/<path>"` (only with cacheProfile), `"s3://<bucket>/<path>"` (only with cacheProfile). For OLlama engine: `"ollama://<model>"`. | | Required: {} |
| `adapters` _Adapter array_ | | | |
| `features` _ModelFeature array_ | Features that the model supports. Dictates the APIs that are available for the model. | | Enum: [TextGeneration TextEmbedding SpeechToText] |
| `engine` _string_ | Engine to be used for the server process. | | Enum: [OLlama VLLM FasterWhisper Infinity]; Required: {} |
| `resourceProfile` _string_ | ResourceProfile required to serve the model. Use the format `"<resource-profile-name>:<count>"`. Example: `"nvidia-gpu-l4:2"` - 2x NVIDIA L4 GPUs. Must be a valid ResourceProfile defined in the system config. | | |
| `cacheProfile` _string_ | CacheProfile to be used for caching model artifacts. Must be a valid CacheProfile defined in the system config. | | |
| `image` _string_ | Image to be used for the server process. Will be set from ResourceProfile + Engine if not specified. | | |
| `args` _string array_ | Args to be added to the server process. | | |
| `env` _object (keys:string, values:string)_ | Env variables to be added to the server process. | | |
| `replicas` _integer_ | Replicas is the number of Pod replicas that should be actively serving the model. KubeAI will manage this field unless AutoscalingDisabled is set to true. | | |
| `minReplicas` _integer_ | MinReplicas is the minimum number of Pod replicas that the model can scale down to. Note: 0 is a valid value. | | Minimum: 0; Optional: {} |
| `maxReplicas` _integer_ | MaxReplicas is the maximum number of Pod replicas that the model can scale up to. Empty value means no limit. | | Minimum: 1 |
| `autoscalingDisabled` _boolean_ | AutoscalingDisabled will stop the controller from managing the replicas for the Model. When disabled, metrics will not be collected on server Pods. | | |
| `targetRequests` _integer_ | TargetRequests is the average number of active requests that the autoscaler will try to maintain on model server Pods. | 100 | Minimum: 1 |
| `scaleDownDelaySeconds` _integer_ | ScaleDownDelay is the minimum time, in seconds, before a deployment is scaled down after the autoscaling algorithm determines that it should be scaled down. | 30 | |
| `owner` _string_ | Owner of the model. Used solely to populate the owner field in the OpenAI /v1/models endpoint. DEPRECATED. | | Optional: {} |
| `loadBalancing` _LoadBalancing_ | LoadBalancing configuration for the model. If not specified, a default is used based on the engine and request. | { } | |
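
The constraints in the ModelSpec table can be mirrored in a small pre-submission check. The sketch below is illustrative only: the function names are hypothetical, it covers just the required fields, enums, and minimums listed above, and it assumes `resourceProfile` is a profile name and a count separated by a colon, as the `nvidia-gpu-l4:2` example suggests. The API server's own validation remains authoritative.

```python
ENGINES = {"OLlama", "VLLM", "FasterWhisper", "Infinity"}
FEATURES = {"TextGeneration", "TextEmbedding", "SpeechToText"}


def parse_resource_profile(value: str) -> tuple[str, int]:
    """Split "name:count" (e.g. "nvidia-gpu-l4:2") into ("nvidia-gpu-l4", 2)."""
    name, sep, count = value.rpartition(":")
    if not sep or not name or not count.isdecimal():
        raise ValueError(f"expected 'name:count', got {value!r}")
    return name, int(count)


def validate_model_spec(spec: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    # url and engine are the two fields marked "Required: {}".
    for field in ("url", "engine"):
        if not spec.get(field):
            problems.append(f"{field} is required")
    engine = spec.get("engine")
    if engine and engine not in ENGINES:
        problems.append(f"engine must be one of {sorted(ENGINES)}")
    for feature in spec.get("features", []):
        if feature not in FEATURES:
            problems.append(f"unknown feature {feature!r}")
    if "resourceProfile" in spec:
        try:
            parse_resource_profile(spec["resourceProfile"])
        except ValueError as err:
            problems.append(f"resourceProfile: {err}")
    # Minimums from the validation column; absent fields are not checked.
    if spec.get("minReplicas", 0) < 0:
        problems.append("minReplicas must be >= 0")
    if spec.get("maxReplicas", 1) < 1:
        problems.append("maxReplicas must be >= 1")
    if spec.get("targetRequests", 100) < 1:
        problems.append("targetRequests must be >= 1")
    return problems
```

This check cannot confirm that a ResourceProfile or CacheProfile actually exists, since those are defined in the system config rather than in the Model itself.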

ModelStatus

ModelStatus defines the observed state of Model.

Appears in: - Model

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `replicas` _ModelStatusReplicas_ | | | |
| `cache` _ModelStatusCache_ | | | |

ModelStatusCache

Appears in: - ModelStatus

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `loaded` _boolean_ | | | |

ModelStatusReplicas

Appears in: - ModelStatus

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `all` _integer_ | | | |
| `ready` _integer_ | | | |

PrefixHash

Appears in: - LoadBalancing

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `meanLoadFactor` _integer_ | MeanLoadPercentage is the percentage that any given endpoint's load must not exceed over the mean load of all endpoints in the hash ring. Defaults to 125% which is a widely accepted value for the Consistent Hashing with Bounded Loads algorithm. | 125 | Minimum: 100; Optional: {} |
| `replication` _integer_ | Replication is the number of replicas of each endpoint on the hash ring. Higher values will result in a more even distribution of load but will decrease lookup performance. | 20 | Optional: {} |
| `prefixCharLength` _integer_ | PrefixCharLength is the number of characters to count when building the prefix to hash. | 100 | Optional: {} |
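
To make the three PrefixHash settings concrete, the following is a minimal sketch of the general Consistent Hashing with Bounded Loads technique. It is not KubeAI's implementation: the class name `PrefixHashRing`, the SHA-256 based hash, the `endpoint#i` virtual node keys, and the per-request load counter are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Stable 64-bit hash; the choice of hash function is an assumption.
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class PrefixHashRing:
    def __init__(self, endpoints, mean_load_factor=125, replication=20,
                 prefix_char_length=100):
        self.mean_load_factor = mean_load_factor
        self.prefix_char_length = prefix_char_length
        self.load = {e: 0 for e in endpoints}  # in-flight requests per endpoint
        # Each endpoint is placed on the ring `replication` times.
        self.ring = sorted(
            (_hash(f"{e}#{i}"), e) for e in endpoints for i in range(replication)
        )
        self.keys = [h for h, _ in self.ring]

    def _load_ok(self, endpoint: str) -> bool:
        # The endpoint's load after taking this request must not exceed
        # mean_load_factor percent of the mean load (rounded up).
        total = sum(self.load.values()) + 1
        denom = len(self.load) * 100
        limit = (total * self.mean_load_factor + denom - 1) // denom
        return self.load[endpoint] + 1 <= limit

    def pick(self, prompt: str) -> str:
        # Only the first prefix_char_length characters influence placement.
        prefix = prompt[: self.prefix_char_length]
        start = bisect.bisect_left(self.keys, _hash(prefix))
        # Walk clockwise from the hash position to the first endpoint
        # that is still under the load bound.
        for offset in range(len(self.ring)):
            _, endpoint = self.ring[(start + offset) % len(self.ring)]
            if self._load_ok(endpoint):
                self.load[endpoint] += 1
                return endpoint
        raise RuntimeError("no endpoint under the load bound")

    def done(self, endpoint: str) -> None:
        # Call when a request finishes so the endpoint's load drops again.
        self.load[endpoint] -= 1
```

In this sketch, requests that share their first `prefix_char_length` characters land on the same endpoint while it has spare capacity, and spill over to the next endpoint on the ring once it reaches the bound. A larger `replication` value spreads endpoints more evenly but makes the ring, and therefore each lookup, larger.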