Kubernetes API

Packages

kubeai.org/v1

Package v1 contains API Schema definitions for the kubeai v1 API group

Resource Types

Model

Model resources define the ML models that will be served by KubeAI.

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` string | `kubeai.org/v1` | | |
| `kind` string | `Model` | | |
| `metadata` ObjectMeta | Refer to the Kubernetes API documentation for the fields of `metadata`. | | |
| `spec` ModelSpec | | | |
| `status` ModelStatus | | | |
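
For illustration, a minimal Model manifest might look like the sketch below. The model name, Ollama tag, and ResourceProfile name are hypothetical placeholders, not values taken from this reference.

```yaml
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: qwen2-0.5b                # hypothetical name
spec:
  url: ollama://qwen2:0.5b        # hypothetical Ollama model tag
  engine: OLlama
  features: [TextGeneration]
  resourceProfile: cpu:1          # assumes a ResourceProfile named "cpu" exists in the system config
```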

ModelFeature

Underlying type: string

Validation: - Enum: [TextGeneration TextEmbedding SpeechToText]

Appears in: - ModelSpec

ModelSpec

ModelSpec defines the desired state of Model.

Appears in: - Model

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `url` string | URL of the model to be served. Currently only the following formats are supported:<br>For VLLM & FasterWhisper engines: `hf://<repo-owner>/<repo-name>`<br>For OLlama engine: `ollama://<model>` | | Required: {} |
| `features` ModelFeature array | Features that the model supports. Dictates the APIs that are available for the model. | | Enum: [TextGeneration TextEmbedding SpeechToText] |
| `engine` string | Engine to be used for the server process. | | Enum: [OLlama VLLM FasterWhisper Infinity]<br>Required: {} |
| `resourceProfile` string | ResourceProfile required to serve the model. Use the format `<resource-profile-name>:<count>`. Example: `nvidia-gpu-l4:2` - 2x NVIDIA L4 GPUs. Must be a valid ResourceProfile defined in the system config. | | |
| `cacheProfile` string | CacheProfile to be used for caching model artifacts. Must be a valid CacheProfile defined in the system config. | | |
| `image` string | Image to be used for the server process. Will be set from ResourceProfile + Engine if not specified. | | |
| `args` string array | Args to be added to the server process. | | |
| `env` object (keys: string, values: string) | Env variables to be added to the server process. | | |
| `replicas` integer | Replicas is the number of Pod replicas that should be actively serving the model. KubeAI will manage this field unless AutoscalingDisabled is set to true. | | |
| `minReplicas` integer | MinReplicas is the minimum number of Pod replicas that the model can scale down to. Note: 0 is a valid value. | | Minimum: 0<br>Optional: {} |
| `maxReplicas` integer | MaxReplicas is the maximum number of Pod replicas that the model can scale up to. An empty value means no limit. | | Minimum: 1 |
| `autoscalingDisabled` boolean | AutoscalingDisabled will stop the controller from managing the replicas for the Model. When disabled, metrics will not be collected on server Pods. | | |
| `targetRequests` integer | TargetRequests is the average number of active requests that the autoscaler will try to maintain on model server Pods. | 100 | Minimum: 1 |
| `scaleDownDelaySeconds` integer | ScaleDownDelay is the minimum time before a deployment is scaled down after the autoscaling algorithm determines that it should be scaled down. | 30 | |
| `owner` string | Owner of the model. Used solely to populate the owner field in the OpenAI /v1/models endpoint. DEPRECATED. | | Optional: {} |
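
As a sketch of how the scaling and caching fields above fit together, the manifest below pins replicas between 1 and 3, targets 50 active requests per server Pod, and waits 60 seconds before scaling down. The model name, Hugging Face repo, profile names, and vLLM argument are assumptions for illustration only.

```yaml
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: llama-3.1-8b-instruct                 # hypothetical name
spec:
  url: hf://meta-llama/Llama-3.1-8B-Instruct  # hypothetical Hugging Face repo
  engine: VLLM
  features: [TextGeneration]
  resourceProfile: nvidia-gpu-l4:2            # assumes an "nvidia-gpu-l4" ResourceProfile in the system config
  cacheProfile: efs-dynamic                   # assumes a CacheProfile named "efs-dynamic" in the system config
  args:
    - --max-model-len=8192                    # illustrative extra server argument
  minReplicas: 1
  maxReplicas: 3
  targetRequests: 50
  scaleDownDelaySeconds: 60
```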

ModelStatus

ModelStatus defines the observed state of Model.

Appears in: - Model

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `replicas` ModelStatusReplicas | | | |
| `cache` ModelStatusCache | | | |

ModelStatusCache

Appears in: - ModelStatus

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `loaded` boolean | | | |

ModelStatusReplicas

Appears in: - ModelStatus

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `all` integer | | | |
| `ready` integer | | | |
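
Putting the status fields together, a Model whose cache has finished loading and whose replicas are all serving might report a status like the sketch below; the values are illustrative, not taken from a real cluster.

```yaml
status:
  cache:
    loaded: true   # model artifacts have been loaded into the cache
  replicas:
    all: 2         # total server Pod replicas
    ready: 2       # replicas currently ready to serve requests
```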