# Kubernetes API

## Packages

- kubeai.org/v1

## kubeai.org/v1

Package v1 contains API Schema definitions for the kubeai v1 API group.

### Resource Types

- Model

### Model

Model resources define the ML models that will be served by KubeAI.
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `apiVersion` _string_ | `kubeai.org/v1` | | |
| `kind` _string_ | `Model` | | |
| `metadata` _ObjectMeta_ | Refer to the Kubernetes API documentation for the fields of `metadata`. | | |
| `spec` _ModelSpec_ | | | |
| `status` _ModelStatus_ | | | |
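As an illustration, a Model is declared like any other Kubernetes resource. The manifest below is a minimal sketch; the model name, Hugging Face repo, and resource profile are hypothetical, and the resource profile must match one defined in your system config:

```yaml
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: llama-3.1-8b-instruct        # hypothetical name
spec:
  # hypothetical Hugging Face repo; url is required
  url: hf://meta-llama/Llama-3.1-8B-Instruct
  engine: VLLM                       # one of: OLlama, VLLM, FasterWhisper, Infinity
  features: [TextGeneration]         # dictates which APIs are exposed
  resourceProfile: nvidia-gpu-l4:1   # must be a valid ResourceProfile in the system config
```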
### ModelFeature

Underlying type: string

Validation:

- Enum: [TextGeneration TextEmbedding SpeechToText]

Appears in:

- ModelSpec
### ModelSpec

ModelSpec defines the desired state of Model.

Appears in:

- Model
| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `url` _string_ | URL of the model to be served. Currently only the following formats are supported: for the VLLM and FasterWhisper engines, `hf://<repo>`; for the OLlama engine, `ollama://<model>`. | | Required: {} |
| `features` _ModelFeature array_ | Features that the model supports. Dictates the APIs that are available for the model. | | Enum: [TextGeneration TextEmbedding SpeechToText] |
| `engine` _string_ | Engine to be used for the server process. | | Enum: [OLlama VLLM FasterWhisper Infinity] Required: {} |
| `resourceProfile` _string_ | ResourceProfile required to serve the model. Use the format `<name>:<count>`. Example: `nvidia-gpu-l4:2` (2x NVIDIA L4 GPUs). Must be a valid ResourceProfile defined in the system config. | | |
| `cacheProfile` _string_ | CacheProfile to be used for caching model artifacts. Must be a valid CacheProfile defined in the system config. | | |
| `image` _string_ | Image to be used for the server process. Will be set from ResourceProfile + Engine if not specified. | | |
| `args` _string array_ | Args to be added to the server process. | | |
| `env` _object (keys: string, values: string)_ | Env variables to be added to the server process. | | |
| `replicas` _integer_ | Replicas is the number of Pod replicas that should be actively serving the model. KubeAI will manage this field unless AutoscalingDisabled is set to true. | | |
| `minReplicas` _integer_ | MinReplicas is the minimum number of Pod replicas that the model can scale down to. Note: 0 is a valid value. | | Minimum: 0 Optional: {} |
| `maxReplicas` _integer_ | MaxReplicas is the maximum number of Pod replicas that the model can scale up to. An empty value means no limit. | | Minimum: 1 |
| `autoscalingDisabled` _boolean_ | AutoscalingDisabled will stop the controller from managing the replicas for the Model. When disabled, metrics will not be collected on server Pods. | | |
| `targetRequests` _integer_ | TargetRequests is the average number of active requests that the autoscaler will try to maintain on model server Pods. | 100 | Minimum: 1 |
| `scaleDownDelaySeconds` _integer_ | ScaleDownDelay is the minimum time before a deployment is scaled down after the autoscaling algorithm determines that it should be scaled down. | 30 | |
| `owner` _string_ | Owner of the model. Used solely to populate the owner field in the OpenAI /v1/models endpoint. DEPRECATED. | | Optional: {} |
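Taken together, the autoscaling fields can be combined as in the sketch below; the values are illustrative, not recommendations:

```yaml
spec:
  minReplicas: 1             # never scale below 1 Pod; 0 would allow scale-to-zero
  maxReplicas: 4             # upper bound; omit for no limit
  targetRequests: 100        # default; average active requests to maintain per Pod
  scaleDownDelaySeconds: 30  # default; minimum wait before scaling down
```

Setting `autoscalingDisabled: true` instead hands `replicas` back to the operator and stops metrics collection on server Pods.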
### ModelStatus

ModelStatus defines the observed state of Model.

Appears in:

- Model

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `replicas` _ModelStatusReplicas_ | | | |
| `cache` _ModelStatusCache_ | | | |
### ModelStatusCache

Appears in:

- ModelStatus

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `loaded` _boolean_ | | | |
### ModelStatusReplicas

Appears in:

- ModelStatus

| Field | Description | Default | Validation |
| --- | --- | --- | --- |
| `all` _integer_ | | | |
| `ready` _integer_ | | | |
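For reference, the status block of a running Model (as returned by, e.g., `kubectl get model <name> -o yaml`) might look like the following sketch; the values are illustrative:

```yaml
status:
  cache:
    loaded: true   # model artifacts have been loaded into the cache
  replicas:
    all: 2         # all replicas
    ready: 2       # replicas ready to serve
```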