Kubernetes API
Packages
kubeai.org/v1
Package v1 contains API Schema definitions for the kubeai v1 API group.
Resource Types
Adapter
Appears in: ModelSpec
Field | Description | Default | Validation |
---|---|---|---|
name string | Name must be a lowercase string with no spaces. | | MaxLength: 63 Pattern: ^[a-z0-9-]+$ Required: {} |
url string | | | |
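
The validation rules for the Adapter name (a maximum length of 63 and the pattern ^[a-z0-9-]+$) can be checked on the client side before a manifest is submitted. The following is a minimal sketch; the function name `is_valid_adapter_name` is hypothetical and is not part of KubeAI.

```python
import re

# Constraints copied from the Validation column of the Adapter table.
ADAPTER_NAME_PATTERN = re.compile(r"^[a-z0-9-]+$")
ADAPTER_NAME_MAX_LENGTH = 63


def is_valid_adapter_name(name):
    """Return True if `name` satisfies the documented length and pattern rules."""
    if len(name) > ADAPTER_NAME_MAX_LENGTH:
        return False
    # fullmatch rejects strings with trailing newlines or other stray characters.
    return ADAPTER_NAME_PATTERN.fullmatch(name) is not None


print(is_valid_adapter_name("my-adapter-1"))
print(is_valid_adapter_name("My Adapter"))
```

The API server remains the authority on validation; a local check like this only catches obvious mistakes earlier.
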
LoadBalancing
Appears in: ModelSpec
Field | Description | Default | Validation |
---|---|---|---|
strategy LoadBalancingStrategy | | LeastLoad | Enum: [LeastLoad PrefixHash] Optional: {} |
prefixHash PrefixHash | | { } | Optional: {} |
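
To illustrate how the documented defaults combine, the sketch below fills in a partial loadBalancing stanza on the client side. It assumes the default values listed in the LoadBalancing and PrefixHash tables of this reference; the helper name `with_load_balancing_defaults` is hypothetical, and in a real cluster the API server applies defaults itself.

```python
# Values taken from the Default and Validation columns of this reference.
ALLOWED_STRATEGIES = ("LeastLoad", "PrefixHash")
PREFIX_HASH_DEFAULTS = {
    "meanLoadFactor": 125,
    "replication": 20,
    "prefixCharLength": 100,
}


def with_load_balancing_defaults(load_balancing=None):
    """Return a copy of the stanza with omitted fields set to documented defaults."""
    result = dict(load_balancing or {})
    strategy = result.setdefault("strategy", "LeastLoad")
    if strategy not in ALLOWED_STRATEGIES:
        raise ValueError(f"unsupported strategy: {strategy!r}")
    # User-supplied prefixHash values override the defaults.
    prefix_hash = {**PREFIX_HASH_DEFAULTS, **result.get("prefixHash", {})}
    if prefix_hash["meanLoadFactor"] < 100:
        raise ValueError("meanLoadFactor must be at least 100")
    result["prefixHash"] = prefix_hash
    return result


print(with_load_balancing_defaults({"strategy": "PrefixHash"}))
```
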
LoadBalancingStrategy
Underlying type: string
Validation: Enum: [LeastLoad PrefixHash]
Appears in: LoadBalancing
Field | Description |
---|---|
LeastLoad | |
PrefixHash | |
Model
Model resources define the ML models that will be served by KubeAI.
Field | Description | Default | Validation |
---|---|---|---|
apiVersion string | kubeai.org/v1 | | |
kind string | Model | | |
metadata ObjectMeta | Refer to Kubernetes API documentation for fields of metadata. | | |
spec ModelSpec | | | |
status ModelStatus | | | |
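
As an illustration of how these top-level fields fit together, the sketch below assembles a Model manifest as a Python dictionary and prints it as JSON. The helper `build_model_manifest`, the resource name, and the URL value are hypothetical placeholders chosen for this example; only the field names and the apiVersion and kind values come from this reference.

```python
import json


def build_model_manifest(name, url, engine, features):
    """Assemble a Model manifest using the field names documented here."""
    return {
        "apiVersion": "kubeai.org/v1",
        "kind": "Model",
        "metadata": {"name": name},
        "spec": {
            "url": url,
            "engine": engine,
            "features": list(features),
        },
    }


manifest = build_model_manifest(
    name="example-model",          # hypothetical resource name
    url="ollama://example-model",  # hypothetical URL; only the scheme is documented
    engine="OLlama",
    features=["TextGeneration"],
)
print(json.dumps(manifest, indent=2))
```

The status field is omitted because it describes observed state and is not something a client sets.
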
ModelFeature
Underlying type: string
Validation: Enum: [TextGeneration TextEmbedding SpeechToText]
Appears in: ModelSpec
ModelSpec
ModelSpec defines the desired state of Model.
Appears in: Model
Field | Description | Default | Validation |
---|---|---|---|
url string | URL of the model to be served. The following URL schemes are currently supported. For VLLM, FasterWhisper, Infinity engines: "hf://", "pvc://", "gs://", "oss://", "s3://". For OLlama engine: "ollama://". | | Required: {} |
adapters Adapter array | | | |
features ModelFeature array | Features that the model supports. Dictates the APIs that are available for the model. | | Enum: [TextGeneration TextEmbedding SpeechToText] |
engine string | Engine to be used for the server process. | | Enum: [OLlama VLLM FasterWhisper Infinity] Required: {} |
resourceProfile string | ResourceProfile required to serve the model. The value is a ResourceProfile name followed by a colon and a count. Example: "nvidia-gpu-l4:2" - 2x NVIDIA L4 GPUs. Must be a valid ResourceProfile defined in the system config. | | |
cacheProfile string | CacheProfile to be used for caching model artifacts. Must be a valid CacheProfile defined in the system config. | | |
image string | Image to be used for the server process. Will be set from ResourceProfile + Engine if not specified. | | |
args string array | Args to be added to the server process. | | |
env object (keys:string, values:string) | Env variables to be added to the server process. | | |
replicas integer | Replicas is the number of Pod replicas that should be actively serving the model. KubeAI will manage this field unless AutoscalingDisabled is set to true. | | |
minReplicas integer | MinReplicas is the minimum number of Pod replicas that the model can scale down to. Note: 0 is a valid value. | | Minimum: 0 Optional: {} |
maxReplicas integer | MaxReplicas is the maximum number of Pod replicas that the model can scale up to. Empty value means no limit. | | Minimum: 1 |
autoscalingDisabled boolean | AutoscalingDisabled will stop the controller from managing the replicas for the Model. When disabled, metrics will not be collected on server Pods. | | |
targetRequests integer | TargetRequests is the average number of active requests that the autoscaler will try to maintain on model server Pods. | 100 | Minimum: 1 |
scaleDownDelaySeconds integer | ScaleDownDelay is the minimum time before a deployment is scaled down after the autoscaling algorithm determines that it should be scaled down. | 30 | |
owner string | Owner of the model. Used solely to populate the owner field in the OpenAI /v1/models endpoint. DEPRECATED. | | Optional: {} |
loadBalancing LoadBalancing | LoadBalancing configuration for the model. If not specified, a default is used based on the engine and request. | { } | |
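
Two of the fields above lend themselves to a small worked example: the resourceProfile value, as in the documented example "nvidia-gpu-l4:2", and the relationship between targetRequests, minReplicas, and maxReplicas. The sketch below is illustrative only. Both function names are hypothetical, the parser assumes the count is always present, and the replica calculation assumes a simple rounding rule; the actual KubeAI autoscaler may average over time and also honors scaleDownDelaySeconds.

```python
import math


def parse_resource_profile(value):
    """Split a value such as "nvidia-gpu-l4:2" into (profile name, count)."""
    name, separator, count_text = value.rpartition(":")
    if not separator or not name:
        raise ValueError(f"invalid resourceProfile: {value!r}")
    count = int(count_text)  # raises ValueError if the count is not a number
    if count < 1:
        raise ValueError(f"resource count must be positive: {value!r}")
    return name, count


def desired_replicas(active_requests, target_requests=100,
                     min_replicas=0, max_replicas=None):
    """Estimate the replicas needed to keep the per-Pod average at the target."""
    if target_requests < 1:
        raise ValueError("targetRequests must be at least 1")
    # Assumption: round up so that no Pod exceeds the target on average.
    wanted = math.ceil(active_requests / target_requests)
    wanted = max(wanted, min_replicas)
    if max_replicas is not None:  # an empty maxReplicas means no upper limit
        wanted = min(wanted, max_replicas)
    return wanted


print(parse_resource_profile("nvidia-gpu-l4:2"))
print(desired_replicas(250))
```

With the default targetRequests of 100, 250 active requests call for three replicas under this rounding assumption, and an idle model with minReplicas set to 0 can scale to zero.
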
ModelStatus
ModelStatus defines the observed state of Model.
Appears in: Model
Field | Description | Default | Validation |
---|---|---|---|
replicas ModelStatusReplicas | | | |
cache ModelStatusCache | | | |
ModelStatusCache
Appears in: ModelStatus
Field | Description | Default | Validation |
---|---|---|---|
loaded boolean | | | |
ModelStatusReplicas
Appears in: ModelStatus
Field | Description | Default | Validation |
---|---|---|---|
all integer | | | |
ready integer | | | |
PrefixHash
Appears in: LoadBalancing
Field | Description | Default | Validation |
---|---|---|---|
meanLoadFactor integer | MeanLoadPercentage is the percentage that any given endpoint's load must not exceed over the mean load of all endpoints in the hash ring. Defaults to 125% which is a widely accepted value for the Consistent Hashing with Bounded Loads algorithm. | 125 | Minimum: 100 Optional: {} |
replication integer | Replication is the number of replicas of each endpoint on the hash ring. Higher values will result in a more even distribution of load but will decrease lookup performance. | 20 | Optional: {} |
prefixCharLength integer | PrefixCharLength is the number of characters to count when building the prefix to hash. | 100 | Optional: {} |
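
The three PrefixHash settings map onto the parts of a consistent-hashing ring with bounded loads. The sketch below shows one way they could interact. It is not KubeAI's implementation: the class `PrefixHashRing`, the SHA-256 based hash, the ring key format, and the exact rounding of the load bound are all assumptions made for illustration.

```python
import bisect
import hashlib


def stable_hash(key):
    # SHA-256 truncated to 64 bits; the hash function is an assumption of this sketch.
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class PrefixHashRing:
    def __init__(self, endpoints, mean_load_factor=125, replication=20,
                 prefix_char_length=100):
        self.mean_load_factor = mean_load_factor
        self.prefix_char_length = prefix_char_length
        self.loads = {endpoint: 0 for endpoint in endpoints}
        # Place each endpoint on the ring `replication` times.
        self.ring = sorted(
            (stable_hash(f"{endpoint}#{i}"), endpoint)
            for endpoint in self.loads
            for i in range(replication)
        )
        self.points = [point for point, _ in self.ring]

    def _max_load(self):
        # Bound = ceil(mean load including the incoming request * factor / 100).
        total = sum(self.loads.values()) + 1
        return -(-total * self.mean_load_factor // (100 * len(self.loads)))

    def pick(self, text):
        """Choose an endpoint for `text` and count the request against it."""
        prefix = text[: self.prefix_char_length]
        start = bisect.bisect_left(self.points, stable_hash(prefix))
        # Walk clockwise from the hashed position, skipping overloaded endpoints.
        for offset in range(len(self.ring)):
            _, endpoint = self.ring[(start + offset) % len(self.ring)]
            if self.loads[endpoint] + 1 <= self._max_load():
                self.loads[endpoint] += 1
                return endpoint
        raise RuntimeError("no endpoint available")

    def done(self, endpoint):
        """Mark one request on `endpoint` as finished."""
        self.loads[endpoint] -= 1


ring = PrefixHashRing(["pod-a", "pod-b", "pod-c"])
prompt = "x" * 200
first = ring.pick(prompt + " question one")
ring.done(first)
second = ring.pick(prompt + " question two")
print(first == second)
```

Requests that share their first prefixCharLength characters hash to the same ring position, so they reach the same endpoint as long as it stays under the load bound. Once an endpoint would exceed the bound, the walk continues to the next endpoint on the ring.
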