Resource Profiles¶
A resource profile maps a type of compute resource (i.e. NVIDIA L4 GPU) to a collection of Kubernetes settings that are configured on inference server Pods. These profiles are defined in the KubeAI config.yaml
file (via a ConfigMap). Each model specifies the resource profile that it requires.
Kubernetes Model resources specify a resource profile and the count of that resource that they require (for example resourceProfile: nvidia-gpu-l4:2
- 2x L4 GPUs).
A given profile might need to contain slightly different settings based on the cluster/cloud that KubeAI is deployed in.
Example: A resource profile named nvidia-gpu-l4
might contain the following node selectors when installing KubeAI on a GKE Kubernetes cluster:
cloud.google.com/gke-accelerator: "nvidia-l4"
cloud.google.com/gke-spot: "true"
and add the following resource requests to the model server Pods:
nvidia.com/gpu: "1"
In addition to node selectors and resource requirements, a resource profile may optionally specify an image name. This name maps to the container image that will be selected when serving a model on that resource.
Next¶
Read about how to configure resource profiles.