Architect for Multitenancy

KubeAI supports multitenancy by filtering the models it serves with Kubernetes label selectors. Pass a selector in the X-Label-Selector HTTP header when calling any of the OpenAI-compatible endpoints, and it will be matched against the labels set on the kind: Model objects. The pattern is similar to using a WHERE clause in a SQL query.

Example Models:

kind: Model
metadata:
  name: llama-3.2
  labels:
    tenancy: public
spec:
# ...
---
kind: Model
metadata:
  name: custom-private-model
  labels:
    tenancy: org-abc
spec:
# ...

Example HTTP requests:

# The returned list of models will be filtered.
curl http://$KUBEAI_ENDPOINT/openai/v1/models \
    -H "X-Label-Selector: tenancy in (org-abc, public)"

# When running inference, if the label selector does not match
# the requested model, a 404 will be returned.
curl http://$KUBEAI_ENDPOINT/openai/v1/completions \
    -H "Content-Type: application/json" \
    -H "X-Label-Selector: tenancy in (org-abc, public)" \
    -d '{"prompt": "Hi", "model": "llama-3.2"}'

The header value can be any valid Kubernetes label selector. Some examples include:

X-Label-Selector: tenancy=org-abc
X-Label-Selector: tenancy in (org-abc, public)
X-Label-Selector: tenancy!=private
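
To make the matching rules concrete, here is a minimal Python sketch of how the three selector forms above evaluate against a Model's labels. It is an illustration only, not KubeAI's implementation: the `matches` function is hypothetical, it handles a single requirement per selector, and it ignores other Kubernetes selector features such as `notin`, existence checks, and comma-separated requirements. It follows the standard Kubernetes semantics in which `=` and `in` require the label key to be present, while `!=` also matches objects that lack the key.

```python
import re

# Hypothetical, simplified patterns: one requirement per selector string.
_IN = re.compile(r"^\s*([\w./-]+)\s+in\s+\(([^)]*)\)\s*$")
_NEQ = re.compile(r"^\s*([\w./-]+)\s*!=\s*([\w.-]*)\s*$")
_EQ = re.compile(r"^\s*([\w./-]+)\s*==?\s*([\w.-]*)\s*$")


def matches(selector: str, labels: dict[str, str]) -> bool:
    """Return True if the labels satisfy a single selector requirement."""
    m = _IN.match(selector)
    if m:
        key = m.group(1)
        values = {v.strip() for v in m.group(2).split(",")}
        # Set-based: the key must exist and its value must be in the set.
        return key in labels and labels[key] in values
    m = _NEQ.match(selector)
    if m:
        key, value = m.groups()
        # Inequality: also true when the key is absent.
        return labels.get(key) != value
    m = _EQ.match(selector)
    if m:
        key, value = m.groups()
        # Equality: the key must exist and match exactly.
        return key in labels and labels[key] == value
    raise ValueError(f"unsupported selector in this sketch: {selector!r}")


# Labels taken from the example Models above.
models = {
    "llama-3.2": {"tenancy": "public"},
    "custom-private-model": {"tenancy": "org-abc"},
}

visible = [name for name, labels in models.items()
           if matches("tenancy=org-abc", labels)]
print(visible)
```

With the two example Models, `tenancy=org-abc` selects only `custom-private-model`, while `tenancy in (org-abc, public)` and `tenancy!=private` select both.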

Multiple X-Label-Selector headers can be specified in the same HTTP request and will be combined with a logical AND. For example, the following request will only match Models that have both the tenant: org-abc and user: sam labels:

curl http://$KUBEAI_ENDPOINT/openai/v1/completions \
    -H "Content-Type: application/json" \
    -H "X-Label-Selector: tenant=org-abc" \
    -H "X-Label-Selector: user=sam" \
    -d '{"prompt": "Hi", "model": "llama-3.2"}'
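
The same request can be sketched from application code. The example below uses Python's standard `http.client` module, whose `putheader` method can be called more than once with the same header name, so each selector goes out as its own X-Label-Selector header. The helper names `selector_headers` and `complete` are hypothetical, and `host` is assumed to be the `host:port` value behind `$KUBEAI_ENDPOINT`.

```python
import http.client
import json


def selector_headers(selectors: list[str]) -> list[tuple[str, str]]:
    """Build one X-Label-Selector header per selector."""
    return [("X-Label-Selector", s) for s in selectors]


def complete(host: str, model: str, prompt: str,
             selectors: list[str]) -> tuple[int, bytes]:
    """POST a completion request carrying repeated X-Label-Selector headers."""
    body = json.dumps({"model": model, "prompt": prompt}).encode()
    conn = http.client.HTTPConnection(host)
    try:
        conn.putrequest("POST", "/openai/v1/completions")
        conn.putheader("Content-Type", "application/json")
        conn.putheader("Content-Length", str(len(body)))
        # Each call adds a separate header line with the same name.
        for name, value in selector_headers(selectors):
            conn.putheader(name, value)
        conn.endheaders(body)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()


if __name__ == "__main__":
    # Assumes a KubeAI endpoint is reachable at this hypothetical address.
    status, _ = complete("localhost:8000", "llama-3.2", "Hi",
                         ["tenant=org-abc", "user=sam"])
    print(status)
```

The low-level `putheader` API is used here because a plain dictionary of headers cannot hold two entries with the same name.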

Example architecture:

[Figure: Multitenancy]