
Weaviate with local autoscaling embedding and generative models

Weaviate is a vector search engine that can integrate seamlessly with KubeAI's embedding and generative models. This tutorial demonstrates how to deploy both KubeAI and Weaviate in a Kubernetes cluster, using KubeAI as the OpenAI endpoint for Weaviate.

Why use KubeAI with Weaviate?

  • Security and privacy: KubeAI runs locally in your Kubernetes cluster, so your data never leaves your infrastructure.
  • Cost savings: KubeAI can run on your existing hardware, reducing the need to pay for hosted embedding and generative model APIs.

This tutorial uses CPU-only models, so it should work even on your laptop.

As you go through this tutorial, you will learn how to:

  • Deploy KubeAI with embedding and generative models
  • Install Weaviate and connect it to KubeAI
  • Import data into Weaviate
  • Perform semantic search using the embedding model
  • Perform generative search using the generative model

Prerequisites

A Kubernetes cluster. You can use kind or minikube.

kind create cluster

KubeAI Configuration

Let's start by deploying KubeAI with the models we want to use. The Nomic embedding model is used in place of text-embedding-ada-002, and Gemma 2 2B in place of gpt-3.5-turbo. You can choose bigger models depending on your available hardware.

Create a file named kubeai-model-values.yaml with the following content:

catalog:
  text-embedding-ada-002:
    enabled: true
    minReplicas: 1
    features: ["TextEmbedding"]
    owner: nomic
    url: "ollama://nomic-embed-text"
    engine: OLlama
    resourceProfile: cpu:1
  gpt-3.5-turbo:
    enabled: true
    minReplicas: 1
    features: ["TextGeneration"]
    owner: google
    url: "ollama://gemma2:2b"
    engine: OLlama
    resourceProfile: cpu:2

Note: It's important that you name the models text-embedding-ada-002 and gpt-3.5-turbo, because Weaviate's OpenAI modules expect these names.

Run the following command to deploy KubeAI and install the configured models:

helm repo add kubeai https://www.kubeai.org && helm repo update

helm install kubeai kubeai/kubeai

helm install kubeai-models kubeai/models \
    -f ./kubeai-model-values.yaml
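
Before moving on, you can optionally check that KubeAI is serving both models through its OpenAI-compatible API. The snippet below is a minimal sketch, not part of the tutorial scripts: it assumes the requests package is available and that you have first forwarded the kubeai service to your machine, for example with kubectl port-forward svc/kubeai 8000:80.

import requests

# Assumes: kubectl port-forward svc/kubeai 8000:80 is running in another terminal.
# KubeAI serves an OpenAI-compatible API under the /openai path.
resp = requests.get("http://localhost:8000/openai/v1/models")
resp.raise_for_status()

for model in resp.json().get("data", []):
    # Expect to see text-embedding-ada-002 and gpt-3.5-turbo listed
    print(model["id"])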

Weaviate Installation

For this tutorial, we will use the Weaviate Helm chart to deploy Weaviate.

Let's enable the text2vec-openai and generative-openai modules in Weaviate. We will also set the default vectorizer module to text2vec-openai.

The apiKey is ignored in this case as we are using KubeAI as the OpenAI endpoint.

Create a file named weaviate-values.yaml with the following content:

modules:
  text2vec-openai:
    enabled: true
    apiKey: thisIsIgnored
  generative-openai:
    enabled: true
    apiKey: thisIsIgnored
  default_vectorizer_module: text2vec-openai
service:
  # Prevent Weaviate from being exposed publicly
  type: ClusterIP

Install Weaviate by running the following command:

helm repo add weaviate https://weaviate.github.io/weaviate-helm && helm repo update

helm install \
  "weaviate" \
  weaviate/weaviate \
  -f weaviate-values.yaml

Usage

We will be using Python to interact with Weaviate. The two use cases we will cover are:

  • Semantic search using the embedding model
  • Generative search using the generative model

Connectivity

The remaining steps require connectivity to the Weaviate service. However, Weaviate is not exposed publicly in this setup, so we set up local port forwards to access the Weaviate services.

Set up the port forwards by running the following commands, each in its own terminal (kubectl port-forward keeps running in the foreground):

kubectl port-forward svc/weaviate 8080:80
kubectl port-forward svc/weaviate-grpc 50051:50051

Weaviate Python Client Setup

Create a virtual environment and install the Weaviate client:

python -m venv .venv
source .venv/bin/activate
pip install -U weaviate-client requests
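
Before creating the collection, you can quickly verify that the port forwards and the client work. This is a minimal sanity check using the v4 client's is_ready() call:

import weaviate

# Uses the local port forwards set up in the previous step
with weaviate.connect_to_local(port=8080, grpc_port=50051) as client:
    print(client.is_ready())  # should print True if Weaviate is reachable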

Collection and Data Import

Create a file named create-collection.py with the following content:

import json
import weaviate
import requests
from weaviate.classes.config import Configure

# This works due to port forward in previous step
with weaviate.connect_to_local(port=8080, grpc_port=50051) as client:

    client.collections.create(
        "Question",
        vectorizer_config=Configure.Vectorizer.text2vec_openai(
                model="text-embedding-ada-002",
                base_url="http://kubeai/openai",
        ),
        generative_config=Configure.Generative.openai(
            model="gpt-3.5-turbo",
            base_url="http://kubeai/openai",
        ),
    )

    # import data
    resp = requests.get('https://raw.githubusercontent.com/weaviate-tutorials/quickstart/main/data/jeopardy_tiny.json')
    data = json.loads(resp.text)  # Load data

    question_objs = list()
    for i, d in enumerate(data):
        question_objs.append({
            "answer": d["Answer"],
            "question": d["Question"],
            "category": d["Category"],
        })

    questions = client.collections.get("Question")
    questions.data.insert_many(question_objs)
    print("Data imported successfully")

Run the script to create a collection that uses KubeAI as the OpenAI endpoint and to import the data:

python create-collection.py

You should see the message Data imported successfully.

The collection is now created and data is imported. The vectors are generated by KubeAI and stored in Weaviate.
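
If you want to confirm that embeddings were actually generated, you can fetch an object back together with its stored vector. This is a small optional sketch; with the v4 client, the vector of an unnamed vector index is typically returned under the "default" key:

import weaviate

# This works due to port forward in previous step
with weaviate.connect_to_local(port=8080, grpc_port=50051) as client:
    questions = client.collections.get("Question")
    response = questions.query.fetch_objects(limit=1, include_vector=True)
    obj = response.objects[0]
    print(obj.properties["question"])
    # The embedding produced by KubeAI's nomic-embed-text model
    print(len(obj.vector["default"]), "dimensions")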

Now let's do semantic search, which uses the embeddings. Create a file named search.py with the following content:

import weaviate

# This works due to port forward in previous step
with weaviate.connect_to_local(port=8080, grpc_port=50051) as client:
    questions = client.collections.get("Question")
    response = questions.query.near_text(
        query="biology",
        limit=2
    )
    print(response.objects[0].properties)  # Inspect the first object

Execute the Python script:

python search.py

You should see output similar to the following:

{
  "answer": "DNA",
  "question": "In 1953 Watson & Crick built a model of the molecular structure of this, the gene-carrying substance",
  "category": "SCIENCE"
}
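
Optionally, you can also ask Weaviate to return the vector distance for each hit, which makes it easier to judge how close a match is. This sketch uses the v4 client's MetadataQuery:

import weaviate
from weaviate.classes.query import MetadataQuery

# This works due to port forward in previous step
with weaviate.connect_to_local(port=8080, grpc_port=50051) as client:
    questions = client.collections.get("Question")
    response = questions.query.near_text(
        query="biology",
        limit=2,
        return_metadata=MetadataQuery(distance=True),
    )
    for o in response.objects:
        # Lower distance means a semantically closer match
        print(o.properties["question"], o.metadata.distance)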

Generative Search (RAG)

Now let's do generative search, which uses the generative model (Text generation LLM). The generative model is run locally and managed by KubeAI.

Create a file named generate.py with the following content:

import weaviate

# This works due to port forward in previous step
with weaviate.connect_to_local(port=8080, grpc_port=50051) as client:
    questions = client.collections.get("Question")

    response = questions.generate.near_text(
        query="biology",
        limit=2,
        grouped_task="Write a tweet with emojis about these facts."
    )

    print(response.generated)  # Inspect the generated text

Run the Python script:

python generate.py

You should see something similar to this:

🧬 Watson & Crick cracked the code in 1953! 🤯 They built a model of DNA, the blueprint of life. 🧬
🧠 Liver power! 💪 This organ keeps your blood sugar balanced by storing glucose as glycogen. 🩸 #ScienceFacts #Biology
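
Besides a grouped task over all retrieved objects, the v4 client also supports a per-object prompt via single_prompt. The variation below is an optional sketch; the prompt wording is illustrative, and {question} and {answer} are filled in from each object's properties:

import weaviate

# This works due to port forward in previous step
with weaviate.connect_to_local(port=8080, grpc_port=50051) as client:
    questions = client.collections.get("Question")
    response = questions.generate.near_text(
        query="biology",
        limit=2,
        single_prompt="Explain in one sentence why the answer to '{question}' is {answer}.",
    )
    for o in response.objects:
        print(o.generated)  # one generated string per retrieved object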

Conclusion

You've now successfully set up KubeAI with Weaviate for both embedding-based semantic search and generative tasks. You've also learned how to import data, perform searches, and generate content using KubeAI-managed models.