Serve a StableDiffusion text-to-image model on Kubernetes

Note: The Python files for the Ray Serve application and its client are in the ray-project/serve_config_examples repository and the Ray documentation.

Step 1: Create a Kubernetes cluster with GPUs

Follow aws-eks-gpu-cluster.md or gcp-gke-gpu-cluster.md to create a Kubernetes cluster with 1 CPU node and 1 GPU node.

Step 2: Install the KubeRay operator

Follow this document to install the latest stable KubeRay operator from the Helm repository. Note that the YAML file in this example uses serveConfigV2, which requires KubeRay v0.6.0 or later.

Step 3: Install a RayService

kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/master/ray-operator/config/samples/ray-service.stable-diffusion.yaml

This RayService configuration contains some important settings:

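In the RayService, the head Pod has no tolerations, while the worker Pods use the following tolerations. Because the GPU node carries the matching taint, the scheduler doesn't assign the head Pod to the GPU node.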

# Please add the matching taint (same key, value, and effect) to the GPU node.
tolerations:
    - key: "ray.io/node-type"
      operator: "Equal"
      value: "worker"
      effect: "NoSchedule"

The configuration includes diffusers in runtime_env because the ray-ml image doesn't include this package by default.
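
To illustrate the taint and toleration setting above, here is a minimal sketch of the matching rule the Kubernetes scheduler applies. It is a simplified model, not the scheduler's code. The helper names `toleration_matches` and `can_schedule` are hypothetical, and the taint `ray.io/node-type=worker:NoSchedule` on the GPU node is an assumption derived from the tolerations shown above.

```python
# Simplified model of Kubernetes taint/toleration matching (illustration only).

# Assumed taint on the GPU node, mirroring the worker tolerations above.
GPU_NODE_TAINTS = [
    {"key": "ray.io/node-type", "value": "worker", "effect": "NoSchedule"},
]

# Tolerations of the worker Pods, copied from the RayService snippet.
WORKER_TOLERATIONS = [
    {
        "key": "ray.io/node-type",
        "operator": "Equal",
        "value": "worker",
        "effect": "NoSchedule",
    },
]

# The head Pod has no tolerations.
HEAD_TOLERATIONS = []


def toleration_matches(toleration, taint):
    """Return True if a single toleration tolerates a single taint."""
    operator = toleration.get("operator", "Equal")  # Equal is the default
    key = toleration.get("key", "")
    effect = toleration.get("effect", "")

    # An empty effect in the toleration matches every effect.
    if effect and effect != taint["effect"]:
        return False
    # An empty key is only valid with Exists and matches every key.
    if not key:
        return operator == "Exists"
    if key != taint["key"]:
        return False
    if operator == "Exists":
        return True
    return operator == "Equal" and toleration.get("value", "") == taint.get("value", "")


def can_schedule(tolerations, taints):
    """Return True if every hard taint on the node is tolerated."""
    hard_taints = [t for t in taints if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(
        any(toleration_matches(tol, taint) for tol in tolerations)
        for taint in hard_taints
    )
```

Under these assumptions, `can_schedule(WORKER_TOLERATIONS, GPU_NODE_TAINTS)` is true and `can_schedule(HEAD_TOLERATIONS, GPU_NODE_TAINTS)` is false, which is why only worker Pods can land on the GPU node.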

Step 4: Forward the port of Serve

First, get the service name with this command.

kubectl get services

Then, forward the port to the Serve service.

# Wait until the RayService `Ready` condition is `True`. This means the RayService is ready to serve.
kubectl describe rayservices.ray.io stable-diffusion

# [Example output]
#   Conditions:
#     Last Transition Time:  2025-02-13T07:10:34Z
#     Message:               Number of serve endpoints is greater than 0
#     Observed Generation:   1
#     Reason:                NonZeroServeEndpoints
#     Status:                True
#     Type:                  Ready

# Forward the port of Serve
kubectl port-forward svc/stable-diffusion-serve-svc 8000
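
If you prefer to script the readiness check instead of reading the `kubectl describe` output by hand, the following is one possible sketch. It assumes that `kubectl get rayservices.ray.io stable-diffusion -o json` exposes the conditions under `status.conditions` with `type` and `status` fields, following the usual Kubernetes convention. The helper names `is_ready`, `fetch_rayservice`, and `wait_until_ready` are hypothetical.

```python
import json
import subprocess
import time


def is_ready(rayservice):
    """Return True if the RayService object has a Ready condition set to True."""
    conditions = rayservice.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in conditions
    )


def fetch_rayservice(name="stable-diffusion"):
    """Read the RayService object as a dict by calling kubectl."""
    result = subprocess.run(
        ["kubectl", "get", "rayservices.ray.io", name, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def wait_until_ready(name="stable-diffusion", timeout_s=1800, interval_s=10):
    """Poll until the RayService is Ready or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_ready(fetch_rayservice(name)):
            return True
        time.sleep(interval_s)
    return False
```

Calling `wait_until_ready()` before running the port-forward command gives the same guarantee as inspecting the conditions manually. The timeout and polling interval are arbitrary defaults.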

Step 5: Send a request to the text-to-image model

# Step 5.1: Download `stable_diffusion_req.py`
curl -LO https://raw.githubusercontent.com/ray-project/serve_config_examples/master/stable_diffusion/stable_diffusion_req.py

# Step 5.2: Set your `prompt` in `stable_diffusion_req.py`.

# Step 5.3: Send a request to the Stable Diffusion model.
python stable_diffusion_req.py
# Check output.png
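
For orientation, a request client of this kind can be very small. The sketch below is not the contents of `stable_diffusion_req.py`. It assumes that the Serve application answers GET requests on a route such as `/imagine` with a `prompt` query parameter and returns PNG bytes. The route, the default prompt, and the helper names `build_url` and `save_image` are assumptions made for illustration.

```python
from urllib.parse import quote
from urllib.request import urlopen


def build_url(prompt, base="http://127.0.0.1:8000", route="/imagine"):
    """Build the request URL, percent-encoding the prompt (route is assumed)."""
    return f"{base}{route}?prompt={quote(prompt)}"


def save_image(prompt, path="output.png", timeout_s=300):
    """Send the prompt to the forwarded Serve port and write the response body."""
    with urlopen(build_url(prompt), timeout=timeout_s) as response:
        data = response.read()
    with open(path, "wb") as f:
        f.write(data)
    return len(data)
```

With the port-forward from Step 4 running, `save_image("a cute cat is dancing on the grass.")` would write the response to `output.png`. The generous timeout reflects that image generation can take a while.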

See the document "Serve a Stable Diffusion Model" for an example output image.