Serve a text summarizer on Kubernetes

Note: The Python files for the Ray Serve application and its client are in the ray-project/serve_config_examples repository.

Step 1: Create a Kubernetes cluster with GPUs

Follow aws-eks-gpu-cluster.md or gcp-gke-gpu-cluster.md to create a Kubernetes cluster with 1 CPU node and 1 GPU node.

Step 2: Install the KubeRay operator

Follow the KubeRay operator installation guide to install the latest stable KubeRay operator from the Helm repository.

Step 3: Install a RayService

# Create a RayService
kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/master/ray-operator/config/samples/ray-service.text-summarizer.yaml

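In this RayService, the head Pod has no tolerations, while the worker Pods use the tolerations below. Because the GPU node carries the matching taint, the scheduler can't assign the head Pod to the GPU node.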

# Worker Pod tolerations. Add the matching taint
# (ray.io/node-type=worker:NoSchedule) to the GPU node.
tolerations:
    - key: "ray.io/node-type"
      operator: "Equal"
      value: "worker"
      effect: "NoSchedule"
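
How the taint and the tolerations interact can be sketched in Python. This is a simplified model of the Kubernetes matching rule for the `Equal` and `Exists` operators, not the scheduler's actual code. The helpers `tolerates` and `can_schedule` and the dictionaries are hypothetical names used only for illustration.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if one toleration matches one taint (simplified model)."""
    # An empty effect on the toleration matches every taint effect.
    effect = toleration.get("effect", "")
    if effect and effect != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")
    if operator == "Exists":
        # With Exists, an empty key tolerates every taint.
        key = toleration.get("key", "")
        return key == "" or key == taint["key"]
    # With Equal, both the key and the value must match.
    return (
        toleration.get("key") == taint["key"]
        and toleration.get("value", "") == taint.get("value", "")
    )


def can_schedule(pod_tolerations: list, node_taints: list) -> bool:
    """A Pod fits a node only if every NoSchedule taint is tolerated."""
    return all(
        any(tolerates(tol, taint) for tol in pod_tolerations)
        for taint in node_taints
        if taint["effect"] == "NoSchedule"
    )


# Taint assumed to be on the GPU node, mirroring the tolerations above.
gpu_node_taints = [
    {"key": "ray.io/node-type", "value": "worker", "effect": "NoSchedule"}
]
worker_tolerations = [
    {
        "key": "ray.io/node-type",
        "operator": "Equal",
        "value": "worker",
        "effect": "NoSchedule",
    }
]
head_tolerations = []  # the head Pod declares no tolerations

print(can_schedule(worker_tolerations, gpu_node_taints))  # True
print(can_schedule(head_tolerations, gpu_node_taints))    # False
```

On a real cluster, a taint of this shape is typically applied with `kubectl taint nodes <gpu-node-name> ray.io/node-type=worker:NoSchedule`, where `<gpu-node-name>` is a placeholder for your own node.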

Step 4: Forward the port of Serve

# Step 4.1: Wait until the RayService is ready to serve requests.
kubectl describe rayservices text-summarizer

# Step 4.2: Get the service name.
kubectl get services

# [Example output]
# text-summarizer-head-svc                    ClusterIP   None             <none>        10001/TCP,8265/TCP,6379/TCP,8080/TCP,8000/TCP   31s
# text-summarizer-raycluster-tb9zf-head-svc   ClusterIP   None             <none>        10001/TCP,8265/TCP,6379/TCP,8080/TCP,8000/TCP   108s
# text-summarizer-serve-svc                   ClusterIP   34.118.226.139   <none>        8000/TCP                                        31s

# Step 4.3: Forward the port of Serve.
kubectl port-forward svc/text-summarizer-serve-svc 8000
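
Step 4.2 can also be scripted. The sketch below picks the Serve service out of `kubectl get services` output by its `-serve-svc` suffix, which is the naming visible in the example output above. `find_serve_service` is a hypothetical helper, and the sample text is copied from that example output.

```python
from typing import Optional


def find_serve_service(kubectl_output: str, suffix: str = "-serve-svc") -> Optional[str]:
    """Return the first service name ending with `suffix`, or None."""
    for line in kubectl_output.splitlines():
        fields = line.split()
        # The service name is the first whitespace-separated column.
        if fields and fields[0].endswith(suffix):
            return fields[0]
    return None


# Rows copied from the example output in Step 4.2.
sample_output = """\
text-summarizer-head-svc                    ClusterIP   None             <none>        10001/TCP,8265/TCP,6379/TCP,8080/TCP,8000/TCP   31s
text-summarizer-raycluster-tb9zf-head-svc   ClusterIP   None             <none>        10001/TCP,8265/TCP,6379/TCP,8080/TCP,8000/TCP   108s
text-summarizer-serve-svc                   ClusterIP   34.118.226.139   <none>        8000/TCP                                        31s
"""

print(find_serve_service(sample_output))  # text-summarizer-serve-svc
```

To run this against a live cluster, the text could come from `subprocess.run(["kubectl", "get", "services"], capture_output=True, text=True, check=True).stdout` instead of the hard-coded sample.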

Step 5: Send a request to the text summarizer model

# Step 5.1: Download `text_summarizer_req.py`
curl -LO https://raw.githubusercontent.com/ray-project/serve_config_examples/master/text_summarizer/text_summarizer_req.py

# Step 5.2: Send a request to the Summarizer model.
python text_summarizer_req.py
# Check the summary printed to the console.
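
For orientation, a client like `text_summarizer_req.py` amounts to a single HTTP POST against the forwarded port. The sketch below is not the contents of that file. The `/summarize` route, the JSON list payload, and the helper names are assumptions, so check the downloaded script for the actual values. It uses only the Python standard library.

```python
import json
import urllib.request

# Assumption: port 8000 is forwarded as in Step 4.3; the route is hypothetical.
SERVE_URL = "http://127.0.0.1:8000/summarize"


def build_request(text: str, url: str = SERVE_URL) -> urllib.request.Request:
    """Build a JSON POST request carrying the text to summarize."""
    body = json.dumps([text]).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(text: str, url: str = SERVE_URL, timeout: float = 60.0) -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(build_request(text, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example call (needs the port-forward from Step 4.3 to be running):
# print(summarize("Ray Serve is a scalable model serving library built on Ray."))
```

Separating `build_request` from `summarize` keeps the payload inspectable without a running cluster, which helps when debugging a mismatch between the client and the deployed route.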

Step 6: Delete your service

kubectl delete -f https://raw.githubusercontent.com/ray-project/kuberay/master/ray-operator/config/samples/ray-service.text-summarizer.yaml

Step 7: Uninstall your KubeRay operator

Follow the KubeRay operator installation guide to uninstall the KubeRay operator that you installed from the Helm repository.