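RayService Zero-Downtime Incremental Upgrades

This guide describes how to configure and use the NewClusterWithIncrementalUpgrade strategy for a RayService in KubeRay. The feature was proposed in a Ray Enhancement Proposal (REP) and is implemented with alpha support in KubeRay v1.5.1. If you are not familiar with RayService and KubeRay, see the RayService Quickstart.

In earlier versions of KubeRay, zero-downtime upgrades were supported only through the NewCluster strategy. That strategy scales up a pending RayCluster to the same capacity as the active cluster, waits until the updated Serve applications are healthy, and then switches traffic to the new RayCluster. While reliable, it requires users to scale up to 200% of the original cluster's compute resources, which can be prohibitive when dealing with expensive accelerators.

The NewClusterWithIncrementalUpgrade strategy is designed for large-scale deployments, such as LLM serving, where resource constraints make it infeasible to duplicate resources for a standard blue-green deployment. It minimizes resource usage during a RayService CR upgrade while maintaining service availability. The sections below explain its design and usage.

Rather than creating a new RayCluster at 100% capacity, this strategy creates a new cluster and gradually scales up its capacity while shifting user traffic from the old cluster to the new one. This gradual traffic migration lets users safely scale their updated RayService while the old cluster automatically scales down, which saves expensive compute resources and gives finer control over the pace of the upgrade. The process relies on the Kubernetes Gateway API for fine-grained traffic splitting.

Quickstart: Performing an Incremental Upgrade

1. Prerequisites

Before using this feature, you **must** have the following set up in your Kubernetes cluster:

Gateway API CRDs: The K8s Gateway API resources must be installed. You can typically install them with: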
```
kubectl apply --server-side -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.4.0/standard-install.yaml
```
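The RayService controller uses GA Gateway API resources, such as Gateway and HTTPRoute, to safely split traffic during the upgrade.
Gateway controller: You must install a Gateway controller that implements the Gateway API, such as Istio, Contour, or a cloud-native implementation like GKE's Gateway controller. The feature should work with any controller that implements the Gateway API and supports the Gateway and HTTPRoute CRDs, but it is alpha and has been tested primarily with Istio.
GatewayClass resource: Your cluster admin must create a GatewayClass resource that defines which controller to use. KubeRay uses it to create the Gateway and HTTPRoute objects.

Example: Istio GatewayClass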
```
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
    name: istio
spec:
    controllerName: istio.io/gateway-controller
```
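Use the GatewayClass's metadata.name (for example, istio) as the gatewayClassName field in the RayService spec.
Ray Autoscaler: Incremental upgrades require the Ray Autoscaler to be enabled in your RayCluster spec, because KubeRay manages the upgrade by adjusting Ray Serve's target_capacity, which in turn adjusts the number of Serve replicas for each deployment. Those Serve replicas translate into resource load that the Ray Autoscaler takes into account when determining how many Pods KubeRay should provision. For how to enable and configure Ray autoscaling on Kubernetes, see KubeRay Autoscaling.

Example: Setting up a RayService on kind

The following instructions walk through the minimal steps to configure a cluster and trigger a zero-downtime incremental upgrade of a RayService with KubeRay.

Create a kind cluster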

kind create cluster --image=kindest/node:v1.29.0

We use v1.29.0 because it is known to be compatible with recent Istio versions.

Install Istio

istioctl install --set profile=demo -y

Install the Gateway API CRDs

kubectl apply --server-side -f https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.3.0/standard-install.yaml

Create a GatewayClass with the following spec:

echo "apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: istio
spec:
  controllerName: istio.io/gateway-controller" | kubectl apply -f -

kubectl get gatewayclass
NAME           CONTROLLER                    ACCEPTED   AGE
istio          istio.io/gateway-controller   True       4s
istio-remote   istio.io/unmanaged-gateway    True       3s

Install and configure MetalLB to provide LoadBalancer support on kind

kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.14.7/config/manifests/metallb-native.yaml

Create an IPAddressPool for MetalLB with the following spec:

echo "apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: kind-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.8.200-192.168.8.250 # adjust based on your subnets range
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default
  namespace: metallb-system
spec:
  ipAddressPools:
  - kind-pool" | kubectl apply -f -

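Install the KubeRay operator by following its installation instructions. The minimum version for this guide is v1.5.1. To use the feature, the RayServiceIncrementalUpgrade feature gate must be enabled. To enable it when installing the KubeRay operator, run: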

helm install kuberay-operator kuberay/kuberay-operator --version v1.5.1 \
  --set featureGates\[0\].name=RayServiceIncrementalUpgrade \
  --set featureGates\[0\].enabled=true

Create a RayService with incremental upgrade enabled.

kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/master/ray-operator/config/samples/ray-service.incremental-upgrade.yaml

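Update a field under rayClusterConfig and re-apply the RayService to trigger a zero-downtime upgrade.

2. How It Works: The Upgrade Process

Understanding the lifecycle of an incremental upgrade helps with monitoring and configuration.

Trigger: You trigger an upgrade by updating the RayService spec, for example by changing the container image or the resources used by a worker group in rayClusterConfig.
Pending cluster creation: KubeRay detects the change and creates a new, pending RayCluster. It sets this cluster's initial target_capacity (the percentage of Serve replicas it should run) to 0%.
Gateway and route creation: KubeRay creates a Gateway resource and an HTTPRoute resource for your RayService. The HTTPRoute initially routes 100% of traffic to the old, active cluster and 0% to the new, pending cluster.
Upgrade loop begins: The KubeRay controller now enters a loop that repeats three phases until the upgrade is complete. The loop ensures that total cluster capacity never exceeds 100% by more than maxSurgePercent, which prevents resource exhaustion.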

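Consider an example with maxSurgePercent: 20 and stepSizePercent: 5.
- Initial state
  - Active cluster target_capacity: 100%
  - Pending cluster target_capacity: 0%
  - Total capacity: 100%
Upgrade cycle
- Phase 1: Scale up the pending cluster (capacity)
  - KubeRay checks the total capacity (100%) and sees that it is \(\le\) 100%. It increases the **pending** cluster's target_capacity by maxSurgePercent.
  - Active target_capacity: 100%
  - Pending target_capacity: 0% \(\rightarrow\) **20%**
  - Total capacity: 120%
  - If the Ray Serve autoscaler is enabled, the Serve application scales its num_replicas up from min_replicas based on the new target_capacity. If it is not enabled, the new target_capacity value directly adjusts num_replicas for each Serve deployment. Based on the updated num_replicas, the Ray Autoscaler begins provisioning Pods for the pending cluster to handle the updated resource load.
- Phase 2: Migrate traffic (HTTPRoute)
  - KubeRay waits for the pending cluster's new Pods to become ready. Requests per second may dip temporarily while the worker Pods for the updated Ray Serve replicas are being created.
  - Once they are ready, it begins migrating traffic **gradually**. Every intervalSeconds, it updates the HTTPRoute weights to shift stepSizePercent (5%) of traffic from the active cluster to the pending cluster.
  - This continues until the **actual** traffic (trafficRoutedPercent) "catches up" to the pending cluster's target_capacity (20% in this example).
- Phase 3: Scale down the active cluster (capacity)
  - Once Phase 2 completes (trafficRoutedPercent == 20%), the loop runs again.
  - KubeRay checks the total capacity (120%) and sees that it is > 100%. It decreases the **active** cluster's target_capacity by maxSurgePercent.
  - Active target_capacity: 100% \(\rightarrow\) **80%**
  - Pending target_capacity: 20%
  - Total capacity: 100%
  - The Ray Autoscaler terminates Pods in the active cluster as they become idle.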

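Completion and cleanup: This cycle of **(scale up pending -> migrate traffic -> scale down active)** continues until the pending cluster's target_capacity and trafficRoutedPercent both reach 100% and the active cluster's target_capacity reaches 0%.

KubeRay then promotes the pending cluster to active, updates the HTTPRoute to send 100% of traffic to it, and safely terminates the old RayCluster.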
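
The three-phase loop described above can be sketched as a small simulation. This is a simplified model built only from the description in this guide, not KubeRay's controller code: the function name and the state tuples are hypothetical, and the model ignores Pod readiness, intervalSeconds timing, and any edge-case handling the real controller performs.

```python
def simulate_incremental_upgrade(max_surge_percent: int, step_size_percent: int):
    """Return (phase, active_capacity, pending_capacity, pending_traffic) states.

    Hypothetical model of the upgrade loop; all values are percentages.
    """
    for name, value in (("max_surge_percent", max_surge_percent),
                        ("step_size_percent", step_size_percent)):
        if not 1 <= value <= 100:
            raise ValueError(f"{name} must be between 1 and 100, got {value}")

    active, pending, traffic = 100, 0, 0
    states = [("initial", active, pending, traffic)]

    while (active, pending, traffic) != (0, 100, 100):
        # Phase 1: total capacity is at most 100%, so grow the pending cluster.
        if active + pending <= 100:
            pending = min(100, pending + max_surge_percent)
            states.append(("scale_up_pending", active, pending, traffic))

        # Phase 2: shift traffic in steps until it catches up with the
        # pending cluster's target capacity.
        while traffic < pending:
            traffic = min(pending, traffic + step_size_percent)
            states.append(("migrate_traffic", active, pending, traffic))

        # Phase 3: total capacity is above 100%, so shrink the active cluster.
        if active + pending > 100:
            active = max(0, active - max_surge_percent)
            states.append(("scale_down_active", active, pending, traffic))

    return states


states = simulate_incremental_upgrade(max_surge_percent=20, step_size_percent=5)
for phase, active, pending, traffic in states:
    print(f"{phase:<18} active={active:>3}% pending={pending:>3}% "
          f"traffic_to_pending={traffic:>3}%")
```

With the example values (maxSurgePercent: 20, stepSizePercent: 5), this model never exceeds 120% total capacity and finishes after five cycles.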

3. Example RayService Configuration

To use the feature, set upgradeStrategy.type to NewClusterWithIncrementalUpgrade and provide the required options.

apiVersion: ray.io/v1
kind: RayService
metadata:
  name: rayservice-incremental-upgrade
spec:
  # This is the main configuration block for the upgrade
  upgradeStrategy:
    # 1. Set the type to NewClusterWithIncrementalUpgrade
    type: "NewClusterWithIncrementalUpgrade"
    clusterUpgradeOptions:
      # 2. The name of your K8s GatewayClass
      gatewayClassName: "istio"

      # 3. Capacity scaling: Increase new cluster's target_capacity
      #    by 20% in each scaling step.
      maxSurgePercent: 20

      # 4. Traffic shifting: Move 5% of traffic from old to new
      #    cluster every intervalSeconds.
      stepSizePercent: 5

      # 5. Interval seconds controls the pace of traffic migration during the upgrade.
      intervalSeconds: 10

  # This is your Serve config
  serveConfigV2: |
    applications:
      - name: my_app
        import_path: my_model:app
        route_prefix: /
        deployments:
          - name: MyModel
            # num_replicas is omitted: Ray Serve rejects a fixed num_replicas when autoscaling_config is set.
            ray_actor_options:
              resources: { "GPU": 1 }
            autoscaling_config:
              min_replicas: 0
              max_replicas: 20

  # This is your RayCluster config (autoscaling must be enabled)
  rayClusterConfig:
    enableInTreeAutoscaling: true
    headGroupSpec:
      # ... head spec ...
    workerGroupSpecs:
    - groupName: gpu-worker
      replicas: 0
      minReplicas: 0
      maxReplicas: 20
      template:
        # ... pod spec with GPU requests ...

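4. Triggering an Upgrade

An incremental upgrade is triggered exactly like a standard zero-downtime upgrade in KubeRay: by modifying spec.rayClusterConfig in the RayService custom resource.

When KubeRay detects a change in the cluster spec (such as a new container image, modified resource limits, or updated environment variables), it computes a new hash. If the hash differs from the active cluster's and incremental upgrade is enabled, the NewClusterWithIncrementalUpgrade strategy starts automatically.

You can update the cluster spec by running kubectl apply -f on the updated YAML file, or by editing the CR directly with kubectl edit rayservice <your-rayservice-name>.

5. Monitoring the Upgrade

You can monitor the upgrade's progress by inspecting the RayService status and the HTTPRoute object.

Check the RayService status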
```
kubectl describe rayservice rayservice-incremental-upgrade
```
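Look at the Status section. You will see Active Service Status and Pending Service Status, which show the state of the two clusters. Pay close attention to these two new fields:
- Target Capacity: The percentage of replicas KubeRay is **telling** this cluster to scale to.
- Traffic Routed Percent: The percentage of traffic KubeRay is **currently** sending to this cluster through the Gateway.
During an upgrade, the pending cluster's Target Capacity increases in steps (for example, 20%, 40%) and Traffic Routed Percent climbs gradually to match it.
Check the HTTPRoute weights: You can also view the traffic weights directly on the HTTPRoute resource that KubeRay manages.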
```
kubectl get httproute rayservice-incremental-upgrade-httproute -o yaml
```
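Look at spec.rules.backendRefs. The weight of the old and new services changes in real time as traffic migration (Phase 2) progresses.

For example, the following output comes from a RayService named stress-test-serve: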

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  creationTimestamp: "2025-12-07T07:42:24Z"
  generation: 10
  name: stress-test-serve-httproute
  namespace: default
  ownerReferences:
  - apiVersion: ray.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: RayService
    name: stress-test-serve
    uid: 83a785cc-8745-4ccd-9973-2fc9f27000cc
  resourceVersion: "3714"
  uid: 660b14b5-78df-4507-b818-05989b1ef806
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: stress-test-serve-gateway
    namespace: default
  rules:
  - backendRefs:
    - group: ""
      kind: Service
      name: stress-test-serve-f6z4w-serve-svc
      namespace: default
      port: 8000
      weight: 90
    - group: ""
      kind: Service
      name: stress-test-serve-xclvf-serve-svc
      namespace: default
      port: 8000
      weight: 10
    matches:
    - path:
        type: PathPrefix
        value: /
status:
  parents:
  - conditions:
    - lastTransitionTime: "2025-12-07T07:42:24Z"
      message: Route was valid
      observedGeneration: 10
      reason: Accepted
      status: "True"
      type: Accepted
    - lastTransitionTime: "2025-12-07T07:42:24Z"
      message: All references resolved
      observedGeneration: 10
      reason: ResolvedRefs
      status: "True"
      type: ResolvedRefs
    controllerName: istio.io/gateway-controller
    parentRef:
      group: gateway.networking.k8s.io
      kind: Gateway
      name: stress-test-serve-gateway
      namespace: default
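
To follow the split without reading the full manifest, the weights can be pulled out of the JSON form of the same object (for example, from kubectl get httproute <name> -o json). The sketch below is a minimal illustration: the helper name is hypothetical, the embedded sample is a trimmed copy of the example above, and it assumes the HTTPRoute has a single rule, as in that example.

```python
import json

# Trimmed sample of the HTTPRoute shown above, in JSON form. Only the fields
# read by the helper are kept.
SAMPLE_HTTPROUTE_JSON = """
{
  "spec": {
    "rules": [
      {
        "backendRefs": [
          {"name": "stress-test-serve-f6z4w-serve-svc", "port": 8000, "weight": 90},
          {"name": "stress-test-serve-xclvf-serve-svc", "port": 8000, "weight": 10}
        ]
      }
    ]
  }
}
"""


def backend_traffic_split(httproute: dict) -> dict[str, float]:
    """Return each backend Service's share of traffic, in percent.

    Hypothetical helper; assumes the route's first rule holds the split.
    """
    rules = httproute.get("spec", {}).get("rules", [])
    if not rules:
        return {}
    # The Gateway API treats an omitted weight as 1.
    weights = {ref["name"]: ref.get("weight", 1)
               for ref in rules[0].get("backendRefs", [])}
    total = sum(weights.values())
    if total == 0:
        return {name: 0.0 for name in weights}
    return {name: 100 * weight / total for name, weight in weights.items()}


split = backend_traffic_split(json.loads(SAMPLE_HTTPROUTE_JSON))
for service, percent in split.items():
    print(f"{service}: {percent:.0f}%")
```

Normalizing by the total keeps the result meaningful even if the weights on a route do not sum to 100.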

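How to Upgrade Safely

Because this feature is alpha and does not yet support rollback, we recommend conservative parameter settings to minimize risk during an upgrade.

Calculating maxSurgePercent

maxSurgePercent determines the maximum percentage of additional resources that can be provisioned during the upgrade. To calculate the minimum safe value:

\[ \text{maxSurgePercent} = \frac{\text{resources per Pod}}{\text{total cluster resources}} \times 100 \tag{1} \]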

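Example

Consider a RayCluster with the following configuration:

- excludeHeadService: true
- Head Pod: no GPUs
- 5 worker Pods with 1 GPU each (total: 5 GPUs)

For this cluster:

\[ \text{maxSurgePercent} = \frac{1 \text{ GPU}}{5 \text{ GPUs}} \times 100 = 20\% \tag{2} \]

With maxSurgePercent: 20, the upgrade process ensures that:

- The new cluster scales up **1 worker Pod** at a time (20% of 5 = 1 Pod)
- The old cluster scales down **1 worker Pod** at a time
- Your cluster temporarily uses 6 GPUs during the transition (the original 5 plus 1 new)

This configuration ensures you have enough resources to run at least one additional worker Pod during the upgrade without resource contention.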
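
The calculation above can be written as a small helper. This is only a sketch of the formula: the function name is hypothetical, and rounding up to a whole percent is an assumption made here (the field is an integer, and rounding down could leave less than one Pod of headroom), not documented KubeRay behavior.

```python
def min_safe_max_surge_percent(resources_per_pod: int, total_cluster_resources: int) -> int:
    """Smallest whole percentage that covers one Pod's worth of resources.

    Hypothetical helper; both arguments use the same unit (for example, GPUs).
    """
    if resources_per_pod <= 0 or total_cluster_resources <= 0:
        raise ValueError("resource counts must be positive")
    if resources_per_pod > total_cluster_resources:
        raise ValueError("a single Pod cannot exceed the cluster total")
    # Integer ceiling division avoids floating-point rounding surprises.
    return (100 * resources_per_pod + total_cluster_resources - 1) // total_cluster_resources


# The example cluster: 5 worker Pods with 1 GPU each.
print(min_safe_max_surge_percent(resources_per_pod=1, total_cluster_resources=5))  # 20
```

For a cluster whose size does not divide 100 evenly, such as 8 single-GPU workers, the helper returns 13 rather than 12.5.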

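Understanding intervalSeconds

Set intervalSeconds to 60 seconds so that the Ray Serve autoscaler and the Ray Autoscaler have enough time to:

- Detect load changes
- Immediately scale replicas up or down to enforce the new min_replicas and max_replicas limits (via target_capacity)
  - Scale replicas down immediately if they exceed the new max_replicas
  - Scale replicas up immediately if they fall below the new min_replicas
- Provision resources

A larger interval prevents the upgrade controller from making changes faster than the autoscalers can react, which reduces the risk of service disruption.

Example Configuration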

upgradeStrategy:
  type: NewClusterWithIncrementalUpgrade
  clusterUpgradeOptions:
    gatewayClassName: istio
    maxSurgePercent: 20  # Calculated: (1 GPU / 5 GPUs) × 100
    stepSizePercent: 10  # Less than maxSurgePercent
    intervalSeconds: 60  # Wait 1 minute between steps
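
As a rough planning aid, the traffic-migration settings give a lower bound on how long an upgrade takes. The sketch below assumes each stepSizePercent shift takes exactly one intervalSeconds and ignores the time spent waiting for Pods to be provisioned or to become ready, so real upgrades take longer. The function name is hypothetical.

```python
def min_traffic_migration_seconds(step_size_percent: int, interval_seconds: int) -> int:
    """Lower bound on the time needed to shift 100% of traffic.

    Hypothetical helper; counts only the traffic steps, not Pod startup time.
    """
    if not 1 <= step_size_percent <= 100:
        raise ValueError("step_size_percent must be between 1 and 100")
    if interval_seconds < 0:
        raise ValueError("interval_seconds must not be negative")
    # Number of steps needed to reach 100%, rounded up.
    steps = (100 + step_size_percent - 1) // step_size_percent
    return steps * interval_seconds


# Values from the example configuration: 10% every 60 seconds.
print(min_traffic_migration_seconds(step_size_percent=10, interval_seconds=60))  # 600
```

With stepSizePercent: 10 and intervalSeconds: 60, traffic migration alone accounts for at least 10 minutes.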

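API Overview (Reference)

This section details the new and updated fields in the RayService CRD.

`RayService.spec.upgradeStrategy`

| Field | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| `type` | `string` | The upgrade strategy. One of `NewCluster`, `None`, or `NewClusterWithIncrementalUpgrade`. | No | `NewCluster` |
| `clusterUpgradeOptions` | `object` | Container for the incremental upgrade settings. Required if `type` is set to `NewClusterWithIncrementalUpgrade`. The `RayServiceIncrementalUpgrade` feature gate must be enabled. | No | `nil` |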

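`RayService.spec.upgradeStrategy.clusterUpgradeOptions`

This block is required **only** when type is set to NewClusterWithIncrementalUpgrade.

| Field | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| `maxSurgePercent` | `int32` | The percentage of capacity (Serve replicas) to add to the new cluster in each scale-up step. For example, a value of `20` means the new cluster's `target_capacity` increases in 20% increments (0% -> 20% -> 40%...). Must be between 0 and 100. | No | `100` |
| `stepSizePercent` | `int32` | The percentage of traffic to shift from the old cluster to the new cluster in each interval. Must be between 0 and 100. | Yes | N/A |
| `intervalSeconds` | `int32` | The number of seconds to wait between traffic shifts of `stepSizePercent`. | Yes | N/A |
| `gatewayClassName` | `string` | The `metadata.name` of the `GatewayClass` resource that KubeRay should use to create the `Gateway` and `HTTPRoute` objects. | Yes | N/A |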

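`RayService.status.activeServiceStatus` & `RayService.status.pendingServiceStatus`

Three new fields are added to both the activeServiceStatus and pendingServiceStatus blocks to provide visibility into the upgrade process.

| Field | Type | Description |
| --- | --- | --- |
| `targetCapacity` | `int32` | The target percentage of Serve replicas this cluster is configured to handle (0 to 100). KubeRay controls it based on `maxSurgePercent`. |
| `trafficRoutedPercent` | `int32` | The actual percentage of traffic currently routed to this cluster's endpoint (0 to 100). KubeRay controls it during an upgrade based on `stepSizePercent` and `intervalSeconds`. |
| `lastTrafficMigratedTime` | `metav1.Time` | Timestamp of the last update to `trafficRoutedPercent`. |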

Next Steps

For more on deploying Ray Serve with KubeRay, see Deploy on Kubernetes.
For how to configure Serve deployments to scale with traffic load, see Ray Serve Autoscaling.