Key Concepts

Deployment

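Deployments are the central concept in Ray Serve. A deployment contains the business logic or ML model that handles incoming requests, and it can be scaled up to run across a Ray cluster. At runtime, a deployment consists of a number of replicas: independent copies of the class or function, each started in a separate Ray actor (process). The number of replicas can be scaled up or down, or even autoscaled, to match the incoming request load.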

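To define a deployment, use the @serve.deployment decorator on a Python class (or on a function for simple use cases). Then, bind the deployment, optionally passing arguments to its constructor, to define an application. Finally, deploy the resulting application using serve.run (or the equivalent serve run CLI command; see Development Workflow for details).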

from ray import serve
from ray.serve.handle import DeploymentHandle


@serve.deployment
class MyFirstDeployment:
    # Take the message to return as an argument to the constructor.
    def __init__(self, msg):
        self.msg = msg

    def __call__(self):
        return self.msg


my_first_deployment = MyFirstDeployment.bind("Hello world!")
handle: DeploymentHandle = serve.run(my_first_deployment)
assert handle.remote().result() == "Hello world!"

Application

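An application is the unit of upgrade in a Ray Serve cluster. An application consists of one or more deployments. One of these deployments is considered the “ingress” deployment, which handles all inbound traffic.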

Applications can be called over HTTP at their specified route_prefix, or from Python using a DeploymentHandle.

DeploymentHandle (composing deployments)

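Ray Serve enables flexible model composition and scaling by allowing multiple independent deployments to call into each other. When binding a deployment, you can include references to other bound deployments. Then, at runtime, each of these arguments is converted to a DeploymentHandle that can be used to query that deployment through a Python-native API. Below is a basic example in which the Ingress deployment calls into two downstream models. For a more comprehensive guide, see the model composition guide.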

from ray import serve
from ray.serve.handle import DeploymentHandle


@serve.deployment
class Hello:
    def __call__(self) -> str:
        return "Hello"


@serve.deployment
class World:
    def __call__(self) -> str:
        return " world!"


@serve.deployment
class Ingress:
    def __init__(self, hello_handle: DeploymentHandle, world_handle: DeploymentHandle):
        self._hello_handle = hello_handle
        self._world_handle = world_handle

    async def __call__(self) -> str:
        hello_response = self._hello_handle.remote()
        world_response = self._world_handle.remote()
        return (await hello_response) + (await world_response)


hello = Hello.bind()
world = World.bind()

# The deployments passed to the Ingress constructor are replaced with handles.
app = Ingress.bind(hello, world)

# Deploys Hello, World, and Ingress.
handle: DeploymentHandle = serve.run(app)

# `DeploymentHandle`s can also be used to call the ingress deployment of an application.
assert handle.remote().result() == "Hello world!"

Ingress deployment (HTTP handling)

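A Serve application can consist of multiple deployments that are combined to perform model composition or complex business logic. However, one deployment is always the “top-level” one that is passed to serve.run to deploy the application. This deployment is called the “ingress deployment” because it serves as the entrypoint for all traffic to the application. Typically, it routes to other deployments or calls into them using the DeploymentHandle API, and composes their results before returning a response to the user.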

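The ingress deployment defines the HTTP handling logic for the application. By default, the class's __call__ method is invoked and passed a Starlette request object. The return value is serialized as JSON, but you can also return other Starlette response objects directly. Here is an example: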

import requests
from starlette.requests import Request

from ray import serve


@serve.deployment
class MostBasicIngress:
    async def __call__(self, request: Request) -> str:
        name = (await request.json())["name"]
        return f"Hello {name}!"


app = MostBasicIngress.bind()
serve.run(app)
assert (
    requests.get("http://127.0.0.1:8000/", json={"name": "Corey"}).text
    == "Hello Corey!"
)

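After binding the deployment and running serve.run(), the deployment is exposed by the HTTP server and handles requests using the specified class. We can query the model with requests to verify that it's working.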

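For more expressive HTTP handling, Serve also comes with a built-in integration with FastAPI. This lets you use the full feature set of FastAPI, such as path parameters, multiple routes, and request validation, to define more complex APIs.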

import requests
from fastapi import FastAPI
from fastapi.responses import PlainTextResponse

from ray import serve

fastapi_app = FastAPI()


@serve.deployment
@serve.ingress(fastapi_app)
class FastAPIIngress:
    @fastapi_app.get("/{name}")
    async def say_hi(self, name: str) -> PlainTextResponse:
        return PlainTextResponse(f"Hello {name}!")


app = FastAPIIngress.bind()
serve.run(app)
assert requests.get("http://127.0.0.1:8000/Corey").text == "Hello Corey!"

What's next?

Now that you have learned the key concepts, you can dive into these guides: