Ray Serve: Scalable and Programmable Serving

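Ray Serve is a scalable model serving library for building online inference APIs. Serve is framework-agnostic, so you can use a single toolkit to serve everything from deep learning models built with frameworks such as PyTorch, TensorFlow, and Keras, to Scikit-Learn models, to arbitrary Python business logic. It offers several features and performance optimizations for serving large language models (LLMs), such as response streaming, dynamic request batching, and multi-node/multi-GPU serving.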

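Ray Serve is particularly well suited to model composition and many-model serving. It lets you build a complex inference service made up of multiple ML models and business logic, all in Python code.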

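Ray Serve is built on top of Ray, so it scales easily to many machines and offers flexible scheduling support such as fractional GPUs. This lets you share resources and serve many machine learning models at low cost.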
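As a rough illustration of fractional GPUs, the sketch below requests half a GPU per replica through the `ray_actor_options` argument of `serve.deployment`. The names `replicas_per_gpu`, `build_app`, and `SharedGpuModel` are hypothetical and exist only for this example. The fraction is a logical scheduling quantity, so keeping each replica within its share of GPU memory remains the model's responsibility.

```python
import math


def replicas_per_gpu(num_gpus_per_replica: float) -> int:
    """Replicas that fit on one GPU by logical resource accounting (illustrative helper)."""
    if not 0 < num_gpus_per_replica <= 1:
        raise ValueError("expected a fraction in (0, 1]")
    # Small epsilon guards against floating-point rounding just below an integer.
    return math.floor(1 / num_gpus_per_replica + 1e-9)


def build_app():
    # Imported lazily so the helper above works without Ray installed.
    from ray import serve

    # Each replica reserves half a GPU, so two replicas can share one device.
    @serve.deployment(ray_actor_options={"num_gpus": 0.5})
    class SharedGpuModel:
        def __call__(self, request) -> dict:
            # Placeholder response; a real deployment would run a model here.
            return {"status": "ok"}

    return SharedGpuModel.bind()


print(replicas_per_gpu(0.5))
# 2
```

On a machine with at least one GPU, the application could then be started with `serve.run(build_app())`. Without a GPU, the replica stays pending because its resource request cannot be satisfied.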

Quickstart

Install Ray Serve and its dependencies:

pip install "ray[serve]"

Define a simple "hello world" application, run it locally, and query it over HTTP.

import requests
from starlette.requests import Request
from typing import Dict

from ray import serve


# 1: Define a Ray Serve application.
@serve.deployment
class MyModelDeployment:
    def __init__(self, msg: str):
        # Initialize model state: could be very large neural net weights.
        self._msg = msg

    def __call__(self, request: Request) -> Dict:
        return {"result": self._msg}


app = MyModelDeployment.bind(msg="Hello world!")

# 2: Deploy the application locally.
serve.run(app, route_prefix="/")

# 3: Query the application and print the result.
print(requests.get("http://localhost:8000/").json())
# {'result': 'Hello world!'}

More examples

Model composition

Use Serve's model composition API to combine multiple deployments into a single application.

import requests
import starlette
from typing import Dict
from ray import serve
from ray.serve.handle import DeploymentHandle


# 1. Define the models in our composition graph and an ingress that calls them.
@serve.deployment
class Adder:
    def __init__(self, increment: int):
        self.increment = increment

    def add(self, inp: int):
        return self.increment + inp


@serve.deployment
class Combiner:
    def average(self, *inputs) -> float:
        return sum(inputs) / len(inputs)


@serve.deployment
class Ingress:
    def __init__(
        self,
        adder1: DeploymentHandle,
        adder2: DeploymentHandle,
        combiner: DeploymentHandle,
    ):
        self._adder1 = adder1
        self._adder2 = adder2
        self._combiner = combiner

    async def __call__(self, request: starlette.requests.Request) -> Dict[str, float]:
        input_json = await request.json()
        final_result = await self._combiner.average.remote(
            self._adder1.add.remote(input_json["val"]),
            self._adder2.add.remote(input_json["val"]),
        )
        return {"result": final_result}


# 2. Build the application consisting of the models and ingress.
app = Ingress.bind(Adder.bind(increment=1), Adder.bind(increment=2), Combiner.bind())
serve.run(app)

# 3: Query the application and print the result.
print(requests.post("http://localhost:8000/", json={"val": 100.0}).json())
# {'result': 101.5}
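
The expected result can be checked by hand. The sketch below mirrors the same data flow with plain Python functions and no Serve deployments. The helpers `add` and `average` are illustrative stand-ins for the `Adder.add` and `Combiner.average` methods, and the sketch shows only the arithmetic, not how Serve routes the calls between replicas.

```python
# Plain-Python mirror of the composition above (no Ray involved).


def add(increment: int, inp: float) -> float:
    # Same arithmetic as Adder.add.
    return increment + inp


def average(*inputs: float) -> float:
    # Same arithmetic as Combiner.average.
    return sum(inputs) / len(inputs)


val = 100.0
# adder1 yields 101.0, adder2 yields 102.0; the combiner averages them.
result = average(add(1, val), add(2, val))
print({"result": result})
# {'result': 101.5}
```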

FastAPI integration

Use Serve's FastAPI integration to handle HTTP parsing and validation elegantly.

import requests
from fastapi import FastAPI
from ray import serve

# 1: Define a FastAPI app and wrap it in a deployment with a route handler.
app = FastAPI()


@serve.deployment
@serve.ingress(app)
class FastAPIDeployment:
    # FastAPI will automatically parse the HTTP request for us.
    @app.get("/hello")
    def say_hello(self, name: str) -> str:
        return f"Hello {name}!"


# 2: Deploy the deployment.
serve.run(FastAPIDeployment.bind(), route_prefix="/")

# 3: Query the deployment and print the result.
print(requests.get("http://localhost:8000/hello", params={"name": "Theodore"}).json())
# Hello Theodore!

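Hugging Face Transformers model

To run this example, install the following: pip install transformers

Serve a pre-trained Hugging Face Transformers model using Ray Serve. The model used here is a sentiment analysis model: it takes a text string as input and returns whether the text is "POSITIVE" or "NEGATIVE".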

import requests
from starlette.requests import Request
from typing import Dict

from transformers import pipeline

from ray import serve


# 1: Wrap the pretrained sentiment analysis model in a Serve deployment.
@serve.deployment
class SentimentAnalysisDeployment:
    def __init__(self):
        self._model = pipeline("sentiment-analysis")

    def __call__(self, request: Request) -> Dict:
        return self._model(request.query_params["text"])[0]


# 2: Deploy the deployment.
serve.run(SentimentAnalysisDeployment.bind(), route_prefix="/")

# 3: Query the deployment and print the result.
print(
    requests.get(
        "http://localhost:8000/", params={"text": "Ray Serve is great!"}
    ).json()
)
# {'label': 'POSITIVE', 'score': 0.9998476505279541}