Ray Serve: Scalable and Programmable Serving
Ray Serve is a scalable model serving library for building online inference APIs. Serve is framework-agnostic, so you can use a single toolkit to serve everything from deep learning models built with frameworks like PyTorch, TensorFlow, and Keras, to Scikit-Learn models, to arbitrary Python business logic. It has several features and performance optimizations for serving Large Language Models (LLMs), such as response streaming, dynamic request batching, multi-node/multi-GPU serving, and more.
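As a minimal sketch of the dynamic request batching mentioned above (the deployment and its toy "doubling model" are illustrative, not an example from this page), concurrent requests can be grouped into a single batched call with the serve.batch decorator:

from typing import Dict, List
from starlette.requests import Request
from ray import serve


@serve.deployment
class BatchedDoubler:
    # serve.batch groups concurrent calls: the method receives a list of
    # inputs and must return one output per input, in the same order.
    @serve.batch(max_batch_size=8, batch_wait_timeout_s=0.1)
    async def double_batch(self, vals: List[float]) -> List[float]:
        # A real model would run one vectorized forward pass here.
        return [2 * v for v in vals]

    async def __call__(self, request: Request) -> Dict:
        # Each request submits a single value; Serve assembles the batch.
        val = float(request.query_params["val"])
        return {"result": await self.double_batch(val)}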
Ray Serve is particularly well suited for model composition and many-model serving, enabling you to build a complex inference service consisting of multiple ML models and business logic, all in Python code.
Ray Serve is built on top of Ray, so it easily scales to many machines and offers flexible scheduling support such as fractional GPUs, so you can share resources and serve many machine learning models at low cost.
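As a sketch of the fractional-GPU scheduling mentioned above (the replica count and fractions here are illustrative), a deployment declares its per-replica resources through ray_actor_options, so four such replicas can share a single GPU:

from ray import serve


# Each replica reserves a quarter of a GPU; Ray schedules replicas so
# that four of them fit on one physical GPU.
@serve.deployment(num_replicas=4, ray_actor_options={"num_gpus": 0.25})
class SmallModel:
    def __call__(self, request) -> str:
        return "ok"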
Quickstart
Install Ray Serve and its dependencies:
pip install "ray[serve]"
Define a simple "hello world" application, run it locally, and query it over HTTP.
import requests
from starlette.requests import Request
from typing import Dict

from ray import serve


# 1: Define a Ray Serve application.
@serve.deployment
class MyModelDeployment:
    def __init__(self, msg: str):
        # Initialize model state: could be very large neural net weights.
        self._msg = msg

    def __call__(self, request: Request) -> Dict:
        return {"result": self._msg}


app = MyModelDeployment.bind(msg="Hello world!")

# 2: Deploy the application locally.
serve.run(app, route_prefix="/")

# 3: Query the application and print the result.
print(requests.get("http://localhost:8000/").json())
# {'result': 'Hello world!'}
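As an aside, Serve also ships a CLI. Assuming you save the deployment definition and the app = MyModelDeployment.bind(...) line in a file named hello.py (without the serve.run and requests lines, since the CLI handles deployment itself), you can deploy the same application from the command line:

serve run hello:app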
More Examples
Use Serve's model composition API to combine multiple deployments into a single application.
import requests
import starlette
from typing import Dict

from ray import serve
from ray.serve.handle import DeploymentHandle


# 1. Define the models in our composition graph and an ingress that calls them.
@serve.deployment
class Adder:
    def __init__(self, increment: int):
        self.increment = increment

    def add(self, inp: int):
        return self.increment + inp


@serve.deployment
class Combiner:
    def average(self, *inputs) -> float:
        return sum(inputs) / len(inputs)


@serve.deployment
class Ingress:
    def __init__(
        self,
        adder1: DeploymentHandle,
        adder2: DeploymentHandle,
        combiner: DeploymentHandle,
    ):
        self._adder1 = adder1
        self._adder2 = adder2
        self._combiner = combiner

    async def __call__(self, request: starlette.requests.Request) -> Dict[str, float]:
        input_json = await request.json()
        final_result = await self._combiner.average.remote(
            self._adder1.add.remote(input_json["val"]),
            self._adder2.add.remote(input_json["val"]),
        )
        return {"result": final_result}


# 2. Build the application consisting of the models and ingress.
app = Ingress.bind(Adder.bind(increment=1), Adder.bind(increment=2), Combiner.bind())
serve.run(app)

# 3: Query the application and print the result.
print(requests.post("http://localhost:8000/", json={"val": 100.0}).json())
# {"result": 101.5}
Use Serve's FastAPI integration to elegantly handle HTTP parsing and validation.
import requests
from fastapi import FastAPI

from ray import serve

# 1: Define a FastAPI app and wrap it in a deployment with a route handler.
app = FastAPI()


@serve.deployment
@serve.ingress(app)
class FastAPIDeployment:
    # FastAPI will automatically parse the HTTP request for us.
    @app.get("/hello")
    def say_hello(self, name: str) -> str:
        return f"Hello {name}!"


# 2: Deploy the deployment.
serve.run(FastAPIDeployment.bind(), route_prefix="/")

# 3: Query the deployment and print the result.
print(requests.get("http://localhost:8000/hello", params={"name": "Theodore"}).json())
# "Hello Theodore!"
To run this example, install the following: pip install transformers
Use Ray Serve to serve a pretrained Hugging Face Transformers model. The model we'll use is a sentiment analysis model: it takes a text string as input and returns whether the text is "POSITIVE" or "NEGATIVE".
import requests
from starlette.requests import Request
from typing import Dict

from transformers import pipeline

from ray import serve


# 1: Wrap the pretrained sentiment analysis model in a Serve deployment.
@serve.deployment
class SentimentAnalysisDeployment:
    def __init__(self):
        self._model = pipeline("sentiment-analysis")

    def __call__(self, request: Request) -> Dict:
        return self._model(request.query_params["text"])[0]


# 2: Deploy the deployment.
serve.run(SentimentAnalysisDeployment.bind(), route_prefix="/")

# 3: Query the deployment and print the result.
print(
    requests.get(
        "http://localhost:8000/", params={"text": "Ray Serve is great!"}
    ).json()
)
# {'label': 'POSITIVE', 'score': 0.9998476505279541}
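If a GPU is available, you could (as an illustrative variation, not part of the example above) reserve one per replica and place the Hugging Face pipeline on that device:

# Reserve one GPU per replica and load the pipeline onto it.
@serve.deployment(ray_actor_options={"num_gpus": 1})
class SentimentAnalysisDeployment:
    def __init__(self):
        # device=0 runs the pipeline on the allocated GPU.
        self._model = pipeline("sentiment-analysis", device=0)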
Why choose Serve?
How does Serve help me as a …
How does Serve compare to …
We truly believe Serve is unique because it gives you end-to-end control over your ML application while delivering scalability and high performance. To achieve Serve's features with other tools, you would need to glue together multiple frameworks such as TensorFlow Serving and SageMaker, or even roll your own micro-batching component to improve throughput.
Learn More
Check out the Quickstart and Key Concepts, or head over to the Examples to get started building your Ray Serve applications.
Quickstart
Get started with our quickstart tutorials for deploying a single model locally, and learn how to convert an existing model into a Ray Serve deployment.
Key Concepts
Learn the key concepts behind Ray Serve: deployments, how to query them, and how to compose multiple models and business logic together using DeploymentHandles.
Examples
Follow the tutorials to learn how to integrate Ray Serve with TensorFlow and Scikit-Learn.
API Reference
Get more in-depth information about the Ray Serve API.
For more, see the following blog posts about Ray Serve:
Serving ML models in production: common patterns by Simon Mo, Edward Oakes, and Michael Galarnyk
The simplest way to serve your NLP model in production with pure Python by Edward Oakes and Bill Chambers
Machine Learning Serving is Broken by Simon Mo
How to Scale Up Your FastAPI Application Using Ray Serve by Archit Kulkarni