Serving DeepSeek

This example shows how to deploy DeepSeek R1 or V3 with Ray Serve LLM.

Installation

To run this example, install the following:

pip install "ray[llm]"

Deployment

Quick deployment

For quick deployment and testing, save the following code to a file named deepseek.py and run python3 deepseek.py.

from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config={
        "model_id": "deepseek",
        "model_source": "deepseek-ai/DeepSeek-R1",
    },
    deployment_config={
        "autoscaling_config": {
            "min_replicas": 1,
            "max_replicas": 1,
        }
    },
    # Change to the accelerator type of the node
    accelerator_type="H100",
    runtime_env={"env_vars": {"VLLM_USE_V1": "1"}},
    # Customize engine arguments as needed (e.g. vLLM engine kwargs)
    engine_kwargs={
        "tensor_parallel_size": 8,
        "pipeline_parallel_size": 2,
        "gpu_memory_utilization": 0.92,
        "dtype": "auto",
        "max_num_seqs": 40,
        "max_model_len": 16384,
        "enable_chunked_prefill": True,
        "enable_prefix_caching": True,
        "trust_remote_code": True,
    },
)

# Deploy the application
llm_app = build_openai_app({"llm_configs": [llm_config]})
serve.run(llm_app)

Production deployment

For production deployments, save the following to a YAML file named deepseek.yaml and run serve run deepseek.yaml.

applications:
- args:
    llm_configs:
      - model_loading_config:
          model_id: "deepseek"
          model_source: "deepseek-ai/DeepSeek-R1"
        accelerator_type: "H100"
        deployment_config:
          autoscaling_config:
            min_replicas: 1
            max_replicas: 1
        runtime_env:
          env_vars:
            VLLM_USE_V1: "1"
        engine_kwargs:
          tensor_parallel_size: 8
          pipeline_parallel_size: 2
          gpu_memory_utilization: 0.92
          dtype: "auto"
          max_num_seqs: 40
          max_model_len: 16384
          enable_chunked_prefill: true
          enable_prefix_caching: true
          trust_remote_code: true
  import_path: ray.serve.llm:build_openai_app
  name: llm_app
  route_prefix: "/"

Configuration

You may need to adjust the configuration in the code above for your setup, in particular:

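accelerator_type: for NVIDIA GPUs, DeepSeek requires Hopper GPUs or later. Depending on your hardware, you can specify H200, H100, H20, and so on.
tensor_parallel_size and pipeline_parallel_size: DeepSeek requires either a single node with 8xH200 GPUs or two nodes with 8xH100 GPUs each. A typical H100 setup sets tensor_parallel_size to 8 and pipeline_parallel_size to 2, as in the code examples. With H200, you can set tensor_parallel_size to 8 and omit pipeline_parallel_size (it defaults to 1).
model_source: although you can specify a HuggingFace model ID such as deepseek-ai/DeepSeek-R1, as in the code examples, pre-downloading the model is recommended because it is very large. You can download it to the local file system (for example, /path/to/downloaded/model) or to a remote object store (for example, s3://my-bucket/path/to/downloaded/model) and pass that location as model_source. Downloading to a remote object store with the Ray model caching utilities is the recommended approach. If you have two nodes and want to use the local file system, you must download the model to the same path on both nodes.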
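The hardware requirements above follow from simple arithmetic: each replica occupies tensor_parallel_size times pipeline_parallel_size GPUs. The sketch below works through that for the H100 and H200 setups described here. The helper names `gpus_per_replica` and `nodes_needed`, and the assumption of 8 GPUs per node, are hypothetical and for illustration only; they are not part of Ray Serve LLM.

```python
import math

# Hypothetical helpers (not a Ray Serve LLM API): estimate the GPUs and
# 8-GPU nodes one replica needs from the parallelism settings in engine_kwargs.

def gpus_per_replica(engine_kwargs: dict) -> int:
    tp = engine_kwargs.get("tensor_parallel_size", 1)
    pp = engine_kwargs.get("pipeline_parallel_size", 1)  # 1 when omitted
    return tp * pp

def nodes_needed(engine_kwargs: dict, gpus_per_node: int = 8) -> int:
    # Assumes every node has the same number of GPUs.
    return math.ceil(gpus_per_replica(engine_kwargs) / gpus_per_node)

# H100: tensor_parallel_size=8, pipeline_parallel_size=2 -> 16 GPUs on 2 nodes
h100_kwargs = {"tensor_parallel_size": 8, "pipeline_parallel_size": 2}
# H200: tensor_parallel_size=8, pipeline_parallel_size omitted -> 8 GPUs on 1 node
h200_kwargs = {"tensor_parallel_size": 8}

print(gpus_per_replica(h100_kwargs), nodes_needed(h100_kwargs))  # 16 2
print(gpus_per_replica(h200_kwargs), nodes_needed(h200_kwargs))  # 8 1
```

Because min_replicas and max_replicas are both 1 in the examples, the whole deployment needs exactly one replica's worth of GPUs.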

Testing the service

You can query the deployed model with the following request; a sample response follows it.

Request

curl -X POST http://localhost:8000/v1/chat/completions \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer fake-key" \
     -d '{
           "model": "deepseek",
           "messages": [{"role": "user", "content": "Hello!"}]
         }'
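If you prefer Python over curl, the same request can be built with only the standard library. This is a minimal sketch, assuming the service is reachable at localhost:8000 and, as in the curl example, accepts the placeholder token "fake-key"; the function names `build_chat_request` and `send` are hypothetical.

```python
import json
import urllib.request

# Assumption: the service from this example listens on localhost:8000.
BASE_URL = "http://localhost:8000"

def build_chat_request(prompt: str, model: str = "deepseek") -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer fake-key",  # placeholder token, as in the curl example
        },
        method="POST",
    )

def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body (requires the service to be running)."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the deployment running, `send(build_chat_request("Hello!"))` returns the decoded response as a dictionary. Separating construction from sending makes it easy to inspect the request before any network call is made.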

Response

{"id":"deepseek-68b5d5c5-fd34-42fc-be26-0a36f8457ffe","object":"chat.completion","created":1743646776,"model":"deepseek","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"Hello! How can I assist you today? 😊","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":6,"total_tokens":18,"completion_tokens":12,"prompt_tokens_details":null},"prompt_logprobs":null}
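The response follows the OpenAI chat completion format: the generated text is in choices[0].message.content and token counts are in usage. The sketch below extracts those fields from a trimmed copy of the response above; the helper name `summarize` is hypothetical. In both sample responses on this page, total_tokens equals prompt_tokens plus completion_tokens (6 + 12 = 18 here), which the sketch checks.

```python
import json

# The first sample response above, trimmed to the fields used here.
raw = (
    '{"id":"deepseek-68b5d5c5-fd34-42fc-be26-0a36f8457ffe","object":"chat.completion",'
    '"model":"deepseek","choices":[{"index":0,"message":{"role":"assistant",'
    '"content":"Hello! How can I assist you today? 😊"},"finish_reason":"stop"}],'
    '"usage":{"prompt_tokens":6,"total_tokens":18,"completion_tokens":12}}'
)

def summarize(response: dict) -> tuple:
    """Return (content, finish_reason, total_tokens) from a chat completion."""
    choice = response["choices"][0]
    usage = response["usage"]
    # Token accounting: total = prompt + completion
    assert usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
    return choice["message"]["content"], choice["finish_reason"], usage["total_tokens"]

content, finish_reason, total = summarize(json.loads(raw))
print(content, finish_reason, total)
```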

Another example request and response

Request

curl -X POST http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer fake-key" \
    -d '{
        "model": "deepseek",
        "messages": [{"role": "user", "content": "The future of AI is"}]
        }'

Response

{"id":"deepseek-b81ff9be-3ffc-4811-80ff-225006eff27c","object":"chat.completion","created":1743646860,"model":"deepseek","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"The future of AI is multifaceted and holds immense potential across various domains. Here are some key aspects that are likely to shape its trajectory:\n\n1. **Advanced Automation**: AI will continue to automate routine and complex tasks across industries, increasing efficiency and productivity. This includes everything from manufacturing and logistics to healthcare and finance.\n\n2. **Enhanced Decision-Making**: AI systems will provide deeper insights and predictive analytics, aiding in better decision-making processes for businesses, governments, and individuals.\n\n3. **Personalization**: AI will drive more personalized experiences in areas such as shopping, education, and entertainment, tailoring services and products to individual preferences and behaviors.\n\n4. **Healthcare Revolution**: AI will play a significant role in diagnosing diseases, personalizing treatment plans, and even predicting health issues before they become critical, potentially transforming the healthcare industry.\n\n5. **Ethical and Responsible AI**: As AI becomes more integrated into society, there will be a growing focus on developing ethical guidelines and frameworks to ensure AI is used responsibly and transparently, addressing issues like bias, privacy, and security.\n\n6. **Human-AI Collaboration**: The future will see more seamless collaboration between humans and AI, with AI augmenting human capabilities rather than replacing them. This includes areas like creative industries, where AI can assist in generating ideas and content.\n\n7. **AI in Education**: AI will personalize learning experiences, adapt to individual learning styles, and provide real-time feedback, making education more accessible and effective.\n\n8. 
**Robotics and Autonomous Systems**: Advances in AI will lead to more sophisticated robots and autonomous systems, impacting industries like transportation (e.g., self-driving cars), agriculture, and home automation.\n\n9. **AI and Sustainability**: AI will play a crucial role in addressing environmental challenges by optimizing resource use, improving energy efficiency, and aiding in climate modeling and conservation efforts.\n\n10. **Regulation and Governance**: As AI technologies advance, there will be increased efforts to establish international standards and regulations to govern their development and use, ensuring they benefit society as a whole.\n\n11. **Quantum Computing and AI**: The integration of quantum computing with AI could revolutionize data processing capabilities, enabling the solving of complex problems that are currently intractable.\n\n12. **AI in Creative Fields**: AI will continue to make strides in creative domains such as music, art, and literature, collaborating with human creators to push the boundaries of innovation and expression.\n\nOverall, the future of AI is both promising and challenging, requiring careful consideration of its societal impact and the ethical implications of its widespread adoption.","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":9,"total_tokens":518,"completion_tokens":509,"prompt_tokens_details":null},"prompt_logprobs":null}