Ray Tune 的并行度和资源指南#

并行性由每个 trial 的资源（默认每个 trial 1 个 CPU，0 个 GPU）和 Tune 可用的资源 (ray.cluster_resources()) 决定。

默认情况下，Tune 会自动并发运行 N 个 trial，其中 N 是你机器上的 CPU（核心）数量。

# If you have 4 CPUs on your machine, this will run 4 concurrent trials at a time.
tuner = tune.Tuner(
    trainable,
    tune_config=tune.TuneConfig(num_samples=10)
)
results = tuner.fit()

你可以通过 tune.with_resources 覆盖每个 trial 的资源。你可以在此使用字典或 PlacementGroupFactory 对象指定你的资源请求。无论哪种情况，Ray Tune 都将尝试为每个 trial 启动一个放置组。

# If you have 4 CPUs on your machine, this will run 2 concurrent trials at a time.
trainable_with_resources = tune.with_resources(trainable, {"cpu": 2})
tuner = tune.Tuner(
    trainable_with_resources,
    tune_config=tune.TuneConfig(num_samples=10)
)
results = tuner.fit()

# If you have 4 CPUs on your machine, this will run 1 trial at a time.
trainable_with_resources = tune.with_resources(trainable, {"cpu": 4})
tuner = tune.Tuner(
    trainable_with_resources,
    tune_config=tune.TuneConfig(num_samples=10)
)
results = tuner.fit()

# Fractional values are also supported, (i.e., {"cpu": 0.5}).
# If you have 4 CPUs on your machine, this will run 8 concurrent trials at a time.
trainable_with_resources = tune.with_resources(trainable, {"cpu": 0.5})
tuner = tune.Tuner(
    trainable_with_resources,
    tune_config=tune.TuneConfig(num_samples=10)
)
results = tuner.fit()

# Custom resource allocation via lambda functions are also supported.
# If you want to allocate gpu resources to trials based on a setting in your config
trainable_with_resources = tune.with_resources(trainable,
    resources=lambda spec: {"gpu": 1} if spec.config.use_gpu else {"gpu": 0})
tuner = tune.Tuner(
    trainable_with_resources,
    tune_config=tune.TuneConfig(num_samples=10)
)
results = tuner.fit()

Tune 将根据 tune.with_resources 为每个单独的 trial 分配指定的 GPU 和 CPU。即使 trial 当前无法调度，Ray Tune 仍将尝试启动相应的放置组。如果资源不足，在使用 Ray 集群启动器时，这将触发自动扩缩容行为。

警告

tune.with_resources 不能与 Ray Train Trainers 一起使用。如果你将 Trainer 传递给 Tuner，请使用 ScalingConfig 在 Trainer 实例中指定资源需求。下面概述的一般原则仍然适用。

也可以指定内存（"memory"，以字节为单位）和自定义资源需求。

如果你的可训练函数启动更多远程 worker，你需要传递所谓的放置组工厂对象来请求这些资源。更多信息请参阅 PlacementGroupFactory 文档。如果你使用其他利用 Ray 的库（例如 Modin），这也适用。未能正确设置资源可能会导致死锁，使集群“挂起”。

注意

以这种方式指定的资源仅用于调度 Tune trial。这些资源不会自动强制应用于你的目标函数（Tune 可训练函数）。你需要确保你的可训练函数有足够的资源运行（例如，相应地为 scikit-learn 模型设置 n_jobs）。

如何在 Tune 中利用 GPU？#

要利用 GPU，你必须在 tune.with_resources(trainable, resources_per_trial) 中设置 gpu。这将自动为每个 trial 设置 CUDA_VISIBLE_DEVICES 环境变量。

# If you have 8 GPUs, this will run 8 trials at once.
trainable_with_gpu = tune.with_resources(trainable, {"gpu": 1})
tuner = tune.Tuner(
    trainable_with_gpu,
    tune_config=tune.TuneConfig(num_samples=10)
)
results = tuner.fit()

# If you have 4 CPUs and 1 GPU on your machine, this will run 1 trial at a time.
trainable_with_cpu_gpu = tune.with_resources(trainable, {"cpu": 2, "gpu": 1})
tuner = tune.Tuner(
    trainable_with_cpu_gpu,
    tune_config=tune.TuneConfig(num_samples=10)
)
results = tuner.fit()

你可以在Keras MNIST 示例中找到一个例子。

警告

如果未设置 gpu，CUDA_VISIBLE_DEVICES 环境变量将设置为空，禁止访问 GPU。

故障排除：有时，在运行新 trial 时，你可能会遇到 GPU 内存问题。这可能是由于之前的 trial 未能足够快地清理其 GPU 状态所致。为避免这种情况，你可以使用 tune.utils.wait_for_gpu。

如何使用 Tune 运行分布式训练？#

要对分布式训练任务进行调优，可以将 Ray Tune 与 Ray Train 一起使用。Ray Tune 将并行运行多个 trial，每个 trial 都使用 Ray Train 运行分布式训练。

更多详情，请参阅Ray Train 超参数优化。

如何在 Tune 中限制并发？#

要指定同时运行的最大 trial 数量，请在 TuneConfig 中设置 max_concurrent_trials。

请注意，实际并行度可能小于 max_concurrent_trials，并将取决于集群一次可以容纳多少个 trial（例如，如果你有一个 trial 需要 16 个 GPU，你的集群有 32 个 GPU，并且 max_concurrent_trials=10，Tuner 只能同时运行 2 个 trial）。

from ray.tune import TuneConfig

config = TuneConfig(
    # ...
    num_samples=100,
    max_concurrent_trials=10,
)