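Running Tune experiments with Optuna#

In this tutorial we introduce Optuna while running a simple Ray Tune experiment. Tune's search algorithms integrate with Optuna, so you can seamlessly scale up an Optuna optimization process without sacrificing performance.

Like Ray Tune, Optuna is an automatic hyperparameter optimization framework designed particularly for machine learning. It features an imperative ("how" over "what"), define-by-run style user API, which lets users construct the hyperparameter search space dynamically. Optuna falls in the domain of "derivative-free optimization" and "black-box optimization".

In this example we minimize a simple objective to briefly demonstrate how to use Optuna with Ray Tune via OptunaSearch, including examples of conditional search spaces (which capture relationships between hyperparameters) and of the multi-objective problem (which measures trade-offs among all important metrics). Note that, despite the emphasis on machine learning experiments, Ray Tune can optimize any implicit or explicit objective. We assume that the optuna>=3.0.0 library is installed. To learn more, refer to the Optuna website.

Note that sophisticated schedulers, such as AsyncHyperBandScheduler, may not work correctly with multi-objective optimization, since they typically expect a single scalar score to compare fitness among trials.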

Prerequisites#

# !pip install "ray[tune]"
!pip install -q "optuna>=3.0.0"

Next, import the necessary libraries:

import time
from typing import Dict, Optional, Any

import ray
from ray import tune
from ray.tune.search import ConcurrencyLimiter
from ray.tune.search.optuna import OptunaSearch

ray.init(configure_logging=False)  # initialize Ray

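We start by defining a simple evaluation function. Here an explicit formula is queried for demonstration, but in practice this is typically a black-box function, for example the performance results after training an ML model. We artificially sleep for a short time (0.1 seconds) to simulate a long-running ML experiment. This setup assumes that we run multiple steps of an experiment while tuning three hyperparameters, namely width, height, and activation.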

def evaluate(step, width, height, activation):
    time.sleep(0.1)
    activation_boost = 10 if activation == "relu" else 0
    return (0.1 + width * step / 100) ** (-1) + height * 0.1 + activation_boost
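
To get a feel for this objective before handing it to Tune, the following sketch re-implements the same formula without the sleep and prints a few values. The name `evaluate_no_sleep` is introduced here for illustration only. The output suggests that the score falls as `step` grows and that `"relu"` adds a constant penalty of 10.

```python
def evaluate_no_sleep(step, width, height, activation):
    """Same formula as `evaluate` above, minus the artificial sleep."""
    activation_boost = 10 if activation == "relu" else 0
    return (0.1 + width * step / 100) ** (-1) + height * 0.1 + activation_boost


# The first term starts at 1 / 0.1 = 10 and shrinks as the step count grows.
first = evaluate_no_sleep(0, width=10, height=0, activation="tanh")
last = evaluate_no_sleep(99, width=10, height=0, activation="tanh")
print(f"step 0: {first:.3f}, step 99: {last:.3f}")

# Switching to "relu" shifts every score up by the same constant.
relu_last = evaluate_no_sleep(99, width=10, height=0, activation="relu")
print(f"relu penalty at step 99: {relu_last - last:.3f}")
```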

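Next, our objective function (the function to be optimized) takes a Tune config, evaluates the score of the experiment in a training loop, and uses tune.report to report the score back to Tune.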

def objective(config):
    for step in range(config["steps"]):
        score = evaluate(step, config["width"], config["height"], config["activation"])
        tune.report({"iterations": step, "mean_loss": score})

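Next we define a search space. The critical assumption is that the optimal hyperparameters live within this space. If the space is very large, however, those hyperparameters may be difficult to find in a short amount of time.

The simplest case is a search space with independent dimensions. In this case, a config dictionary suffices.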

search_space = {
    "steps": 100,
    "width": tune.uniform(0, 20),
    "height": tune.uniform(-100, 100),
    "activation": tune.choice(["relu", "tanh"]),
}

Here we define the Optuna search algorithm:

algo = OptunaSearch()

We also limit the number of concurrent trials to 4 with a ConcurrencyLimiter.

algo = ConcurrencyLimiter(algo, max_concurrent=4)

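The number of samples is the number of hyperparameter combinations that will be tried. This Tune run is set to 1000 samples. (You can decrease this number if the run takes too long on your machine.)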

num_samples = 1000

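Finally, we run the experiment to minimize (mode "min") the "mean_loss" of the objective by searching search_space via algo, num_samples times. The previous sentence fully characterizes the search problem we aim to solve. With this in mind, notice how concise the call to tuner.fit() is.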

tuner = tune.Tuner(
    objective,
    tune_config=tune.TuneConfig(
        metric="mean_loss",
        mode="min",
        search_alg=algo,
        num_samples=num_samples,
    ),
    param_space=search_space,
)
results = tuner.fit()

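Tune Status

Current time	2025-02-10 18:06:12
Running for	00:00:35.68
Memory	22.7/36.0 GiB

System Info

Using FIFO scheduling algorithm.
Logical resource usage: 1.0/12 CPUs, 0/0 GPUs

Trial Status

Trial name	status	loc	activation	height	width	loss	iter	total time (s)	iterations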
objective_989a402c	TERMINATED	127.0.0.1:42307	relu	6.57558	8.66313	10.7728	100	10.3642	99
objective_d99d28c6	TERMINATED	127.0.0.1:42321	tanh	51.2103	19.2804	5.17314	100	10.3775	99
objective_ce34b92b	TERMINATED	127.0.0.1:42323	tanh	-49.4554	17.2683	-4.88739	100	10.3741	99
objective_f650ea5f	TERMINATED	127.0.0.1:42332	tanh	20.6147	3.19539	2.3679	100	10.3804	99
objective_e72e976e	TERMINATED	127.0.0.1:42356	relu	-12.5302	3.45152	9.03132	100	10.372	99
objective_d00b4e1a	TERMINATED	127.0.0.1:42362	tanh	65.8592	3.14335	6.89726	100	10.3776	99
objective_30c6ec86	TERMINATED	127.0.0.1:42367	tanh	-82.0713	14.2595	-8.13679	100	10.3755	99
objective_691ce63c	TERMINATED	127.0.0.1:42368	tanh	29.406	2.21881	3.37602	100	10.3653	99
objective_3051162c	TERMINATED	127.0.0.1:42404	relu	61.1787	12.9673	16.1952	100	10.3885	99
objective_04a38992	TERMINATED	127.0.0.1:42405	relu	6.28688	11.4537	10.7161	100	10.4051	99

We now have the hyperparameters found to minimize the mean loss.

print("Best hyperparameters found were: ", results.get_best_result().config)

Best hyperparameters found were:  {'steps': 100, 'width': 14.259467682064852, 'height': -82.07132174642958, 'activation': 'tanh'}
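
Because the demo objective is an explicit formula, the result can be sanity-checked by hand. The sketch below assumes that `get_best_result()` compares the last `mean_loss` reported by each trial (step 99 here). It recomputes that value for the configuration printed above and compares it with the corner of the search space where the formula is smallest (`width=20`, `height=-100`, `activation="tanh"`). The helper name `final_loss` is hypothetical.

```python
def final_loss(width, height, activation, steps=100):
    """Loss reported at the last step, using the tutorial's formula."""
    step = steps - 1
    activation_boost = 10 if activation == "relu" else 0
    return (0.1 + width * step / 100) ** (-1) + height * 0.1 + activation_boost


# Best configuration printed by the run above.
found = final_loss(14.259467682064852, -82.07132174642958, "tanh")

# The formula decreases with width, increases with height, and "tanh" avoids
# the +10 penalty, so the smallest value sits at this corner of the space.
corner = final_loss(20, -100, "tanh")

print(f"loss of best trial found: {found:.5f}")
print(f"loss at the corner:       {corner:.5f}")
```

The recomputed value should agree with the loss column of the trial table for the same trial. The gap to the corner value gives a rough sense of how far the search still is from the best point in the space.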

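Providing an initial set of hyperparameters#

While defining the search algorithm, we may choose to provide an initial set of hyperparameters that we believe are especially promising or informative, and pass this information to the OptunaSearch object as a helpful starting point.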

initial_params = [
    {"width": 1, "height": 2, "activation": "relu"},
    {"width": 4, "height": 2, "activation": "relu"},
]

Now the search_alg built with OptunaSearch takes points_to_evaluate.

searcher = OptunaSearch(points_to_evaluate=initial_params)
algo = ConcurrencyLimiter(searcher, max_concurrent=4)
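
Seed points only help if they lie inside the search space. As a rough pre-flight check, the sketch below compares each point against bounds copied by hand from `search_space`. The `bounds` and `choices` dictionaries and the `out_of_range` helper are hypothetical and not part of Ray Tune or Optuna.

```python
# Bounds copied by hand from `search_space` above (illustrative only).
bounds = {"width": (0, 20), "height": (-100, 100)}
choices = {"activation": ["relu", "tanh"]}

initial_params = [
    {"width": 1, "height": 2, "activation": "relu"},
    {"width": 4, "height": 2, "activation": "relu"},
]


def out_of_range(point):
    """Return the names of parameters that fall outside the search space."""
    bad = [k for k, (lo, hi) in bounds.items() if not lo <= point[k] <= hi]
    bad += [k for k, allowed in choices.items() if point[k] not in allowed]
    return bad


for point in initial_params:
    print(point, "->", out_of_range(point) or "ok")
```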

Then run the experiment with the initial hyperparameter evaluations:

tuner = tune.Tuner(
    objective,
    tune_config=tune.TuneConfig(
        metric="mean_loss",
        mode="min",
        search_alg=algo,
        num_samples=num_samples,
    ),
    param_space=search_space,
)
results = tuner.fit()

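Tune Status

Current time	2025-02-10 18:06:47
Running for	00:00:35.44
Memory	22.7/36.0 GiB

System Info

Using FIFO scheduling algorithm.
Logical resource usage: 1.0/12 CPUs, 0/0 GPUs

Trial Status

Trial name	status	loc	activation	height	width	loss	iter	total time (s)	iterations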
objective_1d2e715f	TERMINATED	127.0.0.1:42435	relu	2	1	11.1174	100	10.3556	99
objective_f7c2aed0	TERMINATED	127.0.0.1:42436	relu	2	4	10.4463	100	10.3702	99
objective_09dcce33	TERMINATED	127.0.0.1:42438	tanh	28.5547	17.4195	2.91312	100	10.3483	99
objective_b9955517	TERMINATED	127.0.0.1:42443	tanh	-73.0995	13.8859	-7.23773	100	10.3682	99
objective_d81ebd5c	TERMINATED	127.0.0.1:42464	relu	-1.86597	1.46093	10.4601	100	10.3969	99
objective_3f0030e7	TERMINATED	127.0.0.1:42465	relu	38.7166	1.3696	14.5585	100	10.3741	99
objective_86bf6402	TERMINATED	127.0.0.1:42470	tanh	40.269	5.13015	4.21999	100	10.3769	99
objective_75d06a83	TERMINATED	127.0.0.1:42471	tanh	-11.2824	3.10251	-0.812933	100	10.3695	99
objective_0d197811	TERMINATED	127.0.0.1:42496	tanh	91.7076	15.1032	9.2372	100	10.3631	99
objective_5156451f	TERMINATED	127.0.0.1:42497	tanh	58.9282	3.96315	6.14136	100	10.4732	99

We take another look at the optimal hyperparameters.

print("Best hyperparameters found were: ", results.get_best_result().config)

Best hyperparameters found were:  {'steps': 100, 'width': 13.885889617119432, 'height': -73.09947583621019, 'activation': 'tanh'}

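Conditional search spaces#

Sometimes we may want to build a more complicated search space in which some hyperparameters depend conditionally on others. In this case, we pass a define-by-run function to the space argument of OptunaSearch, which in turn serves as the search_alg for the Tuner.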

def define_by_run_func(trial) -> Optional[Dict[str, Any]]:
    """Define-by-run function to construct a conditional search space.

    Ensure no actual computation takes place here. That should go into
    the trainable passed to ``Tuner()`` (in this example, that's
    ``objective``).

    For more information, see https://docs.optuna.cn/en/stable\
    /tutorial/10_key_features/002_configurations.html

    Args:
        trial: Optuna Trial object
        
    Returns:
        Dict containing constant parameters or None
    """

    activation = trial.suggest_categorical("activation", ["relu", "tanh"])

    # Define-by-run allows for conditional search spaces.
    if activation == "relu":
        trial.suggest_float("width", 0, 20)
        trial.suggest_float("height", -100, 100)
    else:
        trial.suggest_float("width", -1, 21)
        trial.suggest_float("height", -101, 101)
        
    # Return all constants in a dictionary.
    return {"steps": 100}

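As before, we create the search_alg from OptunaSearch and ConcurrencyLimiter. This time we define the scope of the search via the space argument and provide no initial points. When using space, we must also specify metric and mode.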

searcher = OptunaSearch(space=define_by_run_func, metric="mean_loss", mode="min")
algo = ConcurrencyLimiter(searcher, max_concurrent=4)

[I 2025-02-10 18:06:47,670] A new study created in memory with name: optuna
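
To see what the define-by-run function above produces without starting a Tune run, the sketch below calls the same branching logic with a stand-in trial object. `FakeTrial` is a hypothetical stub written for this illustration. It only mimics the two `suggest_*` methods used here by drawing from Python's `random` module and is not Optuna's sampler.

```python
import random


class FakeTrial:
    """Hypothetical stand-in for an Optuna trial; records suggested values."""

    def __init__(self, seed):
        self._rng = random.Random(seed)
        self.params = {}

    def suggest_categorical(self, name, choices):
        self.params[name] = self._rng.choice(choices)
        return self.params[name]

    def suggest_float(self, name, low, high):
        self.params[name] = self._rng.uniform(low, high)
        return self.params[name]


def define_by_run_func(trial):
    # Same branching as the tutorial's function.
    activation = trial.suggest_categorical("activation", ["relu", "tanh"])
    if activation == "relu":
        trial.suggest_float("width", 0, 20)
        trial.suggest_float("height", -100, 100)
    else:
        trial.suggest_float("width", -1, 21)
        trial.suggest_float("height", -101, 101)
    return {"steps": 100}


# Each draw picks an activation first; the ranges then depend on that choice.
for seed in range(4):
    trial = FakeTrial(seed)
    constants = define_by_run_func(trial)
    print({**trial.params, **constants})
```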

Run the experiment with the define-by-run search space:

tuner = tune.Tuner(
    objective,
    tune_config=tune.TuneConfig(
        search_alg=algo,
        num_samples=num_samples,
    ),
)
results = tuner.fit()

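Tune Status

Current time	2025-02-10 18:07:23
Running for	00:00:35.58
Memory	22.9/36.0 GiB

System Info

Using FIFO scheduling algorithm.
Logical resource usage: 1.0/12 CPUs, 0/0 GPUs

Trial Status

Trial name	status	loc	activation	height	steps	width	loss	iter	total time (s)	iterations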
objective_48aa8fed	TERMINATED	127.0.0.1:42529	relu	-76.595	100	9.90896	2.44141	100	10.3957	99
objective_5f395194	TERMINATED	127.0.0.1:42531	relu	-34.1447	100	12.9999	6.66263	100	10.3823	99
objective_e64a7441	TERMINATED	127.0.0.1:42532	relu	-50.3172	100	3.95399	5.21738	100	10.3839	99
objective_8e668790	TERMINATED	127.0.0.1:42537	tanh	30.9768	100	16.22	3.15957	100	10.3818	99
objective_78ca576b	TERMINATED	127.0.0.1:42559	relu	80.5037	100	0.906139	19.0533	100	10.3731	99
objective_4cd9e37a	TERMINATED	127.0.0.1:42560	relu	77.0988	100	8.43807	17.8282	100	10.3881	99
objective_a40498d5	TERMINATED	127.0.0.1:42565	tanh	-24.0393	100	12.7274	-2.32519	100	10.4031	99
objective_43e7ea7e	TERMINATED	127.0.0.1:42566	tanh	-92.349	100	15.8595	-9.17161	100	10.4602	99
objective_cb92227e	TERMINATED	127.0.0.1:42591	relu	3.58988	100	17.3259	10.417	100	10.3817	99
objective_abed5125	TERMINATED	127.0.0.1:42608	tanh	86.0127	100	11.2746	8.69007	100	10.3995	99

We take another look at the optimal hyperparameters.

print("Best hyperparameters for loss found were: ", results.get_best_result("mean_loss", "min").config)

Best hyperparameters for loss found were:  {'activation': 'tanh', 'width': 15.859495323836288, 'height': -92.34898015005697, 'steps': 100}

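Multi-objective optimization#

Finally, we take a look at the multi-objective case. This lets us optimize multiple metrics at once and organize the results by the different objectives.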

def multi_objective(config):
    # Hyperparameters
    width, height = config["width"], config["height"]

    for step in range(config["steps"]):
        # Iterative training function - can be any arbitrary training procedure
        intermediate_score = evaluate(step, config["width"], config["height"], config["activation"])
        # Feed the score back to Tune.
        tune.report({
            "iterations": step, "loss": intermediate_score, "gain": intermediate_score * width
        })

This time we define the OptunaSearch object by passing metric and mode as lists.

searcher = OptunaSearch(metric=["loss", "gain"], mode=["min", "max"])
algo = ConcurrencyLimiter(searcher, max_concurrent=4)

tuner = tune.Tuner(
    multi_objective,
    tune_config=tune.TuneConfig(
        search_alg=algo,
        num_samples=num_samples,
    ),
    param_space=search_space
)
results = tuner.fit();

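Tune Status

Current time	2025-02-10 18:07:58
Running for	00:00:35.27
Memory	22.7/36.0 GiB

System Info

Using FIFO scheduling algorithm.
Logical resource usage: 1.0/12 CPUs, 0/0 GPUs

Trial Status

Trial name	status	loc	activation	height	width	iter	total time (s)	iterations	loss	gain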
multi_objective_0534ec01	TERMINATED	127.0.0.1:42659	tanh	18.3209	8.1091	100	10.3653	99	1.95513	15.8543
multi_objective_d3a487a7	TERMINATED	127.0.0.1:42660	relu	-67.8896	2.58816	100	10.3682	99	3.58666	9.28286
multi_objective_f481c3db	TERMINATED	127.0.0.1:42665	relu	46.6439	19.5326	100	10.3677	99	14.7158	287.438
multi_objective_74a41d72	TERMINATED	127.0.0.1:42666	tanh	-31.9508	11.413	100	10.3685	99	-3.10735	-35.4643
multi_objective_d673b1ae	TERMINATED	127.0.0.1:42695	relu	83.6004	5.04972	100	10.3494	99	18.5561	93.7034
multi_objective_25ddc340	TERMINATED	127.0.0.1:42701	relu	-81.7161	4.45303	100	10.382	99	2.05019	9.12955
multi_objective_f8554c17	TERMINATED	127.0.0.1:42702	tanh	43.5854	6.84585	100	10.3638	99	4.50394	30.8333
multi_objective_a144e315	TERMINATED	127.0.0.1:42707	tanh	39.8075	19.1985	100	10.3706	99	4.03309	77.4292
multi_objective_50540842	TERMINATED	127.0.0.1:42739	relu	75.2805	11.4041	100	10.3529	99	17.6158	200.893
multi_objective_f322a9e3	TERMINATED	127.0.0.1:42740	relu	-51.3587	5.31683	100	10.3756	99	5.05057	26.853

Now there are two sets of hyperparameters, one for each objective.

print("Best hyperparameters for loss found were: ", results.get_best_result("loss", "min").config)
print("Best hyperparameters for gain found were: ", results.get_best_result("gain", "max").config)

Best hyperparameters for loss found were:  {'steps': 100, 'width': 11.41302483988651, 'height': -31.950786209072476, 'activation': 'tanh'}
Best hyperparameters for gain found were:  {'steps': 100, 'width': 19.532566002677832, 'height': 46.643925051045784, 'activation': 'relu'}
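
`get_best_result` returns the extreme for one metric at a time. To look at the trade-off between both metrics, one option is to filter the trials down to the Pareto front, that is, the trials that no other trial beats on both `loss` and `gain`. The sketch below does this in plain Python on the final `loss` and `gain` values copied from the trial table above. The `dominates` and `pareto_front` helpers are hypothetical and not part of Ray Tune or Optuna.

```python
# (trial id, final loss, final gain), copied from the trial table above.
trials = [
    ("0534ec01", 1.95513, 15.8543),
    ("d3a487a7", 3.58666, 9.28286),
    ("f481c3db", 14.7158, 287.438),
    ("74a41d72", -3.10735, -35.4643),
    ("d673b1ae", 18.5561, 93.7034),
    ("25ddc340", 2.05019, 9.12955),
    ("f8554c17", 4.50394, 30.8333),
    ("a144e315", 4.03309, 77.4292),
    ("50540842", 17.6158, 200.893),
    ("f322a9e3", 5.05057, 26.853),
]


def dominates(a, b):
    """True if `a` is no worse than `b` on both metrics and better on one."""
    _, a_loss, a_gain = a
    _, b_loss, b_gain = b
    no_worse = a_loss <= b_loss and a_gain >= b_gain
    better = a_loss < b_loss or a_gain > b_gain
    return no_worse and better


def pareto_front(points):
    """Keep the points that no other point dominates (min loss, max gain)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


front = sorted(pareto_front(trials), key=lambda t: t[1])
for name, loss, gain in front:
    print(f"{name}: loss={loss}, gain={gain}")
```

For this run, the front contains four of the ten trials, including the two configurations that `get_best_result` reports for `loss` and `gain`.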

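We can mix and match initial hyperparameter evaluations, conditional search spaces via define-by-run functions, and multi-objective tasks. The same holds for schedulers, with the exception of multi-objective optimization: schedulers typically rely on a single scalar score rather than the two scores we use here, loss and gain.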