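Ray Event Export
Starting with version 2.49, Ray supports exporting structured events to a configured HTTP endpoint. Each node sends events to that endpoint through HTTP POST requests.
Previously, Ray's task events were used only internally by the Ray Dashboard and the State API for monitoring and debugging. With the event export feature, you can now send these raw events to external systems for custom analytics, monitoring, and integration with third-party tools.
Note
Ray event export is still in alpha. Both the way event reporting is configured and the format of the events may change.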
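Enable event reporting
To enable event reporting, set the RAY_enable_core_worker_ray_event_to_aggregator environment variable to 1 when starting each Ray worker node.
To set the target HTTP endpoint, set the RAY_DASHBOARD_AGGREGATOR_AGENT_EVENTS_EXPORT_ADDR environment variable to a valid HTTP URL that uses the http:// URL scheme.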
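As a minimal sketch of the two settings above, the following Python helper builds an environment that has both variables set. The helper name event_export_env and the endpoint http://localhost:8000/events are hypothetical and chosen only for illustration; the two variable names come from the text above.

```python
import os

ENABLE_VAR = "RAY_enable_core_worker_ray_event_to_aggregator"
ENDPOINT_VAR = "RAY_DASHBOARD_AGGREGATOR_AGENT_EVENTS_EXPORT_ADDR"


def event_export_env(endpoint_url, base_env=None):
    """Return a copy of the environment with event export enabled."""
    # The text above asks for the http:// URL scheme, so reject anything else.
    if not endpoint_url.startswith("http://"):
        raise ValueError("endpoint must use the http:// URL scheme")
    env = dict(os.environ if base_env is None else base_env)
    env[ENABLE_VAR] = "1"
    env[ENDPOINT_VAR] = endpoint_url
    return env


if __name__ == "__main__":
    # Hypothetical collector address; replace it with your own endpoint.
    env = event_export_env("http://localhost:8000/events")
    print({name: env[name] for name in (ENABLE_VAR, ENDPOINT_VAR)})
```

The resulting mapping can be passed as the env argument of subprocess.run when launching a node, for example with the ray start command. Exporting the same two variables in the shell before starting each node should have the same effect.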
Event format
Events are JSON objects in the POST request body.
All events share the same base fields and add event-specific fields. For the base fields, see src/ray/protobuf/public/events_base_event.proto.
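To experiment with the format, a small receiver built on Python's standard library can stand in for the HTTP endpoint. This is only a sketch under assumptions: it accepts either a single JSON object or a JSON list of objects per request because the exact batching is not specified here, and the names EventHandler, start_receiver, and received_events are made up for this example.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

received_events = []  # in-memory store, for illustration only


class EventHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
        except ValueError:
            # Body was empty or not valid JSON.
            self.send_response(400)
            self.end_headers()
            return
        # Assumption: a request carries one event object or a list of them.
        events = payload if isinstance(payload, list) else [payload]
        received_events.extend(events)
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def start_receiver(port=0):
    """Start the receiver in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), EventHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_receiver()
    url = "http://127.0.0.1:%d/" % server.server_address[1]
    body = json.dumps({"eventType": "TASK_DEFINITION_EVENT"}).encode()
    request = urllib.request.Request(url, data=body, method="POST")
    urllib.request.urlopen(request).close()
    print(len(received_events), "event(s) received")
    server.shutdown()
```

A real collector would persist or forward the events instead of keeping them in a list.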
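Task events
For each task, Ray exports two types of events: the Task Definition Event and the Task Execution Event.
Each task attempt generates one task definition event that contains the task's metadata. For the event formats of normal tasks and actor tasks, see src/ray/protobuf/public/events_task_definition_event.proto and src/ray/protobuf/public/events_actor_task_definition_event.proto, respectively.
Task execution events contain task state transition information and metadata generated during task execution. For the event format, see src/ray/protobuf/public/events_task_lifecycle_event.proto. In the example below, this event carries the eventType TASK_LIFECYCLE_EVENT.
An example of a task definition event and a task execution event: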
// task definition event
{
    "eventId":"N5n229xkwyjlZRFJDF2G1sh6ZNYlqChwJ4WPEQ==",
    "sourceType":"CORE_WORKER",
    "eventType":"TASK_DEFINITION_EVENT",
    "timestamp":"2025-09-03T18:52:14.467290Z",
    "severity":"INFO",
    "sessionName":"session_2025-09-03_11-52-12_635210_85618",
    "taskDefinitionEvent":{
        "taskId":"yO9FzNARJXH///////////////8BAAAA",
        "taskFunc":{
            "pythonFunctionDescriptor":{
                "moduleName":"test-tasks",
                "functionName":"test_task",
                "functionHash":"37ddb110c0514b049bd4db5ab934627b",
                "className":""
            }
        },
        "taskName":"test_task",
        "requiredResources":{
            "CPU":1.0
        },
        "serialized_runtime_env": "{}",
        "jobId":"AQAAAA==",
        "parentTaskId":"//////////////////////////8BAAAA",
        "placementGroupId":"////////////////////////",
        "taskAttempt":0,
        "taskType":"NORMAL_TASK",
        "language":"PYTHON",
        "refIds":{
        }
    },
    "message":""
}
// task lifecycle event
{
    "eventId":"vkIaAHlQC5KoppGosqs2kBq5k2WzsAAbawDDbQ==",
    "sourceType":"CORE_WORKER",
    "eventType":"TASK_LIFECYCLE_EVENT",
    "timestamp":"2025-09-03T18:52:14.469074Z",
    "severity":"INFO",
    "sessionName":"session_2025-09-03_11-52-12_635210_85618",
    "taskLifecycleEvent":{
        "taskId":"yO9FzNARJXH///////////////8BAAAA",
        "stateTransitions": [
            {
                "state":"PENDING_NODE_ASSIGNMENT",
                "timestamp":"2025-09-03T18:52:14.467402Z"
            },
            {
                "state":"PENDING_ARGS_AVAIL",
                "timestamp":"2025-09-03T18:52:14.467290Z"
            },
            {
                "state":"SUBMITTED_TO_WORKER",
                "timestamp":"2025-09-03T18:52:14.469074Z"
            }
        ],
        "nodeId":"ZvxTI6x9dlMFqMlIHErJpg5UEGK1INsKhW2zyg==",
        "workerId":"hMybCNYIFi+/yInYYhdc+qH8yMF65j/8+uCTmw==",
        "jobId":"AQAAAA==",
        "taskAttempt":0,
        "workerPid":0
    },
    "message":""
}
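In the task lifecycle sample above, the entries in stateTransitions are not listed in timestamp order, so a consumer may want to sort them before reconstructing a task's history. The sketch below does that for a trimmed copy of the sample. The helper names parse_ts and ordered_states are illustrative and not part of Ray.

```python
from datetime import datetime


def parse_ts(ts):
    # The sample timestamps are UTC strings ending in "Z"; rewrite the suffix
    # so that datetime.fromisoformat also accepts them on older Python versions.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def ordered_states(lifecycle_event):
    """Return the task states sorted by their transition timestamps."""
    transitions = lifecycle_event["taskLifecycleEvent"]["stateTransitions"]
    ordered = sorted(transitions, key=lambda t: parse_ts(t["timestamp"]))
    return [t["state"] for t in ordered]


# Trimmed copy of the task lifecycle event shown above.
sample = {
    "eventType": "TASK_LIFECYCLE_EVENT",
    "taskLifecycleEvent": {
        "stateTransitions": [
            {"state": "PENDING_NODE_ASSIGNMENT",
             "timestamp": "2025-09-03T18:52:14.467402Z"},
            {"state": "PENDING_ARGS_AVAIL",
             "timestamp": "2025-09-03T18:52:14.467290Z"},
            {"state": "SUBMITTED_TO_WORKER",
             "timestamp": "2025-09-03T18:52:14.469074Z"},
        ],
    },
}

print(ordered_states(sample))
# ['PENDING_ARGS_AVAIL', 'PENDING_NODE_ASSIGNMENT', 'SUBMITTED_TO_WORKER']
```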
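Actor events
For each actor, Ray exports two types of events: Actor Definition Events and Actor Lifecycle Events.
Actor definition events contain the metadata available when the actor is defined. For the event format, see src/ray/protobuf/public/events_actor_definition_event.proto.
Actor lifecycle events contain actor state transition information and the metadata associated with each transition. For the event format, see src/ray/protobuf/public/events_actor_lifecycle_event.proto.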
// actor definition event
{
    "eventId": "gsRtAfaWn5TZsjUPFm8nOXd/cKGz82FXdr3Lqg==",
    "sourceType": "GCS",
    "eventType": "ACTOR_DEFINITION_EVENT",
    "timestamp": "2025-10-24T21:12:10.742651Z",
    "severity": "INFO",
    "sessionName": "session_2025-10-24_14-12-05_804800_55420",
    "actorDefinitionEvent": {
        "actorId": "0AFtngcXtEoxwqmJAQAAAA==",
        "jobId": "AQAAAA==",
        "name": "actor-test",
        "rayNamespace": "bd2ad7f8-650b-495c-b709-55d4c8a7d09f",
        "serializedRuntimeEnv": "{}",
        "className": "test_ray_actor_events.<locals>.A",
        "isDetached": false,
        "requiredResources": {},
        "placementGroupId": "",
        "labelSelector": {}
    },
    "message": ""
}
// actor lifecycle event
{
    "eventId": "mOdfn5SRx3X0B05OvEDV0rcIOzqf/SGBJmrD/Q==",
    "sourceType": "GCS",
    "eventType": "ACTOR_LIFECYCLE_EVENT",
    "timestamp": "2025-10-24T21:12:10.742654Z",
    "severity": "INFO",
    "sessionName": "session_2025-10-24_14-12-05_804800_55420",
    "actorLifecycleEvent": {
        "actorId": "0AFtngcXtEoxwqmJAQAAAA==",
        "stateTransitions": [
            {
                "timestamp": "2025-10-24T21:12:10.742654Z",
                "state": "ALIVE",
                "nodeId": "zpLG7coqThVMl8df9RYHnhK6thhJqrgPodtfjg==",
                "workerId": "nrBehSG3HXu0PvHZBkPl2kovmjzAaoCuVj2KHA=="
            }
        ]
    },
    "message": ""
}
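In every sample on this page, the event-specific fields sit under a key that is the camelCase form of eventType, for example ACTOR_LIFECYCLE_EVENT and actorLifecycleEvent. Assuming that convention holds (it is inferred from the samples, not from a stated guarantee), a consumer can locate the payload generically. The function names below are illustrative.

```python
def payload_key(event_type):
    """Convert e.g. 'ACTOR_LIFECYCLE_EVENT' to 'actorLifecycleEvent'."""
    head, *rest = event_type.lower().split("_")
    return head + "".join(part.capitalize() for part in rest)


def payload(event):
    """Return the event-specific body, assuming the key naming seen in the samples."""
    return event[payload_key(event["eventType"])]


# Trimmed copy of the actor lifecycle event shown above.
sample = {
    "eventType": "ACTOR_LIFECYCLE_EVENT",
    "actorLifecycleEvent": {
        "actorId": "0AFtngcXtEoxwqmJAQAAAA==",
        "stateTransitions": [
            {"timestamp": "2025-10-24T21:12:10.742654Z", "state": "ALIVE"},
        ],
    },
}

print(payload_key(sample["eventType"]))  # actorLifecycleEvent
print(payload(sample)["stateTransitions"][0]["state"])  # ALIVE
```

Because the format is still in alpha, code like this should fall back gracefully (for example, by catching KeyError) if the naming changes.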
Driver job events
For each driver job, Ray exports two types of events: Driver Job Definition Events and Driver Job Lifecycle Events.
Driver job definition events contain the metadata available when the driver job is defined. For the event format, see src/ray/protobuf/public/events_driver_job_definition_event.proto.
Driver job lifecycle events contain driver job state transition information and the metadata associated with each transition. For the event format, see src/ray/protobuf/public/events_driver_job_lifecycle_event.proto.
// driver job definition event
{
    "eventId": "7YnwZPJr0KUC28T7KnzsvGyceEIrjNDTHuQfrg==",
    "sourceType": "GCS",
    "eventType": "DRIVER_JOB_DEFINITION_EVENT",
    "timestamp": "2025-10-24T21:17:07.316482Z",
    "severity": "INFO",
    "sessionName": "session_2025-10-24_14-17-05_575968_59360",
    "driverJobDefinitionEvent": {
        "jobId": "AQAAAA==",
        "driverPid": "59360",
        "driverNodeId": "9eHWUIruJWnMjQuPas0W+TRNUyjY5PwFpWUfjA==",
        "entrypoint": "...",
        "config": {
            "serializedRuntimeEnv": "{}",
            "metadata": {}
        }
    },
    "message": ""
}
// driver job lifecycle event
{
    "eventId": "0cmbCI/RQghYe4ZQiJ+HrnK1RiZH+cg8ltBx2w==",
    "sourceType": "GCS",
    "eventType": "DRIVER_JOB_LIFECYCLE_EVENT",
    "timestamp": "2025-10-24T21:17:07.316483Z",
    "severity": "INFO",
    "sessionName": "session_2025-10-24_14-17-05_575968_59360",
    "driverJobLifecycleEvent": {
        "jobId": "AQAAAA==",
        "stateTransitions": [
            {
                "state": "CREATED",
                "timestamp": "2025-10-24T21:17:07.316483Z"
            }
        ]
    },
    "message": ""
}
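Definition and lifecycle events for the same driver job share a jobId, so a consumer can join them on that field. The following sketch groups a list of events by jobId. The function name summarize_jobs and the shape of its result are choices made for this example only.

```python
from collections import defaultdict


def summarize_jobs(events):
    """Group driver job events by jobId: one definition plus observed states."""
    jobs = defaultdict(lambda: {"definition": None, "states": []})
    for event in events:
        if event["eventType"] == "DRIVER_JOB_DEFINITION_EVENT":
            body = event["driverJobDefinitionEvent"]
            jobs[body["jobId"]]["definition"] = body
        elif event["eventType"] == "DRIVER_JOB_LIFECYCLE_EVENT":
            body = event["driverJobLifecycleEvent"]
            states = [t["state"] for t in body["stateTransitions"]]
            jobs[body["jobId"]]["states"].extend(states)
    return dict(jobs)


# Trimmed copies of the two driver job events shown above.
events = [
    {
        "eventType": "DRIVER_JOB_DEFINITION_EVENT",
        "driverJobDefinitionEvent": {"jobId": "AQAAAA==", "driverPid": "59360"},
    },
    {
        "eventType": "DRIVER_JOB_LIFECYCLE_EVENT",
        "driverJobLifecycleEvent": {
            "jobId": "AQAAAA==",
            "stateTransitions": [
                {"state": "CREATED", "timestamp": "2025-10-24T21:17:07.316483Z"},
            ],
        },
    },
]

summary = summarize_jobs(events)
print(summary["AQAAAA=="]["definition"]["driverPid"])  # 59360
print(summary["AQAAAA=="]["states"])  # ['CREATED']
```

The same join-by-ID pattern should carry over to actors (actorId) and nodes (nodeId), which expose the same definition and lifecycle pairing.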
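Node events
For each node, Ray exports two types of events: Node Definition Events and Node Lifecycle Events.
Node definition events contain the metadata available when the node is defined. For the event format, see src/ray/protobuf/public/events_node_definition_event.proto.
Node lifecycle events contain node state transition information and the metadata associated with each transition. For the event format, see src/ray/protobuf/public/events_node_lifecycle_event.proto.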
// node definition event
{
    "eventId": "l7r4gwq4UPhmZGFJYEym6mUkcxqafra60LB6/Q==",
    "sourceType": "GCS",
    "eventType": "NODE_DEFINITION_EVENT",
    "timestamp": "2025-10-24T21:19:14.063953Z",
    "severity": "INFO",
    "sessionName": "session_2025-10-24_14-19-12_675240_61141",
    "nodeDefinitionEvent": {
        "nodeId": "0yfRX1ex+VtcC+TFXjXcgesdpnEwM76+pEATrQ==",
        "nodeIpAddress": "127.0.0.1",
        "labels": {
            "ray.io/node-id": "d327d15f57b1f95b5c0be4c55e35dc81eb1da6713033bebea44013ad"
        },
        "startTimestamp": "2025-10-24T21:19:14.063Z"
    },
    "message": ""
}
// node lifecycle event
{
    "eventId": "u3KTG8615MIKBH5PLcii0BMfGFWcvLuSOXM6zg==",
    "sourceType": "GCS",
    "eventType": "NODE_LIFECYCLE_EVENT",
    "timestamp": "2025-10-24T21:19:14.063955Z",
    "severity": "INFO",
    "sessionName": "session_2025-10-24_14-19-12_675240_61141",
    "nodeLifecycleEvent": {
        "nodeId": "0yfRX1ex+VtcC+TFXjXcgesdpnEwM76+pEATrQ==",
        "stateTransitions": [
            {
                "timestamp": "2025-10-24T21:19:14.063955Z",
                "resources": {"node:__internal_head__": 1.0, "CPU": 1.0, "object_store_memory": 157286400.0, "node:127.0.0.1": 1.0, "memory": 42964287488.0},
                "state": "ALIVE",
                "aliveSubState": "UNSPECIFIED"
            }
        ]
    },
    "message": ""
}
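The ID fields in these samples (nodeId, jobId, actorId, and so on) are base64 strings, which is how protobuf's JSON mapping renders bytes fields. In the node definition sample above, decoding nodeId yields the same bytes that the ray.io/node-id label shows in hexadecimal. The short sketch below makes that conversion, which is useful when correlating exported events with hex IDs seen elsewhere. The helper name id_to_hex is illustrative.

```python
import base64


def id_to_hex(b64_id):
    """Decode a base64-encoded ID from an exported event into a hex string."""
    return base64.b64decode(b64_id).hex()


# Values copied from the node definition event shown above.
node_id = "0yfRX1ex+VtcC+TFXjXcgesdpnEwM76+pEATrQ=="
node_id_label = "d327d15f57b1f95b5c0be4c55e35dc81eb1da6713033bebea44013ad"

print(id_to_hex(node_id) == node_id_label)  # True
print(id_to_hex("AQAAAA=="))  # 01000000
```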
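High-level architecture
The following diagram shows the high-level architecture of Ray event export.
All Ray components send events to an aggregator agent through gRPC. Each node runs one aggregator agent. The aggregator agent collects all events on its node and sends them to the configured HTTP endpoint.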
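The collect-then-forward flow can be pictured with a toy model. The class below is not Ray's implementation: the name ToyAggregator, the batching threshold, and the injected send callable are all assumptions made for illustration, and a plain method call stands in for the gRPC hop from a component to the agent.

```python
import json


class ToyAggregator:
    """Toy model of a per-node aggregator: buffer events, forward in batches."""

    def __init__(self, send, batch_size=2):
        self.send = send  # callable standing in for the HTTP POST
        self.batch_size = batch_size  # illustrative threshold, not a Ray setting
        self.buffer = []

    def add(self, event):
        # Stands in for a component delivering one event to the agent.
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Forward everything buffered so far as one JSON request body.
        if self.buffer:
            batch, self.buffer = self.buffer, []
            self.send(json.dumps(batch))


posted_bodies = []  # records what would have been POSTed
agent = ToyAggregator(send=posted_bodies.append, batch_size=2)
for source in ("CORE_WORKER", "GCS", "CORE_WORKER"):
    agent.add({"sourceType": source})
agent.flush()  # forward the remaining event

print(len(posted_bodies))  # 2
```

Injecting send keeps the sketch runnable without a network. In a real setup that role is played by the POST request to the configured endpoint.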