Ray Event Export#

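Starting with version 2.49, Ray supports exporting structured events to a configured HTTP endpoint. Each node sends its events to that endpoint through HTTP POST requests.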

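Previously, Ray's task events were used only internally by the Ray Dashboard and the State API for monitoring and debugging. With the event export feature, you can send these raw events to external systems for custom analytics, monitoring, and integration with third-party tools.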

Note

Ray event export is still in alpha. The way event reporting is configured and the format of the events may change.

Enable event reporting#

To enable event reporting, set the RAY_enable_core_worker_ray_event_to_aggregator environment variable to 1 when starting each Ray worker node.

To set the target HTTP endpoint, set the RAY_DASHBOARD_AGGREGATOR_AGENT_EVENTS_EXPORT_ADDR environment variable to a valid HTTP URL that uses the http:// URL scheme.
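
The two settings above can be sketched as a small helper that prepares the environment for a process that launches a Ray node. This is only an illustration: the function name build_ray_event_env and the address http://localhost:8080/events are hypothetical, and the sketch assumes the node is started from a process that inherits this environment.

```python
import os
from urllib.parse import urlparse


def build_ray_event_env(endpoint: str, base_env=None) -> dict:
    """Return a copy of the environment with event export enabled."""
    parsed = urlparse(endpoint)
    # The export address must use the http:// scheme and name a host.
    if parsed.scheme != "http" or not parsed.netloc:
        raise ValueError(f"expected an http:// URL, got {endpoint!r}")
    env = dict(os.environ if base_env is None else base_env)
    env["RAY_enable_core_worker_ray_event_to_aggregator"] = "1"
    env["RAY_DASHBOARD_AGGREGATOR_AGENT_EVENTS_EXPORT_ADDR"] = endpoint
    return env


# Hypothetical collector address; replace it with your own endpoint.
env = build_ray_event_env("http://localhost:8080/events")
# Pass `env` to whatever launches the node, for example:
#   subprocess.run(["ray", "start", "--head"], env=env, check=True)
```

Validating the URL up front is a design choice of this sketch, so that a mistyped scheme fails before any node starts.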

Event format#

Events are JSON objects in the body of the POST request.
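
For local experiments, a receiving endpoint could look roughly like the following sketch, built only on the Python standard library. It is not part of Ray. The names EventHandler, RECEIVED, and make_server are hypothetical, and the sketch assumes the request body is either a single JSON event object or a JSON list of such objects.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

RECEIVED = []  # in-memory store, for illustration only


class EventHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
        except ValueError:
            # Body was empty or not valid JSON.
            self.send_response(400)
            self.end_headers()
            return
        # Assumption: the body is one event object or a list of them.
        events = payload if isinstance(payload, list) else [payload]
        RECEIVED.extend(events)
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def make_server(host="127.0.0.1", port=8080):
    """Create (but do not start) the HTTP server."""
    return HTTPServer((host, port), EventHandler)


# To run it: make_server().serve_forever()
```

With the server running on port 8080, the export address would be http://127.0.0.1:8080/.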

All events share the same base fields and carry additional event-specific fields. For the base fields, see src/ray/protobuf/public/events_base_event.proto.
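
In the examples on this page, the event-specific fields sit under a key that is the camelCase form of eventType (for instance, TASK_DEFINITION_EVENT pairs with taskDefinitionEvent). Assuming that pattern holds, a consumer could separate the base fields from the specific payload as sketched below. The helper names payload_key and split_event are hypothetical.

```python
def payload_key(event_type: str) -> str:
    """Map e.g. 'TASK_DEFINITION_EVENT' to 'taskDefinitionEvent'."""
    head, *rest = event_type.lower().split("_")
    return head + "".join(part.capitalize() for part in rest)


def split_event(event: dict):
    """Return (base_fields, specific_payload) for one exported event."""
    key = payload_key(event["eventType"])
    base = {k: v for k, v in event.items() if k != key}
    # Assumption: a missing payload key is treated as an empty payload.
    return base, event.get(key, {})


event = {
    "eventType": "NODE_DEFINITION_EVENT",
    "severity": "INFO",
    "nodeDefinitionEvent": {"nodeIpAddress": "127.0.0.1"},
}
base, payload = split_event(event)
print(payload["nodeIpAddress"])  # prints 127.0.0.1
```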

Task events#

For each task, Ray exports two types of events: a task definition event and a task lifecycle event (TASK_DEFINITION_EVENT and TASK_LIFECYCLE_EVENT in the examples that follow).

Examples of a task definition event and a task lifecycle event

// task definition event
{
    "eventId":"N5n229xkwyjlZRFJDF2G1sh6ZNYlqChwJ4WPEQ==",
    "sourceType":"CORE_WORKER",
    "eventType":"TASK_DEFINITION_EVENT",
    "timestamp":"2025-09-03T18:52:14.467290Z",
    "severity":"INFO",
    "sessionName":"session_2025-09-03_11-52-12_635210_85618",
    "taskDefinitionEvent":{
        "taskId":"yO9FzNARJXH///////////////8BAAAA",
        "taskFunc":{
            "pythonFunctionDescriptor":{
                "moduleName":"test-tasks",
                "functionName":"test_task",
                "functionHash":"37ddb110c0514b049bd4db5ab934627b",
                "className":""
            }
        },
        "taskName":"test_task",
        "requiredResources":{
            "CPU":1.0
        },
        "serialized_runtime_env": "{}",
        "jobId":"AQAAAA==",
        "parentTaskId":"//////////////////////////8BAAAA",
        "placementGroupId":"////////////////////////",
        "taskAttempt":0,
        "taskType":"NORMAL_TASK",
        "language":"PYTHON",
        "refIds":{

        }
    },
    "message":""
}

// task lifecycle event
{
    "eventId":"vkIaAHlQC5KoppGosqs2kBq5k2WzsAAbawDDbQ==",
    "sourceType":"CORE_WORKER",
    "eventType":"TASK_LIFECYCLE_EVENT",
    "timestamp":"2025-09-03T18:52:14.469074Z",
    "severity":"INFO",
    "sessionName":"session_2025-09-03_11-52-12_635210_85618",
    "taskLifecycleEvent":{
        "taskId":"yO9FzNARJXH///////////////8BAAAA",
        "stateTransitions": [
            {
                "state":"PENDING_NODE_ASSIGNMENT",
                "timestamp":"2025-09-03T18:52:14.467402Z"
            },
            {
                "state":"PENDING_ARGS_AVAIL",
                "timestamp":"2025-09-03T18:52:14.467290Z"
            },
            {
                "state":"SUBMITTED_TO_WORKER",
                "timestamp":"2025-09-03T18:52:14.469074Z"
            }
        ],
        "nodeId":"ZvxTI6x9dlMFqMlIHErJpg5UEGK1INsKhW2zyg==",
        "workerId":"hMybCNYIFi+/yInYYhdc+qH8yMF65j/8+uCTmw==",
        "jobId":"AQAAAA==",
        "taskAttempt":0,
        "workerPid":0
    },
    "message":""
}
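
In the lifecycle example above, the stateTransitions entries are not listed in timestamp order (PENDING_ARGS_AVAIL carries an earlier timestamp than PENDING_NODE_ASSIGNMENT). A consumer that wants a timeline may therefore want to sort them first. The following is a minimal sketch; the function names are hypothetical, and it assumes timestamps are UTC strings ending in Z as shown above.

```python
from datetime import datetime, timezone


def parse_ts(ts: str) -> datetime:
    """Parse a timestamp such as '2025-09-03T18:52:14.467290Z' as UTC."""
    body = ts.rstrip("Z")
    if "." in body:
        whole, frac = body.split(".", 1)
        # Normalize the fraction to exactly 6 digits (microseconds).
        body = f"{whole}.{frac[:6].ljust(6, '0')}"
    return datetime.fromisoformat(body).replace(tzinfo=timezone.utc)


def state_timeline(lifecycle_event: dict):
    """Return [(state, time, seconds_since_previous)] sorted by time."""
    transitions = lifecycle_event["taskLifecycleEvent"]["stateTransitions"]
    ordered = sorted(transitions, key=lambda t: parse_ts(t["timestamp"]))
    timeline, previous = [], None
    for t in ordered:
        when = parse_ts(t["timestamp"])
        delta = 0.0 if previous is None else (when - previous).total_seconds()
        timeline.append((t["state"], when, delta))
        previous = when
    return timeline
```

Applied to the example event, this yields PENDING_ARGS_AVAIL, then PENDING_NODE_ASSIGNMENT, then SUBMITTED_TO_WORKER.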

Actor events#

For each actor, Ray exports two types of events: actor definition events and actor lifecycle events.

// actor definition event
{
    "eventId": "gsRtAfaWn5TZsjUPFm8nOXd/cKGz82FXdr3Lqg==",
    "sourceType": "GCS",
    "eventType": "ACTOR_DEFINITION_EVENT",
    "timestamp": "2025-10-24T21:12:10.742651Z",
    "severity": "INFO",
    "sessionName": "session_2025-10-24_14-12-05_804800_55420",
    "actorDefinitionEvent": {
        "actorId": "0AFtngcXtEoxwqmJAQAAAA==",
        "jobId": "AQAAAA==",
        "name": "actor-test",
        "rayNamespace": "bd2ad7f8-650b-495c-b709-55d4c8a7d09f",
        "serializedRuntimeEnv": "{}",
        "className": "test_ray_actor_events.<locals>.A",
        "isDetached": false,
        "requiredResources": {},
        "placementGroupId": "",
        "labelSelector": {}
    },
    "message": ""
}

// actor lifecycle event
{
    "eventId": "mOdfn5SRx3X0B05OvEDV0rcIOzqf/SGBJmrD/Q==",
    "sourceType": "GCS",
    "eventType": "ACTOR_LIFECYCLE_EVENT",
    "timestamp": "2025-10-24T21:12:10.742654Z",
    "severity": "INFO",
    "sessionName": "session_2025-10-24_14-12-05_804800_55420",
    "actorLifecycleEvent": {
        "actorId": "0AFtngcXtEoxwqmJAQAAAA==",
        "stateTransitions": [
            {
                "timestamp": "2025-10-24T21:12:10.742654Z",
                "state": "ALIVE",
                "nodeId": "zpLG7coqThVMl8df9RYHnhK6thhJqrgPodtfjg==",
                "workerId": "nrBehSG3HXu0PvHZBkPl2kovmjzAaoCuVj2KHA=="
            }
        ]
    },
    "message": ""
}
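
Because the definition and lifecycle events above share an actorId, a consumer could join them in memory. The sketch below is one hypothetical way to do that (the ActorIndex class is not part of Ray). It assumes all timestamps use the same fixed-width UTC format as in the examples, so that comparing them as strings matches chronological order.

```python
from collections import defaultdict


class ActorIndex:
    """Joins actor definition and lifecycle events by actorId."""

    def __init__(self):
        self.definitions = {}
        self.transitions = defaultdict(list)

    def add(self, event: dict) -> None:
        etype = event.get("eventType")
        if etype == "ACTOR_DEFINITION_EVENT":
            body = event["actorDefinitionEvent"]
            self.definitions[body["actorId"]] = body
        elif etype == "ACTOR_LIFECYCLE_EVENT":
            body = event["actorLifecycleEvent"]
            self.transitions[body["actorId"]].extend(body["stateTransitions"])
        # Other event types are ignored by this index.

    def latest_state(self, actor_id: str):
        """Return the state with the greatest timestamp, or None."""
        history = self.transitions.get(actor_id)
        if not history:
            return None
        # Assumption: uniform timestamp format, so string order is time order.
        return max(history, key=lambda t: t["timestamp"])["state"]
```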

Driver job events#

For each driver job, Ray exports two types of events: driver job definition events and driver job lifecycle events.

// driver job definition event
{
    "eventId": "7YnwZPJr0KUC28T7KnzsvGyceEIrjNDTHuQfrg==",
    "sourceType": "GCS",
    "eventType": "DRIVER_JOB_DEFINITION_EVENT",
    "timestamp": "2025-10-24T21:17:07.316482Z",
    "severity": "INFO",
    "sessionName": "session_2025-10-24_14-17-05_575968_59360",
    "driverJobDefinitionEvent": {
        "jobId": "AQAAAA==",
        "driverPid": "59360",
        "driverNodeId": "9eHWUIruJWnMjQuPas0W+TRNUyjY5PwFpWUfjA==",
        "entrypoint": "...",
        "config": {
            "serializedRuntimeEnv": "{}",
            "metadata": {}
        }
    },
    "message": ""
}

// driver job lifecycle event
{
    "eventId": "0cmbCI/RQghYe4ZQiJ+HrnK1RiZH+cg8ltBx2w==",
    "sourceType": "GCS",
    "eventType": "DRIVER_JOB_LIFECYCLE_EVENT",
    "timestamp": "2025-10-24T21:17:07.316483Z",
    "severity": "INFO",
    "sessionName": "session_2025-10-24_14-17-05_575968_59360",
    "driverJobLifecycleEvent": {
        "jobId": "AQAAAA==",
        "stateTransitions": [
            {
                "state": "CREATED",
                "timestamp": "2025-10-24T21:17:07.316483Z"
            }
        ]
    },
    "message": ""
}

Node events#

For each node, Ray exports two types of events: node definition events and node lifecycle events.

// node definition event
{
    "eventId": "l7r4gwq4UPhmZGFJYEym6mUkcxqafra60LB6/Q==",
    "sourceType": "GCS",
    "eventType": "NODE_DEFINITION_EVENT",
    "timestamp": "2025-10-24T21:19:14.063953Z",
    "severity": "INFO",
    "sessionName": "session_2025-10-24_14-19-12_675240_61141",
    "nodeDefinitionEvent": {
        "nodeId": "0yfRX1ex+VtcC+TFXjXcgesdpnEwM76+pEATrQ==",
        "nodeIpAddress": "127.0.0.1",
        "labels": {
            "ray.io/node-id": "d327d15f57b1f95b5c0be4c55e35dc81eb1da6713033bebea44013ad"
        },
        "startTimestamp": "2025-10-24T21:19:14.063Z"
    },
    "message": ""
}

// node lifecycle event
{
    "eventId": "u3KTG8615MIKBH5PLcii0BMfGFWcvLuSOXM6zg==",
    "sourceType": "GCS",
    "eventType": "NODE_LIFECYCLE_EVENT",
    "timestamp": "2025-10-24T21:19:14.063955Z",
    "severity": "INFO",
    "sessionName": "session_2025-10-24_14-19-12_675240_61141",
    "nodeLifecycleEvent": {
        "nodeId": "0yfRX1ex+VtcC+TFXjXcgesdpnEwM76+pEATrQ==",
        "stateTransitions": [
            {
                "timestamp": "2025-10-24T21:19:14.063955Z",
                "resources": {"node:__internal_head__": 1.0, "CPU": 1.0, "object_store_memory": 157286400.0, "node:127.0.0.1": 1.0, "memory": 42964287488.0},
                "state": "ALIVE",
                "aliveSubState": "UNSPECIFIED"
            }
        ]
    },
    "message": ""
}
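
In the node definition example above, the nodeId value is the base64 encoding of the same bytes that the ray.io/node-id label shows in hexadecimal. Assuming the other ID fields are base64-encoded bytes as well, a small helper (the name id_to_hex is hypothetical) can convert them to hex strings for easier comparison with other tooling:

```python
import base64


def id_to_hex(b64_id: str) -> str:
    """Decode a base64-encoded ID field into a hex string."""
    return base64.b64decode(b64_id).hex()


print(id_to_hex("0yfRX1ex+VtcC+TFXjXcgesdpnEwM76+pEATrQ=="))
# prints d327d15f57b1f95b5c0be4c55e35dc81eb1da6713033bebea44013ad
print(id_to_hex("AQAAAA=="))
# prints 01000000
```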

High-level architecture#

The following diagram shows the high-level architecture of Ray event export.

../../_images/ray-event-export.png

All Ray components send events to an aggregator agent through gRPC. Each node runs one aggregator agent, which collects all events on that node and sends them to the configured HTTP endpoint.