Failed to download runtime_env file package

[Ray environment] Production
[Ray version and libraries] Ray 2.4, Python 3.10
[Scenario] The error output is as follows:
2023-11-20 20:03:30,068 INFO runtime_env_agent.py:497 – Got request from raylet to decrease reference for runtime env: {“working_dir”: “gcs://_ray_pkg_62304e6af9aab441.zip”}.
2023-11-20 20:03:30,068 WARNING runtime_env_agent.py:128 – Runtime env {“working_dir”: “gcs://_ray_pkg_62304e6af9aab441.zip”} does not exist.
2023-11-20 20:03:30,068 WARNING runtime_env_agent.py:109 – URI gcs://_ray_pkg_62304e6af9aab441.zip does not exist.
2023-11-20 20:03:30,069 INFO runtime_env_agent.py:339 – Creating runtime env: {“working_dir”: “gcs://_ray_pkg_62304e6af9aab441.zip”} with timeout 600 seconds.
2023-11-20 20:03:30,070 INFO runtime_env_agent.py:497 – Got request from raylet to decrease reference for runtime env: {“working_dir”: “gcs://_ray_pkg_07d964c2ad903c39.zip”}.
2023-11-20 20:03:30,070 WARNING runtime_env_agent.py:128 – Runtime env {“working_dir”: “gcs://_ray_pkg_07d964c2ad903c39.zip”} does not exist.
2023-11-20 20:03:30,070 WARNING runtime_env_agent.py:109 – URI gcs://_ray_pkg_07d964c2ad903c39.zip does not exist.
2023-11-20 20:03:30,070 INFO runtime_env_agent.py:339 – Creating runtime env: {“working_dir”: “gcs://_ray_pkg_07d964c2ad903c39.zip”} with timeout 600 seconds.
2023-11-20 20:03:30,071 INFO runtime_env_agent.py:497 – Got request from raylet to decrease reference for runtime env: {“working_dir”: “gcs://_ray_pkg_07d964c2ad903c39.zip”}.
2023-11-20 20:03:30,071 INFO runtime_env_agent.py:130 – Unused runtime env {“working_dir”: “gcs://_ray_pkg_07d964c2ad903c39.zip”}.
2023-11-20 20:03:30,071 INFO runtime_env_agent.py:111 – Unused uris [(‘gcs://_ray_pkg_07d964c2ad903c39.zip’, ‘working_dir’)].
2023-11-20 20:03:30,082 ERROR runtime_env_agent.py:365 – Failed to create runtime env {“working_dir”: “gcs://_ray_pkg_07d964c2ad903c39.zip”}.
Traceback (most recent call last):
File “/opt/conda/envs/artemis/lib/python3.10/site-packages/ray/dashboard/modules/runtime_env/runtime_env_agent.py”, line 357, in _create_runtime_env_with_retry
runtime_env_context = await asyncio.wait_for(
File “/opt/conda/envs/artemis/lib/python3.10/asyncio/tasks.py”, line 445, in wait_for
return fut.result()
File “/opt/conda/envs/artemis/lib/python3.10/site-packages/ray/dashboard/modules/runtime_env/runtime_env_agent.py”, line 312, in _setup_runtime_env
await create_for_plugin_if_needed(
File “/opt/conda/envs/artemis/lib/python3.10/site-packages/ray/_private/runtime_env/plugin.py”, line 252, in create_for_plugin_if_needed
size_bytes = await plugin.create(uri, runtime_env, context, logger=logger)
File “/opt/conda/envs/artemis/lib/python3.10/site-packages/ray/_private/runtime_env/working_dir.py”, line 155, in create
local_dir = await download_and_unpack_package(
File “/opt/conda/envs/artemis/lib/python3.10/site-packages/ray/_private/runtime_env/packaging.py”, line 655, in download_and_unpack_package
raise IOError(
OSError: Failed to download runtime_env file package gcs://_ray_pkg_07d964c2ad903c39.zip from the GCS to the Ray worker node. The package may have prematurely been deleted from the GCS due to a long upload time or a problem with Ray. Try setting the environment variable RAY_RUNTIME_ENV_TEMPORARY_REFERENCE_EXPIRATION_S to a value larger than the upload time in seconds (the default is 600). If this fails, try re-running after making any change to a file in the file package.
2023-11-20 20:03:30,110 INFO runtime_env_agent.py:390 – Successfully created runtime env: {“working_dir”: “gcs://_ray_pkg_62304e6af9aab441.zip”}, the context: {“command_prefix”: [“cd”, “/tmp/ray/session_2023-11-20_20-03-27_995203_19/runtime_resources/working_dir_files/_ray_pkg_62304e6af9aab441”, “&&”], “env_vars”: {“PYTHONPATH”: “/tmp/ray/session_2023-11-20_20-03-27_995203_19/runtime_resources/working_dir_files/_ray_pkg_62304e6af9aab441”}, “py_executable”: “/opt/conda/envs/artemis/bin/python”, “resources_dir”: null, “container”: {}, “java_jars”: []}
2023-11-20 20:03:30,110 INFO runtime_env_agent.py:426 – Runtime env already created successfully. Env: {“working_dir”: “gcs://_ray_pkg_62304e6af9aab441.zip”}, context: {“command_prefix”: [“cd”, “/tmp/ray/session_2023-11-20_20-03-27_995203_19/runtime_resources/working_dir_files/_ray_pkg_62304e6af9aab441”, “&&”], “env_vars”: {“PYTHONPATH”: “/tmp/ray/session_2023-11-20_20-03-27_995203_19/runtime_resources/working_dir_files/_ray_pkg_62304e6af9aab441”}, “py_executable”: “/opt/conda/envs/artemis/bin/python”, “resources_dir”: null, “container”: {}, “java_jars”: []}
2023-11-20 20:03:31,086 ERROR runtime_env_agent.py:365 – Failed to create runtime env {“working_dir”: “gcs://_ray_pkg_07d964c2ad903c39.zip”}.
Traceback (most recent call last):
File “/opt/conda/envs/artemis/lib/python3.10/site-packages/ray/dashboard/modules/runtime_env/runtime_env_agent.py”, line 357, in _create_runtime_env_with_retry
runtime_env_context = await asyncio.wait_for(
File “/opt/conda/envs/artemis/lib/python3.10/asyncio/tasks.py”, line 445, in wait_for
return fut.result()
File “/opt/conda/envs/artemis/lib/python3.10/site-packages/ray/dashboard/modules/runtime_env/runtime_env_agent.py”, line 312, in _setup_runtime_env
await create_for_plugin_if_needed(
File “/opt/conda/envs/artemis/lib/python3.10/site-packages/ray/_private/runtime_env/plugin.py”, line 252, in create_for_plugin_if_needed
size_bytes = await plugin.create(uri, runtime_env, context, logger=logger)
File “/opt/conda/envs/artemis/lib/python3.10/site-packages/ray/_private/runtime_env/working_dir.py”, line 155, in create
local_dir = await download_and_unpack_package(
File “/opt/conda/envs/artemis/lib/python3.10/site-packages/ray/_private/runtime_env/packaging.py”, line 655, in download_and_unpack_package
raise IOError(
OSError: Failed to download runtime_env file package gcs://_ray_pkg_07d964c2ad903c39.zip from the GCS to the Ray worker node. The package may have prematurely been deleted from the GCS due to a long upload time or a problem with Ray. Try setting the environment variable RAY_RUNTIME_ENV_TEMPORARY_REFERENCE_EXPIRATION_S to a value larger than the upload time in seconds (the default is 600). If this fails, try re-running after making any change to a file in the file package.
2023-11-20 20:03:32,090 ERROR runtime_env_agent.py:365 – Failed to create runtime env {“working_dir”: “gcs://_ray_pkg_07d964c2ad903c39.zip”}.
Traceback (most recent call last):
File “/opt/conda/envs/artemis/lib/python3.10/site-packages/ray/dashboard/modules/runtime_env/runtime_env_agent.py”, line 357, in _create_runtime_env_with_retry
runtime_env_context = await asyncio.wait_for(
File “/opt/conda/envs/artemis/lib/python3.10/asyncio/tasks.py”, line 445, in wait_for
return fut.result()
File “/opt/conda/envs/artemis/lib/python3.10/site-packages/ray/dashboard/modules/runtime_env/runtime_env_agent.py”, line 312, in _setup_runtime_env
await create_for_plugin_if_needed(
File “/opt/conda/envs/artemis/lib/python3.10/site-packages/ray/_private/runtime_env/plugin.py”, line 252, in create_for_plugin_if_needed
size_bytes = await plugin.create(uri, runtime_env, context, logger=logger)
File “/opt/conda/envs/artemis/lib/python3.10/site-packages/ray/_private/runtime_env/working_dir.py”, line 155, in create
local_dir = await download_and_unpack_package(
File “/opt/conda/envs/artemis/lib/python3.10/site-packages/ray/_private/runtime_env/packaging.py”, line 655, in download_and_unpack_package
raise IOError(
OSError: Failed to download runtime_env file package gcs://_ray_pkg_07d964c2ad903c39.zip from the GCS to the Ray worker node. The package may have prematurely been deleted from the GCS due to a long upload time or a problem with Ray. Try setting the environment variable RAY_RUNTIME_ENV_TEMPORARY_REFERENCE_EXPIRATION_S to a value larger than the upload time in seconds (the default is 600). If this fails, try re-running after making any change to a file in the file package.
2023-11-20 20:03:33,091 ERROR runtime_env_agent.py:383 – Runtime env creation failed for 3 times, don’t retry any more.
2023-11-20 20:03:33,091 INFO runtime_env_agent.py:130 – Unused runtime env {“working_dir”: “gcs://_ray_pkg_07d964c2ad903c39.zip”}.
2023-11-20 20:03:33,091 INFO runtime_env_agent.py:111 – Unused uris [(‘gcs://_ray_pkg_07d964c2ad903c39.zip’, ‘working_dir’)].
2023-11-20 20:03:33,091 INFO runtime_env_agent.py:437 – Runtime env already failed. Env: {“working_dir”: “gcs://_ray_pkg_07d964c2ad903c39.zip”}, err: Traceback (most recent call last):
File “/opt/conda/envs/artemis/lib/python3.10/site-packages/ray/dashboard/modules/runtime_env/runtime_env_agent.py”, line 357, in _create_runtime_env_with_retry
runtime_env_context = await asyncio.wait_for(
File “/opt/conda/envs/artemis/lib/python3.10/asyncio/tasks.py”, line 445, in wait_for
return fut.result()
File “/opt/conda/envs/artemis/lib/python3.10/site-packages/ray/dashboard/modules/runtime_env/runtime_env_agent.py”, line 312, in _setup_runtime_env
await create_for_plugin_if_needed(
File “/opt/conda/envs/artemis/lib/python3.10/site-packages/ray/_private/runtime_env/plugin.py”, line 252, in create_for_plugin_if_needed
size_bytes = await plugin.create(uri, runtime_env, context, logger=logger)
File “/opt/conda/envs/artemis/lib/python3.10/site-packages/ray/_private/runtime_env/working_dir.py”, line 155, in create
local_dir = await download_and_unpack_package(
File “/opt/conda/envs/artemis/lib/python3.10/site-packages/ray/_private/runtime_env/packaging.py”, line 655, in download_and_unpack_package
raise IOError(
OSError: Failed to download runtime_env file package gcs://_ray_pkg_07d964c2ad903c39.zip from the GCS to the Ray worker node. The package may have prematurely been deleted from the GCS due to a long upload time or a problem with Ray. Try setting the environment variable RAY_RUNTIME_ENV_TEMPORARY_REFERENCE_EXPIRATION_S to a value larger than the upload time in seconds (the default is 600). If this fails, try re-running after making any change to a file in the file package.

2023-11-20 20:03:33,091 WARNING runtime_env_agent.py:128 – Runtime env {“working_dir”: “gcs://_ray_pkg_07d964c2ad903c39.zip”} does not exist.
2023-11-20 20:03:33,091 WARNING runtime_env_agent.py:109 – URI gcs://_ray_pkg_07d964c2ad903c39.zip does not exist.

The job is submitted through Ray Client:

import ray

ray.init(
    runtime_env={
        "working_dir": _get_working_dir(),  # helper defined elsewhere in my code
        "excludes": ["*.ipynb", "tests", "folder_tmp", "*_test.py", "*.html"],
    },
)

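This error appears to happen because the contents of the local working_dir have not changed, so the package hash is unchanged and the package is not uploaded to the GCS again. However, when the other worker nodes try to download it from the GCS, it seems to have already been deleted there. How should this be resolved?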
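To make the reasoning about the unchanged hash concrete, here is a minimal sketch of content-addressed package naming. It is not Ray's actual hashing code: `directory_digest`, `package_name`, and the `_pkg_` prefix are hypothetical names used only for illustration. It shows why an untouched directory maps to the same package name on every run, and why the error message's advice to change any file in the package yields a new name.

```python
import hashlib
from pathlib import Path


def directory_digest(root: Path) -> str:
    """Return a short hex digest over relative paths and contents of all files under root."""
    hasher = hashlib.sha256()
    # Sort so the digest does not depend on filesystem iteration order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        hasher.update(path.relative_to(root).as_posix().encode("utf-8"))
        hasher.update(path.read_bytes())
    return hasher.hexdigest()[:16]


def package_name(root: Path) -> str:
    """Build a hypothetical content-addressed package name for the directory."""
    return f"_pkg_{directory_digest(root)}.zip"
```

With naming like this, a client that sees an already-known name can skip the upload, which only works as long as the stored copy still exists on the cluster side.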

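It is best to use your own object storage. Distributing files through the GCS is mainly a convenience for experiments; in production it tends to cause more problems.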
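If you go the object storage route, the working directory has to be packaged by you before it is uploaded. Below is a minimal sketch using only the standard library. `zip_working_dir` is a hypothetical helper, and its exclude matching (an `fnmatch` test on each path component) is a simplification, not a reimplementation of how Ray interprets `excludes`. It wraps everything in one top-level directory, which to my knowledge is what Ray expects of a remote zip package.

```python
import fnmatch
import zipfile
from pathlib import Path


def zip_working_dir(src: Path, dest_zip: Path, excludes: list[str]) -> list[str]:
    """Zip src into dest_zip under a single top-level directory.

    Returns the archive member names that were written.
    """
    src = src.resolve()
    dest_zip = dest_zip.resolve()
    top = src.name
    # List files before creating the archive so it can never include itself.
    files = sorted(p for p in src.rglob("*") if p.is_file() and p != dest_zip)
    written = []
    with zipfile.ZipFile(dest_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            rel = path.relative_to(src)
            # Simplified rule: skip a file if any path component matches any pattern.
            if any(fnmatch.fnmatch(part, pat) for part in rel.parts for pat in excludes):
                continue
            arcname = f"{top}/{rel.as_posix()}"
            zf.write(path, arcname)
            written.append(arcname)
    return written
```

The resulting zip can then be uploaded with whatever client your storage provider offers; that step is left out here on purpose.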

Does the Ray GCS itself support object storage?

Do you mean something like this?
runtime_env={
    "working_dir": ["S3://xxxx"]
}
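For reference, a hedged sketch of what that could look like. As far as I know, `working_dir` takes a single string rather than a list, and a remote URI has to point at a zip file. The bucket and object name in the usage note are placeholders, `remote_working_dir_env` and `connect` are hypothetical helpers, and reading from S3 typically requires extra packages such as `smart_open` and `boto3` on the cluster.

```python
def remote_working_dir_env(uri: str) -> dict:
    """Build a runtime_env dict whose working_dir is a remote zip URI."""
    if not isinstance(uri, str):
        raise TypeError("working_dir must be a single string, not a list")
    if not uri.endswith(".zip"):
        raise ValueError("a remote working_dir URI is expected to point at a .zip file")
    return {"working_dir": uri}


def connect(uri: str) -> None:
    """Start or connect to Ray with the remote working_dir (needs Ray installed)."""
    import ray  # imported lazily so the helper above works without Ray

    ray.init(runtime_env=remote_working_dir_env(uri))
```

Calling `connect("s3://example-bucket/working_dir.zip")` would then have each node fetch the package from the bucket instead of from the GCS, so nothing depends on the GCS copy still being present.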