This is my Ray cluster info: a single node with 417 GiB of memory resources. Output of ray status:
======== Autoscaler status: 2024-07-16 09:41:11.299619 ========
Node status
Active:
1 node_6b25042f95a0e21383f0d801838bcf4da193cc29e0ed6e3b0a65dfab
Pending:
(no pending nodes)
Recent failures:
(no failures)
Resources
Usage:
16.0/128.0 CPU
8.0/8.0 GPU
0B/417.01GiB memory
0B/75.97GiB object_store_memory
Demands:
(no resource demands)

However, at runtime the following exception is thrown:

Exception in thread Thread-96:
Traceback (most recent call last):
File "/opt/conda/envs/AudioPipeline/lib/python3.9/threading.py", line 980, in _bootstrap_inner
self.run()
File "/opt/conda/envs/AudioPipeline/lib/python3.9/threading.py", line 917, in run
self._target(*self._args, **self._kwargs)
File "/root/AudioDataCollection/main_remote.py", line 85, in submit_tasks
while not audio_queue.empty():
File "/opt/conda/envs/AudioPipeline/lib/python3.9/site-packages/ray/util/queue.py", line 78, in empty
return ray.get(self.actor.empty.remote())
File "/opt/conda/envs/AudioPipeline/lib/python3.9/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
return fn(*args, **kwargs)
File "/opt/conda/envs/AudioPipeline/lib/python3.9/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
return func(*args, **kwargs)
File "/opt/conda/envs/AudioPipeline/lib/python3.9/site-packages/ray/_private/worker.py", line 2639, in get
values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
File "/opt/conda/envs/AudioPipeline/lib/python3.9/site-packages/ray/_private/worker.py", line 866, in get_objects
raise value
ray.exceptions.OutOfMemoryError: Task was killed due to the node running low on memory.
Memory on the node (IP: 192.168.168.190, ID: 6b25042f95a0e21383f0d801838bcf4da193cc29e0ed6e3b0a65dfab) where the task (task ID: ffffffffffffffffb375e243356e49c6bc50cb9401000000, name=_QueueActor.__init__, pid=22724, memory used=0.05GB) was running was 485.16GB / 503.04GB (0.964451), which exceeds the memory usage threshold of 0.95. Ray killed this worker (ID: 9fe1a34b66eb4597152e5e3835c92767ccf3ab340765772428489292) because it was the most recently scheduled task; to see more information about memory usage on this node, use `ray logs raylet.out -ip 192.168.168.190`. To see the logs of the worker, use `ray logs worker-9fe1a34b66eb4597152e5e3835c92767ccf3ab340765772428489292*out -ip 192.168.168.190`. Top 10 memory users:
PID MEM(GB) COMMAND

It reports the error above, "Task was killed due to the node running low on memory." What is causing this? Could someone please take a look? Thanks.
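For context, here is a minimal sketch of what the submitting code might look like, reconstructed only from the traceback (a submit_tasks function running in a background thread polls audio_queue.empty() at main_remote.py line 85, which goes through a ray.get() on the queue actor). The process_audio task and the queue contents are hypothetical placeholders, not the actual code.

import threading

import ray
from ray.util.queue import Queue

ray.init()

# Hypothetical worker task; the real workload in main_remote.py is not shown
# in the traceback.
@ray.remote
def process_audio(item):
    return item

audio_queue = Queue()  # backed by a _QueueActor, the actor named in the error
result_refs = []

def submit_tasks():
    # Each empty() call performs a ray.get() on the queue actor's
    # empty.remote(), which is where the node-level OutOfMemoryError
    # surfaces once Ray's memory monitor kills a worker on the node.
    while not audio_queue.empty():
        item = audio_queue.get()
        result_refs.append(process_audio.remote(item))

thread = threading.Thread(target=submit_tasks)
thread.start()
thread.join()

The sketch only shows where the exception surfaces; whatever else is consuming memory on the node (the error reports 485.16 GB of 503.04 GB used) is what pushes it past the 0.95 threshold mentioned in the message.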