Ray 1.8.0 hangs while running

【Ray environment】Production/Test/POC
CPU: ARM
OS: openEuler

【Ray version and libraries】
Ray version: 1.8.0

Head node started with: ray start --head
The other three servers joined with: ray start --address='xxx.xxx.xxx.xxx:6379' --redis-password='524195000000000'

【Code】
#include <iostream>
#include <string>
#include <unordered_map>

#include <ray/api.h>

// A plain function registered as a Ray remote task.
int Plus(int x, int y) { return x + y; }

RAY_REMOTE(Plus);

class Counter {
 public:
  int count;

  explicit Counter(int init) : count(init) {}
  // Factory function Ray calls to construct the actor.
  static Counter *FactoryCreate(int init) { return new Counter(init); }

  int Add(int x) {
    count += x;
    return count;
  }
};

RAY_REMOTE(Counter::FactoryCreate, &Counter::Add);

// 1 CPU and 1 GiB of memory (the memory value is in bytes).
const std::unordered_map<std::string, double> RESOURCES_ACTOR{
    {"CPU", 1.0}, {"memory", 1024.0 * 1024.0 * 1024.0}};

const std::unordered_map<std::string, double> RESOURCES{
    {"CPU", 1.0}, {"memory", 1024.0 * 1024.0 * 1024.0}};

int main(int argc, char **argv) {
  ray::Init();

  // Named actors must have unique names, so each actor gets its own.
  // The argument to Remote() is forwarded to Counter::FactoryCreate.
  ray::ActorHandle<Counter> actor_1 = ray::Actor(Counter::FactoryCreate)
                                          .SetName("counter_1")
                                          .SetResources(RESOURCES)
                                          // Needs a placement group created beforehand:
                                          // .SetPlacementGroup(placement_group, 0)
                                          .Remote(0);

  auto result_1 = actor_1.Task(&Counter::Add).Remote(3);
  int actor_1_result = *(ray::Get(result_1));
  std::cout << "actor_1_result = " << actor_1_result << std::endl;

  ray::ActorHandle<Counter> actor_2 = ray::Actor(Counter::FactoryCreate)
                                          .SetName("counter_2")
                                          .SetResources(RESOURCES)
                                          // .SetPlacementGroup(placement_group, 0)
                                          .Remote(0);

  auto result_2 = actor_2.Task(&Counter::Add).Remote(10);
  int actor_2_result = *(ray::Get(result_2));
  std::cout << "actor_2_result = " << actor_2_result << std::endl;

  ray::Shutdown();
  return 0;
}

Program status: hangs (never finishes).
gcs_server.out shows:
Export metrics to agent failed: IOError:14:failed to connect to all addresses. This won't affect Ray, but you can lose metrics from the cluster.

The logs report the same error: IOError:14:failed to connect to all addresses.

What usually causes this, and does it affect task execution across the distributed cluster?

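Resolved after an offline discussion: the user must manually copy the compiled .so to every Ray node. The metrics warning was not the cause; as its own text says, it does not affect Ray and only means some cluster metrics may be lost. The hang most likely came from workers on the other nodes being unable to load the application's .so, so ray::Get waited indefinitely for results that were never produced. Once the community supports job submission, the .so can be distributed automatically.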
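Because the fix is to copy the .so by hand, a quick check on each node before launching the driver can catch a missed copy early. Below is a minimal sketch using only the C++17 standard library, not a Ray API. The helper name MissingLibraries and any library path you pass to it are hypothetical, and it assumes every node is expected to hold the library at the same absolute path.

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical pre-flight helper (not part of Ray): returns every path in
// `required` that does not exist as a regular file on the node running it.
// Assumes the application's .so must sit at the same absolute path on all nodes.
std::vector<std::string> MissingLibraries(const std::vector<std::string> &required) {
  std::vector<std::string> missing;
  for (const auto &path : required) {
    std::error_code ec;  // non-throwing overload: a missing path simply yields false
    if (!std::filesystem::is_regular_file(path, ec)) {
      missing.push_back(path);
    }
  }
  return missing;
}
```

For example, calling MissingLibraries({"/opt/ray_app/lib/libcounter_app.so"}) on each node (the path is illustrative only) and treating a non-empty result as "this node still needs the file" turns a silent hang into an explicit error before any actor is created.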