Flexible, Cross-Language, Distributed Model Inference Serving: Ray Serve with Java API

osanswerPM · December 16, 2022, 08:04

What is Ray Serve

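Ray Serve is an online model inference serving framework built on Ray. Compared with other serving frameworks, Ray Serve focuses on elastic distributed scaling, inference graphs composed of multiple models, and scenarios in which a very large number of models share a small amount of hardware. It builds on Ray's elastic scheduling and high-performance RPC. Ray Serve works with any machine learning framework, has built-in micro-batching to improve throughput, and natively supports FastAPI.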

[Figure: Ray Serve architecture]

Background on Ray

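Ray is a high-performance distributed runtime that aims to provide a general-purpose API for distributed computing. The core of that goal is to offer simple, general programming abstractions and let the system handle the hard parts (memory sharing, distributed fault tolerance, scaling, scheduling, and observability). This approach lets developers use Ray together with existing Python and Java libraries.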

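Ray aims to make distributed applications and libraries composable in the general case. Concretely, this covers coarse-grained elastic workloads (that is, serverless-style computing), machine learning training (for example, Ray AIR), online serving (for example, Ray Serve), data processing (for example, Ray Datasets, Modin, Dask-on-Ray), and ad hoc computation (for example, parallelizing Python applications or gluing different distributed frameworks together).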

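Ray's API makes it easy to combine multiple libraries in a single distributed application. For example, Ray tasks and actors can call into distributed training (such as torch.distributed) or into an online service running on Ray. This is how the Ray AI Runtime (AIR) is built: by composing other Ray libraries. In this sense Ray works well as "distributed glue," because its API is general and fast enough to serve as the interface between many different kinds of workloads.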

Ray Serve with Java

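Ant Group's Ant Ray Serving team is collaborating with Anyscale on Ray Serve, with the goal of merging the two. Java is one of the mainstream programming languages and has a huge user base, and a large share of Ant Ray Serving's workloads are written in Java. We therefore contributed Java support to Ray Serve, which both opens up Java use cases and lays the groundwork for deeper integration. Ray Serve now supports Java natively: users can deploy their own Java code as a Deployment and call and manage it through the Java API. The Python API can also operate on Java deployments across languages, and vice versa. This section describes the Java side of Ray Serve in detail.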

Start Ray Serve

As with the Python API, Ray Serve must be started before it can be used from Java. Starting it brings up the Serve Controller and Proxy. For example:

Serve.start(true, false, null);

Creating a Deployment

A deployment is created and deployed by passing the fully qualified class name to the Serve.deployment() builder:

public static class Counter {
  private AtomicInteger value;

  public Counter(String value) {
    this.value = new AtomicInteger(Integer.valueOf(value));
  }

  public String call(String delta) {
    return String.valueOf(value.addAndGet(Integer.valueOf(delta)));
  }
}

public void create() {
  Serve.deployment()
      .setName("counter")
      .setDeploymentDef(Counter.class.getName())
      .setInitArgs(new Object[] {"1"})
      .setNumReplicas(1)
      .create()
      .deploy(true);
}

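Because a Java deployment is an ordinary class, its logic can be checked locally before it is handed to Serve. The sketch below makes no Ray calls: CounterLocalCheck is a hypothetical wrapper class that copies Counter from above and invokes it directly, on the assumption that a replica builds the class from the init args and forwards each request to call.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CounterLocalCheck {

  // Same logic as the Counter deployment above, copied so this file compiles on its own.
  public static class Counter {
    private final AtomicInteger value;

    public Counter(String value) {
      this.value = new AtomicInteger(Integer.parseInt(value));
    }

    public String call(String delta) {
      return String.valueOf(value.addAndGet(Integer.parseInt(delta)));
    }
  }

  public static void main(String[] args) {
    // Mirrors setInitArgs(new Object[] {"1"}): the counter starts at 1.
    Counter counter = new Counter("1");
    System.out.println(counter.call("10")); // prints 11
    System.out.println(counter.call("5"));  // prints 16, state is kept between calls
  }
}
```

Since the state lives in the object, a deployment with more than one replica would presumably keep a separate count per replica.
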
Accessing a Deployment

Once a deployment has been created, it can be looked up by name:

 public Deployment query() {
    Deployment deployment = Serve.getDeployment("counter");
    return deployment;
  }

Calling a Deployment

A deployment that has been deployed can be called through the Java RayServeHandle, for example:

Deployment deployment = Serve.getDeployment("counter");
System.out.println(deployment.getHandle().remote("10").get());

It can also be called over HTTP:

curl -d '"10"' http://127.0.0.1:8000/counter
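
The same request can be issued from Java. This is a minimal sketch using the JDK's java.net.http.HttpClient (Java 11 or later); the class name CounterHttpCall and the buildRequest helper are illustrative, and the address is the one from the curl example, which assumes the Serve proxy is listening on 127.0.0.1:8000.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CounterHttpCall {

  // Builds a POST with the same URL and body as:
  //   curl -d '"10"' http://127.0.0.1:8000/counter
  // (curl also sets a form Content-Type header, which this sketch omits).
  public static HttpRequest buildRequest(String baseUrl, String deploymentName, String jsonBody) {
    return HttpRequest.newBuilder()
        .uri(URI.create(baseUrl + "/" + deploymentName))
        .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
        .build();
  }

  public static void main(String[] args) throws Exception {
    HttpRequest request = buildRequest("http://127.0.0.1:8000", "counter", "\"10\"");

    // Sending only succeeds if Serve is running and "counter" is deployed.
    HttpResponse<String> response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.statusCode() + " " + response.body());
  }
}
```

Building the request is kept separate from sending it so that the URL and body can be inspected without a live cluster.
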

Updating a Deployment

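After a deployment has been deployed, its configuration can be changed and the deployment redeployed. For example, the following code changes the initial value of "counter" to 2: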

 public void update() {
    Serve.deployment()
        .setName("counter")
        .setDeploymentDef(Counter.class.getName())
        .setInitArgs(new Object[] {"2"})
        .setNumReplicas(1)
        .create()
        .deploy(true);
  }

Configuring a Deployment

The Java API can also be used to configure a deployment:

Scale the number of replicas up or down
Assign CPU or GPU resources to each replica

Scaling Out

The numReplicas parameter of a deployment sets the number of replicas, and it can be adjusted dynamically, for example:

 public void scaleOut() {
    Deployment deployment = Serve.getDeployment("counter");

    // Scale up to 2 replicas.
    deployment.options().setNumReplicas(2).create().deploy(true);

    // Scale down to 1 replica.
    deployment.options().setNumReplicas(1).create().deploy(true);
  }

Resource Management (CPUs, GPUs)

The rayActorOptions parameter sets the resources bound to each replica of a deployment, for example one GPU:

 public void manageResource() {
    Map<String, Object> rayActorOptions = new HashMap<>();
    rayActorOptions.put("num_gpus", 1);
    Serve.deployment()
        .setName("counter")
        .setDeploymentDef(Counter.class.getName())
        .setRayActorOptions(rayActorOptions)
        .create()
        .deploy(true);
  }
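
When several deployments need different resources, the options map can be assembled by a small helper. The sketch below is plain Java with no Ray calls; ReplicaResources and its of method are hypothetical names, and the num_cpus key is an assumption carried over from Ray's Python actor options (only num_gpus appears in the example above).

```java
import java.util.HashMap;
import java.util.Map;

public class ReplicaResources {

  // Returns a map intended for setRayActorOptions(...).
  // "num_gpus" is the key used in the example above; "num_cpus" is assumed
  // to follow the same naming convention.
  public static Map<String, Object> of(int numCpus, int numGpus) {
    if (numCpus < 0 || numGpus < 0) {
      throw new IllegalArgumentException("resource amounts must be non-negative");
    }
    Map<String, Object> options = new HashMap<>();
    options.put("num_cpus", numCpus);
    options.put("num_gpus", numGpus);
    return options;
  }

  public static void main(String[] args) {
    // Two CPUs and one GPU per replica.
    Map<String, Object> options = of(2, 1);
    System.out.println(options);
  }
}
```

The returned map would take the place of the hand-built rayActorOptions in manageResource().
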

Cross Language Deployment

The Java API can also deploy and call Python deployments across languages. Suppose the directory /path/to/code/ contains a Python file counter.py:

from ray import serve

@serve.deployment
class Counter(object):
    def __init__(self, value):
        self.value = int(value)

    def increase(self, delta):
        self.value += int(delta)
        return str(self.value)

The following example deploys and calls this Python deployment:

import io.ray.api.Ray;
import io.ray.serve.api.Serve;
import io.ray.serve.deployment.Deployment;
import io.ray.serve.generated.DeploymentLanguage;
import java.io.File;

public class ManagePythonDeployment {

  public static void main(String[] args) {

    System.setProperty(
        "ray.job.code-search-path",
        System.getProperty("java.class.path") + File.pathSeparator + "/path/to/code/");

    Serve.start(true, false, null);

    Deployment deployment =
        Serve.deployment()
            .setDeploymentLanguage(DeploymentLanguage.PYTHON)
            .setName("counter")
            .setDeploymentDef("counter.Counter")
            .setNumReplicas(1)
            .setInitArgs(new Object[] {"1"})
            .create();
    deployment.deploy(true);

    System.out.println(Ray.get(deployment.getHandle().method("increase").remote("2")));
  }
}

Note: the directory containing the Python code must be specified before Ray.init or Serve.start is called. See "Cross-Language Programming" for details.

https://docs.ray.io/en/master/ray-core/cross-language.html#cross-language

Conclusion

This article gave a brief introduction to Ray Serve and its Java API. Ray Serve has many more useful capabilities to explore; see the official Ray documentation for details.

"Serve: Scalable and Programmable Serving"

https://docs.ray.io/en/latest/serve/index.html