NVIDIA Dynamo Planner Brings SLO-Driven Automation to Multi-Node LLM Inference
Microsoft and NVIDIA have released Part 2 of their collaboration on running NVIDIA Dynamo for large language model inference on Azure Kubernetes Service (AKS). The first installment focused on raw throughput, targeting 1.2 million tokens per second on distributed GPU systems.

By Claudio Masolo