CNCF Dragonfly Speeds Container, Model Sharing with P2P

The Dragonfly project, an open source peer-to-peer image and file distribution system, has graduated from the Cloud Native Computing Foundation’s program for incubating new cloud native technologies.

The open source technology, under CNCF’s wing since 2018, has shown that it can work in production settings, with its ability to copy container images and large AI models across a network at scale, according to the organization. Built to run on Kubernetes, it has been adopted by organizations running large-scale AI workloads, and it has found a home in other environments as well, including CI/CD pipelines and edge computing.

CNCF Dragonfly, originally developed for internal use by Alibaba Cloud, provides a way for organizations to distribute images across a network. It can copy container images to thousands of nodes nearly simultaneously.

Beyond container images, it can also distribute general files, caches and logs.

Overall, 271 individuals across 130 companies have contributed 26,000 commits to building out the project.

“Looking back on this journey over the past eight years, every step has embodied the open source spirit and the tireless efforts of the many contributors,” said Zuozheng Hu, founder of Dragonfly and emeritus maintainer, in a statement.

The Power of P2P

A peer-to-peer file sharing mechanism can help cloud native deployments distribute new and updated container images across a cluster more quickly and with less load on the upstream network.

P2P, first popularized by music sharing programs such as Napster over two decades ago, can make full use of the cluster’s bandwidth while eliminating the possible bottleneck of having a single server respond to all the requests for a new image.

In a P2P network, each node, or “peer,” can share files with the other nodes, so the nodes no longer saturate the bandwidth to the image server by each downloading an identical copy of the same image.
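To make the bandwidth argument concrete, the sketch below uses the standard lower-bound model for file distribution time found in networking textbooks. It assumes identical peers and ignores scheduling overhead; the function names and all the numbers are hypothetical and illustrate the trend, not Dragonfly's measured performance.

```python
def client_server_time(n_nodes, file_bits, server_up, min_down):
    """Lower bound (seconds) when one server uploads every copy itself."""
    return max(n_nodes * file_bits / server_up,  # server must push N full copies
               file_bits / min_down)             # slowest node must download one copy


def p2p_time(n_nodes, file_bits, server_up, min_down, peer_up):
    """Lower bound (seconds) when each of n_nodes also uploads at peer_up."""
    total_up = server_up + n_nodes * peer_up     # aggregate upload capacity of the swarm
    return max(file_bits / server_up,            # server must send at least one copy
               file_bits / min_down,             # slowest node must download one copy
               n_nodes * file_bits / total_up)   # N copies shared across all uploaders


if __name__ == "__main__":
    # Hypothetical numbers: 1,000 nodes pull a 1 GB (8e9-bit) image; the registry
    # has 10 Gbit/s of upload, nodes download at 10 Gbit/s and upload at 1 Gbit/s.
    args = (1000, 8e9, 1e10, 1e10)
    print(f"client-server: {client_server_time(*args):.2f} s")  # 800.00 s
    print(f"p2p:           {p2p_time(*args, 1e9):.2f} s")       # 7.92 s
```

With a single server, the time grows linearly with the number of nodes; with P2P, every new node also adds upload capacity, so the bound stays roughly flat as the cluster grows.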

Dragonfly is not a pure P2P technology; it still requires a supernode to schedule and control distribution within the peer network. An agent on each node, dfget, downloads the file pieces, while a proxy component, dfdaemon, intercepts image download requests from the container engine and hands them to dfget.
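The supernode's scheduling role can be illustrated with a toy model. This is a hypothetical sketch, not Dragonfly's actual scheduling algorithm: the image is split into pieces, and in each round a central scheduler assigns every incomplete peer one missing piece, taken from another peer when possible and from the origin registry only when no peer has a usable copy. The per-round upload limits (`origin_slots`, `peer_slots`) and all names are assumptions made for illustration.

```python
from collections import Counter

ORIGIN = "<origin>"  # hypothetical label for the upstream image registry


def simulate(num_pieces, peers, origin_slots=1, peer_slots=1):
    """Toy central scheduler (NOT Dragonfly's real algorithm).

    Each round, every incomplete peer gets at most one missing piece from one
    source, preferring other peers and falling back to the origin.
    Returns (rounds, origin_fetches).
    """
    have = {p: set() for p in peers}
    rounds = origin_fetches = 0
    while any(len(have[p]) < num_pieces for p in peers):
        rounds += 1
        used = Counter()         # uploads assigned to each source this round
        from_origin = set()      # pieces the origin is already sending this round
        arrivals = []            # (peer, piece) delivered at the end of the round
        for peer in peers:
            missing = [i for i in range(num_pieces) if i not in have[peer]]
            if not missing:
                continue
            choice = None
            # 1) Prefer another peer that holds a missing piece and has a free slot.
            for piece in missing:
                src = next((q for q in peers
                            if q != peer and piece in have[q] and used[q] < peer_slots),
                           None)
                if src is not None:
                    choice = (piece, src)
                    break
            # 2) Otherwise ask the origin, seeding a piece the swarm does not have yet.
            if choice is None and used[ORIGIN] < origin_slots:
                in_swarm = set().union(*have.values())
                fresh = [i for i in missing if i not in in_swarm and i not in from_origin]
                piece = fresh[0] if fresh else missing[0]
                choice = (piece, ORIGIN)
                from_origin.add(piece)
            if choice is not None:
                piece, src = choice
                used[src] += 1
                arrivals.append((peer, piece))
                if src == ORIGIN:
                    origin_fetches += 1
        # Pieces only become shareable once the round finishes.
        for peer, piece in arrivals:
            have[peer].add(piece)
    return rounds, origin_fetches


if __name__ == "__main__":
    print(simulate(4, ["A", "B", "C", "D"]))  # (7, 4)
```

With four peers and a four-piece image, the origin sends each piece only once (4 transfers), whereas serving every peer directly would cost it 16 transfers.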

Dragonfly’s Robust Support Stack

Since Dragonfly joined the CNCF, its development team has built a robust support stack around it. Dragonfly can be installed via Helm and monitored with Prometheus and OpenTelemetry.

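To speed transfers, it can run on the gRPC protocol. Through the Harbor open source registry, images can also be “preheated,” meaning they are pulled into the P2P network ahead of time so that nodes can fetch them quickly when they are needed.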

Dragonfly also supports CNCF’s ModelPack specification, which standardizes how AI models are packaged for distribution.

One Dragonfly subproject, Nydus, adds considerable value by further accelerating image and model distribution: it lets containers start before the full image has been downloaded, fetching data on demand.

“The combination of Dragonfly and Nydus substantially shortens launch times for container images and AI models, enhancing system resilience and efficiency,” said Jiang Liu, Nydus maintainer, in a statement.

Use Cases for Dragonfly

Dragonfly is in use at a number of large cloud native operators, many of them located in Asia. CNCF highlighted several examples.

It has become a core component of Alibaba's container image and data distribution system, supporting the annual Double 11 (Singles’ Day) shopping festival and playing an ongoing role in model data distribution and cache acceleration.

At the Asian financial company Ant Group, it has saved considerable transmission bandwidth across 10,000 Kubernetes nodes. Nydus, in particular, helped the organization reduce image pull time to near zero, and the technology is used for moving large language models as well.

At observability firm Datadog, Dragonfly with Nydus cut the startup time of node DaemonSets to within seconds; previously, image pulls could stretch that time to five minutes.

Chinese mobile technology company DiDi uses Dragonfly for large-scale file synchronization and image distribution for enterprises.

And Chinese short-video platform Kuaishou plans to use Dragonfly to provide image distribution for tens of thousands of services and hundreds of thousands of servers.

The post CNCF Dragonfly Speeds Container, Model Sharing with P2P appeared first on The New Stack.