Learn more about how OpenStack has enabled Graphcore to create the infrastructure they use to run their AI and machine learning workloads.


Graphcore’s advanced AI compute systems deliver leading-edge performance for machine intelligence workloads in the cloud and on-premises. At the heart of every Graphcore system is the Intelligence Processing Unit (IPU), a made-for-AI processor, designed to meet the unique computational requirements of advanced artificial intelligence.

Graphcore technology is used by commercial and public sector customers, as well as private and academic research institutions, to make new AI breakthroughs and accelerate at-scale deployment.

In 2021, Graphcore adopted OpenStack as their reference platform for building platform services and entire clouds.

How has OpenStack transformed your organization?

Graphcore’s hardware architecture is uniquely flexible and built from the ground up for machine learning. With traditional deployment methods, it would be technically possible to build our flexible network architecture, but this would be very time-consuming and would not allow for rapid changes of tenant or usage model.

Using infrastructure as code (mainly Terraform) against the OpenStack APIs allows for rapid reconfiguration at every level, including RDMA via SR-IOV, which is critical to the function of the IPU product.
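As an illustration of the pattern described above (not Graphcore's actual configuration), a Terraform sketch using the community OpenStack provider might attach an SR-IOV port to an instance by requesting a `direct` VNIC binding from Neutron. The network name, flavor, and image here are hypothetical:

```hcl
# Illustrative sketch using the terraform-provider-openstack resources.
# The names "rdma-net", "ipu-host", and "ubuntu-22.04" are hypothetical.

data "openstack_networking_network_v2" "rdma" {
  name = "rdma-net"
}

resource "openstack_networking_port_v2" "rdma_port" {
  name       = "rdma-sriov-port"
  network_id = data.openstack_networking_network_v2.rdma.id

  # Request an SR-IOV virtual function rather than a virtio port,
  # giving the guest direct hardware access for RDMA traffic.
  binding {
    vnic_type = "direct"
  }
}

resource "openstack_compute_instance_v2" "worker" {
  name        = "ipu-worker-0"
  flavor_name = "ipu-host"
  image_name  = "ubuntu-22.04"

  network {
    port = openstack_networking_port_v2.rdma_port.id
  }
}
```

Because the port and instance are plain Terraform resources, retargeting a server to a different tenant network or VNIC type becomes a plan-and-apply operation rather than a manual recabling exercise.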

What workloads are you running on OpenStack?

We use OpenStack to create infrastructure on which we run our AI and Machine Learning workloads. We create AI processing products using custom chips and the fastest networking interfaces and devices available. These require the most innovative and optimized infrastructure around them to allow for efficient utilization. Maximizing data ingress and processing bandwidth at every point is essential to keep the AI beast fed. 

What is the scale of your OpenStack environment?

We currently have multiple environments, including two production clouds, each containing at least 64 high-core-count servers and 1,024 IPUs for workloads, 400Gb networking, hyper-converged Ceph deployments, and eight control-plane and storage servers.

What other open source technologies are integrated with your OpenStack environment?

Our OpenStack deployments rely heavily on Terraform, Ansible, Kubernetes, Azimuth, and AWX.

Graphcore and StackHPC have extended OpenStack's capabilities to make the IPU-Machine a first-class citizen (through Ironic), and continue to develop comprehensive resource management via Blazar.
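Blazar exposes this kind of resource management through an OpenStack CLI plugin. As a hedged sketch of what reserving bare-metal capacity for a fixed window can look like (the lease name, dates, and host count below are illustrative, not taken from Graphcore's deployment):

```shell
# Illustrative only: reserve one physical host via Blazar for a fixed window.
# Requires the python-blazarclient CLI plugin and hosts enrolled with Blazar.
openstack reservation lease create \
  --reservation resource_type=physical:host,min=1,max=1 \
  --start-date "2024-05-01 09:00" \
  --end-date "2024-05-01 18:00" \
  training-lease
```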

Share your use case by filling out the OpenStack Case Study Survey! 

Kristin Barrientos