Check out these highlighted talks from the OpenInfra Summit.


At this year’s OpenInfra Summit, held in Vancouver, attendees had the opportunity to attend many sessions and workshops geared toward private, public and hybrid cloud environments.

Now check out these highlighted talks from the OpenInfra Summit!

The Tortoise Beats the Hare: Upgrading the Operating System of the Cloud at Scale

Bloomberg’s private cloud, made up of thousands of physical machines, has become the home of tens of thousands of VMs and the foundation for the company’s core applications. When it came to upgrading the OS of the physical machines, the team wanted not only to speed up the process but also to minimize the impact on their engineers and end users. They had two options: upgrade in place using the Ubuntu update-manager (the hare), or fully rebuild the entire fleet on a newer version of Ubuntu (the tortoise). Naturally, they wanted the faster approach of upgrading in place. In the end, however, the tortoise beat the hare: they realized the slower method had greater benefits, and they successfully upgraded the OS of the physical machines by rebuilding them all while minimizing the impact on both their clients and engineers. In this presentation, they discuss why they made their decisions, why they were able to do so, and the lessons learned along the way.

Using OpenStack to enable High Availability Workloads (Multi-Availability Zones and Multi-Region Disaster Recovery)

As a provider of public cloud-type services, Samsung SDS serves many enterprise customers who require high availability. Enterprise workloads need to withstand zonal failures as well as full regional failures. This talk focuses on how Samsung SDS set up its OpenStack-based cloud across multiple regions and zones, enabling applications to run in multiple zones and supporting disaster recovery across multiple regions.

The primary work involved setting up a Kubernetes control plane, Ceph file clusters and Keystone across zones, along with various architecture assessments before the team made its final decision. Finally, Daniel Paik discusses ways that OpenStack developers can expand this area of functionality in the future.

Use Case: A Close Look at OpenStack Ocata as a Public Cloud

OpenStack Ocata, notable for how fully it embraces containers, was NIPA Cloud’s first OpenStack deployment used as a public cloud. It has been in service for the past five years and is still running well. NIPA Cloud kept Ocata updated in production through its final update without any glitches. Ceph Jewel provides volume storage.

Over years of operation, the team encountered a growing network workload that forced them to scale the Neutron agent to handle the higher capacity of virtual networks and routers. Because of space limitations, they migrated the compute resources of more than 600 VMs to new hardware without any impact on customers. They also scaled the OpenStack APIs, using OpenStack-Ansible to help manage and scale components, to handle a higher volume of requests from their customer dashboard, NIPA Cloud Platform.

NIPA Cloud’s experience shows that using OpenStack over the long run yields a higher ROI because it is open source, which makes their product very competitive. In turn, they invest more in training people to become OpenStack engineers.

Running OVN at Scale and Checking Where it Burns

While running an OpenStack environment with more than 10,000 VMs, STACKIT noticed some scaling issues with the ML2/OVS network plugin (looking at you, RabbitMQ).

To address these issues, they decided to replace the network backend with OVN during their migration from Queens to Yoga.

In this presentation, they share their experience running OVN with more than 10,000 VMs and the issues they found along the way.

They also share the upstream changes needed to make OVN work for them, including changes to the Neutron OVN plugin, the OVN code itself and the OVN BGP agent. In addition, OVN offers a number of features that are not yet available in Neutron.

They also share their plans for these features and the changes Neutron might need in order to support them.

vGPUs with OpenStack Nova

This session on vGPUs with OpenStack Nova covers various use cases and considerations for using vGPUs effectively. OpenStack Nova is the compute service of OpenStack, the open source cloud computing platform, and provides the foundation for building and managing virtual machines (VMs) in a cloud environment. It offers flexible and scalable VM provisioning, resource management and access control, making it a fundamental component of the OpenStack ecosystem for cloud infrastructure.

The session walks through example use cases (and when not to use vGPUs), specific hardware requirements, server and software configuration, and spinning up your first vGPU-enabled virtual machine.

Overall, the educational session provides a comprehensive overview of vGPUs with OpenStack Nova, enabling participants to understand the use cases, hardware requirements, configuration steps, and considerations involved in effectively utilizing vGPUs for various tasks.

Deploying and Managing Baremetal Kubernetes with Ironic

G-Research uses Armada to distribute millions of batch jobs per day across thousands of nodes in many baremetal and virtual Kubernetes clusters. But how do they build and provision all of the nodes that make up the HPC farms within their private OpenStack cloud?

Ironic, of course!

Whether it’s an initial power-on of a node to check that they got what they paid for, running workloads, moving a node from one network to another, checking for cabling errors, or ensuring nodes are secure, compliant and running up-to-date firmware, Ironic underpins the tooling and automation that drives the enrolment, provisioning and recycling of baremetal hardware across their data centers. Check out this talk to hear about some of their successes, failures and lessons learned during G-Research’s journey of moving Armada clusters from virtual machines to baremetal.

Kristin Barrientos