Cloud Team at Bloomberg is one of 11 nominees for the Superuser Awards to be presented at the OpenInfra Live: Keynotes.

Who do you think should win the 2021 Superuser Awards?

It’s time for you to help determine the winner of the 2021 Superuser Awards! The annual Superuser Awards recognize organizations that have used open infrastructure to improve their business while contributing back to the community.

This year, the Superuser Awards winner will be announced at the OpenInfra Live: Keynotes, November 17–18! This will be the best opportunity for the global community to get together this year to hear about all things OpenInfra. Registration is free and is now live, so get your virtual ticket today and join us for Keynotes!

Cloud Team at Bloomberg is one of 11 nominees for the Superuser Awards. Check out why the team was nominated and support them on Twitter!

Who is the nominee?

Cloud Team at Bloomberg

How has open infrastructure transformed the organization’s business? 

Bloomberg’s OpenStack-based private cloud (known as Bloomberg Cloud Compute, or BCC) has become the largest Compute platform in Bloomberg’s data centers. Most Engineering teams have at least partially adopted virtual machines, which has provided improved utilization, stability, and operational flexibility.

BCC delivers an open Infrastructure as a Service (IaaS) to Bloomberg’s community of more than 6,500 software engineers. As a first for us, it supports an automated end-to-end process for our application teams to get their machines provisioned, automatically built, and online without human intervention or tickets being issued. This has helped us achieve much higher machine utilization in our data centers (4x or more).

How has the organization participated in or contributed to an open source project?

BCC is an open source project managed on GitHub. We also contribute fixes to OpenStack (Nova and Neutron) and the Calico project either directly or via our support vendors.

To date, Bloomberg has hosted three OpenStack Operators Meetups. Two of these were hosted in Bloomberg’s offices in New York and London. We have also been involved in the organization of many other Ops Meetups in Mexico, Germany, Italy, Japan, etc.

Some of our notable upstream contributions include: 

AZ anti-affinity filter blueprint

bug fixes:

Twice for Calico

What open source technologies does the organization use in its open infrastructure environment?

OpenStack, Ceph, Ubuntu, MySQL, HAProxy, memcached, unbound, tinyproxy, Calico, bird2, Apache httpd (full list is here)

What is the scale of your open infrastructure environment?

We have deployed approximately 120k physical cores (and growing) running OpenStack Ussuri with Ceph Octopus, Neutron, and Calico, over several thousand hosts (each typically with 1.5+ TB RAM) in our modern clusters spread across 4 sites. Each also has multiple PB of NVMe-backed Ceph storage.

Some resources are still deployed in our legacy OpenStack Mitaka/nova-network/Ceph Hammer clusters (20k cores and dropping).

What kind of operational challenges have you overcome during your experience with open infrastructure?

Our old clusters were stuck on nova-network with few upgrade paths. 

Our new clusters launched with OpenStack Rocky/Neutron in late 2018 and were recently upgraded with zero downtime to OpenStack Ussuri. The co-located Ceph clusters were recently upgraded (also with zero downtime) from Ceph Mimic to Octopus. These new OpenStack/Ceph clusters have already provided several years of continuous service.

Our older architecture would not scale much beyond 200 compute nodes due to the use of L2 spanning techniques, tagged VLANs, and other constrained architectural choices. Our new architecture is pure Layer 3 and easily supports 4x the scale or more (800 compute nodes today and increasing rapidly). As an example, in these clusters we live-migrated 20,000 VMs in three weekends for maintenance.
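
To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. The node and VM counts come from the answer above; the 48-hour weekend window and the function names are assumptions made only for illustration, not details reported by the team.

```python
# Figures quoted in the answer above.
OLD_NODE_LIMIT = 200    # approximate ceiling of the older L2-based architecture
NEW_NODE_COUNT = 800    # compute nodes in the new Layer 3 clusters today
VMS_MIGRATED = 20_000   # VMs live-migrated for maintenance
WEEKENDS = 3            # number of weekends the migration took

# Assumption (not from the source): each weekend window lasts 48 hours.
ASSUMED_HOURS_PER_WEEKEND = 48


def scale_factor(old_nodes: int, new_nodes: int) -> float:
    """How many times larger the new deployment is than the old ceiling."""
    return new_nodes / old_nodes


def migrations_per_weekend(vms: int, weekends: int) -> float:
    """Average number of VMs moved in one weekend."""
    return vms / weekends


def migrations_per_hour(vms: int, weekends: int, hours_per_weekend: int) -> float:
    """Average hourly migration rate under the assumed window length."""
    return vms / (weekends * hours_per_weekend)


if __name__ == "__main__":
    print(f"Scale factor: {scale_factor(OLD_NODE_LIMIT, NEW_NODE_COUNT):.0f}x")
    print(f"VMs per weekend: {migrations_per_weekend(VMS_MIGRATED, WEEKENDS):.0f}")
    rate = migrations_per_hour(VMS_MIGRATED, WEEKENDS, ASSUMED_HOURS_PER_WEEKEND)
    print(f"VMs per hour (assuming 48-hour windows): {rate:.0f}")
```

Under that assumed window, the averages come to roughly 6,700 VMs per weekend, or about 139 live migrations per hour. The real rate would depend on how long each maintenance window actually was.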

How is this team innovating with open infrastructure? 

BCC has proven the value of OpenStack for large scale data center compute for Bloomberg. It has met our stringent requirements for performance, stability, and availability, and displaced task-specific discrete servers and, in some cases, proprietary virtualization solutions. 

We also proved the use of Ceph for pooled software-defined storage, enabling us to move away from traditional enterprise storage vendors’ products.

Our use of Calico as our Neutron implementation gives us distributed scalable firewalling on a single, highly scalable, fully routed L3 IP fabric with none of the overhead and complexity of a full SDN: no network virtualization, no VLANs or VXLANs, and no encap/decap/tunnelling. This helps us meet our need for real-time traffic monitoring, threat detection, compliance, etc.

The Superuser Editorial Advisory Board will review the nominees and determine the finalists and overall winner after the community has had a chance to review the nominees. Stay tuned!

Superuser