At this year’s OpenInfra Summit in Vancouver, attendees had the opportunity to immerse themselves in a variety of informative sessions and workshops centered on artificial intelligence (AI), machine learning, and high-performance computing (HPC).
Now check out these highlighted talks from the OpenInfra Summit!
How Large Language and Deep Learning Models Can Prevent Toxicity Such as Unconscious Biases From Spreading in Online Communities
Despite the proliferation of AI models making impactful decisions in our everyday activities, there are growing concerns about their trustworthiness. It is of utmost importance to have fairer, interpretable models making decisions in areas such as healthcare, finance, and the justice system. This presentation by Armstrong Foundjem aims to predict biases early enough, in a multi-class and multi-label setting, before they induce harm. The distributed nature of online communities and their complex data sources make it difficult to identify biases in data. Thus, they use large language models to accurately classify text, image, and video data across languages, cultures, religions, ages, genders, and more. They also fine-tune a transformer (BERT) for complicated NLP tasks where traditional machine learning models would be limited. A typical BERT model can contextually generate text embeddings for a multi-class problem as well as task-specific classification embeddings. The results predict biases with an accuracy of 98.7%.
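To make the modeling approach concrete, here is a minimal sketch of the kind of multi-label BERT classification setup such a pipeline might use, assuming the Hugging Face transformers library. The label names and example text are hypothetical stand-ins, and a real deployment would first fine-tune the classification head on labeled community data rather than use the randomly initialized head shown here.

```python
# A minimal sketch of multi-label bias classification with BERT, assuming
# the Hugging Face transformers library. LABELS and the example text are
# hypothetical; in practice the classifier head would be fine-tuned first.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["gender_bias", "age_bias", "religious_bias", "cultural_bias"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased",
    num_labels=len(LABELS),
    problem_type="multi_label_classification",  # BCE loss, one sigmoid per label
)

text = "An example community post to screen for bias."
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

# Each label gets an independent probability, so one post can carry
# several kinds of bias at once -- the multi-label part of the problem.
probs = torch.sigmoid(logits)[0]
flagged = {label: float(p) for label, p in zip(LABELS, probs) if p > 0.5}
print(flagged)
```

The `problem_type="multi_label_classification"` setting is what distinguishes this from the usual softmax setup: it scores every label independently instead of forcing the classes to compete.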
Why Graphcore Bets on OpenStack
Graphcore provides AI solutions built on its own IPU (Intelligence Processing Unit) chips. Frederic Lardinois, Senior Enterprise Editor at TechCrunch, explores the role OpenStack plays in helping Graphcore deliver its services. In this Q&A, Nathan Harper, Senior Cloud Development Engineer at Graphcore, explains why Graphcore chose OpenStack over the alternatives.
Armada – Building a Research Platform on Top of OpenStack and Kubernetes
Quantitative researchers require a lot of hardware, and the major goals of their research platform are to enable high-throughput job scheduling, use open source software, and provide queueing once the number of jobs exceeds capacity. The team uses OpenStack for hardware provisioning. However, provisioning hardware is only the first step in building a research platform. In this talk, Kevin Hannon discusses how you can build a research platform on top of OpenStack and Kubernetes that lets quantitative researchers run their workflows, which range from analytics to machine learning, more effectively. Armada is an open source project developed at G-Research that enables researchers to submit thousands of jobs to a multi-cluster Kubernetes computing platform and have them scheduled by priority, with excess jobs queued until capacity frees up, as sketched below.
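The queueing model is the interesting part. As a rough illustration, here is a toy in-process sketch of that idea, not Armada’s actual API: jobs beyond current capacity wait in a priority queue and are dispatched as capacity frees up. All class and method names are hypothetical.

```python
# A toy sketch of the queueing idea behind Armada, not its actual API:
# jobs beyond current cluster capacity wait in a priority queue and are
# dispatched as capacity frees up. Names here are hypothetical.
import heapq
from dataclasses import dataclass, field
from itertools import count

@dataclass(order=True)
class QueuedJob:
    priority: int                      # lower number = dispatched sooner
    seq: int                           # tie-breaker preserves submission order
    name: str = field(compare=False)

class ResearchQueue:
    def __init__(self, capacity: int):
        self.capacity = capacity       # concurrent jobs the clusters can run
        self.running: set[str] = set()
        self.pending: list[QueuedJob] = []
        self._seq = count()

    def submit(self, name: str, priority: int = 10) -> None:
        heapq.heappush(self.pending, QueuedJob(priority, next(self._seq), name))
        self._dispatch()

    def complete(self, name: str) -> None:
        self.running.discard(name)
        self._dispatch()

    def _dispatch(self) -> None:
        while self.pending and len(self.running) < self.capacity:
            job = heapq.heappop(self.pending)
            self.running.add(job.name)  # in Armada this becomes a pod on some cluster
            print(f"dispatched {job.name} (priority {job.priority})")

q = ResearchQueue(capacity=2)
for i in range(4):
    q.submit(f"analytics-{i}", priority=10)
q.submit("urgent-ml-train", priority=1)   # jumps ahead of the queued backlog
q.complete("analytics-0")                 # freed capacity goes to the urgent job
```

The real system does this across multiple Kubernetes clusters with fair-share accounting between queues, but the core behavior is the same: submission never fails for lack of capacity; work simply waits its turn.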
Sokovan: Container Orchestrator for Accelerated AI/ML Workloads and Massive-scale GPU Computing
Sokovan is a Python-based container orchestrator that addresses the challenges of running resource-intensive batch workloads in a containerized environment. It offers acceleration-aware, multi-tenant, batch-oriented job scheduling and fully integrates multiple hardware acceleration technologies across the system’s layers. It consists of two schedulers: a cluster-level scheduler, which lets users customize job placement strategies and control the density and priority of workloads, and a node-level scheduler, which optimizes per-container performance by automatically detecting underlying hardware accelerators and mapping them to individual containers, improving the performance of AI workloads compared to Slurm and other existing tools. Sokovan has been deployed at large scale across various industries for a range of GPU workloads, including AI training and services, and helps container-based MLOps platforms unleash the potential of the latest hardware technologies. A toy sketch of this two-level design follows below.
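Here is an illustrative two-level scheduler in the spirit of that design, not Sokovan’s actual code: a cluster-level step picks a node using a pluggable placement strategy (controlling workload density), and a node-level step pins specific accelerator devices to the container. All names are hypothetical.

```python
# Illustrative two-level scheduling in the spirit of Sokovan's design,
# not its actual code. Cluster level: pick a node via a pluggable
# placement strategy. Node level: pin accelerator devices to the container.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    free_gpus: list[int]   # device indices not yet assigned

def spread(nodes, need):
    # Placement strategy: prefer the emptiest node (lower workload density).
    return max((n for n in nodes if len(n.free_gpus) >= need),
               key=lambda n: len(n.free_gpus), default=None)

def pack(nodes, need):
    # Placement strategy: prefer the fullest node that still fits (higher density).
    return min((n for n in nodes if len(n.free_gpus) >= need),
               key=lambda n: len(n.free_gpus), default=None)

def schedule(nodes, job_name, gpus_needed, strategy=spread):
    node = strategy(nodes, gpus_needed)          # cluster-level decision
    if node is None:
        return None                              # no capacity: job stays queued
    devices = [node.free_gpus.pop(0) for _ in range(gpus_needed)]
    # Node-level decision: the container sees only its pinned devices,
    # e.g. via CUDA_VISIBLE_DEVICES or cgroup device rules.
    print(f"{job_name} -> {node.name}, devices {devices}")
    return node.name, devices

cluster = [Node("gpu-node-a", [0, 1, 2, 3]), Node("gpu-node-b", [0, 1])]
schedule(cluster, "train-llm", 2, strategy=pack)    # fills the smaller node first
schedule(cluster, "batch-infer", 2, strategy=spread)
```

Splitting the decision this way is what makes the scheduler “acceleration-aware”: the cluster level reasons only about capacity and density, while the node level handles the hardware-specific mapping.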
Explore the HPC Storage on Arm64
With the rising adoption of the Arm64 architecture, there is an emerging trend of introducing Arm64 servers into the HPC space. HPC storage is the fundamental infrastructure of any HPC framework, and functionality, stability, and performance are the three criteria HPC users weigh when choosing their infrastructure. Addressing all three is critical for the Arm64 architecture to enter the HPC world.
In this presentation, Kevin Zhao and Xinliang Liu share their work on the HPC storage systems Lustre, DAOS, Ceph, and BeeGFS on Arm64, covering enablement, stability, and performance optimization for each framework. They also share the details of setting up and maintaining Arm64 CI and producing Arm64 releases of Lustre and DAOS for different operating systems, along with their IO500 performance optimization methods and some user scenarios.
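For readers unfamiliar with this kind of benchmarking, here is a crude single-node write-throughput probe, just to illustrate the sort of bandwidth measurement that IO500 formalizes at much larger scale with ior and mdtest across many clients. The mount path and transfer sizes are arbitrary assumptions, and a real run would exercise the parallel filesystem from many nodes at once.

```python
# A crude single-node write-throughput probe, illustrating the kind of
# measurement IO500 formalizes (via ior/mdtest) across many clients.
# The path and sizes below are arbitrary assumptions for the sketch.
import os, time

PATH = "/mnt/lustre/io_probe.bin"   # hypothetical parallel-filesystem mount
BLOCK = 4 * 1024 * 1024             # 4 MiB per write
COUNT = 256                         # 1 GiB total

buf = os.urandom(BLOCK)
start = time.perf_counter()
with open(PATH, "wb") as f:
    for _ in range(COUNT):
        f.write(buf)
    f.flush()
    os.fsync(f.fileno())            # make sure data actually reached storage
elapsed = time.perf_counter() - start

gib = BLOCK * COUNT / 2**30
print(f"wrote {gib:.1f} GiB in {elapsed:.2f}s ({gib / elapsed:.2f} GiB/s)")
os.remove(PATH)
```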
Tags: AI, HPC, machine learning, OpenStack