Hear experts from CERN, Red Hat, NTT and more in May.


Join the people building and operating open infrastructure at the OpenStack Summit Vancouver in May. The Summit schedule features over 300 sessions organized by use case, including artificial intelligence and machine learning, high-performance computing, edge computing, network functions virtualization, container infrastructure and public, private and multi-cloud strategies.

Here we’re highlighting some of the HPC, GPU and AI sessions you’ll want to add to your schedule. Check out all the sessions, workshops and lightning talks focusing on these three topics here.

Ceph and the CERN HPC infrastructure

For the past five years, CERN’s IT department has used Ceph to build scale-out storage for its massive OpenStack cloud. For block and object storage use cases, with and without erasure coding, Ceph has proven flexible and scalable while remaining resilient to infrastructure failures. In this intermediate-level talk, CERN’s Dan van der Ster and Arne Wiebalck will highlight the key metrics required by users, including POSIX compliance, small-file latency, metadata throughput and scalability, and fault tolerance, while showing results from industry-standard and new micro-benchmarks. Details here. Speakers from CERN and SKA, the Square Kilometre Array, are also teaming up to give a talk on HPC and bare metal; more on that here.
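To give a sense of the micro-benchmarks the talk discusses, here is a minimal sketch, not the speakers' code, that measures small-file create latency on a POSIX mount such as CephFS; the mount point, file count and payload size are placeholders.

# Minimal sketch (not the speakers' benchmark): time small-file creation on a
# POSIX mount such as CephFS. The mount point and file count are placeholders.
import os
import statistics
import time

MOUNT = '/mnt/cephfs/bench'   # placeholder CephFS mount point
N_FILES = 1000
PAYLOAD = b'x' * 4096         # a 4 KiB "small file"

os.makedirs(MOUNT, exist_ok=True)
latencies = []
for i in range(N_FILES):
    path = os.path.join(MOUNT, f'small-{i:05d}.dat')
    start = time.perf_counter()
    with open(path, 'wb') as f:
        f.write(PAYLOAD)
        f.flush()
        os.fsync(f.fileno())  # include the round trip to the storage backend
    latencies.append(time.perf_counter() - start)

print(f'creates/s  : {N_FILES / sum(latencies):.1f}')
print(f'p50 latency: {statistics.median(latencies) * 1000:.2f} ms')
print(f'p99 latency: {sorted(latencies)[int(N_FILES * 0.99)] * 1000:.2f} ms')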

Call it real: Virtual GPUs in Nova

GPUs in OpenStack are a long-standing question, say Red Hat’s Sylvain Bauza and Citrix’s Jianghua Wang, and there are many business cases for providing high-performance GPUs to instances, including AI, mining and virtual desktops. Until Queens, the only way to expose these devices to guests was PCI passthrough in Nova. In this intermediate-level talk, they’ll show how you can now request virtual GPUs (vGPUs) for the XenServer and libvirt/KVM Nova drivers with a demo and share the roadmap for upcoming releases.
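As background on the mechanism the session covers: since Queens, a Nova flavor can ask the placement service for vGPU inventory through the resources:VGPU extra spec. Below is a minimal sketch of that workflow using python-novaclient; the endpoint, credentials, flavor sizing, image and network IDs are placeholders, and the code is illustrative rather than taken from the talk.

# Minimal sketch (not from the talk): request a vGPU-backed instance under
# Queens with python-novaclient. Credentials, flavor sizing and IDs below
# are placeholders.
from keystoneauth1 import loading, session
from novaclient import client

loader = loading.get_plugin_loader('password')
auth = loader.load_from_options(
    auth_url='https://keystone.example.com/v3',   # placeholder endpoint
    username='demo', password='secret',
    project_name='demo',
    user_domain_name='Default', project_domain_name='Default')
sess = session.Session(auth=auth)
nova = client.Client('2.60', session=sess)

# Create a flavor that asks the placement service for one virtual GPU.
flavor = nova.flavors.create(name='m1.vgpu', ram=8192, vcpus=4, disk=40)
flavor.set_keys({'resources:VGPU': '1'})

# Boot an instance with it; the scheduler picks a host whose XenServer or
# libvirt/KVM driver reports vGPU inventory.
server = nova.servers.create(
    name='vgpu-demo',
    image='IMAGE_UUID',                   # placeholder
    flavor=flavor,
    nics=[{'net-id': 'NETWORK_UUID'}])    # placeholder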

Optimized HPC/AI cloud with OpenStack acceleration service and composable hardware

In this advanced session, Shuquan Huang from 99cloud and Jianfeng (JF) Ding from Intel will introduce the OpenStack acceleration service, Cyborg, which provides a management framework for accelerator devices (e.g. FPGA, GPU, NVMe SSD). They will also discuss Rack Scale Design (RSD) technology and explain how physical hardware resources can be dynamically aggregated to meet AI/HPC requirements. The ability to compose workload-optimized hardware and accelerator devices on the fly through an API allows data center managers to manage these resources in an efficient, automated manner. Details here.
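To make the composition idea concrete, here is a purely illustrative sketch that posts a resource request to a hypothetical RSD/Cyborg-style composition endpoint; the URL, token and JSON schema are assumptions for illustration, not the actual Cyborg or RSD API.

# Purely illustrative sketch: compose a node with accelerator resources via a
# hypothetical RSD/Cyborg-style REST endpoint. The URL, token and JSON fields
# are assumptions for illustration, not the real Cyborg or RSD API.
import requests

COMPOSER_URL = 'https://composer.example.com/v1/composed-nodes'  # hypothetical
TOKEN = 'PLACEHOLDER_TOKEN'

request_body = {
    'name': 'ai-training-node',
    'resources': {
        'vcpus': 32,
        'memory_gib': 256,
        'accelerators': [
            {'type': 'GPU', 'count': 4},
            {'type': 'FPGA', 'count': 1},
        ],
        'nvme_ssd_gib': 2000,
    },
}

resp = requests.post(COMPOSER_URL,
                     json=request_body,
                     headers={'X-Auth-Token': TOKEN},
                     timeout=30)
resp.raise_for_status()
print('Composed node:', resp.json())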

Artificial intelligence-driven orchestration, challenges and opportunities

Sana Tariq from TELUS Communications will share the journey of operationalizing a service orchestration platform, from developing evaluation criteria (open source and commercial vendors) to architectural considerations in the chaos of a multi-vendor, multi-domain hybrid cloud ecosystem. The intermediate-level talk offers a peek into a future of automation and orchestration driven by AI and ML to optimize cloud and network resource management, enhance security, and deliver a better customer experience that creates new business opportunities in the future services landscape. Details here.

Lessons learned in deploying OpenStack for HPC users

The Minnesota Supercomputing Institute deployed an OpenStack cloud called Stratus. This beginner-level talk describes the lessons learned in launching a platform to support research with specific data-use agreements, as well as issues concerning accountability, risk acceptance and the role of project leadership when a large supercomputing facility deviates from its traditional base of support. Details here.

Case study: Large-scale deployment for machine learning with high-speed storage

Three speakers from NTT will offer a case study centering on a fully open-source reference cluster model automated with Ansible and a container orchestrator. The environment is built on GPU computation and high-speed storage: the company uses the Chainer and ChainerMN learning frameworks across many NVIDIA GPU nodes, with scalable OpenStack Swift object storage attached through file system APIs as the high-speed data store. Details here.
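For a flavor of how training data might flow through Swift, here is an illustrative sketch, not NTT's code, using python-swiftclient; the endpoint, credentials, container and object names are placeholders.

# Illustrative sketch only (not NTT's code): store and fetch a training-data
# shard in OpenStack Swift. Endpoint, credentials and names are placeholders.
from swiftclient.client import Connection

conn = Connection(
    authurl='https://keystone.example.com/v3',  # placeholder endpoint
    user='demo',
    key='secret',
    os_options={'project_name': 'demo',
                'user_domain_name': 'Default',
                'project_domain_name': 'Default'},
    auth_version='3')

conn.put_container('training-data')  # effectively a no-op if it already exists

# Upload a training shard from the preprocessing host.
with open('train-00001.npz', 'rb') as f:
    conn.put_object('training-data', 'shards/train-00001.npz',
                    contents=f, content_type='application/octet-stream')

# Fetch it again on a GPU worker node.
headers, body = conn.get_object('training-data', 'shards/train-00001.npz')
with open('/tmp/train-00001.npz', 'wb') as f:
    f.write(body)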

See you at the OSF Summit in Vancouver, May 21-24, 2018! Register here.

Superuser