CSC | An OpenStack Case Study

CSC, a Finnish non-profit, has a unique role. Working closely with international research organizations such as CERN for high-energy physics and EMBL for molecular biology, CSC creates, integrates and delivers high-quality information technology services as part of Finland's national research system, helping to keep Finland at the forefront of innovation. Learn how CSC uses its OpenStack-powered community cloud for research.

What year did your organization launch its first OpenStack deployment?

2011

How has OpenStack transformed your organization?

OpenStack has been, and remains, the foundation of our multi-tenant cloud services. It lets us abstract the hardware layer away from users and serves as an integration platform with an API. This is transformative for a high-performance computing center like CSC. It opens up computational resources to scientific fields, such as bioinformatics, that may previously have struggled to benefit from large-scale computing facilities.

Because it is open source, OpenStack has allowed us to scale our services to serve researchers cost-efficiently. Its stable APIs have also allowed us to grow our services while keeping them consistent across hardware generations.

We currently have several other cloud services available or in development, but OpenStack remains a building block for most of them.

What workloads are you running on OpenStack?

We run many of our own production and development workloads on our OpenStack services, and we also serve education and other non-research use cases. However, one of our main user groups is researchers, and we support everything from small individual research projects to large-scale science projects. Bioinformatics is an especially large user group.

At the moment, CSC's computing infrastructure, which includes supercomputing, cloud services and data storage, supports about 7,000 active researchers. Of those, 3,109 are working on health- and biology-related research problems, i.e., bioinformatics. In total, there are 1,394 bioinformatics projects on CSC's infrastructure, accounting for 35% of total computing resource consumption.

In 10 years, the number of data-intensive computing projects at CSC has increased by an order of magnitude. Our ability to support this growth depends on computing infrastructure services enabled by OpenStack. Many laboratories doing data-intensive computing rely on tailored, rapidly evolving scientific software environments to analyze their data. CSC cannot support them all in a traditional HPC service setting, but virtualization and containerization of computational resources let us support a richer software ecosystem in collaboration with specialized labs. Running OpenStack also gives us the security controls needed to handle sensitive data in our environment.

For example, metagenomics is the study of microbial DNA recovered from its natural living environment. The term generally refers to the bacterial genomes in a sample, but it also covers the genomes of other microorganisms, such as archaea and fungi, as well as the genomes of eukaryotes inhabiting the sample of interest. Metagenomics can thus be used to study millions of genomes from microbiomes. This work requires substantial computing power, because the volume of data in the source material runs into terabytes.

More examples of bioinformatics projects have been publicly documented by the ELIXIR node operated by CSC.

What is the scale of your OpenStack environment?

We have two separate OpenStack services in two datacenters, each with around 250 to 300 compute nodes. Across both datacenters, there are roughly 24,000 physical CPU cores in total. More than 1,000 projects of different scales run on our OpenStack services.

What other open source technologies are integrated with your OpenStack environment?

We’re a big user of Ceph (both block and object storage), and we also work with OKD for containers. Many of our in-house services are built on open source, such as Graphite and Grafana for metrics and OpenSearch for logging.

Share your use case by filling out the OpenStack User Survey!

Tags: Bioinformatics, Ceph, CERN, CSC, EMBL, Grafana, graphite, okd containers, open source, OpenStack

Author

Kristin Barrientos

Kristin is the Marketing Coordinator at the OpenInfra Foundation. Prior to joining, she worked for a nonprofit in the social services sector. In her free time, she enjoys music, baking, playing tennis, and spending time with her family. She is also obsessed with her cat named Callie.
