Learn about one option when scaling your OpenStack-Ansible deployment with RabbitMQ and MariaDB.


OpenStack is a cloud computing platform designed to be highly scalable. Even so, a few potential bottlenecks can appear in large deployments, and they typically involve the performance and throughput of the RabbitMQ and MariaDB clusters.

RabbitMQ is the message broker used to decouple the different components of OpenStack, and MariaDB is the database that stores OpenStack data. If either of these components is not performing well, the performance of the entire OpenStack deployment suffers.

There are several ways to improve the performance of RabbitMQ and MariaDB clusters: scaling up the clusters, switching to a different message broker or database, or optimizing the cluster configuration.

In this article, we will discuss the potential bottlenecks that can occur in large OpenStack deployments and ways to scale up deployments to improve the performance of RabbitMQ and MariaDB clusters.

NOTE: Examples provided in this article were made on OpenStack 2023.1 (Antelope). It is possible to achieve the same flows in earlier releases, but some extra steps or slightly different configurations might be required.

Most Common Deployment

Before talking about ways to improve things, let’s quickly describe our starting point so we understand what we are dealing with.

The most common OpenStack-Ansible deployment design is three control nodes, each running all OpenStack API services along with the supporting infrastructure, such as the MariaDB and RabbitMQ clusters. This is a good starting point for small to medium-sized deployments. However, as the deployment grows, you may start to experience performance problems. Typically, communication between services and MySQL/RabbitMQ looks like this:

MariaDB
As you can see in the diagram, all connections to MariaDB come through HAProxy, which holds the internal virtual IP (VIP). OpenStack-Ansible configures MariaDB as a Galera cluster, which is a multi-master replication system. Although you can issue any request to any member of the cluster, every write is replicated to all members, which creates extra internal traffic and increases the amount of work each instance has to do. So it is recommended to send write requests to a single “primary” instance only.

However, HAProxy is not capable of balancing MySQL queries at the application level (L7 of the OSI model) to separate read and write requests, so we have to balance TCP streams (L4) and pass all traffic, without any separation, to the current “primary” node of the Galera cluster, which creates a potential bottleneck.
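
To make this concrete, the HAProxy configuration that OpenStack-Ansible renders for Galera boils down to a TCP listener with one active server and the remaining cluster members marked as backups. A simplified sketch (the server names, addresses, and options here are illustrative, not the exact rendered output):

listen galera
    bind 172.29.236.101:3306
    mode tcp
    # health check answered by the clustercheck service on port 9200
    option httpchk HEAD / HTTP/1.0
    server infra1_galera_container 172.29.238.11:3306 check port 9200
    server infra2_galera_container 172.29.238.12:3306 check port 9200 backup
    server infra3_galera_container 172.29.238.13:3306 check port 9200 backup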

RabbitMQ
RabbitMQ is clustered differently. We supply the IP addresses of all cluster members to the clients, and it is up to each client to decide which backend it will use for interaction. Only the RabbitMQ management UI is balanced through HAProxy, so client connections to queues do not depend on HAProxy in any way.

However, the use of HA queues, and even quorum queues, means that messages and queues are mirrored to all or several cluster members. While quorum queues show much better performance, they still generate clustering traffic, which becomes a problem at a certain scale.
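
For example, the oslo.messaging transport_url that OpenStack-Ansible renders for a service lists every cluster member, and the client library picks one of them to connect to (failing over to the next entry on errors). A simplified sketch with illustrative credentials and addresses:

[DEFAULT]
transport_url = rabbit://neutron:SECRET@172.29.238.21:5671,neutron:SECRET@172.29.238.22:5671,neutron:SECRET@172.29.238.23:5671//neutron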

Option 1: Independent clusters per service

With this approach, you can give the most loaded services, such as Nova or Neutron, their own standalone MariaDB and RabbitMQ clusters. These new clusters can reside on separate hardware.

In the example below we assume that only Neutron is being reconfigured to use the new standalone clusters, while the other services keep sharing the already existing ones.
Neutron connectivity will then look like this:

As you might have noticed, we still use the same HAProxy instance to balance MySQL traffic to the new infra cluster.

Next, we will describe how to configure such a stack and execute the service transition to this new layout.

Setup of new MariaDB and RabbitMQ clusters

To configure such a layout and migrate Neutron to it with OpenStack-Ansible, you need to follow these steps:

NOTE: You can reference the following documentation for a deeper understanding of how env.d and conf.d files should be constructed: https://docs.openstack.org/openstack-ansible/latest/reference/inventory/understanding-inventory.html

  • Define new groups for RabbitMQ and MariaDB. For that, you can create files with the following content:
    /etc/openstack_deploy/env.d/galera-neutron.yml:
# env.d files are easier to understand if you read them bottom-up
# At the component skeleton you map components to Ansible groups
component_skel:
  # The component itself is an Ansible group as well
  neutron_galera:
    # Here you list the Ansible groups the component will appear in
    belongs_to:
      - neutron_galera_all
      - galera_all

# At the container skeleton you link components to the physical layer
container_skel:
  neutron_galera_container:
    # Here you define on which physical hosts the container will reside
    belongs_to:
      - neutron-database_containers
    # Here you define which components will reside in the container
    contains:
      - neutron_galera
# At the physical skeleton level you map containers to hosts
physical_skel:
  # Here you define to which global group the containers from the
  # host in question will be added.
  # Please note that <name>_hosts and <name>_containers are
  # interconnected, and <name> cannot contain underscores.
  neutron-database_containers:
    belongs_to:
      - all_containers

  # You define `<name>_hosts` in your openstack_user_config or conf.d
  # files to tell on which physical hosts containers should be spawned
  neutron-database_hosts:
    belongs_to:
      - hosts

/etc/openstack_deploy/env.d/rabbit-neutron.yml:

# On the component level we are creating group `neutron_rabbitmq`
# that is also part of `rabbitmq_all` and `neutron_rabbitmq_all`

component_skel:
  neutron_rabbitmq:
    belongs_to:
      - rabbitmq_all
      - neutron_rabbitmq_all

# On the container level we declare that neutron_rabbitmq containers
# will be created on neutron-mq_hosts
container_skel:
  neutron_rabbit_mq_container:
    belongs_to:
      - neutron-mq_containers
    contains:
      - neutron_rabbitmq

# We define the physical level as a base level which can be consumed
# by the container and component skeletons.
physical_skel:
  neutron-mq_containers:
    belongs_to:
      - all_containers
  neutron-mq_hosts:
    belongs_to:
      - hosts

  • Map your new neutron-infra hosts to these new groups. For that, add the following content to your openstack_user_config.yml:

neutron-mq_hosts: &neutron_infra
  neutron-infra1:
    ip: 172.29.236.200
  neutron-infra2:
    ip: 172.29.236.201
  neutron-infra3:
    ip: 172.29.236.202
neutron-database_hosts: *neutron_infra
  • Define some specific configurations for newly created groups and balance them:
  • MariaDB
    • In file /etc/openstack_deploy/group_vars/neutron_galera.yml:
galera_cluster_members: "{{ groups['neutron_galera'] }}"
galera_cluster_name: neutron_galera_cluster
galera_root_password: mysecret

In file /etc/openstack_deploy/group_vars/galera.yml:

galera_cluster_members: "{{ groups['galera'] }}"
  • Move `galera_root_password` from /etc/openstack_deploy/user_secrets.yml to /etc/openstack_deploy/group_vars/galera.yml
  • RabbitMQ
    In file /etc/openstack_deploy/group_vars/neutron_rabbitmq.yml:
rabbitmq_host_group: neutron_rabbitmq
rabbitmq_cluster_name: neutron_cluster

In file /etc/openstack_deploy/group_vars/rabbitmq.yml:

rabbitmq_host_group: rabbitmq
  • HAProxy
    In /etc/openstack_deploy/user_variables.yml define extra service for MariaDB:
haproxy_extra_services:

  - haproxy_service_name: galera_neutron
    haproxy_backend_nodes: "{{ (groups['neutron_galera'] | default([]))[:1] }}"
    haproxy_backup_nodes: "{{ (groups['neutron_galera'] | default([]))[1:] }}"
    haproxy_bind: "{{ [haproxy_bind_internal_lb_vip_address | default(internal_lb_vip_address)] }}"
    haproxy_port: 3307
    haproxy_backend_port: 3306
    haproxy_check_port: 9200
    haproxy_balance_type: tcp
    haproxy_stick_table_enabled: False
    haproxy_timeout_client: 5000s
    haproxy_timeout_server: 5000s
    haproxy_backend_options:
      - "httpchk HEAD / HTTP/1.0\\r\\nUser-agent:\\ osa-haproxy-healthcheck"
    haproxy_backend_server_options:
      - "send-proxy-v2"
    haproxy_allowlist_networks: "{{ haproxy_galera_allowlist_networks }}"
    haproxy_service_enabled: "{{ groups['neutron_galera'] is defined and groups['neutron_galera'] | length > 0 }}"

haproxy_galera_service_overrides:
  haproxy_backend_nodes: "{{ groups['galera'][:1] }}"
  haproxy_backup_nodes: "{{ groups['galera'][1:] }}"
  • Prepare new infra hosts and create containers on them. For that, run the command:

    openstack-ansible playbooks/setup-hosts.yml --limit neutron-mq_hosts,neutron-database_hosts,neutron_rabbitmq,neutron_galera
  • Deploy clusters:
    • MariaDB:

      openstack-ansible playbooks/galera-install.yml --limit neutron_galera
    • RabbitMQ:

      openstack-ansible playbooks/rabbitmq-install.yml --limit neutron_rabbitmq
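
Once both playbooks have finished, it is worth sanity-checking that the new clusters are healthy before touching Neutron. A quick, illustrative check from the deployment host (assuming /opt/openstack-ansible and its inventory-manage.py helper script):

# confirm the new containers ended up in the inventory
./scripts/inventory-manage.py -l | grep neutron_galera
./scripts/inventory-manage.py -l | grep neutron_rabbit

# the new Galera cluster should report a wsrep_cluster_size of 3
ansible -m shell -a "mysql -e \"SHOW STATUS LIKE 'wsrep_cluster_size'\"" 'neutron_galera[0]'

# the new RabbitMQ cluster should list all three members
ansible -m shell -a "rabbitmqctl cluster_status" 'neutron_rabbitmq[0]'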

Migrating the service to use new clusters

While it’s relatively easy to start using the new RabbitMQ cluster for the service, migration of the database is slightly tricky and will include some downtime.

First, we need to tell Neutron that from now on, the MySQL database for the service is listening on a different port. So you should add the following override to your user_variables.yml:

neutron_galera_port: 3307
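
With this override in place, the connection string rendered into neutron.conf will point at the same internal VIP, but on port 3307, which HAProxy forwards to the new cluster. Roughly (the credentials and VIP below are illustrative, and the exact options rendered by the role may differ):

[database]
connection = mysql+pymysql://neutron:SECRET@172.29.236.101:3307/neutron?charset=utf8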

Now let’s prepare the destination database: create the database itself along with the required users, and grant them permissions to interact with the database. For that, we will run the Neutron role with the common-db tag and limit execution to the neutron_server group only. You can use the following command for that:

openstack-ansible playbooks/os-neutron-install.yml --limit neutron_server --tags common-db
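
Before moving on, you can verify on the new cluster that the database and its users were actually created. An illustrative check (it assumes root credentials are available to the mysql client inside the container, as the backup playbook below also assumes):

ansible -m shell -a "mysql -e \"SHOW DATABASES LIKE 'neutron'\"" 'neutron_galera[0]'
ansible -m shell -a "mysql -e \"SELECT user, host FROM mysql.user WHERE user = 'neutron'\"" 'neutron_galera[0]'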

Once we have the database prepared, we need to disable the HAProxy backends that proxy traffic to the API of the service, in order to prevent any user or service actions against it.

For that, we use a small custom playbook. Let’s name it haproxy_backends.yml:

- hosts: haproxy_all
  tasks:
    - name: Manage backends
      community.general.haproxy:
        socket: /run/haproxy.stat
        backend: "{{ backend_group }}-back"
        drain: "{{ haproxy_drain | default(False) }}"
        host: "{{ item }}"
        state: "{{ haproxy_state | default('disabled') }}"
        shutdown_sessions: "{{ haproxy_shutdown_sessions | default(False) | bool }}"
        wait: "{{ haproxy_wait | default(False) | bool }}"
        wait_interval: "{{ haproxy_wait_interval | default(5) }}"
        wait_retries: "{{ haproxy_wait_retries | default(24) }}"
      with_items: "{{ groups[backend_group] }}"

We run it as follows:

openstack-ansible haproxy_backends.yml -e backend_group=neutron_server
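
Before proceeding, you can confirm through the HAProxy admin socket that the neutron-server backends are indeed in drain/maintenance state. An illustrative check (it assumes socat is installed on the HAProxy nodes):

ansible -m shell -a "echo 'show stat' | socat stdio /run/haproxy.stat | grep neutron_server" haproxy_all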

Now we can stop the API service for Neutron:

ansible -m service -a "state=stopped name=neutron-server" neutron_server

Then we run a backup/restore of the MySQL database for the service. For this purpose, we will use another small playbook, which we name mysql_backup_restore.yml, with the following content:

- hosts: "{{ groups['galera'][0] }}"
  vars:
    _db: "{{ neutron_galera_database | default('neutron') }}"
  tasks:
    - name: Dump the db
      shell: "mysqldump --single-transaction {{ _db }} > /tmp/{{ _db }}"
    - name: Fetch the backup
      fetch:
        src: "/tmp/{{ _db }}"
        dest: "/tmp/db-backup/"
        flat: yes
- hosts: "{{ groups['neutron_galera'][0] }}"
  vars:
    _db: "{{ neutron_galera_database | default('neutron') }}"
  tasks:
    - name: Copy backups to destination
      copy:
        src: "/tmp/db-backup/"
        dest: "/tmp/db-backup/"
    - name: Restore the DB backup
      shell: "mysql {{ _db }} < /tmp/db-backup/{{ _db }}"

Now let’s run the playbook we’ve just created:

openstack-ansible mysql_backup_restore.yml

NOTE: The playbook above is not idempotent, as it will overwrite the database content on the destination hosts.
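
A simple way to confirm that the restore is complete is to compare the number of tables in the database between the old and the new cluster. An illustrative check, assuming the default database name neutron:

ansible -m shell -a "mysql -N -e \"SELECT COUNT(*) FROM information_schema.tables WHERE table_schema = 'neutron'\"" 'galera[0]:neutron_galera[0]'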

Once the database content is in place, we can re-configure the service using its playbook. This will not only tell Neutron to use the new database, but will also switch it to the new RabbitMQ cluster and re-enable the service in HAProxy.

For that to happen we should run the following command:

openstack-ansible playbooks/os-neutron-install.yml --tags neutron-config,common-mq

After the playbook has finished, the Neutron services will be started and configured to use the new clusters.
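
As a final check, you can confirm that the rendered configuration now points at the new clusters and that the Neutron agents report as alive. An illustrative verification (the openstack CLI command is expected to be run from the utility container or any host with admin credentials):

# the connection and transport_url options should reference the new port and hosts
ansible -m shell -a "grep -E '^(connection|transport_url) ' /etc/neutron/neutron.conf" neutron_server

# all agents should be up and alive after the switch
openstack network agent list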