When we think of cloud software that powers large data centers across the globe, it is hard to imagine that same platform out at the edge, and OpenStack is no exception. To break this myth, the OpenInfra Edge Computing Group recently met with the Chameleon project to learn how they use an adaptation of OpenStack to power computer science research and education use cases, taking their infrastructure out of the datacenter and university campus to manage autonomous vehicles or small devices in students’ and lecturers’ homes.
But what is the Chameleon project?
It is an NSF-funded testbed that supports research and education activities, such as the development of operating systems, virtualization solutions or new networking protocols. The hardware for the testbed sits in two supercomputing centers and is powered by OpenStack, with some extensions to support experimentation with networks and repeatability, which is crucial in a scientific environment. The software infrastructure consists of OpenStack, extensions and adaptations that support experimentation with, e.g., Software Defined Networks (SDN), and operational tools. It is available for partners to install locally at universities and other sites under the name CHI-in-a-Box, where CHI stands for CHameleon Infrastructure. The project has been running for over 6 years and has served over 6,000 users so far.
With the rise of interest in edge computing, Chameleon users increasingly needed resources such as Raspberry Pis or NVIDIA Jetson Nanos. While some useful experiments can be carried out with those devices in a datacenter, most users were interested in using them in a different deployment context: associated with self-driving cars, drones, or floating vehicles, or attached to Software Defined Radios (SDRs), cameras, and smart sensors deployed in the field. This is why the Chameleon team created CHI@Edge, so that the testbed could be extended to edge and IoT devices deployed and operated over a range of locations: from a datacenter to wherever the users are. The original set of requirements for a cloud testbed already emphasized deep configurability and adaptability; taking the infrastructure out of the datacenter and to the edge increased the need to reconfigure the environment by orders of magnitude.
So how does OpenStack fulfill all these needs?
In designing the edge offering, the Chameleon team wanted to rely on OpenStack interfaces as much as possible, as this allowed them to preserve all the features built on the Chameleon front end that many users value, such as authentication via federated identity or integration with Jupyter. At the same time, providing bare metal reconfiguration for edge devices proved challenging: many of them are not server-class machines and thus don’t support the required functionality. To keep the reconfiguration method relatively lightweight and suited to the reduced capabilities of edge resources, the team chose to use containers as part of their solution, managed using the OpenStack Zun project. The main challenge of the project was to adapt Zun to the different networking and security conditions that arise when working with resources outside of the datacenter.
To deliver all the desired functionality, some gaps needed to be filled:
One of the goals of the Chameleon edge platform is to make it easy for researchers to bring their own devices so that they can be managed using the OpenStack-based platform. The team implemented a new service, Doni, that makes it possible to enroll devices and manage their lifecycle programmatically by presenting a single access point for enrollment (encapsulating, e.g., requesting access credentials, configuring Zun, and registering the device in Blazar). An important part of this encapsulation is that it provides an abstraction layer that hides privileged APIs from the users who are enrolling their devices.
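To make the "single access point" idea more concrete, here is a minimal sketch of an enrollment facade. It is not Doni's actual API: the class `EnrollmentFacade`, the three step functions, and the returned field names are all hypothetical stand-ins for the privileged operations that the service would perform on the user's behalf.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical step signature: takes a device name, returns a state fragment.
Step = Callable[[str], Dict[str, str]]


@dataclass
class EnrollmentFacade:
    """One entry point that runs the privileged enrollment steps in order."""
    steps: Dict[str, Step]
    log: List[str] = field(default_factory=list)

    def enroll(self, device_name: str) -> Dict[str, str]:
        state: Dict[str, str] = {"device": device_name}
        for name, step in self.steps.items():
            # The caller never invokes these steps (or the APIs behind them) directly.
            state.update(step(device_name))
            self.log.append(name)
        return state


# Placeholder steps; a real service would call the corresponding OpenStack APIs here.
def request_credentials(device: str) -> Dict[str, str]:
    return {"credential_id": f"cred-{device}"}


def configure_zun(device: str) -> Dict[str, str]:
    return {"zun_host": device}


def register_in_blazar(device: str) -> Dict[str, str]:
    return {"blazar_resource": f"device:{device}"}


facade = EnrollmentFacade(steps={
    "credentials": request_credentials,
    "zun": configure_zun,
    "blazar": register_in_blazar,
})
result = facade.enroll("rpi-lab-01")
```

The point of the design is that the device owner sees one call, while the privileged calls stay behind the facade.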
Tunelo + Neutron + WireGuard
Chameleon uses WireGuard to create and maintain the tunnels that secure device-to-cloud communications. One tunnel carries all control-plane traffic (e.g., MySQL, RabbitMQ, and internal OpenStack APIs), while another carries all tenant traffic generated from within a container launched on the device. This second tunnel wraps a VXLAN overlay network, which is made available to containers via Neutron and Kuryr.
Neutron is also used at a higher level to maintain the state of the WireGuard tunnels themselves as Neutron “ports” via a new ML2 plugin. This might sound a bit odd, as WireGuard operates at Layer 3, but in practice it works by requiring that WireGuard ports in Neutron are associated with a subnet; Neutron’s IPAM provides a nice way to ensure that two such ports never use the same IP address. The ML2 plugin then creates the WireGuard interface and stores its public key for later querying. The device-to-cloud networking resembles a mesh network but currently follows a hub-and-spoke model, in which devices use the hub as an intermediary when addressing each other. This can be worked around on a per-device basis with static routing when two devices are on the same local network relative to the cloud.
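The role that address management plays here can be illustrated with a small sketch. This is not Neutron's IPAM or the Chameleon ML2 plugin; `TunnelSubnet` and `create_port` are hypothetical names, and the example only shows the property the text relies on: each tunnel port, identified by its public key, gets an address from the subnet that no other port holds.

```python
import ipaddress
from typing import Dict


class TunnelSubnet:
    """Toy allocator: hands out each host address in a subnet at most once."""

    def __init__(self, cidr: str) -> None:
        self.network = ipaddress.ip_network(cidr)
        # Iterator over the usable host addresses of the subnet.
        self._free = iter(self.network.hosts())
        # Maps a port's public key to the address allocated to it.
        self.ports: Dict[str, ipaddress.IPv4Address] = {}

    def create_port(self, public_key: str) -> ipaddress.IPv4Address:
        # Asking again for the same key returns the existing allocation.
        if public_key in self.ports:
            return self.ports[public_key]
        try:
            address = next(self._free)
        except StopIteration:
            raise RuntimeError(f"no free addresses left in {self.network}") from None
        self.ports[public_key] = address
        return address


subnet = TunnelSubnet("10.0.0.0/29")
hub_ip = subnet.create_port("hub-key")
device_a_ip = subnet.create_port("device-a-key")
device_b_ip = subnet.create_port("device-b-key")
device_a_again = subnet.create_port("device-a-key")
```

Keying the allocation on the public key mirrors the idea that the port record, not the device, is the source of truth for which address a tunnel endpoint uses.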
To simplify deployment on the device, the ML2 plugin and agent do not run at the edge. Instead, Chameleon provides an Edge SDK, which the device owner invokes to create the WireGuard interfaces and then request a WireGuard port from Neutron by providing the public key. Neutron can then ensure that the hub port in the topology is updated with the new list of peers.
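As an illustration of what "updating the hub with a new list of peers" amounts to in a hub-and-spoke WireGuard topology, the sketch below renders the peer sections of a WireGuard configuration file. The function names, keys, addresses, and the endpoint `hub.example.org:51820` are placeholders invented for the example; this is not the code of the Chameleon plugin or Edge SDK.

```python
from typing import Dict


def render_hub_peers(spokes: Dict[str, str]) -> str:
    """Render one [Peer] section per enrolled device for the hub's config.

    `spokes` maps each device's public key to its tunnel address.
    """
    sections = []
    for public_key, address in spokes.items():
        sections.append(
            "[Peer]\n"
            f"PublicKey = {public_key}\n"
            # The hub only accepts this device's own tunnel address from it.
            f"AllowedIPs = {address}/32\n"
        )
    return "\n".join(sections)


def render_spoke_peer(hub_public_key: str, hub_endpoint: str, tunnel_cidr: str) -> str:
    """Render the single [Peer] section a device needs: the hub."""
    return (
        "[Peer]\n"
        f"PublicKey = {hub_public_key}\n"
        f"Endpoint = {hub_endpoint}\n"
        # Routing the whole tunnel subnet via the hub is what makes the hub
        # the intermediary for device-to-device traffic.
        f"AllowedIPs = {tunnel_cidr}\n"
        # Keepalives help devices behind NAT stay reachable.
        "PersistentKeepalive = 25\n"
    )


spokes = {
    "<device-a-public-key>": "10.0.0.2",
    "<device-b-public-key>": "10.0.0.3",
}
hub_config = render_hub_peers(spokes)
spoke_config = render_spoke_peer("<hub-public-key>", "hub.example.org:51820", "10.0.0.0/29")
```

Enrolling a new device then only means adding one entry to `spokes` and re-rendering the hub's peer list; the devices' own configurations do not change.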
The WireGuard port abstraction requires invoking privileged Neutron API operations. To control such access, the Chameleon team also created a thin service called Tunelo, whose main responsibility is to wrap these privileged calls to Neutron.
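A thin wrapper of this kind might look roughly like the sketch below. It is not Tunelo's implementation: `PrivilegedNetworkBackend`, `TunnelGateway`, and their methods are hypothetical, and the backend is an in-memory stand-in rather than a Neutron client. The one factual detail used is that a WireGuard public key is 32 bytes encoded as base64.

```python
import base64
import binascii
from typing import Dict, List, Tuple


def is_valid_wireguard_key(key: str) -> bool:
    """A WireGuard public key is 32 bytes, base64-encoded."""
    try:
        raw = base64.b64decode(key, validate=True)
    except (binascii.Error, ValueError):
        return False
    return len(raw) == 32


class PrivilegedNetworkBackend:
    """Hypothetical stand-in for an admin-scoped networking client."""

    def __init__(self) -> None:
        self.created: List[Tuple[str, str]] = []

    def create_tunnel_port(self, project_id: str, public_key: str) -> Dict[str, str]:
        self.created.append((project_id, public_key))
        return {"project_id": project_id, "public_key": public_key}


class TunnelGateway:
    """Exposes one narrow operation; callers never hold the privileged client."""

    def __init__(self, backend: PrivilegedNetworkBackend) -> None:
        self._backend = backend

    def request_spoke_port(self, caller_project_id: str, public_key: str) -> Dict[str, str]:
        # Validate user input before anything privileged happens.
        if not is_valid_wireguard_key(public_key):
            raise ValueError("public key must be 32 bytes of base64")
        # The port is always created for the caller's own project.
        return self._backend.create_tunnel_port(caller_project_id, public_key)


backend = PrivilegedNetworkBackend()
gateway = TunnelGateway(backend)
example_key = base64.b64encode(bytes(32)).decode("ascii")  # all-zero placeholder key
port = gateway.request_spoke_port("project-123", example_key)
```

The wrapper keeps the privileged surface small: users can ask for a tunnel port for their own project, and nothing else.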
Over the years of the project, the Chameleon team has also contributed significantly to the development of OpenStack Blazar, which implements advance reservations. For systems like Chameleon, this is a key component that ensures that users can provision resources when they need them. This is particularly important in cases of resource scarcity, e.g., when users want to provision clusters composed of many nodes or any of the GPU resources that are much in demand, or when users want to ensure that resources will be available before a paper deadline, for a demonstration or for a class.
The improvements to the project include a new resource type called “device”, which allows users to reserve a Zun compute node and then run containers on it, orchestrated by OpenStack. Blazar also provides the ability to restrict reservable resources to a specified list of projects. This turns the platform into a management interface for a sort of private setup, as opposed to requiring users to enroll and share all their devices with everyone in the cloud.
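The two checks described above, that a device is reservable by the requesting project and that it is free for the requested time window, can be sketched in a few lines. This is an illustration of the idea only, not Blazar's data model or API; the `Device` class, its fields, and the `reserve` function are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Set, Tuple


@dataclass
class Device:
    name: str
    # None means the device is shared with every project.
    authorized_projects: Optional[Set[str]] = None
    leases: List[Tuple[datetime, datetime]] = field(default_factory=list)

    def reservable_by(self, project: str) -> bool:
        return self.authorized_projects is None or project in self.authorized_projects

    def is_free(self, start: datetime, end: datetime) -> bool:
        # Two half-open intervals are disjoint if one ends before the other starts.
        return all(end <= s or start >= e for s, e in self.leases)


def reserve(device: Device, project: str, start: datetime, end: datetime) -> bool:
    """Record an advance reservation if the project may use the device and it is free."""
    if start >= end:
        raise ValueError("reservation must end after it starts")
    if not device.reservable_by(project) or not device.is_free(start, end):
        return False
    device.leases.append((start, end))
    return True


private_pi = Device("rpi-lab-01", authorized_projects={"my-research-project"})
morning = (datetime(2030, 1, 1, 9), datetime(2030, 1, 1, 12))
overlapping = (datetime(2030, 1, 1, 11), datetime(2030, 1, 1, 13))
afternoon = (datetime(2030, 1, 1, 12), datetime(2030, 1, 1, 15))

ok_owner = reserve(private_pi, "my-research-project", *morning)
ok_other_project = reserve(private_pi, "someone-else", *afternoon)
ok_overlap = reserve(private_pi, "my-research-project", *overlapping)
ok_back_to_back = reserve(private_pi, "my-research-project", *afternoon)
```

With a restricted project list, a device owner effectively gets a private pool managed through the same reservation interface as the shared testbed.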
Cyborg – Zun integration
Another problem the team had to solve was organizing access to various “peripherals”, i.e., Internet of Things (IoT) devices such as cameras, software defined radios, or actuators, attached to the edge devices. To support this, the team leveraged the OpenStack Cyborg project. Cyborg provides access to hardware accelerators and helps plug them into virtual machines through its integration with Nova. Because both these accelerators and peripheral interfaces such as camera interfaces should also be attachable to containers, the Chameleon team added a new device type called ‘peripherals’ to Cyborg, as well as a new attach type called ‘OCI (Open Container Initiative)’. Together these integrate Cyborg with Zun and allow containers to take advantage of specialized hardware attached to edge devices.
As mentioned earlier, the Edge SDK is the system that device owners install on a device to plug it into the infrastructure and to share information about the edge device with the central cloud.
This project is still under development, and there are still challenges to address. Addressing more security scenarios to support more use cases is at the top of the priority list for Chameleon, as it is for others in the edge computing space. In this case the central device ownership model is broken, as users can bring their own devices. This concept also gives rise to the need to grant high-level access to the central cloud so that devices can be connected efficiently to perform all the desired tasks, but in a way that limits the range of actions each device is entitled to.
The Cyborg-Zun integration is not implemented upstream yet, though the team is looking into working with the project to share their work.
CHI@Edge is still under heavy development, and there are considerations in several areas. In networking, for example, the aim is to make the setup more agnostic so that it can leverage innovative solutions such as the software-defined radios that are currently connected as peripherals. There are also several ongoing partnerships in the networking area that will utilize the testbed, which will need improvements in tooling, connectivity and security.
If you are interested in listening back to the session, you can find a full recording on the OpenInfra Edge Computing Group’s wiki page.
Some of the above topics will be discussed at the upcoming Project Teams Gathering Event (PTG) as part of the OpenInfra Edge Computing Sessions as well as the sessions of the respective projects that were mentioned in this article. Please visit our etherpad for further session details and don’t forget to register for the event to receive all the latest updates. See you at the PTG!
About OpenInfra Edge Computing Group:
The Edge Computing Group is a working group comprised of architects and engineers across large enterprises, telecoms and technology vendors working to define and advance edge cloud computing. The focus is open infrastructure technologies, not exclusive to OpenStack.
Get Involved and Join Our Discussions:
- Weekly Meetings
- Join the Mailing List
- Cloud Edge Computing: Beyond the Data Center White Paper
- Edge Computing: Next Steps in Architecture, Design and Testing