During the recent Project Teams Gathering (PTG) event, the OpenInfra Edge Computing Group pivoted from edge theory, such as developing use cases, to the practicalities of building, delivering and managing edge deployments. This time the focus was on the how: the requirements for preparing, deploying and operating edge infrastructures in production. Attendees addressed questions such as whether edge is just like any other distributed system, or whether edge deployments are significantly different.
The sessions were structured around well-defined themes borrowed from telecommunications industry terminology.
- ‘Day 0’ – What is needed to develop the product, plus the requirements for preparing to deploy the infrastructure and services; everything except the actual deployment.
- ‘Day 1’ – The requirements and logistics of the actual deployment phase, that is, getting the systems out in the field. This might seem straightforward, but nothing is straightforward when you talk about edge!
- ‘Day 2/n’ – The challenges of running the end-to-end edge infrastructure in production.
The discussions on these wide-ranging topics were fast and furious, and the outcome included lots of great ideas and potential new work in several OpenStack projects.
‘Day 0’ – Product development and preparation
At the ‘Day 0’ sessions we explored the challenges of building the frameworks to support production deployments, backed by war stories highlighting the issues. As is common practice, the discussion revolved around templates of different kinds and highlighted automation opportunities and needs. Heat templates also came up, as some attendees were running OpenStack and have been using that functionality. Using templates is a balancing act: you need the right level of detail while keeping them general enough that you don’t have to create separate templates for each use case. How standard should they be? How many templates are the right amount?
Automation was a big theme during all the discussions. What is the right level? Can you automate the creation of templates? Should you version your templates? How can automation be used to manage changes in the system, such as rolling out a modified setup, which might include binaries or license files, to multiple sites with similar but not quite identical configurations? Adding the site differences to the packages points back to the difficulties of creating good templates, as they are like blueprints. What does a good blueprint look like? What are the variables to inject? There was agreement that edge use cases are unique in their continuous and somewhat organic growth, which can be tricky to manage. One attendee’s solution was size-based templates with four levels. Another approach was to use Heat templates in TripleO with composable roles, such as a cache resolver. Something to keep in mind when designing an edge solution is that there is often no spare capacity, or it is hard to access, so you need to build in guardrails in cases where multiple users will share an edge site.
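To make the size-based approach a little more concrete, here is a minimal Python sketch of picking one of four templates from a site's node count. The tier boundaries, the template file names and the `template_for_site` helper are all hypothetical; the session only mentioned that four levels were used, not how they were defined.

```python
from bisect import bisect_left

# Hypothetical upper bounds (in nodes, inclusive) for the first three tiers.
# Anything larger falls into the fourth tier.
TIER_UPPER_BOUNDS = [2, 8, 32]

# Hypothetical template names, one per tier.
TIER_TEMPLATES = ["small.yaml", "medium.yaml", "large.yaml", "xlarge.yaml"]


def template_for_site(node_count: int) -> str:
    """Return the template name for a site with the given number of nodes."""
    if node_count < 1:
        raise ValueError("a site needs at least one node")
    # bisect_left finds the first tier whose upper bound is >= node_count.
    return TIER_TEMPLATES[bisect_left(TIER_UPPER_BOUNDS, node_count)]


if __name__ == "__main__":
    for nodes in (1, 3, 20, 100):
        print(nodes, "->", template_for_site(nodes))
```

Keeping the boundaries in a single list means that a site growing past a threshold only changes which template is selected, not the templates themselves.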
Network connectivity can quickly get complicated, so it is key to understand how it can be incorporated into the automation without needing a new template every time a service gets deployed at a new location. One approach is to create a more generic template and include site-specific variable entries in it, which could possibly even be generated at the site.
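The generic-template idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variable names (`site_name`, `mgmt_cidr`, `uplink_mtu`), the default values and the `render_site_config` helper are made up for the example and do not come from Heat, TripleO or any tool mentioned in the sessions.

```python
from string import Template

# Hypothetical generic network template shared by every site.
GENERIC_TEMPLATE = Template(
    "site: $site_name\n"
    "mgmt_cidr: $mgmt_cidr\n"
    "uplink_mtu: $uplink_mtu\n"
)

# Hypothetical defaults that apply unless a site overrides them.
DEFAULTS = {"uplink_mtu": "1500"}


def render_site_config(site_vars: dict) -> str:
    """Merge site-specific values over the defaults and fill in the template."""
    merged = {**DEFAULTS, **site_vars}
    # substitute() raises KeyError when a required variable is missing,
    # which surfaces an incomplete site definition before deployment.
    return GENERIC_TEMPLATE.substitute(merged)


if __name__ == "__main__":
    print(render_site_config({"site_name": "edge-042", "mgmt_cidr": "10.42.0.0/24"}))
```

The site-specific dictionary is the only part that changes per location, so it could be written by hand or produced by the site itself, while the template stays the same everywhere.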
After automation, validation and testing sparked a lively discourse. The templates need to work correctly, so a lot of testing and validation needs to be done while developing them. For instance, sometimes hardware is mixed up and servers get delivered with different sets of interfaces than the templates expect, or there can be issues with the firmware. When boxes are delivered with the same Out-of-Band (OoB) management information, like the same default IP address, you need to power them on in the correct sequence. Speaking of the OoB connection, DHCP is commonly used for bootstrapping. Boxes can also come with a “phone-home” function with built-in information that sets up the connection and allows the box to talk to the local orchestrator or control plane function. It is also worth noting that not every component at a site is in the same stage; for example, a new box can be in ‘Day 0’ while the local control plane might be in ‘Day 2’.
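Two of the checks described above are easy to sketch in Python: comparing the interfaces a template expects with what a delivered server actually has, and spotting boxes that share a default OoB address and therefore cannot be powered on at the same time. The function names, interface names and addresses are hypothetical, and how the inventory is discovered is left out.

```python
from collections import defaultdict


def check_interfaces(expected, discovered):
    """Compare the interfaces a template expects with those found on a server."""
    expected, discovered = set(expected), set(discovered)
    return {
        "missing": sorted(expected - discovered),
        "unexpected": sorted(discovered - expected),
    }


def boxes_sharing_oob_address(boxes):
    """Return OoB addresses used by more than one box, with the box names.

    `boxes` maps a box name to its default OoB management address.
    """
    by_address = defaultdict(list)
    for name, address in boxes.items():
        by_address[address].append(name)
    return {addr: names for addr, names in by_address.items() if len(names) > 1}


if __name__ == "__main__":
    print(check_interfaces(["eno1", "eno2", "ens3f0"], ["eno1", "eno2"]))
    print(boxes_sharing_oob_address({
        "box-a": "192.168.0.120",
        "box-b": "192.168.0.120",
        "box-c": "10.0.0.5",
    }))
```

Any box that shows up in the second result would need to be powered on and re-addressed one at a time before the next one is switched on.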
Overall, the conclusion was that validation and testing need to be an embedded part of the solution from ‘Day 0’ to ‘Day 2’ to avoid the issues that arise when a system is tested in the field for the first time. At the same time, if a reboot is needed once the components are brought up on site, the validation process needs to be able to run again to ensure that every service came up properly and that the systems and images have not been tampered with. Securing an edge site is crucial. For example, key management needs to be designed with edge in mind. Some considerations include storing keys off site or having an external secret cache with a Time To Live (TTL) parameter set.
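As a rough illustration of the TTL idea, the Python sketch below caches secrets fetched from an off-site store and forgets them once their time to live has passed. The `TtlSecretCache` class and its `fetch` callback are hypothetical and stand in for whatever key management service is actually used; a production design would also need to consider what happens when the site cannot reach the off-site store.

```python
import time


class TtlSecretCache:
    """Keep secrets in memory for at most `ttl_seconds` before refetching."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch          # callable that retrieves a secret by name
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the expiry logic can be tested
        self._entries = {}           # name -> (secret, expiry time)

    def get(self, name):
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]
        # Expired or never fetched: drop the stale copy and fetch again.
        self._entries.pop(name, None)
        secret = self._fetch(name)
        self._entries[name] = (secret, now + self._ttl)
        return secret


if __name__ == "__main__":
    cache = TtlSecretCache(fetch=lambda name: f"value-of-{name}", ttl_seconds=30)
    print(cache.get("disk-encryption-key"))
```

A short TTL limits how long a key stays on the edge site after it was last needed, at the cost of more frequent round trips over a possibly unreliable link.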
The session finished up with a few more key details about setting up and running edge sites. For instance, edge sites are often in remote locations with sometimes unreliable connections, which makes them inherently hard to reach and manage, so it is desirable to minimize changes to the systems’ configurations.
If you’re ready for deployment, check out the summaries of the ‘Day 1’ discussions!
If you missed the event and would like to listen to the sessions, you can access the recordings on the OpenInfra Edge Computing Group wiki. You can also find notes on the etherpad that we used during the event. The group already has a list of follow-up discussions and presentations scheduled! Check out our lineup and join our weekly meetings on Mondays to get involved!
About OpenInfra Edge Computing Group:
The Edge Computing Group is a working group comprised of architects and engineers across large enterprises, telecoms and technology vendors working to define and advance edge cloud computing. The focus is open infrastructure technologies, not exclusive to OpenStack.
Get Involved and Join Our Discussions: