Students from Carnegie Mellon University's Doha campus participated in a University Partnership Program (UPP) in which they were mentored by OpenStack Swift contributors, the first time a UPP has focused on Swift! Mentor Matthew Oliver summarized his experiences with the students and co-mentor Tim Burke. If you're interested in more information about the UPP, reach out to [email protected] with your questions or check out the UPP website.
OpenStack Swift isn't a project that one can easily jump into and start submitting patches to. Well, you can, but to be effective there is a level of onboarding that needs to take place first. And I'm very happy to say everyone did great: they all really came to understand the distributed and eventually consistent nature of Swift, and they all contributed real changes to the Swift codebase. All the projects we set were real feature requests from the community, so not only were the students making OpenStack Swift better, they were helping real operators out in the world!
Evolution of Communication
I live in a very remote part of the world, well at least in our industry. And after many years working remotely for teams based all over the world, effective communication is a big topic for me. The students, working from the Qatar time zone, may face similar issues, so effective asynchronous communication is paramount. This was also evident in this summer internship project. I myself was in Australia, but had better overlap with the students than Tim, who is based on the US west coast.
We managed to meet via OpenDev's Meetpad (Jitsi) once a week, and other questions and guidance flowed through more asynchronous communication mechanisms as we got them established: initially email, but as the students built up their dev environments and onboarded into the project, also Matrix/IRC and eventually the code review system, Gerrit, itself.
Some effective upstream habits I tried to instill in them:
- Transparency
- Push up code early, even if it isn't ready.
- We are all code-blind to our own code, and we need more eyes on it to keep up the quality and reduce bugs. But mostly to make the code even better!
- Don't be scared of how your code looks. Take the vanity out of it.
- Do as much in the open as possible: in the project channel, email or the code review system.
- Reviewing code and asking questions in the code review system is a great way to learn about the codebase and become a better dev.
- Project first
- Have a project-first mindset; we're all trying to make Swift better.
- A -1 review in Gerrit isn't a bad thing. It just means the change isn't ready to merge. Everyone gets -1's, and there are usually many patchsets before a change is ready to land. This is normal; it's how we get great quality code.
- Ask questions
- It's a big codebase. There are no stupid questions, and people are always happy to explain what something means or why they did what they did.
- Asking questions in other devs' code reviews via a +0 vote is great, especially if you want to know more about that particular area of the codebase. It's how we've all learnt.
- Have fun
- The community is a great bunch of people, and it should be fun. You have a dev environment you can easily reset, so go try patches out, break things, and have a blast.
Onboarding
As I already mentioned, Swift is a distributed system with a lot of moving parts, so an understanding of all those parts is essential. The first few weeks were therefore devoted to onboarding and building up dev environments. I started with a very high-level view of Swift and object storage in general, meeting all the conceptual storage server types in the cluster and following the flow of requests through it. I've been working on OpenTelemetry integration for Swift, so I could actually show the live paths requests took through the cluster, which I think worked well.
We talked about the WSGI framework and how one of Swift's most important features is how easy it is to extend via WSGI middleware. Many of Swift's own features live in middleware, so operators can add their own code and/or turn off whatever they don't want.
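To make that concrete, here's a minimal sketch of a WSGI middleware written in the filter_factory style Swift's PasteDeploy pipelines expect. The middleware name and its deny_all option are my own illustrations, not actual Swift code:

```python
# Illustrative only: a toy middleware in the filter_factory style used by
# Swift's PasteDeploy proxy pipeline. Names and options are made up.

class QuotaCheckMiddleware(object):
    """Wraps the next app in the pipeline and can short-circuit requests."""

    def __init__(self, app, conf):
        self.app = app
        self.deny_all = conf.get('deny_all', 'false').lower() == 'true'

    def __call__(self, environ, start_response):
        if self.deny_all and environ.get('REQUEST_METHOD') == 'PUT':
            # Short-circuit: answer here without calling the wrapped app.
            start_response('403 Forbidden', [('Content-Type', 'text/plain')])
            return [b'Denied by quota middleware\n']
        # Otherwise hand the request down the pipeline untouched.
        return self.app(environ, start_response)


def filter_factory(global_conf, **local_conf):
    """Entry point PasteDeploy calls to wire the middleware into a pipeline."""
    conf = dict(global_conf, **local_conf)

    def quota_filter(app):
        return QuotaCheckMiddleware(app, conf)
    return quota_filter
```

Because each middleware only sees a WSGI app on either side of it, operators can reorder, add or remove middlewares in the proxy pipeline config without touching Swift itself.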
From there we delved deeper into the Swift rings, which are used by Swift components to let Swift scale without any central brain. Next came the consistency engine working behind the scenes, checking and repairing objects on disk.
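To give a feel for why no central brain is needed, here's a heavily simplified sketch of the ring idea: hash the object's path, take the top bits as a partition number, and look that partition up in a replica-to-device table. Real Swift rings also mix in a per-cluster hash prefix/suffix and build the table from zones, regions and device weights; the toy table below is just for illustration:

```python
# A toy version of a ring lookup. Every node can compute the same answer
# independently, which is what lets Swift scale without a central brain.
import struct
from hashlib import md5

PART_POWER = 10                 # 2**10 = 1024 partitions (tiny example)
PART_SHIFT = 32 - PART_POWER

# Hypothetical partition -> device table, one row per replica. A real ring
# builds this carefully from zones, regions and device weights.
replica2part2dev = [
    [p % 4 for p in range(2 ** PART_POWER)],         # replica 0
    [(p + 1) % 4 for p in range(2 ** PART_POWER)],   # replica 1
    [(p + 2) % 4 for p in range(2 ** PART_POWER)],   # replica 2
]

def get_nodes(account, container, obj):
    path = '/%s/%s/%s' % (account, container, obj)
    digest = md5(path.encode('utf-8')).digest()
    part = struct.unpack_from('>I', digest)[0] >> PART_SHIFT
    return part, [row[part] for row in replica2part2dev]

print(get_nodes('AUTH_test', 'photos', 'cat.jpg'))
# -> (partition, [device ids for each of the 3 replicas])
```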
Alongside the onboarding, everyone had a go at manually setting up a 'Swift All In One' (SAIO) dev environment following the upstream docs. This let all the students get a taste of Swift configuration and learn more about the components. Later they could just use the 'vagrant SAIO' (vSAIO) that the devs all use, as it automates the process. A SAIO simulates a 4-node cluster and has helper scripts to reset your Swift environment, making it a great sandbox for dev work, exploring and, more fun, breaking the system.
As we went along, the students also followed the upstream docs and started appearing in the Swift project IRC channel, using the Matrix-to-IRC bridge so they were permanently connected, which aided asynchronous communication with the upstream community.
Projects
The projects were real feature requests, and I picked a bunch that seemed manageable and interesting and that followed the normal progression I see devs take as they get into Swift: starting out with existing middleware in the proxy WSGI pipeline, moving into the consistency engine itself, and then working towards a new API, beginning with an operator tool that starts enabling a long-awaited feature in Swift.
The team surprised me and asked for more projects to choose from; they completed the first one quickly and wanted to sink their teeth into more before moving on to the harder ones.
The original plan was for the team to work as a group on the first few, then split into smaller groups that would each tackle a different project… but it turns out we ran out of time. Although I'd like to note I've heard from multiple students that they plan to finish what they've started, and some are even interested in picking up some of the other projects I put forward!
Add object-count quota for accounts in middleware
There is an existing account quota middleware that provides soft quotas based on the number of bytes in an account, but there was a feature request to also provide a quota on the number of objects in an account. The main use case, as described by the requester, was filesystem inode limits.
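For a sense of the API shape: account quotas are set by an admin via headers on a POST to the account. `X-Account-Meta-Quota-Bytes` is the long-standing byte-quota header; the object-count header name below is my assumption for illustration, so check the merged patch for the real one:

```python
# Sketch of an admin setting account quotas via python-swiftclient.
# The Quota-Count header name is an assumption for illustration.
from swiftclient.client import Connection

admin = Connection(authurl='http://saio:8080/auth/v1.0',
                   user='admin:admin', key='admin')
admin.post_account(headers={
    'X-Account-Meta-Quota-Bytes': '107374182400',  # existing byte quota
    'X-Account-Meta-Quota-Count': '1000000',       # assumed count header
})
```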
The entire team worked on this together, and everyone had a go at writing tests and discussing how to implement it in the middleware. There was some good refactoring, and this patch led to a deeper dive into Swift's tests, the different types of tests (unit, functional and probe), and how you unit test in a distributed system that involves responses from other servers.
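The trick to unit testing middleware in a distributed system is to stand in a fake app for the rest of the pipeline, so the middleware can be exercised in isolation without any real backend servers. Swift has richer helpers for this (including a fake Swift that serves canned responses per path), but the bare pattern looks something like this sketch, with a toy middleware in place of the real one:

```python
# The fake-app pattern behind Swift's middleware unit tests, in miniature.
import unittest


class FakeApp(object):
    """Pretends to be the rest of the proxy pipeline."""
    def __call__(self, environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [b'fake backend response']


def deny_puts(app):
    """Toy middleware under test: rejects PUTs, passes everything else."""
    def mw(environ, start_response):
        if environ.get('REQUEST_METHOD') == 'PUT':
            start_response('403 Forbidden', [])
            return [b'']
        return app(environ, start_response)
    return mw


class TestDenyPuts(unittest.TestCase):
    def test_get_passes_through(self):
        statuses = []
        body = deny_puts(FakeApp())({'REQUEST_METHOD': 'GET'},
                                    lambda s, h: statuses.append(s))
        self.assertEqual(statuses, ['200 OK'])
        self.assertEqual(b''.join(body), b'fake backend response')

    def test_put_denied(self):
        statuses = []
        deny_puts(FakeApp())({'REQUEST_METHOD': 'PUT'},
                             lambda s, h: statuses.append(s))
        self.assertEqual(statuses, ['403 Forbidden'])


if __name__ == '__main__':
    unittest.main()
```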
This is one of the older middlewares in Swift, and as such the API namespace it uses is older too. There have been discussions about modernising it to take advantage of Swift's sysmeta namespace. For now this patch just extends the existing quota namespace, keeping the older meta namespace. Because Swift always tries to preserve backward compatibility, it would be better to migrate the object-bytes quota into the new namespace and introduce object-count only in the new namespace, reducing the backwards compatibility code we have to carry.
This additional work was considered out of scope, and something we can come back to.
The above code has landed, making all the students official contributors to Swift. I suggested that, if there was time, a follow-up patch could migrate to the new namespace; if we land that before the next Swift release there would be no increased backwards compatibility burden.
I'm very pleased to announce that the team picked up this follow-up only a few weeks ago, and we are currently working on the namespace account quota migration patch.
Account quota meta -> sysmeta migration patch
Strictly speaking this didn't come next; the team picked it up only last week. But it's related, so I'll talk about it now. The `X-Account-Meta-*` namespace is reserved for users to store whatever metadata they want. There is also an `X-Account-Sysmeta-*` namespace that can only be used internally by the cluster to store system metadata on accounts. When the account quota middleware was created, sysmeta didn't exist, so the middleware used the former. It's also prudent to mention that only admins can set account-level quotas, not the users themselves. But this means that if someone happened to set `X-Account-Meta-Quota-Bytes: <some number>` before the account quota middleware was added, that user-defined value would be used.
So this patch migrates the old object-bytes quota and the new object-count quota over to the new location, while keeping backwards compatibility for object-bytes. It is still in review, but looking great so far.
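The heart of the compatibility story is a read path that prefers the new sysmeta location but still honours the legacy header on accounts that haven't been migrated. A minimal sketch of that idea (the helper itself is illustrative, not the actual patch):

```python
def get_bytes_quota(account_headers):
    """Prefer the new sysmeta quota, fall back to the legacy meta one."""
    for header in ('x-account-sysmeta-quota-bytes',
                   'x-account-meta-quota-bytes'):
        value = account_headers.get(header)
        if value is not None:
            return int(value)
    return None  # no quota set on this account
```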
Periodic reaper recon dump showing current progress
The account reaper looks for deleted accounts and works through all of their containers and objects, making sure they're all deleted. As you can imagine, this can take a while, so there was a request to log the progress.
This led us to talk about logging, metrics and other ways of getting information out of a distributed system, especially Swift. Logs are always after the fact, so it would be nice to get more live information, letting an operator see why, say, a reaper daemon is taking so long.
Swift has an additional location where it can periodically dump process and state information to disk, to be picked up by operators via the reconnaissance (recon) subsystem, and this patch uses it. It adds a watcher thread that, on an interval, dumps the current state of the reaper, letting us know its progress.
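The shape of the patch is roughly this: the reaper keeps a shared stats dict up to date while a side thread periodically serialises a snapshot to a recon-style JSON cache file that tools like swift-recon can read. The file path, key names and interval below are illustrative, not the actual patch:

```python
# Sketch of a recon-style progress watcher. Paths and keys are made up.
import json
import os
import threading

RECON_CACHE = '/var/cache/swift/account.recon'  # illustrative path

def dump_progress(stats, stop, interval=60):
    # Wake every `interval` seconds until told to stop, and atomically
    # swap a fresh snapshot into place so readers never see partial JSON.
    while not stop.wait(interval):
        tmp = RECON_CACHE + '.tmp'
        with open(tmp, 'w') as fp:
            json.dump({'account_reaper_progress': dict(stats)}, fp)
        os.rename(tmp, RECON_CACHE)

stats = {'accounts_reaped': 0, 'containers_remaining': 0}
stop = threading.Event()
threading.Thread(target=dump_progress, args=(stats, stop),
                 daemon=True).start()
# ... the reaper's main loop updates `stats` as it works,
# and calls stop.set() on shutdown ...
```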
This led to interesting discussions about Python, threading and green threading. I was very impressed with the questions asked and the work done here.
The earlier patchsets had a smaller scope and looked good; the later patchsets are getting even better, adding more information like the current containers being worked on across multiple green threads. This hasn't landed yet but is getting much closer. SRE and ops folks are going to love this addition!
Container storage-policy modification operator tool
When a container is created the user can pick a storage policy. This is the policy under which objects will be stored when put into the container, i.e. some replication or EC (erasure coding) policy.
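Picking the policy happens with the documented `X-Storage-Policy` header at container creation time; the policy name itself is whatever the operator defined in swift.conf. For example:

```python
# Create a container that stores its objects under a specific policy.
from swiftclient.client import Connection

conn = Connection(authurl='http://saio:8080/auth/v1.0',
                  user='test:tester', key='testing')
conn.put_container('backups', headers={'X-Storage-Policy': 'ec42'})
```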
But currently this can only be set at creation and never changed afterwards. As part of the Swift consistency engine we have a special daemon called the reconciler, which moves objects stored in the wrong storage policy to the right one. Currently it's used when container replicators find objects for a container in the wrong policy; how that can happen was a great discussion point about eventual consistency and network split-brains.
There is a lot of interest in the Swift community in adding policy migration. We have some of the bits and pieces in place, but there is still a lot of work to be done before this feature is fully supported for users. A step on that road is to provide an operator tool: a CLI tool that Swift operators/SREs can use inside the cluster to change the policy of a container on disk and trigger this migration.
We spent time discussing policy migration, what it would look like, and how this project could easily keep going.
The team here really surprised me, thinking outside the box, taking lessons from the account quota patch, and thinking about where we want to go with this in the future. They created not only an ops tool but also an admin-only API that lets an admin send a POST to a container and change its storage policy, giving operators the ability to trigger the migration from inside or outside the cluster.
Further, it gets us closer to having a policy-migration API available to users. Once we’re ready we can simply remove the admin guard.
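Since the patch is still in review I won't pin down the API, but my sketch of how using it might look is a policy header on a POST to the container, gated on admin credentials, with the reconciler then migrating the objects in the background:

```python
# Assumed usage of the in-review admin-only API; the header and its
# semantics here are my guesses, not the final patch.
from swiftclient.client import Connection

admin = Connection(authurl='http://saio:8080/auth/v1.0',
                   user='admin:admin', key='admin')
admin.post_container('backups',
                     headers={'X-Storage-Policy': 'replicated-3x'})
```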
This has led us into discussions about possible operator workflows and requirements.
This is still in review, and patches like this can go back and forth for some time before landing. It might also require a type of testing they haven't played with in Swift yet: a probe test. Probe tests use a very opinionated testing framework (built on a SAIO) that can actually simulate a situation and then probe into the backend/disk to assert the correct behaviour.
Container-Sharder concurrency improvements
We never quite got to writing code for this one, but we did discuss it a lot. It would involve breaking up a very serially driven daemon into one that concurrently works through the containers on disk, refactoring the work out and forking it into workers.
The sharder daemon was purposely developed to be very serial, as that made it easier to debug and build confidence in. Now it's just too slow, and it finally needs to be made concurrent.
We've had discussions on how this could look: how the sharder could divide the work amongst workers, and the trade-off between forking processes and Python eventlet green threads.
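As a flavour of the green-threading option: Swift's daemons already use eventlet, so the per-container work could be fanned out over a GreenPool along these lines (the worker body and container paths are stand-ins):

```python
# Sketch: fan per-container sharding work out over an eventlet GreenPool.
import eventlet
eventlet.monkey_patch()

def process_container(path):
    # Stand-in for the sharder's per-container cycle.
    eventlet.sleep(0.1)
    print('sharded', path)

pool = eventlet.GreenPool(size=8)  # the concurrency knob
for path in ('/srv/node1/containers/a.db', '/srv/node1/containers/b.db'):
    pool.spawn_n(process_container, path)
pool.waitall()
```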
I mention this not just because it led to deep and great discussions, but also because I've been told some of the students are still keen to tackle this project after the summer internship ends.
Students
I'd love to go into more detail about the students. They all did really well, and they all impressed me. We mostly talked face to face once a week, so I did get a chance to talk with them quite a bit, but some were very quiet, which is OK; it just makes it harder for me to talk about them individually. In hindsight, maybe I should have split them up sooner, so I'd have gotten to know more about each of them through their code.
And because we unfortunately never got the chance to break into the smaller teams we'd planned, they all basically worked together on all the patches. So I need to talk about them as a whole team, which is one of the reasons I laid out this report the way I did.
As a group they were all very punctual to every meeting, and I was impressed with the code they produced. To be honest, I expected a lot more back and forth on patches than we had, but their code was much better than I anticipated!
Working as a team, they minimised code blindness, saw gaps and really produced some great code!
Best of all, OpenStack Swift is already a much better product today thanks to their efforts!