Takeaways from OpenStack’s Mid-Cycle Ops Meetup, Liberty edition

PALO ALTO, CA. — We’ve had a fantastic couple of days at the OpenStack Operators Mid-Cycle Meeting. Two hundred operators, users, and developers came together to discuss how they’re deploying and maintaining their OpenStack clusters.

The free meetup took place halfway through the development cycle for the upcoming Liberty OpenStack release. Sponsored by Hewlett-Packard Helion and GoDaddy, the event drew participants from across North America and as far away as Japan and the U.K. The meetup used a collaborative model, where a moderator led the room in a shared discussion about best practices, stumbling blocks and operations innovations.

https://twitter.com/jjstevensjj/status/633689298668273665

The first day started with a discussion led by Joe Topjian in which operators shared their tips for getting the best performance out of their hypervisors. The conversation (you can check out the proceedings on the session etherpad) covered tuning disk, scheduling, memory, and kernel parameters, with advice tailored to your compute needs and underlying technology.

https://twitter.com/daveixd/status/633685132977704960

This talk was followed by a number of breakout sessions. The Ops Guide team planned an update to their excellent operations guide, aiming to revise the technical information to reflect the current state of OpenStack and add new user stories. The next Operator Mid-Cycle will feature an extra day devoted to updating the Ops Guide in a face-to-face sprint.

The Logging Working Group continued their mission of refining and rationalizing logs. Working with Jim Blair, the Infra Project Team Lead (PTL), they covered a range of OpenStack log monitoring tools. The Infra team is working on repackaging their own logging tools into a new logging library meant for downstream consumption by operators. They worked on the request/return ID spec and hashed out how to document log configuration. The working group meets weekly and encourages people interested in building out logging standards and analysis tools to join them.

.@noggin143 missing you at Operator's Midcycle. We (TWC) just published our Kilo upgrade notes here: http://t.co/wPlr7HgrMm #openstack PAO

— David Medberry (@davidmedberry) August 19, 2015

The Upgrades Working Group met to share stories and best practices around migrating a production system from one version of OpenStack to another.

The consensus was that most of the "upgrade pain" is around API upgrades that can disrupt running services. Operators are using heavily customized home-grown tools and scripts to successfully manage software, service, and database migrations. They asked the development community to include more information in release notes on feature deprecation and removal, critical bugs, and inter-dependency issues. They cited the Cinder Release Notes as a particularly good example. Cinder, along with Nova, was also cited as leading a successful charge in making database migrations compatible across neighboring versions of OpenStack.

Nice points by @davidmedberry during the @OpenStack Large Deployment discussion. #OSMidCyle

— Edgar Magana (@emaganap) August 18, 2015

The Large Deployments and Public Cloud Working Groups continued a conversation they started in Vancouver. They shared blueprints and worked with the Neutron developers on the needs of large deployments, including network segmentation. Kyle Mestery, the Neutron PTL, attended remotely to work with the operators on their needs and experiences. Other issues included the scale of clusters and the difficulty of naming things (for example, the meaning of the word "region" may differ between operators). The session wrapped up by choosing talking points for the Tokyo Summit. With such tremendous progress made and fruitful collaboration with the networking team, they now face the task of determining their new collective angst.

Working from a list brainstormed in the morning introductory session, the Burning Issues group covered a range of topics, from "smoldering" to "tire fire." The lively session started with the state of Neutron, reaching a consensus that more hands-on work and tutorials are essential for understanding not only how to set up a dynamic network stack, but also how to debug and maintain it.

From there, the conversation moved to strategies for capacity management and monitoring. Anish from RabbitMQ spoke about the roadmap for the message queue, and Morgan Fainberg, Keystone PTL, also talked about the upcoming release, the need for more granular roles and how to scale Keystone to larger deployments. The session wrapped up with discussions about compliance and tricks to troubleshoot problems.

After lunch, a full session was dedicated to container-based deployments. Those using containers to manage their deployments praised the ability to control conflicting Python dependencies during upgrades through container isolation, scale out services, stage and test new systems, and build a single artifact for development, testing, and production. Containers aren’t suitable for full deployments, though, and the suggestion was to go with more traditional deployment methods for things that are still difficult to do with containers.

Day One wound up with a set of lightning talks with stories about deployments, upgrades, the infra cloud, billing, testing, client libraries, and how the operator community has contributed back to the development community. A full list of the talks and their slides is available on the lightning talks etherpad.

Day Two kicked off with a session on integrating OpenStack deployments into configuration management databases, followed by deployment tips led by Matt Fischer of Time Warner Cable. Config management, orchestration, database configuration, message queue tuning and load balancing were just some of the topics covered.

The config management session naturally flowed into a full session devoted to networking led by Edgar Magana of Workday. The maturity of Neutron was on full display, with only one deployment of those surveyed still running nova-net (mainly because it meets their internal needs, and there’s no pressure to upgrade yet). There’s a wide variety of Neutron deployments out there, using almost every type of network backend available. High availability with DVR still isn’t widely adopted, but is one of the most eagerly awaited features.

Next up was a session devoted to the work of the User Committee, the official group that reports to both the Board and the Technical Committee about the issues and needs facing the user community. Topics included updates to the user survey, product and working group feedback, and how to better recognize the contributions that Operators make to the OpenStack community that go beyond patches and reviews — free Summit tickets and stickers, anyone?

The working sessions concluded with another round of breakout workshops. The Tags Working Group continued their analysis of how to contribute to the new project tagging process. During the session, they proposed a new tag: "containerizable," truly a sign of the times.

The Product Working Group made great progress on identifying and refining user stories, defining a persona taxonomy for consistent user experience evaluation, and drafting recommendations for future cross-project work, including a proposal to break the new Graffiti project out of Glance.

The Packaging Group shared how they manage packaging the source, system admin, and configurations for their OpenStack deployments. This included how to manage packages across multiple versions, testing, package lifecycles and external dependencies. They expressed a common goal of being able to manage the complex and ever-shifting dependency tree, as well as easily deploy bug fixes, security patches, and backports into running systems.

Couldn't make it here? There's an #OpenStack Ops Monitoring/Tools Working Group meeting on IRC every other week: http://t.co/wXQwt2xRns

— Elizabeth K. Joseph (@pleia2) August 19, 2015

Matt Young of HP led the Tools and Monitoring session. Participants covered an impressive number of topics: 24 in 90 minutes, or just under four minutes per topic. Capacity planning, live migration, metering, and testing were a few of the tools and techniques discussed for keeping your OpenStack cloud healthy.

Thanks to all those who traveled all this way to come to the #OpenStack operators mid cycle that's awesome! #OSMidCycle

— Shilla Saebi (@ShillaSaebi) August 18, 2015

The meetup wound down with a feedback and planning session led by Anupriya Ramraj, also of HP. Thanks to everyone who attended and participated in the sessions. The Operators Meetup truly embodies the collaborative spirit of the OpenStack community. Special thanks go out to the OpenStack Foundation staff, Tom Fifield and Allison Price, for organizing and running the meetup.

If you missed this one, you can get involved by signing up for the operators mailing list and sharing your own experiences with setting up and running your OpenStack cloud.

Cover Photo by Bigal101 // CC BY NC