Talal AlBakr explains how a $38 billion telecom company in Saudi Arabia used OpenStack to create a massive public cloud.

There are plenty of lessons to be learned when deploying a huge OpenStack-based cloud at a massive telecommunications company in the Middle East.

At the OpenStack Summit in Sydney, NetApp’s Chris Burnet and Saudi Telecom’s Talal AlBakr discussed their successes and challenges during the process, detailing their reasons for building the cloud in the first place, the services offered through the deployment, and the operational parts of the business, like billing and service management.

NetApp, known for its storage solutions, has recently been transitioning to the data management cloud space. NetApp helps other companies build OpenStack-based clouds and move data to and from them. NetApp has been working with OpenStack since the early days, as a charter member that has sponsored every summit.

Saudi Telecom Company

Saudi Telecom’s vice president of cloud services, Talal AlBakr, said that his company’s experience with the cloud was an interesting one. “I believe that we’ve really approached it in a unique way,” he said. “We’ve made mistakes. It’s a learning process. It’s a journey. We’ve evolved. The team has really come a long way from where we started in the beginning and where we are today.”

Saudi Telecom Company (STC) is one of the largest companies in Saudi Arabia, said AlBakr. It’s also one of the largest telcos in the world, in the top 20 or 30 globally. “And part of that initiative of course for all telcos is they like to always change and start to venture into new technologies or what we call next generation ICT,” he said. Some of the core competences around information and communication technology are the cloud, the internet of things, big data, and artificial intelligence.

Even just three years ago, said AlBakr, there was no real push for a public cloud. As costs increased, the government wanted to find new ways to move from a capital expenditures model to an operating expenses model. “And that’s when we started to see an opportunity to work on developing what we needed to do and how we could approach the market in a different way,” AlBakr said.

Saudi Arabia doesn’t have many official regulations about data sovereignty, he said. “There are unofficial regulations around data sovereignty, and government entities had to be within the local jurisdiction to be able to benefit from multiple things and to be accessed by different law enforcement agencies,” said AlBakr. “So the idea was to build something around that and to help them with that.”

Recently, a new initiative was announced in the region for a city of the future, called NEOM. “It’s a five hundred (billion) dollar initiative,” said AlBakr, “a city (shared) between Saudi Arabia, Jordan and Egypt. It’s supposed to reflect how the push for the future will be. We hope to be a very important aspect in delivering on that future and pushing for that aspect.”

The company aimed for a launch date of late 2015. The team chose OpenStack to avoid vendor lock-in and for its extremely attractive price point, said AlBakr. STC was getting a lot of government initiatives, which helped. With such a new undertaking, however, STC needed to find a lot of talent, a scarce commodity in the current Saudi tech landscape, said AlBakr. “The open source community in Saudi was extremely small,” he said, “and the development community, or what we had with regards to open source, was around urbanization of traditional technologies and traditional projects ... so venturing into this was for us an extremely big risk.”

STC found a partner in Mirantis after analyzing the offerings of multiple distribution vendors. “Mirantis offered a lot of flexibility,” said AlBakr. “Other vendors, or other distributions, were becoming more closed, more tied into specific eco-system requirements. We wanted the agility and the flexibility to move in projects as we saw fit, based on business requirements and on needs that the industry pushed our way.” That Mirantis is a top contributor to OpenStack and a gold member of the Foundation was also important to STC.

AlBakr and his team launched the first iteration of the STC public cloud in 2015 under the name Bluvalt, to distinguish it from the main STC eco-system. “We built a cloud based on Mirantis OpenStack six, which was in closed beta mode,” said AlBakr. “We launched, invited specific government agencies and large corporations to come into the cloud for free and started to onboard them.” Bluvalt did all of that without building requirements, pricing models, or other traditional startup activities.

A second shift brought STC to a more mature architecture, based on automation and integrated billing with STC. “It was based on an eco-system that we created for partners to support the cloud and build on it,” he said. “Right now, we’re at our third iteration of the cloud and it’s our mass-market production. We split our cloud into two different clouds, virtually.” One cloud serves the enterprise and the government sectors and the other serves small-to-medium enterprises.

The first iteration, however, used Neutron VLAN segmentation, leading-edge at the time. “It caused us some issues as we started to progress,” said AlBakr. “This is one of the other aspects that we see as a work-in-progress in the OpenStack (community).”

The second iteration used Mitaka, which the team found to be fairly stable. Once they moved from Neutron Contrail to OpenContrail, many pieces fell into place. “This is where we started to see our offering mature, and we started to see customers get more adapted into what we started to do,” said AlBakr. “In our iteration in deployment one, we had a lot of issues around networking, around structure, and one of the biggest aspects was, when we started working on our beta and onboarding customers, we reached a situation where we got so much demand that we couldn’t leave beta anymore, and we went full production on that beta environment, which in all honesty was, it was a good and a bad mistake at the same time.” Good because it let STC approach the market very quickly; bad because the design was extremely under-optimized.

The third iteration is set to launch in the first quarter of 2018 with a new Mitaka cloud. The initial cloud can no longer be upgraded, so STC has had to build a new cloud next to it and then migrate everyone across. “This is one of the things that we’re seeing as a drawback within OpenStack,” said AlBakr, “but with the new iterations of the different technologies that we’re seeing with Mirantis, especially on the Mirantis Cloud Platform, it’s becoming a lot more stabilized. Hopefully with containerization we’re going to see things get more streamlined and moving in a better manner.”

The STC cloud runs a ton of projects, according to AlBakr. “We have Swift, we have Neutron,” he said, “we have all these different solutions, OpenID, Keystone, all these different projects running in order to authenticate and to control the whole environment.” As the cloud evolved, the team started moving things around in order to optimize things. OpenContrail and Juniper Contrail were important in that regard, while routing and configuration management are based on the REST API for OpenStack.
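As a rough illustration of what working against the OpenStack REST API looks like, the sketch below builds the JSON body for a Keystone v3 password-authentication request. This is not STC’s code: the endpoint URL, user, and project names are hypothetical placeholders, and the snippet only constructs the request body rather than sending it.

```python
import json


def build_token_request(username, password, project_name, domain_id="default"):
    """Return the JSON body for a Keystone v3 password-auth token request."""
    return {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": username,
                        "domain": {"id": domain_id},
                        "password": password,
                    }
                },
            },
            # Scope the token to one project so it can be used with other services.
            "scope": {
                "project": {
                    "name": project_name,
                    "domain": {"id": domain_id},
                }
            },
        }
    }


# Hypothetical endpoint and credentials, for illustration only.
KEYSTONE_URL = "https://keystone.example.com:5000/v3/auth/tokens"
body = build_token_request("demo-user", "secret", "demo-project")
print(json.dumps(body, indent=2))
```

A client would POST this body to the token endpoint; Keystone returns the token in the `X-Subject-Token` response header, and later calls to other services pass it in the `X-Auth-Token` request header.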

The team began to do control analysis when customers started having performance and configuration issues. “With traditional technologies we had problems troubleshooting that,” said AlBakr. “When we started off with our initial customer set, it was manageable because it was a handful of customers, but as we started to progress, it started to blow out of control. And this is where we started to really rely on the updated technologies like OpenContrail.”

NetApp is one of STC’s biggest partners, in terms of OpenStack. Initially, the organization’s entire storage infrastructure was based on FAS and ONTAP technology. As the project progressed, several barriers presented themselves, which caused AlBakr’s team to consider newer technologies. That’s where SolidFire came into play. “SolidFire for us was extremely attractive based on the modular growth capabilities,” said AlBakr. “FAS was an excellent technology but the problem was (how) it limits you to the control of capability. With SolidFire, you just add more nodes as you progress, you get more capacity, more performance, and for a service provider that’s very important.”

A secondary issue arose. “Our Swift Object store was not optimal. It was providing only Swift capabilities. We were getting a lot of requirements from our customers for S3 capabilities and for NFS. So we were facing a lot of different issues around that and this is where we started looking into the StorageGRID.” Moving away from the Swift package to an isolated StorageGRID environment has allowed AlBakr and his team to offload the S3 and Swift connectivity aspects of their project.

To manage the environment, STC uses a lot of different technologies that come within OpenStack. They monitor their databases, back end, and front end with Grafana and other third-party applications. They also use many in-house developed apps to monitor different aspects of the environment, alongside fault tolerance and performance management tools like Elasticsearch, Kibana and others.

One of the most important projects in the STC cloud is its marketplace, what AlBakr calls the “bloodstream of what our cloud is.” As a public cloud, STC needed something it could fully customize to be able to bring in a better price point for its customers, to make it more attractive than, say, Amazon or other providers. “So we launched our marketplace on OpenStack,” he said. “It’s in-house developed. We have a team of around 40 developers working on it continuously, coming up with a lot of updates. We’ve introduced billing per hour, instead of per month and to be more granular, it’s all done within our marketplace.”

The team has also introduced integration with the STC billing system as well as integration with credit card billing, adding even more layers to their system. Both internal teams and external customers utilize the service order management system in place. They also integrated a bus and API environment that can pull all the information from their compute, SDN, object storage, and block storage plans to send it out to STC’s internal or credit billing systems.
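To make the idea of hourly, cross-service billing concrete, here is a minimal sketch of how usage records from compute, SDN, object storage, and block storage could be rolled up into a per-customer charge. It is not STC’s marketplace code: the record layout, customer names, and hourly rates are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class UsageRecord:
    customer: str
    service: str    # one of the keys in HOURLY_RATES
    hours: int      # metered hours in the billing window
    units: Decimal  # e.g. number of instances or GB stored


# Hypothetical price per unit per hour; real rates are not given in the talk.
HOURLY_RATES = {
    "compute": Decimal("0.05"),
    "sdn": Decimal("0.01"),
    "object_storage": Decimal("0.0001"),
    "block_storage": Decimal("0.0002"),
}


def invoice(records):
    """Sum hourly charges per customer across all services."""
    totals = defaultdict(lambda: Decimal("0"))
    for r in records:
        totals[r.customer] += HOURLY_RATES[r.service] * r.units * r.hours
    return dict(totals)


records = [
    UsageRecord("acme", "compute", hours=10, units=Decimal("2")),
    UsageRecord("acme", "block_storage", hours=10, units=Decimal("100")),
    UsageRecord("globex", "object_storage", hours=24, units=Decimal("500")),
]
charges = invoice(records)
for customer, amount in sorted(charges.items()):
    print(customer, amount)
```

Metering by the hour means a customer who runs two instances for ten hours pays for twenty instance-hours rather than a full month, which is the granularity AlBakr describes. `Decimal` is used instead of floats to avoid rounding drift in monetary sums.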

The team broke its partner eco-system down into three different segments. The first segment, cloud service providers, got a community for developers to onboard their applications onto the marketplace as a launching pad or app store. That helped empower the local developer community to approach the market with custom-made applications specific to the market. Secondly, STC built a cloud integration partner program that lets system integration partners get onboard, get certified and then work with customers to build and integrate their own environments. STC even went one step further and offers managed services. “Now all of this was under the guidance of our cloud,” said AlBakr. “We took care of the billing. We took care of the project management, and we made sure that the project continued and progressed in the right way. And, it allowed us to really cover a big reach of customers and it allowed us to touch base and to ensure a customer satisfaction ratio that was higher than normal, because we had an additional eco-system to support us.”

Finally, the team defined a segment it calls cloud technology partners. It’s based on technologies that STC has integrated into the marketplace to tackle specific requirements, like backup, load balancing, and other such services. “This also allows (us) to provide a more complete cloud experience for our customers instead of having them go and buy it from a third party and bring it in,” said AlBakr.

In essence STC created a program. “It gets integrated. It gets streamlined, and it’s all done through our marketplace,” said AlBakr. “Our approach to the marketplace is a little different than traditional approaches. We’re not really trying to just provide different solutions around databases and so on and so forth, we’re trying to provide actual on-premises replacement concepts.”

At the end of the day, STC now has an end-to-end cloud solution. The company provides the mechanisms for software, infrastructure, and platform as a service. “We’re launching a lot of services in the next few months, and we’re at a rate of around, I think, five services a month,” AlBakr said. “Most public cloud providers need to keep an eye on the big guys (and) what they’re doing. We’re trying to progress as fast and as quick as we can to keep up with them and to cover as (many) services as they cover.”

As an example, Saudi Telecom Company is able to provide layer three IPVPN. This gave the telco leverage with customers. “So when we sit with customers, we tell them, we’re going to sell you data, internet connectivity and access,” he said. “And in addition to that, we’re going to give you a VPN layer that allows your data center to be part of our data center without going over the internet layer. So this for a lot of customers is very critical because it allows for a more streamlined, integrated solution within their systems. It doesn’t feel like they’re going outside and into the cloud and things like that. For us that’s a very good differentiator and it makes a real difference for a lot of our customers.”

OpenStack was a good experience for STC. It allowed the telco to reach extremely attractive price points, with flexibility and plenty of capabilities to offer, which can continue to expand as the company progresses. “We don’t know why people keep saying OpenStack is dying,” said AlBakr. “We see it today flourishing more than ever, and we are all in with OpenStack. We feel it’s the direction of the future. Our marketplace was a key differentiator for us, and it continues to be within the market. We believe that our solution is extremely unique.”

The marketplace was a huge success as well, becoming so widely known and praised that STC is looking to commercialize it and sell it to other entities, as well as use it for other groups within its own organization. In addition, AlBakr said, block storage, automation and the cloud service provider partnerships have gone very well for the company.

Difficulties

What didn’t go so well was that the initial compute calculations were off and prices were extremely high. “We didn’t understand the intricacies and the different dynamics of how to really price point different aspects in the right way,” said AlBakr. “But as we progressed, we were able to do pricing in the right way to be in line with global prices.” Not that they’re necessarily cheaper, but the prices are much more attractive now and allow a different approach with customers.

The team had issues with migrations around object and block storage, and performance issues as a result. “That’s why we’re addressing it right now with NetApp and StorageGRID,” said AlBakr, “and we’re trying to move to that.” The team is going all-in with Object and can potentially add six petabytes of storage capacity within the next few months, on top of the storage it already has.

There were also I/O problems, including issues with getting logs. Microsoft licensing has been a problem as well. “Microsoft licensing, specifically for support, is extremely hard on OpenStack,” said AlBakr, “and I think for all cloud vendors.”

Red Hat became another issue because of its lack of support for Mirantis. It took the team aback, said AlBakr. “Why? Red Hat is a leader in the OpenStack community,” he said. “It’s a leader in the open source community but it’s doing all these different aspects to leverage itself. For us it was extremely negative.”

Initially, STC didn’t have availability zones, which caused some problems, though they’ve started to implement them. There were problems around boot-from-image systems as well as customer image and size requirements. VLAN segmentation before they implemented SDN was an issue, too. The upgrade from Liberty to Mitaka wasn’t the smoothest upgrade in the world, said AlBakr, and caused a lot of problems.

A bigger issue, says AlBakr, is the perception of OpenStack as an enterprise play rather than one suited to a public cloud.

“So when we have discussions on downtime, people say, ‘Okay, give me a two or three hour downtime window.’ And I’m like, ‘No. This is a public cloud. I can’t afford a two or three hour downtime window. If I get two, three hour downtime window, that means I lose my customers, because some people have their whole business running on this cloud.’” He’d like to see this perception change and evolve with OpenStack in general.

You can catch the entire talk on the video below.

 

Cover image courtesy STC Instagram feed