News Archive


Limitations of the Layered Model of OpenStack

One model that many people have used for making sense of the multiple services in OpenStack is that of a series of layers, with the ‘compute starter kit’ projects forming the base. Jay Pipes recently wrote what may prove to be the canonical distillation (this post is an edited version of my response):

Nova, Neutron, Cinder, Keystone and Glance are a definitive lower level of an OpenStack deployment. They represent a set of required integrated services that supply the most basic infrastructure for datacenter resource management when deploying OpenStack. Depending on the particular use cases and workloads the OpenStack deployer wishes to promote, an additional layer of services provides workload orchestration and workflow management capabilities.

I am going to explain why this viewpoint is wrong, but first I want to acknowledge what is attractive about it (even to me). It contains a genuinely useful observation that leads to a real insight.

The insight is that whereas the installation instructions for something like Kubernetes usually contain an implicit assumption that you start with a working datacenter, the same is not true for OpenStack. OpenStack is the only open source project concentrating on the gap between a rack full of unconfigured equipment and somewhere that you could run a higher-level service like Kubernetes. We write the bit where the rubber meets the road, and if we do not there is nobody else to do it! There is an almost infinite variety of different applications and they will all need different parts of the higher layers, but ultimately they must be reified in a physical data center and when they are OpenStack will be there: that is the core of what we are building.

It is only the tiniest of leaps from seeing that idea as attractive, useful, and genuinely insightful to believing it is correct. I cannot really blame anybody who made that leap. But an abyss awaits them nonetheless.

Back in the 1960s and early 1970s there was this idea about Artificial Intelligence: even a 2-year-old human can (for example) recognise images with a high degree of accuracy, but doing (say) calculus is extremely hard in comparison and takes years of training. But computers can already do calculus! Ergo, we have solved the hardest part already and building the rest out of that will be trivial, AGI is just around the corner, and so on. The popularity of this idea arguably helped create the AI bubble, and the inevitable collision with the reality of its fundamental wrongness led to the AI Winter. Because, in fact, though you can build logic out of many layers of heuristics (as human brains do), it absolutely does not follow that it is trivial to build other things that also require layers of heuristics out of some basic logic building blocks. (In contrast, the AI technology of the present, which is showing more promise, is called Deep Learning because it consists literally of multiple layers of heuristics. It is also still considerably worse at image recognition than any 2-year-old human.)

I see the problem with the OpenStack-as-layers model as being analogous. (I am not suggesting there will be a full-on OpenStack Winter, but we are well past the Peak of Inflated Expectations.) With Nova, Keystone, Glance, Neutron, and Cinder you can build a pretty good Virtual Private Server hosting service. But it is a mistake to think that cloud is something you get by layering stuff on top of VPS hosting. It is relatively easy to build a VPS host on top of a cloud, just like teaching someone calculus. But it is enormously difficult to build a cloud on top of a VPS host (it would involve a lot of expensive layers of abstraction, comparable to building artificial neurons in software).

That is all very abstract, so let me bring in a concrete example. Kubernetes is event-driven at a very fundamental level: when a pod or a whole kubelet dies, Kubernetes gets a notification immediately and that prompts it to reschedule the workload. In contrast, Nova/Cinder/&c. are a black hole. You cannot even build a sane dashboard for your VPS—let alone cloud-style orchestration—over them, because it will have to spend all of its time polling the APIs to find out if anything happened. There is an entire separate project, which almost no deployments include, basically dedicated to spelunking in the compute node without Nova’s knowledge to try to surface this information. It is no criticism of the team in question, who are doing something that desperately needs doing in the only way that is really open to them, but the result is an embarrassingly bad architecture for OpenStack as a whole.
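The polling problem can be made concrete with a small sketch. The API below is a fake stand-in, not the real novaclient, but the loop is the shape that any dashboard or orchestrator built over Nova is forced into: with no way to subscribe to state changes, every observed transition costs a pile of wasted round trips.

```python
import time

class FakeNovaAPI:
    """Stand-in for the Nova API: callers can ask 'what is the status
    now?' but there is no way to subscribe to change notifications."""
    def __init__(self, statuses):
        self._statuses = iter(statuses)
        self.calls = 0

    def get_server_status(self, server_id):
        self.calls += 1
        return next(self._statuses)

def wait_for_active(api, server_id, poll_interval=0.0):
    # The only strategy the API leaves open: poll until something changes.
    while True:
        status = api.get_server_status(server_id)
        if status != 'BUILD':
            return status
        time.sleep(poll_interval)

# Nine polls observe nothing; only the tenth finally sees the transition.
nova = FakeNovaAPI(['BUILD'] * 9 + ['ACTIVE'])
status = wait_for_active(nova, 'vm-1')
print(status, nova.calls)  # -> ACTIVE 10
```

An event-driven design inverts this: the service pushes a single notification when the state actually changes, and the observer does no work at all in between.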

So yes, it is sometimes helpful to think about the fact that there is a group of components that own the low level interaction with outside systems (hardware, or IdM in the case of Keystone), and that almost every application will end up touching those directly or indirectly, while each using different subsets of the other functionality… but only in the awareness that those things also need to be built from the ground up as interlocking pieces in a larger puzzle.

Saying that the compute starter kit projects represent a ‘definitive lower level of an OpenStack deployment’ invites the listener to ignore the bigger picture; to imagine that if those lower level services just take care of their own needs then everything else can just build on top. That is a mistake, unless you believe that OpenStack needs only to provide enough building blocks to build VPS hosting out of, because support for all of those higher-level things does not just fall out for free. You have to consciously work at it.

Imagine for a moment that, knowing everything we know now, we had designed OpenStack around a system of event sources and sinks that are reliable in the face of hardware failures and network partitions, with components connecting into it to provide services to the user and to each other. That is what Kubernetes did. That is the key to its success. We need to enable something similar, because OpenStack is still necessary even in a world where Kubernetes exists.

One reason OpenStack is still necessary is the one we started with above: something needs to own the interaction with the underlying physical infrastructure, and the alternatives are all proprietary. Another place where OpenStack can provide value is by being less opinionated and allowing application developers to choose how the event sources and sinks are connected together. That means that users should, for example, be able to customise their own failover behaviour in ‘userspace’ rather than rely on the one-size-fits-all approach of handling everything automatically inside Kubernetes. This is theoretically an advantage of having separate projects instead of a monolithic design—though the fact that the various agents running on a compute node are more tightly bound to their corresponding services than to each other has the potential to offer the worst of both worlds.

All of these thoughts will be used as fodder for writing a technical vision statement for OpenStack. My hope is that will help align our focus as a community so that we can work together in the same direction instead of at cross-purposes. Along the way, we will need many discussions like this one to get to the root of what can be some quite subtle differences in interpretation that nevertheless lead to divergent assumptions. Please join in if you see one happening!


The Expanding OpenStack Foundation

The OpenStack Foundation has begun the process of becoming an umbrella organisation for open source projects adjacent to but outside of OpenStack itself. However, there is no clear roadmap for the transformation, which has resulted in some confusion. After attending the joint leadership meeting with the Foundation Board of Directors and various Forum sessions that included some members of the board at the (2018) OpenStack Summit in Vancouver, I believe I can help shed some light on the situation. (Of course this is my subjective take on the topic, and I am not speaking for the Technical Committee.)

In November 2017, the board authorised the Foundation staff to begin incubation of several ‘Strategic Focus Areas’, including piloting projects that fit in those areas. The three focus areas are Container Infrastructure, Edge Computing Infrastructure, and CI/CD Infrastructure. To date, there have been two pilot projects accepted. Eventually, it is planned for each focus area to have its own Technical Committee (or equivalent governance body), holding equal status with the OpenStack TC—there will be no paramount technical governance body for the whole Foundation.

The first pilot project is Kata Containers, which combines container APIs and container-like performance with VM-level isolation. You will not be shocked to learn that it is part of the Container Infrastructure strategic focus.

The other pilot project, in the CI/CD strategic focus, is Zuul. Zuul will already be familiar to OpenStack developers as the CI system developed by and for the OpenStack project. Its governance is moving from the OpenStack TC to the new Strategic Focus Area, in recognition of its general usefulness as a tool that is not in any way specific to OpenStack development.

Thus far there are no pilot projects in the Edge Computing Infrastructure focus area, but nevertheless there is plenty of work going on—including to figure out what Edge Computing is.

If you attended the Summit then you would have heard about Kata, Zuul and Edge Computing, but this is probably the first time you’ve heard the terms ‘incubate’ or ‘pilot’ associated with them. Nor have the steps that come after incubation or piloting been defined. This has opened the door to confusion, not only about the status of the pilot projects but also that of unofficial projects (outside of either OpenStack-proper or any of the Strategic Focus Areas) that are hosted on the same infrastructure provided by the Foundation for OpenStack development. It also heralds the return of what I call the October surprise—a half-baked code dump ‘open sourced’ the week before a Summit—which used to be a cottage industry around the OpenStack community until the TC was able to bed in a set of robust processes for accepting new projects.

Starting out without a lot of preconceived ideas about how things would proceed was the right way to begin, but members of the board recognise that now is the time to give the process some structure. I expect to see more work on this in the near future.

There is also a proposed initiative, dubbed Winterscale, to move governance of the foundation’s infrastructure out from under the OpenStack TC, to reflect its new status as a service provider to the OpenStack project, the other Strategic Focus Areas, and unofficial projects.


What are Clouds?

Like many in the community, I am often called upon to explain what OpenStack is to somebody completely unfamiliar with it. Usually this goes one of two ways: they turn out to be familiar enough with cloud computing to quickly grasp it by analogy, or their eyes glaze over at the mention of the words ‘cloud computing’ and no further explanation is sought or offered. When faced with someone who is persistently curious but not an industry insider, you immediately know you’re in trouble.

And so it came to pass that I found myself a couple of years ago wondering how exactly to explain to an economist why cloud computing is a big deal. I think I have actually figured out an answer: cloud computing can be seen as the latest development in a long trend of reducing the transaction costs that prevent us from allocating our resources efficiently.

(A live-action version of this post from the most recent OpenStack Summit in Barcelona is available on video.)

Cast your mind back to the days of physical hardware. When you wanted to develop and deploy a software service you first had to order servers, have them physically shipped to you, then installed and wired to the network. The process typically took weeks just from the vendor’s side, not to mention the time required to get your own ducks in a row first. As a result you had to buy more servers than you could fully utilise, and buy them earlier than you wanted them, because you could not rely on responding rapidly to changing demand.

Virtualisation revolutionised this cycle by cutting the slow purchasing, shipping and racking steps out of the loop. (These still had to happen, of course, but they no longer had to happen synchronously.) Instead, when you wanted a server you simply put in a request, somebody would create a virtual machine and allocate it to you. The whole process could easily be done in less than a day.

Yet as much as this was a huge leap forward, it was still slower than it needed to be, because there was still a human in the loop. The next step was to make the mechanism directly accessible to the developer—Infrastructure as a Service. That seemingly simple change has a number of immediate consequences, first amongst which is the need for robust multitenancy. This is the key difference between tools like OpenStack Nova and the preceding generation of virtualisation platforms, like oVirt. Transaction costs have dropped to near zero—where before allocating a new box took less than a day and you might do it every few weeks or so, now it takes seconds and you can easily do it 20 times a day without a second thought.

Before we congratulate ourselves too much though, remember that our goal was to remove humans from the loop… but we still have one: the developer (or sysadmin). Being able to adjust your resource utilisation 20 times a day is great, but mostly wasted if you can only do it during the 8 hours that somebody is parked in front of Horizon clicking buttons. For that reason, I don’t regard this use case as a ‘cloud’ at all, even though to hear some people talk you might think that this is the only thing that OpenStack is for. It could more accurately be described as a Virtual Private Server hosting service.

My working definition of a true ‘cloud’ service, then, is one where the application itself can control its own infrastructure. (Where ‘application’ includes not only software running on virtual compute infrastructure but also services built into the cloud itself that effectively form a part of it—a minimal description of such an application is likely a Heat template not a software package.) The developer might do the initial deployment, but from then on the application can manage itself autonomously.
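To make that concrete, a minimal sketch of such an application description as a HOT template might look like the following. The names and property values are illustrative, not from any real deployment:

```yaml
heat_template_version: 2013-05-23

description: >
  A minimal application description: the infrastructure is part of
  the application, not something provisioned separately by hand.

parameters:
  image:
    type: string
    description: Name or ID of the server image

resources:
  app_server:
    type: OS::Nova::Server
    properties:
      image: { get_param: image }
      flavor: m1.small
```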

You can actually go even further: if you use continuous deployment then you can eliminate the developer’s direct involvement altogether. There is now a Heat plugin for Jenkins to help you do this. Other options include the Ansible-based Zuul project, developed by the OpenStack Infra team, and the OpenStack Solum project.

Of course clouds of this type have been available for some years. However, the other thing we have learned since the 1990s is that writing your application to depend on a proprietary API now will often lead to wailing and gnashing of teeth later. As cloud services and APIs become part of the application, an Open Source cloud with a panoply of service provider options plus the ability to operate it yourself is your insurance against vendor lock-in. That’s why it is critical that OpenStack succeed, and succeed at delivering more than just Virtual Private Servers. Because there is no bigger transaction cost than having to rewrite your application to move to a better service provider.


A Vision for OpenStack

One of the great things about forcing yourself to write down your thoughts is that it occasionally produces one of those lightbulb moments of clarity, where the jigsaw pieces you have been mentally turning over suddenly all fit together. I had one of those this week while preparing my platform for the OpenStack Technical Committee election.

I want to talk a little about Keystone, the identity management component of OpenStack. Although Keystone supports a database back-end for managing users, the most common way to deploy it in a private cloud is with a read-only LDAP connection to the organisation’s existing identity management system. As a consequence, a ‘user’ in Keystone parlance typically refers to a living, breathing human user with an LDAP entry and an HR file and a 401(k) account.

That should be surprising, because once you have gone to the trouble of building a completely automated system for allocating resources with a well-defined API the very least interesting thing you can do next is to pay a bunch of highly-evolved primates to press its buttons. That is to say, the transformative aspect of a ‘cloud’ is the ability for the applications running in it to interact with and control their own infrastructure. (Autoscaling is the obvious example here, but it is just the tip of an unusually dense iceberg.) I think that deserves to stand alongside multi-tenancy as one of the pillars of cloud computing.

Now when I think back to all the people who have told me they think OpenStack should provide “infrastructure only” I still do not understand their choice of terminology, but I think I finally understand what they mean. I think they mean that applications should not talk back. Like in the good old days.

I think the history of Linux in the server market is instructive here. Today, Linux is the preferred target platform for server applications, but imagine for a moment that this had never come to pass: cast your mind back 15 years to when Steve Ballmer was railing about communists and imagine that .NET had gone on to win the API wars. What would that world look like for Linux? Certainly not a disaster. A great many legacy applications would still have been migrated to Linux from the many proprietary UNIX platforms that proliferated in the 1990s. (Remember AIX? HP-UX? Me neither.) When hardware vendors stopped maintaining their own entire operating systems to focus on adding hardware support to a common open source kernel, everybody benefited (they scaled back an unprofitable line of business, their customers stopped bleeding money, platform vendors still made a healthy profit and the technology advances accrued to the community at large). Arguably, that transition may have funded a lot of the development of Linux over the past 15 years. Yet if that is all that had happened, we could not call it fully successful either.

Real success for open source platforms means applications written against open implementations of open APIs. Moving existing applications over is important, and may provide the bridge funding to accelerate development, but new applications are written every day. Each one written for a proprietary platform instead of an open one represents a cost to society. Linux has come to dominate the server platform, but applications are bigger than a single server now. They need to talk back to the cloud and if OpenStack is to succeed—really succeed—in the long term then it needs to be able to listen.

Microsoft understands this very well, by the way. The subject of Marxist theory and its similarities to the open source movement usually does not even come up when you launch a Linux VM on their cloud—the goal now is to lock you in to Azure, not .NET. Of course the other proprietary clouds (Amazon, Google) are doing exactly the same.

I am passionate about OpenStack because I think it is our fastest route to making an open source platform the preferred option for the applications of the (near) future. I hope you will join me. We can get started right now.

Having an application interact with the OpenStack APIs is really hard to do at the moment, because there is no way I am going to put the unhashed password that authenticates me to my corporate overlords on an application server connected to the Internet. The first step to fixing this actually already exists: Keystone now supports multiple domains, each with its own backend, so that application ‘user’ accounts in a database can co-exist with real meatspace-based user accounts in LDAP. The Heat project has cobbled together some workarounds that make use of this but they rely on Heat’s privileged position as one of the services deployed by the operator, and other projects do not automatically get the benefit either.

The next obstacle is that the authorisation functionality provided by Keystone is too simplistic: all rules must be predefined by the operator; by default a user does not need any particular role in a tenant to be granted permission for most operations; and, incidentally, user interfaces have no way of determining which operations should be exposed to any given user. We need to put authorisation under user control by allowing users to decide which operations are authorised for an account, including filtering on tenant-specific data. To get this to work properly, every OpenStack service will need to co-operate at least to some extent.
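To make the current limitation concrete, authorisation rules today live in an operator-controlled file in the style of Nova’s policy.json; the fragment below is illustrative only. Note the empty rule, meaning no particular role is required, and note that nothing in this model gives a user any way to grant, restrict, or delegate operations on their own resources:

```json
{
    "admin_or_owner": "is_admin:True or project_id:%(project_id)s",
    "compute:get_all": "",
    "compute:delete": "rule:admin_or_owner"
}
```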

That gets us a long way toward applications talking back to the cloud, but when the cloud itself talks it must do so asynchronously, without sacrificing reliability. Fortunately, the Zaqar team has already developed a reliable, asynchronous, multi-tenant messaging service for OpenStack. We now need to start the work of adopting it.

These are the first critical building blocks on which we can construct a consistent user experience for application developers across projects like Zaqar, Heat, Mistral, Ceilometer, Murano, Congress, and probably others I am forgetting. There is no need to take anything away from other projects or make them harder to deploy. What we will need is consensus on what we are trying to achieve.


Three Flavours of Infrastructure Cloud

A curious notion that seems to be doing the rounds of the OpenStack traps at the moment is the idea that Infrastructure-as-a-Service clouds must by definition be centred around the provisioning of virtual machines. The phrase ‘small, stable core’ keeps popping up in a way that makes it sound like a kind of dog-whistle code for the idea that other kinds of services are a net liability. Some members of the Technical Committee have even got on board and proposed that the development of OpenStack should be reorganised around the layering of services on top of Nova.

Looking at the history of cloud computing reveals this as a revisionist movement. OpenStack itself was formed as the merger of Nova and the object storage service, Swift. Going back even further, EC2 was the fourth service launched by Amazon Web Services. Clearly at some point we believed that a cloud could mean something other than virtual machines.

Someone told me a few weeks ago that Swift was only useful as an add-on to Nova; a convenience exploited only by the most sophisticated modern web application architectures. This is demonstrably absurd: you can use Swift to serve an entire static website, surely the least sophisticated web application architecture possible (though no less powerful for it). Not to mention all the other potential uses that revolve around storage and not computation, like online backups. Entire companies, including SwiftStack, exist only to provide standalone object storage clouds.

You could in theory tell a similar story for an asynchronous messaging service. Can you imagine an application in which two devices with intermittent network connectivity might want to communicate in a robust way? (Would it help if I said one was in your pocket?) I can, and in case you didn’t get the memo, the ‘Internet of Things’ is the new ‘Cloud’—in the sense of being a poorly-defined umbrella term for a set of loosely-related technologies whose importance stems more from the diversity of applications implemented with them than from any commonality between them. You heard it here first. What you need here is a cloud in the original sense of the term: an amorphous blob that is always available and abstracts away the messier parts of end-to-end communication and storage. A service like Zaqar could be a critical piece of infrastructure for some of these applications. I am not aware of a company that has been successful in deploying a service of this type standalone, though there have certainly been attempts (StormMQ springs to mind). Perhaps for a reason, or perhaps they were just ahead of their time.

Of course things get even better when you can start combining these services, especially within the framework of an integrated IaaS platform like OpenStack, where things like Keystone authentication are shared. Have really big messages to send? Drop them into object storage and include a pointer in the message. Want to process a backlog of messages? Fire up some short-lived virtual machines to churn through them. Want tighter control of access to your stored objects? Proxy the request through a custom application running on a Nova server.
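The ‘really big messages’ trick is the classic claim-check pattern. The sketch below uses in-memory dictionaries as stand-ins for a Swift container and a Zaqar queue so that it runs anywhere; in a real deployment the marked calls would go through the Swift and Zaqar client libraries, and the size threshold would come from the queue’s actual message size limit (the value here is an assumption).

```python
import uuid

object_store = {}      # stand-in for a Swift container: name -> bytes
queue = []             # stand-in for a Zaqar queue of small messages

MAX_INLINE = 256 * 1024  # assumed message size limit for the queue

def send(payload):
    """Small payloads travel in the message itself; big ones are parked
    in object storage and the message carries only a pointer."""
    if len(payload) <= MAX_INLINE:
        queue.append({'inline': payload})
    else:
        name = str(uuid.uuid4())
        object_store[name] = payload      # swift: put_object()
        queue.append({'ref': name})       # zaqar: post a tiny pointer

def receive():
    msg = queue.pop(0)
    if 'inline' in msg:
        return msg['inline']
    return object_store[msg['ref']]       # swift: get_object()

send(b'small message')
send(b'x' * (MAX_INLINE + 1))
assert receive() == b'small message'
big = receive()
print(len(big))  # -> 262145
```

The consumer never needs to know or care which path a given message took; shared Keystone authentication is what makes wiring the two services together this cheap.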

Those examples are just the tip of the iceberg of potential use cases that can be solved without even getting into the Nova-centric ones. Obviously the benefits to modern, cloud-native applications of accessing durable, scalable, multi-tenant storage and messaging as services are potentially huge as well.

Nova, Zaqar and Swift are the Peanut Butter, Bacon and Bananas of your OpenStack cloud sandwich: each is delicious on its own, or in any combination. The 300 pound Elvis of cloud will naturally want all three, but expect to see every possible permutation deployed in some organisation. Part of the beauty of open source is that one size does not have to fit all.

Of course providing stable infrastructure to move legacy applications to a shared, self-service model is important, and it is no surprise to see users clamouring for it in this still-early stage of the cloud transition. However if the cloud-native applications of the future are written against proprietary APIs then OpenStack will have failed to achieve its mission. Fortunately, I do not believe those goals are in opposition. In fact, I think they are complementary. We can, and must, do both. Stop the madness and embrace the tastiness.


OpenStack Orchestration Juno Update

As the Juno (2014.2) development cycle ramps up, now is a good time to review the changes we saw in Heat during the preceding Icehouse (2014.1) cycle and have a look at what is coming up next in the pipeline. This update is also available as a webinar that I recorded for the OpenStack Foundation, as are the other PTL updates. The RDO project is collecting a list of written updates like this one.

While absolute statistics are not always particularly relevant, a comparison between the Havana and Icehouse release cycles shows that the Heat project continues to grow rapidly. In fact, Heat was second only to Nova in numbers of commits for the Icehouse release. As well as building contributor depth we are also rotating the PTL position to build leadership depth, so the project is in very healthy shape.

Changes in Icehouse

The biggest change in Icehouse is the addition of software configuration and deployment resource types. These enable template authors to define software configurations separately from the servers on which they are to be deployed. Amongst other things, this makes artifacts much easier to reuse. Software deployments can integrate with your existing configuration management tools - in some cases the shims to do so are already available, and we expect to add more during the Juno cycle.
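A sketch of how the two new resource types fit together; the script body, names, and property values are placeholders:

```yaml
heat_template_version: 2013-05-23

resources:
  app_config:
    type: OS::Heat::SoftwareConfig
    properties:
      group: script
      config: |
        #!/bin/sh
        # Configuration logic lives here, independent of any server
        echo "configuring application"

  app_deployment:
    type: OS::Heat::SoftwareDeployment
    properties:
      config: { get_resource: app_config }
      server: { get_resource: app_server }

  app_server:
    type: OS::Nova::Server
    properties:
      image: fedora-20
      flavor: m1.small
```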

The Heat Orchestration Template format (HOT) is now frozen at version 2013-05-23. Any breaking changes we make to it in future will be accompanied by a bump in the version number, so you can start using the HOT format with confidence that templates should continue to work in the future.

In order to enable that, template formats and the intrinsic functions that they provide are now pluggable. In Icehouse this is effectively limited to different versions of the existing template types, but in future operators will be able to easily deploy arbitrary template format plugins.

Heat now offers custom parameter constraints - for example, you can specify that a parameter must name a valid Glance image - that provide earlier and better error messages to template users. These are also pluggable, so operators can deploy their own, and more will be added in the future.
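For example, a template author can attach a constraint to a parameter so that an invalid value is rejected up front, rather than failing partway through a stack create (parameter name illustrative):

```yaml
parameters:
  server_image:
    type: string
    constraints:
      - custom_constraint: glance.image
        description: Must be the name or ID of an image known to Glance
```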

There are now OpenStack-native resource types for autoscaling, meaning that you can now scale resource types other than AWS::EC2::Instance. In fact, you can scale not just OS::Nova::Server resources, but any type of resource (including provider resources). Eventually there will be a separate API for scaling groups along the lines of these new resource types.
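A minimal sketch of the OpenStack-native form, with property values that are illustrative only:

```yaml
resources:
  scaling_group:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 1
      max_size: 10
      resource:
        type: OS::Nova::Server
        properties:
          image: fedora-20
          flavor: m1.small
```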

The heat-engine process is now horizontally scalable (though not yet stateless). Each stack is processed by a single engine at a time, but incoming requests can be spread across multiple engines. (The heat-api processes, of course, are stateless and have always been horizontally scalable.)

The API is growing additions to help operators manage a Heat deployment - for example to allow a cloud administrator to get a list of all stacks created by all users in Heat. These improvements will continue into Juno, and will eventually result in a v2 API to tidy up some legacy cruft.

Finally, Heat no longer requires a user to be an administrator in order to create some types of resources. Previously resources like wait conditions required the admin role, because they involved creation of a user with limited access that could authenticate to post data back to Heat. Creating a user requires admin rights, but in Icehouse Heat creates the user itself in a separate domain to avoid this problem.

Juno Roadmap

Software configurations made their debut in Icehouse, and will get more powerful still in Juno. Template authors will be able to specify scripts to handle all of the stages of an application’s life-cycle, including delete, suspend/resume, and update.

Up until now if the creation of a stack or the rollback of an update failed, or if an update failed with rollback disabled, there was nothing further you could do with the stack apart from delete it. In Juno this will finally change - you will be able to recover from a failure by doing another stack update.

There also needs to be a way to cancel a stack update that is still in progress, and we plan to introduce a new API for that.

We are working toward making autoscaling more robust for applications that are not quite stateless (examples include TripleO and Platforms as a Service like OpenShift). The plan is to allow notifications prior to modifying resources to give the application the chance to quiesce the server (this will probably be extended to all resources managed by Heat), and also to allow the application to have a say in which nodes get removed on scaling down.

At the moment, Heat relies very heavily on polling to detect changes in the state of resources (for example, while a Nova server is being built). In Juno, Heat will start listening for notifications to reduce the overhead involved in polling. (Polling is unlikely to go away altogether, but it can be reduced markedly.) In the long term, beyond the Juno horizon, this is leading to continuous monitoring of a stack’s status, but for now we are laying down the foundations.

There will also be other performance improvements, particularly with respect to database access. TripleO relies on Heat and has some audacious goals for deployment sizes, so that is driving performance improvements for all users. We can now profile Heat using the Rally project, so that should help us to identify more bottlenecks.

In Juno, Heat will gain an OpenStack-native Heat stack resource type, and it will be capable of deploying nested stacks in remote regions. That will allow users to deploy multi-region applications using a single tree of nested stacks.

Adopting and abandoning stack resources makes it possible to transition existing applications to and from Heat’s control. These features are actually available already in Icehouse, but they are still fairly rough around the edges; we hope they will be cleaned up for Juno. This is always going to be a fairly risky operation to perform manually, but it provides a viable option for automatic migrations (Trove is one potential user).

Operations Considerations

There are a few changes in the pipeline that OpenStack operators should take note of when planning their future upgrades.

Perhaps the most pressing is version 3 of the Keystone API. Heat increasingly relies on features available only in the v3 API. While there is a v2 shim to allow basic functionality to work without it for now, operators should look to start testing and deploying the v3 API alongside v2 as soon as possible.

Heat has now adopted the released Oslo messaging library for RPC messages (previously it used the Oslo incubator code). This may require some configuration changes, so operators should be aware of it when upgrading to Juno.

Finally, we expect the Heat engine to begin splitting into multiple servers. The first one is likely to be an “observer” process tasked with listening for notifications, but expect more to follow as we distribute the workload more evenly across systems. We expect everything split out from the Heat engine to be horizontally scalable from the beginning.


OpenStack Orchestration and Configuration Management

At the last OpenStack Summit in Hong Kong, I had a chance meeting in the hallway with a prominent Open Source developer, who mentioned that he would only be interested in Heat once it could replace Puppet. I was slightly shocked by that, because it is the stated goal of the Heat team not to compete with configuration management tools—on the principle that a good cloud platform will not dictate which configuration management tool you use, and nor will a good configuration management tool dictate which cloud platform you use. Clearly some better communication of our aims is required.

There is actually one sense in which Heat could be seen to replace configuration management: the case where the configuration on a (virtual) machine never changes, and therefore requires no management. In an ideal world, cloud applications are horizontally scalable and completely stateless so, rather than painstakingly updating the configuration of a particular machine, you simply kill it and replace it with a new one that has the configuration you want. Preferably not in that order. However, I do not see this as a core part of the value that orchestration provides, although orchestration can certainly make the process easier. What enables this approach is the architecture of the application combined with the self-service, on-demand nature of an IaaS cloud.

Take a look at the example templates provided by the Heat project and you will find a lot of ways to spin up WordPress. WordPress makes for a great demo, because you can see the result of the process in a very tangible way. The downside is that it may be misleading people about what Heat is and how it adds value.

It would be easy to imagine that Heat is simply a service for provisioning servers and configuring the software on them, but that is actually the least-interesting part to me. There are many tools that will do that (Puppet, Juju, &c.); what they cannot do is to orchestrate the interactions among all of the OpenStack infrastructure in an application. That part is unique to Heat, and it is what allows you to treat your infrastructure configuration as code in the same way that configuration management allows you to treat your software configuration as code.

Diagram of the solution spaces covered by orchestration and configuration management tools.

I am sometimes asked “Why should I use Heat instead of Puppet?” If you are asking that question then my answer is that you should probably use both. (In fact, Heat is actually a great way to deploy both the Puppet master and any servers under its control.) Heat allows you to manage the configuration of your virtual infrastructure over time, but you still need a strategy for managing the software configuration of your servers over time. It might be that you pre-build golden images and just discard a server when you want to update it, but equally you might want to use a traditional configuration management tool.

With the addition of the Software Deployments feature in the recent Icehouse (2014.1) release, Heat has moved into the software orchestration space. This makes it easier to define and combine software components in a modular way. It also creates a cleaner interface at which to inject parameters obtained from infrastructure components (e.g. the IP address of the database server you need to talk to). That notwithstanding, Heat remains agnostic about where that data goes, with a goal of supporting any configuration management system, including those that have yet to be invented and those that you rolled yourself.
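As a rough sketch of how this fits together (the resource names here are illustrative, and the properties shown assume the Icehouse `OS::Heat::SoftwareConfig` and `OS::Heat::SoftwareDeployment` resource types), a deployment can feed an attribute of one resource into the software configuration of another:

```yaml
heat_template_version: 2013-05-23

resources:
  db_server:
    type: OS::Nova::Server
    properties:
      image: fedora-20
      flavor: m1.small

  app_config:
    type: OS::Heat::SoftwareConfig
    properties:
      group: script
      inputs:
        - name: db_ip
      config: |
        #!/bin/sh
        # The db_ip input is delivered to the server by whichever
        # configuration tool you have chosen.
        echo "database is at $db_ip"

  app_deployment:
    type: OS::Heat::SoftwareDeployment
    properties:
      config: {get_resource: app_config}
      server: {get_resource: app_server}
      input_values:
        # The interface where infrastructure data is injected into
        # software configuration
        db_ip: {get_attr: [db_server, first_address]}

  app_server:
    type: OS::Nova::Server
    properties:
      image: fedora-20
      flavor: m1.small
      user_data_format: SOFTWARE_CONFIG
```

Note that nothing in the `app_config` script is specific to Heat; the same component could equally be applied by Puppet, Chef or a plain shell script.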

If you would like to hear more about this with an antipodean accent, I will be speaking about it at the OpenStack Summit in Atlanta on Monday, in a talk with Steve Hardy entitled ‘Introduction to OpenStack Orchestration’. I plan to talk about why you should consider using Heat to deploy your applications, and Steve will show you how to get started.

Our colleague Steve Baker will be speaking (also with an antipodean accent) about ‘Application Software Configuration Using Heat’ on Tuesday.


OpenStack and Platforms as a Service

The long-term relationship between Platforms as a Service and OpenStack has been the subject of much hand-wringing—most of it in the media—over the past month or so. The ongoing expansion of the project has many folks wondering where exactly the dividing line between OpenStack and its surrounding ecosystem will be drawn, and the announcement of the Solum related project has fuelled speculation that the scope will grow to encompass PaaS.

One particular clarification is urgently needed: Solum is not endorsed in any way by the OpenStack project. The process for that to happen is well-defined and requires, amongst other criteria, that the implementation is mature. Solum as announced comprised exactly zero lines of code, since the backers wisely elected to develop in the open from the beginning.

More subtly, my impression (after attending the Solum session at the OpenStack Summit two weeks ago and speaking to many of the folks involved in starting the project) is that Solum is not intended to be a PaaS as such. I have long been on record as saying that a PaaS is one of the few cloud-related technologies that do not belong in OpenStack. My reason is simple: OpenStack should not anoint one platform or class of platforms when there are so many possible platforms. Today’s PaaS systems offer many web application platforms as a service—you can get Ruby web application platforms and Java web application platforms and Python web application platforms… just about any kind of platform you like, so long as it’s a web application platform. That was the obvious first choice for PaaS offerings to target, but there are plenty of niches that could also use their own platforms. For example, our friends (and early adopters of Heat) at XLcloud are building an open source PaaS for high-performance computing applications.

Though Solum is still in the design phase, I expect it to be much less opinionated than a PaaS. Solum, in essence, is the ‘as-a-Service’ part of Platform as a Service. In other words, it aims to provide the building blocks to deliver any platform as a service on top of OpenStack with a consistent API (no doubt based on the OASIS CAMP standard). It seems clear to me that, by commoditising the building blocks for a PaaS, this is likely to be a catalyst for many more platforms to be built on OpenStack. I do not think it will damage the ecosystem at all, and clearly neither do a lot of PaaS vendors who are involved with Solum, such as ActiveState (who are prominent contributors to and users of Cloud Foundry) and Red Hat’s OpenShift team.

Assuming that it develops along these lines, if OpenStack were to eventually reject Solum from incubation solely for reasons of scope it would call into question the relevance of OpenStack more than it would the relevance of Solum. Solum’s trajectory toward success or failure will be determined by the strength of its community well in advance of it being in a position to apply for incubation.

Finally, I would like to clarify the relationship between Heat and PaaS. The Heat team have long stated that one of our goals is to provide the best infrastructure orchestration with which to deploy a PaaS. We have no desire for Heat to include PaaS functionality, and we rejected a suggestion to implement CAMP in Heat when it was floated at the Havana Design Summit.

One of the development priorities for the Icehouse cycle, the Software Configuration Provider blueprint, is actually aimed at feature parity with a different OASIS standard, TOSCA. We are working on it simply because the Heat team went to the Havana Design Summit in Portland and every user we spoke to there asked us to. The proposed features promise to make Heat more useful for deploying enterprise applications, platforms as a service, Hadoop and other complex workloads.


An Introduction to Heat in Frankfurt

It was my privilege to attend the inaugural Frankfurt OpenStack Meetup last night in… well, Frankfurt (am Main, not the other one). It was great to meet such a diverse set of OpenStack users, from major companies to students and everywhere in between.

I gave a talk entitled ‘OpenStack Orchestration with Heat’, and for those who missed it that link will take you to a handout which covers all of the material:

An introduction to the OpenStack Orchestration project, Heat, and an explanation of how orchestration can help simplify the deployment and management of your cloud application by allowing you to represent infrastructure as code. Some of the major new features in Havana are also covered, along with a preview of development plans for Icehouse.

Thanks are due to the organisers (principally, Frederik Bijlsma), my fellow presenter Rhys Oxenham, and especially everyone who attended and brought such excellent questions. I am confident that this was the first of many productive meetings for this group.


Hadoop on OpenStack

The latest project to be incubated in OpenStack for the Icehouse (2014.1) release cycle is Savanna, which provides MapReduce as a service using Apache Hadoop. Savanna was approved for incubation by the OpenStack Technical Committee in a vote last night.

In what is becoming a recurring theme, much of the discussion centred around potential overlap with other programs—specifically Heat (orchestration) and Trove (database provisioning). The main goal of Savanna should be to provide a MapReduce API, but in order to do so it has to implement a cluster provisioning service as well.

The Savanna team have done a fair amount of work to determine that Hadoop is too complex a workload for Heat to handle at present, but they have not approached the Heat team about closing the gap. That is unfortunate, because we are currently engaged in an effort to extend Heat to more complex workloads, and Hadoop is a canonical example of the kind of thing we would like to support. (It is doubly unfortunate, given that the obstacles cited appear comparatively minor.) This will have to change, because there was universal agreement that Savanna should move to integrating with Heat rather than rolling its own orchestration.

The final form of any integration with Trove, however, remains unclear. The Savanna team maintain that there is no overlap because Trove provides a database as a service and Hadoop is not a database, but this is too glib for my liking. Trove is essentially a pretty generic provisioning service, and while its user-facing function is to provision databases, that would be a poor excuse for maintaining multiple provisioning implementations in OpenStack. And, while it would be wrong to describe Hadoop as a database per se, it would be fair to say that Hadoop has a database. Trove is already planning a clustering API. In my opinion, the two teams will need to work together to come up with a common implementation, whether in the form of a common library, a common service or a direct dependency on Trove.

The idea of allowing Savanna to remain part of the wider OpenStack ecosystem without officially adopting it was, of course, considered. Hadoop can be considered part of the Platform rather than the Infrastructure layer, so naturally there was inquiry into whether it makes sense for OpenStack to anoint that particular platform rather than implement a more generic service (though it is by no means clear that the latter is feasible). Leaving aside that Amazon already implements Hadoop in the form of its Elastic MapReduce service, the Hadoop ecosystem is so big and diverse that worrying about locking users in to it is a bit like worrying about OpenStack locking users in to Linux. It does, of course, but there is still a world of choice there.

The final source of differing opinions simply related to timing. Some folks on the committee felt that an integration plan for Heat and/or Trove should be developed prior to accepting Savanna into incubation. Incubation confers integration with the OpenStack infrastructure (instead of StackForge) and Design Summit session slots, both of which would be highly desirable. The Technical Committee’s responsibility is to bless a team and a rough scope, so the issue was whether the latter is sufficiently clear to proceed.

This objection was overcome, as the committee voted to accept Savanna for incubation, albeit by a smaller margin than some previous votes. The team now has their work cut out to integrate with the other OpenStack projects, and nobody should be surprised if Savanna ends up remaining in incubation through the J cycle. Nonetheless, we welcome them to the OpenStack family and look forward to working with them before and during the upcoming Summit to develop the roadmap for integration.


Application Definition with Heat

Steve Hardy (OpenStack Heat PTL) and I gave a talk today about the past, present and future of defining cloud applications with Heat. Since this may be of general interest to the OpenStack community, we are making the handout available for download.

An introduction to Heat templates, how they are used to define the configuration—particularly the software configuration—of an application and future plans for the template format.


OpenStack Icehouse Incubation Roundup

The OpenStack Technical Committee met last night to consider the status of incubated projects for the upcoming Icehouse cycle. As a result, database provisioning will be an official part of the Icehouse (2014.1) release, while message queues and bare-metal server provisioning will be in incubation with a view to becoming official in the as-yet-unnamed J (2014.2) release. (Update 2013-09-25: MapReduce as a service will also be incubated.)

First up was the Marconi project, which was applying for incubation. Marconi is a message queue service that is comparable in scope to Amazon’s SQS. This application was discussed in detail at the meeting last week ahead of the final discussion and vote yesterday. Interestingly, the need for a queue service was readily accepted even by some folks who are on-record as favouring a very limited scope for OpenStack. We certainly have use cases waiting for something like this in Heat already.

Nevertheless, Marconi is in some ways breaking new ground for OpenStack because it is a complete implementation of a message queue in Python with plug-in backends for storage and transport. (In contrast, a project like Nova is effectively just an API to an underlying hypervisor that does the heavy lifting.) Message queues in general are difficult, complex and performance-sensitive pieces of software, so it’s not obvious that we want to commit the OpenStack project to maintaining one. To mitigate this the API has been designed to be flexible enough to allow a different implementation based on AMQP, and plugins using RabbitMQ and Proton are planned for the future. Of course, the success of the API design in this respect remains unproven for now.

Another topic of discussion was that the only production-capable storage plugin currently available is for MongoDB, which is licensed under the AGPL. (An SQLite plugin is used for testing.) Although the Python client library is permissively licensed—meaning the OpenStack Foundation’s obligation to distribute code only under the Apache License 2.0 is not even in question—this does not solve the problem of MongoDB itself being a required component in the system. The biggest drawback I see with the AGPL is that it effectively expands the group of stakeholders who need to understand and be comfortable with the implications of the license from all distributors of the code (hard) to all users of the code (intractable). The committee resolved this issue by indicating that a second storage plugin option would likely be considered a condition of graduation from incubation.

In the final vote, Marconi was accepted into incubation by a comfortable margin (11 in favour, 4 abstentions). I cannot speak for the committee members, but for me the project benefits from the presumption of competence that comes from being developed entirely in the open from the very beginning without anybody in the community coming forward to say that it’s a terrible idea. Anyone else looking to get a project accepted into incubation should take note.

The next thing on the agenda was a review of the Trove project (formerly known as RedDwarf), which was incubated in the Havana cycle, for graduation. Trove is a database provisioning service for both relational and non-relational databases.

One requirement imposed by the committee was that Trove provide an option to use Heat for orchestration instead of direct calls to Nova and Cinder. Progress on this has been good (a working patch is up for review), and Trove appears well-placed to take advantage of this feature in the future.

In future, projects will also be required to provide integration tests for the Tempest suite before graduation. This is a new requirement, though, and is not being strictly enforced retroactively. So while Trove is not yet integrated with Tempest, the existence of continuous integration tests that can be integrated with Tempest during the Icehouse cycle was considered sufficient.

The committee voted in favour of graduation, and Trove will be an official part of the OpenStack Icehouse release.

The final project to be reviewed was Ironic, the bare-metal provisioning service that was split out of Nova into incubation at the beginning of the Havana cycle. No vote was held, because the development team indicated that the project is not ready to graduate. However, it is still progressing and neither was there any interest in curtailing it. Therefore, Ironic will remain in incubation for the Icehouse cycle.


How Heat Orchestrates your OpenStack Resources

One of the great things about orchestration is that it automatically figures out the operations it needs to perform. So whenever you deploy (or modify) your infrastructure you need not write a script to do so; you simply describe the infrastructure you want and the orchestration engine does the rest. I am often asked about how this works in Heat, so here is a quick run-down.

The first thing to note is that the order in which you define resources in your template is completely ignored. Instead, Heat builds a dependency graph based on the relationships between resources expressed in the template. Dependencies are inferred in three ways.

Firstly, any resource whose body uses Ref to refer to another resource is deemed to be dependent on that resource. So if one resource (say, a Nova server) has a property that refers to another resource (say, a Cinder volume), then those resources will always be created in the correct order.

A similar story applies to Fn::GetAtt. Say you have a database server and a separate application server in your template. You might use {"Fn::GetAtt": ["DBServer", "PrivateIp"]} in the UserData property of the application server to get the IP address of the database server. Of course the IP address is not known until the database server has been created, so this too adds a dependency to ensure things always happen in the right order.

Finally, you can use the DependsOn entry in a resource definition to explicitly add a dependency on another resource.
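To make the implicit cases concrete, here is a minimal CloudFormation-style sketch (the resource names are illustrative) in which both kinds of inferred dependency appear: AppServer depends on MyVolume via Ref, and on DBServer via Fn::GetAtt.

```yaml
Resources:
  DBServer:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: F18-x86_64-cfntools
      InstanceType: m1.small

  MyVolume:
    Type: AWS::EC2::Volume
    Properties:
      Size: "1"
      AvailabilityZone: nova

  AppServer:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: F18-x86_64-cfntools
      InstanceType: m1.small
      Volumes:
        # Ref makes AppServer depend on MyVolume
        - Device: /dev/vdc
          VolumeId: {"Ref": "MyVolume"}
      UserData:
        # Fn::GetAtt makes AppServer depend on DBServer
        Fn::Join:
          - ""
          - - "DB_IP="
            - {"Fn::GetAtt": ["DBServer", "PrivateIp"]}
```

Heat will therefore create DBServer and MyVolume (in either order, or in parallel) before it creates AppServer, without any explicit ordering in the template.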

In the Grizzly (2013.1) version of Heat, resources were created serially. We simply performed a topological sort of the dependency graph and started the resources in that order. In the current development stream and the upcoming Havana (2013.2) release, however, Heat will create resources in parallel. Each resource will be created as soon as all of its dependencies are complete.
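The scheduling rule itself is simple enough to sketch in a few lines of Python (a toy model, not Heat’s actual code): on each pass, start every resource whose dependencies have all completed.

```python
# A toy illustration (not Heat's actual implementation) of
# dependency-driven creation: resources in the same "wave" have all of
# their dependencies satisfied and can be created in parallel.

def creation_waves(deps):
    """deps maps each resource name to the set of names it depends on.

    Returns a list of waves; every resource in a wave can be created
    concurrently once the previous wave is complete.
    """
    remaining = {name: set(needs) for name, needs in deps.items()}
    done = set()
    waves = []
    while remaining:
        # A resource is ready once everything it depends on is done
        ready = sorted(n for n, needs in remaining.items() if needs <= done)
        if not ready:
            raise ValueError("circular dependency detected")
        waves.append(ready)
        done.update(ready)
        for name in ready:
            del remaining[name]
    return waves

# The wait-condition pattern from this post: the wait condition
# DependsOn the server, so it lands in a later wave.
deps = {
    "MyWaitHandle": set(),
    "MyServer": set(),
    "MyWaitCondition": {"MyServer"},
}
print(creation_waves(deps))
# [['MyServer', 'MyWaitHandle'], ['MyWaitCondition']]
```

Grizzly’s behaviour corresponds to flattening those waves into a single serial list (a topological sort); Havana starts each wave’s members concurrently.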

No discussion of dependency ordering would be complete without a word on what ‘complete’ means. Heat considers the creation of a resource complete as soon as the underlying OpenStack API reports that it is. So in the case of Nova, for instance, a server resource will be marked complete as soon as Nova has finished building it—i.e. as soon as it starts booting. The stack itself is marked complete as soon as all of its constituent resources are.

To allow you to wait for a server to boot and be configured, CloudFormation provides (and Heat supports) a WaitCondition resource type. Calling the cfn-signal script from within the server signals success or failure. You want the timeout period to start when the server starts booting, so you’ll indicate that the wait condition DependsOn the server. If another resource requires software on the server to be configured before it can be created, simply mark that it DependsOn the wait condition.

Putting all of that together, here is a quick example that creates a Nova server and waits for it to boot before marking the stack complete:

    Resources:
      MyWaitHandle:
        Type: AWS::CloudFormation::WaitConditionHandle

      MyServer:
        Type: AWS::EC2::Instance
        Properties:
          InstanceType: m1.small
          ImageId: F18-x86_64-cfntools
          UserData:
            Fn::Join:
              - "\n"
              - - "#!/bin/bash -v"
                - "PATH=${PATH}:/opt/aws/bin"
                - Fn::Join:
                  - ""
                  - - "cfn-signal -e 0 -r Done '"
                    - {"Ref": "MyWaitHandle"}
                    - "'"

      MyWaitCondition:
        Type: AWS::CloudFormation::WaitCondition
        DependsOn: MyServer
        Properties:
          Handle: {"Ref": "MyWaitHandle"}
          Timeout: 300


Non-Relational Database-as-a-Service in OpenStack

The OpenStack Technical Committee voted last night to expand the scope of the Trove program, which is currently in incubation, to encompass non-relational as well as relational databases.

Trove (formerly known as RedDwarf) is a provisioning service for database servers that abstracts away the administrative parts of running the server, ‘including deployment, configuration, patching, backups, restores, and monitoring’. (In other words, it is comparable to Amazon’s RDS.) It was originally envisaged to encompass both relational and non-relational databases, but the Technical Committee limited the scope to relational databases only when it was accepted for incubation, pending a proof-of-concept to allow them to assess the technical impact of supporting both. The minimal size of the resulting Redis implementation made a compelling case not to exclude non-relational databases any longer.

One trade-off to keep in mind when making such a decision is that maximising flexibility for users in the present does not always maximise flexibility for users in the future. A database being relational is not undesirable in itself; rather, trading away that valuable feature allows us to gain a different valuable feature: much easier scaling across multiple machines. This opens up an opportunity for a non-relational Database-as-a-Service that abstracts away the underlying servers entirely. Trove has no interest in being that service, so it is important that the existence of Trove does not eliminate the incentive for anybody else to create it.

Happily, Michael Basnight (the Trove PTL) has agreed to amend Trove’s mission statement to reflect the fact that it is a database provisioning service. In my opinion this offers the best of both worlds—flexibility now and keeping the door open to innovation in the future—and I was happy to support the change in scope.

My hope is that anybody contemplating a data-only non-relational DBaaS API will see this decision as confirmation that there is still a place for such a thing in OpenStack. I would also strongly encourage them to build it using Trove under the hood for server provisioning. I expect that some future Technical Committee would require this, much as Trove was required to use Heat instead of hand-rolled orchestration as a condition of graduation from incubation.

Speaking of which, the Technical Committee is planning to assess Trove in September for possible graduation in time for the Icehouse (2014.1) release cycle.


An Introduction to OpenStack Orchestration

The forthcoming Havana (2013.2) release of OpenStack will mark the debut of orchestration as part of the official release. It arrives in the form of the Heat project, which kicked off in early 2012 and graduated from incubation into OpenStack proper after the Havana Summit in March this year. Heat is so named because that’s what raises clouds, and it isn’t an acronym for anything, so y’all can give your caps lock keys a well-earned rest now.

The term ‘orchestration’ is often a source of confusion, but you can think of it as configuration management for infrastructure. It’s about symbolically representing various pieces of infrastructure (known as resources) and the relationships between them directly, rather than modelling only the actions you can perform on them. It’s swapping imperative programming for declarative programming; verbs for nouns. When you need to make a change to your infrastructure (including the initial deployment), instead of doing so manually or writing a single-use script to manipulate it, you write a template that describes the end result and hand it off to the orchestration engine to make the required changes.
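To make the declarative style concrete, here is about the smallest possible sketch of a CloudFormation-style template (the image and flavour names are illustrative):

```yaml
# Declare what should exist; Heat works out how to make it so.
Resources:
  MyServer:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: F18-x86_64-cfntools
      InstanceType: m1.small
```

There is no ‘create server’ verb anywhere in the template; if you later change the flavour and update the stack, Heat figures out the required API calls for you.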

Heat began life as an attempt to provide compatibility with AWS’s CloudFormation service. That remains an important goal—we want to give users who have invested in creating CloudFormation templates the choice to run them on an open cloud—but Heat has evolved into much more. In addition to the CloudFormation compatibility API and templates, Heat also sports an OpenStack-native REST API, native representations of OpenStack resources, and a richer template format.

One of the big advantages of orchestration is the ability to treat infrastructure as code and take advantage of all the tooling that entails. You can check the source code for your infrastructure into version control alongside the source code for your software. That is one of the reasons Heat uses YAML for templates rather than JSON (as CloudFormation does) or, worse, XML—it’s easier to read, write and diff. If you want to write software to manipulate templates automatically, that remains equally easy since YAML syntax is a superset of JSON and templates can be translated with full fidelity in both directions.

There are numerous example templates available in the Heat templates repository on GitHub. The simplest examples involve spinning up a Nova server running WordPress and MySQL; others add in more resource types to demonstrate how to integrate them.

With the Havana release coming up in October, now is a great time to start investigating how orchestration can simplify your workflow and make your deployments more repeatable. Check out the getting started guides and let us know on Ask OpenStack if you have any questions.