ZeroBanana: Software for Humans

Latest news

New essay posted: ‘Senior Engineers are Living in the Future’

In which I explain why comparing yourself to the more-senior engineers around you is a mistake: they have a structural advantage of earlier access to information that can be hard to distinguish from prescience.

Limitations of the Layered Model of OpenStack

One model that many people have used for making sense of the multiple services in OpenStack is that of a series of layers, with the ‘compute starter kit’ projects forming the base. Jay Pipes recently wrote what may prove to be the canonical distillation (this post is an edited version of my response):

Nova, Neutron, Cinder, Keystone and Glance are a definitive lower level of an OpenStack deployment. They represent a set of required integrated services that supply the most basic infrastructure for datacenter resource management when deploying OpenStack. Depending on the particular use cases and workloads the OpenStack deployer wishes to promote, an additional layer of services provides workload orchestration and workflow management capabilities.

I am going to explain why this viewpoint is wrong, but first I want to acknowledge what is attractive about it (even to me). It contains a genuinely useful observation that leads to a real insight.

The insight is that whereas the installation instructions for something like Kubernetes usually contain an implicit assumption that you start with a working datacenter, the same is not true for OpenStack. OpenStack is the only open source project concentrating on the gap between a rack full of unconfigured equipment and somewhere that you could run a higher-level service like Kubernetes. We write the bit where the rubber meets the road, and if we do not, there is nobody else to do it! There is an almost infinite variety of different applications, and they will all need different parts of the higher layers, but ultimately they must be reified in a physical data center, and when they are, OpenStack will be there: that is the core of what we are building.

It is only the tiniest of leaps from seeing that idea as attractive, useful, and genuinely insightful to believing it is correct. I cannot really blame anybody who made that leap. But an abyss awaits them nonetheless.

Back in the 1960s and early 1970s there was this idea about Artificial Intelligence: even a 2-year-old human can (for example) recognise images with a high degree of accuracy, but doing (say) calculus is extremely hard in comparison and takes years of training. But computers can already do calculus! Ergo, we have solved the hardest part already, building the rest out of that will be trivial, AGI is just around the corner, and so on. The popularity of this idea arguably helped create the AI bubble, and the inevitable collision with the reality of its fundamental wrongness led to the AI Winter. In fact, though you can build logic out of many layers of heuristics (as human brains do), it absolutely does not follow that it is trivial to build other things that also require layers of heuristics out of some basic logic building blocks. (In contrast, the AI technology of the present, which is showing more promise, is called Deep Learning because it consists literally of multiple layers of heuristics. It is also still considerably worse at image recognition than any 2-year-old human.)

I see the problem with the OpenStack-as-layers model as being analogous. (I am not suggesting there will be a full-on OpenStack Winter, but we are well past the Peak of Inflated Expectations.) With Nova, Keystone, Glance, Neutron, and Cinder you can build a pretty good Virtual Private Server hosting service. But it is a mistake to think that cloud is something you get by layering stuff on top of VPS hosting. It is relatively easy to build a VPS host on top of a cloud, just like teaching someone calculus. But it is enormously difficult to build a cloud on top of a VPS host (it would involve a lot of expensive layers of abstraction, comparable to building artificial neurons in software).

That is all very abstract, so let me bring in a concrete example. Kubernetes is event-driven at a very fundamental level: when a pod or a whole kubelet dies, Kubernetes gets a notification immediately and that prompts it to reschedule the workload. In contrast, Nova/Cinder/&c. are a black hole. You cannot even build a sane dashboard for your VPS—let alone cloud-style orchestration—over them, because it will have to spend all of its time polling the APIs to find out if anything happened. There is an entire separate project, which almost no deployments include, basically dedicated to spelunking in the compute node without Nova’s knowledge to try to surface this information. It is no criticism of the team in question, who are doing something that desperately needs doing in the only way that is really open to them, but the result is an embarrassingly bad architecture for OpenStack as a whole.
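To make the contrast concrete, here is a toy sketch (all names invented for illustration; this is not real Nova or Kubernetes code) of why polling is so much more expensive than being notified:

```python
import queue

class PollingMonitor:
    """Discovers failures only by repeatedly calling the API."""
    def __init__(self, api):
        self.api = api
        self.api_calls = 0

    def run_once(self):
        # Every tick costs an API round-trip, whether or not anything changed.
        self.api_calls += 1
        return [s for s, state in self.api.items() if state == "DOWN"]

class EventMonitor:
    """Hears about failures the moment the source emits them."""
    def __init__(self):
        self.events = queue.Queue()

    def notify(self, server):
        self.events.put(server)

    def failed_servers(self):
        failed = []
        while not self.events.empty():
            failed.append(self.events.get())
        return failed

# Simulate 100 polling ticks, during which exactly one server dies.
servers = {"vm-1": "UP", "vm-2": "UP"}
poller = PollingMonitor(servers)
for tick in range(100):
    if tick == 50:
        servers["vm-2"] = "DOWN"
    poller.run_once()
print(poller.api_calls)        # 100 round-trips to learn of a single failure

watcher = EventMonitor()
watcher.notify("vm-2")         # the source pushes the event; no polling at all
print(watcher.failed_servers())
```

The polling monitor pays for every tick regardless of whether anything happened; the event monitor pays only when something does.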

So yes, it is sometimes helpful to think about the fact that there is a group of components that own the low level interaction with outside systems (hardware, or IdM in the case of Keystone), and that almost every application will end up touching those directly or indirectly, while each using different subsets of the other functionality… but only in the awareness that those things also need to be built from the ground up as interlocking pieces in a larger puzzle.

Saying that the compute starter kit projects represent a ‘definitive lower level of an OpenStack deployment’ invites the listener to ignore the bigger picture; to imagine that if those lower level services just take care of their own needs then everything else can just build on top. That is a mistake, unless you believe that OpenStack needs only to provide enough building blocks to build VPS hosting out of, because support for all of those higher-level things does not just fall out for free. You have to consciously work at it.

Imagine for a moment that, knowing everything we know now, we had designed OpenStack around a system of event sources and sinks that are reliable in the face of hardware failures and network partitions, with components connecting into it to provide services to the user and to each other. That is what Kubernetes did. That is the key to its success. We need to enable something similar, because OpenStack is still necessary even in a world where Kubernetes exists.

One reason OpenStack is still necessary is the one we started with above: something needs to own the interaction with the underlying physical infrastructure, and the alternatives are all proprietary. Another place where OpenStack can provide value is by being less opinionated and allowing application developers to choose how the event sources and sinks are connected together. That means that users should, for example, be able to customise their own failover behaviour in ‘userspace’ rather than rely on the one-size-fits-all approach of handling everything automatically inside Kubernetes. This is theoretically an advantage of having separate projects instead of a monolithic design—though the fact that the various agents running on a compute node are more tightly bound to their corresponding services than to each other has the potential to offer the worst of both worlds.
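A sketch of what failover ‘in userspace’ might look like, assuming a hypothetical event bus (none of these names are existing OpenStack APIs): the application subscribes its own handler to infrastructure events and decides for itself how to react.

```python
from collections import defaultdict

class EventBus:
    """A minimal pub/sub bus: sources publish, sinks subscribe by topic."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, payload):
        for handler in self.subscribers[topic]:
            handler(payload)

bus = EventBus()
rescheduled = []

# The application's own policy: evacuate to a spare host, but only
# for instances the owner has tagged as highly available.
def my_failover_policy(event):
    if "ha" in event["tags"]:
        rescheduled.append((event["instance"], "spare-host"))

bus.subscribe("compute.host.down", my_failover_policy)

# The infrastructure (the event source) reports a dead host.
bus.publish("compute.host.down", {"instance": "vm-1", "tags": ["ha"]})
bus.publish("compute.host.down", {"instance": "vm-2", "tags": []})
print(rescheduled)   # only the HA-tagged instance is evacuated
```

The point is that the one-size-fits-all decision moves out of the platform and into code the user controls.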

All of these thoughts will be used as fodder for writing a technical vision statement for OpenStack. My hope is that this will help align our focus as a community so that we can work together in the same direction instead of at cross-purposes. Along the way, we will need many discussions like this one to get to the root of what can be some quite subtle differences in interpretation that nevertheless lead to divergent assumptions. Please join in if you see one happening!


The Expanding OpenStack Foundation

The OpenStack Foundation has begun the process of becoming an umbrella organisation for open source projects adjacent to but outside of OpenStack itself. However, there is no clear roadmap for the transformation, which has resulted in some confusion. After attending the joint leadership meeting with the Foundation Board of Directors and various Forum sessions that included some members of the board at the (2018) OpenStack Summit in Vancouver, I believe I can help shed some light on the situation. (Of course this is my subjective take on the topic, and I am not speaking for the Technical Committee.)

In November 2017, the board authorised the Foundation staff to begin incubation of several ‘Strategic Focus Areas’, including piloting projects that fit in those areas. The three focus areas are Container Infrastructure, Edge Computing Infrastructure, and CI/CD Infrastructure. To date, there have been two pilot projects accepted. Eventually, it is planned for each focus area to have its own Technical Committee (or equivalent governance body), holding equal status with the OpenStack TC—there will be no paramount technical governance body for the whole Foundation.

The first pilot project is Kata Containers, which combines container APIs and container-like performance with VM-level isolation. You will not be shocked to learn that it is part of the Container Infrastructure strategic focus.

The other pilot project, in the CI/CD strategic focus, is Zuul. Zuul will already be familiar to OpenStack developers as the CI system developed by and for the OpenStack project. Its governance is moving from the OpenStack TC to the new Strategic Focus Area, in recognition of its general usefulness as a tool that is not in any way specific to OpenStack development.

Thus far there are no pilot projects in the Edge Computing Infrastructure focus area, but nevertheless there is plenty of work going on—including to figure out what Edge Computing is.

If you attended the Summit then you will have heard about Kata, Zuul and Edge Computing, but this is probably the first time you’ve heard the terms ‘incubate’ or ‘pilot’ associated with them. Nor have the steps that come after incubation or piloting been defined. This has opened the door to confusion, not only about the status of the pilot projects but also that of unofficial projects (outside of either OpenStack-proper or any of the Strategic Focus Areas) that are hosted on the same infrastructure provided by the Foundation for OpenStack development. It also heralds the return of what I call the October surprise—a half-baked code dump ‘open sourced’ the week before a Summit—which used to be a cottage industry around the OpenStack community until the TC was able to bed in a set of robust processes for accepting new projects.

Starting out without a lot of preconceived ideas about how things would proceed was the right way to begin, but members of the board recognise that now is the time to give the process some structure. I expect to see more work on this in the near future.

There is also a proposed initiative, dubbed Winterscale, to move governance of the Foundation’s infrastructure out from under the OpenStack TC, to reflect its new status as a service provider to the OpenStack project, the other Strategic Focus Areas, and unofficial projects.


What are Clouds?

Like many in the community, I am often called upon to explain what OpenStack is to somebody completely unfamiliar with it. Usually this goes one of two ways: they turn out to be familiar enough with cloud computing to quickly grasp it by analogy, or their eyes glaze over at the mention of the words ‘cloud computing’ and no further explanation is sought or offered. When faced with someone who is persistently curious but not an industry insider, you immediately know you’re in trouble.

And so it came to pass that I found myself a couple of years ago wondering how exactly to explain to an economist why cloud computing is a big deal. I think I have actually figured out an answer: cloud computing can be seen as the latest development in a long trend of reducing the transaction costs that prevent us from allocating our resources efficiently.

(A live-action version of this post from the most recent OpenStack Summit in Barcelona is available on video.)

Cast your mind back to the days of physical hardware. When you wanted to develop and deploy a software service you first had to order servers, have them physically shipped to you, then installed and wired to the network. The process typically took weeks just from the vendor’s side, not to mention the time required to get your own ducks in a row first. As a result you had to buy more servers than you could fully utilise, and buy them earlier than you wanted them, because you could not rely on responding rapidly to changing demand.

Virtualisation revolutionised this cycle by cutting the slow purchasing, shipping and racking steps out of the loop. (These still had to happen, of course, but they no longer had to happen synchronously.) Instead, when you wanted a server you simply put in a request, and somebody would create a virtual machine and allocate it to you. The whole process could easily be done in less than a day.

Yet as much as this was a huge leap forward, it was still slower than it needed to be, because there was still a human in the loop. The next step was to make the mechanism directly accessible to the developer—Infrastructure as a Service. That seemingly simple change has a number of immediate consequences, first amongst which is the need for robust multitenancy. This is the key difference between tools like OpenStack Nova and the preceding generation of virtualisation platforms, like oVirt. Transaction costs have dropped to near zero—where before allocating a new box took less than a day and you might do it every few weeks or so, now it takes seconds and you can easily do it 20 times a day without a second thought.
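A toy illustration of why multi-tenancy must be baked into the API itself rather than bolted on (all names invented): every query is scoped to the caller’s authenticated project, so tenants can never see each other’s resources.

```python
# An in-memory stand-in for a compute service's server records.
SERVERS = [
    {"id": "vm-1", "project": "alice"},
    {"id": "vm-2", "project": "bob"},
    {"id": "vm-3", "project": "alice"},
]

def list_servers(token):
    # The project scope comes from the authenticated token, never from
    # a user-supplied filter that could be omitted or forged.
    return [s["id"] for s in SERVERS if s["project"] == token["project"]]

print(list_servers({"project": "alice"}))   # ['vm-1', 'vm-3']
print(list_servers({"project": "bob"}))     # ['vm-2']
```

Once strangers share the same API endpoint, this kind of enforced isolation stops being optional—which is exactly the difference between a self-service cloud API and an operator-facing virtualisation platform.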

Before we congratulate ourselves too much though, remember that our goal was to remove humans from the loop… but we still have one: the developer (or sysadmin). Being able to adjust your resource utilisation 20 times a day is great, but mostly wasted if you can only do it during the 8 hours that somebody is parked in front of Horizon clicking buttons. For that reason, I don’t regard this use case as a ‘cloud’ at all, even though to hear some people talk you might think that this is the only thing that OpenStack is for. It could more accurately be described as a Virtual Private Server hosting service.

My working definition of a true ‘cloud’ service, then, is one where the application itself can control its own infrastructure. (Where ‘application’ includes not only software running on virtual compute infrastructure but also services built into the cloud itself that effectively form a part of it—a minimal description of such an application is likely a Heat template not a software package.) The developer might do the initial deployment, but from then on the application can manage itself autonomously.
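As a minimal sketch of an application managing its own infrastructure (all names invented; a real version would call a cloud API such as OpenStack’s compute API rather than this in-memory stub), consider an autoscaling loop owned by the application itself:

```python
class FakeCloud:
    """Stand-in for a cloud API; boot/delete would be real API calls."""
    def __init__(self):
        self.instances = ["vm-1"]

    def boot(self):
        self.instances.append(f"vm-{len(self.instances) + 1}")

    def delete(self):
        self.instances.pop()

def autoscale(cloud, requests_per_instance, target=100):
    # The application decides for itself, with no human in the loop.
    if requests_per_instance > target and len(cloud.instances) < 10:
        cloud.boot()
    elif requests_per_instance < target // 2 and len(cloud.instances) > 1:
        cloud.delete()

cloud = FakeCloud()
autoscale(cloud, requests_per_instance=250)   # overloaded: scale out
print(len(cloud.instances))                   # 2
autoscale(cloud, requests_per_instance=20)    # idle: scale back in
print(len(cloud.instances))                   # 1
```

The interesting part is not the arithmetic but who runs it: the application, continuously, with its own credentials.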

You can actually go even further: if you use continuous deployment then you can eliminate the developer’s direct involvement altogether. There is now a Heat plugin for Jenkins to help you do this. Other options include the Ansible-based Zuul project, developed by the OpenStack Infra team, and the OpenStack Solum project.

Of course clouds of this type have been available for some years. However, the other thing we have learned since the 1990s is that writing your application to depend on a proprietary API now will often lead to wailing and gnashing of teeth later. As cloud services and APIs become part of the application, an Open Source cloud with a panoply of service provider options plus the ability to operate it yourself is your insurance against vendor lock-in. That’s why it is critical that OpenStack succeed, and succeed at delivering more than just Virtual Private Servers. Because there is no bigger transaction cost than having to rewrite your application to move to a better service provider.


A Vision for OpenStack

One of the great things about forcing yourself to write down your thoughts is that it occasionally produces one of those lightbulb moments of clarity, where the jigsaw pieces you have been mentally turning over suddenly all fit together. I had one of those this week while preparing my platform for the OpenStack Technical Committee election.

I want to talk a little about Keystone, the identity management component of OpenStack. Although Keystone supports a database back-end for managing users, the most common way to deploy it in a private cloud is with a read-only LDAP connection to the organisation’s existing identity management system. As a consequence, a ‘user’ in Keystone parlance typically refers to a living, breathing human user with an LDAP entry and an HR file and a 401(k) account.

That should be surprising, because once you have gone to the trouble of building a completely automated system for allocating resources with a well-defined API the very least interesting thing you can do next is to pay a bunch of highly-evolved primates to press its buttons. That is to say, the transformative aspect of a ‘cloud’ is the ability for the applications running in it to interact with and control their own infrastructure. (Autoscaling is the obvious example here, but it is just the tip of an unusually dense iceberg.) I think that deserves to stand alongside multi-tenancy as one of the pillars of cloud computing.

Now when I think back to all the people who have told me they think OpenStack should provide “infrastructure only” I still do not understand their choice of terminology, but I think I finally understand what they mean. I think they mean that applications should not talk back. Like in the good old days.

I think the history of Linux in the server market is instructive here. Today, Linux is the preferred target platform for server applications, but imagine for a moment that this had never come to pass: cast your mind back 15 years to when Steve Ballmer was railing about communists and imagine that .NET had gone on to win the API wars. What would that world look like for Linux? Certainly not a disaster. A great many legacy applications would still have been migrated to Linux from the many proprietary UNIX platforms that proliferated in the 1990s. (Remember AIX? HP-UX? Me neither.) When hardware vendors stopped maintaining their own entire operating systems to focus on adding hardware support to a common open source kernel, everybody benefited (they scaled back an unprofitable line of business, their customers stopped bleeding money, platform vendors still made a healthy profit and the technology advances accrued to the community at large). Arguably, that transition may have funded a lot of the development of Linux over the past 15 years. Yet if that is all that had happened, we could not call it fully successful either.

Real success for open source platforms means applications written against open implementations of open APIs. Moving existing applications over is important, and may provide the bridge funding to accelerate development, but new applications are written every day. Each one written for a proprietary platform instead of an open one represents a cost to society. Linux has come to dominate the server platform, but applications are bigger than a single server now. They need to talk back to the cloud and if OpenStack is to succeed—really succeed—in the long term then it needs to be able to listen.

Microsoft understands this very well, by the way. The subject of Marxist theory and its similarities to the open source movement usually does not even come up when you launch a Linux VM on their cloud—the goal now is to lock you in to Azure, not .NET. Of course the other proprietary clouds (Amazon, Google) are doing exactly the same.

I am passionate about OpenStack because I think it is our fastest route to making an open source platform the preferred option for the applications of the (near) future. I hope you will join me. We can get started right now.

Having an application interact with the OpenStack APIs is really hard to do at the moment, because there is no way I am going to put the unhashed password that authenticates me to my corporate overlords on an application server connected to the Internet. The first step to fixing this actually already exists: Keystone now supports multiple domains, each with its own backend, so that application ‘user’ accounts in a database can co-exist with real meatspace-based user accounts in LDAP. The Heat project has cobbled together some workarounds that make use of this but they rely on Heat’s privileged position as one of the services deployed by the operator, and other projects do not automatically get the benefit either.
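Conceptually, per-domain backends look something like this toy sketch (not Keystone’s actual implementation; the class and domain names are invented): human users come from a read-only corporate directory, while application accounts live in a local, writable store.

```python
class ReadOnlyDirectory:
    """Stands in for the corporate LDAP: lookups only, no writes."""
    def __init__(self, users):
        self._users = dict(users)

    def get(self, name):
        return self._users.get(name)

    def create(self, name):
        raise PermissionError("corporate directory is read-only")

class LocalStore:
    """Stands in for a SQL backend used for application accounts."""
    def __init__(self):
        self._users = {}

    def get(self, name):
        return self._users.get(name)

    def create(self, name):
        self._users[name] = {"name": name}
        return self._users[name]

# One backend per domain: exactly the shape that lets application
# 'users' co-exist with human ones.
domains = {
    "corp": ReadOnlyDirectory({"alice": {"name": "alice"}}),
    "apps": LocalStore(),
}

domains["apps"].create("billing-service")
print(domains["corp"].get("alice")["name"])            # human account
print(domains["apps"].get("billing-service")["name"])  # application account
```

An application credential created in the writable domain can then be put on an Internet-facing server without ever exposing a human’s corporate password.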

The next obstacle is that the authorisation functionality provided by Keystone is too simplistic: all rules must be predefined by the operator; by default a user does not need any particular role in a tenant to be granted permission for most operations; and, incidentally, user interfaces have no way of determining which operations should be exposed to any given user. We need to put authorisation under user control by allowing users to decide which operations are authorised for an account, including filtering on tenant-specific data. To get this to work properly, every OpenStack service will need to co-operate at least to some extent.
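What user-controlled authorisation could look like, as a toy sketch (the API here is entirely hypothetical, not an existing Keystone interface): the account owner registers which operations a delegated account may perform, optionally filtered on tenant-specific data.

```python
class UserPolicy:
    """User-defined rules instead of operator-predefined ones."""
    def __init__(self):
        self._rules = []

    def allow(self, operation, condition=lambda req: True):
        # The condition lets the rule filter on tenant-specific data.
        self._rules.append((operation, condition))

    def check(self, operation, request):
        return any(op == operation and cond(request)
                   for op, cond in self._rules)

policy = UserPolicy()
# This delegated account may reboot servers, but only ones the owner
# has tagged as part of the web tier.
policy.allow("server:reboot", lambda req: "webtier" in req["tags"])

print(policy.check("server:reboot", {"tags": ["webtier"]}))  # True
print(policy.check("server:reboot", {"tags": ["db"]}))       # False
print(policy.check("server:delete", {"tags": ["webtier"]}))  # False
```

The contrast with the status quo is that the rules live with the user, not in an operator-wide policy file.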

That gets us a long way toward applications talking back to the cloud, but when the cloud itself talks it must do so asynchronously, without sacrificing reliability. Fortunately, the Zaqar team has already developed a reliable, asynchronous, multi-tenant messaging service for OpenStack. We now need to start the work of adopting it.
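The reliability property matters more than the plumbing, so here is a toy model of it, loosely inspired by Zaqar’s claim semantics (this is an illustrative sketch, not Zaqar code): a consumer claims a message, and if it crashes before deleting it, the claim expires and the message is redelivered instead of being lost.

```python
import itertools

class ReliableQueue:
    """Claim-based delivery: messages survive consumer crashes."""
    def __init__(self):
        self._messages = {}
        self._claims = {}            # msg_id -> claim expiry time
        self._ids = itertools.count(1)

    def post(self, body):
        msg_id = next(self._ids)
        self._messages[msg_id] = body
        return msg_id

    def claim(self, ttl, now):
        for msg_id, body in self._messages.items():
            if self._claims.get(msg_id, 0) <= now:   # unclaimed or expired
                self._claims[msg_id] = now + ttl
                return msg_id, body
        return None

    def delete(self, msg_id):
        # Only called once the consumer has actually done the work.
        self._messages.pop(msg_id, None)
        self._claims.pop(msg_id, None)

q = ReliableQueue()
q.post("instance vm-1 is down")

msg_id, body = q.claim(ttl=30, now=0)   # consumer 1 claims the message...
# ...and crashes without deleting it. After the TTL expires, the
# message is redelivered to another consumer rather than vanishing.
retry = q.claim(ttl=30, now=60)
print(retry[1])
q.delete(retry[0])                      # processed successfully this time
```

A cloud built on this kind of channel can keep its promises across the hardware failures and network partitions mentioned above.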

These are the first critical building blocks on which we can construct a consistent user experience for application developers across projects like Zaqar, Heat, Mistral, Ceilometer, Murano, Congress, and probably others I am forgetting. There is no need to take anything away from other projects or make them harder to deploy. What we will need is consensus on what we are trying to achieve.