News Archive

OpenStack

OpenStack and Platforms as a Service

Platforms as a Service, and their long-term relationship with OpenStack, have been the subject of much hand-wringing—most of it in the media—over the past month or so. The ongoing expansion of the project has many folks wondering where exactly the dividing line between OpenStack and its surrounding ecosystem will be drawn, and the announcement of the Solum related project has fuelled speculation that the scope will grow to encompass PaaS.

One particular clarification is urgently needed: Solum is not endorsed in any way by the OpenStack project. The process for that to happen is well-defined and requires, amongst other criteria, that the implementation is mature. Solum as announced comprised exactly zero lines of code, since the backers wisely elected to develop in the open from the beginning.

More subtly, my impression (after attending the Solum session at the OpenStack Summit two weeks ago and speaking to many of the folks involved in starting the project) is that Solum is not intended to be a PaaS as such. I have long been on record as saying that a PaaS is one of the few cloud-related technologies that do not belong in OpenStack. My reason is simple: OpenStack should not anoint one platform or class of platforms when there are so many possible platforms. Today’s PaaS systems offer many web application platforms as a service—you can get Ruby web application platforms and Java web application platforms and Python web application platforms… just about any kind of platform you like, so long as it’s a web application platform. That was the obvious first choice for PaaS offerings to target, but there are plenty of niches that could also use their own platforms. For example, our friends (and early adopters of Heat) at XLcloud are building an open source PaaS for high-performance computing applications.

Though Solum is still in the design phase, I expect it to be much less opinionated than a PaaS. Solum, in essence, is the ‘as-a-Service’ part of Platform as a Service. In other words, it aims to provide the building blocks to deliver any platform as a service on top of OpenStack with a consistent API (no doubt based on the OASIS CAMP standard). It seems clear to me that, by commoditising the building blocks for a PaaS, this is likely to be a catalyst for many more platforms to be built on OpenStack. I do not think it will damage the ecosystem at all, and clearly neither do a lot of PaaS vendors who are involved with Solum, such as ActiveState (who are prominent contributors to and users of Cloud Foundry) and Red Hat’s OpenShift team.

Assuming that it develops along these lines, if OpenStack were to eventually reject Solum from incubation solely for reasons of scope it would call into question the relevance of OpenStack more than it would the relevance of Solum. Solum’s trajectory toward success or failure will be determined by the strength of its community well in advance of it being in a position to apply for incubation.


Finally, I would like to clarify the relationship between Heat and PaaS. The Heat team have long stated that one of our goals is to provide the best infrastructure orchestration with which to deploy a PaaS. We have no desire for Heat to include PaaS functionality, and we rejected a suggestion to implement Camp in Heat when it was floated at the Havana Design Summit.

The Software Configuration Provider blueprint, one of the development priorities for the Icehouse cycle, is actually aimed at feature parity with a different OASIS standard, TOSCA. We are working on it simply because the Heat team went to the Havana Design Summit in Portland and every user we spoke to there asked us to. The proposed features promise to make Heat more useful for deploying enterprise applications, platforms as a service, Hadoop and other complex workloads.


An Introduction to Heat in Frankfurt

It was my privilege to attend the inaugural Frankfurt OpenStack Meetup last night in… well, Frankfurt (am Main, not the other one). It was great to meet such a diverse set of OpenStack users, from major companies to students and everywhere in between.

I gave a talk entitled ‘OpenStack Orchestration with Heat’, and for those who missed it, that link will take you to a handout which covers all of the material:

An introduction to the OpenStack Orchestration project, Heat, and an explanation of how orchestration can help simplify the deployment and management of your cloud application by allowing you to represent infrastructure as code. Some of the major new features in Havana are also covered, along with a preview of development plans for Icehouse.

Thanks are due to the organisers (principally, Frederik Bijlsma), my fellow presenter Rhys Oxenham, and especially everyone who attended and brought such excellent questions. I am confident that this was the first of many productive meetings for this group.


Hadoop on OpenStack

The latest project to be incubated in OpenStack for the Icehouse (2014.1) release cycle is Savanna, which provides MapReduce as a service using Apache Hadoop. Savanna was approved for incubation by the OpenStack Technical Committee in a vote last night.

In what is becoming a recurring theme, much of the discussion centred around potential overlap with other programs—specifically Heat (orchestration) and Trove (database provisioning). The main goal of Savanna should be to provide a MapReduce API, but in order to do so it has to implement a cluster provisioning service as well.

The Savanna team have done a fair amount of work to determine that Hadoop is too complex a workload for Heat to handle at present, but they have not approached the Heat team about closing the gap. That is unfortunate, because we are currently engaged in an effort to extend Heat to more complex workloads, and Hadoop is a canonical example of the kind of thing we would like to support. (It is doubly unfortunate, given that the obstacles cited appear comparatively minor.) This will have to change, because there was universal agreement that Savanna should move to integrating with Heat rather than rolling its own orchestration.

The final form of any integration with Trove, however, remains unclear. The Savanna team maintain that there is no overlap because Trove provides a database as a service and Hadoop is not a database, but this is too glib for my liking. Trove is essentially a pretty generic provisioning service, and while its user-facing function is to provision databases, that would be a poor excuse for maintaining multiple provisioning implementations in OpenStack. And, while it would be wrong to describe Hadoop as a database per se, it would be fair to say that Hadoop has a database. Trove is already planning a clustering API. In my opinion, the two teams will need to work together to come up with a common implementation, whether in the form of a common library, a common service or a direct dependency on Trove.

The idea of allowing Savanna to remain part of the wider OpenStack ecosystem without officially adopting it was, of course, considered. Hadoop can be considered part of the Platform rather than the Infrastructure layer, so naturally there was inquiry into whether it makes sense for OpenStack to anoint that particular platform rather than implement a more generic service (though it is by no means clear that the latter is feasible). Leaving aside that Amazon already implements Hadoop in the form of its Elastic MapReduce service, the Hadoop ecosystem is so big and diverse that worrying about locking users in to it is a bit like worrying about OpenStack locking users in to Linux. It does, of course, but there is still a world of choice there.

The final source of differing opinions simply related to timing. Some folks on the committee felt that an integration plan for Heat and/or Trove should be developed prior to accepting Savanna into incubation. Incubation confers integration with the OpenStack infrastructure (instead of StackForge) and Design Summit session slots, both of which would be highly desirable. The Technical Committee’s responsibility is to bless a team and a rough scope, so the issue was whether the latter is sufficiently clear to proceed.

This objection was overcome, as the committee voted to accept Savanna for incubation, albeit by a smaller margin than some previous votes. The team now have their work cut out to integrate with the other OpenStack projects, and nobody should be surprised if Savanna ends up remaining in incubation through the J cycle. Nonetheless, we welcome them to the OpenStack family and look forward to working with them before and during the upcoming Summit to develop the roadmap for integration.


Application Definition with Heat

Steve Hardy (OpenStack Heat PTL) and I gave a talk today about the past, present and future of defining cloud applications with Heat. Since this may be of general interest to the OpenStack community, we are making the handout available for download.

An introduction to Heat templates, how they are used to define the configuration—particularly the software configuration—of an application and future plans for the template format.


OpenStack Icehouse Incubation Roundup

The OpenStack Technical Committee met last night to consider the status of incubated projects for the upcoming Icehouse cycle. As a result, database provisioning will be an official part of the Icehouse (2014.1) release, while message queues and bare-metal server provisioning will be in incubation with a view to becoming official in the as-yet-unnamed J (2014.2) release. (Update 2013-09-25: MapReduce as a service will also be incubated.)


First up was the Marconi project, which was applying for incubation. Marconi is a message queue service that is comparable in scope to Amazon’s SQS. This application was discussed in detail at the meeting last week ahead of the final discussion and vote yesterday. Interestingly, the need for a queue service was readily accepted even by some folks who are on record as favouring a very limited scope for OpenStack. We certainly have use cases waiting for something like this in Heat already.

Nevertheless, Marconi is in some ways breaking new ground for OpenStack because it is a complete implementation of a message queue in Python with plug-in backends for storage and transport. (In contrast, a project like Nova is effectively just an API to an underlying hypervisor that does the heavy lifting.) Message queues in general are difficult, complex and performance-sensitive pieces of software, so it’s not obvious that we want to commit the OpenStack project to maintaining one. To mitigate this the API has been designed to be flexible enough to allow a different implementation based on AMQP, and plugins using RabbitMQ and Proton are planned for the future. Of course, the success of the API design in this respect remains unproven for now.

Another topic of discussion was that the only production-capable storage plugin currently available is for MongoDB, which is licensed under the AGPL. (An SQLite plugin is used for testing.) Although the Python client library is permissively licensed—meaning the OpenStack Foundation’s obligation to distribute code only under the Apache License 2.0 is not even in question—this does not solve the problem of MongoDB itself being a required component in the system. The biggest drawback I see with the AGPL is that it effectively expands the group of stakeholders who need to understand and be comfortable with the implications of the license from all distributors of the code (hard) to all users of the code (intractable). The committee resolved this issue by indicating that a second storage plugin option would likely be considered a condition of graduation from incubation.

In the final vote, Marconi was accepted into incubation by a comfortable margin (11 in favour, 4 abstentions). I cannot speak for the committee members, but for me the project benefits from the presumption of competence that comes from being developed entirely in the open from the very beginning without anybody in the community coming forward to say that it’s a terrible idea. Anyone else looking to get a project accepted into incubation should take note.


The next thing on the agenda was a review of the Trove project (formerly known as RedDwarf), which was incubated in the Havana cycle, for graduation. Trove is a database provisioning service for both relational and non-relational databases.

One requirement imposed by the committee was that Trove provide an option to use Heat for orchestration instead of direct calls to Nova and Cinder. Progress on this has been good (a working patch is up for review), and Trove appears well-placed to take advantage of this feature in the future.

In future, projects will also be required to provide integration tests for the Tempest suite before graduation. This is a new requirement, though, and is not being strictly enforced retroactively. So while Trove is not yet integrated with Tempest, the existence of continuous integration tests that can be integrated with Tempest during the Icehouse cycle was considered sufficient.

The committee voted in favour of graduation, and Trove will be an official part of the OpenStack Icehouse release.


The final project to be reviewed was Ironic, the bare-metal provisioning service that was split out of Nova into incubation at the beginning of the Havana cycle. No vote was held, because the development team indicated that the project is not ready to graduate. However, it is still progressing and neither was there any interest in curtailing it. Therefore, Ironic will remain in incubation for the Icehouse cycle.


How Heat Orchestrates your OpenStack Resources

One of the great things about orchestration is that it automatically figures out the operations it needs to perform. So whenever you deploy (or modify) your infrastructure you need not write a script to do so; you simply describe the infrastructure you want and the orchestration engine does the rest. I am often asked about how this works in Heat, so here is a quick run-down.

The first thing to note is that the order in which you define resources in your template is completely ignored. Instead, Heat builds a dependency graph based on the relationships between resources expressed in the template. Dependencies are inferred in three ways.

Firstly, any resource whose body uses Ref to refer to another resource is deemed to be dependent on that resource. So if one resource (say, a Nova server) has a property that refers to another resource (say, a Cinder volume), then those resources will always be created in the correct order.
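
For example, here is a minimal sketch (the resource names are invented, and the flavour and image are borrowed from the wait-condition example below) in which a server attaches a Cinder volume using the CloudFormation-compatible resource types. The Ref to MyVolume guarantees that the volume is created before the server:

Resources:
  MyVolume:
    Type: AWS::EC2::Volume
    Properties:
      Size: 1
      AvailabilityZone: nova

  MyServer:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: m1.small
      ImageId: F18-x86_64-cfntools
      # The Ref below makes MyServer depend on MyVolume
      Volumes:
      - Device: /dev/vdb
        VolumeId: {"Ref": "MyVolume"}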

A similar story applies to Fn::GetAtt. Say you have a database server and a separate application server in your template. You might use {"Fn::GetAtt": ["DBServer", "PrivateIp"]} in the UserData property of the application server to get the IP address of the database server. Of course the IP address is not known until the database server has been created, so this too adds a dependency to ensure things always happen in the right order.
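
In template form, that might look something like the following sketch (DBServer and AppServer are hypothetical names; the attributes available depend on the resource type):

Resources:
  DBServer:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: m1.small
      ImageId: F18-x86_64-cfntools

  AppServer:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: m1.small
      ImageId: F18-x86_64-cfntools
      UserData:
        Fn::Base64:
          Fn::Join:
          - ""
          - - "#!/bin/bash -v\n"
            - "echo DB_HOST="
            # Not known until DBServer is created, hence the dependency
            - {"Fn::GetAtt": ["DBServer", "PrivateIp"]}
            - " >> /etc/app.conf\n"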

Finally, you can use the DependsOn entry in a resource definition to explicitly add a dependency on another resource.

In the Grizzly (2013.1) version of Heat, resources were created serially. We simply performed a topological sort of the dependency graph and started the resources in that order. In the current development stream and the upcoming Havana (2013.2) release, however, Heat will create resources in parallel. Each resource will be created as soon as all of its dependencies are complete.
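
To make the last two points concrete, here is a sketch (resource names invented, flavour and image as in the example below). Grizzly would create all three servers serially, in topologically sorted order; Havana starts DBServer and CacheServer in parallel, with AppServer following as soon as DBServer is complete because of its explicit DependsOn entry:

Resources:
  DBServer:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: m1.small
      ImageId: F18-x86_64-cfntools

  # Independent of DBServer, so in Havana both start at the same time
  CacheServer:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: m1.small
      ImageId: F18-x86_64-cfntools

  # Waits for DBServer in both releases
  AppServer:
    Type: AWS::EC2::Instance
    DependsOn: DBServer
    Properties:
      InstanceType: m1.small
      ImageId: F18-x86_64-cfntools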

No discussion of dependency ordering would be complete without a word on what ‘complete’ means. Heat considers the creation of a resource complete as soon as the underlying OpenStack API reports that it is. So in the case of Nova, for instance, a server resource will be marked complete as soon as Nova has finished building it—i.e. as soon as it starts booting. The stack itself is marked complete as soon as all of its constituent resources are.

To allow you to wait for a server to boot and be configured, CloudFormation provides (and Heat supports) a WaitCondition resource type. Calling the cfn-signal script from within the server signals success or failure. You want the timeout period to start when the server starts booting, so you’ll indicate that the wait condition DependsOn the server. If another resource requires software on the server to be configured before it can be created, simply mark that it DependsOn the wait condition.

Putting all of that together, here is a quick example that creates a Nova server and waits for it to boot before marking the stack complete:

Resources:
  # The handle provides the pre-signed URL that cfn-signal posts to
  MyWaitHandle:
    Type: AWS::CloudFormation::WaitConditionHandle

  MyServer:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: m1.small
      ImageId: F18-x86_64-cfntools
      # The boot script signals success once it has run
      UserData:
        Fn::Base64:
          Fn::Join:
          - "\n"
          - - "#!/bin/bash -v"
            - "PATH=${PATH}:/opt/aws/bin"
            - Fn::Join:
              - ""
              - - "cfn-signal -e 0 -r Done '"
                - {"Ref" : "MyWaitHandle"}
                - "'"

  # DependsOn ensures the timeout clock starts only when the server does
  MyWaitCondition:
    Type: AWS::CloudFormation::WaitCondition
    DependsOn: MyServer
    Properties:
      Handle: {"Ref": "MyWaitHandle"}
      Timeout: 300

Non-Relational Database-as-a-Service in OpenStack

The OpenStack Technical Committee voted last night to expand the scope of the Trove program, which is currently in incubation, to encompass non-relational as well as relational databases.

Trove (formerly known as RedDwarf) is a provisioning service for database servers that abstracts away the administrative parts of running the server, ‘including deployment, configuration, patching, backups, restores, and monitoring’. (In other words, it is comparable to Amazon’s RDS.) It was originally envisaged to encompass both relational and non-relational databases, but the Technical Committee limited the scope to relational databases only when it was accepted for incubation, pending a proof-of-concept to allow them to assess the technical impact of supporting both. The minimal size of the resulting Redis implementation made a compelling case not to exclude non-relational databases any longer.

One trade-off to keep in mind when making such a decision is that maximising flexibility for users in the present does not always maximise flexibility for users in the future. A database being relational is not undesirable in itself; rather, trading away that valuable feature allows us to gain a different valuable feature: much easier scaling across multiple machines. This opens up an opportunity for a non-relational Database-as-a-Service that abstracts away the underlying servers entirely. Trove has no interest in being that service, so it is important that the existence of Trove does not eliminate the incentive for anybody else to create it.

Happily, Michael Basnight (the Trove PTL) has agreed to amend Trove’s mission statement to reflect the fact that it is a database provisioning service. In my opinion this offers the best of both worlds—flexibility now and keeping the door open to innovation in the future—and I was happy to support the change in scope.

My hope is that anybody contemplating a data-only non-relational DBaaS API will see this decision as confirmation that there is still a place for such a thing in OpenStack. I would also strongly encourage them to build it using Trove under the hood for server provisioning. I expect that some future Technical Committee would require this, much as Trove was required to use Heat instead of hand-rolled orchestration as a condition of graduation from incubation.

Speaking of which, the Technical Committee is planning to assess Trove in September for possible graduation in time for the Icehouse (2014.1) release cycle.


An Introduction to OpenStack Orchestration

The forthcoming Havana (2013.2) release of OpenStack will mark the debut of orchestration as part of the official release. It arrives in the form of the Heat project, which kicked off in early 2012 and graduated from incubation into OpenStack proper after the Havana Design Summit in April this year. Heat is so named because that’s what raises clouds, and it isn’t an acronym for anything, so y’all can give your caps lock keys a well-earned rest now.

The term ‘orchestration’ is often a source of confusion, but you can think of it as configuration management for infrastructure. It’s about symbolically representing various pieces of infrastructure (known as resources) and the relationships between them directly, rather than modelling only the actions you can perform on them. It’s swapping imperative programming for declarative programming; verbs for nouns. When you need to make a change to your infrastructure (including the initial deployment), instead of doing so manually or writing a single-use script to manipulate it, you write a template that describes the end result and hand it off to the orchestration engine to make the required changes.

Heat began life as an attempt to provide compatibility with AWS’s CloudFormation service. That remains an important goal—we want to give users who have invested in creating CloudFormation templates the choice to run them on an open cloud—but Heat has evolved into much more. In addition to the CloudFormation compatibility API and templates, Heat also sports an OpenStack-native REST API, native representations of OpenStack resources, and a richer template format.

One of the big advantages of orchestration is the ability to treat infrastructure as code and take advantage of all the tools that that entails. You can check the source code for your infrastructure into version control alongside the source code for your software. That is one of the reasons Heat uses YAML for templates rather than JSON (as CloudFormation does) or, worse, XML—it’s easier to read, write and diff. If you want to write software to manipulate templates automatically, that remains equally easy, since YAML syntax is a strict superset of JSON and templates can be translated with full fidelity in both directions.
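
For instance, here is a trivial resource (the names are made up for illustration) in the JSON form that CloudFormation uses:

{"Resources": {"MyServer": {"Type": "AWS::EC2::Instance",
               "Properties": {"InstanceType": "m1.small"}}}}

Because YAML is a superset of JSON, that is already a valid YAML template; the idiomatic YAML equivalent is simply:

Resources:
  MyServer:
    Type: AWS::EC2::Instance
    Properties:
      InstanceType: m1.small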

There are numerous example templates available in the Heat templates repository on GitHub. The simplest examples involve spinning up a Nova server running WordPress and MySQL; others add in more resource types to demonstrate how to integrate them.

With the Havana release coming up in October, now is a great time to start investigating how orchestration can simplify your workflow and make your deployments more repeatable. Check out the getting started guides and let us know on Ask OpenStack if you have any questions.