The business environment has never been more competitive and disruptive than it is today. Businesses need to come to terms with three realities:
Just ask Kodak, which has seen the camera business transform from a standalone device into a feature of every mobile phone, while new players like Snapfish, Shutterfly, and Chatbooks create new ways of engaging with markets. If you don’t have a way of continually developing new competitive advantages, you will not be relevant for long.
Bank of America is not just a bank; it is a transaction processing company. Exxon Mobil is not only an oil and gas company; it is a GIS company. With each passing day, Walgreens’ business relies more on electronic health records.
Ten years ago, if I asked you who the biggest competitive threats to FedEx were, names like UPS and DHL might have come to mind. Increasingly, FedEx, UPS, and DHL face threats from Uber, Walmart, Amazon, and others who may enter the logistics market with new ways of reaching customers.
What do businesses need to do given these three realities?
To paraphrase Mark Zuckerberg, they need to “move fast and be stable.” In practice, moving fast and being stable means developing new services more quickly, in a way that lets them scale to meet fast-growing demand if needed, while keeping the cost of failure extremely low should they not work. In other words, cheap experiments need to be able to become global successes.
The scientists conducting these cheap experiments are software developers. Lines of business naturally turn to their development teams to request new services at an ever faster rate. The problem is that developers can’t obtain environments from operations fast enough, because traditional processes and inflexible infrastructure and applications stand in the way. It’s no surprise, then, that according to a 2012 McKinsey and Company study, software delivery in the enterprises surveyed ran 45% over budget and 7% over schedule, and delivered 56% less value than expected.
This is no secret to businesses and they are looking to new methods and designs to help improve these metrics. In fact:
Businesses are turning to new development and operations processes, new cloud infrastructures, and application methodologies suited to these processes and infrastructures. One of the leaders in public cloud, Amazon Web Services, uses these same principles and designs to release as often as once every 12 seconds, with a very low rate of outages caused by those releases.
At first glance, it would appear that enterprises could simply yell “DevOps and Cloud to the Rescue!” and solve their problem of deploying faster on scalable infrastructure, but the reality is far from that. Enterprises have existing assets and investments, and many of these are not going away anytime soon. In fact, existing systems and processes most likely power the very core of the business; they cannot simply be replaced overnight, nor would they fit the paradigm of moving quickly and experimenting. Gartner coined the term Bimodal to describe this approach of two modes of IT delivery – one focused on agility and speed and the other on stability and accuracy.
Gartner has also recognized an approach that would allow enterprises to maximize the use of their existing assets. In its research “DevOps in the Bimodal Bridge,” Gartner suggests applying the patterns and practices of DevOps to existing assets (mode 1) to make them more agile and efficient.
I have observed this trend, and I believe most organizations are trying to address four key problems across their emerging bimodal world.
In mode 1, they are looking to increase relevance and reduce complexity. To increase relevance, they need to deliver environments to developers in minutes instead of days or weeks. To reduce complexity, they need to implement policy-driven automation that reduces the need for manual tasks.
In mode 2, they are looking to improve agility and increase scalability. To improve agility, they need more agile development and operations processes and new application architectures that allow greater rates of change by decreasing dependencies. To increase scalability, they need infrastructure that uses an asynchronous design and is entirely API-driven, changing the admin-to-host ratio from a linear model to an exponential one.
In order to make these examples more concrete, let’s look at each of them in more detail.
Increasing Relevance by Accelerating Service Delivery
Delivering development and test environments to developers in many enterprises generally starts with either a request to a service management system or a tap on the shoulder of a system administrator, depending on the size of the organization and the maturity of the IT department. Either way, once a request enters a service management system, many teams often need to perform tasks to deliver the environment to the developer. These might include virtual infrastructure administrators, systems administrators, and security operations. In larger organizations you could expect to see disaster recovery teams, networking teams, and many others involved in this process too. Again, depending on the maturity of the organization, coordination can range from taps on shoulders to tickets passed around in a service management system.
At best, each team takes minutes or hours to respond and perform some manual tasks, and the person who requested the service often has to answer follow-up questions (“Are you sure you need 16GB of RAM?”, “What version of Java do you need for this?”). The result is many highly skilled people spending a lot of time, and very slow delivery of the environment to the developer. Multiply this by the number of developers in an organization and the number of requests for environments, and you can understand why traditional IT processes and systems are struggling to maintain relevance.
A solution to this problem is to introduce a service designer into the process (you may be familiar with this role from ITIL) who enables self-service consumption of everything developers need. The designer works with all stakeholders, including virtual infrastructure administrators, system administrators, and security operations, to gather requirements. The designer then builds the necessary configuration management content and couples it with a service catalog item. By invoking this catalog item, the environment can be deployed automatically across any number of providers, including virtualization providers and private or public clouds.
The result of this solution is that all the teams responsible for delivering an environment are now free to do more valuable work (like working with development to design operations processes that are part of development instead of being bolted on afterward). It also removes human error from the equation and, most importantly, delivers the environment in significantly less time. We have seen upwards of a 95 percent improvement in delivery times at many of our customers.
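A minimal Python sketch of this self-service pattern follows. It is a hypothetical model, not the data format of any particular service catalog product: `CatalogItem`, `provision_request`, the provider names, and the role names are all illustrative assumptions. It shows the key idea that the follow-up questions (how much RAM, which Java version) are answered once by the service designer and stored in the catalog item, which can then be deployed to any supported provider.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogItem:
    """Hypothetical catalog item: answers captured once by the service designer."""
    name: str
    cpus: int
    memory_gb: int
    # Configuration-management content agreed with each stakeholder team,
    # applied in this order after the machine is created.
    config_roles: list = field(default_factory=list)


# Hypothetical provider names used only for this illustration.
SUPPORTED_PROVIDERS = {"virtualization", "private_cloud", "public_cloud"}


def provision_request(item: CatalogItem, provider: str) -> dict:
    """Build the deployment request for one provider (no real API is called)."""
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider}")
    return {
        "provider": provider,
        "name": item.name,
        "cpus": item.cpus,
        "memory_gb": item.memory_gb,
        "roles": list(item.config_roles),
    }


# The designer settles "how much RAM?" and "which Java?" once, in the item.
java_dev = CatalogItem("java-dev-env", cpus=2, memory_gb=8,
                       config_roles=["base_hardening", "openjdk_8", "security_agent"])

# The same item can be deployed to any supported provider without new tickets.
for target in ("virtualization", "public_cloud"):
    print(provision_request(java_dev, target))
```

Because the configuration content travels with the catalog item rather than with a ticket, supporting another provider means extending the set of supported providers, not re-interviewing every stakeholder team.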
Reducing Complexity by Optimizing IT
Speeding up delivery of environments to developers or end users is a great way to make IT more relevant, but much of IT’s time goes to the day-to-day management of those environments. If IT is spending so much time on day-to-day tasks, how can we expect it to deploy the next generation of scalable and programmable infrastructure, or to work with development teams during the early stages of development to increase agility?
I have found that many virtual infrastructure administrators spend time on several common tasks that should be largely automated through policy.
The first is workload placement. Often one virtual infrastructure cluster is running hot while another is completely cold. Operations teams are then inundated with calls from the owners of applications on the hot cluster asking why response times are poor. Automating this balancing through control policies can alleviate the problem and free virtual infrastructure administrators for other work.
Next is the ability to quickly move workloads between different infrastructures. This has become increasingly important as organizations look to adopt scale-out IaaS clouds. Operations leadership realizes that if they can identify workloads that do not need to run on (typically) more expensive virtual infrastructure, they could save money by moving those workloads to their IaaS private cloud. This migration is typically a manual process, and it is difficult even to understand which workloads can be moved. With a systematic, automated way of identifying and migrating workloads, enterprises can save time and move workloads quickly to reduce costs.
Yet another issue is ensuring that compliance and governance requirements are met, particularly for workloads running on new infrastructures like an OpenStack-based private cloud. Not knowing which users, groups, data, applications, and packages reside on systems running across a heterogeneous mix of infrastructure presents a large risk, and operations teams often have the responsibility and obligation to minimize it. By introspecting workloads across platforms, operations teams can gain insight into exactly which users, data, and packages are on their systems, and can use the migration capabilities mentioned previously to make sure systems run on appropriate providers.
Finally, since IT has often become a broker of public cloud services, it is important that it can account for costs and place workloads in appropriate public cloud regions to control costs while maintaining service levels for end users. If developers are based in Singapore, we should use public cloud infrastructure in that location instead of deploying to more expensive, higher-latency public cloud infrastructure in Tokyo.
By implementing policy-based automation, our customers have seen large improvements in resource utilization and a reduction in CapEx and OpEx per managed workload.
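As an illustration of such a control policy, the sketch below plans moves from a hot cluster to a cold one. The greedy strategy (move the smallest VM first), the 20% threshold, and the assumption that clusters are the same size are simplifications chosen for this example; real placement engines weigh many more factors, such as memory, affinity rules, and migration cost.

```python
def rebalance(utilization, vm_loads, threshold=0.20):
    """Hypothetical placement policy: move the smallest VMs from the hottest
    cluster to the coldest until their utilization gap is within `threshold`.

    utilization: cluster name -> utilization (0.0 to 1.0)
    vm_loads: cluster name -> {vm name: utilization that VM contributes}
    Assumes equally sized clusters, so a VM adds the same share wherever it runs.
    Returns the list of planned (vm, source, target) moves; inputs are not modified.
    """
    util = dict(utilization)
    vms = {cluster: dict(loads) for cluster, loads in vm_loads.items()}
    moves = []
    while True:
        hot = max(util, key=util.get)
        cold = min(util, key=util.get)
        gap = util[hot] - util[cold]
        if gap <= threshold or not vms.get(hot):
            break
        vm, load = min(vms[hot].items(), key=lambda kv: kv[1])
        if load >= gap:
            # Moving this VM would not narrow the gap between the two clusters.
            break
        del vms[hot][vm]
        vms.setdefault(cold, {})[vm] = load
        util[hot] -= load
        util[cold] += load
        moves.append((vm, hot, cold))
    return moves


# Hypothetical clusters and per-VM loads.
clusters = {"cluster-a": 0.90, "cluster-b": 0.30}
loads = {
    "cluster-a": {"web-1": 0.10, "db-1": 0.40, "app-1": 0.15},
    "cluster-b": {"batch-1": 0.30},
}
print(rebalance(clusters, loads))
```

Each planned move strictly narrows the gap between the two clusters involved, so the plan cannot oscillate by shuffling the same VMs back and forth.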
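One way to make the identification step systematic is to express it as a policy over workload attributes. The sketch below is hypothetical: the attributes (`needs_live_migration`, `needs_shared_storage_ha`, `scale_out`) and the per-vCPU monthly costs are assumptions for illustration, not a documented CloudForms policy.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical workload attributes used by the migration policy."""
    name: str
    vcpus: int
    needs_live_migration: bool
    needs_shared_storage_ha: bool
    scale_out: bool  # the application tolerates losing an individual instance


def migration_candidates(workloads, virt_cost_per_vcpu, iaas_cost_per_vcpu):
    """Hypothetical policy: a workload may move to the IaaS private cloud when it
    is built to scale out and does not rely on scale-up virtualization features.
    Returns (candidate names, estimated monthly savings)."""
    names, savings = [], 0
    for w in workloads:
        if w.scale_out and not (w.needs_live_migration or w.needs_shared_storage_ha):
            names.append(w.name)
            savings += w.vcpus * (virt_cost_per_vcpu - iaas_cost_per_vcpu)
    return names, savings


inventory = [
    Workload("web-frontend", 8, False, False, True),
    Workload("erp-db", 16, True, True, False),
    Workload("batch-worker", 4, False, False, True),
]
# Monthly cost per vCPU on each platform: illustrative figures only.
print(migration_candidates(inventory, virt_cost_per_vcpu=40, iaas_cost_per_vcpu=25))
```

In this example the two scale-out workloads qualify, and their 12 vCPUs at a saving of 15 per vCPU give an estimated 180 per month in the example's cost units.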
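The sketch below shows, in hypothetical form, what a compliance policy over introspected data might check: users that are not on an approved list and packages that are banned. The data layout, host names, and policy lists are assumptions; in practice the facts would come from workload introspection rather than a hand-written dictionary.

```python
def compliance_findings(systems, allowed_users, banned_packages):
    """Hypothetical policy check over introspected system data.

    systems: hostname -> {"users": set of user names, "packages": set of package names}
    Returns a sorted list of (hostname, finding) tuples.
    """
    findings = []
    for host, facts in systems.items():
        # Users present on the system but not on the approved list.
        for user in facts["users"] - allowed_users:
            findings.append((host, f"unexpected user: {user}"))
        # Installed packages that policy forbids.
        for pkg in facts["packages"] & banned_packages:
            findings.append((host, f"banned package: {pkg}"))
    return sorted(findings)


# Hypothetical introspection results from two providers.
inventory = {
    "osp-vm-01": {"users": {"root", "appsvc", "jdoe"}, "packages": {"httpd", "telnet-server"}},
    "rhev-vm-07": {"users": {"root"}, "packages": {"httpd"}},
}
for host, finding in compliance_findings(inventory, {"root", "appsvc"}, {"telnet-server"}):
    print(host, finding)
```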
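The Singapore example amounts to a constrained choice: among the regions that meet a latency target for the users’ location, pick the cheapest. A minimal sketch follows; the region names, hourly prices, and latencies are invented for illustration and are not real provider figures.

```python
def choose_region(regions, user_location, max_latency_ms):
    """Hypothetical placement policy: pick the cheapest region whose latency
    from the users' location stays within the service level.

    regions: region name -> {"hourly_cost": float, "latency_ms": {location: ms}}
    """
    eligible = [
        (spec["hourly_cost"], name)
        for name, spec in regions.items()
        if spec["latency_ms"].get(user_location, float("inf")) <= max_latency_ms
    ]
    if not eligible:
        raise LookupError(f"no region within {max_latency_ms} ms of {user_location}")
    return min(eligible)[1]


# Illustrative names, prices, and latencies; not real provider figures.
regions = {
    "ap-singapore": {"hourly_cost": 0.10, "latency_ms": {"singapore": 5}},
    "ap-tokyo": {"hourly_cost": 0.12, "latency_ms": {"singapore": 70}},
    "us-west": {"hourly_cost": 0.08, "latency_ms": {"singapore": 170}},
}
print(choose_region(regions, "singapore", max_latency_ms=50))
```

Here `us-west` is the cheapest region overall but is rejected because it misses the 50 ms target; relaxing `max_latency_ms` to 200 would select it.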
Improving Agility by Modernizing Development and Operations
With resources now freed from handling every inbound request for an environment, and confident that those environments are running efficiently and securely on the right providers, operations teams can begin working with development teams to design new processes for their cloud native applications.
These newly designed processes and cross-functional team structures, combined with a platform that supports the broadest range of languages and frameworks within microservices-based architectures, will enable development and operations teams to achieve higher release frequencies. Because they use microservices and standardized platforms and configurations, these new applications allow components to be released and scaled independently.
This results in a higher change success rate, faster cycle times, and the ability to scale specific services independently, making life easier for both development and operations teams and allowing them to meet the needs of the line of business. We have experience doing this with very large software development organizations.
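Independent scaling means each service’s replica count is computed from that service’s own load. The sketch below borrows the Kubernetes Horizontal Pod Autoscaler formula, ceil(currentReplicas * currentMetric / targetMetric), purely as an illustration; the service names and utilization figures are hypothetical, and the real autoscaler also applies a tolerance band and minimum/maximum replica bounds that are omitted here.

```python
def desired_replicas(current_replicas, current_pct, target_pct):
    """Replica count from the Kubernetes HPA formula
    ceil(current_replicas * current_metric / target_metric), computed with
    integer percentages and integer ceiling division so the result is exact."""
    return -((-current_replicas * current_pct) // target_pct)


# Hypothetical microservices: each is scaled from its own CPU utilization only.
services = {
    "catalog": {"replicas": 4, "cpu_pct": 90},
    "checkout": {"replicas": 2, "cpu_pct": 30},
    "search": {"replicas": 3, "cpu_pct": 60},
}
TARGET_PCT = 60

plan = {name: desired_replicas(s["replicas"], s["cpu_pct"], TARGET_PCT)
        for name, s in services.items()}
print(plan)
```

The busy `catalog` service scales out while `checkout` scales in, and neither change requires redeploying the other services.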
Scalability with Programmable Infrastructure
As the agility of development and operations processes improves and release frequency increases, so too does the demand for more scalable infrastructure to run those releases on. Operations teams face the challenge of delivering infrastructure that will scale to meet the demand of an ever-growing number of applications. The last thing the head of operations wants to explain to the company’s management team is why an extremely successful new application hit a wall on the maximum number of users it could support. This simply can’t happen. Unfortunately, the current infrastructure is not scalable, from either a financial or a technical standpoint.
One option might be to build out a scale-out infrastructure, perhaps based on OpenStack, the leading open source project for infrastructure-as-a-service. However, the operations team doesn’t want to spend its time taking open source code and making it consumable and sustainable for the enterprise. It doesn’t have the resources to test and certify that OpenStack will work with each new piece of hardware it brings in, and it can’t afford to maintain the code base for long periods with the resources available. Finally, OpenStack is missing key features that operations needs, and the team doesn’t want to develop those in-house either.
What operations really needs is a way to minimize cost and increase scale by using commodity hardware and a massively scalable distributed architecture, coupled with the enterprise management features required to operate that infrastructure and a stable, tested, certified way of consuming the open source projects that make it up. With this, operations can deploy scale-out infrastructure in multiple locations and still aggregate management functions like chargeback, utilization, governance, and workflows in a single logical location. Many of our customers have found this approach beneficial in reducing cost and ensuring stability at scale.
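Aggregating a management function like chargeback across locations can be as simple as summing per-site usage by department before pricing it. The sketch below is hypothetical: the record format, site names, and rates (in cents, to keep the arithmetic exact) are assumptions, not the chargeback model of any specific product.

```python
from collections import defaultdict


def aggregate_chargeback(site_usage, rates_cents):
    """Roll usage records from several sites into one report per department.

    site_usage: site name -> list of (department, resource, quantity) records
    rates_cents: resource -> price per unit, in cents (integers avoid rounding drift)
    Returns department -> total charge in cents, sorted by department.
    """
    totals = defaultdict(int)
    for records in site_usage.values():
        for department, resource, quantity in records:
            totals[department] += quantity * rates_cents[resource]
    return dict(sorted(totals.items()))


# Hypothetical usage collected at two scale-out sites.
usage = {
    "site-east": [("finance", "vcpu_hour", 1000), ("marketing", "vcpu_hour", 200)],
    "site-west": [("finance", "gb_ram_hour", 4000), ("marketing", "gb_ram_hour", 500)],
}
print(aggregate_chargeback(usage, {"vcpu_hour": 2, "gb_ram_hour": 1}))
```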
Introducing Red Hat Cloud Suite
Red Hat Cloud Suite is a family of suites that brings together Red Hat’s award-winning products in a consistent way to solve specific problems. It allows IT to accelerate service delivery and optimize existing assets while building next-generation infrastructure and application platforms that support massive scalability and more agile development and operations processes. In other words, it meets them where they are and lays the foundation for where they want to go.
A Different Approach
It should come as no surprise that Red Hat is not the only company solving these problems. Red Hat is, however, one of the few companies that can solve all of them, because of its broad portfolio of technologies and expertise. Most people think of Red Hat as holding the largest share of the paid Linux market. That is true, but Red Hat has also been expanding its portfolio, growing and acquiring expertise and industry-leading technology ranging from Software Defined Storage to Mobile Development Platforms. In terms of depth of capability, these offerings place Red Hat alongside only Microsoft.
An Important Difference
Along with this depth of expertise and capability comes an approach that sets Red Hat apart. Red Hat is the only vendor that uses an open source development model for all of the solutions it delivers. This matters to customers because the world of cloud infrastructure, applications, and DevOps is built entirely on open source software. With a strict open-source-only approach, customers get access to the greatest amount of innovation and can be assured that, as technologies change, they can adopt them more easily, because Red Hat can adopt and deliver those technologies. Two great examples are how Red Hat adopted the KVM hypervisor and how it embraced and delivered its container platform with support for Docker and Kubernetes, leading open source projects that became popular in a short amount of time. Red Hat is so committed to the open source development model that it even creates communities when it acquires technologies that are not open source licensed. Customers should know that when they use a solution from Red Hat, it is based entirely on open source, leading to greater access to innovation and lower exit costs.
Technical Capabilities are Important Too
While philosophical differences are important for ensuring that the right long term decisions have been made, Red Hat is also at the forefront of innovation in cloud infrastructure, applications, and DevOps tools.
True Hybrid Support
The term hybrid cloud has often been overused and abused, but the concept is important. Enterprises need to be able to run workloads across the four major deployment models that exist today: physical, virtual, private cloud, and public cloud. Equally as important as the deployment model is the ability to support multiple service models, such as infrastructure-as-a-service, platform-as-a-service, or even bare metal, virtual machines running on scale-up virtual infrastructure, and public cloud services. When most vendors claim to support “hybrid” cloud, they are typically limited to managing hybrid deployment models. Red Hat supports both hybrid deployment models and hybrid service models. This is important to both development and operations teams. For developers, it means being able to develop on the broadest choice of languages and frameworks. They could use an Oracle database running on bare metal or a virtual machine, JBoss EAP running on virtual machines on OpenStack, and Node.js and Ruby running in containers on OpenShift. They are not constrained to a single service model that doesn’t give them everything they need.
Using Big Data to Optimize IT
Red Hat has been supporting Linux for a long time. In fact, we’ve been supporting Red Hat Enterprise Linux for over 13 years, since the release of RHEL AS 2.1 in 2002. There are over 700 Red Hat Certified Engineers in our support organization, and they’ve documented over 30,000 solutions while resolving over 1 million technical issues. The Red Hat Customer Portal has won many awards for connecting customers searching for a resolution to the right technical solution. With Red Hat Access Insights, Red Hat’s new predictive analytics service, connecting support data to recommendations becomes even easier. Users can send small amounts of data about their environment back to Red Hat, where it is compared to optimal configurations to find opportunities to improve security, reliability, availability, and performance. This service is already available for Red Hat Enterprise Linux and will soon be available for all the technologies in Red Hat’s portfolio through Red Hat Cloud Suite.
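Conceptually, comparing collected data with optimal configurations is a rules engine: each rule inspects facts about a system and, if triggered, yields a recommendation. The sketch below illustrates that idea only; the fact names and the three rules are hypothetical and are not Red Hat Access Insights’ actual rules or data format.

```python
# Hypothetical rules: (category, check over collected facts, recommendation).
RULES = [
    ("security", lambda f: f.get("selinux") != "enforcing",
     "Set SELinux to enforcing mode."),
    ("reliability", lambda f: f.get("kdump_enabled") is False,
     "Enable kdump so that kernel crashes can be diagnosed."),
    ("availability", lambda f: "chronyd" not in f.get("running_services", set()),
     "Run a time synchronization service such as chronyd."),
]


def evaluate(facts):
    """Return (category, recommendation) for each rule the collected facts trigger."""
    return [(category, advice) for category, check, advice in RULES if check(facts)]


# Hypothetical facts collected from one host.
host_facts = {"selinux": "permissive", "kdump_enabled": True, "running_services": {"sshd"}}
for category, advice in evaluate(host_facts):
    print(f"[{category}] {advice}")
```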
An Easy On-ramp and Consistent Lifecycle
Deploying a private cloud is not an easy task. The list of platforms that need to come together, from configuration management to storage to infrastructure-as-a-service to platform-as-a-service, is long, and each of them depends on sub-components of the others. For example, generating new Docker images requires secure content, which takes integration between the content management system and the image-building services. Literally hundreds of these integrations are needed to build a fully functional private cloud. This usually results in one of two options:
Neither of these options is optimal for IT. Red Hat Cloud Suite provides an easy on-ramp that allows a single person in operations to deploy a private cloud, and it provides a path for ongoing management of that private cloud. This lets developers begin using the private cloud sooner and helps operations deliver it more quickly.
A Quick Summary
Here is a quick summary for those who just want the short version.
The World is Changing
Red Hat Helps
Only Red Hat Delivers
DevOps, Open Source, and Business Agility: Lessons Learned from Early Adopters. An IDC InfoBrief, sponsored by Red Hat, June 2015.
I’m often asked for a more in-depth overview of Red Hat Cloud Infrastructure (RHCI), Red Hat’s fully open source and integrated Infrastructure-as-a-Service offering. To that end I decided to write a brief technical introduction to RHCI to help those interested better understand what a typical deployment looks like, how the components interact, what Red Hat has been working on to integrate the offering, and some common use cases that RHCI solves. RHCI gives organizations access to infrastructure and management to fit their needs, whether it’s managed datacenter virtualization, a scale-up virtualization-based cloud, or a scale-out OpenStack-based cloud. Organizations can choose what they need to run and re-allocate their resources accordingly.
RHCI users can deploy either Red Hat Enterprise Virtualization (RHEV) or Red Hat Enterprise Linux OpenStack Platform (RHEL-OSP) on physical systems, creating either a datacenter virtualization-based private cloud with RHEV or a private Infrastructure-as-a-Service cloud with RHEL-OSP.
RHEV comprises a hypervisor component, referred to as RHEV-H, and a manager, referred to as RHEV-M. Hypervisors leverage shared storage and common networks to provide common enterprise virtualization features such as high availability, live migration, etc.
RHEL-OSP is Red Hat’s OpenStack distribution that provides massively scalable infrastructure by providing the following projects (descriptions taken directly from the projects themselves) for use on one of the largest ecosystems of certified hardware and software vendors for OpenStack:
Nova: Implements services and associated libraries to provide massively scalable, on demand, self service access to compute resources, including bare metal, virtual machines, and containers.
Swift: Provides Object Storage.
Glance: Provides a service where users can upload and discover data assets that are meant to be used with other services, like images for Nova and templates for Heat.
Keystone: Facilitates API client authentication, service discovery, distributed multi-tenant authorization, and auditing.
Horizon: Provides an extensible, unified web-based user interface for all integrated OpenStack services.
Neutron: Implements services and associated libraries to provide on-demand, scalable, and technology-agnostic network abstraction.
Cinder: Implements services and libraries to provide on-demand, self-service access to Block Storage resources via abstraction and automation on top of other block storage devices.
Ceilometer: Reliably collects measurements of the utilization of the physical and virtual resources comprising deployed clouds, persists these data for subsequent retrieval and analysis, and triggers actions when defined criteria are met.
Heat: Orchestrates composite cloud applications using a declarative template format through an OpenStack-native ReST API.
Trove: Provides scalable and reliable Cloud Database as a Service functionality for both relational and non-relational database engines, and continues to improve its fully-featured and extensible open source framework.
Ironic: Produces an OpenStack service and associated Python libraries capable of managing and provisioning physical machines in a security-aware and fault-tolerant manner.
Sahara: Provides a scalable data processing stack and associated management interfaces.
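To see how several of these projects cooperate in a single operation, the sketch below builds, but does not send, the request for Nova’s create-server call (`POST /servers` in the Compute API). The authentication token would be issued by Keystone, the image would be registered in Glance, and the network would be managed by Neutron. The endpoint URL and all IDs are placeholders; in practice you would typically use the `openstack server create` command or the openstacksdk library instead of hand-built JSON.

```python
import json


def build_create_server_request(compute_endpoint, token, name, image_id, flavor_id, network_id):
    """Build (url, headers, body) for Nova's create-server call; nothing is sent.

    token:      issued by Keystone after authentication
    image_id:   an image registered in Glance
    flavor_id:  a Nova flavor (vCPU/RAM/disk size)
    network_id: a network managed by Neutron
    """
    url = f"{compute_endpoint.rstrip('/')}/servers"
    headers = {"X-Auth-Token": token, "Content-Type": "application/json"}
    body = {
        "server": {
            "name": name,
            "imageRef": image_id,
            "flavorRef": flavor_id,
            "networks": [{"uuid": network_id}],
        }
    }
    return url, headers, json.dumps(body)


# Placeholder endpoint, token, and IDs for illustration only.
url, headers, body = build_create_server_request(
    "https://compute.example.com/v2.1", "PLACEHOLDER-TOKEN",
    "web-1", "IMAGE-ID", "FLAVOR-ID", "NETWORK-ID")
print("POST", url)
print(body)
```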
Red Hat CloudForms, a Cloud Management Platform based on the upstream ManageIQ project, provides hybrid cloud management of OpenStack, RHEV, Microsoft Hyper-V, VMware vSphere, and Amazon Web Services. This includes rich self-service with workflow and approval, discovery of systems, policy definition, capacity and utilization forecasting, and chargeback, among other capabilities. CloudForms is deployed as a virtual appliance and requires no agents on the systems it manages. Its region and zone concepts allow for complex, federated deployments across large environments and geographies.
Red Hat Satellite is a systems management solution for managing the lifecycle of RHEV, RHEL-OSP, and CloudForms as well as any tenant workloads that are running on RHEV or RHEL-OSP. It can be deployed on bare metal or, as pictured in this diagram, as a virtual machine running on either RHEV or RHEL-OSP. Satellite supports a federated model through a concept called capsules.
CloudForms is a Cloud Management Platform that is deployed as a virtual appliance and supports a federated deployment. Like every component in RHCI, it is fully open source, and it is based on the ManageIQ project.
One of the key technical benefits CloudForms provides is unified management of multiple providers. CloudForms splits providers into two types. First, there are infrastructure providers such as RHEV, vSphere, and Microsoft Hyper-V. CloudForms discovers these systems and provides uniform information about their hosts, clusters, virtual machines, and virtual machine contents in a single interface. Second, there are cloud providers such as RHEL-OSP and Amazon Web Services. For these providers, CloudForms likewise provides discovery and uniform information about virtual machines, images, and flavors. All of this is done through the standard APIs provided by RHEV-M, SCVMM, vCenter, AWS, and OpenStack.
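The value of unified management comes from normalizing provider-specific records into one model. The sketch below is a hypothetical illustration of that idea, not the ManageIQ data model: the `VmRecord` type and adapter functions are invented, and the raw dictionaries are heavily simplified versions of what vCenter and OpenStack report (vSphere’s `poweredOn` power state and OpenStack’s `ACTIVE` server status).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmRecord:
    """Hypothetical provider-neutral record shown in a single interface."""
    provider: str
    provider_type: str  # "infrastructure" or "cloud"
    name: str
    power_state: str    # normalized to "on" or "off"


def from_vsphere(raw):
    # vSphere reports power states such as "poweredOn" and "poweredOff";
    # this simplification treats anything else (including suspended) as "off".
    return VmRecord("vsphere", "infrastructure", raw["name"],
                    "on" if raw["powerState"] == "poweredOn" else "off")


def from_openstack(raw):
    # OpenStack reports server status values such as "ACTIVE" and "SHUTOFF".
    return VmRecord("openstack", "cloud", raw["name"],
                    "on" if raw["status"] == "ACTIVE" else "off")


def unified_inventory(vsphere_vms, openstack_servers):
    """Merge records from both provider types into one normalized list."""
    return ([from_vsphere(vm) for vm in vsphere_vms]
            + [from_openstack(server) for server in openstack_servers])


# Simplified, hypothetical payloads.
inventory = unified_inventory(
    [{"name": "erp-db", "powerState": "poweredOn"},
     {"name": "old-app", "powerState": "poweredOff"}],
    [{"name": "web-1", "status": "ACTIVE"}],
)
running = [vm.name for vm in inventory if vm.power_state == "on"]
print(running)
```

Once records share one shape, questions like “what is running?” are answered the same way regardless of which provider hosts the workload.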
Red Hat Satellite provides common systems management among all aspects of RHCI.
Red Hat Satellite provides content management, allowing users to synchronize content such as RPM packages for RHEV, RHEL-OSP, and CloudForms from Red Hat’s Content Delivery Network to an on-premises Satellite, reducing bandwidth consumption and providing an on-premises control point for content management across complex environments. Satellite also provides configuration management via Puppet to ensure compliance and enforcement of proper configuration. Finally, Red Hat Satellite allows users to account for usage of assets through entitlement reporting and controls. Satellite provides these capabilities to RHEV, RHEL-OSP, and CloudForms, allowing administrators of RHCI to maintain their environment more effectively and efficiently. Equally important, Satellite also extends to the tenants of RHEV and RHEL-OSP, allowing systems management of Red Hat Enterprise Linux (RHEL) based tenants. Satellite is based on the upstream projects Foreman, Katello, Pulp, and Candlepin.
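Configuration management of this kind is based on desired-state enforcement: describe the intended configuration, then change only what has drifted. The generic Python sketch below illustrates that idempotent behavior; it is not Puppet’s API or manifest language, and the setting names are hypothetical.

```python
def converge(current, desired):
    """Generic desired-state sketch (not Puppet's API): compute and apply only
    the changes needed to reach the desired state.

    current, desired: setting name -> value. Returns (new_state, changes).
    Running it again on the result yields no changes (idempotence).
    """
    changes = {key: value for key, value in desired.items() if current.get(key) != value}
    return {**current, **changes}, changes


# Hypothetical settings on one managed system.
state = {"ntp_server": "old.example.com", "selinux": "enforcing"}
desired = {"ntp_server": "time.example.com", "selinux": "enforcing"}

state, changes = converge(state, desired)
print(changes)  # only the drifted setting is changed
state, changes = converge(state, desired)
print(changes)  # a second run finds nothing to do
```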
The combination of CloudForms and Satellite is very powerful for automating not only the infrastructure, but within the operating system as well. Let’s look at an example of how CloudForms can be utilized with Satellite to provide automation of deployment and lifecycle management for tenants.
The automation engine in CloudForms is invoked when a user orders a catalog item from the CloudForms self-service catalog. CloudForms communicates with the appropriate infrastructure provider (in the pictured case, RHEV or RHEL-OSP) to ensure that the infrastructure resources are created. At the same time, it ensures the appropriate records are created in Satellite so that the proper content and configuration will be applied to the system. Once the infrastructure resources (such as a virtual machine) are created, they are connected to Satellite, where they receive the appropriate content and configuration. Once this is complete, the service in CloudForms is updated with the appropriate information to reflect the state of the user’s request, giving the user access to a fully compliant system with no manual interaction during configuration. Ongoing updates of the virtual machine resources can be performed by the end user or by the Satellite administrator, depending on the customer’s needs.
This is another way of looking at how the functional areas of the workflow are divided in RHCI. Items such as the service catalog, quota enforcement, approvals, and workflow are handled in CloudForms, the cloud management platform. Even so, the cloud management platform uses infrastructure-specific mechanisms such as Heat templates, virtual machine templates, PXE, or even ISO-based deployment whenever possible. Finally, systems management provides further customization within the operating system itself that infrastructure-specific provisioning systems do not cover. With this approach, users can separate operating system configuration from the infrastructure platform, increasing portability. Likewise, operational decisions are decoupled from the infrastructure platform and placed in the cloud management platform, allowing for greater flexibility and increased modularity.
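The sequence just described can be sketched as a small orchestration function. `FakeProvider` and `FakeSatellite` are hypothetical stand-ins for RHEV or RHEL-OSP and for Satellite, and `order_catalog_item` and the host group name are illustrative; none of this is the CloudForms automation engine’s actual API. The sketch runs the steps one after another for clarity, whereas CloudForms handles the infrastructure request and the Satellite records at the same time.

```python
class FakeProvider:
    """Hypothetical stand-in for RHEV or RHEL-OSP; records the VMs it creates."""

    def __init__(self):
        self.vms = []

    def create_vm(self, name):
        self.vms.append(name)
        return {"name": name, "state": "running"}


class FakeSatellite:
    """Hypothetical stand-in for Satellite: host records, content, and configuration."""

    def __init__(self):
        self.hosts = {}

    def create_host_record(self, name, host_group):
        # The record determines which content and configuration the host receives.
        self.hosts[name] = {"host_group": host_group, "registered": False}

    def register_and_configure(self, name):
        # The new VM connects to Satellite and receives its content and configuration.
        self.hosts[name]["registered"] = True
        return "compliant"


def order_catalog_item(name, host_group, provider, satellite):
    """Hypothetical orchestration of one catalog order."""
    satellite.create_host_record(name, host_group)
    vm = provider.create_vm(name)
    status = satellite.register_and_configure(name)
    # Update the user-facing service with the outcome of the request.
    return {"service": name, "vm_state": vm["state"], "config_status": status}


provider = FakeProvider()
satellite = FakeSatellite()
print(order_catalog_item("dev-01", "rhel7-webserver", provider, satellite))
```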
Common management is a big benefit that RHCI brings to organizations, but it doesn’t stop there. RHCI is bringing together the benefits of shared services to reduce the complexity for organizations. Identity is one of the services that can be made common across RHCI through the use of Identity Management (IDM) that is included in RHEL. All components of RHCI can be configured to talk to IDM which in turn can be used to authenticate and authorize users. Alternatively, and perhaps more frequently, a trust is established between IDM and Active Directory to allow for authentication via Active Directory. By providing a common identity store between the components of RHCI, administrators can ensure compliance through the use of access controls and audit.
Similar to the benefits of shared identity, RHCI is bringing together a common network fabric for both traditional datacenter virtualization and infrastructure-as-a-service (IaaS) models. As part of the latest release of RHEV, users can now discover Neutron networks and begin exposing them to guest virtual machines (as a tech preview). By building a common network fabric, organizations can simplify their architecture: they no longer need to learn two different methods for creating and maintaining virtual networks.
Finally, image storage can now be shared between RHEV and RHEL-OSP. This means that templates and images stored in Glance can be used by RHEV. This reduces the storage required to maintain images and lets administrators update images in one store instead of two, increasing operational efficiency.
One often misunderstood area is which capabilities are provided by which components of RHCI. RHEV and OpenStack provide similar capabilities with different paradigms, focused on compute, network, and storage virtualization. Many of the capabilities often associated with a private cloud are found in the combination of Satellite and CloudForms. These include capabilities provided by CloudForms, such as discovery, chargeback, monitoring, analytics, quota enforcement, capacity planning, and governance. They also include capabilities for managing inside the guest operating system, in areas such as content management, software distribution, configuration management, and governance.
Often organizations are not certain about the best way to view OpenStack in relation to their datacenter virtualization solution. There are two common approaches that are considered. Within one approach, datacenter virtualization is placed underneath OpenStack. This approach has several negative aspects. First, it places OpenStack, which is intended for scale out, over an architecture that is designed for scale up in RHEV, vSphere, Hyper-V, etc. This gives organizations limited scalability and, in general, an expensive infrastructure for running a scale out IaaS private cloud. Second, layering OpenStack, a Cloud Infrastructure Platform, on top of yet another infrastructure management solution makes hybrid cloud management very difficult because Cloud Management Platforms, such as CloudForms, are not designed to relate OpenStack to a virtualization manager and then to underlying hypervisors. Conversely, by using a Cloud Management Platform as the aggregator between infrastructure platforms of OpenStack, RHEV, vSphere, and others, it is possible to achieve a working approach to hybrid cloud management and use OpenStack in the massively scalable way it is designed to be used.
RHCI is meant to complement existing investments in datacenter virtualization. For example, users often utilize CloudForms and Satellite to gain efficiencies within their vSphere environment while simultaneously increasing the cloud-like capabilities of their virtualization footprints through self-service and automation. Once users are comfortable with the self-service aspects of CloudForms, it is simple to supplement vSphere with lower cost or specialized virtualization providers like RHEV or Hyper-V.
This can be done by using the virt-v2v tools (shown as option 1 in the diagram above), which automatically perform binary conversion of images from vSphere to other platforms. Another approach is to standardize environment builds within Satellite (shown as option 2 in the diagram above) so that workloads are portable when they are created. Both methods are supported, depending on an organization's specific requirements.
For scale-out applications running on an existing datacenter virtualization solution such as VMware vSphere, RHCI provides organizations with the tools to identify (discover) workloads and move them (through automated v2v conversion) to Red Hat Enterprise Linux OpenStack Platform, where they can take advantage of massive scalability and reduced infrastructure costs. Again, this can be done through binary conversion (option 1) using CloudForms or through standardized environments (option 2) using Red Hat Satellite.
So far I have focused primarily on the integrations between the components of Red Hat Cloud Infrastructure to illustrate how Red Hat is bringing together a comprehensive Infrastructure-as-a-Service solution, but RHCI also integrates with many existing technologies in the management domain. CloudForms and Satellite are highly extensible, with integrations for configuration management solutions such as Puppet, Chef, and Ansible, many popular Configuration Management Databases (CMDBs), and networking providers and IPAM systems, so that they can fit into existing environments.
And of course, Red Hat Enterprise Linux forms the basis of both Red Hat Enterprise Virtualization and Red Hat Enterprise Linux OpenStack Platform, which gives RHCI one of the largest ecosystems of certified compute, network, and storage partners in the industry.
RHCI is a complete, fully open source Infrastructure-as-a-Service private cloud. It offers industry-leading integration between datacenter virtualization and an OpenStack-based private cloud in the areas of networking, storage, and identity. A common management framework makes for efficient operations and unparalleled automation that can also span other providers. Finally, because it builds on RHEL and on Systems Management and Cloud Management Platforms based on upstream communities, it has a large ecosystem of hardware and software partners for both infrastructure and management.
I hope this post helped you gain a better understanding of RHCI at a more technical level. Feel free to comment, and be sure to follow me on Twitter @jameslabocki.
OpenStack is a thing of beauty, isn’t it? Just look at all those cleanly defined services, perfectly atomic, able to run standalone … it’s simply amazing. What more could developers and operators ask for in a cloud?
Except that it’s not exactly like that. All those services rely heavily on each other, and given the rate of change OpenStack is experiencing, the complexity only stands to increase. Because so many services depend on one another, managing their lifecycle is difficult and inefficient.
Let’s look at an example: updating Keystone, OpenStack’s identity management service. It is difficult to know whether deploying a new version of Keystone into an existing OpenStack deployment will cause problems because of compatibility with other services. With today’s tools it is also difficult and expensive to roll back a new Keystone deployment. Operators don’t want to dedicate extra racks of hardware to testing a service upgrade if they can avoid it, and lifecycle management tools that imperatively deploy and roll back cannot yet do so between OpenStack releases as reliably as we’d like.
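A small sketch shows why the upgrade question is hard: before touching Keystone, an operator needs to know every service that could break. The dependency map below is hypothetical and deliberately simplified (real deployments have more services and finer-grained dependencies); the point is that the blast radius of an upgrade is the transitive set of dependents, not just the service being replaced.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical, simplified map: service -> services it depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "mariadb": [],
    "rabbitmq": [],
    "keystone": ["mariadb"],
    "glance": ["keystone", "mariadb"],
    "nova": ["keystone", "glance", "mariadb", "rabbitmq"],
    "neutron": ["keystone", "mariadb", "rabbitmq"],
    "horizon": ["keystone"],
}


def affected_by(service: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Return every service that depends on `service`, directly or transitively."""
    # Invert the graph: service -> services that depend on it.
    dependents: Dict[str, List[str]] = {name: [] for name in graph}
    for name, deps in graph.items():
        for dep in deps:
            dependents[dep].append(name)

    # Breadth-first walk over the inverted edges.
    seen: Set[str] = set()
    queue = deque([service])
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


print(sorted(affected_by("keystone", DEPENDS_ON)))
# ['glance', 'horizon', 'neutron', 'nova']
```

Even in this toy graph, upgrading Keystone touches four other services, and each of those has its own compatibility matrix across releases.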
At this point you might conclude that I have a personal vendetta against OpenStack. Although this could be justified after the many nights I’ve spent installing, configuring, and upgrading OpenStack, I can assure you that’s not the case. In truth, OpenStack is not a beautiful and unique snowflake. Lots of different infrastructure platforms face this same problem, and so do many application platforms.
Today there are many ways to manage the lifecycle of OpenStack services, but the most prevalent can be loosely grouped into two categories: build based and image based deployments.
Build based lifecycle management uses a build service such as PXE, is typically coupled with a collection of lifecycle management tools, and almost always uses some type of configuration management, whether Puppet, Chef, Ansible, or others.
This approach is generally inefficient because each OpenStack service is placed onto a different physical piece of hardware or at least a different operating system.
It is possible to combine multiple services on a single operating system, but this can get tricky. How does the lifecycle management tool know that OpenStack Service A in the image above won’t conflict with OpenStack Service B in terms of required resources, ports, file systems, and so on? It takes a great deal of logic in a lifecycle management tool to know this, and given the rate of change in a community like OpenStack, lifecycle management tools have a hard time keeping up and delivering what users want to deploy. Could virtual machines be used here? Possibly, but virtual machines are heavyweight, and they either lack rich metadata or require large infrastructures and agents loaded into them to provide it. In other words, VMs are too heavy, and they also lack the concept of inheritance.
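To illustrate the kind of logic a lifecycle management tool would need, here is a minimal Python sketch that checks whether services can share one operating system. The `ServiceSpec` fields (ports, writable paths, memory) and the example values are hypothetical and cover only a fraction of what a real tool would have to track.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import FrozenSet, List


@dataclass(frozen=True)
class ServiceSpec:
    """Hypothetical description of what a service needs on its host."""

    name: str
    ports: FrozenSet[int] = field(default_factory=frozenset)
    paths: FrozenSet[str] = field(default_factory=frozenset)  # directories it writes to
    memory_mb: int = 0


def conflicts(specs: List[ServiceSpec], host_memory_mb: int) -> List[str]:
    """List problems that would arise if all `specs` ran on one operating system."""
    problems: List[str] = []
    # Pairwise checks: two services cannot bind the same port or own the same path.
    for a, b in combinations(specs, 2):
        for port in sorted(a.ports & b.ports):
            problems.append(f"{a.name} and {b.name} both bind port {port}")
        for path in sorted(a.paths & b.paths):
            problems.append(f"{a.name} and {b.name} both write {path}")
    # Aggregate check: combined memory must fit on the host.
    total = sum(s.memory_mb for s in specs)
    if total > host_memory_mb:
        problems.append(f"needs {total} MB but host has {host_memory_mb} MB")
    return problems


service_a = ServiceSpec("service-a", frozenset({8080}), frozenset({"/var/log/app"}), 2048)
service_b = ServiceSpec("service-b", frozenset({8080, 9090}), frozenset({"/var/log/app"}), 3072)
for problem in conflicts([service_a, service_b], host_memory_mb=4096):
    print(problem)
```

Every OpenStack release can change the ports, paths, and resource needs of each service, so a tool that encodes checks like these by hand has to be updated constantly.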
Finally, build based deployments can be slow. Copying each package back and forth over the wire is not the most efficient way of deploying at scale.
Image based deployments solve the problem of slow performance that build based systems have by not requiring each package to be installed. Typically an image based system has some sort of image building tool that stores images in a repository and these images are then streamed down to physical hardware.
However, even with images, incremental updates can be slow because images are large, and pushing a large image around for a small incremental update is hard to justify.
Even more importantly, image based deployments don’t solve the fundamental problem: the complexity of understanding the relationships between OpenStack services. They only move this problem earlier in the process, so it must be solved when the images are built instead of at run time.
One other consideration when building a lifecycle management solution for OpenStack is that OpenStack doesn’t live alone. The last thing most operators want is yet another way to manage the lifecycle of a new platform. They’d like something they can use across platforms, from bare metal to IaaS and possibly even PaaS.
Wouldn’t it be great if there were a solution for managing the lifecycle of OpenStack services that was:
That’s exactly what the combination of Docker, Kubernetes, and Atomic can provide to the existing lifecycle management solutions.
Docker provides an abstraction for Linux Containers through APIs and an “Engine”. It also provides an image format for sharing that supports a base and child image relationship, allowing for layering. Finally, Docker provides a registry for sharing Docker images. This is important because it allows developers to ship a portable image that operators can deploy on a different platform.
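The base and child relationship is what makes layered images cheaper to update than monolithic images: a host only fetches the layers it doesn’t already have. The toy model below uses made-up layer names and sizes and ignores content-addressed digests and compression. It is not the Docker registry protocol, only the arithmetic behind layer sharing.

```python
from typing import Dict, List, Set

# Hypothetical layers and sizes in MB; real layers are identified by sha256 digests.
LAYER_SIZE_MB: Dict[str, int] = {
    "base-os": 200,
    "openstack-common": 150,
    "keystone": 20,
    "glance": 25,
}

# Each image is an ordered list of layers, base first; child images reuse parent layers.
IMAGES: Dict[str, List[str]] = {
    "keystone": ["base-os", "openstack-common", "keystone"],
    "glance": ["base-os", "openstack-common", "glance"],
}


def layers_to_pull(image: str, cached: Set[str]) -> List[str]:
    """Return the layers of `image` that are not already on the host."""
    return [layer for layer in IMAGES[image] if layer not in cached]


def pull(image: str, cached: Set[str]) -> int:
    """Simulate a pull: fetch missing layers, add them to the cache, return MB moved."""
    missing = layers_to_pull(image, cached)
    cached.update(missing)
    return sum(LAYER_SIZE_MB[layer] for layer in missing)


host_cache: Set[str] = set()
print(pull("keystone", host_cache))  # 370: first pull fetches every layer
print(pull("glance", host_cache))    # 25: only the glance layer is new
```

The same reasoning applies to updates: changing only the top layer of an image means only that layer crosses the wire.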
Kubernetes is an open source container cluster manager. It schedules Linux Containers using a master/minion construct and uses a declarative syntax to express desired state. This is important because it allows developers to describe the relationships between different Linux Containers and lets the cluster manager do the scheduling.
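Declarative here means the operator states what should be running and the cluster manager works out how to get there. The toy reconciliation function below, with hypothetical service names and replica counts, sketches that idea in Python. It is not Kubernetes’ actual controller logic; scheduling onto minions, health checks, and failure handling are not modeled.

```python
from typing import Dict, List, Tuple

Action = Tuple[str, str, int]  # (verb, service, count)


def reconcile(desired: Dict[str, int], running: Dict[str, int]) -> List[Action]:
    """Compare desired replica counts with what is running and emit corrective actions."""
    actions: List[Action] = []
    # Consider every service that is either desired or currently running.
    for service in sorted(set(desired) | set(running)):
        want = desired.get(service, 0)
        have = running.get(service, 0)
        if want > have:
            actions.append(("start", service, want - have))
        elif have > want:
            actions.append(("stop", service, have - want))
    return actions


desired_state = {"keystone": 2, "glance": 1}
observed_state = {"keystone": 1, "horizon": 1}
for verb, service, count in reconcile(desired_state, observed_state):
    print(verb, service, count)
```

Running the loop repeatedly is what makes the approach robust: if a container dies, the next comparison simply produces another "start" action.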
Atomic provides just enough of an operating system to run containers in a secure, stable, and high-performance manner. It includes Kubernetes and Docker and allows users to update using newer update mechanisms such as OSTree. Here is a quick video that shows how easy it is to deploy Atomic (in this case on OpenStack) and how easy it is to upgrade it.
When you put these pieces together, you end up with something that looks (at a high level) like the diagram above. OpenStack developers are free to develop on a broad choice of platforms (Linux/Vagrant/Libvirt pictured) and can publish completed images to a registry. On the other side, operators would pull the Kubernetes configurations into their lifecycle management tools, and the tools would launch the pods and services. This would trigger Docker running on Atomic to pull the images locally and deploy containers with the OpenStack services. Services are isolated, and (we are fairly certain, given our experience with our OpenShift PaaS) lots and lots of containers could run on a single operating system to maximize the density of OpenStack services. There are LOTS of other benefits, including ease of rollback and speed of deployment and updates, but this alone should be enough to interest anyone looking at running an OpenStack cloud at scale.
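The Kubernetes configurations that operators pull are plain declarative documents. As a sketch, the Python below emits a pod and a service for Keystone as a JSON `List`, a form that `kubectl create -f` accepts. It assumes today’s `v1` API field names (the Kubernetes API has changed since the Kolla milestone-1 demos, so their files may differ), a hypothetical registry URL, and Keystone’s traditional public (5000) and admin (35357) ports.

```python
import json
from typing import Any, Dict

# Hypothetical image location; substitute your own registry.
IMAGE = "registry.example.com/kolla/keystone:latest"


def keystone_pod(image: str) -> Dict[str, Any]:
    """A single-container pod; the label lets the service select it."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "keystone", "labels": {"app": "keystone"}},
        "spec": {
            "containers": [
                {
                    "name": "keystone",
                    "image": image,
                    "ports": [{"containerPort": 5000}, {"containerPort": 35357}],
                }
            ]
        },
    }


def keystone_service() -> Dict[str, Any]:
    """A service that routes to any pod labeled app=keystone."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": "keystone"},
        "spec": {
            "selector": {"app": "keystone"},
            # Kubernetes requires a name on each port when a service exposes several.
            "ports": [
                {"name": "public", "port": 5000, "targetPort": 5000},
                {"name": "admin", "port": 35357, "targetPort": 35357},
            ],
        },
    }


def manifests(image: str) -> Dict[str, Any]:
    """Bundle both objects so a single file describes the desired state."""
    return {"apiVersion": "v1", "kind": "List", "items": [keystone_pod(image), keystone_service()]}


print(json.dumps(manifests(IMAGE), indent=2))
```

Because the relationship between the service and the pod is expressed through labels rather than host names, the cluster manager is free to place the container wherever it fits.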
Here are several demonstrations that illustrate the scenario above. They show the OpenStack Kolla project and were produced in two weeks’ time by a group of amazing developers who saw the potential of these technologies.
First, building the images and pushing them to a registry.
Second, deploying a few pods and services manually to see how they connect and what Kubernetes and Docker are actually doing.
Finally, deploying all the OpenStack services completed in milestone-1 with a single command.
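Deploying everything with one command still has to respect the dependencies between services: the database must exist before Keystone can create its schema, and Keystone must be up before other services can register their endpoints. One way to express this, sketched here with Python’s standard-library `graphlib` (3.9+) and a hypothetical, simplified dependency map, is to group services into waves that can start in parallel. This illustrates the ordering problem only; it is not how Kolla’s milestone-1 tooling sequenced its deployment.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set

# Hypothetical map: service -> services that must be up before it starts.
STARTUP_DEPENDENCIES: Dict[str, Set[str]] = {
    "mariadb": set(),
    "rabbitmq": set(),
    "keystone": {"mariadb"},
    "glance": {"keystone", "mariadb"},
    "nova": {"keystone", "glance", "mariadb", "rabbitmq"},
    "horizon": {"keystone"},
}


def deployment_waves(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group services into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(graph)  # raises CycleError on circular dependencies
    sorter.prepare()
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose prerequisites are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(deployment_waves(STARTUP_DEPENDENCIES), start=1):
    print(number, wave)
```

With this map the result is four waves: the database and message queue, then Keystone, then Glance and Horizon, then Nova.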
After deploying OpenStack countless times, I can say that watching every schema get created automatically in MariaDB, along with the endpoints, services, and so on, all in under a minute, is an amazing feeling!
In the end, the combination of Docker, Atomic, and Kubernetes promises to alleviate some of the pain OpenStack developers and operators have experienced. There are still a lot of unanswered questions, but we feel this combination of technologies shows promise, and we are excited that it has found a home in the TripleO project through Kolla.
If you are interested in learning more or participating, please:
If you want to learn more about some of the other projects related to this post, please check out the following: