Auto Scaling OpenShift Enterprise Infrastructure with CloudForms Management Engine

OpenShift Enterprise, Red Hat’s Platform as a Service (PaaS), handles the management of application stacks so developers can focus on writing code. The result is faster delivery of services to organizations. OpenShift Enterprise runs on infrastructure, and that infrastructure needs to be both provisioned and managed. While provisioning OpenShift Enterprise is relatively straightforward, managing the lifecycle of the OpenShift Enterprise deployment requires the same considerations as other enterprise applications such as updates and configuration management. Moreover, while OpenShift Enterprise can scale applications running within the PaaS based on demand the OpenShift Enterprise infrastructure itself is static and unaware of the underlying infrastructure. This is by design, as the mission of the PaaS is to automate the management of application stacks and it would limit flexibility to tightly couple the PaaS with the compute resources at both the physical and virtual layer. While this architectural decision is justified given the wide array of computing platforms that OpenShift Enterprise can be deployed upon (any that Red Hat Enterprise Linux can run upon) many organizations would like to not only dynamically scale their applications running in the PaaS, but dynamically scale the infrastructure supporting the PaaS itself. Organizations that are interested in scaling infrastructure in support of OpenShift Enterprise need not look further then CloudForms, Red Hat’s Open Hybrid Cloud Management Framework. CloudForms provides the capabilities to provision, manage, and scale OpenShift Enterprise’s infrastructure automatically based on policy.

For reference, the two previous posts I authored covered deploying the OpenShift Enterprise Infrastructure via CloudForms and deploying OpenShift Enterprise Applications (along with IaaS elements such as Virtual Machines) via CloudForms. Below are two screenshots of what this looks like for background.

Operations User Deploying OpenShift Enterprise Infrastructure via CloudForms

Self-Service User Deploying OpenShift Application via CloudForms

Let’s examine how these two automations can be combined to provide auto scaling of infrastructure to meet the demands of a PaaS. Today, most IT organizations monitor applications and respond to notifications after the event has already taken place – particularly when it comes to demand upon a particular application or service. There are a number of reasons for this approach, one of which is a result of the historical “build to spec” systems that existed in historical and currently designed application architectures. As organizations transition to developing new applications on a PaaS, however, they are presented with an opportunity to reevaluate the static and often oversubscribed nature of their IT infrastructure. In short, while applications designed in the past were not [often] built to scale dynamically based on demand, the majority of new applications are, and this trend is accelerating. Inline with this accelerating trend the infrastructure underlying these new expectations must support this new requirement or much of the business value of dynamic scalability will not be realized. You could say that an organizations dynamic scalability is bounded by their least scalable layer. This also holds true for organizations that intend to run solely on a public cloud and will leverage any resources at the IaaS layer.

Here is an example of how scalability of a PaaS would currently be handled in many IT organizations.

The operations user is alerted by a monitoring tool that the PaaS has run out of capacity to host new or scale existing applications.

The operations user utilizes the IaaS manager to provision new resources (Virtual Machines) for the PaaS.

The operations user manually configures the new resources for consumption by the PaaS.

Utilizing CloudForms to deploy manage, and automatically scale OpenShift Enterprise alleviates the risk of manual configuration from the operations user while dynamically reclaiming unused capacity within the infrastructure. It also reduces the cost and complexity of maintaining a separate monitoring solution and IaaS manager. This translates to lower costs, greater uptime, and the ability to serve more end users. Here is how the process changes.

By notification from the PaaS platform or in monitoring the infrastructure for specific conditions CloudForms detects that the PaaS Infrastructure is reaching its capacity. Thresholds can be defined by a wide array of metrics already available within CloudForms, such as aggregate memory utilized, disk usage, or CPU utilization.

CloudForms examines conditions defined by the organization to determine whether or not the PaaS should receive more resources. In this case, it allows the PaaS to have more resources and provisions a new virtual machine to act as an OpenShift Node. At this point CloudForms could require approval of the scaling event before moving forward. The operations user or a third party system can receive an alert or event, but this is informational and not a request for the admin to perform any manual actions.

Upon deploying the new virtual machine CloudForms configures it appropriately. This could mean installing the VM from a provisioning system or utilizing a pre-defined template and registering it to a configuration management system such as one based on puppet or chef that configure the system.

Want to see a prototype in action? Check out the screencast I’ve recorded.

This same problem (the ability to dynamically scale a platform) exists between the IaaS and physical layer. If the IaaS layer runs out of resources it is often not aware of the physical resources available for it to consume. This problem is not found in a large number of organizations because dynamically re-purposing physical hardware has a smaller and perhaps more specialized set of use cases (think HPC, grid, deterministic workloads). Even though this is the case it should be noted that CloudForms is able to provide a similar level of policy based automation to physical hardware to extend the capacity of the IaaS layer if required.

all things open

having fun with open source