Reduce EC2 Instance launch time using Warm Pools

Use Warm Pools to save cost and reduce launch time for instances in an EC2 Auto Scaling group. Read about how to configure a warm pool, its benefits, and its pitfalls.

One of the biggest selling points of the cloud is the fact that it is elastic. That means you can horizontally scale applications when the load increases, and reduce the number of instances during periods of low utilization. This works very well for a lot of applications, especially stateless applications that can be easily initialized and terminated. However, there are applications with a longer boot time. For instance, consider an application with a co-located database that requires downloading multiple gigabytes of data before startup. Or applications running on Windows-based instances, which inherently have a much longer provisioning time. Because of the longer initialization times, these applications have traditionally not been suited for horizontal scaling. They end up being scaled for peak traffic, which means you are paying a lot more for compute during periods of low utilization.

Turns out this is not a new problem. It has traditionally been solved by pre-initializing instances with the data and static setup beforehand, then turning them off. These instances are spun back up when more capacity is needed. This set of stopped instances is known as a warm pool. AWS recently launched native support for Warm Pools with EC2 Auto Scaling. I will talk about this feature, its benefits, and its pitfalls in this blog.

What are Warm Pools?

A warm pool is a feature in EC2 Auto Scaling that allows for near-constant launch latency for new instances. The Auto Scaling group (ASG) maintains a set of pre-initialized instances that do not contribute towards the group's desired capacity. These instances can be quickly added to the group when more instances are needed. Because the heavy initialization of data and other resources is done beforehand, getting these servers to start serving traffic is very quick.

You can put the warm pool instances in the Stopped, Running, or Hibernated state. Instances in the Stopped state are not billed for compute, so you save the cost of the instance. Instances in the Running state do not contribute towards the group's capacity - the reasons why you may want to do this are discussed below. The last option is to keep instances in the Hibernated state, where the instance's memory is also saved to disk before it is stopped.
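If you want to check which state the instances in your pool are actually in, you can query the pool with the SDK. A minimal boto3 sketch (the group name my-asg is a placeholder):

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Inspect the pool configuration and the state of each warm instance.
resp = autoscaling.describe_warm_pool(AutoScalingGroupName="my-asg")
print(resp["WarmPoolConfiguration"])  # pool state, sizes, reuse policy
for instance in resp.get("Instances", []):
    # LifecycleState will be e.g. Warmed:Stopped, Warmed:Running, Warmed:Hibernated
    print(instance["InstanceId"], instance["LifecycleState"])
```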

How much does it cost?

As with most AWS services, you only pay for what you use. There is no overhead for configuring this on an Auto Scaling group. You are, however, paying for the resources the warm pool consumes. This includes:

  • EBS volumes
  • Elastic IPs (for running instances)
  • Compute cost during initial warmup.

Overall, if compute is the biggest part of your spending, warm pools can save you a lot of it without risking running out of capacity.

Why have Running Instances in the Warm Pool?

This is a mind-boggling concept. Why would you want to have instances Running but not serving traffic - you are already paying for them, right? Turns out this can be more useful than you think. Here are some of the use cases where it may be helpful:

  • Running time-insensitive batch jobs. The instances are ready to start serving the application as soon as traffic increases. In the meantime, they are running processing jobs.
  • Use of Reserved Instances. If you are using Reserved Instances or On-Demand Capacity Reservations (ODCRs), you are already paying for the capacity even if the instances are in the stopped state.
  • Stepping stone towards using Stopped instances. You may have an existing complex setup and not be fully comfortable letting Auto Scaling manage the warm pool completely. Or you may want to have the cache updated regularly. In this case, you can keep the instances in the Running state without serving traffic, and migrate to stopping them once you are comfortable enough.
  • Avoid launch failures - I talk about this in the next section, but a launch from the warm pool into the live pool may not always succeed. You may want to keep instances running to avoid launch failures.

What to keep in mind when using Warm Pools?

As with everything in life, warm pools are not a silver bullet. They come with a set of pitfalls that you should keep in mind when using this feature.

  • You are not guaranteed a launch for warm instances. There are lots of reasons why a launch could fail:
    • The instance type not being available, resulting in an Insufficient Capacity Error (ICE). In this case, your EBS volume will be deleted, and the ASG will retry the launch in a different AZ.
    • Not enough IP addresses available in the subnet. Again, the EBS volume will be deleted and the ASG will retry the launch in a different AZ.
    • AZ outages, meaning instances are not being launched at all in a given AZ, and launches will be targeted to a different AZ.
  • It is still not possible to "re-warm" instances. If your application is a cache that gets updated frequently, you may need to update a large portion of your warmed cache on startup, resulting in longer boot times.
  • Warm pools cannot be used with groups that have multiple instance types or multiple instance market options. This means you cannot take advantage of the cost savings you can get with Spot Instances.

Configuring Warm Pools

Configuring a warm pool is very straightforward. EC2 Auto Scaling has a new put-warm-pool API that takes the group name and the pool configuration (a boto3 sketch follows the parameter list below). The parameters are described in the official documentation, but as a quick overview:

  • max-group-prepared-capacity: Max number of instances in the warm pool
  • min-size: Min number of instances in the warm pool
  • pool-state: The state of instances in the warm pool
  • instance-reuse-policy: At the time of writing, this only controls whether an instance should be put back into the warm pool on scale-in
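Putting it together, here is what a configuration could look like with boto3. This is a sketch: my-asg is a placeholder group name, the values mirror the sizing example discussed below, and the boto3 parameter names map directly to the CLI flags above:

```python
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.put_warm_pool(
    AutoScalingGroupName="my-asg",
    MaxGroupPreparedCapacity=60,  # max-group-prepared-capacity
    MinSize=6,                    # min-size
    PoolState="Stopped",          # pool-state: Stopped | Running | Hibernated
    InstanceReusePolicy={"ReuseOnScaleIn": True},  # instance-reuse-policy
)
```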

How is the size of the warm pool calculated?

If you look at the parameter descriptions in the official documentation, you will realize that all parameters are optional. On the face of it, this looks like a very easy configuration, but it is not. Calculating the size of the warm pool is not straightforward. There are two cases:

  • If max-group-prepared-capacity is not specified: the warm pool is sized to the group's max size minus the current desired capacity.
  • If max-group-prepared-capacity is specified: the warm pool is sized to max-group-prepared-capacity minus the current desired capacity. In either case, min-size acts as a floor on the pool size.

For example, if the group max size is 10 and the desired size is 5, there will be 5 instances in the warm pool. If you do expect to reach the max size of your group during normal scaling, this works pretty well. However, that is rarely the case. If your group max size is 100, but you expect to have, say, 50 instances during normal operations, your warm pool size will be 50. You will be paying for 50 EBS volumes that will rarely be used.

For this reason, it is important to specify max-group-prepared-capacity and min-size. This way you have better control over the warm pool size, while still having the cushion of a large max-size. Let's revisit the previous example with max size 100 and desired size 50. If you set max-group-prepared-capacity to 60 and min-size to 6, you will have 10 instances in the warm pool during normal scaling. If your load increases above the expected thresholds, you will still always have 6 instances in the warm pool. For example, if your desired size increases to 70 instances, your warm pool still maintains 6 instances.
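To make the arithmetic concrete, here is a small Python sketch of my reading of the sizing rule. warm_pool_size is an illustrative helper, not an AWS API:

```python
def warm_pool_size(desired, group_max, min_size=0, max_prepared=None):
    """Illustrative helper: approximates how the warm pool is sized."""
    # The ceiling is max-group-prepared-capacity if set, else the group's max size.
    ceiling = max_prepared if max_prepared is not None else group_max
    # The pool tops up to the ceiling minus the live desired capacity,
    # but never shrinks below min-size.
    return max(ceiling - desired, min_size)

print(warm_pool_size(desired=5, group_max=10))                                 # 5
print(warm_pool_size(desired=50, group_max=100, min_size=6, max_prepared=60))  # 10
print(warm_pool_size(desired=70, group_max=100, min_size=6, max_prepared=60))  # 6
```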

How does scaling for warm pools work?

A big part of the instance configuration for the warm pool is controlled using lifecycle hooks. In particular, the Origin and Destination parameters in the lifecycle hook event tell you which transition is happening. Three separate workflows can happen (a handler sketch follows the three workflows below):

1. Adding instances to the warm pool

In this case, the Origin will be EC2 and the Destination will be WarmPool. This lifecycle hook is where you run the initialization part - download data, set up configuration, etc.

2. Launching a warm instance into the live pool

In this case, the Origin will be WarmPool and the Destination will be AutoScalingGroup. This lifecycle hook is where you need to update the instance's data - check for any updates that may have happened since the instance was originally initialized. You may also need to run other workflows to make the instance ready to serve production traffic - registering with schedulers, updating the service mesh, etc.

3. Transitioning from live pool to warm pool

In this case, the Origin will be AutoScalingGroup and the Destination will be WarmPool. For this transition to happen, you will need to set the instance-reuse-policy parameter to ReuseOnScaleIn=True. This lifecycle hook is where you will do things like deregistering from schedulers - anything needed to stop the instance from serving production traffic.
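One common pattern is to route these lifecycle hook events through EventBridge to a Lambda function that branches on Origin and Destination. A minimal sketch, assuming such a rule is in place; initialize_instance, refresh_and_register, and deregister are hypothetical placeholders for your own logic:

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Hypothetical placeholders for your own logic - not part of any AWS API.
def initialize_instance(instance_id):   # download data, set up configuration
    pass

def refresh_and_register(instance_id):  # refresh data, register with schedulers/mesh
    pass

def deregister(instance_id):            # drain and deregister from schedulers
    pass

def handler(event, context):
    """Lambda target for an EventBridge rule on ASG lifecycle hook events."""
    detail = event["detail"]
    origin, destination = detail["Origin"], detail["Destination"]

    if origin == "EC2" and destination == "WarmPool":
        initialize_instance(detail["EC2InstanceId"])
    elif origin == "WarmPool" and destination == "AutoScalingGroup":
        refresh_and_register(detail["EC2InstanceId"])
    elif origin == "AutoScalingGroup" and destination == "WarmPool":
        deregister(detail["EC2InstanceId"])

    # Signal Auto Scaling that the hook's work is done so the transition proceeds.
    autoscaling.complete_lifecycle_action(
        LifecycleHookName=detail["LifecycleHookName"],
        AutoScalingGroupName=detail["AutoScalingGroupName"],
        LifecycleActionToken=detail["LifecycleActionToken"],
        InstanceId=detail["EC2InstanceId"],
        LifecycleActionResult="CONTINUE",
    )
```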

Conclusion

Warm pools are a useful tool to reduce costs without risking service availability. You can pre-initialize instances by downloading large data sets or application code, then stop the compute to save cost. You only pay for the EBS volumes while the instances are stopped. The launch happens in near-constant time when load on the application increases.

Check this feature out and let me know what you think. As always, you can subscribe for more blogs on AWS features and services that can help you improve your cloud setup. I am always open to requests for things you would like me to write about :)