Google Professional Data Engineer – Managed Instance Groups and Load Balancing part 5
- Autoscalers
A discussion of autoscaling goes hand in hand with load balancing. Now that we’ve understood the different kinds of load balancers that Google Cloud Platform has to offer, let’s discuss how autoscaling works. When we set up external global load balancers, the backends we use are often instance groups. The one we prefer is the managed instance group, because managed instance groups have the ability to scale based on changes in traffic. A managed instance group can automatically add more machines as traffic increases; this is scaling up. It can remove instances if the traffic falls; this is called scaling down.
The applications that you host on VMs within a managed instance group can gracefully handle increases in traffic because of this autoscaling feature. Autoscaling also removes instances from managed instance groups when they are idle and the capacity is not needed. This helps reduce cost when the load on the system is lower. If you’ve provisioned 100 machines and just ten machines are enough to serve traffic, you don’t want the other 90 up and running, because that will simply add to your cloud bill. You’d rather they be removed from your managed instance group. Setting up and configuring autoscalers on the Google Cloud Platform is very simple.
All you have to do is specify what autoscaling policy you want to use, and the autoscaler will take care of the rest. A very important point to remember about autoscaling is that it is a feature of managed instance groups; unmanaged instance groups are not supported. Unmanaged instance groups contain VMs of different types, with no commonality between the VMs and no template from which VMs can be created, so it’s not possible for them to support autoscaling. Google Container Engine, now the Google Kubernetes Engine, also has its own version of autoscaling, which is different from the autoscaling that we are going to discuss as part of managed instance groups.
GKE autoscaling is called cluster autoscaling. In order to set up autoscaling for your managed instance group, there are two things that you need to define. The first is the autoscaling policy: what metric will you use to autoscale your instances? The second is the target utilization level: how will you measure this metric, and what threshold should it stay below? The autoscaling policy of your managed instance group can be based on four different metrics. The first of these is the average CPU utilization of all instances within a group. You can also choose to scale your instances up and down based on Stackdriver Monitoring metrics. The Stackdriver metrics that you use for autoscaling decisions can be built-in metrics, or you can define your own custom metrics.
Not all Stackdriver metrics, though, are suitable for autoscaling; there are considerations to take into account, which we’ll discuss in a little bit. Autoscaling of managed instance groups can also be based on HTTP(S) load balancing serving capacity, which is measured in terms of CPU utilization or requests per second per instance. Autoscaling is also possible using a Pub/Sub queueing workload, but this is in Alpha, so we won’t discuss it further here. Once you’ve figured out what autoscaling policy you want to use, you’ll next set a target utilization level. This is the level at which you want to maintain your VMs. If the metric that you’ve chosen for your autoscaling policy goes beyond this target utilization level, that’s when the autoscaler will add VMs.
If it falls much below this level, that’s when the autoscaler will remove VMs. The target utilization level that you configure will differ based on the autoscaling policy that you’ve chosen. Let’s say you’ve chosen a Stackdriver custom metric based on the number of users connected to your system. In that case, the target utilization level might be an integer representing the number of users that are currently active. Now let’s consider an example with average CPU utilization as our autoscaling policy, and say that we want a target utilization level of 0.75. This means that the average CPU utilization across all instances in our managed instance group should be at 75%.
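As a sketch, this CPU-based policy could be configured with gcloud; the instance group name, zone, and replica bounds here are hypothetical:

```shell
# Attach an autoscaler to a managed instance group, scaling on
# average CPU utilization with a target of 0.75 (75%).
gcloud compute instance-groups managed set-autoscaling example-mig \
    --zone us-central1-a \
    --min-num-replicas 2 \
    --max-num-replicas 10 \
    --target-cpu-utilization 0.75
```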
The autoscaler will be constantly monitoring the utilization of the managed instance group. If, at any point in time, the utilization exceeds this target level, more instances will be added, which will automatically bring the utilization back down below the target utilization level. The autoscaler will add instances, monitor the CPU utilization, and keep adding instances till it falls below the target utilization level. Now, it’s entirely possible that requests come in as a burst and the traffic to your backend suddenly becomes very high. If utilization reaches 100% during times of heavy usage, the autoscaler might increase the number of instances by 50% of the existing instances or by four instances; the autoscaler will choose between the two, whichever one is larger.
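A tiny sketch of that burst scale-up rule; the exact rounding behavior is an assumption on my part, not something the platform documents here:

```shell
# Burst scale-up: at 100% utilization, add 50% of the current
# instance count (rounded up here) or 4 instances, whichever is larger.
scale_up() {
  local current=$1
  local half=$(( (current + 1) / 2 ))    # 50% of current, rounded up
  local add=$(( half > 4 ? half : 4 ))   # take the larger increment
  echo $(( current + add ))
}
scale_up 20   # 50% of 20 is 10 > 4, so the group grows to 30
scale_up 4    # 50% of 4 is 2 < 4, so the group grows to 8
```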
It makes no sense to incrementally add instances one at a time if you are dealing with a sudden spike in traffic, and the autoscaler takes this into account. We’ll now discuss how Stackdriver Monitoring metrics work as an autoscaling policy. Stackdriver provides a bunch of common metrics that are available for any VM instance; these are standard metrics that are always available. You can also configure Stackdriver Monitoring to track custom metrics that you may be interested in, such as the number of users that are active on your site at any point in time. Both standard and custom metrics can be used by the autoscaler to make scaling decisions. It’s important to note, though, that not all Stackdriver Monitoring metrics are valid utilization metrics that the autoscaler can use.
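As a hypothetical sketch, a custom metric such as the active-user count mentioned above could drive the autoscaler like this; the metric name, group name, and target are all made up for illustration:

```shell
# Scale on a custom Stackdriver metric: aim for an average of
# 100 active users per instance across the group.
gcloud compute instance-groups managed set-autoscaling example-mig \
    --zone us-central1-a \
    --max-num-replicas 10 \
    --custom-metric-utilization metric=custom.googleapis.com/active_users,utilization-target=100,utilization-target-type=GAUGE
```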
To be usable by the autoscaler, a metric must have specific characteristics. Because autoscalers add additional VM instances to cope with increased traffic, the metric that you choose in Stackdriver must contain data for a single VM instance; it can’t contain data only for a group of instances. In addition, the metric must in some form describe how busy the VM resource is. That is, the metric value should change in inverse proportion to the number of instances in the group: if you add more instances, the metric value must fall, because now the group is less busy, and if you reduce the number of instances, the metric value must increase, because the group will presumably be busier.
Both of these are necessary conditions for a Stackdriver Monitoring metric to be used in an autoscaling policy. Let’s see how HTTP(S) load balancing serving capacity works as an autoscaling policy. Within that, you can use CPU utilization or requests per second per instance. Here is an example, a block diagram of how an autoscaler works with a backend service for HTTP(S) load balancing. Within the autoscaler, you specify a target utilization level; here it has been set to 0.8. This autoscaler interacts with the instance group manager, which manages the instance group that is part of the backend service. Within the backend service, we configure one of the metrics: either a max rate per instance, which is what is done here, or a CPU utilization level.
We’ve mentioned this earlier several times, but it’s worth repeating: HTTP(S) load balancing serving capacity only works with CPU utilization and maximum requests per second per instance, because within managed instance groups these are the only settings that can be controlled by adding and removing instances. When we’re using load balancing, let’s take an example of a configuration that does not work for autoscaling: maximum requests per group. This is not a parameter that changes when you add and remove instances from a particular group; the setting is independent of the number of instances in that group, so this metric cannot be used for autoscaling. Autoscalers can make scaling decisions based on multiple policies as well.
An autoscaler doesn’t need to rely on just a single policy. In such a situation, the autoscaler will scale based on the policy that results in the largest number of VMs in the group. You can go ahead and configure multiple policies based on different metrics: maybe a CPU utilization metric, a Stackdriver Monitoring metric, and so on. The autoscaler will calculate the number of VM instances required by each of these policies and choose the largest number. This basically ensures that we always have enough machines to handle our workload. An autoscaler can work with a maximum of five policies at a time. Let’s take a specific example and see how autoscaling works with multiple policies. Let’s say we set a CPU utilization autoscaling policy with a target of 0.8, that is, 80% average CPU utilization.
We set the load balancing utilization target to 0.6. We set up a custom metric with Stackdriver Monitoring, call it metric one, with a target of 1,000; this is totally made up. Then we set up another custom metric with Stackdriver Monitoring with a target of 2,000, once again made up. Now say the current utilization of the managed instance group is as follows: CPU utilization is 0.5, which is green, less than the target utilization level. The load balancing utilization is 0.4, again green, below the target level. The first custom metric utilization, however, is 1,100, which is above our target level of 1,000, and the second custom metric is at 2,700, once again above our target level.
Based on the two custom metrics from Stackdriver Monitoring, the autoscaler will decide that scaling has to kick in; it has to add more virtual machines to this managed instance group. For each of these policies, it calculates the number of machines needed to keep that metric below its target level. In the case of CPU utilization, it decides that it needs seven machines, and the same for load balancing. For custom metric one, it needs eleven machines before we can get that metric below the target level. The number of machines needed in order to get the second custom metric below its target level is 14, and this is the number that the autoscaler will choose: the managed instance group will be scaled to 14 machines.
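The per-policy numbers in this example all line up if we assume the group currently has ten instances: each policy’s recommended size works out to roughly ceil(current_size × current_utilization ÷ target), and the autoscaler takes the largest. A small sketch of that calculation (the formula is my reconstruction, not documented behavior):

```shell
# Recommended group size for one policy:
# ceil(current_size * current_utilization / target)
recommend() {
  awk -v s="$1" -v u="$2" -v t="$3" \
    'BEGIN { n = s * u / t; r = int(n); if (n > r) r = r + 1; print r }'
}
recommend 10 0.5  0.8     # CPU policy        -> 7
recommend 10 0.4  0.6     # load balancing    -> 7
recommend 10 1100 1000    # custom metric one -> 11
recommend 10 2700 2000    # custom metric two -> 14, the one chosen
```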
- Lab: Autoscaling with Managed Instance Groups
In this lab on autoscaling, we will first create a custom image for a web server, which we will then use to create an instance template. Following that, we will create a managed instance group with autoscaling enabled, and we will go on to attach it to a load balancer. Finally, we will test the autoscaling feature by applying some stress through the load balancer. Let us begin, though, by provisioning a VM instance with a web server in order to create a custom image out of it. We navigate to the VM instances page, and when we create this instance, we name it web server. We can pick a zone for our instance; in my case, I’m going to pick Asia South.
Let’s choose a micro machine type, and in the Firewall section choose to allow both HTTP and HTTPS traffic. In the Management, disks, networking, SSH keys section, navigate to Disks and uncheck the box for “Delete boot disk when instance is deleted.” When we do get rid of this instance, we want the disk to remain, and we will use it to create our custom image. We hit Create, and we’re now ready with our virtual machine instance. Next, we install the Apache web server on this instance. We SSH into the host, first run an apt-get update in order to update all the packages, then install apache2, and once that is done, restart Apache.
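The commands run over SSH are, roughly:

```shell
sudo apt-get update                 # refresh the package index
sudo apt-get install -y apache2     # install the Apache web server
sudo service apache2 restart        # make sure Apache is running
```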
We just click on the external IP address, and we can see that we do indeed get the Apache home page. Since we will be using this instance to create an image for a web server, let us set it up so that the Apache service comes up at boot time. For that, we go back to the terminal and run the command to enable Apache at boot, and once that is done, we go back to our instance and reset it. This will reboot the machine with the new settings and allow us to test whether the Apache service does indeed start up at boot time. Once the reset is complete, we navigate back to the home page, and we have confirmed that the Apache service does indeed come up automatically after a reboot.
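The exact command isn’t captured in the transcript; on the Debian-based images common in these labs, enabling a service at boot is typically one of the following:

```shell
# Enable Apache at boot on SysV-style images
sudo update-rc.d apache2 enable

# or, on systemd-based images
sudo systemctl enable apache2
```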
At this point we are ready to create an image out of our instance, for which we can delete the instance, but first we confirm that the disk will be retained after the instance is terminated. Once we do that, we go ahead and delete the instance. Once it is deleted, we confirm that the disk is still available, since it has all our Apache installation and configuration, and in the Disks section we do indeed see that our disk is still there. We are now ready to create a custom image out of it. We navigate to Images and choose to create an image. We will call this my web server one. The source for our image is going to be a disk, namely our web server disk.
We hit Create at this point, and once our image is ready, we can use it to create an instance template. Before we do, I would quickly like to emphasize the difference between an image and an instance template: an image is merely a boot disk, perhaps with a set of tools installed, whereas an instance template comprises an image, a machine type, and a few other instance properties. All right, now let us proceed with creating our instance template. We navigate to the Instance templates page, and when we create this new instance template, we name it web server template.
For the machine type, let us select a micro, and for the boot disk, opt to change the default value. We go into custom images here and select the web server image which we just provisioned. Once that is selected, we check the boxes to allow HTTP and HTTPS traffic, and when we hit Create, we are ready with our instance template, which we can use to create an instance group. An instance group is essentially a group of identical instances based on a template, which will sit behind a load balancer. We navigate to the Instance groups page, and when we create this group, we call it my web server group. For the location, we select multi-zone, and we select a region.
In my case, I’m going to pick Asia South, and the instance template will be the one which we just provisioned, the web server template. Since this is a lab on autoscaling, we do want autoscaling to be on, and we autoscale based on HTTP load balancing usage. Also, let us change the maximum number of instances from ten to five, and for the health check, opt to create a new health check. We give our health check a name, and since it’s already using our HTTP port 80, we leave all the values at their defaults and save this health check. Finally, there is the initial delay, which specifies how long the instance group will wait after initializing a new VM before performing a health check.
For the purposes of this lab, we change this to 60 seconds, because we do not want to wait five minutes, and once that is done, we hit Create. We will be warned that the autoscaling configuration will only be complete once we attach this managed instance group to a load balancer, but we will be doing that shortly, so we just hit OK and wait for our instance group to be ready. While this is happening, you will see that under the Instances column it is transitioning from zero to one instance. Let us navigate to the VM instances page, where we see that one of the instances has been provisioned. Let us see if the Apache home page is accessible.
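The instance group and autoscaler we just configured through the console could be sketched with gcloud as follows; the resource names match what we chose in the lab, and the asia-south1 region is an assumption:

```shell
# Create a regional (multi-zone) managed instance group from the template
gcloud compute instance-groups managed create my-web-server-group \
    --region asia-south1 \
    --template web-server-template \
    --size 1

# Autoscale on HTTP load balancing usage, up to five instances,
# with a 60-second cool-down (the "initial delay" in the console)
gcloud compute instance-groups managed set-autoscaling my-web-server-group \
    --region asia-south1 \
    --max-num-replicas 5 \
    --target-load-balancing-utilization 0.8 \
    --cool-down-period 60
```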
We use the external IP address and see that the Apache home page is up. We are now ready to attach our instance group to a load balancer. We navigate to Network services, Load balancing, and create a new load balancer; for the type, we pick an HTTP load balancer. When configuring it, we begin with the frontend configuration: we give it a name and leave all the other values at their defaults. Once this is complete, we move on to the backend configuration, where we need to create a new backend service. Again, we provide a name, we can leave the protocol at HTTP, and for the instance group we use the one which we just provisioned.
With that backend added, we configure this backend service to use the health check which we created earlier in this lab, and hit Create to provision the backend service. Now for the final step in creating this load balancer: we give it a name, web server load balancer, and go ahead and hit Create. The load balancer is now coming up. Note that this might take a couple of minutes, but eventually the green symbol will appear and the load balancer is more or less ready. I say more or less because it will still take another couple of minutes for the backend services and the health checks to be up and running. For now, let us take a closer look at our load balancer.
We can see that the IP address for the frontend is available, and we can also see that in the backend services, zero out of zero instances are healthy; that is what we need to wait for. First, let us navigate to the instance groups, and yes, this is clearly not up yet, but eventually it will be ready for us to use. At that point, we hit the IP address of our frontend and we can see that the Apache home page is reachable, so our load balancer is now ready, and this is also reflected in the status of our instance group. We are now ready to go ahead and stress test our autoscaler. We navigate to VM instances and provision a new one, which we call stress test. We pick some zone; I’m just going to pick Asia South again.
For the machine type, let us choose a micro, and once again change the boot disk from the default value: we pick a custom image from our own project, the one which we created earlier. We may choose to allow HTTP and HTTPS traffic to our instance, and then we go ahead and create it. Once the instance is provisioned, we SSH into it and perform our stress test from the terminal using the Apache Bench benchmarking tool. The command essentially says that we will be sending 10,000 requests, 1,000 of them concurrent, at our load balancer. When we run this, we might see a “connection reset by peer” error message, which seems to be some kind of issue with the Apache web server configuration.
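The Apache Bench invocation described is, with the load balancer’s frontend IP left as a placeholder:

```shell
# 10,000 total requests, 1,000 of them concurrent, against the
# load balancer's frontend IP
ab -n 10000 -c 1000 http://LOAD_BALANCER_IP/
```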
For the purposes of this lab, we will not debug that error; we just run the benchmarking tool again, increasing the number of requests to 50,000. We run this a few times until we’re satisfied that there has been enough stress on the load balancer, and then navigate back to the console. We can see that for our instance group, two new instances are now being provisioned. We refresh the page and, though the display is not quite up to date, the instance count has come up, and we see three new instances in our instance group. The autoscaling feature has worked. With that, we conclude our lab on autoscaling.