Practice Exams:

Amazon AWS DevOps Engineer Professional – Incident and Event Response (Domain 5) & HA, Fault T… part 2

  1. ASG – Scheduled Actions

Okay, so now say we have our Auto Scaling Group created and we want to schedule some actions. For example, we know that on Thursday night our users are going to come to our website, because we know in advance that Thursday night there is going to be a great sale. Or maybe we know that every Wednesday morning there are a lot of users, just based on how our website works. So if we know these kinds of patterns in advance, we may want to schedule some actions for our Auto Scaling Group to scale in or scale out based on the capacity we desire. So we can create a scheduled action.

And here I call it "increase instances on Wednesday", okay? And we can change some values of the Auto Scaling Group configuration: the min, the max, or the desired capacity. So we can say, okay, right now we have one, one, but I want to have one, three, and maybe set the desired capacity to two. It is also possible not to specify everything, so you could say one, three, or just set the max to three. And then you need to set the recurrence: do you want this every five minutes, 30 minutes, one hour, every day, every week, once, or on a cron schedule?

And using cron, you can select whatever you want. For example, you could say every night at 11:00 p.m., Monday through Friday, all that kind of stuff. So you have different ways of specifying this, and cron obviously gives you the most freedom to define how you want your scheduled actions to work. Maybe you just want this to happen once, in which case you need to set the start time. Today is not Wednesday, it's Monday, so I'll pick Monday, and it seems like that's not working. Okay, let's say it's tomorrow. Okay, today, here we go, that's working. And then we have to specify the start time in UTC.
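The console flow above can also be scripted. As a rough sketch (the group name `DemoASG` and the cron expression are placeholders I'm assuming, not values from the demo), a recurring scheduled action could be created with the AWS CLI like this:

```shell
# Sketch: create a recurring scheduled action on a hypothetical ASG named "DemoASG".
# The --recurrence field is a standard cron expression evaluated in UTC
# (here: every Wednesday at 08:00 UTC; Wednesday = day-of-week 3).
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name DemoASG \
  --scheduled-action-name increase-instances-on-wednesday \
  --recurrence "0 8 * * 3" \
  --min-size 1 \
  --max-size 3 \
  --desired-capacity 2
```

A one-off action like the one in the demo would use `--start-time` (an ISO 8601 UTC timestamp) instead of `--recurrence`.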

So what's the time in UTC? The time is 3:42 UTC, so I'll set it for 3:44 UTC, here we go, and create it. And now it's saying it's in the past for some reason. Interesting; 3:42, obviously, because I set a time that had already passed. Here we go, it has been created. So now, hopefully in less than two minutes, the max of my Auto Scaling Group should get to three; right now it's one. So let me pause the video and see if that works. Okay, so the time has passed: desired one, min one, max one, and if I refresh, it now says max three. So this is perfect.

This scheduled action has worked. And as we can see, because it ran only once, it has disappeared from this tab. If you create scheduled actions that are recurring, for example every week, every day, or on a cron, then they will stay in this tab. What's important to note here is that using scheduled actions, you are able to schedule in advance how your Auto Scaling Group should behave, based on the patterns you can predict in advance. And that's all you need to know going into the exam, but it is a very handy feature. All right, hope you liked this. I will see you in the next lecture.

  2. ASG – Scaling Policies

Okay, so now let's get into scaling policies. So we are going to set some scaling policies here, and to do this you click on "Add policy". But first I want to show you some important settings that you absolutely need to understand going into the exam. The first one is called default cooldown, and right here the value is 300. If I want to edit it, I click on Edit, scroll all the way down, and here it is: value 300 for default cooldown. The default cooldown is the number of seconds after a scaling activity completes before another can begin. So the idea is that if a scaling activity adds one instance, we need to wait.

I mean, the Auto Scaling Group will wait 300 seconds, so five minutes, until the next scaling activity can happen. If you set this to a low value, you may get a lot of instances created very, very quickly. If you set this to a higher value, it will take time for your scaling policies to keep adding instances. So this is really based on your use case, your workload, and how fast you want to scale in or scale out. The default cooldown of 300 is a good value for production, but I'll set it to 60 just for this demo. That means there can be a scaling activity every minute. So now let me go to my scaling policies and let's start defining our first scaling policy.
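If you prefer the CLI, the same default cooldown change can be sketched like this (`DemoASG` is a placeholder group name, not from the demo):

```shell
# Sketch: lower the default cooldown from 300 s to 60 s on a hypothetical ASG.
# The default cooldown applies between scaling activities for simple scaling policies.
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name DemoASG \
  --default-cooldown 60
```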

So we'll add a policy and I'll just call it "target cpu 60". And we're going to track the average CPU utilization, but you can track about four different metrics: CPU utilization, network in, network out, or Application Load Balancer request count per target. The target value will be 60. Finally, we need to say how long the instances need to warm up after scaling. So the idea is that if we look at the CPU utilization and we have four instances in the group, and we are over 60% average CPU utilization, then a new instance will join in, right? And when that new instance joins in, at first its CPU utilization will be zero.

And so this warm-up setting says how long to wait before this instance's metrics get rolled into the average CPU utilization with the four other instances that are already there. If you set this to a high number, for example 300, that means the instance has five minutes to warm up, and only after the warm-up will its metrics be rolled into the average CPU utilization. So you need to be really careful about this setting and the cooldown setting, because if you set this to a number that's higher than the cooldown, it's likely you'll get more scaling activity than you intend. So here we'll just say the instance needs 10 seconds to warm up after scaling.

And the question is, do you want to disable scale in? That is, do you want to make sure this scaling policy will never terminate instances, only create them? You could enable this and it will never terminate instances, or leave it disabled and it will terminate instances so that the average CPU utilization for this group stays around 60. We'll leave scale in enabled, and we'll obviously look at simple scaling policies and step scaling policies as well, but for now we'll just create this target tracking policy. So we'll create it, and here we go. The policy is now set to maintain average CPU utilization at 60, it can add or remove instances, and instances have 10 seconds to warm up. Okay, so now let's go back into our instance.
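For reference, the same target tracking policy can be sketched with the CLI (group and policy names are placeholder assumptions):

```shell
# Sketch: target tracking policy that keeps average CPU at 60%,
# with a 10-second estimated instance warm-up, on a hypothetical ASG.
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name DemoASG \
  --policy-name target-cpu-60 \
  --policy-type TargetTrackingScaling \
  --estimated-instance-warmup 10 \
  --target-tracking-configuration '{
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    },
    "TargetValue": 60.0,
    "DisableScaleIn": false
  }'
```

Setting `DisableScaleIn` to `true` would make the policy only ever add instances, matching the "disable scale in" checkbox in the console.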

And here it is. And what I'm going to do is connect to this instance using EC2 Instance Connect, and I'm going to make sure the CPU utilization goes up like crazy. To do so, I need to go in here and install two things. First I need to enable the EPEL repository, so I'll just paste this first command in here, and we are done. And then the second command is to install the stress package, so we'll do sudo yum install stress, and we should be all set. We're now good to go. Okay, so we can now launch the stress utility, and for this we do sudo stress --cpu 2 --timeout 180.

So that will be stressing the CPU all the way to 100% for three minutes. So I'll press Enter, and now the CPU is going to get stressed. That's excellent, this is running fine. So now what we should see is that when we go to our instance monitoring, at some point the CPU utilization will go up, and that means our scaling policy should scale out for us. So I will just wait for things to happen, and we can go to the monitoring tab here as well to see some metrics. I can enable group metrics collection, and this will take some time, but the group metrics will be collected here.
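The commands run on the instance look roughly like this, assuming an Amazon Linux 2 instance (the EPEL step differs on other distributions, and the exact first command isn't visible in the demo):

```shell
# Sketch: generate CPU load on an Amazon Linux 2 instance.
# 1) Enable the EPEL repository, which provides the stress package
sudo amazon-linux-extras install -y epel
# 2) Install stress
sudo yum install -y stress
# 3) Run 2 CPU workers for 180 seconds, driving CPU utilization toward 100%
sudo stress --cpu 2 --timeout 180
```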

And so if we go to the activity history, we should also see that the scaling policy created some instances as a result of the increased CPU utilization. So let's wait a little bit and see if we get anything in the activity history. The stress test has completed, and I'm going to give it a bit more time to run again, because I think there wasn't enough time for CloudWatch to see things. So I'm just going to give it a timeout of 200 again and see if I get any results in CloudWatch monitoring. We still don't have any information on the CPU, so I'll just wait a little bit. Okay, so now we see that the CPU utilization of the instance is actually going up. It is at 68.

So hopefully, because the target average CPU utilization is 60 and we are over 60, what we should see very, very soon is a new instance being launched by the scaling policy. So we need to wait a little bit and hopefully the scaling policy will kick in and create a new instance. And here we go, our new instance is being launched, and we can see it's because there has been a difference between the desired and actual capacity. So the scaling policy increased the desired capacity from one to two, and it says, okay, this alarm right here went into the ALARM state, and that triggered the policy "target cpu 60", changing the desired capacity from one to two.

So here's the full explanation of why another EC2 instance is being created in our Auto Scaling Group. And now that we have two instances in it, we have more CPU capacity. So with two instances in this group, when the stress command completes, so when I stop it eventually, the CPU utilization of both instances will be close to zero, so the average will be close to zero, and the Auto Scaling Group should go ahead and terminate one of these two instances for us. Something very interesting to see as well is to go to CloudWatch.

So if we go to CloudWatch in here and look at the alarms, as we can see, this TargetTracking alarm for our demo ASG, okay, this one right here, was created as a result of the target tracking policy we created, and it going into the ALARM state is what triggered the change of desired capacity. So we can look at the underlying metric as well and see how it went, and we can look at a lot of different things, for example the note that we should not delete or edit this alarm because it is managed directly by the Auto Scaling Group. But it does give us some information about the actions that will happen when the state is ALARM, which it is right now.

So it does give us some good insight into what a real CloudWatch alarm may look like, and we can look at the history of this alarm to really understand when it was triggered and how often. So if we go back to the alarms and look at the TargetTracking alarms, here we go, we can see there's a second one that was created as well, and this one triggers if you are under 42%. The 42% is a value that the Auto Scaling Group computed for us. If you are less than 42% for 15 data points within 15 minutes, then we should scale in. And so this won't happen right now, but it will happen after this new instance gets created.

We'll have to wait a little bit, but it's cool to see: as a result of creating one scaling policy, we got two CloudWatch alarms tracking the same metric with two different thresholds, which kick in to make sure scaling can happen smoothly. Okay, so now let's remove this policy. We'll delete it and have a look at the other types of policies we have in our account. So we'll create a scaling policy, and this time it will be a simple scaling policy. With simple scaling, I can call it "simple demo", and you need to select an alarm that the policy is linked to. So we need to create an alarm in advance.

And then we say, okay, when this alarm is triggered, add two instances and then wait 60 seconds before allowing another scaling activity. This is the kind of thing you can do with simple scaling, and you can have as many simple scaling policies as you want. So I'm just going to pick a random alarm in here. You can add another policy that is again simple scaling, select another alarm for it, say you're going to remove two instances, and then wait 10 seconds before doing anything else. So here, using simple scaling policies, we are able to use any kind of alarm that we create in advance through CloudWatch.
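Under the hood, a simple scaling setup is just a policy plus an alarm whose action is the policy's ARN. A rough CLI sketch, with placeholder names and thresholds I'm assuming for illustration:

```shell
# Sketch: simple scaling policy on a hypothetical ASG.
# 1) Create the policy; this command prints a PolicyARN.
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name DemoASG \
  --policy-name simple-demo \
  --policy-type SimpleScaling \
  --adjustment-type ChangeInCapacity \
  --scaling-adjustment 2 \
  --cooldown 60

# 2) Create a CloudWatch alarm that invokes the policy when it fires.
#    Replace <PolicyARN> with the ARN returned by step 1.
aws cloudwatch put-metric-alarm \
  --alarm-name demo-high-cpu \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=AutoScalingGroupName,Value=DemoASG \
  --statistic Average \
  --period 60 \
  --evaluation-periods 2 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions <PolicyARN>
```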

So if you create them through CloudWatch alarms in advance, we can use them in a simple scaling policy and say what happens: how many instances should be removed or added based on the alarm. So this is more involved, okay, than the first one, but it gives us more flexibility, because any CloudWatch alarm based on any metric can be a trigger for these scaling policies. So I'm going to delete these two. And then finally, the last kind of scaling policy we can see in here is step scaling. So here we're saying, okay, "demo steps", you need to execute the policy when this alarm is breached.

And then, for example, you add one instance when the CPU utilization is greater than, say, 40%, you add two instances when it is between 60 and 70%, and if the CPU utilization is really, really high, over 90%, then you go ahead and directly add three instances. So it does look very close to the simple scaling policy, but here we are able to add steps: we can define intervals of the metric value in this alarm and decide whether to add one, two, or three instances. And that's all you should know about scaling policies.
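A step scaling policy can be sketched with the CLI as well; note that the step bounds are offsets from the alarm's threshold, not absolute CPU values. Assuming a hypothetical alarm with a 60% CPU threshold, 0–30 covers 60–90% and 30+ covers 90% and above:

```shell
# Sketch: step scaling policy on a hypothetical ASG, to be attached to an
# alarm with a 60% CPU threshold (created separately, as with simple scaling).
#   60-90% CPU -> add 1 instance
#   >= 90% CPU -> add 3 instances
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name DemoASG \
  --policy-name demo-steps \
  --policy-type StepScaling \
  --adjustment-type ChangeInCapacity \
  --metric-aggregation-type Average \
  --step-adjustments \
    MetricIntervalLowerBound=0,MetricIntervalUpperBound=30,ScalingAdjustment=1 \
    MetricIntervalLowerBound=30,ScalingAdjustment=3
```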

So I'm going to cancel this, and then in this Auto Scaling Group I'm just going to set the desired capacity back to one and click on save, and we're done. Okay, so that's it for everything about scaling. Remember, there are three kinds of scaling policies: target tracking, which tracks a metric such as CPU utilization, network in, network out, or ALB request count, and simple or step scaling policies, which scale directly based on an alarm you choose to create in advance. All right, that's it for this lecture. I will see you in the next lecture.