Amazon AWS SysOps – Monitoring, Auditing and Performance part 2
- CloudWatch Alarms
Now let’s talk about CloudWatch alarms. Alarms are used to trigger notifications for any metric that you specify, and alarms can trigger Auto Scaling actions, EC2 actions and SNS notifications. You get various options: you can choose sampling, a percentage, the max, the min or the average, so you have different ways of computing your alarm. An alarm can be in different states. It can be in OK, which means everything is good and your metric has not crossed your threshold; INSUFFICIENT_DATA, which is when data is missing; and finally ALARM, which is when the alarm is actually triggered because you’ve crossed a threshold.
Now you can choose an alarm period, which is how long you want your alarm to be evaluated for, in seconds. For a high-resolution metric you can choose 10 seconds or 30 seconds; otherwise you can choose 1 minute and so on. Now, the alarm targets: as I said, we can stop, terminate, reboot or recover an EC2 instance, so alarms can be used to recover your instances if anything goes wrong. We can trigger Auto Scaling actions directly from a CloudWatch alarm. For example, we could say that when this alarm gets triggered, increase my Auto Scaling group size. And finally we could also send notifications to SNS.
And by sending notifications to SNS, we can do pretty much anything we want in terms of integration. For example, we can link SNS to Lambda and so on. But the really cool thing you have to remember is that you can directly operate on EC2 instances, or do Auto Scaling, or SNS; those are the CloudWatch alarm targets. Now, going into the exam, what’s good to know? Well, you can create alarms based on CloudWatch Logs metric filters, which we created before. CloudWatch doesn’t test or validate the action that is assigned to an alarm. So if you assign the wrong action, you won’t know about it; you won’t get an error for this.
And then if you want to test alarms and notifications, we can set the alarm state using the CLI. The command we would use is aws cloudwatch set-alarm-state, and we give it an alarm name, the state value and a state reason, for example “testing purposes”. So, let’s have a look at alarms right now. I am back in my Ireland region, in my alarms. As we can see, we already have two alarms that were created for our Aurora database. They haven’t been deleted; this is from when we set up auto scaling for Aurora. And as we can see, these alarms are in insufficient data because we have deleted our Aurora cluster, so we don’t have any data for these metrics.
So the idea is that here we get some alarms that help us autoscale our Aurora database. The high alarm says, okay, if the CPU utilization is greater than 60% for three data points within three minutes, then scale up. And the Aurora low alarm says if the CPU utilization is less than 54% for 15 data points within 15 minutes, then scale down. So this is the kind of action that would trigger auto scaling, and it’s a very complete type of auto scaling policy. Now, we could go ahead and also create our own alarms. For this we could click on Create Alarm and choose a metric. It could be for our EC2 instance.
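To make the two Aurora scaling alarms a bit more concrete, here is a minimal Python sketch of the same decision logic. It is only an illustration, assuming one CPU datapoint per minute with the oldest first; the function name `scaling_decision` is made up for this example, and the real CloudWatch evaluation has more options than this.

```python
from typing import Optional, Sequence


def scaling_decision(cpu_per_minute: Sequence[float]) -> Optional[str]:
    """Return "scale_up", "scale_down" or None from one-minute CPU datapoints.

    Simplified illustration of the thresholds quoted in the lecture:
    high alarm: > 60% for 3 datapoints within 3 minutes,
    low alarm:  < 54% for 15 datapoints within 15 minutes.
    """
    high_threshold, high_points = 60.0, 3
    low_threshold, low_points = 54.0, 15

    recent_high = cpu_per_minute[-high_points:]
    recent_low = cpu_per_minute[-low_points:]

    # The high alarm fires only if all of the last 3 datapoints breach.
    if len(recent_high) == high_points and all(v > high_threshold for v in recent_high):
        return "scale_up"
    # The low alarm fires only if all of the last 15 datapoints breach.
    if len(recent_low) == low_points and all(v < low_threshold for v in recent_low):
        return "scale_down"
    return None


print(scaling_decision([50, 70, 80, 90]))
print(scaling_decision([40] * 15))
```

Note the gap between 54% and 60%: values in that band trigger neither alarm, which keeps the cluster from flapping between scaling up and scaling down.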
And we could choose the CPUUtilization metric. I’ll search for it. Here we go. I’ll select this one, which is the one that we have from before. I’ll select this metric and say, okay, maybe the alarm is going to be called CPU Utilization Terminate Instance. And so we’re saying terminate the instance if the CPU is at 100% all the time. So: “Terminate instance if CPU 100% all the time.” And so we say, okay, if the CPU utilization is greater than or equal to a large value, say 99%, for, for example, ten data points in ten minutes, then basically that means that our instance is pretty much stuck. Maybe something wrong is going on in our instance, and in this case we want to be deleting it. Then for missing data, we could treat it as ignore.
That means that we maintain the alarm state; or we can say good (not breaching threshold), bad (breaching threshold), or just missing. We can leave it as missing for now. And for the action, okay, whenever the state is ALARM, we could send a notification, or we can just trigger an EC2 action. So I’ll delete the first one and I’ll say, okay, the state is ALARM, what do I want to do? Well, we can recover the instance, stop it, terminate it, or reboot it. And maybe rebooting it is a fair choice, because when my instance is at a very high CPU, close to 100% for ten minutes, something’s going wrong. So I’ll reboot it and I’ll say, okay, this is good. It will create an IAM policy basically to be able to reboot my instance, which works.
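The alarm configured in the console here could also be written down as a parameter document. The sketch below is not what the console generates; it is an assumed equivalent using the parameter names of the CloudWatch PutMetricAlarm API. The instance ID and the alarm name are placeholders, and the one-minute period is inferred from “ten data points in ten minutes”.

```python
import json

# Placeholder values: the instance ID is hypothetical, the region is Ireland as in the demo.
INSTANCE_ID = "i-0123456789abcdef0"
REGION = "eu-west-1"

alarm_definition = {
    "AlarmName": "CPUUtilizationRebootInstance",  # placeholder name
    "AlarmDescription": "Reboot instance if CPU is around 100% all the time",
    "Namespace": "AWS/EC2",
    "MetricName": "CPUUtilization",
    "Dimensions": [{"Name": "InstanceId", "Value": INSTANCE_ID}],
    "Statistic": "Average",
    "Period": 60,                # seconds per datapoint
    "EvaluationPeriods": 10,     # look at the last ten datapoints
    "DatapointsToAlarm": 10,     # all ten must breach
    "Threshold": 99.0,
    "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    "TreatMissingData": "missing",
    # Built-in EC2 reboot action for alarms.
    "AlarmActions": [f"arn:aws:automate:{REGION}:ec2:reboot"],
}

alarm_json = json.dumps(alarm_definition, indent=2)
print(alarm_json)
```

Saved to a file, a document like this could be passed to `aws cloudwatch put-metric-alarm --cli-input-json file://alarm.json`. The other values for `TreatMissingData` are `ignore`, `breaching` and `notBreaching`, matching the console choices.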
And I’ll say, okay, this is great, create the alarm. Now, if I wanted to test out this alarm, I would have two ways of doing it. The first way would be to actually go into my EC2 instance and launch something to get 100% CPU utilization for ten minutes. But that would take a lot of time. Or we can do something really cool and use the CLI to change this alarm to the ALARM state. So for this, I’m going to go to my CLI and type aws cloudwatch set-alarm-state. Maybe we can get the help for this command, so we can read it and say, okay, we can set the alarm to whatever state we want, and we just need to specify the alarm name, the state value and the state reason. So let’s just do this.
We’ll do set-alarm-state. Now, the alarm name I have to get from my UI. So here is the name: CPU Utilization Terminate Instance. It’s actually not terminate instance, it’s reboot instance. I’m very sorry for this mishap, but whatever, we’ll just deal with it. So it’s actually to reboot the instance, not terminate it. And then we need to specify the state reason. So we’ll set the state reason to be equal to “testing”; that’s just what I want to name it. And then finally the state value, so we can set the alarm to be in the ALARM state. So we’ll just say ALARM. So here we say, okay, for this alarm, CPU Utilization Terminate Instance, even though it’s wrongly named, we’re going to set the state to ALARM.
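Put together, the command being typed looks roughly like the output of the sketch below. This is an illustration only: the helper `set_alarm_state_command` is made up for this example, and the exact spelling of the alarm name is an assumption, since it has to match whatever the console shows. The `--alarm-name`, `--state-value` and `--state-reason` options are the ones the `aws cloudwatch set-alarm-state` command takes.

```python
import shlex
from typing import List

ALLOWED_STATES = {"OK", "ALARM", "INSUFFICIENT_DATA"}


def set_alarm_state_command(alarm_name: str, state_value: str, state_reason: str) -> List[str]:
    """Build the argument list for `aws cloudwatch set-alarm-state`."""
    if state_value not in ALLOWED_STATES:
        raise ValueError(f"state_value must be one of {sorted(ALLOWED_STATES)}")
    return [
        "aws", "cloudwatch", "set-alarm-state",
        "--alarm-name", alarm_name,
        "--state-value", state_value,
        "--state-reason", state_reason,
    ]


# The alarm name below is an assumed spelling of the name used in the demo.
command = set_alarm_state_command("CPU Utilization Terminate Instance", "ALARM", "testing")
print(shlex.join(command))
# To actually run it (requires the AWS CLI and credentials):
#   subprocess.run(command, check=True)
```

Setting the state this way is temporary: at the next evaluation the alarm goes back to whatever the metric data says, which is why it works well for testing actions.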
And the reason is that we’re just testing it to see if it works. I press Enter and now this has completed. So let’s go back to our UI and refresh this. As we can see, now my alarm is in the ALARM state, and the reason is, well, that we just triggered it on purpose using the CLI. And so what will happen is that our instance should reboot automatically. So if we go to our EC2 console right here and go to our instances, well, basically if I were SSH’d into it I would see it being triggered, and we see an alarm right here. And so this will trigger a reboot for us. This is quite neat, because directly from the EC2 console we get to see the alarm status, and it will go ahead and just reboot the instance for us. So now if we go back to our alarms, we see that this one is getting back to OK.
So that’s after a while, and if we click not on Details but on History, the really cool thing we see is that we get a whole history of the actions. When we created the alarm, it went from INSUFFICIENT_DATA to OK. And then we triggered the alarm, so the alarm went from OK to ALARM. And what happened is that our EC2 instance automatically started a reboot, and then the reboot succeeded. And after the reboot succeeded, our alarm switched back from the ALARM state to the OK state. So using the alarm history, we get a lot of information around what happened and why it happened, which is also super cool. So that’s it for CloudWatch alarms. I hope you enjoyed it and I will see you in the next lecture.
- CloudWatch Events
So CloudWatch Events: you don’t need to know a lot about them, just know what they are at a high level. Basically you choose a source, you apply a rule to it and you give it a target. And based on an event happening in the source and the rule you define, the target will be triggered in some way. So the rule can either be a schedule, like a cron job, which means that from time to time, every five minutes, or every day, or every day at 2:00 p.m., you will get an event being triggered. Or you can set an event pattern, for example reacting to a service doing something: say an EC2 instance has been created, or, for example, CodePipeline state changes, whatever you want.
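For the schedule case, rules take either a rate expression or a cron expression. The two small helpers below are only a sketch to show what those strings look like; the function names are made up for this example, and times in cron expressions are in UTC.

```python
def rate_expression(value: int, unit: str) -> str:
    """Build a rate(...) schedule expression, e.g. rate(5 minutes)."""
    if unit not in ("minute", "hour", "day"):
        raise ValueError("unit must be 'minute', 'hour' or 'day'")
    if value < 1:
        raise ValueError("value must be a positive integer")
    # The unit is singular for a value of 1 and plural otherwise.
    suffix = "" if value == 1 else "s"
    return f"rate({value} {unit}{suffix})"


def daily_cron_expression(hour_utc: int, minute: int = 0) -> str:
    """Build a cron(...) expression that fires once a day at the given UTC time.

    Field order: minutes, hours, day-of-month, month, day-of-week, year.
    """
    if not (0 <= hour_utc <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    return f"cron({minute} {hour_utc} * * ? *)"


print(rate_expression(5, "minute"))   # every five minutes
print(rate_expression(1, "day"))      # every day
print(daily_cron_expression(14))      # every day at 2:00 p.m. UTC
```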
You can trigger Lambda functions, SNS, SQS, Kinesis messages, so you can pretty much set any target you want. And then CloudWatch Events, as part of the event, will generate a small JSON document to give more context and information about the change to the target service. So that’s all you need to know. Let’s just create a quick CloudWatch event to try it out. So we go to the left-hand side, on Events, and here we can get started and create an event rule. You can either set up a schedule, with a fixed rate of x number of minutes or a cron expression, and here are some sample events that are being shown right here. Or you can select an event pattern, and this is where you select the service that you want the event to come from and the event type.
So as you can see in the service name, we have so many services available to us. We have API Gateway, CodeStar, Auto Scaling, and that’s just the start of the list. You can go all the way down and find pretty much any service in AWS sending some data into CloudWatch Events. So it’s really cool: we can start automating a lot of our infrastructure using CloudWatch Events. For now I’ll just use EC2. And I’ll say, for example, we can do an EC2 Instance State-change Notification, and we can pick specific states, so the state I want is going to be pending.
And this is basically because anytime someone creates an instance, it’s first going to be in the pending state. So the cool thing is that now I’m going to receive a notification anytime someone creates an instance. So pending is fine, and then I can say, okay, any instance. Or you could specify an instance ID if the instance were already created, but we’ll just say any instance because we’re talking about new instances. And so the event pattern we get as a preview is this: we get the detail state, which is pending, okay? Now, the sample event we get out of it is, for example, an EC2 Instance State-change Notification, and it’s going to give us the region and the time.
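The event pattern from the preview and a sample event can be written out as plain data. The matcher below is a deliberately simplified sketch (exact-value matching only, which is much less than the real service supports), and the account ID, instance ID and timestamp in the sample event are made up.

```python
# Event pattern as shown in the rule preview for "pending" state changes.
EVENT_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["pending"]},
}

# Sample event with hypothetical IDs, shaped like the notification in the console.
SAMPLE_EVENT = {
    "version": "0",
    "detail-type": "EC2 Instance State-change Notification",
    "source": "aws.ec2",
    "account": "123456789012",
    "time": "2019-01-15T14:05:00Z",
    "region": "eu-west-1",
    "resources": ["arn:aws:ec2:eu-west-1:123456789012:instance/i-0123456789abcdef0"],
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "pending"},
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matching: every pattern key must exist in the event,
    and the event value must be one of the listed values."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern, e.g. the "detail" section.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


print(matches(EVENT_PATTERN, SAMPLE_EVENT))
```

Fields that the pattern does not mention, such as the region or the instance ID, are not constrained, which is what “any instance” amounts to.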
It’s also going to give us the resources it applies to, so we’ll know right away which instance is being created in which region. So that looks good. I’ll create a target, and a target could be a lot of things as well. As you can see, there are so many things you can select as a target, including some EC2 actions such as create snapshot, reboot instances, stop instances, terminate instances, et cetera. But for me, what I want to do is just send myself an email. And so for this I’ll say, okay, an SNS topic looks like the right way, but then we don’t have any topics yet, so I have to go ahead and create one. So I’ll go to SNS and quickly create a topic. For this, I’ll click on Get Started, I’ll create a topic, I’ll just call it my sample SNS topic, and create the topic.
Here we go. Now we get an ARN and we get a region, that’s perfect. And from there we can create a subscription. This is basically saying, okay, what happens when a message goes to the topic? So we can say email, and we can send an email to whatever address you set. So it could be stefan@example.com; create the subscription. And now you have to confirm the subscription from the email, just to say that the subscription is active. So I’m going to do this right now. I receive a notification, I confirm the subscription, and now it says, okay, the subscription has been confirmed. I just put stefan@mailinator.com as a subscriber. Perfect. So now we can go back to CloudWatch and select the topic.
So I’ll just add a target; it’s going to be an SNS topic. And here we probably have to refresh the whole page, so let’s just go ahead and refresh the whole page at once. Here we go. The target is going to be an SNS topic, my sample SNS topic. And the event pattern is going to be for EC2. The event type, again: we want to get an instance state change, and we want to go for pending. Excellent, this looks good. And then finally, for the output, we’ll just send the matched event. So the entire JSON document right here will be sent directly to my email. Okay, configure details, everything looks good. I’ll call it new instance notification, and the state will be enabled. Create the rule. And now my rule has been created.
So now what I can do is go to my EC2 management console and launch a new instance. So launch instance, I’ll select this one, review and launch, launch, and I acknowledge it. Launch instances. And here we go. Now my instance has been launched, and as we can see, the instance state is pending. So when my instance state is pending, my CloudWatch Events rule should automatically get triggered and send an email to my Mailinator address. So let’s have a look. Okay, what I can do is just go on Mailinator and wait a little bit. And here we get a notification message just now which says, hey, look, this JSON document is saying that right now in eu-west-1, this instance with this ARN has been created.
So I just received a new email because I have created a new EC2 instance. It’s kind of neat because, again, you can start organizing a lot of your infrastructure around this, and for example you could monitor for abusive instance creation. And finally, we can also get a metric for the rule. So we could graph how many times the rule has been invoked. Here it’s one. But maybe we want to trigger something at 20. So if someone launches 20 EC2 instances, maybe I want to receive an email about it as a SysOps, and that could be really, really helpful. So that’s it. That’s how you use CloudWatch Events rules. I hope you liked it and I will see you in the next lecture.
- CloudTrail
So let’s talk about CloudTrail. CloudTrail is so important for the exam, but the questions are actually pretty easy. CloudTrail is used anytime you want to provide governance, compliance and audit for your AWS account. Basically, it will track every API call made in your account. It could be from the console, it could be from the CLI, it could be from an SDK, it could be from whatever. By the way, CloudTrail is enabled by default. So we’ll get a history of all the API calls made within our account. And as I said, it comes from all these various sources. The really cool thing about it is that from there we’re able to see who did what and when, which is quite helpful.
Now, all the CloudTrail logs can be put into CloudWatch Logs, so we can get information on all the API calls that were made straight into CloudWatch Logs and maybe query them from there. And if a resource is, for example, deleted in AWS, which is a very common exam question, then the first place we need to look is CloudTrail, because we will be able to see right away who made the delete API call. Now, CloudTrail will only show the past 90 days of activity. So to keep the data after that, you need to store it somewhere; it could be CloudWatch Logs or somewhere else.
And the default view will only show the create, modify or delete events, so events that change things. But you are able to create CloudTrail trails, and these trails are more detailed. Basically, you can choose the kind of events you want to log, and then you can store this trail directly into S3. If you wanted to analyze it further, maybe you want to use Amazon Athena on top of it to query these CloudTrail logs. Now, trails can be either region-specific or global, so you have these options. And when you store them into S3, they will automatically have SSE-S3 encryption applied to them whenever they’re placed into S3.
So it’s quite neat. Then if you want to protect these trails, for whatever reason, you would use IAM or bucket policies or whatever you want to protect them. So that’s it for CloudTrail. Now let’s go have a play with it. CloudTrail is a separate service, so for this we’ll just type CloudTrail in there, and here we get information about the user activity and the API calls being made. As we can see here, we already get a dashboard of all the recent events that happened within our account. So we get some DescribeAlarms, DescribeTrails, and we can click on View Events to see all the events that have been triggered so far.
So you see there’s PutTargets, PutRule, CreateSecurityGroup, AuthorizeSecurityGroupIngress. All these things are happening quite quickly. We have the SNS subscriptions, we have the event patterns, we have the reboot instances. So you’re able to see a lot of things that are happening straight from this dashboard, basically getting all the API calls made within your account on every kind of resource, which is really, really handy. But from the dashboard we can also create a trail, and a trail can be a more detailed information stream around the API calls being made.
So we’ll type a name, EC2 all trail, and we’ll say apply the trail to all the regions. Okay, that’s fine, that’s perfect. And then we’ll say, okay, which kind of events do we want to capture? All events, read-only, write-only or none. Write-only is basically when you do create, modify or delete; read-only is when you do some describes; and all means everything. So we’re going to select all events, just to get a lot of information. This part is if you wanted to capture data events from S3 or Lambda, but we don’t want that for now, so we’ll just skip it. And then finally the storage location.
Where do we want to store our logs? We can say create a new bucket, yes or no. And I’ll just call it Stefan CloudTrail trails. Okay. And under Advanced we can even set a log prefix, so we can say whatever prefix we want. We’ll call it EC2 all logs and that’s it. Okay, now we can encrypt the log files with SSE-KMS if we want to, in which case we need to specify a KMS key. But I’ll just click on No for now, and basically that means that we’ll use the default SSE-S3 encryption. We can enable log file validation, and we could enable an SNS notification for every log file delivery. But I don’t need that, so I’ll just say no.
Okay, let’s click on Create. And now we get a whole new trail that is being created to capture all the actions done on our EC2 service. So what I could do now is go to my EC2 service and do a bunch of stuff. So let’s go to EC2. I’m going to go to my instances. Maybe the new one here, I’m going to terminate it. Maybe this one I want to stop. Maybe I want to go to my security groups and describe one security group. Maybe on this one I want to change the rules and add HTTP, just for fun. Click on Save. Here we go, it’s been added. Get back to my instances. And we have done quite a lot of things now, so let’s just wait a little bit.
And so now if we’re back in our trails, we can click on this EC2 all trail and review the settings. Obviously, you see logging is on, in the bottom and the top right. But what we can see is that a last log file was already delivered at 2:17 p.m. into my S3 bucket. So I can click here and get to my S3 bucket right away. Now, we get some information around the S3 bucket settings. We could also enable CloudWatch Logs here to deliver all the trail events directly into CloudWatch Logs, but we don’t need that. Now, if I go back to my S3 bucket and click through, I should be able to see my log files.
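When clicking through the bucket, the log files sit under a fixed folder layout: the optional log prefix, then AWSLogs, the account ID, CloudTrail, the region and the date. The helper below is a small sketch of that layout; the function name, the account ID, the date and the exact spelling of the prefix are placeholders for illustration.

```python
from datetime import date


def cloudtrail_log_prefix(account_id: str, region: str, day: date, log_prefix: str = "") -> str:
    """Build the S3 key prefix under which CloudTrail delivers logs for one day."""
    parts = [log_prefix] if log_prefix else []
    parts += ["AWSLogs", account_id, "CloudTrail", region, f"{day:%Y/%m/%d}"]
    return "/".join(parts) + "/"


# Hypothetical account ID, date and prefix spelling.
prefix = cloudtrail_log_prefix("123456789012", "eu-west-1", date(2019, 1, 15), "ec2-all-logs")
print(prefix)
```

The files under that prefix are gzip-compressed JSON, which is why the download in the console ends up as one big JSON document.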
So here it is. There’s a log file and it’s a massive JSON file. So let’s just download it and explore its content. After opening it, I counted about 91 events in here. We can see all the stuff that was logged by CloudTrail, and they basically represent all the API calls I’ve made. So that’s 91 just by clicking everywhere. Here we get a DescribeInstanceStatus API call. If I scroll down a bit more, we get a DescribeInstanceCreditSpecifications API call. So a lot of API calls are made, and it gives us some information on which instance each was done. For example, here is the instance ID that I made this API call on.
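Instead of scrolling through the file by hand, a downloaded log can be summarized with a few lines of Python. This is a minimal sketch: the sample records below are made up and heavily trimmed, although the field names (`Records`, `eventTime`, `eventName`, `eventSource`, `awsRegion`, `userIdentity`) are the ones CloudTrail uses; the helper names are hypothetical.

```python
import json
from collections import Counter

# Made-up, trimmed-down log content; real records carry many more fields.
SAMPLE_LOG = json.dumps({"Records": [
    {"eventTime": "2019-01-15T14:10:02Z", "eventSource": "ec2.amazonaws.com",
     "eventName": "DescribeInstanceStatus", "awsRegion": "eu-west-1",
     "userIdentity": {"type": "Root", "arn": "arn:aws:iam::123456789012:root"}},
    {"eventTime": "2019-01-15T14:12:30Z", "eventSource": "ec2.amazonaws.com",
     "eventName": "TerminateInstances", "awsRegion": "eu-west-1",
     "userIdentity": {"type": "Root", "arn": "arn:aws:iam::123456789012:root"}},
    {"eventTime": "2019-01-15T14:13:45Z", "eventSource": "ec2.amazonaws.com",
     "eventName": "DescribeInstanceStatus", "awsRegion": "eu-west-1",
     "userIdentity": {"type": "Root", "arn": "arn:aws:iam::123456789012:root"}},
]})


def count_calls(records):
    """Count how many times each API call appears in the log."""
    return Counter(record["eventName"] for record in records)


def who_called(records, event_name):
    """Return (time, caller ARN) for every occurrence of one API call."""
    return [
        (record["eventTime"], record["userIdentity"].get("arn", "unknown"))
        for record in records
        if record["eventName"] == event_name
    ]


records = json.loads(SAMPLE_LOG)["Records"]
call_counts = count_calls(records)
terminations = who_called(records, "TerminateInstances")
print(call_counts.most_common())
print(terminations)
```

This is the “who did what and when” question from the exam in miniature: filter on the API call name, then read off the time and the identity.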
You can scroll down and look at all the things we have. We have a DescribeAlarms call, and there you get an error message with a ValidationException. You can just scroll down and see DescribeAlarms, DescribeMetricFilters and so on. I’m not going to read everything to you, but basically what you see is that every single API call I make is being logged directly into S3, into files. And that’s really handy, because now we have a lot of compliance and we can show exactly what happened and from which user. So we can see here that the root account did this, on which account it was done, and what the username was. And so we get a lot of information around security.
For example, was MFA enabled or not, and when was this done? And so in case anything goes wrong, or in case you need to show regulators that you’ve done all you could and you want to trace back who did what in your account, CloudTrail trails would be beautiful for this. Just remember that if you do enable it for all access on EC2, for example, you may get a lot of events being generated, and so the cost may be a little bit high. But that’s it for CloudTrail trails. I think it’s really handy overall to know how they work and how to set them up. So make sure you do set up trails for the important resources in your account. And I will see you in the next lecture.