Amazon AWS SysOps – Monitoring, Auditing and Performance part 1
- Section Intro
We’re finally getting to one of the most important sections for the exam: monitoring, auditing and performance. We’ll take a deep look at CloudWatch, CloudTrail and Config. Now, CloudWatch is one of the most important topics for the SysOps exam, but throughout this course, whenever we’ve seen a new technology, we’ve looked at its CloudWatch metrics. So in that regard, we’ve already covered CloudWatch metrics for many of the services. We’ll still look at some important details in CloudWatch, such as CloudWatch dashboards, then we’ll understand how CloudTrail works in depth, and also AWS Config. And we’ll compare the three to understand the key differences, going through a scenario using, for example, an ELB. I hope you’re excited, and I will see you in the next lecture.
- CloudWatch Metrics
Okay, so let’s talk about CloudWatch metrics in detail. Now, at the exam, they will usually ask you something like: hey, we have this metric and it’s pretty high, or it’s pretty low, how can we use it to do this and that? So all along the course, I’ve given you tips around which EC2 metrics to look out for, which RDS metrics to look out for, which ELB metrics, etc. This is what you need to know. Now, you need to know a little bit about the basics of CloudWatch. I believe that if you’ve come into this course, you already know them, but let’s go over them quickly, just so we are on the same page. So, CloudWatch provides metrics for every service in AWS. Monitoring is at the core of AWS, and a metric is a variable to monitor.
So it could be CPU utilization, networking, number of connections, et cetera. Now, metrics belong to namespaces, so they’re grouped, and a dimension is an attribute of a metric. For example, which instance ID sends the CPU utilization: for EC2, that could be a dimension. Now, you can have up to ten dimensions per metric, and metrics have timestamps. Obviously, because it’s a time series, we get a metric over time, so each data point has a timestamp, and you can create CloudWatch dashboards of metrics. We’ll see dashboards in much greater detail in the next lecture. So that’s it for the basics. Now, we have something called EC2 detailed monitoring. It’s something that we have to enable.
So if you remember, EC2 instance metrics are there every five minutes by default. But if you enable detailed monitoring, for a cost, you get data every 1 minute. We already know this. And so if we use detailed monitoring, we can get faster auto scaling for our ASG. It could be a way of improving the scalability of our application. Now, the AWS Free Tier allows us to get ten detailed monitoring metrics. So it’s quite a nice tier. And this is all you should know. Remember, for EC2, as we’ve seen before, memory usage is not pushed by default; it must be pushed from inside your instance as a custom metric. For this, we’ve used the monitoring scripts, and that’s the way to do it. Okay? Now, finally, you can have custom metrics in CloudWatch, and you should know about them.
So you can send your own custom metrics to CloudWatch using the CLI or whatever SDK you want. And you can also, obviously, send dimensions, that is, attributes to segment your metrics, up to ten, just like before. So we can send an instance ID, an environment name, etc., just to give dimensions to our metrics. And then the metric resolution by default is going to be, this time, 1 minute. So we have a standard resolution of 1 minute, but you can have high-resolution metrics, and you can send them down to every 1 second. Obviously, when you use that higher resolution, you’re going to pay a higher cost. So this is if you need extreme detail around how your application is faring every second.
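To make the pieces of a custom metric concrete, here is a minimal Python sketch that builds one data point in the shape the PutMetricData API expects: a name, dimensions, a timestamp, a value and a storage resolution. The metric name, namespace and dimension values are made up for illustration, and the actual send is left as a comment because it assumes boto3 is installed and credentials are configured.

```python
from datetime import datetime, timezone


def build_metric_datum(name, value, dimensions, unit="None", high_resolution=False):
    """Build one entry for the MetricData list of a PutMetricData request."""
    return {
        "MetricName": name,
        # Each dimension is a name/value pair used to segment the metric.
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        "Timestamp": datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,
        # 60 = standard resolution (1 minute), 1 = high resolution (1 second).
        "StorageResolution": 1 if high_resolution else 60,
    }


# Hypothetical metric name and dimension values, for illustration only.
datum = build_metric_datum(
    "MemoryUtilization",
    42.5,
    {"InstanceId": "i-0123456789abcdef0", "Environment": "dev"},
    unit="Percent",
    high_resolution=True,
)

# Assuming boto3 and valid credentials, the datum could then be sent with:
# import boto3
# boto3.client("cloudwatch").put_metric_data(Namespace="MyApp", MetricData=[datum])
```

Keeping the payload construction separate from the network call makes it easy to check the dimensions and resolution before anything is billed.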
Now, to send a metric to CloudWatch, you use the API call named PutMetricData, and that is for custom metrics, obviously. And in case you get any errors, such as throttling, the thing to do in AWS is to use exponential backoff. That means that if you get an error at first, you’re going to retry after 1 second, then 2 seconds, then 4 seconds, up until it works. So now let’s have a look at CloudWatch metrics in the AWS console. Okay, so for CloudWatch, we’re just going to go to Find Services and type in CloudWatch. And in there I’m going to open the CloudWatch console. We’ll go to Metrics, and in Metrics we can see there’s a bunch of metric groups.
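Going back to the retry strategy described above (wait 1 second, then 2, then 4), this is a small Python sketch of the idea. The function names and the simulated failing call are hypothetical; a real client would wrap its PutMetricData call instead, and production code often adds random jitter to the delays, which is left out here to keep the doubling visible.

```python
import time


def call_with_backoff(operation, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call operation(); on failure wait 1s, 2s, 4s, ... and try again."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error.
            sleep(base_delay * (2 ** attempt))


# Simulated call that is "throttled" three times before succeeding.
attempts = {"count": 0}


def flaky_put_metric():
    attempts["count"] += 1
    if attempts["count"] < 4:
        raise RuntimeError("Throttling (simulated)")
    return "ok"


# Record the delays instead of actually sleeping, so the demo runs instantly.
waits = []
result = call_with_backoff(flaky_put_metric, sleep=waits.append)
# waits is now [1.0, 2.0, 4.0] and result is "ok"
```

Passing the sleep function in as a parameter is only a convenience for trying the logic without waiting; by default the function uses time.sleep.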
So based on the services you use, you will see some metrics being populated right here. So we get EC2, RDS, ElastiCache, S3, Logs; there’s a bunch of stuff you can have. And so just to get one metric working right now, because everything is being deleted all the time, I’m going to launch an EC2 instance very quickly. I’ll launch an instance, select Amazon Linux 2, select t2.micro, click directly on Review and Launch, then Launch, and acknowledge that it’s okay. So right now I’ve just launched an instance in Ireland, so it’s in eu-west-1c, and the instance is pending. What I’m going to do is wait just a few seconds until the instance metrics appear right here.
Okay? So now I can click, for example, on EC2, and I can click on Per-Instance Metrics. I can look for CPU and see what we have. So we have CPUCreditUsage, CPUUtilization, et cetera. I’ll use the instance I just launched, so the one ending in 73e. Let me click on it. Here we go: CPU utilization over time for this instance. And here we just get a graph, a line over time for this metric. So, not very fascinating, I have to admit, but it is something, and we get some kind of graph happening. So the idea is that we could access the monitoring, as we know already, straight from the EC2 console, but we could also access all the metrics straight from CloudWatch. And the reason we would do it is to create CloudWatch dashboards. We’ll see this in the very next lecture.
- CloudWatch Dashboards
So CloudWatch dashboards are actually asked about at the exam. There will be one or two questions, and I’ll give you the tips right now. So dashboards are a great way to get access to your key metrics and to get a good overview of how your application is running, based on how you design them. Now, the thing to understand about dashboards is that they’re global. You can access a dashboard from any region, and we’ll see this in a second. On top of that, a dashboard can include graphs from different regions, and that’s really, really cool, because it means you can have a graph from Ireland, a graph from the USA, and a graph from Asia, and all of them can appear together on the same CloudWatch dashboard.
And that’s one of the main questions you will be asked at the exam. You can also change, as we’ll see in a second, the time zone and the time range of the dashboards straight from the dashboards themselves. Finally, you can set up automatic refresh to be 10 seconds, 1 minute, 2 minutes, up to 15 minutes, and that’s it. The pricing for dashboards is something you need to be aware of. You get three dashboards, with up to 50 metrics each, for free. So we’ll be in the free tier when we do this hands-on, but afterwards it’s $3 per dashboard per month. So dashboards can get quite pricey over time, but they’re very, very useful if you have an application to monitor in a very specific way.
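As a quick worked example of the dashboard pricing quoted in this lecture (three free dashboards, then $3 per dashboard per month), here is a small Python sketch. It is a simplification: it ignores any per-dashboard limits attached to the free tier, and the figures are the ones stated here rather than values checked against current AWS pricing.

```python
FREE_DASHBOARDS = 3             # free dashboards, as quoted in the lecture
PRICE_PER_DASHBOARD_USD = 3.0   # monthly price per extra dashboard, as quoted


def monthly_dashboard_cost(dashboard_count):
    """Estimated monthly cost in USD for a given number of dashboards."""
    billable = max(0, dashboard_count - FREE_DASHBOARDS)
    return billable * PRICE_PER_DASHBOARD_USD


# Three dashboards stay free; ten dashboards means seven billable ones.
cost_for_three = monthly_dashboard_cost(3)   # 0.0
cost_for_ten = monthly_dashboard_cost(10)    # 21.0
```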
Okay, let’s go ahead with the hands-on now. So let’s create a dashboard. We click on Dashboards, then on Create dashboard, and I’ll call it my SysOps dashboard. Click on Create dashboard, and here we go. Now, in there we’re able to add different kinds of widgets: it could be a line, a stacked area, a number, a text, or query results from Logs Insights. For now, we’ll just add a line, configure it, and we’ll need to select a metric to graph. So we’ll do CPU utilization, because that’s our favorite metric. We’ll go into EC2, and we’ll get the one from our instance that is running. So here we go. We get the CPU utilization graph happening right here, we click Create widget, and here we go.
So now we can see that this widget has been created, and right now we are in Ireland. I’m going to click on Save dashboard, because you have to save the dashboard every time you make any kind of change. And so here we go, we’re ready. Now, let’s look at some very interesting stuff. First of all, we can select any kind of time frame we want. We can choose Absolute, where we pick some kind of date range, or Relative, where we say, okay, I want the last 6 hours, and we get this graph, or we want the last 30 minutes, and we get a much narrower window. Okay, next we can click on this button to refresh our data and hopefully get more data in, but we can also click on the top right and enable auto refresh.
And the reason we would use auto refresh is basically to not click on that button all the time, so we can say, every 10 seconds, I want to auto refresh. Okay, the other thing you can set is the time zone that you want. So either you want to be in the UTC time zone, which is going to show 11:30, 11:35, 11:40, 11:45, or, if I want my local time zone, we get a different timeline, with the labels shifted to my local time. So this is something you can set up on the top right. The annoying thing is that you have to do it all the time. Every time you get back to CloudWatch, you’ll have to set up the time zone again. But as we can see, we get options for local time zone or UTC.
Now, the really interesting thing to do with these dashboards is to look at how to make them work at the global level. And so for this, let’s do a quick experiment and switch our region. I’m in Ireland right now. Maybe I want to go to US East (N. Virginia). So I’ll click on that, and as you can see, our dashboard is still there. We still get my SysOps dashboard available, and we still see our metric. So what happened? Well, when we created this graph, it was automatically assigned to the region where we created it. We created this in Ireland, so I can name it CPU Utilization in Ireland. And that graph is actually made from Ireland. It doesn’t say it’s from Ireland, but I just want to specify it right here.
So the cool thing is that we can get global dashboards. I go to my EC2 console now, and I am in the other region, so I’m in North Virginia, and I’m going to create a new instance. I’ll launch an instance, and I’ll do this very quickly. Again, t2.micro, and I’ll say, okay, everything acknowledged. So now we have a second instance, and this one is in a different region. It’s in us-east-1, in the AZ us-east-1a. And so I’ll wait for it to get started. What I’m going to do is add yet another graph here, and we’ll get a global dashboard. So it’s been a while now. I’ve waited, and my instance has been started in North Virginia. If I go to Monitoring, I can see some metrics are appearing already.
So what I can do now is basically add a widget. Here we can add a line, configure it, and this time I’ll again look for the CPU utilization. But I’m in North Virginia, so it’s going to give me the metrics for this region. So let’s go to EC2, Per-Instance Metrics, and then we look for CPU utilization for this one, and press Create widget. And now, let’s just zoom in on 1 hour. As we can see, these graphs are slightly different. This one is the CPU utilization for my instance in us-east-1, and this one is for Ireland. So although they look similar, they’re different. This graph right here has a peak at 12:05, while this one has a peak at 12:20.
So this is basically representing the CPU of my EC2 instances in Ireland and us-east-1. And the cool thing is that it is a global dashboard. So if I change to... oops, I need to save the dashboard first, obviously. If I go to, say, Oregon, for example, just a completely different region, I will see the exact same two graphs. So effectively, I’ve created a global dashboard, and I could go ahead and do this again and again in different regions, and I’d still get a global dashboard. Now, the reason I’m showing you this is that at the exam they will ask you how to build a global dashboard. And so I wanted you to see how to do it firsthand. So that’s it. I hope you enjoyed this lecture, and I will see you in the next lecture.
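The same two-region dashboard can also be described as a dashboard body, which is a JSON document where each metric widget names its own region. The following Python sketch builds such a body; the instance IDs and the dashboard name are placeholders, not the ones from the demo, and the final upload is left as a comment since it assumes boto3 and valid credentials.

```python
import json


def cpu_widget(region, instance_id, title, x):
    """One line-graph widget showing EC2 CPUUtilization for a single instance."""
    return {
        "type": "metric",
        "x": x, "y": 0, "width": 12, "height": 6,   # the dashboard grid is 24 columns wide
        "properties": {
            "view": "timeSeries",
            "metrics": [["AWS/EC2", "CPUUtilization", "InstanceId", instance_id]],
            "region": region,    # each widget can point at a different region
            "stat": "Average",
            "period": 300,
            "title": title,
        },
    }


# Placeholder instance IDs, for illustration only.
dashboard_body = json.dumps({
    "widgets": [
        cpu_widget("eu-west-1", "i-0aaaaaaaaaaaaaaaa", "CPU Utilization in Ireland", x=0),
        cpu_widget("us-east-1", "i-0bbbbbbbbbbbbbbbb", "CPU Utilization in N. Virginia", x=12),
    ]
})

# Assuming boto3 and valid credentials, the dashboard could be created with:
# import boto3
# boto3.client("cloudwatch").put_dashboard(
#     DashboardName="my-sysops-dashboard", DashboardBody=dashboard_body)
```

Because the region lives inside each widget rather than on the dashboard, the same body renders identically no matter which region the console is switched to.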
- CloudWatch Logs
So CloudWatch Logs is also very important to understand. Your applications can send logs to CloudWatch using the SDK, and that’s a way to log your application logs if it’s something very custom. But if you want to collect logs from an AWS service, then it is very straightforward. Elastic Beanstalk will directly collect the logs from your applications and send them to CloudWatch. ECS will collect the container logs. With Lambda, you get collection of the function logs. For VPC Flow Logs, we’ll get to see this in a future section, but we’ll get the VPC-specific logs. We’ll get API Gateway logs; CloudTrail logs, if you set up a filter; logs from the CloudWatch Logs agent, if you set it up on EC2 machines; and Route 53 can log all the DNS queries that are made all around your infrastructure.
So overall, you get a lot of services that can directly send logs to CloudWatch out of the box, and then CloudWatch itself can send the logs wherever you want. It could be S3, if you want to archive them from time to time. Or it could be an Elasticsearch cluster, for example, if you want to perform further analytics, because Elasticsearch has nice searching capabilities for logs. With CloudWatch Logs, though, there is one thing to know: you need to store logs in two things. You need a log group, and that’s a name, whatever you want. Usually it’s an application name, but naming is really free, so you can do whatever you want. And then within a log group, you can have many log streams. A log stream is the stream from a specific file, application or container.
So usually you have one log stream per container, one log stream per application or per log file, that kind of thing. And then once you have defined a log group and many log streams, you can define a log expiration policy: whether your logs never expire, expire in 30 days, et cetera, because you pay for data retention in CloudWatch. So the more data you store in CloudWatch, the more you’re going to pay. So a good strategy, maybe, is to keep the logs for 30 days, export them to S3 from time to time, and then delete them in CloudWatch. And then you can even use the AWS CLI if you want to tail CloudWatch logs, which means follow the logs as they appear.
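To picture how a log group, its log streams and an expiration policy relate to each other, here is a toy in-memory model in Python. It is not the CloudWatch API, only an illustration of the structure described above; the class, the group name and the stream names are invented for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


class LogGroup:
    """Toy model: one log group holding many log streams of (timestamp, message)."""

    def __init__(self, name, retention_days=None):
        self.name = name
        self.retention_days = retention_days   # None means "never expire"
        self.streams = defaultdict(list)       # stream name -> list of events

    def put_event(self, stream, timestamp, message):
        self.streams[stream].append((timestamp, message))

    def expire(self, now):
        """Drop events older than the retention period; return how many were dropped."""
        if self.retention_days is None:
            return 0
        cutoff = now - timedelta(days=self.retention_days)
        removed = 0
        for stream, events in self.streams.items():
            kept = [event for event in events if event[0] >= cutoff]
            removed += len(events) - len(kept)
            self.streams[stream] = kept
        return removed


# One stream per instance, with a 30-day expiration policy on the group.
group = LogGroup("my-app", retention_days=30)
group.put_event("instance-a", datetime(2018, 12, 1, tzinfo=timezone.utc), "old event")
group.put_event("instance-a", datetime(2019, 1, 30, tzinfo=timezone.utc), "recent event")
group.put_event("instance-b", datetime(2019, 1, 15, tzinfo=timezone.utc), "another event")

removed = group.expire(now=datetime(2019, 1, 31, tzinfo=timezone.utc))
# Only the December event is older than 30 days, so removed is 1.
```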
That’s a nice way of seeing how an application is behaving in real time. Finally, if you are sending logs to CloudWatch, a common mistake is not getting the IAM permissions right. When you don’t have the IAM permissions right, obviously things won’t work, and it can be quite tricky to debug sometimes. Now, logs can be encrypted. You can use KMS for encryption, and that works really, really well. Now, a bit of insight on the new features of CloudWatch. They’re not necessarily going to appear in the exam today, but you need to know about them anyway, because I want to give you real-life skills. So in CloudWatch Logs, you can use filter expressions.
And so with these filter expressions, for example, we can search for a specific IP inside of a log. So you can basically search for whatever you want. And then on top of that, you can set up a metric filter, basically to produce a metric based on the filter you define. And you can use that metric to trigger CloudWatch alarms. The idea is that, for example, you’re looking for a specific IP, you set up a metric filter, and then anytime that IP appears, it will trigger an alarm and you will know about it right away. It could be to detect an attacker, or some shady behavior, or whatever you want, really. And then there is this new feature which I’m really, really excited about, which is called CloudWatch Logs Insights. It came out in November 2018 and was announced at the re:Invent conference.
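The chain described above, from filter to metric to alarm, can be imitated locally with a few lines of Python. This is a simplified stand-in: it only counts lines containing a term, whereas real filter patterns are more expressive, and the log lines, IP address and threshold are invented for the example.

```python
def count_matches(log_lines, term):
    """Simplified stand-in for a metric filter: count the lines containing `term`."""
    return sum(1 for line in log_lines if term in line)


def alarm_state(metric_value, threshold):
    """Simplified stand-in for an alarm: fire once the metric reaches the threshold."""
    return "ALARM" if metric_value >= threshold else "OK"


# Invented sample log lines; 203.0.113.7 is from a documentation-only IP range.
sample_logs = [
    "GET /index.html from 198.51.100.23 status=200",
    "GET /admin from 203.0.113.7 status=403",
    "POST /login from 203.0.113.7 status=401",
]

suspicious_hits = count_matches(sample_logs, "203.0.113.7")   # 2 matching lines
state = alarm_state(suspicious_hits, threshold=1)             # "ALARM"
```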
And you can basically use this to query logs using a query language that’s easy to use, and you can add queries directly into your CloudWatch dashboard. There are some sample queries given in the UI. You can see, for example, the common queries: I want the 25 most recently added log events, or the number of exceptions logged every five minutes, or the list of events that are not exceptions. So you have the common queries, but also sample queries for CloudTrail, Lambda, Route 53 and VPC Flow Logs, and you can even write your own. I’m really excited, because I think it brings a lot of usability and ease to querying CloudWatch logs, which was not easy to do before with filter expressions. So that’s it for CloudWatch Logs. Now let’s go into a quick hands-on, with CloudWatch Logs on the left-hand side.
And as you can see, we have different log groups where we logged some information in this course. So we can look, for example, at /var/log/messages and see this log stream that was created right here. Or we can even look at, for example, the secure shell output from whenever we launched SSM sessions, and see whatever was created during these sessions. So all these things are super cool. What we need to see is that there is a log group, so it’s a top-level group of logs, and then if we click on one, we get different log streams. And so here, for example, we get a log stream for every session that was made using SSM, which is quite nice. For /var/log/messages, we’ll get a log stream for every EC2 instance that we used. So here’s an EC2 instance ID.
So that’s very nice. As we can see, we can set an expiration for events. So we can say, okay, I want events to never expire, or to expire after anything up to ten years. You can say one month or whatever, which is very nice. And then you can set up metric filters. So let’s click on, for example, /var/log/messages. Let’s just look for something that’s inside; for example, we’ll look at the Reloading keyword. So we’ll click on Metric Filters, 0 filters, and here we’re able to add a metric filter to this log group. So I’ll add the metric filter, and here we can enter whatever pattern we want to look for. We can look for the pattern Reloading, and we’ll test the pattern, and it says, okay, I found two matches out of 15 events in the sample log.
So we don’t know, it’s just a random thing right now, but maybe Reloading means that something is wrong on our machine, and if Reloading happens too many times, maybe we want to flag that and have a metric. So I’ll use Reloading, and I’ll say, okay, this is great, I have two lines containing Reloading. I think this is the kind of thing I want to catch with this pattern, but we can have much more detailed patterns, as you can see here. And then I’m going to assign a metric to it. I’m going to say, okay, the filter name is Reloading and the metric namespace is LogMetrics. For the metric name, we can say ReloadingErrors, and we’ll say, okay, create the filter, and now this is a metric on its own.
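For reference, the filter created in the console above could be expressed as a request to the PutMetricFilter API. The Python sketch below only assembles the request; the log group name and the spelling of the names are assumptions based on what is said in the demo, and the call itself is commented out because it assumes boto3 and valid credentials.

```python
def build_metric_filter_request(log_group, filter_name, pattern, namespace, metric_name):
    """Assemble the parameters for a CloudWatch Logs PutMetricFilter request."""
    return {
        "logGroupName": log_group,
        "filterName": filter_name,
        "filterPattern": pattern,
        "metricTransformations": [
            {
                "metricName": metric_name,
                "metricNamespace": namespace,
                "metricValue": "1",   # add 1 to the metric for every matching event
            }
        ],
    }


# Names assumed from the demo: log group, filter name, namespace and metric name.
metric_filter_request = build_metric_filter_request(
    log_group="/var/log/messages",
    filter_name="Reloading",
    pattern="Reloading",
    namespace="LogMetrics",
    metric_name="ReloadingErrors",
)

# Assuming boto3 and valid credentials, the filter could be created with:
# import boto3
# boto3.client("logs").put_metric_filter(**metric_filter_request)
```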
So we could create an alarm from this metric, and we can get information around how this metric is behaving over time, thanks to this log filter. So log filters are quite nice, as you can see. Now we have one metric filter appearing right here, so we can jump straight back to it. But even better, there’s CloudWatch Logs Insights that’s now available. We’re not going to try it now, because we don’t have much information, but basically you have some sort of query language right here, which is quite easy to read, honestly, and we can run the queries, and then it’s going to give us data over time. So we can get the logs or a visualization, and we can have sample queries.
So to get guided, for example, you could query CloudTrail, or you could query Lambda, or whatever you want. If we look at a CloudTrail query, for example, we can say, okay, I want to see the number of log entries by region and EC2 event type. And here we get this whole query written out for us. So it’s a nice way to get guided. Now, it’s not in the exam just yet, but it should be very soon. And the idea is just: okay, how can we query logs? Well, we can use Logs Insights to query the logs using some sort of structured language, directly in CloudWatch. So that’s it. That’s what I wanted to show you for CloudWatch Logs. I think it’s really exciting what’s happening right here, and I hope you understand how this works. I will see you in the next lecture.
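Two of the common sample queries mentioned in this lecture look roughly like the strings below, written in the Logs Insights query language and wrapped in a small Python helper that prepares a StartQuery request. The exact query text may differ from what the console shows, the log group name is a placeholder, and running the query is left as a comment because it assumes boto3 and valid credentials.

```python
from datetime import datetime, timedelta, timezone

# The 25 most recently added log events.
RECENT_EVENTS_QUERY = "fields @timestamp, @message | sort @timestamp desc | limit 25"

# Number of exceptions logged every five minutes.
EXCEPTIONS_PER_5_MIN_QUERY = (
    "filter @message like /Exception/ "
    "| stats count(*) as exceptionCount by bin(5m) "
    "| sort exceptionCount desc"
)


def build_start_query_request(log_group, query, hours=1, now=None):
    """Assemble StartQuery parameters covering the last `hours` hours."""
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(hours=hours)
    return {
        "logGroupName": log_group,
        "startTime": int(start.timestamp()),   # epoch seconds
        "endTime": int(now.timestamp()),
        "queryString": query,
    }


# Placeholder log group name, for illustration only.
request = build_start_query_request("/var/log/messages", RECENT_EVENTS_QUERY)

# Assuming boto3 and valid credentials, the query could be run with:
# import boto3
# logs = boto3.client("logs")
# query_id = logs.start_query(**request)["queryId"]
# results = logs.get_query_results(queryId=query_id)
```

Queries run asynchronously, so in practice the results call is repeated until the query reports that it has completed.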