AZ-303 Microsoft Azure Architect Technologies – Monitor Azure Infrastructure Part 2
- Alerts and Metrics Walkthrough
Azure can of course monitor lots of different components. But for this example, to show you some of the options that we have, we’re just going to monitor our virtual machine. We can of course monitor any of the items we have, such as storage accounts, Key Vaults, apps and so on. But as I said, to keep things simple, we’re going to go with the virtual machine.
So when you look at a virtual machine, you’ll already see a number of charts that are provided for you by default. And these are actually driven by the monitoring system that Azure provides you. On our virtual machine menu down the left-hand side, we’ll find Metrics under Monitoring. If we go onto there, we can then actually start bringing up information for any of the metrics for the virtual machine. All we do is go to the Metric dropdown and select one of the metrics that we have.
So let’s go for something simple such as Percentage CPU, and that will show us our historic CPU. Now, my virtual machine has not been running very long. I’ve only had it running for an hour or so. What we can do, up here in the top right, is set the time range. I’m going to change that to the last hour and click Apply.
Then that will show some more granular details of the CPU percentage over the last hour. If we want, we can actually add additional metrics to this chart. So for example, we could add something called Inbound Flows. If we do that, that will combine the two on the chart. This way you can plot lots of different metrics that you want to track, which is especially useful when they’re interrelated. And it will just help you try to identify any issues that might be occurring. So now I’m just going to remove the inbound flows.
What we can also do with our chart is change the type. It defaults to a line chart, but you can also change to area, bar chart, scatter and so on. I’m just going to change that to area. What we can also do is pin it to a dashboard. Now when we come to pin to a dashboard, we actually have a couple of options. We can pin to the default dashboard, or we can create specific dashboards for different scenarios. So for now, I’m going to say select another dashboard. What we could do here is create a new dashboard. So I’m going to call this one VM Monitoring. Select your subscription and click Apply. This will create a new dashboard for you. And we can either select it here, or instead we can go to the menu here and click Dashboard. This takes us to our default dashboard.
So if we’d clicked Pin to dashboard without creating a new one, it would have just added it here. But actually I’ve created this new one called VM Monitoring, and going here we can now see that dashboard. So from here you can create yourself dashboards for various different scenarios depending on what you need them for. So for example, if you were a monitoring team looking after virtual machines, you could add various metrics here to give you a quick overview of the health of your estate. I’m just going to go back and add my CPU metric again. And the other thing we can do here is actually create alerts.
So over the past hour our CPU has never really gone above just under 30%. And what we might want to do is have it alert us if that went above, say, 80%. So what we can do is create an alert. We can actually create alerts from various different areas within Azure, but because we created our alert from a metric that we defined on our virtual machine, it preselected a few items for us automatically. So for example, it scoped this alert to our virtual machine and it’s also created a condition for us which is measuring CPU. If we click the details of that, we can then go in and define what we want that threshold to be. So for example, we could say 80%. We can also have dynamic thresholds. So with dynamic thresholds, Azure automatically adjusts the threshold to what it thinks it should be, based on a baseline created over a period of time. Rather than a set amount, you then tell it whether the sensitivity should be high, medium or low. So depending on how strict you want to be, you would set that accordingly.
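To make the difference between the two threshold types a bit more concrete, here’s a minimal Python sketch. It is not how Azure implements either of them: the function names, the sample data and the mean-plus-standard-deviation band are all assumptions made purely for illustration. The only ideas taken from the walkthrough are that a static rule compares against a fixed number such as 80%, and that a dynamic rule compares against a learned baseline with a high, medium or low sensitivity.

```python
from statistics import mean, stdev


def static_alert(samples, threshold=80.0):
    """Fire when the average of the evaluation window exceeds a fixed threshold."""
    return mean(samples) > threshold


def dynamic_alert(history, current, sensitivity="medium"):
    """Illustrative only: fire when `current` leaves a band learned from history.

    The band widths below are hypothetical. Higher sensitivity means a
    narrower band, so the rule fires on smaller deviations.
    """
    widths = {"high": 1.0, "medium": 2.0, "low": 3.0}
    upper_bound = mean(history) + widths[sensitivity] * stdev(history)
    return current > upper_bound


# Made-up CPU percentages, roughly matching a VM that stays under 30%.
cpu_last_hour = [12.0, 18.5, 25.0, 29.0, 22.0]

print(static_alert(cpu_last_hour))                 # quiet VM, fixed 80% threshold
print(static_alert([85.0, 92.0, 88.0]))            # busy VM, fixed 80% threshold
print(dynamic_alert(cpu_last_hour, 35.0, "high"))  # strict dynamic rule
print(dynamic_alert(cpu_last_hour, 35.0, "low"))   # relaxed dynamic rule
```

Notice that a reading of 35% never gets near the static 80% threshold, but a strict dynamic rule would still treat it as unusual for this particular machine.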
I’m going to change that to static and 80% and then click Done. The next thing I want to do is add an action group. This is optional, but it will help you group actions when you have lots of different things, and you’ll see why in a second. So the first thing I’m going to do is create a new action group. Now I’m going to call this monitoring team. So the idea is that within this action group would be our monitoring team members. And what I want to do is define an action. So I’m going to say email. There are various different options we can choose from: we can make this action group fire a runbook, or kick off an Azure Function, a Logic App or a secure webhook. But I’m going to go for Email/SMS. And this now asks me who we want to email. So in here we can give a list of email addresses and click OK. And if you have more members in your team, we could of course add another email action and similarly set the additional people. Or in fact, we could have it trigger other actions.
Once we’re happy with all the actions that we want it to trigger, simply click OK. Once that action group has been created, go ahead and click Add under action group name here, select our new monitoring team and click Select. And now whenever this triggers it will perform the actions in that action group. Finally, we can set some alert rule details. The alert rule name is what’s actually going to appear in the alert log. So for example we could have this here, and you can also put in a description. And then finally we can set a severity. This doesn’t set anything in particular in the system, but when the alert appears in the alert logs you can set different severities so that you can filter those logs accordingly, and we’ll see that later.
For now let’s just set the severity to four, enable the rule on creation and click Create alert. Go back to our virtual machine now, and go to the Alerts section under Monitoring. From here we can either create alert rules or manage existing rules. So we can see here our rule that’s been created, and as I say we can then delete it if we wanted to, or add additional rules. You can also manage your action groups through here, if you wanted to create more action groups or edit that existing action group. Through here we will also see when those alerts are triggered, but as you can see at the moment that’s not happened just yet.
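If it helps to picture how the pieces from this walkthrough fit together, here’s a small Python sketch of a rule, its severity and its action group. It’s a toy model only: the class names, the example.com addresses and the second rule are made up, and nothing here calls Azure. The one fact it leans on is that Azure alert severities run from 0 (most critical) to 4 (least critical).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ActionGroup:
    """A named set of receivers (only email receivers in this sketch)."""
    name: str
    emails: List[str] = field(default_factory=list)


@dataclass
class AlertRule:
    """An alert rule with a severity from 0 (most critical) to 4."""
    name: str
    severity: int
    action_group: ActionGroup


def fire(rule: AlertRule) -> List[str]:
    """Return one notification line per receiver in the rule's action group."""
    return [
        f"Sev{rule.severity} {rule.name} -> {address}"
        for address in rule.action_group.emails
    ]


def at_least_as_severe(rules: List[AlertRule], severity: int) -> List[AlertRule]:
    """Keep rules whose severity number is at or below the given level."""
    return [rule for rule in rules if rule.severity <= severity]


team = ActionGroup("monitoring team", ["alice@example.com", "bob@example.com"])
cpu_rule = AlertRule("CPU above 80%", 4, team)
disk_rule = AlertRule("Disk nearly full", 1, team)  # hypothetical second rule

for line in fire(cpu_rule):
    print(line)
print([rule.name for rule in at_least_as_severe([cpu_rule, disk_rule], 2)])
```

The point of the action group is visible in the model: both rules reuse the same group, so adding a team member in one place updates who gets notified by every rule that references it.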
- Activity Logs
The Azure Activity Log is a subscription log that provides insight into subscription-level events that have occurred in Azure. This includes a range of data, from Azure Resource Manager operational data to updates on service health events. The Activity Log was previously known as audit logs or operational logs. Using the Activity Log, you can determine the what, who and when for any write operation taken on the resources within your subscription. For example, who stopped the service? It provides an audit trail of the activities or operations performed on your resources by someone working on the Azure platform. You can also understand the status of the operation and other relevant properties. There are different categories of Activity Log events. The first is Administrative, which contains the record of all create, update, delete and action operations performed through Resource Manager.
The Service Health category contains the record of any service health incidents that occurred in Azure. The Alert category contains records of all activations of any Azure alerts you set up. So for example, if you set up an alert rule to trigger when CPU is above a percentage, it will appear in that category. The Autoscale category contains records of any events related to the operation of the autoscale engine, based on any autoscale settings you’ve defined in your subscription. The Recommendation category contains recommendation events from certain resource types such as websites and SQL servers. These events offer recommendations on how to better utilize your resources. The Security category contains records of any alerts generated by Azure Security Center. And finally, there are the Policy and Resource Health categories. These categories don’t contain any events. They are reserved for future use.
- Activity Logs and Diagnostics Walkthrough
In this lecture, we’re going to have a look at diagnostics and activity logs. The alerts and metrics that we saw so far were just the alerts and metrics that we got through the Azure console. If we go to Diagnostic settings, we can actually start collecting more detailed information. And this information comes from within the virtual machine itself. And it does this by installing an agent on the VM. So before we can use this, the first thing we need to do is enable guest-level monitoring. So if we go ahead and click that, that will then go ahead and install the monitoring agent. Now, it’s also going to output that data to a storage account. And because we didn’t select one, it’s actually going to create a different storage account there, but we’ll go and change that shortly. It can take a while for that monitoring agent to install, so just give it ten minutes and then come back. Once that monitoring agent is installed, we can then go through and configure various aspects.
So for example, we can enable different counters. By default, we get performance counters enabled. If our virtual machine had ASP.NET applications or SQL Server installed, we could select those options and start monitoring that information. You can tell it which logs to collect, again the standard Application and System logs, and also the different kinds of logs that you wish to monitor. Again, if you have IIS installed, you could tell it to pull the IIS logs. And again, if you’ve got a .NET application, you can enable that logging as well. You can also ask it to collect memory crash dumps. The Sinks area allows us to actually output our data to other services such as Application Insights. So if you have a .NET application that’s configured with Application Insights, we can actually plug it in through here to output those details and give you more granular information there.
Finally, through the Agent tab, we can set where it’s sending the data to and, perhaps more importantly, set a disk quota. So this just enables you to make sure your costs don’t get out of hand from too much logging information being sent to it. So, whereas diagnostics logs and the other logs we’ve seen so far have been about performance and metrics coming from the virtual machine itself, the activity logs are more about events that are happening within Azure itself.
So if we have a look here at the top four events that have been triggered in the last 6 hours, we’ve got some health events, we’ve started the virtual machine and we’ve updated the virtual machine. It’s very similar to the logging that we saw earlier. We can go and change the time span to different values, and we can change the event severity. So by default it’s showing all events. And again, we can filter down on just critical or error events, and in fact we can add other filters. So for example, we could choose event category. With event category, we can then tell it just to show certain events. So for example, if we’re just interested in security events or service health events, again, we can filter down on those.
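The same kind of filtering can be pictured in a few lines of Python. This is only a sketch over made-up, simplified records: the field names, the callers and the four sample events are assumptions for illustration and don’t reproduce the full Activity Log schema. It just mirrors the two filters from the portal, event severity and event category.

```python
# Simplified, made-up activity log records for illustration only.
EVENTS = [
    {"category": "Administrative", "level": "Informational",
     "operation": "Start Virtual Machine", "caller": "admin@example.com"},
    {"category": "ServiceHealth", "level": "Warning",
     "operation": "Service issue", "caller": None},
    {"category": "Administrative", "level": "Error",
     "operation": "Update Virtual Machine", "caller": "admin@example.com"},
    {"category": "Security", "level": "Critical",
     "operation": "Security alert", "caller": None},
]


def filter_events(events, categories=None, levels=None):
    """Return events matching the given categories and levels.

    Passing None for a filter means "show everything", which matches the
    default view described in the walkthrough.
    """
    return [
        event for event in events
        if (categories is None or event["category"] in categories)
        and (levels is None or event["level"] in levels)
    ]


# Only critical or error events.
for event in filter_events(EVENTS, levels={"Critical", "Error"}):
    print(event["level"], event["operation"])

# Only security or service health events.
for event in filter_events(EVENTS, categories={"Security", "ServiceHealth"}):
    print(event["category"], event["operation"])
```

Because each record also carries a caller and an operation, the same list answers the "who stopped the service?" style of question: filter on the operation and read off the caller.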
Now our various agents have been installed. What I want to do now is go back and look at the diagnostics logs. If you remember, in the diagnostics settings, we had to enable them in here and tell it which storage account to send them to. So what I’m going to do now is go and actually have a look at that storage account. Within the storage account, if we go to Storage Explorer and open Tables, we’ll see that we had a number of tables automatically created for us. So for example, we’ve got the Windows event logs table, and if we click on that, we get to see a list of all the logs that we’ve got on our virtual machines. These have all been pulled out of the Windows event log. And so through here, we could then query this table or start pulling the data into other systems.
Before we move off the standard logging and monitoring that we can do, I just want to go back to the dashboards. If we go back to our VM Monitoring dashboard from our dashboards on here, we can actually start to edit these and change them. Now, this symbol here means that this dashboard is being shared.
Now, if I want, I could unshare that, or by clicking the same icon I can go and manage the users to see who can access the dashboard. So again, this is a great way for a central administrator to create these dashboards and share them among members. Once you’ve created your dashboard, you go into your role assignments and give your users access to that dashboard. You can give various different levels, but the most appropriate would be Reader, to give people read-only access to the dashboard, or Contributor if you want them to be able to edit the dashboard. If we go back to the dashboard, the other thing we can do is actually start to edit the dashboard itself.
So for example, we can add various filters, and we can customize the tiles themselves. So for example, in our CPU percentage tile, we can override the default settings and change this to an hour. We can even resize elements, so we can change the chart to look how we need it. So if we want lots of charts on there, we could shrink that down. And we can also, from the Tile Gallery, actually add other charts. Once we’re done changing the layout or customizing our chart, we click Done customizing followed by Publish changes, which will then make those changes available to everyone else.