Amazon AWS SysOps – Databases for SysOps Part 3
- RDS API & Hands On
Believe it or not, you need to know a few APIs going into the exam as a SysOps administrator. The first one, and I think the most important one, is going to be the DescribeDBInstances API, which does exactly what you think it does: it describes your DB instances. So it will help you get a list of all the DB instances you have deployed, and that includes read replicas. So it's a very nice way to get the list of all your read replicas. And it also helps you get the database version.
And you may have a question around how do I get the database version, and this is how. Now, you can also create a snapshot using the CreateDBSnapshot API, which makes sense. So maybe you'll have questions around how do I automate snapshots; maybe AWS Lambda calling this API is a great way of doing it. There's also the DescribeEvents API to get all the information around events that happen in your DB instance.
And finally, there's the RebootDBInstance API to reboot your instance, which you could use, for example, to trigger a Multi-AZ failover if your instance is Multi-AZ: rebooting your DB instance with failover does that. So let's just have a quick play with DescribeDBInstances. Say, for example, that I wanted to know my DB version and I also wanted to know my DB replicas, so all the replicas of my database; I could use the DescribeDBInstances API call for this. So let's just go ahead and do this. And this is as simple as writing aws rds describe-db-instances and pressing Enter. And here we go, I get some information about my database.
So let's see what we have. Well, we have two DB instances. We can see that the engine is Postgres, we can look at the endpoints, and we can see when each was created. We can look at the security groups, and we can see the parameter groups: the demo Postgres parameter group right now, and it's in sync. We know the AZ, and we have a lot of information around the subnets.
We have some information on whether it's Multi-AZ or not. We also have a list of ReadReplicaDBInstanceIdentifiers, so I know that my DB replica is my read replica. Really, really cool. I can see the engine version, which is what I wanted to see: 10.4, that's Postgres. And you get some information around, for example, Multi-AZ, such as what the secondary availability zone is for me. So basically, any information you may need, you get it from here. And here I have my DB replica, which I can also describe.
I get the endpoint directly, which I can connect to. And you also get a lot of information around the parameter groups, which are going to be exactly the same. This one isn't Multi-AZ, and it doesn't have any read replicas of its own because it's already a replica. So that's it. It's quite a simple API call, but it does allow you to get a lot of information. A short lecture, but I think it's very telling about what you can do with the API. And I will see you in the next lecture.
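To make the DescribeDBInstances output concrete, here is a minimal sketch of pulling the engine version and read replica list out of a response. The payload below is a trimmed, illustrative example of the shape the API returns; the values are made up, but the key names match the RDS API.

```python
# Illustrative DescribeDBInstances-shaped response (trimmed; values made up).
sample_response = {
    "DBInstances": [
        {
            "DBInstanceIdentifier": "mydb",
            "Engine": "postgres",
            "EngineVersion": "10.4",
            "MultiAZ": True,
            "ReadReplicaDBInstanceIdentifiers": ["mydb-replica"],
        },
        {
            "DBInstanceIdentifier": "mydb-replica",
            "Engine": "postgres",
            "EngineVersion": "10.4",
            "MultiAZ": False,
            "ReadReplicaDBInstanceIdentifiers": [],
        },
    ]
}

def summarize_instances(response):
    """Return {identifier: (engine version, list of read replica ids)}."""
    return {
        db["DBInstanceIdentifier"]: (
            db["EngineVersion"],
            db["ReadReplicaDBInstanceIdentifiers"],
        )
        for db in response["DBInstances"]
    }

print(summarize_instances(sample_response))
```

In a real script you would get the response dict from boto3's `rds.describe_db_instances()` instead of hardcoding it.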
- RDS & CloudWatch
So let's look at RDS with CloudWatch. As we know, CloudWatch is deeply integrated with RDS. From the hypervisor, we get some basic metrics such as the number of database connections, swap usage, read IOPS and write IOPS, read latency and write latency, read throughput and write throughput, disk queue depth, and free storage space. From these, obviously, we can do a lot of troubleshooting. For example, if the latency is high, something is wrong.
If the read IOPS are peaking, maybe we've reached the IOPS limit of our EBS volume. If the disk queue depth is too high, that means a lot of operations are waiting to be executed. We can also look at the CPU and make sure it's not too high, all that stuff, right? So this is provided by CloudWatch basic metrics, and you can enable Enhanced Monitoring.
And we'll do this right away. Enhanced Monitoring metrics are gathered by an agent that runs on the DB instance, so they are more specific. What you get with it is a lot of information around the processes and threads that use the CPU, and you get access to over 50 new CPU, memory, file system, and disk I/O metrics. So let's go have a look at how we can enable Enhanced Monitoring right now. Let's go to mydb, Modify, and in there we're going to be able to enable Enhanced Monitoring.
Because if you remember, when we first created our database, we did not enable it. So I'll just scroll down, and as you can see, all the way at the bottom, we can enable Enhanced Monitoring. In there, you're able to get more information, and you need to define a monitoring role. We'll just use the default, and RDS will create a role for us.
And then you can say how granular you want your monitoring to be. I'll leave it at 60 seconds, but as you can see, you can get per-second granularity, which is quite awesome. Now you click on Continue, and Apply Immediately to apply the changes right away. And here we go.
It's working. It may give you a slightly weird error message, but you just have to wait until everything is created. And now we're done. So if we go to Monitoring, in there you're going to get CloudWatch, so we can look at what CloudWatch gives us. It gives us CPU utilization, the number of connections we have, the free storage space, the write IOPS, the read IOPS, the freeable memory. But then, using Enhanced Monitoring, we're going to get a lot more. Viewing Enhanced Monitoring is a little tricky to find.
There's a drop-down, and here you can choose CloudWatch, which will just give you the CloudWatch information. But you can also choose Enhanced Monitoring, and this will give you all the Enhanced Monitoring metrics. We have to wait a little bit until the agent starts up, and you can also get the OS process list if you want. So let me just wait a little bit for Enhanced Monitoring to kick in, and I'll be right back.
Okay, so now if I go to Enhanced Monitoring, I start seeing some data. As you can see, we have free memory, active memory, CPU user, load average, used file system; we have more graphs as well, the number of tasks running, et cetera. So Enhanced Monitoring does give me access to more metrics. And if you go to the OS process list, you get some information around which processes are using how much CPU and RAM.
So this gives you a lot more information around what is happening on your database, thanks to Enhanced Monitoring. That's it, I just wanted to show you this, but overall the idea is that the exam will ask you questions around how a metric can impact your database; just use your common sense and troubleshoot using these metrics, using your brain basically. So that's it. I will see you in the next lecture.
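Behind the console, Enhanced Monitoring data is delivered as JSON events (via CloudWatch Logs, in the RDSOSMetrics log group). Below is a trimmed, illustrative payload, with made-up values and only a few of the real fields, plus a tiny parser that reproduces the "OS process list" view sorted by memory.

```python
import json

# Trimmed, illustrative Enhanced Monitoring event (real events carry many
# more fields; values here are made up).
sample_event = json.dumps({
    "instanceID": "mydb",
    "cpuUtilization": {"user": 4.1, "system": 1.2, "idle": 94.0},
    "memory": {"free": 412000, "active": 310000},   # kilobytes
    "processList": [
        {"name": "postgres", "cpuUsedPc": 3.5, "memoryUsedPc": 12.0},
        {"name": "OS processes", "cpuUsedPc": 0.5, "memoryUsedPc": 2.0},
    ],
})

def top_processes_by_memory(event_json):
    """Parse one Enhanced Monitoring event and sort processes by memory use."""
    event = json.loads(event_json)
    return sorted(event["processList"],
                  key=lambda p: p["memoryUsedPc"], reverse=True)

top = top_processes_by_memory(sample_event)
print(top[0]["name"])
```

A Lambda subscribed to that log group could run logic like this to alert on a runaway process.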
- RDS Performance Insights
So let's talk about RDS Performance Insights, and that's the last one around monitoring for RDS, but I think you need to know it, and it is quite a cool tool overall. Performance Insights allows you to visualize your database performance and analyze the issues that affect your database. So you can visualize the database load, and you can filter the load by four different dimensions. The number one is called waits. And waits sounds very cryptic, but basically it shows you the resource that is the bottleneck.
So it could be the CPU, the I/O, it could be some locks. So if you needed to know how to upgrade your database to a different type of instance, and whether you should optimize the CPU or optimize the I/O, waits will give you a good idea of what your database waits on the most: the CPU, the I/O, et cetera. Now, you can also filter by SQL statements.
So if an SQL statement is blocking your database or making it crawl for whatever reason, then you can identify that SQL statement, and maybe you can reach out to the team or the application that runs it and try to understand how to optimize it. You can also filter by host. By filtering and grouping by host, you can find the server or application server that may be hammering your database and take action: maybe blocking access, or maybe talking to them and understanding why they're using so much of our database; maybe they need a read replica. And then finally, by users. This is to use the connection usernames to find the user that is using our database the most.
So the idea is that Performance Insights allows us to understand who, from where, or which statements are consuming our waits, which can be CPU, I/O, locks, et cetera. Now, database load is evaluated as the number of average active sessions for the database engine. And the whole thing basically allows you to troubleshoot, including finding the SQL code or the hosts putting load on your database.
So our database is not interesting because it's not running anything special; there is no application. So I'm just going to show you screenshots of what you can expect. This is Performance Insights, and this is from the Amazon blog. As you can see, you get a line right here, and it shows you the max vCPU.
As long as you're under it, that means you're doing fine. But if you're over it, that means your database is running at capacity. And if you slice that graph by waits on the right-hand side, it shows you what's using your database. So CPU is only 0.32, and the io/table/sql/handler wait is 1.82. So it seems like I/O is something you may want to optimize or improve, maybe using a better disk or whatever. But this is the kind of idea: using the slice by waits, which gives you this list of everything that's happening, you can understand what's blocking your database. Now, you can also analyze the SQL queries that are running.
So as you can see, this 1138 is an SQL query that is taking a long time. So maybe an UPDATE on schema3.table1 setting s1 to an md5 of a random value: basically that does a lot of work and takes a lot of resources. And maybe you should go and talk to the team running that SQL statement, understand why they're doing it, and see if they can do anything better. So this is quite a cool way of troubleshooting which SQL queries are taking the most time. And then you can also view by users.
So you can see on the right-hand side that the rds user has four connections and Jeremiah has two. Again, that's from the blog, but you get a lot more information around who is doing what on your database, which could be quite helpful as well. Maybe some application is opening 1,000 connections and you need to know about it right away. And by host, I had one of the graphs for it, but it will basically tell you which application server is connected to your database. So overall, all these things really help; I think Performance Insights is a great tool to help you troubleshoot. Let me just show you where it is in the AWS console. To use Performance Insights, you go to the left-hand side, and you would need to actually enable Performance Insights.
To enable it, you have to go to your database and modify it. So you click on your database instance and click on Modify. But it turns out that because we're using a t2 type of instance, Performance Insights is not supported right now on db.t2 instance classes, unfortunately. So we can't have it, and this is why I showed you screenshots of it; from the screenshots you should get a great idea of how it works. But if you had a non-t2.micro type of instance, then you should be able to modify your DB instance and enable Performance Insights. And as you can see, if I click on it and choose, say, db.m4.large and scroll all the way down, then I start seeing Performance Insights, and I can enable it and say how many days of data retention I want, et cetera. But I won't do it right now. You get the idea, and now you're supposed to be a monitoring expert for RDS. So congratulations, and I will see you in the next lecture.
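The "DB load versus the max vCPU line" idea is easy to sketch: Performance Insights measures load as average active sessions (how many sessions were running or waiting, averaged over time) and compares it to the instance's vCPU count. The samples below are made up for illustration.

```python
def average_active_sessions(samples):
    """samples: list of active-session counts taken at regular intervals."""
    return sum(samples) / len(samples)

vcpus = 2                         # the "max vCPU" line on the chart
samples = [1, 3, 4, 4, 2, 5]      # hypothetical active sessions per sample

load = average_active_sessions(samples)
print(load, load > vcpus)         # over the vCPU line means "at capacity"
```

When the load stays above the vCPU line, that's the cue to slice by waits, SQL, hosts, or users to find out what the sessions are spending their time on.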
- Aurora Overview
So let's talk about Amazon Aurora, because the exam is starting to ask a lot of questions about it. Now, you don't need deep knowledge of it, but you need a high-level overview to understand exactly how it works, and this is what I'm going to give you in this lecture. Aurora is a proprietary technology from AWS; it's not open source, but they made it compatible with Postgres and MySQL. Basically, your Aurora database will have compatible drivers: that means if you connect as if you were connecting to a Postgres or MySQL database, it will work.
Aurora is very special, and I won't go too deep into the internals, but they made it cloud-optimized, and by doing a lot of optimization and smart stuff, they get 5x the performance of MySQL on RDS and 3x the performance of Postgres on RDS. Not just that, but they keep adding performance improvements in many different ways; the way they built it is really, really smart, but I won't go into the details of it. Now, Aurora storage automatically grows, and I think this is one of its main features and it's quite awesome. You start at 10 GB, but as you put more data into your database, it grows automatically, up to 64 TB.
Again, this has to do with how they designed it. But the awesome thing is that as a DBA or a SysOps, you don't need to worry about monitoring your disk; you just know it will grow automatically over time. Also, you can have up to 15 read replicas, while MySQL only has 5, and the replication, the way they made it, is much faster. So overall, it's a win. Now, failover in Aurora is near-instantaneous, much faster than a Multi-AZ failover on MySQL RDS. And because it's cloud-native, you get high availability by default. Now, although the cost is a little bit more than RDS, about 20% more, it is so much more efficient that at scale, it makes a lot more sense for savings.
So let's talk about the aspects that are super important, which are high availability and read scaling. Aurora is special because it stores six copies of your data, anytime you write anything, across three AZs. And Aurora is made to be highly available: it only needs four copies out of six for writes, which means that if one AZ is down, you're fine; and it only needs three copies out of six for reads, so again, it's highly available for reads. There is also a self-healing process, which is quite cool: if some data is corrupted or bad, it self-heals with peer-to-peer replication in the backend. And you don't rely on just one volume, you rely on hundreds of volumes.
Again, not something for you to manage; it happens in the backend, but that means the risk is reduced by a lot. So if you look at it from a diagram perspective, you have three AZs and a shared storage volume, but it's a logical volume, and it does replication, self-healing, and auto-expanding, which is a lot of features. So if you were to write some data, maybe blue data, you'll see six copies of it across three different AZs. Then if you write some orange data, again, six copies of it across different AZs. And as you write more and more data, it keeps writing six copies of it across three different AZs.
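The quorum rules above are worth encoding once to make them stick: six copies spread two per AZ across three AZs, writes need four of six, reads need three of six. Losing an entire AZ costs two copies, so both quorums survive; losing two AZs leaves only two copies, enough for neither.

```python
COPIES_PER_AZ = 2
TOTAL_COPIES = 6    # six copies across three AZs

def copies_available(azs_down):
    return TOTAL_COPIES - azs_down * COPIES_PER_AZ

def can_write(azs_down):
    return copies_available(azs_down) >= 4   # write quorum: 4 of 6

def can_read(azs_down):
    return copies_available(azs_down) >= 3   # read quorum: 3 of 6

print(can_write(1), can_read(1))   # one AZ down: still fully available
print(can_write(2), can_read(2))   # two AZs down: neither quorum is met
```

This simple model only counts whole-AZ failures; Aurora also tolerates individual copy loss within an AZ, which is where the peer-to-peer self-healing comes in.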
The cool thing is that it goes on different volumes and it's striped, and it works really, really well. Now, you need to know about the storage and that's it; you don't actually interface with the storage. It's just a design Amazon made, and I want to give it to you as well so you understand what makes Aurora tick. Now, Aurora is like Multi-AZ for RDS in that there's only one instance that takes writes. So there is a master in Aurora, and that's what takes writes. And if the master doesn't work, the failover happens in less than 30 seconds on average, so it's a really, really quick failover. And on top of the master, you can have up to 15 read replicas, all serving reads.
So you can have a lot of them, and this is how you scale your read workload. Any of these read replicas can become the master in case the master fails. So it's quite different from how RDS works, but by default, you only have one master. The cool thing about these read replicas is that they support cross-region replication. So if you look at Aurora on the right-hand side of the diagram, this is what you should remember: one master, multiple read replicas, and the storage is replicated, self-healing, and auto-expanding, little block by little block. Now, let's have a look at Aurora as a cluster. This is more around how Aurora works when you have clients: how do you interface with all these instances? So, as we said, we have a shared storage volume, and it auto-expands from 10 GB to 64 TB.
Really cool feature. Your master is the only thing that writes to your storage. And because the master can change and fail over, what Aurora provides you is what's called a Writer Endpoint. It's a DNS name, and it always points to the master. So even if the master fails over, your client still talks to the Writer Endpoint and is automatically redirected to the right instance. Now, as I said before, you also have a lot of read replicas. What I didn't say is that you can have auto scaling on top of these read replicas. So you can have one to 15 read replicas, and you can set up auto scaling such that you always have the right number of read replicas. Now, because you have auto scaling, it can be really, really hard for your applications to keep track of where your read replicas are: what's the URL, how do I connect to them?
So for this, you have to remember, absolutely, going into the exam: there is something called a Reader Endpoint. And a Reader Endpoint works like a Writer Endpoint: it helps with connection load balancing, and it connects automatically to all the read replicas. So anytime a client connects to the Reader Endpoint, it gets connected to one of the read replicas, and load balancing is done this way. Just notice that the load balancing happens at the connection level, not the statement level. So this is how it works for Aurora: remember the Writer Endpoint, the Reader Endpoint, remember auto scaling, remember the shared storage volume that auto-expands. Remember this diagram, because once you get it, you'll understand how Aurora works. Now, if we go deeper into the features, you get a lot of things I already told you.
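To see why "connection level, not statement level" matters, here is a toy sketch of a reader endpoint. The class names are hypothetical, and the real endpoint balances via DNS rather than round-robin, but the key behavior is the same: the replica is chosen when the connection is opened, and every statement on that connection then hits the same replica.

```python
import itertools

class ReaderEndpoint:
    """Toy reader endpoint: picks a replica per CONNECTION (round-robin
    here for illustration; the real endpoint is DNS-based)."""
    def __init__(self, replicas):
        self._cycle = itertools.cycle(replicas)

    def connect(self):
        return Connection(next(self._cycle))

class Connection:
    def __init__(self, replica):
        self.replica = replica

    def execute(self, statement):
        # Every statement on this connection goes to the same replica.
        return (self.replica, statement)

endpoint = ReaderEndpoint(["replica-1", "replica-2"])
conn = endpoint.connect()
print(conn.execute("SELECT 1"))
print(conn.execute("SELECT 2"))      # same replica as the first statement
print(endpoint.connect().replica)    # a new connection may land elsewhere
```

The practical consequence: an app that opens one long-lived connection gets no balancing at all, so connection pools should open several connections to spread the load.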
Automatic failover, backup and recovery, isolation and security, industry compliance, push-button scaling with auto scaling, automated patching with zero downtime (so it's kind of cool, the magic they do in the backend), advanced monitoring, and routine maintenance. So all these things are handled for you. You also get a feature called Backtrack, which gives you the ability to restore data at any point in time. It actually doesn't rely on backups; it relies on something different. But you can always say, I want to go back to yesterday at 4:00 p.m.
And then you can say, oh no, actually I wanted yesterday at 5:00 p.m., and it will work as well, which is super, super neat. For security, it's very similar to how RDS works. You get encryption at rest using KMS; automated backups, snapshots, and replicas can all be encrypted; and encryption in flight uses SSL. It will be the exact same process as for MySQL and Postgres: for MySQL, it's an SQL statement to enforce SSL, whereas for Postgres, it's the parameter group you have to use. You can authenticate to Aurora using IAM, which is super neat. And you're responsible for protecting the instance with security groups. Finally, just like for RDS, you cannot SSH in; there's stuff you still can't do. And on top of it all, there is something called Aurora Serverless, in which case you don't need to provision an instance size at all; it just auto scales for you.
For now, it only supports MySQL 5.6 as of January 2019, with Postgres support still in beta. It's helpful, basically, when you can't predict your workload, when you don't know how it will scale, or when you get a huge workload peak. The DB cluster will start, shut down, and scale automatically based on the CPU utilization or the number of connections on your Aurora instance. And the cool thing is that you can migrate from a cluster to Serverless and vice versa, so you can choose, based on the time of the year, what you prefer having. And when Aurora Serverless scales, you have ACUs.
That's Aurora Capacity Units, so it looks a lot like what happens for DynamoDB. You just have capacity units, and basically the capacity units will increase and decrease as the load increases or decreases, and you're billed in five-minute increments of ACU usage. Now, some features of Aurora Serverless are not supported compared to Aurora itself, so if you plan on using Serverless, just look at the documentation to make sure you know exactly what you're missing out on. So that's it for Aurora. I hope that makes sense, and I will see you in the next lecture.
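A rough cost sketch of the billing model described above: usage is measured in ACUs and billed in five-minute increments. The price per ACU-hour here is a placeholder, not a real figure; check the current Aurora pricing page for actual numbers.

```python
import math

PRICE_PER_ACU_HOUR = 0.06   # hypothetical placeholder price

def estimate_cost(acus, seconds_used):
    """Bill ACU usage rounded up to the next five-minute (300 s) increment."""
    billed_seconds = math.ceil(seconds_used / 300) * 300
    return acus * (billed_seconds / 3600) * PRICE_PER_ACU_HOUR

# 70 seconds of 2 ACUs still bills as a full 300-second increment.
print(round(estimate_cost(2, 70), 4))
```

The rounding is the point: short bursts still pay for a full increment, which is why Serverless shines for spiky, unpredictable workloads rather than steady ones.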