Amazon AWS DevOps Engineer Professional – Incident and Event Response (Domain 5) & HA, Fault T… part 9
- Multi AZ – Overview
So one thing that's important to understand going into domain six of the exam is how Multi-AZ fits within AWS. There are some services where you need to consciously enable Multi-AZ. That includes EFS, ELB, ASG and Beanstalk, where you assign which AZs you want your applications and instances to be deployed in. So for EFS, you say, okay, I want to have a network interface in zone A, B and C. For the ELB, you set which zones you want to load balance across. Same for the ASG: where do I want to create my EC2 instances? And because Beanstalk relies on ELB and ASG, it inherits the same settings. Then there are RDS and ElastiCache, where there is a Multi-AZ setting.
And that means that there will be a synchronous standby database for failover in another Availability Zone. So here we use two AZs for RDS and ElastiCache Multi-AZ, and this is used when we want to have a failover database within the same region. Then we have Aurora, and Aurora is a bit special because the data itself is stored automatically across multiple AZs. So by default, Aurora will have its data stored as Multi-AZ and will be resistant to the failure of an entire Availability Zone. But you can also have a Multi-AZ master for the database itself, which is the exact same setting as for the RDS database.
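As a rough illustration of what "consciously enabling" Multi-AZ looks like in practice, here is a minimal boto3 sketch that turns on the Multi-AZ setting for an existing RDS instance; the instance identifier is a placeholder and the call itself is not shown in the lecture.

```python
import boto3

rds = boto3.client("rds")

# Enable the Multi-AZ setting on an existing RDS instance (identifier is a placeholder).
# RDS will provision a synchronous standby in another AZ for automatic failover.
rds.modify_db_instance(
    DBInstanceIdentifier="app-db",
    MultiAZ=True,
    ApplyImmediately=True,  # otherwise the change waits for the next maintenance window
)
```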
Okay, then we have Elasticsearch, which is a managed service, but we still have to consciously enable the Multi-AZ setting (zone awareness) to get a multi-AZ setup for Elasticsearch. And Jenkins, in case you want to do a self-deployment of Jenkins: if you want to have Multi-AZ Jenkins, then you need to have multi-master, and this is usually done by leveraging an Auto Scaling group to deploy your Jenkins masters. Now, there are some services where Multi-AZ is implicitly there. That's, for example, S3, except if you use the One Zone-Infrequent Access storage tier, in which case, as the name indicates, your data is only stored in one zone; apart from that, it's stored across multiple AZs.
Then there's DynamoDB, where your data, when you put it into DynamoDB, is going to be replicated across multiple AZs by default; there's no setting to enable. And when you think about it, all of the AWS proprietary services, all the managed services that AWS offers, they're multi-AZ. They work even if an AZ goes down, supposedly. Okay, so there's one thing I forgot to mention in this slide, which is: what about EBS? So EBS is a very special service, right, because an EBS volume is tied to a single Availability Zone. So how can we make EBS multi-AZ? And I want you to pause the video right now and just think, as a DevOps engineer, how can you make EBS multi-AZ? Just think about it for a second.
All right, so let me give you the answer, or at least one answer, of how you can make this happen. So let's create an Auto Scaling group. That Auto Scaling group is going to be multi-AZ, but we will set one for the min, the max and the desired capacity. So that means that we'll have one instance in one of the AZs we specify in the ASG. And then we'll create a lifecycle hook for the terminate action, and when this lifecycle hook happens we'll make a snapshot of the EBS volume. And then when a new instance starts, we'll have a lifecycle hook again, and we'll take the snapshot, create an EBS volume from it and attach it to the instance. So what does it look like as a diagram? So you better understand what happens.
We have an Auto Scaling group, and it's across AZ 1 and AZ 2, and the min, max and desired capacity is one. We have an instance, and right now it has an EBS volume. And let's say it's going to get terminated for whatever reason; maybe the AZ goes down. So there's a terminate hook, and a Lambda function will back up the EBS volume using an API call and create a snapshot from it. Once the backup is done, the Lambda function will tell the Auto Scaling group that yes, we have successfully completed this hook, and then the Auto Scaling group says, okay, I need to have one new instance. So a new instance will be created in AZ 2, in which case we have a new Lambda function, maybe, that as part of the launch hook will create a new EBS volume from the snapshot that we created before.
And that EBS volume will obviously be placed in the same AZ that the new instance is in. And then the instance at launch time will have a script, maybe, or the Lambda function will have a script, to attach that EBS volume onto your new instance. And effectively we've made EBS multi-AZ. So this is kind of like a hack, but this is the kind of thing that the exam can test you on because it requires some automation, some scripting. It's a good example of how we can use Auto Scaling groups, some hooks, some Lambda functions, and how we see the limitations of EBS volumes being tied to a single AZ. But snapshots can be used to make an EBS volume "move", quote unquote, from AZ 1 to AZ 2.
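To make the automation a bit more concrete, here is a minimal, hedged sketch of the two Lambda functions behind those lifecycle hooks. It assumes the hook notifications are delivered through EventBridge, and the ASG name, hook names and device name are placeholders, not something from the lecture.

```python
import boto3

ec2 = boto3.client("ec2")
autoscaling = boto3.client("autoscaling")

# Placeholder names used for illustration only
ASG_NAME = "ebs-ha-asg"
DATA_DEVICE = "/dev/xvdf"


def on_instance_terminating(event, context):
    """Terminate lifecycle hook: snapshot the data volume, then let the ASG continue."""
    instance_id = event["detail"]["EC2InstanceId"]  # EventBridge lifecycle event (assumed)

    # Find the data volume attached to the terminating instance
    volumes = ec2.describe_volumes(
        Filters=[
            {"Name": "attachment.instance-id", "Values": [instance_id]},
            {"Name": "attachment.device", "Values": [DATA_DEVICE]},
        ]
    )["Volumes"]

    if volumes:
        snapshot = ec2.create_snapshot(
            VolumeId=volumes[0]["VolumeId"],
            Description=f"Backup of {DATA_DEVICE} from {instance_id}",
        )
        # Wait for the backup to finish before letting termination continue.
        # Large volumes may need lifecycle-hook heartbeats instead of a simple wait.
        ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot["SnapshotId"]])

    # Tell the ASG the hook has been handled so termination can proceed
    autoscaling.complete_lifecycle_action(
        LifecycleHookName="backup-ebs-on-terminate",
        AutoScalingGroupName=ASG_NAME,
        InstanceId=instance_id,
        LifecycleActionResult="CONTINUE",
    )


def on_instance_launching(event, context):
    """Launch lifecycle hook: restore the latest snapshot in the new instance's AZ and attach it."""
    instance_id = event["detail"]["EC2InstanceId"]
    az = ec2.describe_instances(InstanceIds=[instance_id])[
        "Reservations"][0]["Instances"][0]["Placement"]["AvailabilityZone"]

    # Pick the most recent completed snapshot (filtering by tag would be more robust)
    snapshots = ec2.describe_snapshots(
        OwnerIds=["self"],
        Filters=[{"Name": "status", "Values": ["completed"]}],
    )["Snapshots"]
    latest = max(snapshots, key=lambda s: s["StartTime"])

    # Recreate the volume in the new instance's AZ and attach it
    volume_id = ec2.create_volume(
        SnapshotId=latest["SnapshotId"], AvailabilityZone=az
    )["VolumeId"]
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=DATA_DEVICE)

    autoscaling.complete_lifecycle_action(
        LifecycleHookName="restore-ebs-on-launch",
        AutoScalingGroupName=ASG_NAME,
        InstanceId=instance_id,
        LifecycleActionResult="CONTINUE",
    )
```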
Okay, lastly, for EBS volumes, we'll add that if you're using a PIOPS volume, so a Provisioned IOPS volume, which is io1: to get the maximum performance after a snapshot, you need to read the entire volume once. That means that you will be pre-warming the I/O blocks. So on this new instance in AZ 2, if the EBS volume we attach is an io1 type of volume, then the EC2 instance should read the entire volume, all of its blocks, to make sure that the volume is pre-warmed, and then you'll get maximum performance for that io1 volume. Okay, so that's it for this short but critical lecture about Multi-AZ. Hopefully that makes a lot of sense to you, and I will see you in the next lecture.
- Multi Region – Overview
So while Multi-AZ is super important, it is also quite natural to use. But the exam is going to take you to the next level of things: it's going to ask you about multi-region. And multi-region is something that's quite new in AWS; it's something that you sometimes have to manually implement, and so as a DevOps engineer, it does require you to have a lot of creativity. So let's talk about some services that have some concept of multi-region. We have DynamoDB Global Tables, and we've seen those; it's a way to get multi-way replication, so it's active-active, and it's enabled by DynamoDB Streams.
So we can have a DynamoDB table in three different regions, and any time a write happens in one of these regions, it will be replicated through a stream to the other regions. So it's quite nice if you have a global application. AWS Config has the concept of aggregators. So if you want to aggregate the configuration of all the regions within your account, but also across all the multiple accounts that you have, you can create an aggregator. And we've seen how to do this; this is how, for example, you can get an aggregated view of all the configuration and all your compliance in one account.
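As a quick reminder of what creating such an aggregator can look like, here is a hedged boto3 sketch; the aggregator name and account IDs are placeholders.

```python
import boto3

config = boto3.client("config")

# Aggregate AWS Config data from several accounts, across all regions,
# into the account/region where this call is made (names and IDs are placeholders).
config.put_configuration_aggregator(
    ConfigurationAggregatorName="org-wide-aggregator",
    AccountAggregationSources=[
        {
            "AccountIds": ["111111111111", "222222222222"],
            "AllAwsRegions": True,
        }
    ],
)
```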
RDS has support for cross-region read replicas, and these read replicas are usually used to improve reads. So, for example, if we have an application in Australia and we want our American users to have better usage of our application with less latency, then we could create a cross-region read replica in America, and that read replica will only be used for reads. It could also be used for disaster recovery: that means that if the Australia region fails, we could promote one of these read replicas in America and say, now you are the main database for writes. So just remember that read replicas are only used for reads, not for writes. So the users in America, while the Australian database is still up, would have to write to the Australian database.
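A hedged boto3 sketch of that setup, run against the American region; the instance identifiers, account ID and regions are placeholders.

```python
import boto3

# Client in the destination (American) region; identifiers and ARN are placeholders
rds_us = boto3.client("rds", region_name="us-east-1")

# Create a cross-region read replica of the Australian primary (reads only)
rds_us.create_db_instance_read_replica(
    DBInstanceIdentifier="app-db-replica-us",
    SourceDBInstanceIdentifier="arn:aws:rds:ap-southeast-2:123456789012:db:app-db",
    SourceRegion="ap-southeast-2",  # lets boto3 build the cross-region pre-signed URL
)

# Disaster recovery: if the Australian region fails, promote the replica so it accepts writes
rds_us.promote_read_replica(DBInstanceIdentifier="app-db-replica-us")
```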
Okay, then we get Aurora Global Database. So the name is a bit confusing: one region in AWS is the master, and the other region (there are only two regions for a global database in Aurora) is used, again, for read latency improvements and for disaster recovery. But this is more of a feature directly baked into Aurora, and the API call to promote the other region as the main database is very simple. Then EBS volumes: we've seen those for Multi-AZ, but the snapshots can be copied across regions. Same for your AMIs; AMIs are scoped per region, so if you want a global application using the same AMI, you need to copy that AMI across regions to make sure that the other regions do have access to the same AMI.
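As a small, hedged illustration of copying those artifacts across regions with boto3, run from the destination region, here is a sketch that copies an EBS snapshot and a golden AMI and then records the new AMI ID under a constant Parameter Store name; the snapshot ID, AMI ID, parameter name and regions are placeholders.

```python
import boto3

# Clients in the destination region (all identifiers and regions are placeholders)
ec2_eu = boto3.client("ec2", region_name="eu-west-1")
ssm_eu = boto3.client("ssm", region_name="eu-west-1")

# Copy an EBS snapshot from us-east-1 into eu-west-1 (e.g. for disaster recovery)
ec2_eu.copy_snapshot(
    SourceRegion="us-east-1",
    SourceSnapshotId="snap-0123456789abcdef0",
    Description="Cross-region copy for DR",
)

# Copy a golden AMI from us-east-1 into eu-west-1
copied_ami = ec2_eu.copy_image(
    Name="golden-ami",
    SourceImageId="ami-0123456789abcdef0",
    SourceRegion="us-east-1",
)["ImageId"]

# Record the new AMI ID under a constant Parameter Store name in the destination region
ssm_eu.put_parameter(
    Name="/golden-ami/latest",
    Value=copied_ami,
    Type="String",
    Overwrite=True,
)
```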
And it's very common practice then to store the AMI ID in the Parameter Store, so that you have a constant name across all your regions. RDS snapshots, again, can be copied to other regions if you want that kind of backup strategy for your RDS database. VPC peering is also important to mention: it allows you to have private traffic between your different regions in AWS. If you deploy in one region and in another, the VPCs will be different, okay, super important, and so, to route private traffic between regions, you need VPC peering. Next we have Route 53, which uses a global network of DNS servers to make your DNS queries available.
So Route 53 is by default a multi-region, highly available service, and we'll see in a second how we can use Route 53 to do some multi-region architectures. And then we have S3: we can do Cross-Region Replication, so from one S3 bucket in a region to another one, and that can be really helpful to have a backup of your data in another region. That can be really nice for your disaster recovery strategy, or if you want to provide another region with low-latency access to your data, that could be really helpful as well. Or if you just wanted to aggregate data across many regions into one central bucket, maybe because you wanted to do some analytics workloads.
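A minimal, hedged sketch of enabling cross-region replication on a bucket with boto3; both buckets must already exist with versioning enabled, and the bucket names and IAM role ARN are placeholders.

```python
import boto3

s3 = boto3.client("s3")

# Replicate every new object from source-bucket to a bucket in another region.
# Versioning must already be enabled on both buckets, and the role must allow replication.
s3.put_bucket_replication(
    Bucket="source-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
        "Rules": [
            {
                "ID": "replicate-everything",
                "Prefix": "",
                "Status": "Enabled",
                "Destination": {"Bucket": "arn:aws:s3:::destination-bucket"},
            }
        ],
    },
)
```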
We have CloudFront. CloudFront is a global CDN at the edge locations. So whenever you deploy a CloudFront distribution, it's going to be available in many different countries around the world; there are more edge locations than there are regions. And if you wanted to deploy a Lambda function at the edge, then you would use Lambda@Edge, and these Lambda functions would be deployed onto the CloudFront CDN, and you could use those to do A/B testing, for example. And using this A/B testing, you're able to say, okay, some users will be assigned a cookie to go to version A of our application and some users will be assigned a cookie to go to version B.
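To make that concrete, here is a hedged sketch of a Lambda@Edge viewer-request handler that routes users based on an experiment cookie; the cookie name and the "/beta" URI prefix are assumptions, and in a real setup you would also set the cookie on the response so the assignment sticks.

```python
import random


def handler(event, context):
    # CloudFront viewer-request event: modify the request before it reaches the cache/origin
    request = event["Records"][0]["cf"]["request"]
    headers = request.get("headers", {})

    # Look for an existing experiment cookie (cookie name is a placeholder)
    version = None
    for cookie_header in headers.get("cookie", []):
        for cookie in cookie_header["value"].split(";"):
            name, _, value = cookie.strip().partition("=")
            if name == "experiment-version":
                version = value

    # No cookie yet: assign a version at random (50/50 split)
    if version not in ("A", "B"):
        version = random.choice(["A", "B"])

    # Route version-B users to an alternate path, assumed to exist at the origin
    if version == "B":
        request["uri"] = "/beta" + request["uri"]

    return request
```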
But there are many different use cases for Lambda@Edge. So let's remember that all these services have multi-region enabled. I don't think I've forgotten one, but if I did, please let me know. And then I want to talk about some things, like: what if we use a traditional architecture with an Application Load Balancer and an Auto Scaling group, for example? For this we can use Route 53. The health check is going to be at the very center of your Route 53 multi-region architecture because it allows you to do automated DNS failover. So what does that mean? We have our first architecture in one region, and that's an Application Load Balancer connecting to an Auto Scaling group that has many instances.
So we know this is very classic, and we have a second, exact same architecture in another region, okay? And we want to be able to direct our users to the region that makes the most sense for them. So for this, we create a Route 53 record. It could be a latency record, it could be geoproximity, it could be another kind of record. But what this does is that our users are redirected to a region that makes sense for them. Okay, but how do we know to redirect them to a region that is also going to be healthy and available, right? So for this, we have health checks, and you can have a health check for each and every single region. So we can have a health check for region one and a health check for region two.
And what will happen is that if that health check fails in region one, then even though we have a latency or geoproximity record, Route 53 will make its best effort not to send any traffic to the region that's having health check issues and send the traffic instead to the other regions. So effectively, we have a multi-region application being created this way (there's a small sketch of this after the health check discussion below). So what is a health check? There are three ways of creating a health check in Route 53. There is a health check that monitors an endpoint: it could be an application, a server, or another AWS resource. So that could be good, but say your application is having a little bit of trouble processing some HTTP requests; maybe the health check will fail when you don't want it to fail.
So that could be one way of doing things. But there's a second way: you could have a health check that monitors other health checks, and that's called a calculated health check. That means that you can create a complex formula around how you want all your health checks to be combined into one health check. And then finally there's a third kind of health check, which I think is extremely important to understand and know as a DevOps engineer going into the exam: the health check can also monitor any CloudWatch alarm of your choosing. A CloudWatch alarm can be backed by any CloudWatch metric that you want, even a custom metric, okay? So you could have full control over how a health check is defined in Route 53.
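Tying together the latency records from the architecture above and the alarm-backed health check just described, here is a hedged boto3 sketch; the hosted zone ID, domain name, ALB DNS name, alarm name and regions are all placeholders.

```python
import uuid

import boto3

route53 = boto3.client("route53")

# 1) A health check backed by an existing CloudWatch alarm (alarm name/region are placeholders)
health_check_id = route53.create_health_check(
    CallerReference=str(uuid.uuid4()),  # must be unique per request
    HealthCheckConfig={
        "Type": "CLOUDWATCH_METRIC",
        "AlarmIdentifier": {"Region": "ap-southeast-2", "Name": "dynamodb-throttles-alarm"},
        "InsufficientDataHealthStatus": "LastKnownStatus",
    },
)["HealthCheck"]["Id"]

# 2) A latency-based record for the Sydney ALB, tied to that health check.
#    A similar record (with its own health check) would exist for the other region.
route53.change_resource_record_sets(
    HostedZoneId="Z0000000000000",
    ChangeBatch={
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "app.example.com",
                    "Type": "CNAME",
                    "SetIdentifier": "sydney",
                    "Region": "ap-southeast-2",
                    "TTL": 60,
                    "ResourceRecords": [
                        {"Value": "my-alb-123456.ap-southeast-2.elb.amazonaws.com"}
                    ],
                    "HealthCheckId": health_check_id,
                },
            }
        ]
    },
)
```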
So, for example, if you wanted to start redirecting requests to another region because you're experiencing throttles in DynamoDB, then you could create a CloudWatch alarm on top of that metric. And then when that alarm goes off, when it's in the ALARM state, the health check will be deemed unhealthy, and Route 53 will know to redirect your users to the other region. So that could be quite interesting. And you should know as well that the health checks themselves do provide some metrics in CloudWatch. So if you wanted to have some alerting of the kind "hey, when my health checks go off, can I be alerted in Slack, for example?", yes: you can take that CloudWatch metric, integrate it with a new CloudWatch alarm that sends a notification to an SNS topic, have a Lambda function subscribed to that SNS topic, and have that Lambda function send notifications directly into your Slack channel.
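A tiny, hedged sketch of that notifier Lambda; the Slack incoming-webhook URL is a placeholder you would configure in your own workspace.

```python
import json
import urllib.request

# Placeholder: a Slack incoming-webhook URL you create yourself
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"


def handler(event, context):
    # Triggered by SNS: forward each alarm notification into a Slack channel
    for record in event["Records"]:
        message = record["Sns"]["Message"]
        body = json.dumps({"text": f"Route 53 health check alert: {message}"}).encode("utf-8")
        request = urllib.request.Request(
            SLACK_WEBHOOK_URL,
            data=body,
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(request)
```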
Okay, just an idea, but this is a high-level overview of multi-region in AWS. Hopefully that makes sense. All of this is not new, but hopefully it makes sense to you. And remember that some services do have a concept of multi-region baked into them, while others, like these ones, must have Route 53 as an overarching service to perform these multi-region deployments, and that health checks are a very, very important and central piece of those deployments. Okay, well, that's it for this lecture. I hope you liked it and I will see you in the next lecture.