Amazon AWS Certified Data Analytics Specialty – Domain 6: Security
- Encryption 101
Welcome to this section on security and encryption. These are not necessarily the most fun sections to deal with, but they are super important for the exam. The exam will definitely ask you a lot of questions on security as well as encryption. KMS, the Encryption SDK, the Parameter Store, IAM: all these things are a central piece of the exam. I want to make this as easy as possible because I know you may not be a security expert, and security is tough to understand sometimes. So I try to make it as simple as I can with some diagrams. Hopefully by the end of this section you’ll have a good grasp of how security works and which security mechanism to use in which circumstances. And then you will rock all your questions.
So first, an overview of the encryption mechanisms, and the first one is going to be encryption in flight. Why would we even want encryption in flight? Well, if I send a very sensitive secret, for example my credit card number, to a server to make a payment online, I want to make sure that no one else along the way my network packets travel can see my credit card number. So when I make a payment online, I want to have that green lock. I want that HTTPS website, which guarantees me that it is an SSL-enabled website and that I will get encryption in flight. When you have encryption in flight, the data is encrypted before I send it, and the server decrypts it after receiving it.
Only myself and the server know how to do these things. The SSL certificates are what help with the encryption, and another way to see it is HTTPS. Anytime we’ve been dealing with an Amazon service that had an HTTPS endpoint, that guaranteed us encryption in flight. Nowadays almost the whole web needs to run on SSL and HTTPS. Basically, when you have this enabled, you’re protected against man-in-the-middle attacks. So when you have that green lock and the server certificate is valid, no one on the way can retrieve your sensitive information. Let’s do a quick example. Here is us, and we want to talk to an HTTPS website on AWS. It could be DynamoDB, it could be whatever you want.
What we’re going to do is take our super-secret data, encrypt it with SSL encryption, and send it over the network. Then the website will receive that data and know how to decrypt it. Okay, the idea is very simple, but the execution is not as easy, so this is as much detail as I’ll give you. The good news is that all programming languages know how to do SSL encryption and decryption.
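As a small illustration of how a language handles this for you, here is a minimal sketch using Python’s standard `ssl` module. It only builds the default client settings and inspects them; it does not contact any AWS endpoint, and the choice of Python is an assumption made for this example, not something from the lecture.

```python
import ssl

# Build the default client-side TLS settings that ship with Python.
context = ssl.create_default_context()

# Certificate validation and hostname checking are enabled by default.
# These two checks are what protect against a man-in-the-middle attack.
certificate_required = context.verify_mode == ssl.CERT_REQUIRED
hostname_checked = context.check_hostname

print("certificate required:", certificate_required)
print("hostname checked:", hostname_checked)
```

HTTP libraries typically build on a context like this one, which is why you rarely have to touch SSL yourself.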
And all the libraries do this for you, so you don’t have to worry about anything. This is not something you have to deal with directly.
The second thing is called server-side encryption at rest. That is when the data is encrypted after being received by the server. Before, the server was receiving data, decrypting it, and using it in its decrypted form. Here the server is going to store the data on its disk, and we need to know that the server stores the data in an encrypted form, because in case the server gets hijacked by someone else, we don’t want that someone else to be able to decrypt the data.
The data will be decrypted before being sent back to our clients. The data is stored in an encrypted form thanks to a key, usually called the data key, and the encryption and decryption keys must be managed somewhere, usually in a KMS, or key management service. The server must have the right to talk to that key management service. So here’s our object, and we’re going to transfer it, for example, to EBS. It’s transferred over whatever mechanism, and EBS will use a data key. Using the data key, it performs encryption of that data, and now it’s stored in an encrypted form. Then the day we need to retrieve that data, for whatever reason, EBS, the service, will do the decryption for us using the data key again, and we’ll get the decrypted data back over HTTP or HTTPS, for example.
So this is how server-side encryption works. As you can see, the server or the service itself manages the encryption and the decryption and uses a data key it has access to. So this is server-side encryption at rest, and we’ve seen that many AWS services use encryption at rest.
Now let’s talk about client-side encryption. In client-side encryption, the data is encrypted by the client, and the client is us, and the server will never be able to decrypt that data. The data will then be decrypted by a receiving client.
So all in all, the data is just stored on the server, but the server doesn’t know what the data means. And the server, as a best practice, should never be able to decrypt the data anyway. For this, we could leverage something called envelope encryption. I have a whole lecture on this later on, because it is pretty advanced, but the exam will ask you about envelope encryption. So for now, let’s just do an abstraction of it. We have our object, and on our client we’re going to use a data key to encrypt our data client side, okay? So we perform encryption with that data key. Now, we send that data to any data store we want.
It could be FTP, it could be S3, it could be whatever you want, really. You put your data wherever you want, in Amazon or somewhere else, and then when you retrieve it, your client will receive an encrypted object. If it has access to the data key, if it can manage to retrieve the data key from somewhere, then it will be able to perform the decryption and get the decrypted object as a result. So, as you can see, the encryption now happens client side, okay? The server, the data store, does not know how to decrypt or encrypt the data. It just receives encrypted data. And so that’s quite secure as well. So these are the three kinds of encryption you can get overall, except envelope encryption, which I will show you later on. This is not using KMS just yet. This is just an abstraction of how encryption works. I know this may be a little bit simplified, but hopefully that clears up what encryption is. And in the next lecture, we’re going to do a deep dive into KMS.
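To make the client-side flow a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: `toy_cipher` is a toy XOR keystream built on SHA-256, not a secure algorithm (real code would use something like AES from a vetted library), and `data_store` is a plain dictionary standing in for FTP, S3, or any other store.

```python
import hashlib
import secrets


def toy_cipher(data_key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR the data with a SHA-256 based keystream.

    Toy cipher for illustration only, NOT real cryptography.
    Applying it twice with the same key and nonce returns the input.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(
            data_key + nonce + counter.to_bytes(8, "big")
        ).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Hypothetical data store: it only ever holds what the client sends it.
data_store = {}

# Sending client: generate a data key and encrypt BEFORE uploading.
data_key = secrets.token_bytes(32)
nonce = secrets.token_bytes(16)
plaintext = b"my super-secret object"
data_store["object-1"] = (nonce, toy_cipher(data_key, nonce, plaintext))

# Receiving client: fetch the encrypted object, decrypt with the data key.
stored_nonce, ciphertext = data_store["object-1"]
decrypted = toy_cipher(data_key, stored_nonce, ciphertext)
```

The data store never sees `data_key`, so it cannot decrypt what it holds. Only a client that can retrieve the key gets the original object back.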
- S3 Encryption (Reminder)
Now we are getting into the fascinating topic of encryption for S3. Just so you know, the exam loves to ask questions about S3 encryption, so I need you to pay a lot of attention here. I know encryption is not an easy topic, so I really tried my best here to explain in simple terms how encryption works in S3 and what your different options are. There are four methods of encryption for objects in S3. There is SSE-S3, to encrypt S3 objects using keys handled and managed by AWS. There is SSE-KMS, which is the same idea, except that you use AWS KMS to manage the encryption keys. There is SSE-C, which is when you provide your own encryption keys and Amazon S3 encrypts your data. And there is client-side encryption, where you encrypt your data client side. Don’t worry, I have diagrams for all of those, just so you get a better idea of how they work. It’s super important for you to know which method is adapted to which situation in the exam.
So SSE-S3. This is the one where the encryption keys are handled and managed by Amazon S3. You actually don’t even see them. The object is encrypted server side, and the encryption type is AES-256. Remember this. To make it work, you must set a header when you send your data to Amazon S3. Just remember the form: it’s x-amz-server-side-encryption with the value AES256. The x-amz stands for Amazon. This makes sense because we’re requesting Amazon to perform server-side encryption for us with the AES-256 algorithm. Okay? So here’s what it looks like in a diagram. We have our object, okay, and we want to put it into our Amazon S3 bucket, but we want to encrypt it with SSE-S3. So the first thing I’m going to do is make an HTTP or HTTPS request, and I’m going to add that header, x-amz-server-side-encryption: AES256. You must set it, okay? What happens is that Amazon S3 now receives our object. Then, because we requested server-side encryption, it’s also going to use a managed data key, and this key is managed by S3. Using these two things, the object and the key, encryption happens, and after encryption the data is put into the Amazon S3 bucket. Makes sense? So the one thing to notice here is that the encryption happens server side.
It happens on the Amazon S3 side, and Amazon S3 provides the encryption key.
Now, if you use SSE-KMS, it’s also server-side encryption, except this time the data key is managed by KMS. The advantage of using KMS is that you get more control over the rotation of that key and you get an audit trail of how that key is used. The object is encrypted server side, and you must set a header. The header name is exactly the same, x-amz-server-side-encryption, but the value this time is aws:kms. So if we look at an example in a diagram again, we have the object and we have Amazon S3. We transfer the object using HTTP or HTTPS with the header that we just described, and so the object is in Amazon S3. Now the key that is used is a KMS Customer Master Key, or CMK. So that’s the only difference.
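The two headers discussed so far can be sketched as a small Python helper. The function name `sse_headers` and the key id `example-key-id` are hypothetical and exist only for this example. The helper just builds the header dictionary a PUT request would carry; it does not send anything to S3.

```python
from typing import Optional


def sse_headers(mode: str, kms_key_id: Optional[str] = None) -> dict:
    """Return the request headers that ask S3 for server-side encryption."""
    if mode == "SSE-S3":
        # Keys handled and managed by Amazon S3.
        return {"x-amz-server-side-encryption": "AES256"}
    if mode == "SSE-KMS":
        # Same header name, different value: the key lives in KMS.
        headers = {"x-amz-server-side-encryption": "aws:kms"}
        if kms_key_id is not None:
            # Optionally name a specific KMS key to use.
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        return headers
    raise ValueError(f"unsupported mode: {mode}")


s3_managed = sse_headers("SSE-S3")
kms_managed = sse_headers("SSE-KMS", kms_key_id="example-key-id")  # hypothetical id
print(s3_managed)
print(kms_managed)
```

Note that the header name stays the same between the two modes and only the value changes, which is the point to remember for the exam.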
The encryption still happens server side and the data is put in the bucket. So the difference between SSE-S3 and SSE-KMS is that this time the key that is used is a KMS Customer Master Key that you can manage over time.
If you use SSE-C, then it’s server-side encryption using data keys that are fully managed by you, the customer, outside of AWS. Amazon will not store the encryption key you provide, and HTTPS must be used in this case. The encryption key must be provided in the HTTP headers of every request made. So that’s a lot of information. What does that look like? I think a diagram makes more sense to explain it. We have the object and we have Amazon S3, and we generate a client-side data key. Okay? Now this goes over HTTPS only, okay, not HTTP, because the key has to be sent over an encrypted, secure connection.
We provide the object, and we also provide the data key in a header. The exam doesn’t ask which header it is. Just know that the data key is in one of these headers. So now we transfer both things, the object and the client-provided data key, into Amazon S3. Amazon S3 encrypts the object with the client-provided data key. The encrypted object is stored in the bucket, and then Amazon throws away the client-provided data key. So in this example, the clients themselves have provided the data key to encrypt the data. Amazon just does the encryption and throws away the key right away.
Finally, there is client-side encryption. For this you need to use a library such as the Amazon S3 Encryption Client, just to make it a bit easier. The idea is that the clients must now encrypt the data themselves before sending it to S3. The clients must also decrypt the data themselves when they retrieve it from S3, and the customer fully manages the keys and the encryption cycle.
So what does it look like? We have Amazon S3 on the right-hand side and the client on the left-hand side. Using the S3 encryption SDK, we generate a client-side data key. Together with the object, we use it to encrypt the data client side. That’s why it’s called client-side encryption. After this, we get an encrypted object, and that object is transferred over to the bucket. So you see the difference here is that now our clients are performing the encryption and also the decryption. Okay? So these are the four methods. Hopefully the diagrams make them a bit clearer.
And finally, you may have questions about encryption in transit. Basically, Amazon S3 exposes HTTP endpoints for non-encrypted traffic and HTTPS endpoints, where you have encryption in flight. That means that the data exchanged between your client and the server is encrypted in flight.
You’re free to use the endpoint you want, but overall, HTTPS is the recommended method. And if you paid a bit of attention: if you use SSE-C, you have to use HTTPS because you also transfer the data key over the network. Encryption in flight is also called SSL/TLS in the exam. Okay, so that’s all for encryption: server side, client side, and in transit.
Now let’s do a quick hands-on to get an idea of how things work. So let’s go and upload a file. I’ll upload the same file as before, my online retail extract. Click on Next. Click on Next. And now in Properties, if I scroll down, there are the encryption properties, and I’m able to set no encryption, the Amazon S3 master key encryption, or the AWS KMS master key. And I can select the key.
Either I can use the aws/s3 managed master key, or I can provide my own custom KMS key ARN. So the idea here is that we cannot do SSE-C, and we also cannot do client-side encryption, from the UI, but both are possible programmatically. So here, for example, maybe I want to use the Amazon S3 master key and have Amazon manage all the keys for me. Or maybe I want to have some control over who uses which key for which file, so maybe I’ll use the AWS KMS master key with aws/s3. Okay, I click on Next, then Upload. And now my file is being uploaded, and behind the scenes AWS will automatically encrypt it for me. How do we make sure? Well, by clicking on this file and going to Properties, we can see that the encryption is set to AWS KMS.
Here we can see AWS KMS, and we can click on it to change it to whatever we want. The other thing we can do, if we go back to the bucket and open Properties, is set a default encryption mechanism, to basically store all the files with some encryption by default. We can choose AES-256, which is SSE-S3, or AWS KMS, and here again we have to specify a key. Then click on Save. And now any file uploaded to my S3 bucket will automatically be encrypted with AWS KMS. So I hope that shows you all the encryption mechanisms you can use in S3. That’s really helpful if you want to protect your data, obviously, and for compliance reasons. I hope you liked this, and I will see you in the next lecture.
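The hands-on noted that SSE-C cannot be done from the console, only programmatically. As a rough sketch of what that involves, the Python helper below builds the three SSE-C request headers from a client-generated 256-bit key. The function name `sse_c_headers` is hypothetical, and the code only prepares the headers; it does not call S3. In a real request these headers would have to travel over HTTPS, as the lecture explains.

```python
import base64
import hashlib
import secrets


def sse_c_headers(data_key: bytes) -> dict:
    """Build the headers that carry a client-provided key to S3 (SSE-C)."""
    if len(data_key) != 32:
        raise ValueError("SSE-C expects a 256-bit (32-byte) key")
    encoded_key = base64.b64encode(data_key).decode("ascii")
    # S3 uses the MD5 digest of the key as an integrity check on the key.
    key_md5 = base64.b64encode(hashlib.md5(data_key).digest()).decode("ascii")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": encoded_key,
        "x-amz-server-side-encryption-customer-key-MD5": key_md5,
    }


# The client generates and keeps the key; S3 does not store it.
client_key = secrets.token_bytes(32)
headers = sse_c_headers(client_key)
print(sorted(headers))
```

Because S3 throws the key away after use, the same headers have to be sent again on every request that reads the object back.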
- KMS Overview
Now let’s talk about KMS. KMS is going to be everywhere in AWS: anytime you hear encryption, it is usually backed by KMS. It’s an easy way for you to control who has access to your data, and AWS manages the encryption keys for us. It is fully integrated with IAM for authorization and with CloudTrail if you want to track the API calls made to KMS. It has seamless integrations with so many AWS technologies that I won’t list them all. But, for example, for Amazon EBS, it can encrypt volumes. For S3, you can do server-side encryption of objects using SSE-KMS. For Redshift, you can do encryption of data, and for RDS as well. For SSM, you can encrypt secrets in the Parameter Store, et cetera. On top of that, you can also use the CLI and the SDK to leverage the KMS functionality. So overall, it’s quite a handy service, and it’s integrated everywhere. But how does it really work? Well, anytime you need to share sensitive information, you should use KMS.
That includes database passwords, credentials to external services, and private keys of SSL certificates. The value we get from KMS is that the CMK, the Customer Master Key used to actually encrypt and decrypt the data, can never be retrieved by us, and the CMK can be rotated by KMS for extra security. So the idea is that we don’t actually manage the keys ourselves. AWS manages them for us, and we don’t perform the encryption ourselves either. AWS does the encryption for us, and what we get out of it is enhanced security. So overall, why do we use KMS? Well, because you should never, ever store your secrets in plain text, especially in your code. When you have an encrypted secret, however, you can store it in your code. This time, this is fine. Or maybe in an environment variable, even better. So when is KMS helpful? Well, it’s simple.
It’s helpful when you want to encrypt up to 4 KB of data per call. So you cannot pass a huge file to KMS and tell it to encrypt it. If your data is greater than 4 KB, which is the case for very big data sets and big data, you should use envelope encryption. I won’t go in depth on it here, but basically we generate a new data key, and that data key is used to encrypt the big data set. So overall, think plain KMS for data up to 4 KB and envelope encryption if your data is greater than 4 KB, and envelope encryption leverages KMS anyway. Okay? To give someone access to KMS, you basically need to make sure the key policy allows that user. And you also need to make sure that the IAM policy allows the API calls.
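The two access checks just mentioned can be sketched as policy documents. The snippet below writes them as Python dictionaries and prints them as JSON. The account id, user name, and key id are hypothetical placeholders, and the statements are a minimal illustration of the lecture’s point, not a recommended production policy.

```python
import json

# Hypothetical identifiers, for illustration only.
USER_ARN = "arn:aws:iam::111122223333:user/example-user"
KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/example-key-id"

# Statement in the KEY POLICY: the key itself allows this user.
key_policy_statement = {
    "Sid": "AllowExampleUserToUseTheKey",
    "Effect": "Allow",
    "Principal": {"AWS": USER_ARN},
    "Action": ["kms:Encrypt", "kms:Decrypt"],
    "Resource": "*",  # inside a key policy, "*" refers to this key
}

# IAM POLICY attached to the user: allows the KMS API calls on that key.
iam_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["kms:Encrypt", "kms:Decrypt"],
            "Resource": KEY_ARN,
        }
    ],
}

print(json.dumps(key_policy_statement, indent=2))
print(json.dumps(iam_policy, indent=2))
```

The key policy lives on the key and names who may use it, while the IAM policy lives on the user and names which calls that user may make.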
All right, so let’s have a look at how it works on the KMS side. You are able to fully manage the keys and the policies. You can create keys and define key rotation policies. You can even disable keys or enable them back. You can also audit the usage of the keys, using CloudTrail. And there are three types of CMK. You can have an AWS managed service default CMK, which is what you get for free for every service: EBS, S3, all these things. Any service will have an AWS managed service default key. You can also create your own keys for $1 per month. Or you can import your own keys: you basically create them on your own and then import them into KMS, and that’s $1 per month as well.
And then you’re going to pay for each API call made to KMS: three cents per ten thousand API calls. What does that mean? It means that anytime you encrypt or decrypt data, you’re going to be charged a very, very small fee. But you need to be aware of it because it’s really important. If you have a big data set of, say, a million files, you can do the math: a million calls is 100 times 10,000 calls, so 100 times $0.03, which is about $3 to decrypt everything or to encrypt everything.
Okay, so how does KMS work from a diagram perspective? Well, say you have the client, using the CLI or the SDK, and you have a secret. For example, it’s a password, and that password is not very long, so it’s less than 4 KB.
Then you have the KMS service, and you’re going to use the Encrypt API. When you use the Encrypt API, the KMS service looks at the CMK you want to use and asks: is the user allowed to make this Encrypt call with this CMK? It checks with IAM that you have the IAM permissions. It also looks at the key policy. If you do have access through both of these, it performs the encryption. So the encryption happens, and then KMS sends you back the encrypted secret. On our end, we have never seen the CMK.
We just sent something and received back an encrypted secret. So we’re going to store that encrypted secret, and later on our application needs to decrypt it. We use the CLI or the SDK again to issue a Decrypt API call. Using the same CMK, KMS will again check the IAM permissions, making sure we do have the decrypt access, and also look at the key policy. Then the decryption happens, and it sends us back the decrypted secret in plain text. So this is basically how KMS works from a diagram perspective. What we need to remember out of it is that we never, ever get to perform decryption ourselves. KMS does it for us using the CMK that it has access to, but we do not have direct access to the CMK. You also need to realize that it is tightly integrated with IAM permissions to ensure that everything is secure along the way.
Now, how does encryption work in AWS services? Well, some require migrations.
For example, through a snapshot or a backup. If you have an EBS volume that is unencrypted and you want to encrypt it, you cannot do it in place. You first need to make a snapshot of it and then create an encrypted volume from that snapshot. The same goes for RDS, ElastiCache, or EFS. But one technology does allow you to just encrypt something in place, and that’s S3. So if you had an unencrypted file and wanted to encrypt it right away, you could use KMS for that. Just enable encryption, and KMS encryption will happen on the fly. That’s good to know, because it’s the only technology in AWS that does allow this. All right, so that’s all for KMS. I really hope that was helpful, and I will see you in the next lecture.
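As a recap of the Encrypt and Decrypt flow from this lecture, here is a small in-memory sketch in Python. The `ToyKMS` class is entirely hypothetical: it is not the real KMS API, and its XOR keystream is a toy, not real cryptography. It only mirrors the steps from the diagram: check IAM, check the key policy, enforce the 4 KB limit, and never hand the CMK to the caller.

```python
import hashlib
import secrets


class ToyKMS:
    """In-memory stand-in for the lecture's diagram. NOT real KMS or crypto."""

    MAX_BYTES = 4096  # the 4 KB per-call limit mentioned in the lecture

    def __init__(self, iam_allowed, key_policy_allowed):
        self._cmk = secrets.token_bytes(32)  # never returned to callers
        self._iam_allowed = iam_allowed  # set of (user, action) pairs
        self._key_policy_allowed = key_policy_allowed  # set of (user, action)

    def _check(self, user, action):
        # Both the IAM permissions and the key policy must allow the call.
        if (user, action) not in self._iam_allowed:
            raise PermissionError(f"IAM denies {action} for {user}")
        if (user, action) not in self._key_policy_allowed:
            raise PermissionError(f"key policy denies {action} for {user}")

    def _xor(self, nonce, data):
        stream = b""
        counter = 0
        while len(stream) < len(data):
            stream += hashlib.sha256(
                self._cmk + nonce + counter.to_bytes(8, "big")
            ).digest()
            counter += 1
        return bytes(a ^ b for a, b in zip(data, stream))

    def encrypt(self, user, plaintext):
        self._check(user, "Encrypt")
        if len(plaintext) > self.MAX_BYTES:
            raise ValueError("over 4 KB: use envelope encryption instead")
        nonce = secrets.token_bytes(16)
        return nonce + self._xor(nonce, plaintext)

    def decrypt(self, user, blob):
        self._check(user, "Decrypt")
        return self._xor(blob[:16], blob[16:])


permissions = {("app", "Encrypt"), ("app", "Decrypt")}
kms = ToyKMS(iam_allowed=permissions, key_policy_allowed=permissions)
encrypted_secret = kms.encrypt("app", b"db-password")
decrypted_secret = kms.decrypt("app", encrypted_secret)
```

The caller only ever handles the plaintext secret and the encrypted blob. The key stays inside the service, which is the property the lecture asks you to remember.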