Amazon AWS Certified Data Analytics Specialty – Domain 6: Security Part 3
- AWS Services Security Deep Dive (3/3)
Okay, so we are in part three of our security analysis of the services, and we are starting with EMR. EMR security is extremely important to understand going into the exam, so pay close attention to the next two slides and remember everything I'm going to say. So let's get going. First, because EMR provisions EC2 instances, we can provision a key pair to manage SSH access into these instances. We can also attach IAM roles to these EC2 instances. Why? Well, maybe you want to access S3 through EMRFS, or maybe you want to write data back into S3 for whatever reason, so you need to make sure that the IAM role has access to S3. Likewise, if you are accessing DynamoDB through Hive, you need to make sure that your EC2 instances have permission to access DynamoDB, and again, the IAM role is what achieves that. Now, for security groups, you have two kinds: one for the master node, and one for the cluster nodes, meaning the core and task nodes. The reason we have security groups is to allow node-to-node communication: if you are running a Spark job, a distributed MapReduce job, or anything else that requires your nodes to talk to one another, you need the network security rules in place to allow that communication.
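To make the IAM role point concrete, here is a minimal Python sketch of the kind of permissions policy you might attach to the EMR EC2 instance role so that EMRFS can use S3 and Hive can use DynamoDB. The helper `emr_instance_role_policy`, the bucket name, and the table ARN are hypothetical, and the action lists are illustrative rather than a complete set for every workload.

```python
import json


def emr_instance_role_policy(bucket: str, table_arn: str) -> dict:
    """Build an IAM policy document for EMR EC2 nodes (hypothetical helper).

    Grants EMRFS read/write on one S3 bucket and Hive read/write on one
    DynamoDB table. Action lists are illustrative, not exhaustive.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # EMRFS reads input from and writes output to this bucket.
                "Sid": "EmrfsBucketAccess",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetObject", "s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            },
            {
                # Hive tables backed by DynamoDB need access to the table.
                "Sid": "HiveDynamoDbAccess",
                "Effect": "Allow",
                "Action": [
                    "dynamodb:DescribeTable",
                    "dynamodb:Scan",
                    "dynamodb:Query",
                    "dynamodb:GetItem",
                    "dynamodb:PutItem",
                ],
                "Resource": table_arn,
            },
        ],
    }


if __name__ == "__main__":
    policy = emr_instance_role_policy(
        "example-emr-data",
        "arn:aws:dynamodb:us-east-1:123456789012:table/ExampleTable",
    )
    print(json.dumps(policy, indent=2))
```

The security groups described above are the network-level counterpart: this IAM role controls which AWS APIs the nodes may call, while the security groups control which machines can talk to each other.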
So security groups are extremely important. In terms of authentication, we have Kerberos authentication, which is helpful when you want to connect your authentication mechanism to Active Directory. And finally, for authorization, there is an open source way of managing security on EMR called Apache Ranger, which provides RBAC, role-based access control. You need to set it up on an external EC2 instance and then connect it to your EMR cluster; it's not something you can set up directly from EMR. I would recommend reading that blog if you want to know more about EMR security. I think it's a very interesting read and could be very useful going into your exam. Okay, the next topic for EMR is encryption, and again it's super important to remember everything I'm going to say. First, we have at-rest data encryption for EMRFS. EMRFS is used when your data lives on Amazon S3, so we need to encrypt the data in Amazon S3.
So we have three different options: SSE-S3, SSE-KMS, or client-side encryption. That's three of the four kinds of S3 encryption, because SSE-C is not supported by EMRFS. You can also encrypt the data on your local disks, so you can have encryption both on Amazon S3, through EMRFS, and on your local disks. Let's talk about local disks further, because there are different kinds of encryption for them. The first one is open source HDFS encryption, which relies on the encryption defined for the Hadoop ecosystem and uses it in your Hadoop cluster. But you can also use more AWS-native encryption mechanisms. If you have an EC2 instance store, meaning your storage is physically attached to your EC2 instance, then you can use NVMe encryption or LUKS encryption.
Okay, remember these two. Next, if you have an EBS volume, meaning your storage is not physically attached to your EC2 instance but attached through the network, then you can use EBS encryption, the native encryption using KMS. The good thing about it is that it works for the root volume of your EC2 instance. Very important to remember. You can also use LUKS encryption, but in that case it does not work with the root volume. So if you have an EMR cluster backed by EBS and you want to make sure all the EBS volumes are encrypted, including the root volume, then you have to go with EBS encryption using KMS, not LUKS encryption. Okay, that's it for at rest. Now for in transit, we can encrypt node-to-node communication using SSL. And you can also make sure that any EMRFS traffic, between S3 and your cluster nodes, is encrypted end to end using TLS. This diagram can be helpful to remember these things and put them into perspective.
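Here is a hedged sketch of how these choices can map onto an EMR security configuration, built as a Python dict. The field names follow the EMR security-configuration JSON format as I understand it (`EncryptionConfiguration`, `S3EncryptionConfiguration`, `LocalDiskEncryptionConfiguration`, `EnableEbsEncryption`, `TLSCertificateConfiguration`), so double-check them against the EMR documentation. The helper name, KMS key ARN, and certificate location are placeholders, and client-side encryption with a custom provider is left out for brevity.

```python
import json

# S3 encryption modes this sketch accepts for EMRFS. SSE-C is deliberately
# absent because EMRFS does not support it. (A CSE-Custom mode also exists
# but needs extra provider settings, so it is omitted here.)
EMRFS_S3_MODES = {"SSE-S3", "SSE-KMS", "CSE-KMS"}


def emr_security_configuration(s3_mode: str, kms_key_arn: str, pem_zip_s3_uri: str) -> dict:
    """Build a hypothetical EMR security configuration with at-rest and in-transit encryption."""
    if s3_mode not in EMRFS_S3_MODES:
        raise ValueError(f"{s3_mode} is not supported by EMRFS in this sketch")

    s3_config = {"EncryptionMode": s3_mode}
    if s3_mode in ("SSE-KMS", "CSE-KMS"):
        s3_config["AwsKmsKey"] = kms_key_arn  # KMS-based modes need a key

    return {
        "EncryptionConfiguration": {
            "EnableAtRestEncryption": True,
            "EnableInTransitEncryption": True,
            "AtRestEncryptionConfiguration": {
                "S3EncryptionConfiguration": s3_config,
                "LocalDiskEncryptionConfiguration": {
                    "EncryptionKeyProviderType": "AwsKms",
                    "AwsKmsKey": kms_key_arn,
                    # EBS encryption (not LUKS alone) also covers the root volume.
                    "EnableEbsEncryption": True,
                },
            },
            "InTransitEncryptionConfiguration": {
                "TLSCertificateConfiguration": {
                    "CertificateProviderType": "PEM",
                    # Zip of PEM certificates that the nodes use for TLS.
                    "S3Object": pem_zip_s3_uri,
                }
            },
        }
    }


if __name__ == "__main__":
    config = emr_security_configuration(
        "SSE-KMS",
        "arn:aws:kms:us-east-1:123456789012:key/example-key-id",
        "s3://example-bucket/certs/emr-certs.zip",
    )
    print(json.dumps(config, indent=2))
```

Asking for `SSE-C` raises an error here, mirroring the exam point that EMRFS does not support it. The slide's diagram lays out these same at-rest and in-transit options visually.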
That diagram comes straight out of the EMR documentation on data encryption options. So, as I said, you have to remember everything I just said for EMR; these will be very valuable points going into the exam, so hopefully that's helpful. The next technology is Elasticsearch. We can deploy it within our VPC, which provides network isolation, and we can also set Elasticsearch access policies to manage security further, for example to restrict access by IP address. In terms of data security, we can encrypt data at rest using KMS, and we get encryption in transit through the HTTPS endpoint using SSL. How do we authenticate to the Elasticsearch service? Well, we can use IAM- or Cognito-based authentication. With Cognito, end users can, for example, log into Kibana directly through enterprise identity providers such as Microsoft Active Directory using SAML. A lot of keywords, I know, but at least you need to understand how one can log into Kibana, maybe using Cognito. Okay, that's it for Elasticsearch.
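Before moving on to Redshift, here is a sketch of the "restrict by IP address" idea: a Python helper that builds an IP-based access policy for an Elasticsearch domain, plus a small function that approximates how an `aws:SourceIp` CIDR condition matches a caller's address. The domain ARN, CIDR range, and both helper names are hypothetical, and the matcher is a local illustration, not the service's own policy evaluation.

```python
import ipaddress


def es_ip_access_policy(domain_arn: str, allowed_cidrs: list[str]) -> dict:
    """Allow HTTP calls to the domain only from the listed CIDR ranges (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                "Action": "es:ESHttp*",
                "Resource": f"{domain_arn}/*",
                "Condition": {"IpAddress": {"aws:SourceIp": allowed_cidrs}},
            }
        ],
    }


def source_ip_allowed(policy: dict, source_ip: str) -> bool:
    """Approximate the IpAddress / aws:SourceIp check for a policy built above."""
    ip = ipaddress.ip_address(source_ip)
    for statement in policy["Statement"]:
        cidrs = statement["Condition"]["IpAddress"]["aws:SourceIp"]
        # An address matches if it falls inside any allowed network.
        if any(ip in ipaddress.ip_network(cidr) for cidr in cidrs):
            return True
    return False


if __name__ == "__main__":
    policy = es_ip_access_policy(
        "arn:aws:es:us-east-1:123456789012:domain/example-domain",
        ["192.0.2.0/24"],
    )
    print(source_ip_allowed(policy, "192.0.2.15"))
    print(source_ip_allowed(policy, "198.51.100.7"))
```

As far as I know, IP-based conditions like this apply to publicly accessible domains; for a domain deployed inside a VPC, network access is controlled with security groups instead.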
Redshift is also very important to know in terms of security. The VPC provides network isolation for Redshift, and on top of it you can apply cluster security groups to control which machines can access your Redshift cluster. You get encryption in flight because you're using the JDBC driver with SSL enabled, so you can encrypt your data as it goes into Redshift. And you can encrypt data at rest using KMS or an HSM device. If you use an HSM device to encrypt data at rest in Redshift, you need to establish a connection between your HSM device and Redshift; all of that is well documented in the AWS documentation. What you need to remember is that Redshift is one technology that can use either KMS, in which case it's managed by AWS, or a custom HSM, a hardware security module.
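A minimal sketch of what "JDBC driver with SSL enabled" can look like in practice: a hypothetical helper that builds a Redshift JDBC URL carrying the `ssl=true` connection option. The cluster endpoint and database name are placeholders, and since defaults differ between Redshift JDBC driver versions, check your driver's documentation for its exact SSL options.

```python
from urllib.parse import urlencode

REDSHIFT_DEFAULT_PORT = 5439  # Redshift's default listening port


def redshift_jdbc_url(endpoint: str, database: str,
                      port: int = REDSHIFT_DEFAULT_PORT, ssl: bool = True) -> str:
    """Build a Redshift JDBC URL that asks the driver to encrypt traffic in flight (hypothetical helper)."""
    query = urlencode({"ssl": "true" if ssl else "false"})
    return f"jdbc:redshift://{endpoint}:{port}/{database}?{query}"


if __name__ == "__main__":
    print(redshift_jdbc_url(
        "examplecluster.abc123xyz789.us-east-1.redshift.amazonaws.com", "dev"))
```

On the cluster side, if you set the `require_ssl` parameter to true in the cluster's parameter group, Redshift refuses connections that don't use SSL.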
It also supports SSE-S3 using default managed keys. And you can attach IAM roles to Redshift, allowing it to load data from or dump data to S3. So finally, if you use a COPY or an UNLOAD command, you can reference the IAM roles that you have assigned to Redshift. Alternatively, if you want to use your own credentials, you can paste your own access key and secret key directly into your SQL statements, and Redshift will use those when reading data from S3 or writing data to S3. Okay, hope that makes sense. Again, make sure that you know Redshift security very well. Next, Athena. Athena is very easy. Use IAM policies to control access to the Athena service itself; the data is stored in S3 anyway, so you inherit all the security that comes from S3, including IAM policies, bucket policies, and ACLs. You encrypt data according to the S3 standards, so we have SSE-S3 and SSE-KMS.
And there's CSE-KMS, where CSE means client-side encryption. You get in-transit encryption, obviously, using TLS between Athena and S3, and you can use a JDBC driver with SSL enabled. Finally, if you need fine-grained access, you can use AWS Glue Data Catalog security itself to restrict who can do what in Athena. All right, that's it for the query technologies. Finally, the visualization technology, QuickSight. You have two editions. In the Standard Edition, you can use IAM users or email-based accounts to log into QuickSight. In the Enterprise Edition, you can use Active Directory-based security to log into QuickSight with a federated login. It also supports MFA, or multi-factor authentication. You get encryption at rest, both within QuickSight and within SPICE, its in-memory calculation engine.
And finally, if you want row-level security to control which users can see which rows, that is a newer QuickSight feature, available in the Enterprise Edition. So that's it for all the technologies and their security. It could be boring, and it's a lot of information, but you do need to know all of it going into the exam. Feel free to take your time: revisit them one by one, read some blogs, and get familiar with them. All right? I hope you liked this, and I will see you in the next lecture.
- STS and Cross Account Access
Now let's talk about STS. You may already be familiar with it, but let's go over it one more time. STS stands for Security Token Service. It allows you to grant limited and temporary access to AWS resources. With STS, we generate tokens, and these tokens are valid for a limited time (for example, 1 hour by default for AssumeRole), so they must be refreshed. Where do we use this? Mostly for cross-account access: we allow users from one AWS account to access resources in another account.
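Because STS credentials expire, clients typically refresh them a little before the expiration time. Here is a minimal Python sketch of that check. `TemporaryCredentials`, `needs_refresh`, and the 5-minute margin are hypothetical choices for illustration, not part of any AWS SDK; the fields simply mirror what STS returns (access key ID, secret access key, session token, expiration).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class TemporaryCredentials:
    """Hypothetical container for the credential set STS hands back."""
    access_key_id: str
    secret_access_key: str
    session_token: str
    expiration: datetime


def needs_refresh(creds: TemporaryCredentials, now: datetime,
                  margin: timedelta = timedelta(minutes=5)) -> bool:
    """Refresh once we are within `margin` of expiry, so requests never use expired credentials."""
    return now >= creds.expiration - margin


if __name__ == "__main__":
    issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    creds = TemporaryCredentials("ASIAEXAMPLE", "secret", "token", issued + timedelta(hours=1))
    print(needs_refresh(creds, issued))
    print(needs_refresh(creds, issued + timedelta(minutes=56)))
```

The same idea applies whatever SDK you use: STS hands out short-lived credentials, and clients go back to STS before they lapse.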
We also use it for federation, which we'll see in detail in the next lecture. For example, with Active Directory, we can provide a non-AWS user with temporary AWS access by linking their Active Directory credentials. Or we can use SAML to do the same. Or we can use single sign-on to allow users to log into the AWS console without assigning IAM credentials directly. So STS is used by many other things: anytime you have a temporary security token, STS is in play. We can also use STS for federation with third-party identity providers through Cognito. That's mainly for web and mobile applications, where Cognito takes Facebook, Google, or Amazon logins and federates them together. We'll see all this federation stuff in the next lecture, don't worry.
So, just remember: overall, STS allows you to get temporary access to AWS resources, and it's used for cross-account access and for federation, which is the next lecture. Now, how does cross-account access work? We define an IAM role for another account to access, and we define which accounts can access this IAM role. Then we use STS to retrieve credentials, and with these credentials we can impersonate that IAM role. That's called the AssumeRole API. The credentials can be valid for between 15 minutes and 1 hour with the default settings (a role's maximum session duration can be raised to up to 12 hours).
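Here is a hedged Python sketch of the two pieces just described: the role's trust policy naming the account that may assume it, and a check that a requested session duration falls within AssumeRole's bounds. The account ID and both helper names are hypothetical; 900 seconds is the AssumeRole minimum and 3600 seconds is the default maximum session duration for a role.

```python
ASSUME_ROLE_MIN_SECONDS = 900        # 15 minutes, the AssumeRole minimum
DEFAULT_MAX_SESSION_SECONDS = 3600   # 1 hour, a role's default maximum


def cross_account_trust_policy(trusted_account_id: str) -> dict:
    """Trust policy letting principals in another account call sts:AssumeRole on this role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def validate_duration(duration_seconds: int,
                      max_session_seconds: int = DEFAULT_MAX_SESSION_SECONDS) -> int:
    """Reject durations outside [15 minutes, role maximum], like AssumeRole's DurationSeconds bounds."""
    if not ASSUME_ROLE_MIN_SECONDS <= duration_seconds <= max_session_seconds:
        raise ValueError(
            f"DurationSeconds must be between {ASSUME_ROLE_MIN_SECONDS} and {max_session_seconds}"
        )
    return duration_seconds


if __name__ == "__main__":
    print(cross_account_trust_policy("111122223333"))
    print(validate_duration(3600))
```

With boto3, the matching call would be `sts.assume_role(RoleArn=..., RoleSessionName=..., DurationSeconds=...)`. Note that the trusted account still has to grant its own users permission to call `sts:AssumeRole` on this role.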
So as a diagram, what does it look like? We're a user, and we want to access a role either in the same account or in another account. How do we do this? Well, we call the AssumeRole API on STS. STS checks the IAM permissions, making sure you're allowed to do it, and then sends you back temporary security credentials. These security credentials allow you to impersonate the role that you wanted to assume. So that's it, that's something to remember. It's quite easy when you think about it, but it's good to see it once again: STS is a service that gives you temporary security credentials based on what you can access. In the next lecture, we'll see federation, and that will be very interesting.
- Identity Federation
Okay, so let's talk about identity federation. You may have heard about identity federation many, many times in AWS, and to be honest, for me it was quite a cryptic topic; it's really hard to understand. So I'm doing my best here to explain how identity federation works, what SAML and Cognito are, and how they are integrated with all of this. Let's take it step by step, and hopefully you'll have a clearer view of it after this lecture. Federation means that we let users outside of AWS assume temporary roles to access our AWS resources. What does that mean? It means that our users don't need a user in AWS to access AWS. How does this dark magic work? Basically, the users assume an access role provided through their identity provider. That's a lot of information, so let's go through a diagram to understand better how this works. We are a user, in our company or as a mobile app user, whatever it is, and we don't have an account in AWS. What we do have is access to a third-party server for login. It could be our company's, it could be anything. And this third party is trusted by AWS: we have defined a trust between the third party and AWS beforehand. Our users connect to this third party, and through some complicated process that we'll see in a second, the third party gives back credentials.
These credentials are temporary, and they are given to us, the user. Then, as a user, what can we do with these credentials? We can directly access AWS through the console or the API. So this is how identity federation works; it's called that because the identity is stored somewhere else, on a third party. If you understand this, you've basically understood identity federation. Now, what is this third-party authentication we're talking about? It could be LDAP, or Microsoft Active Directory, which is often tied to SAML: SAML is a standard, and Active Directory can be set up as a SAML identity provider. It could be single sign-on, OpenID, or Cognito. All these things can be third-party authentication. The thing to remember is that with federation, we don't need to create individual IAM users: user management is done outside of AWS. For the exam, they may ask you about some very specific forms of identity federation, namely SAML, custom identity broker, and Cognito.
So we're going to see these three in detail right now. The first one is SAML federation, and that's for enterprises. If you're a large enterprise, you most likely have Microsoft Active Directory or something SAML 2.0-compliant where you already manage your users, and you want to integrate this with AWS. What this gives you is that all your users have access to the AWS console or the CLI through temporary credentials, so you don't need to create an IAM user for each of your employees, which is quite nice. So what does it look like? Here is a diagram. It may look complicated, and it comes straight from the AWS documentation, but it is very clear. Let's walk through it one step at a time. We are the client app, within our organization, so a large organization.
We go to the identity provider (IdP), which is SAML-compliant, so it could be Microsoft Active Directory, and it authenticates the user against the user database. Once we are authenticated, the IdP sends back a SAML assertion, which is basically a token. Then we automatically call AssumeRoleWithSAML on STS, which is a special STS API. STS recognizes the SAML assertion and trades it for temporary security credentials. So now we've logged in, got a SAML assertion, and traded that assertion with STS for security credentials. With these security credentials, we can access AWS normally, for example an S3 bucket. That's for CLI-based access. If you want console-based access, there is also a nice diagram by AWS, and the idea is exactly the same: from our browser, we access the portal of an identity provider.
So it's a web-based thing. We get authenticated, same thing. Then the IdP returns a SAML assertion, and we can use that SAML assertion directly to sign in to the AWS SSO endpoint, which talks to STS behind the scenes. Once the SAML assertion has been traded with STS and everything is validated, you get redirected to the AWS Management Console. So the idea is the same: we trade our internal identity for some AWS credentials, with a bunch of back and forth in between. If you understood this, then you understood SAML federation. It's actually not that complicated.
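On the AWS side, the role that the SAML assertion gets traded for needs a trust policy naming the SAML identity provider registered in IAM. Here is a hedged Python sketch that builds one. The account ID, provider name, and helper name are placeholders, and the `SAML:aud` value is the AWS sign-in SAML endpoint used in AWS's console-federation examples.

```python
def saml_role_trust_policy(account_id: str, provider_name: str) -> dict:
    """Trust policy allowing sts:AssumeRoleWithSAML from a registered SAML 2.0 identity provider.

    Hypothetical helper: account ID and provider name are placeholders.
    """
    provider_arn = f"arn:aws:iam::{account_id}:saml-provider/{provider_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Only assertions issued by this IdP can be traded for this role.
                "Principal": {"Federated": provider_arn},
                "Action": "sts:AssumeRoleWithSAML",
                # Restrict the assertion's audience to the AWS sign-in endpoint.
                "Condition": {"StringEquals": {"SAML:aud": "https://signin.aws.amazon.com/saml"}},
            }
        ],
    }


if __name__ == "__main__":
    print(saml_role_trust_policy("123456789012", "ExampleIdP"))
```

For the CLI flow, the client would then call `AssumeRoleWithSAML` with `RoleArn`, `PrincipalArn` (the SAML provider ARN), and `SAMLAssertion` (the base64-encoded SAML response from the IdP).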
Now, if you don't have a SAML 2.0 way of identifying your users, then you need to use a custom identity broker. That's only if you don't have SAML 2.0, and it's a bit more complicated. Basically, you have to program what's called an identity broker, which is an application, and this identity broker must determine the appropriate IAM policy to apply. So what changes? Here it is again. We have our users' browser application, which accesses our identity broker, and the identity broker is something we have to program: everything in the green circle is something we have to program. The identity broker validates our identity against, say, a corporate identity store. Then, if it's happy, it has superuser powers.
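To make "determine the appropriate IAM policy" concrete, here is a hedged Python sketch of what a custom identity broker might build after validating a user against the corporate store: a policy scoped to that one user, which it would then pass to STS, for example as the `Policy` parameter of `GetFederationToken` (AssumeRole also accepts a session policy). The bucket name, the `home/<user>/` prefix layout, the user-name check, and the helper name are all hypothetical.

```python
import json
import re


def broker_session_policy(user_name: str, bucket: str) -> str:
    """Return a JSON policy limiting a federated user to their own S3 prefix (hypothetical layout)."""
    # Restrict user names to a safe character set so they can't widen the
    # resource pattern (for example by injecting '*' or '/').
    if not re.fullmatch(r"[A-Za-z0-9+=,.@_-]{1,64}", user_name):
        raise ValueError("unexpected characters in user name")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/home/{user_name}/*",
            }
        ],
    }
    return json.dumps(policy)


if __name__ == "__main__":
    print(broker_session_policy("alice", "example-corp-files"))
```

With GetFederationToken, the resulting session can only do what both the broker's own IAM permissions and this policy allow, which is why the broker needs broad permissions of its own.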
It can ask STS for security credentials for any policy, so it's up to the identity broker to tailor a policy just for the user who connected. That's more work, and that's why it's called a custom identity broker application: we have to create it. It goes to STS and requests security credentials, the credentials come back and are given to our users, and then they can either access AWS through the APIs or get redirected to the AWS Management Console. So it's the exact same principle as SAML, but it's not SAML, and therefore we have to do a lot more work to implement that identity broker. So if you see custom identity broker, that means enterprises but not SAML 2.0, whereas if you see SAML 2.0, that means identity federation directly integrated with AWS. And finally, all of this was for our corporate users, anytime you are sitting at a desk at your corporation. But what if you have an app, and your app's users need to put files into an S3 bucket? How do we do this? Do we create an IAM user per app user? No, that doesn't sound scalable.
Right? So the goal is to give our app's users direct access to AWS resources. How do we do this? They log in through a federated identity provider, or they can remain anonymous. Then they get AWS credentials back from what's called a federated identity pool, which comes straight from Cognito. These credentials come with a predefined IAM policy that allows users to do what they need to do. Don't worry, there's a graph. For example, anytime you want to provide temporary write access to your S3 buckets, maybe using a Facebook login, you should use AWS Cognito and federated identity pools. Note that there is something in the documentation called Web Identity Federation. You may have heard of it before, and it is an alternative to Cognito. But AWS now recommends against it in the documentation and says you should just use Cognito, because it does the same thing. For this reason, I'm not teaching Web Identity Federation, because it's not on the exam anymore.
Cognito is now the way to have public applications access AWS resources. So how does that work? Our app is directly connected to our identity provider. It could be a Cognito user pool, Google, Facebook, Twitter, SAML, OpenID, whatever you want. But it's an app in the wild, it's public. So our app logs in to our identity provider and gets a token back. Then the app talks to the federated identity pool in Cognito and trades in that token, which Cognito verifies with the identity provider. Cognito then gets credentials from STS, and, same pattern as before, the federated identity pool sends us back temporary AWS credentials. Using these credentials, we can talk directly to our S3 bucket and make calls, authorized to do whatever the role allows. So that's the idea here. The difference is that now we connect to public identity providers such as Cognito user pools, Google, and Facebook. But the idea is exactly the same.
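On the AWS side, the role that a Cognito identity pool hands out trusts `cognito-identity.amazonaws.com` through `sts:AssumeRoleWithWebIdentity`, scoped to one identity pool. Here is a hedged Python sketch that builds that trust policy, plus a permissions policy that keeps each app user under their own S3 prefix using the `${cognito-identity.amazonaws.com:sub}` policy variable. The identity pool ID, bucket name, `uploads/` layout, and helper names are placeholders.

```python
def cognito_authenticated_trust_policy(identity_pool_id: str) -> dict:
    """Trust policy for the role an identity pool gives to authenticated users (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": "cognito-identity.amazonaws.com"},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    # Only tokens issued for this identity pool...
                    "StringEquals": {"cognito-identity.amazonaws.com:aud": identity_pool_id},
                    # ...and only for users who actually signed in.
                    "ForAnyValue:StringLike": {"cognito-identity.amazonaws.com:amr": "authenticated"},
                },
            }
        ],
    }


def per_user_upload_policy(bucket: str) -> dict:
    """Let each app user write only under their own Cognito identity ID (placeholder layout)."""
    # IAM substitutes the caller's identity ID for the policy variable at request time.
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/uploads/${{cognito-identity.amazonaws.com:sub}}/*",
            }
        ],
    }


if __name__ == "__main__":
    print(cognito_authenticated_trust_policy("us-east-1:00000000-0000-0000-0000-000000000000"))
    print(per_user_upload_policy("example-app-uploads"))
```

Each app user then only ever writes under their own identity ID, without any IAM user being created for them.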
We trade in a token we retrieved from a third party with a service on AWS to get back temporary AWS credentials. If you understood this, you understood federation. Take your time: review the graphs, look at the documentation, review this lecture. I promise it will make sense after a little bit. Just write it down on a piece of paper; it's sometimes quite hard to understand how these things are orchestrated. It took me a lot of time to understand this, but once you understand it, it makes total sense. And any exam question about federation won't scare you: you'll embrace it and be happy that you know the answer right away. So I hope you enjoyed this lecture, and I will see you in the next one.