SAP-C02 Amazon AWS Certified Solutions Architect Professional – New Domain 5 – Continuous Improvement for Existing Solutions part 5
- AWS Athena
Hey everyone, and welcome back. In today’s video, we will be discussing the AWS Athena service. Athena is a fairly new service, and it’s a very interesting one not only for developers but also for security engineers. So let’s go ahead and understand what Athena is all about. At a very high level, AWS Athena is a service that allows us to analyze data such as log files stored in S3 using standard SQL statements. Let’s understand this with an example. Assume that we have our CloudTrail logs stored in S3 and we want to perform some analytics on top of those CloudTrail logs. For instance, let’s say you want to see who logged in to your AWS account in the past three days.
Now, for this kind of analytics, CloudWatch is one of the solutions. However, you will not really be able to run complex queries in CloudWatch. So to cover those use cases, what organizations typically do is run a log monitoring server such as the ELK stack, Splunk, or one of the various other log monitoring solutions. Generally they create EC2 instances, deploy a monitoring stack like Splunk or Elasticsearch with Kibana, and add S3 as a data source so that the CloudTrail logs can be imported into the monitoring solution. Once the CloudTrail logs are imported, the organization can begin writing queries in those monitoring tools.
So this is definitely one of the approaches, but it takes a lot of time, and it adds complexity as well as infrastructure cost. What AWS Athena allows us to do, instead of going through all of these steps, is query S3 directly. You point Athena at the S3 bucket, you define a table format, you run the query, and that’s it. You don’t really have to do any of those other things. Now, initially this might look a little complex or confusing, so let’s jump into the console and look at Athena in a practical way. I’ll go to Athena from my AWS console. So this is Athena, and Athena now supports views as well. This is one of the recent features; it was released just last week.
So we will just discuss the things that are important for our basic understanding as security engineers, as well as for the exam. We talked about the CloudTrail example, so let me open up CloudTrail real quick and look at where exactly the CloudTrail logs are being stored. If I go into Trails, the S3 bucket where the CloudTrail logs are stored is packed-trail. Let’s quickly verify that the logs are actually being stored there, because from Athena we will be querying the logs that hold this CloudTrail-specific information. So I’ll type packed-trail, and within this, you can see I have the CloudTrail folder. Let’s go into us-east-1, and you see I have a lot of CloudTrail logs stored in this bucket. Perfect. So this is the first piece of information that is needed.
The second piece of information that is needed is the structure of the logs, written in the query language. A CloudTrail log has a specific structure. If you open up any of the log files and look into an event, the event has a specific structure: there is eventVersion, then within userIdentity you have type, principalId, arn, and various others. This structure is the first thing that needs to be put into AWS Athena. I already have a structure document, and if you look over here, eventVersion is of type string, then userIdentity is a struct in which type is a string, principalId is a string, and so on. So userIdentity has type, principalId, arn, accountId, and various others.
And this directly mirrors the log format: in the log you have userIdentity, which has type, principalId, and arn. So this is basically the structure of the logs that Athena can map onto. Now, if you look at the last line here, we are specifying the location of the CloudTrail logs. So let’s quickly copy this format, and I’ll paste it into Athena. All right, now the last part, as we discussed, is the location. This is one of the parts that you will have to change. The CloudTrail bucket name is packed-trail, the AWSLogs prefix stays the same, and after that comes the account ID. So you have packed-trail, then AWSLogs, and then you have to put the account ID.
So I’ll be using my account ID. Let me quickly select it, copy the account number, and paste it into my query. So this is the structure, and we are also specifying the location where my CloudTrail logs are stored. I’ll go ahead and click on Run query. As you can see, the query has been successful, and it has created a new table called cloudtrail_logs, and within this table you have all of these log-specific fields. Perfect. So this is the first part that we are interested in. Now that the table is created, the next thing we can do is go ahead and run various types of queries. I have a sample query over here.
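As a rough sketch of what the table definition pasted into Athena could look like, the snippet below builds a trimmed-down CREATE EXTERNAL TABLE statement. The bucket name and account ID in the usage note are placeholders, only a handful of CloudTrail fields are included, and the SerDe and input format class names are assumptions recalled from the AWS sample table for CloudTrail, so check them against the current Athena documentation before using this.

```python
def cloudtrail_location(bucket: str, account_id: str) -> str:
    """Return the S3 prefix under which CloudTrail writes logs for one account."""
    return f"s3://{bucket}/AWSLogs/{account_id}/"


def cloudtrail_table_ddl(bucket: str, account_id: str,
                         table: str = "cloudtrail_logs") -> str:
    """Build a simplified CloudTrail table definition for Athena.

    Only a few fields are declared here; the real log format has many more.
    The SerDe and format class names below are assumptions to verify.
    """
    location = cloudtrail_location(bucket, account_id)
    return f"""CREATE EXTERNAL TABLE {table} (
  eventversion STRING,
  useridentity STRUCT<
    type: STRING,
    principalid: STRING,
    arn: STRING,
    accountid: STRING>,
  eventtime STRING,
  eventname STRING,
  sourceipaddress STRING
)
ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '{location}'"""
```

Calling cloudtrail_table_ddl("example-trail", "123456789012") returns a statement whose last line points at s3://example-trail/AWSLogs/123456789012/, which is the only part that has to change from one account to another.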
Let’s copy this, and I’ll paste it over here. What this query does is a SELECT statement, and the first thing we are selecting is useridentity.arn. If you look into the CloudTrail log, let me click on the event, useridentity.arn means this specific field. So this would return the ARN of the principal for which the CloudTrail event was generated. That is the first field we are selecting. The second field is the event name, the third is the source IP address, and the fourth is the event time. All of these refer to fields in the CloudTrail event. If you see, you have an event name, you have the event time, and you have the source IP address.
So these are the things that we are querying for with this SELECT statement, and where we are querying from is the cloudtrail_logs table. So this is the logs table, and we are limiting the result to 100 rows. So let’s do one thing, let’s go ahead and run this query. You can see the query is running, so it will take a little time to execute. Perfect. It has been executed, and you can see the relevant fields. First is the user identity ARN. Second is the event name. The third is the source IP address field. The fourth is the event time. And you can see that I am getting a lot of events back.
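The sample query described above can be sketched as follows. The SQL is reconstructed from the four fields and the limit mentioned in the walkthrough, not copied from the video. The second function shows one way the same query might be submitted programmatically with boto3 (a third-party package); the database name and result location it takes are placeholders you would supply yourself.

```python
def recent_events_query(table: str = "cloudtrail_logs", limit: int = 100) -> str:
    """Build the SELECT described in the walkthrough: who did what, from where, when."""
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT useridentity.arn, eventname, sourceipaddress, eventtime\n"
        f"FROM {table}\n"
        f"LIMIT {limit}"
    )


def run_in_athena(query: str, database: str, output_location: str) -> str:
    """Submit a query to Athena and return its execution ID.

    Requires boto3 and AWS credentials; imported lazily so the query
    builder above works without it.
    """
    import boto3

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

Athena runs queries asynchronously, so the returned ID would then be polled for completion before fetching results; in the console that waiting is what shows up as the query "running" for a few seconds.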
So if you see, this is the ARN. Someone from the root account has made this GetTrailStatus call. The IP address from which this event occurred is one one 5930 dot 52, and this is the event time. Now, one of the great things you would have noticed is that we did not really have to create any infrastructure, and we did not really have to pull the logs out of CloudTrail. All of these things are automatically taken care of by AWS Athena. And this is the real magic of Athena. So if you have a huge amount of logs stored in S3 and you do not really want the complex setup of installing and configuring a log monitoring server, all you have to do is specify the S3 bucket name and run SQL statements, and all the magic will be done by the AWS Athena service.
So this is the high-level overview of AWS Athena. One thing that I want to share before we conclude this lecture is a real-world use case. In one of the organizations which I had recently joined, there was a huge spike in traffic, due to which a lot of production systems went down, and basically the entire application was down. Now, the question that came up was whether that huge spike was genuine or whether it was part of an attack. Since that organization did not really have a great infrastructure or any log monitoring solution, we decided to use AWS Athena to query the VPC flow logs for certain kinds of information. As you may know, VPC flow logs can also be stored in S3.
And what information did we query? We basically asked for the number of accepted as well as rejected log records in the hour before the spike occurred. We also queried for the number of accepted and rejected records in the hour after, or I would say during, the time the spike occurred. We also queried for the IP addresses with the highest number of rejected records, and for the elastic network interface that received the highest spike. From these we came to know that there were around five specific IP addresses which were trying to carry out an HTTP-based attack to bring the website down. It took us only around ten minutes to get this information from Athena, and we decided to block those IP addresses in the network ACL as part of the initial response. So this is one of the real-world use cases, and there are a number of possibilities that can be achieved with the help of Athena.
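The three kinds of questions described above could be expressed roughly as the queries built below. This is only a sketch: the table name vpc_flow_logs and the column names (action, sourceaddress, interfaceid, starttime as epoch seconds) are hypothetical and depend entirely on how the flow log table was defined in Athena.

```python
def hour_before(spike_epoch: int) -> tuple[int, int]:
    """Return (start, end) epoch seconds for the hour leading up to the spike."""
    return spike_epoch - 3600, spike_epoch


def accept_reject_counts(start: int, end: int, table: str = "vpc_flow_logs") -> str:
    """Count ACCEPT vs REJECT records inside a time window."""
    return (
        f"SELECT action, COUNT(*) AS records FROM {table} "
        f"WHERE starttime BETWEEN {start} AND {end} "
        "GROUP BY action"
    )


def top_rejected_sources(start: int, end: int, n: int = 5,
                         table: str = "vpc_flow_logs") -> str:
    """Find the source addresses with the most rejected records."""
    return (
        f"SELECT sourceaddress, COUNT(*) AS rejects FROM {table} "
        f"WHERE action = 'REJECT' AND starttime BETWEEN {start} AND {end} "
        f"GROUP BY sourceaddress ORDER BY rejects DESC LIMIT {n}"
    )


def busiest_interface(start: int, end: int, table: str = "vpc_flow_logs") -> str:
    """Find the network interface that saw the most records in the window."""
    return (
        f"SELECT interfaceid, COUNT(*) AS records FROM {table} "
        f"WHERE starttime BETWEEN {start} AND {end} "
        "GROUP BY interfaceid ORDER BY records DESC LIMIT 1"
    )
```

Running accept_reject_counts once for the hour before the spike and once for the spike window gives the before-and-after comparison, and top_rejected_sources is the query that would surface a small set of offending addresses like the five mentioned above.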
- Understanding Federation – Part 01
Hi everyone, and welcome back to the Knowledge Portal video series. In the previous lecture we spoke about delegation and how a user from one account can assume a role in a second AWS account. That is what delegation was all about. Today we look into a very similar concept called federation. Again, this is a very important concept, and federation is used in most enterprise companies. So let’s understand what federation is with a very simple use case. Let’s assume that there are 500 users within an organization and your organization is using three services. First you have Jenkins, which is generally used for CI/CD. Second is AWS, and the third is an HR management system. Now, as a solutions architect and system administrator, you have been assigned the task of giving users access to all three of these services.
So now the question is, there are 500 users, and you need to give those 500 users access to all three of these services. How will you go ahead and approach this particular use case? One very simple way is to add 500 users first in Jenkins, then add 500 users in AWS via IAM, and then add 500 users in the HR management system. So this is one way. Now, the problem with this kind of approach is this: let’s assume that tomorrow ten more users join. You then have to add ten users in Jenkins, then in AWS, and then in the HR management system, which is very clumsy, takes a lot of time, and is not an ideal way of doing this. So this is again hard work. And for lazy people out there like me, this is not a very ideal approach.
So what we can do over here is have a central directory. Here we have LDAP, and in LDAP all the users will be stored. The administrator has to store the users only in LDAP. Once the users are stored in LDAP, then depending on the settings you have configured, a user will be able to log in to Jenkins, AWS, and the HR management system directly. So there is no need to create the user in Jenkins or AWS or the HR management system. Just create the user in LDAP, establish a trust relationship between LDAP and all three services, and the user will be able to seamlessly connect to all three of them. And if you see, it becomes very simple. Let’s say that tomorrow ten users join your organization. All you have to do is add those users to the LDAP directory, and it will do the job for you.
So this is a very simple and efficient way of doing this. Generally, if you look at a security-critical organization, like one that follows the PCI DSS standard, you need to revoke a user’s access immediately once they leave the organization, whether they have resigned or been terminated. Going to each and every service and disabling the account will take time. Instead, if you are using LDAP, all you have to do is disable the user in LDAP, and that user will not be able to log in anywhere. So again, very simple and efficient to use. And this is one of the reasons why most enterprises use LDAP or something similar. There are various solutions which can be used as a central user store.
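To make the central directory idea concrete, here is a toy model in plain Python. It is not how LDAP works internally; the Directory and Service classes are hypothetical stand-ins that only illustrate why adding or disabling a user in one place takes effect across every service that trusts the directory.

```python
class Directory:
    """A stand-in for a central user store such as LDAP."""

    def __init__(self) -> None:
        self._enabled: dict[str, bool] = {}

    def add_user(self, username: str) -> None:
        self._enabled[username] = True

    def disable_user(self, username: str) -> None:
        if username in self._enabled:
            self._enabled[username] = False

    def is_active(self, username: str) -> bool:
        # Unknown users are treated the same as disabled users.
        return self._enabled.get(username, False)


class Service:
    """A service (Jenkins, AWS, HR system) that trusts the directory."""

    def __init__(self, name: str, directory: Directory) -> None:
        self.name = name
        self.directory = directory

    def can_log_in(self, username: str) -> bool:
        # The service keeps no user list of its own; it asks the directory.
        return self.directory.is_active(username)


directory = Directory()
services = [Service(n, directory) for n in ("Jenkins", "AWS", "HR")]

directory.add_user("new.joiner")
access_after_joining = [s.can_log_in("new.joiner") for s in services]

directory.disable_user("new.joiner")
access_after_leaving = [s.can_log_in("new.joiner") for s in services]
```

One add_user call grants access to all three services, and one disable_user call removes it from all three, which is the property that matters for the offboarding requirement described above.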
Microsoft Active Directory, for example, is very popular in most organizations. And as far as Linux is concerned, you have Red Hat Identity Management or FreeIPA, which do similar things. So let me do one thing, let me actually show you a demo of FreeIPA and how exactly it looks. I am logged in over here, this is FreeIPA, and you can see there are a lot of users present over here. If I go to a different screen, I think it got logged out, so let me log in once more. Generally, all you have to do is store the users in this identity store and then set up federation from here to your AWS account. So we have lots of users present over here. You can also do something like this: within user groups, you can create a user group called production.
So this is very similar to what we did in delegation. Let’s create a user group called production. And once you create the production user group, you add users to this particular group. So let me just add some random users: LDAP user five, LDAP user nine, and manager I’ll add over here. Okay, so these are the three users which have been added. Now, on the AWS side, you can create a role which is used for federation, and you can establish a trust relationship, so that only the users who are in the production group can log in to the production role in your AWS account. So this is one of the basics of how you can go ahead with federation.
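The AWS side of that trust relationship could look roughly like the role trust policy built below. This sketch assumes SAML 2.0 federation, which the lecture does not specify, and the account ID and provider name are placeholders. Note that restricting access to members of the production group is normally handled on the identity provider side, by only sending the production role to users in that group, rather than inside this policy.

```python
import json


def saml_trust_policy(account_id: str, provider_name: str) -> dict:
    """Build a trust policy letting users from a SAML provider assume a role.

    Assumes a SAML identity provider named `provider_name` has already been
    registered in IAM for the given account.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:saml-provider/{provider_name}"
                },
                "Action": "sts:AssumeRoleWithSAML",
                "Condition": {
                    "StringEquals": {
                        "SAML:aud": "https://signin.aws.amazon.com/saml"
                    }
                },
            }
        ],
    }


policy_json = json.dumps(saml_trust_policy("123456789012", "ExampleIdP"), indent=2)
```

The resulting JSON is what would be attached as the trust relationship of the production role; the permissions the role grants are defined separately in its permission policies.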
- Understanding Federation – Part 02
Hi everyone, and welcome back to the second lecture on federation. So let’s go ahead and understand federation in more detail. Talking about a definition, federation basically allows external identities to have secure access to your AWS account without you having to create any IAM users. So this is a very simple definition. Now, when we spoke about delegation in the previous lecture, we had users who were present in IAM, and from IAM they would assume a role in the second account. Federation is a little bit different, in that the users are not present in IAM at all; they are present in external entities. That can be an identity server like AD or IPA, which we discussed, or it can be a web identity provider such as Facebook, Google, Amazon Cognito, or any OpenID Connect compatible provider.
Generally, on many websites, you must have seen an option like sign in with Facebook, sign in with Google, or sign in with, say, Twitter or LinkedIn. That is basically web identity provider federation. You don’t really have to sign up there. All you have to do is sign in with your Gmail account, and you will be logged in to the website. So that is federation in very simple terms. Now let me actually show you an example, because there is a very nice website called SecurityTube. If I open SecurityTube, and again a big thanks to Vivek Ramachandran, he has done amazing work in the field of open, video-based education.
So if you look at SecurityTube, there is an option called Log in with Google. Once I choose to log in with Google, it asks me for my username and password, and once I provide them I will be logged in to the website. So this is a very simple form of federation as far as a web identity provider is concerned. That is the web identity provider option, and the first option, again, is AD or IPA. So let’s understand how federation works. You have an Active Directory over here, or you can consider it as IPA, and there are a lot of users present in it. On the other side you have Amazon Web Services. Now, what you want is for a user within your Active Directory to log in to your AWS account. So what you need is some kind of a middleman.
This middleman is called an identity broker. An identity broker is an intermediate service which connects multiple providers; in our case, the identity provider is Active Directory and the service provider is AWS. The identity broker allows a user from the identity provider to log in to the service provider. So this is what identity broker really means. Let’s understand the steps behind how exactly it works. You have an Active Directory, you have the users, and you have the service provider, which is AWS. As the very first step, the user signs in on the identity broker page. There will be some kind of login form where the user enters a username and password. The identity broker then verifies with Active Directory whether the username and password are correct.
So the validation of the credentials is done against Active Directory. Once Active Directory says okay, the username and password are correct, the identity broker calls the STS service in AWS. STS responds with an authentication response which contains the access key, the secret key, the token, and the duration. Those four things are important for the user to connect. Once STS sends the authentication response to the identity broker, the identity broker forwards that response to the user, and the user is able to sign in to AWS with those credentials. So again, once the authentication response is received by the identity broker, the user doesn’t have to enter the username and password again; they will automatically be able to log in to the AWS account.
So again, these are very simple steps. Let’s revise them, because this is extremely important in terms of the exam. First, the user enters the username and password at the identity broker. The identity broker validates them against Active Directory or LDAP. Once the user is validated and authenticated, the identity broker calls the STS service in AWS. STS responds with the authentication response, and with that authentication response the user is able to log in to the AWS account directly. Now again, one important thing to remember over here is that there is a direct trust between the identity broker and AWS. That trust relationship is already in place between the identity broker and AWS. So these are the steps that you need to remember.
The user logs in with a username and password. The credentials are given to the identity broker. The identity broker validates them against AD or LDAP. If the credentials are valid, the broker contacts the STS service from Amazon, and STS shares the four things: access key, secret key, token, and duration. The user will then be able to do various things with the help of the four items in the authentication response that is received. So there are three important terms to remember. First is identities, which are the users. The users can be part of LDAP, or they can be part of Facebook, Gmail, or any OpenID Connect based provider. Second is the identity broker, which is the middleware or middle person that takes the user from point A, which can be LDAP, and connects them to point B, which can be AWS.
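The broker steps listed above can be sketched as a small simulation. Everything here is a hypothetical stand-in: FakeDirectory plays the role of AD or LDAP, FakeSTS plays the role of the AWS STS service and returns made-up values, and no real AWS call is made. A real broker would call an STS API at the marked step and would never handle passwords this naively.

```python
from dataclasses import dataclass


@dataclass
class TemporaryCredentials:
    """The four items in the authentication response described above."""
    access_key: str
    secret_key: str
    session_token: str
    duration_seconds: int


class FakeDirectory:
    """Stand-in for AD/LDAP: knows usernames and passwords."""

    def __init__(self, users: dict[str, str]) -> None:
        self._users = dict(users)

    def verify(self, username: str, password: str) -> bool:
        return username in self._users and self._users[username] == password


class FakeSTS:
    """Stand-in for STS: hands out placeholder temporary credentials."""

    def issue(self, username: str, duration_seconds: int = 3600) -> TemporaryCredentials:
        return TemporaryCredentials(
            access_key=f"EXAMPLE-ACCESS-KEY-{username}",
            secret_key="EXAMPLE-SECRET-KEY",
            session_token="EXAMPLE-SESSION-TOKEN",
            duration_seconds=duration_seconds,
        )


class IdentityBroker:
    """Middleman trusted by both the directory and the service provider."""

    def __init__(self, directory: FakeDirectory, sts: FakeSTS) -> None:
        self.directory = directory
        self.sts = sts

    def sign_in(self, username: str, password: str) -> TemporaryCredentials:
        # Steps 1-2: the user submits credentials; validate against the directory.
        if not self.directory.verify(username, password):
            raise PermissionError("invalid username or password")
        # Steps 3-4: ask STS for temporary credentials and pass them to the user.
        return self.sts.issue(username)
```

The user never receives long-lived AWS credentials in this flow; what comes back expires after the stated duration, at which point the user signs in through the broker again.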
So that is the identity broker, and the third is the identity store. The identity store is the place where users are stored. It can be AD, it can be IPA, it can be Facebook or any other provider. Now again, learning how to implement an identity broker or LDAP is out of the syllabus. However, if you really want to see how this works, I will attach a video which explains precisely how this process works in a practical session. So these are the basics of federation. Again, I really encourage you to remember all of these steps; you should have all of them at the back of your mind. In the notes section I will give you a reference to the video which shows a practical demonstration of how this works.