Amazon AWS SysOps – S3 Storage and Data Management – For SysOps (incl Glacier, Athena & Snowball) Part 7
- Storage Gateway for S3 – Hands On
Okay, so just a quick hands-on with Storage Gateway. We're actually not going to set one up, but I want to show you how it works behind the scenes, just so you can see how you would create one. So you would click Get started, and then you would have the option to choose the gateway type. You could choose a file gateway, a volume gateway, or a tape gateway.
So the idea I wanted to show you here is that a file gateway stores files as objects in Amazon S3, with a local cache for low-latency access. A volume gateway is for block storage in Amazon S3 with point-in-time backups as EBS snapshots, and you can choose either a cached volume or a stored volume.
So this is exactly what I showed you before. And then the tape gateway backs up your data to Amazon S3 and Glacier using your existing tape-based processes. And so for each of these, basically, if you were to create a file gateway, you click on Next and then you need to select a host platform, where you would actually download an image and run it on premises.
Or you can even use EC2, and then there are some setup instructions for EC2. We won't do it right now, but the idea is that once you do all these things, you note the IP address of the new instance, you click on Next, and you enter the IP address right here.
If everything is connected and configured correctly, you click on Connect to gateway, then you activate it, then you configure the local disks, and you're done. But we won't do it right now. All I wanted to show you is that we have three options: file gateway, volume gateway (which can be cached or stored), and finally tape gateway. Okay, that's it for this quick hands-on. I will see you in the next lecture.
- Athena Overview
So let's talk about Athena to conclude this section. Athena is awesome. To me, it is really, really, really cool. It's a serverless service and you can perform analytics directly against S3 files. So usually you have to load your files from S3 into a database such as Redshift and do your queries there. But with Athena, you leave your files in S3 and you do queries directly against them. For this, you can use the SQL language to query the files, which everyone knows, and it even has a JDBC or ODBC driver if you wanted to connect your BI tools to it. You get charged per query, for the amount of data scanned. So you can go really, really crazy, and you just get billed for what you are actually using. It supports many, many different file formats such as CSV, JSON, ORC, Avro, and Parquet. And in the back end, it basically runs Presto.
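To make the "billed for the amount of data scanned" idea concrete, here is a rough Python sketch. The price of 5 USD per terabyte scanned and the 10 MB minimum per query are assumptions based on commonly published Athena pricing, not figures from this lecture, so check the pricing page for your region. The function name is made up for this example.

```python
# Rough estimate of what a single Athena query costs.
# ASSUMPTIONS (not from the lecture): 5 USD per TB scanned and a
# 10 MB minimum charge per query. Verify both for your region.

TB = 1024 ** 4  # bytes per terabyte; the exact billing unit is an assumption
MB = 1024 ** 2  # bytes per megabyte


def estimate_query_cost(bytes_scanned, usd_per_tb=5.0, min_bytes=10 * MB):
    """Return the estimated cost in USD of one query."""
    # Queries that scan less than the minimum are billed at the minimum.
    billed_bytes = max(bytes_scanned, min_bytes)
    return billed_bytes / TB * usd_per_tb


if __name__ == "__main__":
    # Example: a query that scans 200 GB of log files.
    print(estimate_query_cost(200 * 1024 ** 3))
```

The point of the sketch is that cost follows the bytes scanned, so anything that reduces the data read by a query (compression, columnar formats such as Parquet or ORC) also reduces the bill.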
Presto, if you know it, is a query engine. So the use cases for Athena are many: you can do BI, analytics, and reporting. You can analyze and query VPC flow logs, ELB logs, CloudTrail trails, S3 access logs, CloudFront logs, all these things. So in the exam, they will ask you, hey, how can we analyze data directly on S3?
How can we analyze our ELB logs? How can we analyze our VPC flow logs? Well, the answer is use Athena. So that's it. That's all you need to know. We're actually going to do one hands-on just to get some practice with Athena and see how it works. But for the exam, it's really, really straightforward. Anytime you need to analyze data directly on S3, usually logs such as VPC flow logs or ELB logs, et cetera, you would use Athena. That's it. I will see you in the next lecture.
- Athena Hands On
Okay, so let's have a play with Athena. So Athena is for querying data in S3 directly with the SQL query language, without loading it into a database. So I'm going to get started and see how we can get set up. So what we want to do is set up a query in Athena on one of our S3 buckets. And the bucket I want to run a query on is going to be my S3 access logs bucket. So I have created a bucket that contains all the S3 access logs that we enabled before, and it's named S3 access logs Stephane 2.
Okay, now I'm going to go to Athena and we are welcomed with this screen. And as it says, before you run your first query, you need to set up a query result location in Amazon S3. So this is something you can click on, or if you go to Settings, you can also set the query result location like this. But I'm going to use the prompt directly in here. So I'm going to set this up and say, okay, I need to create my query result location. So I'm going to call this AWS Stephane Athena results. And here we're in Frankfurt, so I'm going to say Frankfurt. Okay. And finally, do we want to encrypt the query results, yes or no? And do we want autocomplete as an option? For now we won't set anything, but you could have encryption and autocomplete if you wanted to. I'm just leaving them unchecked and we're just going to go and click on Save.
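If you run queries from code instead of the console, the query result location is passed along with each query. Here is a minimal sketch using boto3's Athena client and its `start_query_execution` call. The bucket name `example-athena-results` is a placeholder, not the real bucket from this demo, and the helper names are made up for this example.

```python
# Build the arguments for boto3's Athena start_query_execution call.
# The bucket name used below is a placeholder, not the one from the lecture.

def build_query_request(sql, database, results_bucket):
    """Return keyword arguments for athena.start_query_execution."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Plays the same role as the "query result location" console prompt.
        "ResultConfiguration": {"OutputLocation": f"s3://{results_bucket}/"},
    }


def run_query(sql, database, results_bucket, region="eu-central-1"):
    """Submit the query; this needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    athena = boto3.client("athena", region_name=region)  # eu-central-1 is Frankfurt
    response = athena.start_query_execution(
        **build_query_request(sql, database, results_bucket)
    )
    return response["QueryExecutionId"]


if __name__ == "__main__":
    print(build_query_request("SELECT 1", "default", "example-athena-results"))
```

Only `build_query_request` runs without an AWS account; `run_query` is there to show where the arguments would go.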
So now my query result location has been saved in here, and so we can go ahead and start typing some queries. So let me go ahead and first refresh this page so we have the same screen. Okay, so now the page is refreshed and I can go ahead and type my first query. So what I'll do is go into my files, in my code, under Athena S3 access logs, and the first thing you have to do is create a database. So by default we're working on the default database, but I want to create a specific database for these queries. So I'm pasting the entire command, create database s3_access_logs_db, and I click on Run query.
Or you can do Ctrl+Enter to run the query as well. If you don't see the Run query button, make sure to refresh this page and it will appear automatically. So now in my databases I have access to default and to s3_access_logs_db, which has been created. And next, what I'm going to do is copy an entire statement to create a table named mybucket_logs. And this entire statement was taken from this URL. So if I go onto the web and open up this URL, then what I see is "How do I analyze my Amazon S3 server access logs using Athena?" And the answer is to use this entire query.
So that's what I'm going to do. I'm just going to go ahead in here and copy this entire query, here we go, all the way down to location, into my Athena console. And the one thing I have to do is replace the location. So the location here that you have to replace is right now s3://target-bucket-name/prefix/. And for me, all my S3 access logs are in this bucket and there is no prefix. So I'm just going to change this to S3 access logs Stephane 2, and I add a slash at the end. Okay, so I'm going to run the query and hopefully the query is successful.
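The only part of the copied statement that changes is the location, and the detail that is easy to miss is the trailing slash. As a small sketch, here is a Python helper that builds that value with or without a prefix. The function names and the bucket name `example-access-logs` are hypothetical, chosen only for illustration.

```python
# Build the value for the LOCATION clause of the CREATE TABLE statement.
# Function names and bucket names here are hypothetical examples.

def log_location(bucket, prefix=""):
    """Return an s3:// URI that always ends with a slash."""
    prefix = prefix.strip("/")  # tolerate "logs", "/logs" and "logs/"
    if prefix:
        return f"s3://{bucket}/{prefix}/"
    return f"s3://{bucket}/"


def location_clause(bucket, prefix=""):
    """Return the full LOCATION clause as it would appear in the statement."""
    return f"LOCATION '{log_location(bucket, prefix)}'"


if __name__ == "__main__":
    print(location_clause("example-access-logs"))
    print(location_clause("example-access-logs", "logs/2019"))
```

With no prefix, as in this demo, the location is just the bucket followed by a slash.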
So once the query has been run, on the left-hand side, under my tables, I see that mybucket_logs has been created, and if I click on the arrow here, I can see all the different columns that have been defined for this table. So as you can see, there are a lot of different columns, and all these columns have also been defined right here in the statement. So that makes sense. And what we can do is start visualizing what data there is in this table. So I'm going to click here on the three dots and then click on Preview table, and this will start a new query called New Query 2, which is select star from, and here is the name of my database and table, limit 10. And so what this will do is display the first ten rows of my data in S3. So as we can see here, the really cool thing that happened with Athena is that the data never left Amazon S3.
It is in this bucket, but it is being queried by Athena directly on my S3 bucket. So if we look at the results, we can see there is the bucket owner, the name of the bucket, the request date time, the remote IP, the requester, the request ID, the operation, et cetera, and the key and the request URI. So all this kind of information that we saw before, you can just visualize it here. As you can see, there are tons of columns in this table, but we get some good information. So what can we do with it? Well, I guess we can run some more interesting queries that I wrote for this example. So I'm going to go to New Query 3, and then I'm going to go to my file, and in here I have two queries that I wrote. So let's copy this first one and see how we go with it.
So I have select requesturi_operation, httpstatus, count star from mybucket_logs, and then we group by them. So that means that if we run the query, we're going to see how many requests we got for each operation and HTTP status code. So we have, for example, a POST request with status 200, we had two of those. A HEAD request with status 400, we have 20 of those. A GET with 404, we had 91 of those. So maybe GET 404 is not good, because that means not found. So maybe we want to go ahead and investigate these 404s further. Or if we scroll down, we can see GET 403 was six. So that means six times a file was requested, but the access was denied.
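If you want to try the shape of this group-by query without an AWS account, here is a small Python sketch that runs the same kind of SQL against an in-memory SQLite table. SQLite is only a stand-in for Athena here, the rows are made up, and the column names `requesturi_operation` and `httpstatus` are assumed to match the table definition used in this demo.

```python
import sqlite3

# Made-up sample rows: (requesturi_operation, httpstatus).
# SQLite stands in for Athena; this is not the real access log data.
SAMPLE_ROWS = [
    ("GET", "200"),
    ("GET", "404"),
    ("GET", "404"),
    ("GET", "403"),
    ("HEAD", "200"),
]

GROUP_QUERY = """
SELECT requesturi_operation, httpstatus, COUNT(*) AS request_count
FROM mybucket_logs
GROUP BY requesturi_operation, httpstatus
ORDER BY requesturi_operation, httpstatus
"""


def load_sample_logs():
    """Create an in-memory table holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mybucket_logs (requesturi_operation TEXT, httpstatus TEXT)"
    )
    conn.executemany("INSERT INTO mybucket_logs VALUES (?, ?)", SAMPLE_ROWS)
    return conn


def count_by_operation_and_status(conn):
    """Return one row per (operation, status) pair with its request count."""
    return conn.execute(GROUP_QUERY).fetchall()


if __name__ == "__main__":
    for row in count_by_operation_and_status(load_sample_logs()):
        print(row)
```

Each output row is one combination of operation and status code with the number of requests that had it, which is the same layout as the result grid in the Athena console.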
So maybe we want to have a look at these files as well. We can also do some very simple operations. We can just do select count star. So I'm just going to go back to the first query. Now, I'll do select count star from this table, and this will give me how many access log rows I have. So I run the query, and we have to wait maybe a little bit, and in here it's saying that I have 331 entries in my S3 access logs. So maybe you get something different, obviously, but this gives you a cool SQL query language on top of your data in S3. And then finally we can investigate these 403 errors we've been getting. So as you can see, there were six and two 403 errors.
So now if I go into a new query and paste this, I'm going to see the list of these eight denied requests and how they happened. So we can see we have eight rows in here, and we can see the bucket and the request date time, so when this happened. We can, very importantly, see the key that was requested: so favicon.ico, the beach JPEG, and so on. We can also have a look at more information, such as who the referrer was and so on. So we get a lot of good information to analyze these S3 access logs. And so that's the whole power of Athena in here. By just creating a table on top of our data that sits in S3, we were able to get a lot of information without provisioning, for example, an RDS database or a Redshift database or anything else to do analytics on top of our data.
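The exact query from the demo is not shown in the transcript, but a filter on the 403 status could look like the sketch below. Again SQLite is used as a local stand-in for Athena, the rows are invented, and the column names (`requestdatetime`, `key`, `httpstatus`, `referrer`) are assumptions modeled on the S3 access log table.

```python
import sqlite3

# Invented rows: (requestdatetime, key, httpstatus, referrer).
SAMPLE_ROWS = [
    ("2019-03-01T10:00:00", "index.html", "200", "-"),
    ("2019-03-01T10:05:00", "favicon.ico", "403", "-"),
    ("2019-03-01T10:07:00", "beach.jpg", "403", "http://example.com/"),
]

# "key" is quoted because it can be a reserved word in SQL dialects.
DENIED_QUERY = """
SELECT requestdatetime, "key", referrer
FROM mybucket_logs
WHERE httpstatus = '403'
ORDER BY requestdatetime
"""


def make_sample_table():
    """Create an in-memory table holding the invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE mybucket_logs '
        '(requestdatetime TEXT, "key" TEXT, httpstatus TEXT, referrer TEXT)'
    )
    conn.executemany("INSERT INTO mybucket_logs VALUES (?, ?, ?, ?)", SAMPLE_ROWS)
    return conn


def find_denied_requests(conn):
    """Return the requests that were answered with HTTP 403."""
    return conn.execute(DENIED_QUERY).fetchall()


if __name__ == "__main__":
    for row in find_denied_requests(make_sample_table()):
        print(row)
```

The status is compared as a string because the sample table stores it as text; if your own table defines it differently, adjust the comparison.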
So this is what makes Athena a very powerful serverless query engine if you want to troubleshoot, if you want to analyze your logs, and so on. So you can play around. For example, if you go into S3 and you type logs in the search bar, you can see I've created a few logs buckets. So if I click on logs, I get my CloudFront logs, my ELB logs, my S3 logs, my VPC flow logs, and so on.
So if you search, for example, for Athena analyze ELB logs, then you get directly to a documentation page, for example this one, that says how to use Athena to query Application Load Balancer logs. And you scroll down and it shows you how to create the table, the entire table definition is here, and finally the sample queries for your ALB logs and so on. So have a play with it. It really shows you the power of Athena. And then that's it. You understand what Athena is used for. And hopefully that was helpful. I will see you in the next lecture.
- Section Cleanup
Okay, so now we can clean up this entire section. So you could delete the database, but you actually don't pay anything as long as you don't run any queries. But if you wanted to drop this database, you would do drop database s3_access_logs_db and run the query, and it would delete the entire database. Although here it's not empty, so first you have to delete the table, and then you run this query to actually delete the database.
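The order matters here: tables first, then the database. As a small sketch, this Python helper generates the statements in that order. The helper name is made up, and the table name `mybucket_logs` is assumed from the earlier demo.

```python
# Generate the cleanup statements in the order Athena needs them:
# every table first, then the database that contained them.
# The helper name is hypothetical; the table name is assumed from the demo.

def cleanup_statements(database, tables):
    """Return DROP statements with tables before the database."""
    statements = [f"DROP TABLE IF EXISTS {database}.{table}" for table in tables]
    statements.append(f"DROP DATABASE IF EXISTS {database}")
    return statements


if __name__ == "__main__":
    for statement in cleanup_statements("s3_access_logs_db", ["mybucket_logs"]):
        print(statement)
```

You would run each statement as its own query in the console. Dropping the table only removes the table definition; the log files stay in the S3 bucket.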
So let's do it again: drop database s3_access_logs_db. Run the query. Here we go, we're done. Now, in the S3 management console, you could go ahead and delete as many buckets as you want. Make sure you delete the ones you've created, not the ones that were created for you by Elastic Beanstalk or something else.
But you could go ahead and delete some of the buckets if you wanted to. Remember, to delete a bucket, you first need to remove all the files within it, including all the versions if versioning is enabled. So you could do this as well. And finally you could go to CloudFront and delete the distribution. So for this you go to CloudFront, you find your distribution, you disable it first, and then you delete it second. So I'm just not going to do it, because I want to keep it just in case I have to record something on top of it. But this is what you would do to clean up. And that's it. That's all you have to do. This section shouldn't cost you anything. And hopefully you've learned a lot in this S3 section. And I will see you in the next one.
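Going back to the bucket cleanup: if you prefer to script it, here is a minimal sketch built on the boto3 `s3.Bucket` resource, whose `object_versions.delete()` call removes all object versions before `delete()` removes the bucket. The function names are made up for this example, and this is destructive, so only point it at buckets you created yourself.

```python
# Empty a bucket (including old versions) and then delete it.
# Function names are hypothetical; this permanently deletes data.

def empty_and_delete_bucket(bucket):
    """Delete every object version, then the bucket itself.

    `bucket` is expected to behave like a boto3 `s3.Bucket` resource.
    """
    # A bucket must be empty, including versioned objects, before deletion.
    bucket.object_versions.delete()
    bucket.delete()


def delete_bucket_by_name(name):
    """Look up the bucket with boto3 and delete it; needs AWS credentials."""
    import boto3  # imported lazily so the function above works without boto3

    empty_and_delete_bucket(boto3.resource("s3").Bucket(name))
```

Keeping the AWS lookup in a separate function means the deletion order can be checked with a stand-in object before it is ever run against a real bucket.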