Amazon AWS SysOps – S3 Storage and Data Management – For SysOps (incl Glacier, Athena & Snowball) Part 6
- Glacier Vault Lock – Hands On
So let’s have a play with vault locks. Going into the Glacier console, I’m able to go ahead and create a vault. As you can see, Glacier is used for creating vaults, setting data retrieval policies, and sending event notifications. So let’s create a vault in the region close to me; I’ll call it my Demo Vault. Click on next step. Here we can set up notifications in case some jobs complete, so when a retrieval job is completed, we can receive a notification. I will not enable it right now, we’ll set it later, and then review. Everything looks good, so I’ll submit it. And here is my first Glacier vault. So basically I have something just like a bucket, but it’s called a vault in Glacier.
And if I click on this vault, I can get some information around the details of this vault, the notifications that I set, and the permissions. So we can set a Vault Access Policy document, as I just mentioned, to basically say who can do what on this vault, just like an S3 bucket policy, really. And then there’s the vault lock. The vault lock basically allows you to create, edit, and view the details of a lock policy. And this lock policy is what gives you compliance.
So if you create a lock policy (and we can click here to see how to write one), we could set a specific kind of lock policy. For example, this one denies delete permissions for archives less than 365 days old. The second one denies permissions based on a tag. So you have lots of different ways to do locks, but the idea is that using locks you are able to enforce strong requirements on how your data is going to live in Glacier. The thing to know is that once you set a lock, you cannot change it. So I will initiate a Vault Lock, which says that I need to match my ARN, obviously. So let’s go to Glacier and find my ARN. Here’s my lock and here’s my ARN; I’m going to copy it right here. So once I set my vault lock policy, it will never be able to be changed.
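By the way, that sample lock policy from the console can be built programmatically. Here is a minimal sketch in Python, assuming a placeholder vault ARN (substitute your own); the policy shape follows the deny-delete-for-365-days sample from the AWS documentation:

```python
import json

# Placeholder vault ARN -- replace with your own vault's ARN.
vault_arn = "arn:aws:glacier:eu-west-1:123456789012:vaults/DemoVault"

# Deny DeleteArchive on any archive younger than 365 days.
lock_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "deny-based-on-archive-age",
            "Principal": "*",
            "Effect": "Deny",
            "Action": "glacier:DeleteArchive",
            "Resource": [vault_arn],
            "Condition": {
                "NumericLessThan": {"glacier:ArchiveAgeInDays": "365"}
            },
        }
    ],
}

# The console (and the InitiateVaultLock API) takes the policy as a JSON string.
policy_document = json.dumps(lock_policy)
print(policy_document)
```

Once that string is submitted as the vault lock policy and the lock is completed, it can never be changed again.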
So I will click on initiate vault lock. And here I get a lock ID. I need to absolutely copy this; you cannot lose it. And it’s saying we have 24 hours to validate this policy and complete the lock process, after which the lock ID will expire and the in-progress policy will be deleted. So now I have 24 hours to complete the lock process. Let’s close this. And here I have the option to either delete my vault lock, or I have 23 hours left to complete it. I want to complete it, so I click on complete vault lock and I paste the lock ID I just got from before. And if you didn’t copy it, then you have to redo everything: delete the vault lock and recreate it.
Then I acknowledge the fact that once my vault lock is configured, I will not be able to change it. Ever. It’s irreversible; that’s why it’s so strong from a regulatory perspective. And I click on complete vault lock. And here we go: my vault lock policy is now locked, and so I will never, ever be able to delete an archive that is less than 365 days old. I cannot change this. And this is why vault lock policies are so important and you need to know them going into the exam. The last thing to know is that through this UI, you are not able to upload files directly to Glacier. You would have to use the SDK or the CLI or something like this. I won’t show you, but the idea is that you don’t get a full UI like you do for S3 here. We have to use the API if we want to upload files into this demo vault.
So that’s it for Glacier. Just remember how we created a vault, how we locked it using a lock policy, and how you could also set an access policy for this vault. And I will see you in the next lecture.
- Snowball Overview
Okay, let’s talk about Snowball. It’s kind of a fun name, but it’s basically a huge box that allows you to physically transport data in and out of AWS, and we’re talking terabytes or petabytes of data. It’s basically an alternative to moving data over the network, on which you pay network fees. So if you have a lot of data and you need to transfer it, for example from on-premises all the way to the Amazon cloud, sometimes it’s better to use this giant box called Snowball, load it, and send it to AWS. And it is secure: it’s tamper resistant, there is KMS encryption on it, there’s tracking using SNS and text messages, and there’s an E Ink shipping label. So it’s really well made, and you’re going to pay per data transfer job.
So the use cases for Snowball are large cloud migrations, decommissioning a data center, or disaster recovery. And when should you use Snowball, and when should you not? Well, if it takes more than a week to transfer over the network, you’re probably better off using a Snowball device. So how does that work? It’s kind of funky in a cloud era to just ask for a physical device, but here we go. You request a Snowball device from the Amazon console and it’s going to be delivered to you. Then you connect the Snowball to your servers and you copy the files using the Snowball client. When you’re done, you ship the device back; it goes right away to the right AWS facility thanks to the E Ink shipping label, and the data will be loaded for you into an S3 bucket.
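That “more than a week” rule of thumb is easy to sanity-check with back-of-the-envelope arithmetic. Here is a small sketch; the function name and the 80% line-utilization assumption are mine, not an AWS tool:

```python
# Rough rule-of-thumb check: how many days would a network transfer take?
def transfer_days(data_tb: float, line_mbps: float, utilization: float = 0.8) -> float:
    """Days to push data_tb terabytes over a line_mbps link at the given utilization."""
    bits = data_tb * 1e12 * 8                     # decimal terabytes -> bits
    seconds = bits / (line_mbps * 1e6 * utilization)
    return seconds / 86400

# 100 TB over a dedicated 100 Mbps line at 80% utilization:
days = transfer_days(100, 100)
print(f"{days:.1f} days")                          # well over a week -> Snowball territory
```

At roughly 115 days for 100 TB on a 100 Mbps line, shipping a physical box clearly wins.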
And then the Snowball will be completely wiped so that no one else can access your data, obviously. And all the tracking along the way is done using SNS, text messages, and the AWS console. So, for you to visualize it: you can upload directly to S3, maybe over the Internet. Maybe you have a 10 Gbit/s broadband connection, but maybe it’s not enough; maybe you have petabytes of data. So using Snowball, you order a Snowball, it comes to you, you load it from your servers directly into the Snowball box, you ship the Snowball box back, the import/export is done directly by AWS, and your data ends up in an Amazon S3 bucket.
So, pretty cool, but kind of a new idea, isn’t it? Now, Snowball Edge is a Snowball, but improved: it adds computational capability to the device. You can have 100 terabytes of capacity, and you can either have it storage optimized, where you get 24 vCPUs, or compute optimized, where you get 52 vCPUs and maybe a GPU. The idea is that you can load a custom EC2 AMI on it, and Lambda functions, and so you can basically perform computations while your data is moving, which is quite cool. So now, on top of moving along on a truck or something, your Snowball device will actually perform computations for you and save you time.
So it’s quite useful if you want to pre-process the data while the thing is moving, and the use cases would be data migration, image collection, IoT capture, or machine learning. And as if that wasn’t enough, say you have more than 100 terabytes, maybe petabytes of data: then there is Snowmobile, and it’s seriously a truck, an actual truck. You can transfer exabytes of data with it. One exabyte equals 1,000 petabytes equals 1 million terabytes, so it’s not for everyone, obviously. Each Snowmobile has 100 petabytes of capacity, and you can have multiple of these trucks come to your facility in parallel if you wanted to load that much data. So it’s usually better than Snowball if you transfer more than 10 petabytes of data at a time, but obviously now you need to have a truck at your facility and load the data into it.
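The unit conversions above are worth checking yourself. A tiny sketch (the helper name is mine, not anything AWS provides):

```python
import math

# Unit arithmetic from the lecture:
# 1 EB = 1,000 PB = 1,000,000 TB, and each Snowmobile carries 100 PB.
PB_PER_EB = 1000
TB_PER_PB = 1000
SNOWMOBILE_CAPACITY_PB = 100

def snowmobiles_needed(data_pb: float) -> int:
    """Number of 100 PB trucks needed for a given dataset size."""
    return math.ceil(data_pb / SNOWMOBILE_CAPACITY_PB)

print(1 * PB_PER_EB * TB_PER_PB)   # 1 EB expressed in TB -> 1000000
print(snowmobiles_needed(250))     # 250 PB of data -> 3 trucks
```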
So that’s it for all the snow-something on AWS: Snowball and Snowmobile. Basically, the idea is that as soon as you transfer a very large amount of data, you should use a Snowball, and if it’s an insanely high amount of data, then probably a Snowmobile. Finally, let’s view a solution architecture showing how we get data from Snowball into Glacier.
So something you should know is that Snowball cannot import data into Amazon Glacier directly. What you have to do instead is use Amazon S3 first, and then use an S3 lifecycle policy to transition that data immediately into Glacier. So Snowball, as I said, will import data into Amazon S3, because S3 is the only place where Snowball can drop its data. And then, using an S3 lifecycle policy, we will migrate that data into Amazon Glacier, and effectively we’ll have made Snowball put data into Amazon Glacier. But that involves an extra step through Amazon S3 in the middle, and you have to remember this. So that’s it for this lecture on Snowball. I hope you liked it, and I will see you in the next lecture.
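The lifecycle rule described in the Snowball-to-Glacier pattern can be sketched as plain data. Below is the shape you would pass to the S3 PutBucketLifecycleConfiguration API; the bucket name and rule ID are hypothetical, and the dict is only built locally, so no AWS call is made here:

```python
# Lifecycle rule: transition every object to Glacier immediately (Days = 0)
# after the Snowball import lands it in the bucket.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "snowball-import-to-glacier",   # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},             # apply to the whole bucket
            "Transitions": [
                {"Days": 0, "StorageClass": "GLACIER"}
            ],
        }
    ],
}

# With boto3 this would be applied roughly like so (not executed here):
# s3.put_bucket_lifecycle_configuration(
#     Bucket="my-snowball-import-bucket",
#     LifecycleConfiguration=lifecycle_configuration,
# )
print(lifecycle_configuration["Rules"][0]["ID"])
```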
- Snowball Hands On
So to use Snowball, well, you just type in Snowball. We’re not going to order one, but I’m going to show you how it works. There’s literally a Snowball page, and you can create a job. And the idea is that you say: what do I want to do? Do I want to import into AWS, export from AWS, or just do local compute and storage only? So I’ll say, okay, I want to import data into AWS. And it’s going to say: we’re going to deliver a Snowball to you, then you’re going to copy the data to the Snowball, and AWS will move the data. And there is a truck involved to move that little Snowball. All right, so we’ll click on Next, and then you have to add an address. You could add whatever address you want, so I’ll just add some random information right here just so we can go quickly: a country, say Austria, and a phone number that is going to be something random.
You can select the shipping speed, express or standard shipping. Click on Next, and then we can choose the type of Snowball we want. So we’ll call it demo job. And here I can say I want a Snowball that has 80 terabytes of storage, or I want a Snowball Edge that is storage optimized with 100 terabytes, where we’re going to get 24 vCPUs and 32 GB of RAM. Or maybe I want a Snowball Edge that is compute optimized, and you see there is HDD and SSD now, so the storage is going to be lower, but we get more vCPUs and more RAM. Or maybe we want a GPU on it as well, if we want to perform some machine learning. So you select the Snowball you want, and then, when you’re happy, you select the bucket name you want to write your data to. You could create a new bucket; I’ll just put it into a random bucket, and then you can enable compute with EC2.
So you can literally load an AMI onto your Snowball, which is kind of a funny idea, but I love it. Then click on Next. You can select an IAM role, so that basically your Snowball can do what it needs to do, and you can set up encryption, which is great. So I’ll just click on Create IAM role right now, just so it’s done, and click on allow. Okay, now the IAM role is done, which allows Snowball to import data into S3, and I choose the KMS key that I want to use; maybe this import/export one is great. Click on Next, and then we can set notifications: email addresses, or we can choose a new SNS topic.
Basically, this is if you wanted to receive text notifications when the job is done. I’m not going to send any notifications right now. And then it lists all the statuses you’d like a notification for: job created, preparing appliance, preparing shipment, in transit to you, delivered to you, in transit to AWS, at sorting facility, at AWS, and finally importing, completed, or canceled.
So you get a lot of information about how your Snowball job is going. Click on Next, and then, when you’re ready, you can review everything. I am not going to create the job, because I would pay a lot of money for this; do not do it either. But basically, you could literally order a Snowball device to your house or your company and load up to 100 terabytes of data onto it, which I think is really, really cool. All right, so that’s it for Snowball. At least you get an idea of how it works, and you would obviously monitor your Snowball jobs in this UI. So I hope you liked it, and I will see you in the next lecture.
- Storage Gateway for S3
So this is a lecture on Storage Gateway. First, let’s talk about hybrid cloud. AWS is now pushing for hybrid cloud, so what does that mean? It means that part of your infrastructure will be in the cloud on AWS, but if you’ve been operating for a long time, part of your infrastructure will also be on premises. This can be due to many reasons: maybe you have a long cloud migration, or security requirements, or compliance requirements, or maybe it’s your strategy to be half and half, so hybrid. And the idea is that S3, as we’ve been discussing, is a proprietary storage technology; it’s not like NFS, which is standardized. So how do we expose the S3 data to on-premises servers or computers? That’s the whole idea behind AWS Storage Gateway: it’s going to give us access to S3 through a gateway, which will expose standard protocols. So if we look at how storage works on AWS today in a cloud-native way, we have block storage, which is EBS or EC2 Instance Store.
That’s basically our volumes. Then we have file storage; that’s when we dealt with EFS and we were storing files on a network file system. And then we have object storage, when we were storing files as objects directly on S3 and Glacier. So all these three things are different. I know it’s hard to see, but they’re actually different. And the idea is that Storage Gateway will bring a bridge to these solutions. So let’s have an example. The use cases where we want to bring on-premises data into S3, or bridge it, are disaster recovery, backup and restore, or maybe tiered storage. And we get three types of storage gateway.
And for the exam, you’ll need to know all three of them. Number one is the file gateway, which will basically allow us to view files on our local file system on premises, but backed by S3. There will be the volume gateway to do the exact same thing, but with volumes, and then there will be the tape gateway for backups and recovery. All of these go through the storage gateway, and we’ll have details on them in a second; behind the scenes, the data goes straight into EBS, S3, or Glacier. But we don’t have to handle this; the storage gateway will do it for us.
So the idea is that we need to know when to use a storage gateway, and for the exam, you need to know the difference between file gateway, volume gateway, and tape gateway. So let’s make sense of all of them right now. File gateway is when you have S3 buckets and you want them to be accessible using the NFS protocol or the SMB protocol, which are standard protocols. NFS is the network file system; we’ve seen this before with EFS, but this time it’s to expose an S3 bucket using NFS. This supports S3 Standard, S3 Standard-IA, and S3 One Zone-IA, so it supports all modes of S3. And each bucket will be accessed through the file gateway, and it will have its own IAM role.
And the cool thing about it is that through the file gateway, we access our files in S3, but it exposes them over NFS, and the most recently used data will be cached in the file gateway. So the file gateway will take our most active S3 objects and cache them locally. And because it’s NFS, this file gateway can be mounted on many servers. So, as usual, a diagram helps.
So we have our data center on the bottom left, and we have our application server, which wants to use the NFS protocol (network file system), maybe v3 or v4.1. So it’s going to talk to the file gateway, which means we have to set up a file gateway on premises. And the file gateway automatically talks to the AWS cloud, to S3, S3 IA, or Glacier, and basically gets the files we need and caches them locally on the file gateway. So the cool thing is that from our application’s perspective, it seems like we’re talking to a local network file system, but the file gateway actually does some magic behind the scenes and talks to S3 or Glacier.
So that’s the idea behind file gateway. Now we have the volume gateway. This is when you want to have block storage; usually the exam will mention the iSCSI protocol, and the volumes will be backed by S3. The idea is that EBS snapshots will be made from time to time, and they will be stored in S3, and this will help us restore on-premises volumes if we wanted to. So we have two options for volume gateway: cached volumes, which basically give you low-latency access to the most recent data on your volumes, and stored volumes, where the entire data set stays on premises and scheduled backups go to S3 from time to time.
Again, the diagram may help. We have our customer premises, and we usually mount a volume using the iSCSI protocol. So the idea is that our application server mounts a volume from the volume gateway, and from on premises, it looks like just a local volume. But the magic happens again with the volume gateway: it will basically store this as Amazon EBS snapshots backed by S3. And so again, the idea here is that we provide on-premises servers access to block (volume) storage, and the volume gateway scales a lot thanks to being backed by the Amazon cloud. So again, we would have to set up the volume gateway, and you have two options: either you use cached volumes, which means you have low-latency access to the most recent data, or you use stored volumes, where the whole data set lives on the volume gateway.
And from time to time, you will get scheduled backups to S3. Finally, for tape gateway: some companies still have processes that use physical tapes. With tape gateways, you basically keep the same processes, but backed by the cloud. For this, you build a VTL, or virtual tape library, and it will be backed by Amazon S3 and Glacier. The idea is that if you have existing processes or software that use the iSCSI interface, they will work as well with the tape gateway. So tape gateway is there for backup purposes.
And so we have a backup server with the backup software, and we connect directly using iSCSI to the tape gateway. Automatically, the tape gateway is smart and will basically create a virtual tape library stored in S3 or Glacier. So the idea here is: if you see anything around backups and virtual tapes, think tape gateway. Now, for the exam, I know this may be complicated; even for me, it took a lot of time to wrap my head around it, and I tried to explain it as best I could. But as an exam tip, read the question well and it will hint at which gateway to use. If you get a general question around needing to bridge on-premises data into the cloud, think storage gateway at a high level. If you get a more detailed question: if it says we want file access or NFS, think file gateway, and file gateway, if you remember, is backed by S3. If it says we want volume or block storage and there is iSCSI in the question, think volume gateway; it will be backed by S3 with EBS snapshots.
But you don’t need to know that level of detail for the exam. And then if you see a VTL tape solution, or the word backup with iSCSI, think tape gateway. So if you just learn this one slide (make a screenshot of it, I don’t know, just remember it), then you should be all set for the exam. But I wanted to give you details around how it works in the back end. So remember: storage gateway is for bringing on-premises data to the cloud; file access or NFS will be file gateway; volume or block storage with iSCSI will be volume gateway; and a VTL tape solution, or backup with iSCSI, will be tape gateway. All right, you know everything about storage gateway. I will see you in the next lecture.
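As a study aid, the keyword-to-gateway mapping from this slide can be written down as a tiny helper. This is my own mnemonic, not an AWS API; tape/backup keywords are checked first, because "backup with iSCSI" should point at tape gateway, not volume gateway:

```python
# Exam-style decision helper: map question keywords to a gateway type.
def pick_gateway(question: str) -> str:
    q = question.lower()
    if "vtl" in q or "tape" in q or "backup" in q:
        return "tape gateway"                 # VTL / tapes / backup with iSCSI
    if "nfs" in q or "smb" in q or "file" in q:
        return "file gateway"                 # file access, backed by S3
    if "iscsi" in q or "volume" in q or "block" in q:
        return "volume gateway"               # block storage, EBS snapshots in S3
    return "storage gateway (general)"        # generic on-premises-to-cloud bridge

print(pick_gateway("Expose an S3 bucket over NFS to on-premises servers"))
print(pick_gateway("Block storage over iSCSI restored from EBS snapshots"))
print(pick_gateway("Replace a physical tape backup library (VTL)"))
```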