Google Associate Cloud Engineer – Object Storage in Google Cloud Platform – Cloud Storage part 1
- Step 01 – Playing with Object Storage in GCP – Cloud Storage
Welcome back. After talking about block and file storage in the last section, let's now shift our attention to object storage. The object storage service in GCP is Cloud Storage. What is object storage? What is Cloud Storage? Let's talk about that in this section. To get an idea about object storage and Cloud Storage, let's start with a demo. I'll go into the Cloud Console and search for Cloud Storage, which takes me to the Storage page. You might see a few buckets listed here; don't worry about them. Let's create a new bucket. Before we can store anything in Cloud Storage, we need to create a bucket.
A bucket is a container for all the objects that you want to place in Cloud Storage. First, I need to give the bucket a name. Let's call it my-first-bucket. The bucket name must be globally unique, so whatever bucket name I use, you cannot use as well. So I'll call it my-first-bucket-in28minutes to make it really unique, and click Continue. Next, we need to choose where we want to store our data. The options here are a little different. Until now, we were talking about zonal storage and regional storage. With Cloud Storage, however, the location types are Region, Dual-region and Multi-region. You can store your data in a single region.
In that case, you choose the specific region where you want to store it. You can also store it in two regions, which gives you high availability and low latency across two regions. Google Cloud offers predefined dual-regions such as NAM4, EUR4 and ASIA1. Each one combines two nearby regions: NAM4 is Iowa and South Carolina, EUR4 is the Netherlands and Finland, and ASIA1 is Tokyo and Osaka. If you pick one of these dual-regions, your data is stored in both of its regions. Or you can go for Multi-region, which again has three options: multiple regions in the US, multiple regions in the European Union, or multiple regions in Asia.
You can choose any one of these and click Continue. I'll go for a specific region; which region you choose does not really matter. Once we've chosen where our data is stored, that is, where our bucket is created, we choose the default storage class. Different kinds of data have different needs, and based on those needs you can choose the appropriate storage class. The available classes are Standard, Nearline, Coldline and Archive. Standard is recommended for short-term storage and frequently accessed data: if your data is accessed frequently, the best place to store it is Standard. If your data is accessed less than once a month, go for Nearline.
If your data is accessed less than once a quarter, go for Coldline. If you have long-term backups which are accessed less than once a year, go for Archive. For now, I'll choose Standard and click Continue. Let's not worry about the advanced settings for now; let's click Create. One important thing you might have noticed is that we did not specify the size of the storage we want. All we said is: this is the bucket, this is where we want to create it, and this is the default storage class we want to use. Now I want to store data in my Cloud Storage bucket. How can I do that? In the previous lectures, you would have downloaded a zip file with the downloadable content of the course.
If you go into that folder, then into downloads and then into cloud storage, you should see a number of files and folders, such as the 2030 folder and index.html. I recommend picking those files and dragging and dropping them into the bucket. With the bucket screen open, I'm just dragging and dropping them in. Within a little while, you'd see that all the files are uploaded: it says upload started, and all the files in those folders are now uploaded to this bucket. In the bucket, you can see folders and files. If you click a specific folder, you can go in and see the subfolders and files inside it.
So what we are seeing here are folders and objects. index.html is an object, inside 2030 we have a folder 10, and inside 10 we have a couple more objects: course1.png and course2.png. All of these are objects, and we are uploading our objects into buckets; that's why this is called object storage. So Cloud Storage is the object storage service in Google Cloud Platform. There are a wide variety of use cases where Cloud Storage is useful. I wanted to give you a hands-on first so that you can build a mental picture of what's happening with Cloud Storage before we talk about the theory. I'll see you in the next step with the theory related to Cloud Storage.
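If it helps to picture the folders we just clicked through: as far as I understand, a standard Cloud Storage bucket has a flat namespace, and a "folder" like 2030/10/ is really just a shared prefix of object keys. The Python sketch below is an in-memory illustration of that idea, similar in spirit to listing with a prefix and a delimiter. `list_level` is a hypothetical helper, not a Cloud Storage API, and the keys are illustrative reconstructions of the demo files.

```python
def list_level(keys, prefix="", delimiter="/"):
    """Split flat object keys into the objects and "folders" directly under prefix.

    Hypothetical helper: it only mimics how a flat key space can be browsed
    like a directory tree; it does not call Cloud Storage.
    """
    objects = []
    folders = set()
    for key in keys:
        if not key.startswith(prefix):
            continue  # key lives outside the level being listed
        rest = key[len(prefix):]
        cut = rest.find(delimiter)
        if cut == -1:
            objects.append(key)  # no further delimiter: an object at this level
        else:
            folders.add(prefix + rest[: cut + 1])  # keep everything up to the next delimiter
    return sorted(objects), sorted(folders)


keys = ["index.html", "2030/10/course1.png", "2030/10/course2.png"]
print(list_level(keys))              # (['index.html'], ['2030/'])
print(list_level(keys, "2030/"))     # ([], ['2030/10/'])
print(list_level(keys, "2030/10/"))  # (['2030/10/course1.png', '2030/10/course2.png'], [])
```

Nothing called "2030" or "10" exists on its own here; the folders appear only because several keys share those prefixes.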
- Step 02 – Exploring Cloud Storage in GCP
Welcome back. Let's talk about Cloud Storage. Cloud Storage is the most popular, very flexible and inexpensive storage service in GCP. It is serverless, auto-scaling and offers practically infinite scale. We saw that when we created our bucket: we did not have to say how much storage we would use. You can add as many objects as you want to Cloud Storage, and it automatically scales to your needs. That's why it's called serverless. In Cloud Storage, you can store large objects using a key-value approach. Cloud Storage treats an entire object as a unit, so you cannot make partial updates to an object. Cloud Storage is recommended when you operate on entire objects most of the time.
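To make "the entire object is a unit" concrete, here is a toy Python model. It is purely illustrative and is not the Cloud Storage client library: `WholeObjectBucket` is a hypothetical class whose only write operation replaces the whole value stored under a key, so an "edit" means download, change locally, and upload the full object again.

```python
class WholeObjectBucket:
    """Toy in-memory bucket: each key maps to one immutable blob of bytes.

    Illustrative model only (hypothetical, not a real API): there is
    deliberately no method for patching a byte range inside an object.
    """

    def __init__(self):
        self._objects = {}

    def upload(self, key, data):
        # Writing to an existing key replaces the whole value.
        self._objects[key] = bytes(data)

    def download(self, key):
        return self._objects[key]


bucket = WholeObjectBucket()
bucket.upload("notes.txt", b"version 1")

# To "edit" an object: download it, change it locally, upload the whole thing again.
content = bucket.download("notes.txt")
bucket.upload("notes.txt", content.replace(b"1", b"2"))
print(bucket.download("notes.txt"))  # b'version 2'
```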
Earlier, we uploaded a few objects. If I select one of them, course1.png, you can see the key and the value. Here you can see the URI, and the part at the end, 2030/10/course1.png, is the key. The value is the content of the course1 image. When we store files like this, we don't update them bit by bit. Whenever we want to change this file, we create a new image and upload the entire image again. We treat the entire object as a single unit and don't attempt partial updates; in these kinds of situations, Cloud Storage is recommended. You can also have access control at the object level for each of the files we were looking at earlier: you can say that this file needs different access compared to other files.
This is why Cloud Storage is also called object storage: you are storing objects and treating each entire object as a unit. Cloud Storage also provides a REST API to access and modify objects. Here you can see the URL, which contains the bucket name my-first-bucket-in28minutes. This whole thing is the URL you can use to access this specific image. If you click it, because we are already authenticated, we can see the image. Cloud Storage also provides a CLI. An important thing to remember is that the CLI for Cloud Storage is not gcloud; it is gsutil, where gs stands for Google Storage (newer Cloud SDK releases also add a gcloud storage command group, but gsutil is the one used in this course).
gsutil is what you use to work with Cloud Storage from the command line. There are also client libraries: if you want to talk to Cloud Storage from a Java or Python application, you can use these libraries to send requests to Cloud Storage. Cloud Storage is used to store all kinds of file types: text files, binary files, media files, archives, application packages, logs, backups of your databases or storage devices, and staging data. Whenever you want to move data from on-premises to the cloud, whichever database you want to move it to, the first thing you typically do is move the data to Cloud Storage.
Most of the database solutions in Google Cloud allow you to import data from Cloud Storage. So whenever you want to transfer data from on-premises to the cloud, the most likely approach is to first move the data to Cloud Storage. If you want to take a backup of a database, you would use Cloud Storage. If you want to create an archive, you would use Cloud Storage. So Cloud Storage is a multipurpose object storage solution in GCP. Let's talk more about Cloud Storage in the subsequent steps. I'll see you in the next step.
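To connect the REST API mentioned in this step with the URLs the console shows, here is a small Python sketch that only builds strings and sends no requests. It assumes the JSON API layout as I understand it (`https://storage.googleapis.com/storage/v1/b/BUCKET/o/OBJECT`, with `?alt=media` returning the object's content and the object name fully URL-encoded); please double-check against the Cloud Storage JSON API reference. The bucket name and key are the demo values as reconstructed here, and `object_urls` is a hypothetical helper.

```python
from urllib.parse import quote

JSON_API_BASE = "https://storage.googleapis.com/storage/v1"


def object_urls(bucket, key):
    """Build common ways of addressing one Cloud Storage object (strings only)."""
    encoded = quote(key, safe="")  # JSON API path: encode the whole object name, including "/"
    return {
        "gs_uri": f"gs://{bucket}/{key}",  # the form gsutil understands
        "metadata": f"{JSON_API_BASE}/b/{bucket}/o/{encoded}",
        "media": f"{JSON_API_BASE}/b/{bucket}/o/{encoded}?alt=media",
        "authenticated": f"https://storage.cloud.google.com/{bucket}/{quote(key)}",
    }


urls = object_urls("my-first-bucket-in28minutes", "2030/10/course1.png")
for name, url in urls.items():
    print(name, url)
```

Actually calling the metadata or media URL for a private object would additionally need an OAuth access token, which is what the client libraries and gsutil handle for you.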
- Step 03 – Understanding Cloud Storage – Objects and Buckets
Welcome back. In this step, let's discuss the structure: how do you store data in Cloud Storage? Through objects and buckets. First, you create a bucket, and then you upload objects, each with a key and a value, to the bucket. Objects are stored in buckets. Bucket names are globally unique: you cannot have the same bucket name in two different projects or in two different GCP accounts. Bucket names are used as part of object URLs, and therefore bucket names can only contain lowercase letters, numbers, hyphens, underscores and periods. Earlier we saw the object URL for course1.png, and it contains the bucket name my-first-bucket-in28minutes. That's why a bucket name has to adhere to the constraints you typically put on a URL.
A bucket name can have 3 to 63 characters, and there are a few more restrictions: it cannot start with the goog prefix, and it cannot contain google or close misspellings of it. You can have an unlimited number of objects in a bucket, so you can upload as many objects as you want. The bucket we created is in a project, My First Project. If you created it in a different project, that's fine too; it does not really matter. What matters is that every bucket you create is associated with a specific project. Each object is identified by a unique key. In our example, the part of the URL after the bucket name, 2030/10/course1.png, is the key of this object.
The key is unique within a bucket. The maximum size of a single object is 5 TiB, so a file like course1.png could be as large as 5 TiB. However, you can store an unlimited number of such objects. In this quick step, we got an overview of how we store data in Cloud Storage: we create a bucket and then store objects in it as key-value pairs. While a single object can be up to 5 TiB, you can have an unlimited number of such objects in Cloud Storage. I'll see you in the next step.
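Here's a rough Python sketch of the naming and size rules from this step, handy for sanity-checking a name before you type it into the console. It is a simplified checker under stated assumptions: it checks only the rules discussed here plus "start and end with a letter or digit", it ignores the longer limit allowed for dotted names and the full list of google misspellings, and `bucket_name_problems` and `fits_in_one_object` are hypothetical helpers, not part of any Google library.

```python
import re

MAX_OBJECT_SIZE = 5 * 1024**4  # 5 TiB per single object

# Lowercase letters, digits, '-', '_' and '.'; first and last character must be a letter or digit.
_ALLOWED = re.compile(r"^[a-z0-9][a-z0-9._-]*[a-z0-9]$")


def bucket_name_problems(name):
    """Return the simplified rules that `name` breaks (an empty list means it looks OK)."""
    problems = []
    if not 3 <= len(name) <= 63:
        problems.append("must be 3-63 characters")
    if not _ALLOWED.match(name):
        problems.append("invalid characters or invalid first/last character")
    if name.startswith("goog"):
        problems.append("cannot start with 'goog'")
    if "google" in name:
        problems.append("cannot contain 'google'")
    return problems


def fits_in_one_object(size_bytes):
    """True if a file of this size can be stored as a single object."""
    return 0 <= size_bytes <= MAX_OBJECT_SIZE


print(bucket_name_problems("my-first-bucket-in28minutes"))  # []
print(bucket_name_problems("My First Bucket"))  # ['invalid characters or invalid first/last character']
```

Global uniqueness can't be checked locally; only trying to create the bucket tells you whether someone else already owns the name.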
- Step 04 – Understanding Cloud Storage – Storage Classes
Welcome back. In this step, let's talk about Cloud Storage storage classes. When we created the bucket earlier, we had to select a default storage class. What is a storage class, and why do we need it? Different kinds of data can be stored in Cloud Storage: media files, archives, application packages, logs, backups of your databases or storage devices, long-term archives, or data you are moving from on-premises to the cloud, where Cloud Storage serves as temporary storage. Across these different kinds of data, there can be huge variations in access patterns. Some data you might access every day; some might hardly ever be accessed.
Some data might be accessed once a month, some once a year, and so on. So the question is: can I pay a cheaper price for objects I access less frequently? For data that I access very rarely, I'd like to pay less. Is there an option like that? That's what storage classes provide: they help you optimize your cost based on your access needs. All storage classes in Google Cloud are designed for a durability of eleven 9s, that is, 99.999999999%. Let's now look at the different storage classes. The first one is Standard. It has no minimum storage duration. In a multi-region or dual-region, you get a typical availability above 99.99%.
In a single region, you get a typical availability of 99.99%. Standard is recommended for frequently used data or data that stays in Cloud Storage for only a very short period of time. The next storage class is Nearline. If I have data that I expect to read or modify about once a month on average, I can put it in Nearline storage. The minimum storage duration is 30 days, and the typical monthly availability is 99.95% in multi-regions and dual-regions and 99.9% in a single region. The next one is Coldline storage.
If you expect data to be read or modified at most once a quarter, the ideal storage class is Coldline. Coldline's minimum storage duration is 90 days, and its typical monthly availability is the same as Nearline. The next one is Archive storage, which is for data you expect to access or modify less than once a year. The minimum storage duration is 365 days, and the typical monthly availability is the same as Coldline. The important takeaways are: if you expect to access data less than once a year, use Archive. If you expect to read or modify it at most once a quarter, use Coldline; at most once a month, use Nearline.
For frequently used data, or data you expect to store for a very short time, go for Standard. One important thing to remember is that you configure a default storage class at the bucket level, but you can also set the storage class for an individual object. Whenever a new object is uploaded to a bucket without a storage class assigned to it, the bucket's default storage class is used. However, if you assign a specific storage class to an object, that storage class is used for that object. So in the same bucket, you can have different objects with different storage classes.
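The bucket-default versus per-object rule can be written as a tiny Python function. This is a sketch of the rule as described in this step, not a Cloud Storage API call; `effective_storage_class` is a hypothetical helper, and the uppercase class names match the identifiers the API uses for the four classes.

```python
VALID_CLASSES = {"STANDARD", "NEARLINE", "COLDLINE", "ARCHIVE"}


def effective_storage_class(bucket_default, object_class=None):
    """Storage class a newly uploaded object ends up with.

    An explicitly assigned class wins; otherwise the bucket's default is used.
    """
    chosen = object_class or bucket_default
    if chosen not in VALID_CLASSES:
        raise ValueError(f"unknown storage class: {chosen}")
    return chosen


# One bucket whose default is STANDARD, holding objects of two different classes.
print(effective_storage_class("STANDARD"))             # STANDARD
print(effective_storage_class("STANDARD", "ARCHIVE"))  # ARCHIVE
```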
Now, let's look at a few important features that are common across all storage classes in Google Cloud Platform. High durability: as we already discussed, 99.999999999% (eleven 9s) annual durability. Low latency: irrespective of the storage class you use, whether Archive, Coldline, Nearline or Standard, you get very low latency. This is different from other clouds, where latency can vary depending on the type of storage; in Google Cloud, it's always low latency, and you can typically expect to access the first byte within tens of milliseconds. All storage classes provide unlimited storage with auto-scaling, and no configuration is needed.
There is no minimum object size for any storage class. The APIs are the same across storage classes: you store data in Archive the same way you store data in Standard. The committed SLA from GCP is 99.95% for multi-region and 99.9% for single region. This committed SLA applies only to the Standard, Nearline and Coldline storage classes; there is no committed SLA for Archive storage. In this step, we looked at the different storage classes in Cloud Storage. Try to understand all of them and be prepared to choose the right storage class for each scenario. I'll see you in the next step.
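For exam-style scenario questions, it can help to turn the access-frequency guidance into a rule of thumb. The Python sketch below is a heuristic, not an official decision procedure: the day thresholds come straight from the once-a-month, once-a-quarter and once-a-year guidance above, and `suggest_storage_class` and `billable_days` are hypothetical helpers. `billable_days` reflects what the minimum storage duration means in practice: as I understand Google's pricing, deleting, replacing or moving an object earlier is charged as if it had been stored for the full minimum.

```python
# Minimum storage durations from this step, in days.
MIN_STORAGE_DAYS = {"STANDARD": 0, "NEARLINE": 30, "COLDLINE": 90, "ARCHIVE": 365}


def suggest_storage_class(days_between_accesses, expected_lifetime_days=None):
    """Rule-of-thumb mapping from access frequency to a storage class."""
    if expected_lifetime_days is not None and expected_lifetime_days < 30:
        return "STANDARD"  # short-lived data: Standard has no minimum duration
    if days_between_accesses < 30:
        return "STANDARD"  # accessed more often than about once a month
    if days_between_accesses < 90:
        return "NEARLINE"  # about once a month
    if days_between_accesses < 365:
        return "COLDLINE"  # about once a quarter
    return "ARCHIVE"  # less than once a year


def billable_days(storage_class, days_stored):
    """Days charged if the object is removed after `days_stored` days."""
    return max(days_stored, MIN_STORAGE_DAYS[storage_class])


print(suggest_storage_class(1))    # STANDARD
print(suggest_storage_class(45))   # NEARLINE
print(billable_days("NEARLINE", 10))  # 30
```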
- Step 05 – Understanding Cloud Storage – Uploading and Downloading Options
Welcome back. In this step, let's look at the different options Cloud Storage provides for uploading and downloading objects. We might be uploading very large objects to Cloud Storage, such as an archive or a database backup. How do you upload and download them efficiently? That's what we'll look at in this step. The first option for uploading to Cloud Storage is simple upload. When you have very small files that can be re-uploaded in case of failure, and there is no metadata associated with the object, you can go for simple upload. Multipart upload is also recommended for small files that can be re-uploaded in case of failure.
The difference is that with multipart upload, you also have metadata associated with the object. You can also go for resumable upload. This is typically recommended for larger files, but you can use resumable upload for smaller files as well, so it is recommended for most use cases. However, remember that resumable upload involves one additional HTTP request compared to simple and multipart uploads. The great thing about resumable upload is that, in case of failure, you can resume from where the failure happened. Cloud Storage also supports streaming transfers: sometimes you don't know the size of the object but need to start uploading it to Cloud Storage anyway.
You can do that too; in that kind of scenario, you'd go for streaming transfers. The last option is parallel composite uploads. In a parallel composite upload, the file is divided into up to 32 chunks, and these chunks are uploaded in parallel. This is the recommended option if you have a large file and no network or disk speed restrictions: with a fast network and fast disks, parallel composite uploads are the way to upload large files. Next, let's look at downloads. With a simple download, you download an object to a specific destination.
With a streaming download, you don't know the size of the object, or you want to stream the object to a process. And there is sliced object download: when you have large objects, you can slice them and download the individual parts in parallel. In this step, we looked at the important options for uploading and downloading objects in Cloud Storage. I'm sure you're having a wonderful time, and I'll see you in the next step.
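To recap this step's upload guidance and the slicing idea, here's a Python sketch. `suggest_upload_method` is a rule of thumb built from the guidance above, and the 8 MiB cutoff between "small" and "large" files is a hypothetical value chosen only for illustration. `byte_ranges` shows how a file can be cut into contiguous parts (up to 32 for a parallel composite upload, or a few slices for a sliced download), and `range_header` formats one part as an HTTP Range header, whose end offset is inclusive. Real tools such as gsutil choose their own part sizes; this only demonstrates the idea.

```python
SMALL_FILE_BYTES = 8 * 1024 * 1024  # hypothetical cutoff between "small" and "large"
MAX_COMPOSITE_PARTS = 32  # parallel composite uploads use up to 32 chunks


def suggest_upload_method(size_bytes, has_metadata=False, fast_network_and_disk=False):
    """Rule-of-thumb upload choice; pass size_bytes=None when the size is unknown."""
    if size_bytes is None:
        return "streaming"
    if size_bytes <= SMALL_FILE_BYTES:
        return "multipart" if has_metadata else "simple"
    if fast_network_and_disk:
        return "parallel composite"
    return "resumable"


def byte_ranges(size_bytes, parts):
    """Split [0, size_bytes) into at most `parts` contiguous (start, end) ranges, end exclusive."""
    parts = max(1, min(parts, size_bytes))
    base, extra = divmod(size_bytes, parts)
    ranges, start = [], 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)  # spread the remainder over the first parts
        ranges.append((start, start + length))
        start += length
    return ranges


def range_header(start, end):
    """HTTP Range header for [start, end); HTTP ranges include their last byte."""
    return f"bytes={start}-{end - 1}"


size = 10 * 1024**3  # a hypothetical 10 GiB file
print(suggest_upload_method(size, fast_network_and_disk=True))  # parallel composite
for start, end in byte_ranges(size, MAX_COMPOSITE_PARTS)[:2]:
    print(range_header(start, end))
```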