Google Professional Data Engineer – Ops and Security part 4
- Lab: Cloud IAM
In this lab, we will be using end-user and service accounts to see how modifying their privileges affects what they're able to do within GCP. To start with, let's take a look at a couple of the end-user accounts. The first one belongs to Vitthal Srinivasan, who is a project owner. The second belongs to Judy Raj, whose permissions we will be modifying to see how that affects her access. Once we log in as Judy, we see that she does not have permission to view any instances in this project, because she has not been added to the project. So let us go back to Srinivasan's account, which is the project owner's, and grant access to Judy.
So first, let us go to the main project page and see that as a project owner, Vitthal is able to view a lot of the project information and can also edit details such as the name of the project. This user can also work with the roles defined in this project. But now let us take a look at the main IAM console. As a project owner, Vitthal has the ability to do things like adding new users to the project. So let us proceed to add Judy to this Loonycorn project. We hit Add, specify the name of the member whom we wish to add to the project, and then assign a role to them. For our example, let us just grant Judy the role of Project Viewer.
But while we are on this screen, let's take a look at the different kinds of roles which are available in GCP. They have been categorized, so there are things like Error Reporting, Monitoring, Resource Manager and so on. We're just adding this one role for Judy, though, so we hit Add. Now let us see how this affects her access within the console. We switch over to the tab which has Judy logged in, and let us first check out the main project page before going to the IAM console. We refresh and go into project settings, where we see that Judy is able to view all the information for this project. But unlike the project owner, she cannot edit the project name.
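For reference, the same grant could be made from the command line. This is a hedged sketch; the project ID (loonycorn-project-06) and Judy's email address are placeholders, not values confirmed in the lab:

```bash
# Grant Judy the primitive Project Viewer role on the project.
# The project ID and member email are placeholders.
gcloud projects add-iam-policy-binding loonycorn-project-06 \
    --member="user:judy@example.com" \
    --role="roles/viewer"
```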
Moving to the IAM page again, Judy is able to view all the members here, but all the options to edit roles are grayed out, unlike in Vitthal's account, where those options are available. For our next step, let us create a new bucket and see how modifying Judy's permissions affects her ability to view it. We head over to Storage and choose to create a new bucket. We call this bucket looney-3, hit Create, and once it is ready, upload an image into it. So we go to Upload Files, and I'm going to choose one file from my file system.
The file selected is loonycorn_color, and once it has been uploaded, I find the name a little cumbersome, so I rename it to sample.png. (A gsutil sketch of this bucket setup appears below.) Once that is complete, let us switch over to Judy's account and see if she's able to view this bucket. I switch over to the first tab, which has Judy logged in, and from there navigate to the Storage section. We see that Judy is able to view this bucket. Now let us play around with Judy's settings again. We switch back to the project owner's tab and go into the IAM section, where we are going to edit Judy's permissions.
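As promised, here is a hedged command-line equivalent of the bucket setup. The bucket and file names follow the lab; treat them as placeholders:

```bash
# Create the bucket, upload the image, and rename it.
gsutil mb gs://looney-3
gsutil cp loonycorn_color.png gs://looney-3/
gsutil mv gs://looney-3/loonycorn_color.png gs://looney-3/sample.png
```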
So we go over to her specific account and remove the Project Viewer privilege which she has. Once we remove it, her account simply disappears from this page. Now let us switch back to Judy's tab and navigate once again to the Storage section, where we see that she no longer has the ability to view buckets. For our next test, let us give Judy back some privileges on the project, but this time her role will be much more constrained. For that, we navigate to the project owner's profile and, from the IAM console, hit Add. We're adding Judy back here, but this time her role, instead of being that of a Project Viewer, is going to be much more constrained.
We assign her the role of Storage Object Viewer this time, so it's much more specific. Once we add this particular role, we navigate back to Judy's account, and over here we can see that she does not have permission to view anything on the project page. After this, let us navigate to the Storage page, where we see that, again, she cannot view the buckets from the console. However, Judy does have the Storage Object Viewer role assigned to her, so she should not be blocked off from the bucket entirely. Though she cannot see it from the console, let us see if she is able to access the bucket from Cloud Shell. Once we bring up the shell, we run the gsutil ls command on the bucket.
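A hedged sketch of this sequence from the command line; again, the project ID and email address are placeholders:

```bash
# Remove the broad Project Viewer role...
gcloud projects remove-iam-policy-binding loonycorn-project-06 \
    --member="user:judy@example.com" \
    --role="roles/viewer"

# ...and grant the much narrower Storage Object Viewer role instead.
gcloud projects add-iam-policy-binding loonycorn-project-06 \
    --member="user:judy@example.com" \
    --role="roles/storage.objectViewer"

# From Judy's Cloud Shell, the bucket is now visible via the CLI, not the console:
gsutil ls gs://looney-3
```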
And over here, we can see that Judy does indeed have the ability to view the bucket, only from the shell and not from the console. Now that we have some familiarity with user accounts in GCP, let us close out of this shell and move along to service accounts. For this, we just need to be the project owner. We navigate to the Service Accounts page under IAM, and over here, let us create a new service account. We will soon see what the Service Account Actor role is capable of. But first, when creating the service account, let us follow a best practice and give it a name which identifies exactly what the account does. So we call the service account Read Bucket Objects and then assign it the role of Storage Object Viewer.
Once that is ready, we hit Create. With our service account in place, we go ahead and add a user to it, which essentially means that that user will be able to run operations as that service account, with the same privileges. We select the service account, click on Permissions, and we now have the option to add members. The member we will add is the fictional company altostrat.com; let us say that all of its employees will be able to run as the service account. For that to happen, the role which we assign is Service Account Actor. Once we have selected that, we add this member to our service account. As a next step, to add another variable into the mix, let us head back to the main IAM console page.
Over here we add a new user of sorts, or more specifically the domain altostrat.com, which means that all users on that domain will have a specific role. The role which we're going to assign to them is Compute Instance Admin, which is found under Compute Engine. This role essentially grants admin privileges on things like instances, instance groups and images, as well as read access on networking resources. So all employees of altostrat.com will have this privilege, and you will note that this is much broader than the specific privilege we granted to our service account, which was just to read bucket objects. (A hedged command-line sketch of these service-account steps follows below.) So yes, we add this new domain and move along to our next step, which is to provision a VM instance.
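Here is a minimal command-line sketch of these steps. The service account ID (read-bucket-objects), the project ID, and the Service Account Actor role ID (roles/iam.serviceAccountActor, a legacy role since superseded by Service Account User) are assumptions for illustration, not values confirmed in the lab:

```bash
# Create the service account with a descriptive name.
gcloud iam service-accounts create read-bucket-objects \
    --display-name="Read Bucket Objects"

# Give the service account itself the narrow Storage Object Viewer role.
gcloud projects add-iam-policy-binding loonycorn-project-06 \
    --member="serviceAccount:read-bucket-objects@loonycorn-project-06.iam.gserviceaccount.com" \
    --role="roles/storage.objectViewer"

# Let everyone at altostrat.com run as this service account.
gcloud iam service-accounts add-iam-policy-binding \
    read-bucket-objects@loonycorn-project-06.iam.gserviceaccount.com \
    --member="domain:altostrat.com" \
    --role="roles/iam.serviceAccountActor"

# Separately, grant the whole domain the much broader Compute Instance Admin role.
gcloud projects add-iam-policy-binding loonycorn-project-06 \
    --member="domain:altostrat.com" \
    --role="roles/compute.instanceAdmin.v1"
```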
We will configure this instance such that any access to the Google Cloud APIs from it is done using our newly created service account. We head over to VM Instances and create a new instance which we call demo-iam. Let us select a zone in us-central1 for this instance, and once we choose our machine type, head over to the Identity and API Access section. Over here we select our own service account. What this essentially means is that any access to the Google Cloud APIs from this instance will run with the privileges of this service account, which, just to refresh your memory, has the ability to read bucket objects, as its name implies. Once we have our host provisioned, we connect to it via SSH.
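A hedged gcloud equivalent of this instance creation; the exact zone, machine type and scopes are assumptions for the sketch, not values confirmed in the lab:

```bash
# Create the VM so that its API calls run as the read-bucket-objects service account.
gcloud compute instances create demo-iam \
    --zone=us-central1-a \
    --machine-type=n1-standard-1 \
    --service-account="read-bucket-objects@loonycorn-project-06.iam.gserviceaccount.com" \
    --scopes=cloud-platform
```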
One thing to note here is that I'm currently logged in as the user Vitthal, who is the project owner and has a very broad set of privileges. But once we've logged in to the instance, we run a command which lists all the instances in the project, we wait for it to run, and we see that it has errored out. So it looks like we don't have the correct permissions: we're not running with the privileges of Vitthal, but with the privileges of our service account. To see if we do indeed have the permissions of the service account, which, you remember, has the ability to read buckets, let us get the information for our bucket. We navigate to the Storage section, copy over the name of our bucket, and note that it has a file called sample.png.
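The sequence of checks described next, as run from the SSH session, might look like this (a sketch; the names follow the lab):

```bash
# Fails: the service account has no Compute Engine permissions.
gcloud compute instances list

# Succeeds: Storage Object Viewer allows reads from the bucket.
gsutil cp gs://looney-3/sample.png .

# Fails: Storage Object Viewer does not allow writes to the bucket.
cp sample.png copy.png
gsutil cp copy.png gs://looney-3/
```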
From our SSH terminal, let us try to copy the file from the bucket. We just wish to copy it to the current directory, and we see that this operation is a success. So we do indeed have permission to read objects from our bucket, as the service account allows. As a final test, let us try to upload something to our bucket. To do that, we create a copy of our sample.png file and call it copy.png. Once that is done, we try to copy it over into the bucket, for which we use the gsutil cp command again, but this time copying from our local disk to the remote bucket. As expected, we see that we do not have permission to perform a write to the bucket. All right, so that concludes our lab. I hope you were able to get a good understanding of IAM and the use of service accounts as well. Thank you.
- Data Protection
Here is a question that I'd like us to keep in mind as we go through the video. Let's say that our app makes use of personally identifiable and sensitive information. Classifying or redacting such data, that is, identifying where it exists within our data set and then redacting it, requires us to write complex machine learning programs to identify the sensitive data and deal with it. Is this statement true or false? We've seen how IAM, the Identity and Access Management service, helps to provide authentication and authorization using policies on the resource hierarchy. Let's now take a look at something known as IAP, the Identity-Aware Proxy. This is an even higher level of abstraction which combines both authentication and authorization for any HTTP- or HTTPS-based access.
IAP, the Identity-Aware Proxy, acts as an additional safeguard on particular resources. You can turn on IAP for a resource; this will cause the creation of a special OAuth 2.0 client ID and secret, one per resource. These additional OAuth client IDs will show up in your API Manager. Don't go in and delete any of those, because IAP will stop working on that resource if you do. Let's understand schematically how IAP works. There is a central authorization layer for applications which are accessed via HTTPS. Going back to our conversation about load balancing at different levels of the network stack, we had seen how the higher up one went in the protocol stack, the smarter the load balancing became. The same principle applies to identity management as well.
And so IAP is a way to carry out this protection at the application layer rather than at the network layer. This allows us significant additional functionality. For instance, we can set group-based application access: a resource might be accessible to employees and inaccessible to contractors. That's a pretty smart level of access control; we'd never have been able to achieve this using, say, IP addresses alone. Let's now parse this flowchart and take a look at how IAP and IAM interact. Notice that IAP is an additional step; it is not bypassing IAM. So users and groups will still need to have their identities and roles set up correctly in IAM. Otherwise, access will not be granted.
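As a hedged sketch of how IAP might be turned on and access granted from the command line, assuming an App Engine application and the IAP-secured Web App User role (the gcloud iap commands and the member email are assumptions, not something shown in the video):

```bash
# Turn on IAP for the project's App Engine app (assumption: your gcloud version
# supports gcloud iap web enable with --resource-type=app-engine).
gcloud iap web enable --resource-type=app-engine

# Allow a specific user through IAP; IAM roles still apply on top of this.
gcloud iap web add-iam-policy-binding \
    --resource-type=app-engine \
    --member="user:employee@altostrat.com" \
    --role="roles/iap.httpsResourceAccessor"
```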
Authentication: well, authentication here can come from two sources, either from App Engine or from Cloud Load Balancing, that is, from HTTPS, and that's because these are the only two application-level sources of resource requests. In either case, IAP will go ahead and check the user's credentials. This can be done from the browser, because this is HTTPS. If credentials are not found, the user will be redirected to an OAuth 2.0 Google sign-in, and those credentials will now be required. Authorization is carried out as before, making use of IAM. IAP does have some limitations, because it will not protect against activity inside VMs. Remember that HTTPS is an application-level protocol.
So, for instance, IAP will not help with someone trying to SSH into a VM or into an App Engine flexible environment instance. This means that we've got to configure firewalls and load balancers to disallow traffic which is not coming in from the serving infrastructure (a hedged firewall sketch follows below). That will cause all requests to be funneled in only through either HTTPS load balancing or App Engine. And one last little bit of fine print: signed headers need to be in use for IAP to work correctly. This conversation about the limitations of IAP leads in nicely to a conversation about preventing data exfiltration. So first, let's define this heavy-duty-sounding term. Data exfiltration basically is data leaving your secured system because of someone with bad intent.
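One way to restrict ingress to the serving infrastructure is to allow traffic only from Google's documented load balancer source ranges. This is a minimal sketch; the network and rule names are placeholders, and the CIDR ranges are the ones Google publishes for its HTTPS load balancers and health checks:

```bash
# Allow HTTP(S) traffic to backends only from Google's load balancer ranges.
gcloud compute firewall-rules create allow-lb-only \
    --network=default \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:80,tcp:443 \
    --source-ranges=130.211.0.0/22,35.191.0.0/16
```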
Here, an authorized person has extracted data from the secure system and shared it with some unauthorized third party, or moved it to an insecure system. Data exfiltration will usually occur, like I said, due to bad intent, due to malicious or compromised actors. But it could also happen accidentally; it's unlikely, but possible, that you might forward a really sensitive email to someone outside your organization. That's why it's usually the case that exfiltration is caused by bad intent. In any case, let's move on and examine some of the most common types of data exfiltration. There isn't a whole lot of rocket science here. Outbound email is one example.
Downloads to insecure devices are another example, as are uploads to external services. All of these are exactly the kinds of actions which you would associate with a pretty rough-and-ready bad actor: rogue admins, or folks who have been terminated but are still on their notice periods. In order to prevent exfiltration, there is a whole set of dos and don'ts, particularly for virtual machines. Let's start with some of the don'ts. Do not allow outgoing connections; do not allow outbound network traffic to unknown addresses. You can set firewall rules to enforce this. Next, do not make IP addresses public. In fact, this is one of the important reasons why internal load balancing helps so much: it's a way to avail of the benefits of load balancing without using any external IP addresses at all.
Disallow some kinds of access which are associated with exfiltration, for instance remote connections using remote desktop. Remember that this is something that you need to change: by default, it is okay for folks to connect to VM instances using remote desktop protocols. You might recall that there is a default network which has some default firewall rules, and this is one of them. So ideally, you should go in and explicitly turn off RDP connectivity. Likewise, you should also make sure that SSH access is not given unless absolutely necessary. Once again, this will require you to go in and change one of the default rules associated with the default VPC in your cloud application (a command-line sketch of these changes follows below).
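Here is a hedged sketch of tightening the default network as described above. The rule names default-allow-rdp and default-allow-ssh are the ones Google creates in the default network; the deny-egress rule name is a placeholder:

```bash
# Remove the default rules that allow RDP and SSH from anywhere.
gcloud compute firewall-rules delete default-allow-rdp default-allow-ssh

# Block outbound traffic to unknown addresses by default.
gcloud compute firewall-rules create deny-all-egress \
    --network=default \
    --direction=EGRESS \
    --action=DENY \
    --rules=all \
    --destination-ranges=0.0.0.0/0
```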
That does it for the don'ts; let's examine some of the do's. The first of these has to do with the use of multiple VPCs, and appropriate firewall rules for traffic between them. Remember that there is always going to be one default VPC, but a project can have up to five. It makes sense to create multiple VPCs and isolate different parts of your code and your application from each other; do so by creating a VPC for each logical partition. That way the network partitions will correspond to the units of isolation. Also be sure to set up the correct firewall rules between them. Whenever possible, use something known as a bastion host. That's a formidable-sounding name; what it really means is that you have one IP address which is known to outside users, or even to instances within your architecture.
That IP address is the bastion host. The bastion host then goes ahead and sets up a bunch of internal connections. So you want to make sure that external machines can only SSH into the bastion host, and not into the individual instances inside your network. You might also want to specifically whitelist external machine IPs and thereby limit the source IPs that can communicate with the bastion. And lastly, you want to be sure to configure firewall rules to allow SSH traffic only if that SSH traffic is between the bastion host and an internal instance; it should not be possible to set up an SSH connection between two of your private instances (see the sketch below).
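A minimal sketch of such bastion rules using network tags; the tag names, source range and network are assumptions for illustration:

```bash
# Only a whitelisted corporate range may SSH to the bastion host.
gcloud compute firewall-rules create allow-ssh-to-bastion \
    --network=default \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:22 \
    --source-ranges=203.0.113.0/24 \
    --target-tags=bastion

# SSH to internal instances is allowed only when it originates from the bastion.
gcloud compute firewall-rules create allow-ssh-from-bastion \
    --network=default \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:22 \
    --source-tags=bastion \
    --target-tags=internal
```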
Let's move on now to talk about data loss prevention. I find this topic ever so slightly misnamed, because it really has to do with the redaction or classification of very sensitive user data. It can be incredibly laborious and time-consuming to find out what data in your application is actually sensitive and then to go ahead and redact it. And that's why Google has made available some powerful machine-learning-based APIs to do this for you; those APIs are collectively called the Data Loss Prevention API. The basic idea of the Data Loss Prevention API is to help you understand and manage sensitive data, either in your Cloud Storage buckets or in Cloud Datastore. Remember, Datastore is the document-oriented NoSQL service that's available to you. What Google allows you to do is to easily classify and redact sensitive data.
And there are specific types, for instance textual or image-based information. You can also redact sensitive data from text files, and even classify text files based on their potential sensitivity. These are pretty magical APIs; let's understand how they work. There is a classification API whose input will be raw data. This could be either image or textual data potentially containing sensitive information such as email addresses of end users, Social Security numbers, driving license identifiers, and so on. From the Data Loss Prevention API, the output will have the information types for specific kinds of sensitive information, as well as likelihoods.
These are probabilities that the information is indeed sensitive, and lastly, offsets, i.e. positions in the user data which are likely to be sensitive. So if this text is the input into the classification API, the output will be the array of information that you see on screen now: an array or list of information types. These could be US healthcare identifiers, email addresses, American driving licenses, Canadian passports, or UK taxpayer reference numbers; all of that is in column one. Column two contains likelihoods; you can see that this is like an enum, with values ranging from very likely down to very unlikely. And then the third column contains the offsets: these are the locations in the user data where that sensitive data is most likely to occur.
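To make this concrete, here is a hedged sketch of a classification (inspect) request against the DLP REST API. The project ID, the sample text and the choice of infoTypes are assumptions for illustration:

```bash
# Ask the DLP API to classify a snippet of text for email addresses and US SSNs.
curl -s -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://dlp.googleapis.com/v2/projects/loonycorn-project-06/content:inspect" \
  -d '{
        "item": {"value": "Contact judy@example.com, SSN 123-45-6789"},
        "inspectConfig": {
          "infoTypes": [{"name": "EMAIL_ADDRESS"}, {"name": "US_SOCIAL_SECURITY_NUMBER"}],
          "includeQuote": true
        }
      }'
```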
Once those offsets are available to you, you can then go ahead and easily redact the sensitive information. But it turns out that you don't even need to do that, because the Data Loss Prevention API will do it for you: it's trivial to ask for redacted output for the corresponding input (a sketch of such a request follows below). All of this is really pretty magical, and if you're wondering how this impressive functionality is achieved, it's through a combination of machine learning and rule-based approaches. The machine learning includes contextual analysis and pattern matching, and there are also specific rule-based approaches, such as calculating checksums, for instance, to see whether a number is a valid debit or credit card number, as well as word or phrase lists.
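A hedged sketch of asking the API for redacted output, using the content:deidentify endpoint with a replace-with-infoType transformation; again, the project ID and sample text are placeholders:

```bash
# Return the same text with each detected infoType replaced by its name,
# e.g. "Contact [EMAIL_ADDRESS], SSN [US_SOCIAL_SECURITY_NUMBER]".
curl -s -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://dlp.googleapis.com/v2/projects/loonycorn-project-06/content:deidentify" \
  -d '{
        "item": {"value": "Contact judy@example.com, SSN 123-45-6789"},
        "inspectConfig": {
          "infoTypes": [{"name": "EMAIL_ADDRESS"}, {"name": "US_SOCIAL_SECURITY_NUMBER"}]
        },
        "deidentifyConfig": {
          "infoTypeTransformations": {
            "transformations": [
              {"primitiveTransformation": {"replaceWithInfoTypeConfig": {}}}
            ]
          }
        }
      }'
```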
These word or phrase lists would help with the specific identifiers that we have previously discussed. Let's return to the question which we posed at the start of this video. The statement on screen now is false. Let's say that our cloud app makes use of important or sensitive data; it is still pretty easy for us to classify that data as sensitive and to go ahead and redact it. And it is easy because we can just make use of GCP's Data Loss Prevention API. As we've discussed, the Data Loss Prevention API obviates, or eliminates, the need for us as end users to write our own machine learning code to find and pattern-match sensitive identifiers such as Social Security numbers and so on. All of this is abstracted away for us by the Data Loss Prevention API, and so the statement on screen is false.