Google Professional Data Engineer – Ops and Security part 3
- Cloud Endpoints
Here is a question which I’d like us to ponder while we go through the contents of this video. The Extensible Service Proxy is an important part of exposing APIs via Cloud Endpoints. The question is: does the Extensible Service Proxy always sit on the same VPC, the same virtual private cloud, as the code that you are exposing as an endpoint? Think about it and we’ll get back to it. Let’s wrap up this section on ops in the Google Cloud environment with a quick discussion of Cloud Endpoints. Cloud Endpoints helps us create, share, and maintain our APIs. These are the endpoints in our apps. This service makes use of something known as the distributed Extensible Service Proxy.
This is a way of providing API access with low latency while also providing authentication, logging, monitoring, and all of that good stuff in just one unit. The basic idea here is to wrap up our code. That’s our API container. Place it on an App Engine flexible environment instance, or on a Kubernetes Engine instance, or on a Compute Engine instance, which is just a VM. And then, crucially, place the Extensible Service Proxy container in front of our API container and on the same machine, that is, on the same virtual machine instance. This is a way of making sure that there is no network hop involved between the Extensible Service Proxy container and our API container.
That Extensible Service Proxy will take care of communicating with the Service Management and Service Control APIs, which is how deployment happens. And it will also take care of receiving incoming API calls from clients, whether they are web clients or Android or iOS clients. The proxy will then pass the actual API calls on to our container. This is also an excellent illustration of the use cases of containers, because the Extensible Service Proxy is actually a Docker container. And so, technically, we could host our API anywhere Docker support is available. Now, if you are sharp-eyed, you might have noticed that I did not include App Engine standard environment instances in my original list of environments which support Cloud Endpoints.
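Just to make that arrangement concrete, here is a minimal sketch in Python of what the API container’s side of things might look like. Do note that the local port 8081, the header handling, and the response shape are assumptions of mine for illustration rather than anything mandated by the service; the one documented idea this leans on is that the proxy forwards authenticated caller information to the backend in the X-Endpoint-API-UserInfo header.

```python
# Minimal sketch of an API backend that sits behind the Extensible Service Proxy.
# Assumptions: ESP listens on the public port and forwards to 127.0.0.1:8081,
# and passes authenticated caller claims in the X-Endpoint-API-UserInfo header
# (base64-encoded JSON). The port and paths are illustrative, not requirements.
import base64
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class ApiHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # ESP has already authenticated the call; we just read the forwarded claims.
        user_info = self.headers.get("X-Endpoint-API-UserInfo")
        claims = {}
        if user_info:
            # Re-pad the base64 string before decoding, since padding may be stripped.
            padded = user_info + "=" * (-len(user_info) % 4)
            claims = json.loads(base64.urlsafe_b64decode(padded))
        body = json.dumps({"message": "hello", "caller": claims.get("id", "anonymous")})
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body.encode("utf-8"))


if __name__ == "__main__":
    # Bind to localhost only: requests reach us from the proxy on this same instance.
    HTTPServer(("127.0.0.1", 8081), ApiHandler).serve_forever()
```

The important design point is the bind address: the backend listens only on localhost, so every request reaches it from the Extensible Service Proxy sitting on the very same instance, with no network hop.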
Well, that omission is not the full story. You can actually use some types of standard environment instances, provided they have the Endpoints service turned on. You can definitely use Cloud Endpoints with flexible environments as well as with Kubernetes Engine or Compute Engine instances. But do be careful to keep both the proxy container and the API container with our code on the same instance, so that there is no network hop between them. Such a hop would, of course, negate the whole point of the Cloud Endpoints service. Let’s get back to the question we posed at the start of the video. An Extensible Service Proxy must indeed sit on the same VPC, the same virtual private cloud, as the endpoint code.
But in reality, there is an even stronger condition. That Extensible Service Proxy must sit on the same machine, the same instance, as your endpoint code. The reason for this is that the Extensible Service Proxy performs a bunch of critical functions, including authentication. If the service proxy and your code sat on different machines, on different instances, then communication between them would require a network hop, and that in turn would negate the purpose of the authentication. So for a Cloud Endpoints API to function as required, the Extensible Service Proxy and your code must be sitting on literally the same machine.
- Cloud IAM: User accounts, Service accounts, API Credentials
Hello and welcome to our module on identity and security on the Google Cloud Platform. The topics we will discuss here include authentication and the use of service accounts, end user accounts, and API keys. We will also discuss authorization and the use of roles and resources. We will conclude this talk with a discussion of best practices when it comes to identity and security on GCP, and we will begin with authentication. But first, let us take a look at where it fits into the whole identity and security landscape. We can consider this to be broken up into authentication and authorization, or who you are and what you can do. Authentication we will break up into two broad forms.
One is the use of API keys, and the other is what we will call the standard flow, which involves the use of end user accounts or service accounts. Authorization involves the assigning of roles and privileges to different kinds of identities and broadly comes under Identity and Access Management. Let us begin, though, by discussing service accounts. Service accounts are very flexible and they are also very widely supported within GCP. In fact, all GCP APIs support service accounts, whereas the same cannot be said of the other credential types. For pretty much any application which runs on a server, it is probably best to use a service account for it to communicate with APIs within GCP.
One big benefit of service accounts is that they are linked to projects and not to users, so you don’t have to worry about losing access if a user account is deleted after that person has left the organization. Another plus is that all the resources which a project requires can be obtained in one go if the correct roles are assigned to these service accounts. A compelling reason for not using a service account, and using an end user account instead, is when you would actually like to distinguish between the different end users using the same project. And we shall soon see an example of that. We have already seen that a service account is linked to a project as opposed to an individual user.
Service accounts can be created either from the console or programmatically from the SDKs. The credentials associated with a service account can be accessed using an environment variable called GOOGLE_APPLICATION_CREDENTIALS, and the set of credentials active at any given point in time is called the application default credentials. Let us now check how the application default credentials, or ADC, are resolved. If your code uses a client library, then the ADC mechanism will first look to see whether the GOOGLE_APPLICATION_CREDENTIALS environment variable is set. If it is, it will just pick up the credentials from the file that variable points to. If that variable is not set, then ADC uses one of the default service accounts which are automatically provisioned.
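Here is a hedged sketch, using the google-auth Python client library, of how those two cases play out in code; the key file path and the scope below are hypothetical placeholders, not values from this course.

```python
# Sketch of Application Default Credentials resolution with the google-auth library.
# The key file path and scope are illustrative; on GCE, GKE, or App Engine, omitting
# the environment variable lets ADC fall back to the instance's default service account.
import os

import google.auth
from google.oauth2 import service_account

# Option 1: point ADC at an explicit service account key file (hypothetical path).
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/my-service-account.json"

# ADC checks GOOGLE_APPLICATION_CREDENTIALS first, then falls back to the
# environment's default service account; it raises DefaultCredentialsError
# if neither is available.
credentials, project_id = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
print("Authenticated against project:", project_id)

# Option 2: load the same key file explicitly, without going through ADC.
explicit_creds = service_account.Credentials.from_service_account_file(
    "/path/to/my-service-account.json",
    scopes=["https://www.googleapis.com/auth/cloud-platform"],
)
```

On a Compute Engine, Kubernetes Engine, or App Engine instance, simply leaving the environment variable unset lets ADC fall back to the default service account we just mentioned.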
So make sure you don’t delete that default service account, because if none of the above credentials can be obtained, an error will be thrown. Let us now move along from service accounts to end user accounts. As we touched upon previously, one should use a service account whenever possible, but there are cases where the use of end user accounts just becomes unavoidable. Let us take a look at a couple of examples. Imagine a user is logged into an application and needs to access a BigQuery dataset which belongs to him or her. Since this data is sensitive, we cannot just let a service account use it; you need to be logged in as an end user.
Another case is when a new project needs to be created, and that privilege usually rests with an end user. Let us now take a look at a real-life example of end user authentication. Consider a user who navigates to Quora.com, and Quora now needs to access certain resources on behalf of that user. So a Google sign-in screen is presented. The user signs in using his or her Google credentials. And once Google has authenticated that user, Quora is satisfied and is able to release the resources for that user. Let us now quickly review the components involved in that transaction, starting with the resource owner, which happens to be Quora.
Quora is also the resource server, which will grant access once the user has been authenticated. Quora is also the client in this interaction, and it is talking to Google, which serves as the authorization server. Let us move on now to another form of authentication and authorization, using OAuth 2.0. The way this works is, let’s say an application needs to access some resource on behalf of a user. That user will be presented with a consent screen. Let’s say the user grants consent. The application will then request credentials from an authorization server. Once those are obtained, it uses those credentials to access the specific resources.
The way to create an OAuth credential in GCP is to simply navigate in the console to API Manager > Credentials, hit Create credentials, and select OAuth client ID as your credential type. This will help you generate an OAuth client ID and client secret which you can then use in your application. Let us look at an example now where the user of an application wishes to access a GCP API. With the user’s consent, the project will access the API on the user’s behalf by sending a request to the GCP API manager, passing along the OAuth credentials so that the user can be authenticated. If the authentication is successful, the user is granted API access.
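As a rough sketch of how an application might use that client ID and secret, the google-auth-oauthlib Python library can drive the consent screen from the downloaded client secret file; the file name client_secret.json and the BigQuery read-only scope below are my own assumptions for illustration.

```python
# Hedged sketch of an OAuth 2.0 end-user flow using google-auth-oauthlib.
# Assumes the client ID/secret created in the console were downloaded to
# client_secret.json (hypothetical file name); the scope is illustrative.
from google_auth_oauthlib.flow import InstalledAppFlow

SCOPES = ["https://www.googleapis.com/auth/bigquery.readonly"]

flow = InstalledAppFlow.from_client_secrets_file("client_secret.json", SCOPES)

# Opens a browser to Google's consent screen; the user signs in and grants
# (or denies) access, and the flow exchanges the result for user credentials.
credentials = flow.run_local_server(port=0)

# These credentials can now be passed to a client library to call APIs on the
# user's behalf, e.g. google.cloud.bigquery.Client(credentials=credentials).
print("Obtained user credentials, expired:", credentials.expired)
```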
Once again, let us take a look at the different roles in this transaction. The GCP project is both the resource owner and the resource server. The client in our example happens to be the GCP project as well, and it is communicating with the authorization server, which is the GCP API manager. A couple of things to keep in mind when using OAuth credentials: the client ID and secret can be viewed by anyone who is a project owner or editor, so be careful about who has those permissions. Also, if you happen to revoke access for some user, they may still have the secret stored somewhere, so it is probably best practice to reset it.
Let us move on now to another form of authentication, that is, the use of API keys. API keys are simply encrypted strings, and they can be used to call some of the GCP APIs, especially those which do not need access to private user data. That is because API keys are linked to projects and don’t really identify users. API keys are especially useful for applications which do not have a back end server, since they can just hit the APIs directly. Also, these keys can be used to track the number of requests made to the APIs, so that the project they are linked with doesn’t exceed its quota of API calls and can also be billed according to the number of calls it makes.
To create an API key from the console, you just navigate to API Manager > Credentials and hit Create credentials, and when choosing the type of credential, select API key. A couple of things to note when using API keys: you can be susceptible to man-in-the-middle attacks, especially if you are making API calls over an unencrypted connection, for example. If someone were to get their hands on your API key, they could just make a large number of API calls, and your project could quickly exceed its quota or incur a huge bill. Also, API keys don’t really identify a user or an application, but are only linked with a GCP project. There are a number of machine learning APIs offered in GCP, and it is possible to use API keys to access all of them.
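For instance, here is a hedged sketch of calling the Translation API with nothing but an API key; the endpoint and response shape follow the public REST documentation as best I recall it, and the key value is a placeholder you would create yourself under API Manager > Credentials.

```python
# Hedged sketch of calling a GCP ML API with only an API key (no user identity).
# The Translation v2 endpoint and response shape are based on the public REST
# docs; the key value is a placeholder and must come from your own project.
import json
import urllib.parse
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder created under API Manager > Credentials

url = "https://translation.googleapis.com/language/translate/v2?" + urllib.parse.urlencode(
    {"key": API_KEY}
)
payload = json.dumps({"q": "Hello, world", "target": "de"}).encode("utf-8")

request = urllib.request.Request(
    url, data=payload, headers={"Content-Type": "application/json"}
)
# Always use HTTPS: an API key sent over an unencrypted connection can be
# captured and replayed against your project's quota and billing.
with urllib.request.urlopen(request) as response:
    body = json.load(response)
    print(body["data"]["translations"][0]["translatedText"])
```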
- Cloud IAM: Roles, Identity-Aware Proxy, Best Practices
We now move along to the management of permissions which are assigned to different identities in GCP, that is, Identity and Access Management. This chart here gives an overview of the various components involved in IAM. Let us start with identities, which could be end user or Google accounts, service accounts, or the special identities for all users and all authenticated users. There are also roles, which can be quite granular and can be assigned per resource; for example, there could be a read role on buckets. The resources which can have sets of roles linked with them include projects, Compute Engine instances, virtual networks, buckets, and a whole lot more.
Policies essentially encapsulate identities and roles and link them together. Roles within GCP include what are called primitive roles, which comprise viewers, who have read-only permissions on resources; editors, who can read resources as well as modify and delete them, and may also be able to deploy applications; and owners, who, in addition to the viewer and editor privileges, can also create resources and can grant or revoke access to the resource for other users. GCP also allows custom roles, which as of December 2017 are still in beta. With custom roles you can add individual permissions, such as read access to one resource and write access to another, for example.
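To make the link between identities, roles, and policies concrete, here is a hedged sketch of what a project-level policy looks like and how you might fetch the real one; the member identities are invented, and the API call assumes the google-api-python-client library with application default credentials.

```python
# Hedged sketch: a policy is a set of bindings, each tying a role to members.
# The member identities below are invented, and the call assumes the
# google-api-python-client library with application default credentials.
from googleapiclient import discovery

PROJECT_ID = "my-sample-project"  # placeholder

# What a policy document looks like, independent of any API call:
example_policy = {
    "bindings": [
        {
            "role": "roles/viewer",  # primitive role: read-only
            "members": ["user:analyst@example.com", "group:auditors@example.com"],
        },
        {
            "role": "roles/editor",  # primitive role: read, modify, delete, deploy
            "members": ["serviceAccount:etl-job@my-sample-project.iam.gserviceaccount.com"],
        },
    ]
}

# Fetching the real policy for a project via the Cloud Resource Manager API:
crm = discovery.build("cloudresourcemanager", "v1")
policy = crm.projects().getIamPolicy(resource=PROJECT_ID, body={}).execute()
for binding in policy.get("bindings", []):
    print(binding["role"], "->", binding["members"])
```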
However, there are certain restrictions. Custom roles cannot be applied to folders or groups of projects, and these roles, when created in one project, cannot be used to access resources in another project. Let us now take a quick look at the resource hierarchy in GCP. At the very top you have an organization, which comprises a number of projects, which in turn consist of a number of resources. Your access control policy can be set at any level in the hierarchy, and it will be inherited by any of the children of that resource. One thing to note is that if a child policy conflicts with that of the parent, the less restrictive policy is the one which applies.
So, for example, if a user is given read-only access to a Cloud Storage bucket, but that same user has write permissions at the project level, then the user will have write permissions on that bucket. This is one more view of the resource hierarchy, but this time with folders, which are essentially groups of projects and sit between the project level and the organization level. Taking a closer look at some of the levels in this hierarchy, an organization is not necessary, but if a project belongs to an organization as opposed to an individual user, it does make it easy to delete the individual user account if that person were to leave the company. Organizations are available if you use G Suite, and they can be configured by the G Suite super admin.
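A purely conceptual way to picture that behaviour, and nothing to do with any real GCP API, is that the effective permissions on a resource are the union of everything granted along its ancestry rather than the most restrictive grant, as in this little sketch.

```python
# Purely conceptual illustration (not a GCP API): effective permissions on a
# resource are the union of permissions granted at the resource and at every
# ancestor, which is why a project-level write grant still applies to a bucket
# that only received a read-only grant directly.
from typing import Dict, List, Set

# Hypothetical grants for one user at each level of the hierarchy.
grants: Dict[str, Set[str]] = {
    "organization": set(),
    "project": {"storage.objects.create"},  # write granted at the project level
    "bucket": {"storage.objects.get"},      # read-only granted on the bucket itself
}


def effective_permissions(ancestry: List[str]) -> Set[str]:
    """Union of permissions granted anywhere along the ancestry."""
    result: Set[str] = set()
    for level in ancestry:
        result |= grants.get(level, set())
    return result


perms = effective_permissions(["organization", "project", "bucket"])
print("storage.objects.create" in perms)  # True: the broader grant still applies
```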
Also, since organizations are the root of the resource hierarchy in GCP, any permissions set at this level cascade down to all the child resources, such as folders and projects and the like. Folders in GCP are merely groupings of projects, so they may map onto departments, legal entities, or teams within your organization which have a lot of projects under their management. It is also possible to have a hierarchy of folders. I’d also like to touch upon some of the features of G Suite as opposed to having an individual GCP account. It is possible to have an organization when you use G Suite and have projects belong to that instead of to an individual account.
One can administer users and groups when using G Suite, and you can also sync up your G Suite accounts with your own LDAP or Active Directory. You can also set up single sign-on between GCP and your other applications by linking with a third-party identity provider. For more information on IAM, feel free to check out this page in the documentation. We now take a look at Identity-Aware Proxy, or IAP, which is a service offered by GCP to secure your HTTPS-based web applications. We have seen all the components of identity and security, and you could say that IAP essentially combines all of this in order to secure your application. IAP is an additional safeguard for a resource; it doesn’t really replace IAM but complements it.
Also, when you turn on IAP, it causes the creation of an OAuth client ID and secret for each resource. So don’t make the mistake of deleting any of these, or IAP will just stop working. One of the benefits of IAP is that you don’t need to rely on network-level firewall rules. Rather than open up your application to anyone who is within your network or who has connected using a VPN, you specify more application-level access control, and you can grant access to only specific users or specific groups. So, for example, you could say that only company employees have access to your application and contractors do not. Just to reiterate, the use of IAP does not do away with IAM, but is merely an additional step.
So you still need to configure your users, groups, and roles correctly. The way IAP works is, say requests come into App Engine or an HTTPS load balancer for an application which has IAP enabled. IAP will first check the user’s browser cookies for credentials. If they don’t exist, then the user is directed to a Google Account sign-in page. Once the user has signed in, the credentials are sent to IAM for authorization, to see whether the user has the permissions to access that resource. Now, IAP is not a universal solution for security; there are certain limitations. For example, you cannot really prevent anyone who has access to a VM from connecting to it over SSH and perhaps doing some damage.
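To guard against traffic that manages to bypass IAP, the usual application-level check is to verify the JSON Web Token that IAP attaches to every request it forwards. Here is a hedged sketch using the google-auth library; the audience string and the certificate URL are written from my memory of the IAP documentation, so do treat them as assumptions to verify.

```python
# Hedged sketch of verifying IAP's signed header inside the application.
# IAP adds an x-goog-iap-jwt-assertion header to each request it forwards; the
# backend should verify it so that traffic which bypasses IAP is rejected.
# The audience format and certs URL below are assumptions based on the IAP docs.
from google.auth.transport import requests as google_requests
from google.oauth2 import id_token

IAP_CERTS_URL = "https://www.gstatic.com/iap/verify/public_key"
EXPECTED_AUDIENCE = "/projects/123456789/apps/my-sample-project"  # placeholder values


def verify_iap_jwt(iap_jwt: str) -> dict:
    """Return the verified claims, or raise ValueError if the token is invalid."""
    return id_token.verify_token(
        iap_jwt,
        google_requests.Request(),
        audience=EXPECTED_AUDIENCE,
        certs_url=IAP_CERTS_URL,
    )


# Typical use inside a request handler (framework left out of this sketch):
# claims = verify_iap_jwt(request.headers["x-goog-iap-jwt-assertion"])
# user_email = claims.get("email")
```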
You will still need to configure your firewall rules and load balancer to block traffic which does not come from the serving infrastructure. Another thing is that, as we just sketched, you will need to turn on and verify signed headers when using IAP, and you can find out more about IAP by checking out its documentation page on the Google Cloud website. To conclude this course, let us take a look at some of the best practices when using Cloud IAM. For one, your use of organizations, folders, and projects should mirror the structure of your own organization. So, for example, if you happen to have an entity within the organization which manages a large number of projects, do use folders, and even subfolders if necessary.
Also, do make use of the inheritance feature which is available when assigning permissions. So, for instance, if you want read access to be granted to a group of users across all resources in a project, make sure that access is given at the project level rather than on each resource. Very importantly, apply the principle of least privilege when assigning roles at any level. For example, if a user needs read access to all resources in a project, but also needs write access to only some resources, assign read access at the project level and write access at the individual resource level. Another best practice is to assign permissions to groups rather than to individual users.
So even if there is a task which requires multiple roles and only one or two users require those permissions, it is still best to create a new group, add the users to it, and assign the correct roles to the group. Also, do perform regular audits of group membership just to make sure that someone doesn’t have more permissions than they really require. Finally, when using service accounts, one should have some kind of naming convention for them, so that the name of the account clearly identifies what it is being used for. One should also be careful when granting the service account actor role to a user, since they will then have all the privileges of that service account. And since service accounts use keys to authenticate, one should perform some kind of key rotation. Okay, that concludes this course on Cloud IAM.