Google Professional Data Engineer – VPCs and Interconnecting Networks part 1
- VPCs And Subnets
In this section, we learn about virtual private clouds (VPCs) on the Google Cloud Platform (GCP) and how we can connect different VPCs together. VPC is just another name for a network that we create on a cloud platform. Just like AWS and Azure, the Google Cloud Platform has its own VPC setup with its own quirks. The Google Cloud Platform’s Virtual Private Cloud provides networking functionality to Compute Engine virtual machine instances, Kubernetes Engine containers and the App Engine flexible environment. Just like everything else on the Cloud Platform, VPC provides global, scalable, flexible networking for your cloud-based services. All resources that you instantiate on the GCP belong to some network. They belong to some VPC.
The VMs that you’ve instantiated so far, the container clusters that you’ve set up, the clusters that you set up for your Hadoop configuration, all of these belong to some network, some VPC. Instances which are on a VPC can’t automatically communicate with each other. They first have to be aware of routes and forwarding rules, which have to be configured to allow traffic within a VPC and to the outside world. So instances within a VPC need to be aware of routes in order to communicate with each other, and they need this information to communicate with the outside world as well. Routes alone aren’t sufficient though. On the Google Cloud Platform, you also need to explicitly configure firewall rules to allow traffic to flow through your network.
Firewall rules allow you to specify what traffic is allowed and what should be blocked or denied. Now, the most common use case is that you already have an on-premises network set up and you’re migrating some of your resources, some of your computation, to the cloud platform. In such a situation, you often want to connect your on-premises network with the Google Cloud. Some ways of doing this are VPN and peering. We’ll see how those work in this section. We’ll also see ways in which we can connect two networks on the GCP together. You might have networks which belong to different projects. One way of connecting them is via shared VPCs. Or you might want two networks on the same project to be connected together and behave like a single network with IP addresses in the same space.
This you can do via peering. We’ll start off by looking at VPCs in isolation and then move on to interconnections between VPCs. Let’s define a VPC first. A Virtual Private Cloud is a global, private, isolated virtual network partition that provides managed network functionality. It’s global because a network can have instances from multiple regions in the world. It’s private because you have to set up special permissions for people to access your network. It’s virtual because the actual machines may not be physically located in the same place. The fact that it’s global means that the actual machines may be anywhere in the world, but they are part of the same network. And because it’s managed, you’ll find that a lot of the grungy details of managing networking are completely abstracted away from you.
There is very little administration that you have to do. And if you look at the docs for the GCP Virtual Private Cloud, you’ll see that it provides global, scalable and flexible networking for your cloud-based services. You know that there are many data centers across the world where Google has its machines. These data centers are called regions. There are regions in the US, in Europe, in Asia, even in Australia. The fact that our VPC is global basically means that the virtual machines which form part of this network can be anywhere in the world. They can be located in different regions, and they can be located in different zones within a region. If you remember, a region can be made up of multiple zones, and these zones are completely isolated from one another.
A failure in one zone does not affect another. Resources which belong to the same VPC can lie in any of these locations. It’s global. VPCs have multitenancy, which means you can have resources in different projects within your organization and they can all belong to the same network. Instances on that network can talk to each other using private IP addresses. This is a big deal. It means that network administration and security for multiple projects in your organization can be performed by the same set of people. A shared VPC essentially means that the traffic routes and firewall rules for your network can span across your projects, across teams, across different billing units. Virtual Private Clouds are secure by default when they are set up.
You have to explicitly allow traffic into your VPC and out of your VPC, and you can have firewall rules to allow or deny traffic even between the instances within your Virtual Private Cloud. Changes to your network, such as adding new resources or enabling new firewall rules, can be strictly controlled using identity and access management. Just like other services on cloud platforms, these VPCs are scalable. If you want to expand your network, you can simply instantiate new VM instances and connect them to your VPC. You can instantiate your Kubernetes clusters and add them to the same network. The hard limit for instances on your VPC is 7000, which is huge. A single VPC on the Google Cloud Platform can contain up to 7000 virtual machines.
Let’s understand the basic structure of a project and where the Virtual Private Cloud fits in. Typically in your Google Cloud Platform organization, you’ll have several different projects, where a project corresponds to a billing unit. A project might represent the resources which belong to a certain team. The marketing team might have a separate project, or if you have an engineering team working on your website, they might have a separate project, and so on. Each of these projects can have many networks within it for different functionality. It’s quite likely that within a project you might have resources that you want isolated from each other.
You don’t want the virtual machines to be talking via internal IPs. You will then put them on different VPCs. So you might have more than one VPC within a project, and you’ll have resources associated with each of these VPCs within every project. The Google Cloud Platform allows you to create up to five VPCs. If you want to go beyond five networks, you might have to contact the Google Sales team and see if you can increase your quota. A single network has a hard limit of 7000 instances. This is a limit and not a quota. You can’t increase the number of VMs on your VPC beyond 7000. Just like networks in the physical world, a VPC on the Google Cloud Platform is divided into subnets.
A subnet is short for subnetwork and is an identifiably separate part of an organization’s network. Think of it as a logical partitioning of your network. A subnet can represent all machines in, say, a particular building, or machines belonging to a particular team, or machines which perform a particular job, and so on. A single Virtual Private Cloud or VPC can be made up of any number of subnets. That means it can be partitioned into any number of logical bits. A subnet on the VPC has other typical characteristics. All machines which belong to a subnet are part of a defined IP address prefix range. Once this IP range has been defined for a subnet, whenever a new virtual machine is added to this subnet, it picks an address from this predefined range.
This IP address prefix range is specified in CIDR notation, or Classless Inter-Domain Routing notation. This is a standard notation that is used for IP address prefix ranges for subnets across the world. We’ll take a brief look at CIDR notation in just a little bit. The IP address ranges of the subnets which belong to the same network should be unique. There should be no overlap between subnets on the same network. We just mentioned earlier that VPCs are global resources. VPCs span multiple regions. That means VPCs can have instances from the US as well as from Asia. But subnets are regional. So within a subnet you can have resources from multiple zones within a region, but you can’t have a subnet which spans the US as well as Europe.
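The no-overlap rule for subnets on the same network can be checked mechanically. The following is a minimal sketch using Python’s standard `ipaddress` module; the helper name `find_overlaps`, the subnet names and the address ranges are all hypothetical and chosen only for illustration. It is not how the GCP itself validates ranges.

```python
import ipaddress
from itertools import combinations


def find_overlaps(subnet_ranges):
    """Return pairs of subnet names whose CIDR ranges overlap.

    subnet_ranges maps a (hypothetical) subnet name to its CIDR string.
    """
    networks = {
        name: ipaddress.ip_network(cidr) for name, cidr in subnet_ranges.items()
    }
    return [
        (a, b)
        for a, b in combinations(sorted(networks), 2)
        if networks[a].overlaps(networks[b])
    ]


# Hypothetical subnets, for illustration only.
valid_vpc = {"subnet-us": "10.1.0.0/24", "subnet-eu": "10.2.0.0/24"}
clashing_vpc = {"subnet-a": "10.1.0.0/16", "subnet-b": "10.1.5.0/24"}

print(find_overlaps(valid_vpc))     # []
print(find_overlaps(clashing_vpc))  # [('subnet-a', 'subnet-b')]
```

In the second case, `subnet-b` sits entirely inside the range of `subnet-a`, so the two could not coexist on the same network.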
Let’s take a brief look at CIDR notation. CIDR is a method for allocating IP addresses to instances. It was introduced in 1993 and it replaced the earlier classful addressing mechanism, which had led to an explosion of routing tables on the Internet. The goal of CIDR is to slow the growth of routing tables across the net and to slow the rapid exhaustion of IPv4 addresses. Here is a typical representation of an IP address range using CIDR notation. The address is 10.123.9.0/24. Now, 24 represents the network prefix, which means the first 24 bits are associated with the network to which the subnet belongs.
Eight bits are required to represent each number between the dots in this IPv4 address. The address range that is represented by this is all the IP addresses from 10.123.9.0 to 10.123.9.255. Notice that the first three bytes, the first 24 bits, are the same; only the last eight bits change, from 0 to 255. The /24 represents the network prefix, the number of bits which make up the network address. You can also say that every subnet has a contiguous private RFC 1918 IP space. RFC 1918 is the request for comments titled “Address Allocation for Private Internets”. It reserves the address blocks 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16 for private networks, and these are the ranges from which internal addresses for virtual machines are typically drawn.
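To make the /24 arithmetic concrete, here is a small sketch that expands the 10.123.9.0/24 range from the example using Python’s standard `ipaddress` module. The list of RFC 1918 blocks (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) comes from the RFC itself; the variable names are just for illustration.

```python
import ipaddress

# The CIDR range used in the example above.
subnet = ipaddress.ip_network("10.123.9.0/24")

print(subnet.prefixlen)          # 24
print(subnet.network_address)    # 10.123.9.0
print(subnet.broadcast_address)  # 10.123.9.255
print(subnet.num_addresses)      # 256

# 32 bits in an IPv4 address minus a 24-bit prefix leaves 8 host bits.
host_bits = subnet.max_prefixlen - subnet.prefixlen
print(2 ** host_bits)            # 256

# The three private address blocks reserved by RFC 1918.
rfc1918_blocks = [
    ipaddress.ip_network(block)
    for block in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]
in_rfc1918 = any(subnet.subnet_of(block) for block in rfc1918_blocks)
print(in_rfc1918)                # True
```

The first three octets stay fixed because they are covered by the 24-bit prefix, and the remaining 8 bits give the 256 addresses in the range.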
- Global VPCs, Regional Subnets
The defining characteristic of a VPC is the fact that all instances which lie on the VPC, whether they lie on a single subnet or on multiple subnets, can address other instances on the same VPC using internal IP addresses, provided that we’ve configured the routes and the firewall rules in the correct way. Every machine can address another one using its internal IP address if they are on the same VPC. The point about firewall rules is extremely important. On the Google Cloud Platform’s Virtual Private Cloud, you can’t send traffic from one instance to another unless there is an explicit firewall rule which allows you to do so. The whole point of a VPC is to provide isolation from resources which are on other networks.
The fact that a VPC is isolated means that if you have resources on other networks which want to communicate with resources on your network, they have to use external IP addresses. These are external IP addresses which are visible on the Internet. They are not private IP addresses. The way you can think about this is that within a network, the resources communicate with each other directly using internal IPs because the resources typically trust each other. If you’re in the same network, you’re probably part of the same team or part of the same organization. Resources in other networks are basically exactly that: resources which are external to you, which may be third-party resources.
You don’t fully trust them, which is why resources in other networks will be treated just like any other external resource. This is true even if the other network is part of the same project. We’ve mentioned this earlier, but let’s understand this thoroughly now. VPCs are global, which means that the resources which live within VPCs can span different regions in the world. The computers that you see on screen are part of the same VPC. They are in two different regions and three different zones. us-east1-a and us-east1-b are different zones in the same region. Instances from here can belong to the same VPC. These two instances can communicate with each other using internal IP addresses. us-east1 and europe-west1 are different regions.
They are different data centers located on completely separate continents. An instance in europe-west1 can communicate with an instance in us-east1-a or us-east1-b using an internal IP address, provided they’re on the same VPC. Because this is not a physical network but a virtual network, the actual location of the instance is immaterial. VPCs are global, but subnets, which make logical partitions of a virtual private network, are regional. They are regional in that you can have instances within a subnet which span zones, but they cannot span regions. Now, if you have a VPC with the three instances that you see on screen, all three instances cannot be in the same subnet, because subnets cannot span regions.
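The rule that a subnet may span zones but not regions can be written down as a small check. This is a sketch that relies on the GCP naming convention in which a zone name is the region name plus a letter suffix (for example, `us-east1-b` is in `us-east1`). The helper names are hypothetical, and the specific Europe zone `europe-west1-b` is an assumption, since the example only names the region.

```python
def region_of(zone: str) -> str:
    """Derive the region from a zone name, e.g. 'us-east1-b' -> 'us-east1'."""
    region, _, _suffix = zone.rpartition("-")
    return region


def can_share_subnet(zones) -> bool:
    """Instances can share one subnet only if all their zones are in one region."""
    return len({region_of(zone) for zone in zones}) == 1


# Two zones in the same region: one subnet is enough.
print(can_share_subnet(["us-east1-a", "us-east1-b"]))  # True

# Adding an instance in a European zone (assumed name) breaks the rule.
print(can_share_subnet(["us-east1-a", "us-east1-b", "europe-west1-b"]))  # False
```

All three instances could still be on the same VPC; the check only says that they need at least two subnets, one per region.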
The issue in this example is that our instance in europe-west1 cannot be on the same subnet as the instances in us-east1-a and us-east1-b. So you can have this VPC split into two subnets. The instances in us-east1-a and us-east1-b can be in one subnet because subnets can have resources from multiple zones. us-east1-a and us-east1-b are in the same region but in different zones, and subnets are multi-zonal; they are not multi-regional. You can also have all instances within a subnet be from a single zone, as in the case of europe-west1 in this example. This is an important point to remember as you set up your networks or answer questions on your certification exams. Networks are global: instances can be in different regions or zones and still belong to the same VPC.
Subnets are regional: instances can be in different zones, but they can’t be in different regions. Here is a block diagram of how a network might look when you set it up on the Google Cloud Platform. There’s a lot going on here. Let’s look at it carefully. Starting from the very top, you have some traffic that is coming in from a customer site. It goes through the Internet till it finally reaches your network. On the Google Cloud Platform, this entire box you see is the network, and there are various external IP addresses which are capable of receiving external traffic. If you look at the top of this box that is drawn on screen, you can see there is an Internet gateway or VPN gateway which has been configured to receive traffic from outside this network.
This network is made up of three different subnets. Subnet one is in zone A in region one, and it spans an IP range in the 10.x private space, shown on screen in CIDR notation. That is the CIDR range for this subnet. The remaining two subnets in this network, subnet two and subnet three, are in a different region, region two. Subnet two is within zone A of region two, but subnet three spans two zones within region two. So subnet three has instances in zone A as well as zone B. Each of these subnets also has its own IP address range associated with it. Virtual machine instances within a subnet have to have an IP address drawn from the IP address range associated with that subnet, and you can see that this is true in this example. The instance address shown in subnet one belongs to subnet one’s IP address range.
You can quickly take a look and see that this is true for subnet two as well as subnet three. The individual instances within each subnet have their IP addresses from the range that is associated with that subnet. Notice that the IP address ranges for all of these subnets are non-overlapping. If they are to belong to the same VPC, the subnets should have non-overlapping IP address ranges. Now, VPCs differ from traditional networks in a few different ways. First of all, they’re on the cloud, the machines can be located anywhere in the world, and so on. But in addition, a traditional network typically had a range of IP addresses assigned to it. So you’d say that the span of IP addresses for this network is, say, X.
Every subnet within this network comprised a smaller range within the larger network range. This was a necessary and required feature. Every subnet which belonged to a network had a subset of the IP addresses which were assigned to the larger network. This was in a traditional network. So you can think of the traditional network as a tree hierarchy. As you can see on screen, the box on top represents the IP ranges of the network as a whole. The IP ranges are split at every level, and the subnets have a subset of the IP ranges belonging to the larger network. Networks on the GCP, or the Virtual Private Clouds on the GCP, work a little differently. The entire network as a whole does not have an IP address range associated with it.
This is important. Every subnet within the GCP has its own IP range, but the network as a whole does not have a larger range associated with it. The subnet IP ranges do not have to fit into a larger IP range for the network. A subnet has a contiguous IP range associated with it, but the network as a whole can be made up of a motley collection of subnets which have their own IP ranges. Because of this characteristic, subnets on the GCP play a different role as compared to subnets on a physical network. Subnets on the GCP are logical groupings, so you can have the marketing team, the engineering team and the HR department each have their own subnet. They have their own IP ranges.
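The contrast between a traditional network, where every subnet is carved out of one parent range, and a GCP network, whose subnets need no common parent, can be illustrated with a short sketch. The helper `fits_under` and all the address ranges below are hypothetical examples, not ranges taken from the diagram.

```python
import ipaddress


def fits_under(parent_cidr, subnet_cidrs):
    """Return True if every subnet range lies inside the parent range."""
    parent = ipaddress.ip_network(parent_cidr)
    return all(
        ipaddress.ip_network(cidr).subnet_of(parent) for cidr in subnet_cidrs
    )


# Traditional style: subnets are slices of the network's own range.
traditional_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
print(fits_under("10.0.0.0/16", traditional_subnets))  # True

# GCP style: a motley collection of ranges with no single parent range.
gcp_style_subnets = ["10.240.0.0/24", "192.168.1.0/24", "172.16.0.0/20"]
print(fits_under("10.0.0.0/8", gcp_style_subnets))     # False
```

The second set would be a perfectly acceptable collection of subnets for one VPC, as long as the ranges do not overlap with each other.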
So these subnetworks serve to distribute or separate resources, to make logical partitions of the resources that are associated with the subnets. They don’t have to fit into a standard hierarchical, tree-like structure, as in the case of a physical network, for routing traffic, load balancing, et cetera. On the GCP, subnets are just logical groupings of resources. There are two basic types of VPC networks that can be set up on the Google Cloud Platform. The first is the auto mode. In auto mode, when you set up a network, it automatically sets up a single subnet in every region. You can manually create more subnets if you want to, but the auto mode network automatically creates these subnets so that when you spin up your virtual machine instances, which can be in any region, they will be appropriately associated with the subnet in that region.
But it’s possible that you want much more control while creating a network. You don’t want a subnet in every region, for example. In that case, you set up a custom mode network. Nothing is set up by default. No subnets are created. You have to manually configure all subnets within the custom mode network. Now, you might be wondering: I’ve just said that every VM instance that you set up has to be associated with a network, but you haven’t created any networks up till now. You’ve already set up some VMs, you’ve set up some clusters and so on. So how did that work? This is because when you spin up a GCP project, it automatically creates a default network for you. This default network is an auto mode network. It is set up by default. You don’t have to know anything about it.
Any VM instances that you create automatically become part of this default network. This default network, in addition to being an auto mode network and having a subnet in every region, also comes with a bunch of other configuration already set up, so you are up and running with it right away. You don’t have to do any additional configuration to get it to work. It comes with a number of routes set up and with firewall rules which make sense. We’ll see what they are in just a little bit. All of these come preconfigured in this default network. The creation of this default network is what gets you up and running on the GCP without having to think about networking at all. If you’re more interested in setting up databases, using BigQuery or setting up clusters, you don’t have to worry about what network they are in. They’re automatically part of this default network.
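The difference between auto mode and custom mode described above can be modeled in a few lines. This is only a toy sketch: the function `create_network`, the list of regions, the parent block `10.128.0.0/9` and the /20 subnet size are assumptions made for illustration, and the resulting region-to-range mapping is not the GCP’s actual allocation table.

```python
import ipaddress


def create_network(mode, regions, parent_block="10.128.0.0/9", prefix=20):
    """Return a {region: CIDR} mapping of the subnets created up front.

    'auto'   -> one subnet per region, carved from an assumed parent block.
    'custom' -> no subnets at all; they must be added manually later.
    """
    if mode == "custom":
        return {}
    if mode != "auto":
        raise ValueError(f"unknown subnet mode: {mode!r}")
    blocks = ipaddress.ip_network(parent_block).subnets(new_prefix=prefix)
    # zip stops at the shorter input, so each region gets one distinct block.
    return {region: str(block) for region, block in zip(regions, blocks)}


# Hypothetical region list, for illustration only.
regions = ["us-east1", "europe-west1", "asia-east1"]

auto_net = create_network("auto", regions)
custom_net = create_network("custom", regions)

print(auto_net)    # one subnet per region
print(custom_net)  # {}

# In custom mode, subnets are added by hand, only where they are wanted.
custom_net["europe-west1"] = "192.168.10.0/24"
print(custom_net)  # {'europe-west1': '192.168.10.0/24'}
```

The default network behaves like the auto mode case: the subnets are already there in every region, so a new VM always has a subnet to land in.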