AZ-303 Microsoft Azure Architect Technologies – Implement and Manage Data Platforms (10-15%)
- Azure CosmosDb Introduction
Microsoft Azure Cosmos DB is a NoSQL database service native to Azure that focuses on providing a high-performance database. The first question you might ask yourself is: what is NoSQL? NoSQL is an alternative to traditional SQL for storing and managing data. SQL databases would normally use tables to store data in rows and columns. SQL systems also generally tend to split related data between tables in order to be more storage efficient.
So for example, in an order record, you might store the order details in one table, the customer details might be stored in another, and the order line details in yet another. The tables would then be joined or linked together using the appropriate keys on each table. However, as compute and storage became faster and cheaper, we could start storing this related data together rather than split across tables.
And this is where NoSQL databases have become useful. Depending on the actual NoSQL implementation, data can generally be stored as documents, whereby related data is nested within a document. Azure Cosmos DB actually offers multiple APIs and different data models that can be used interchangeably for various application scenarios.
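To make the order example concrete, here is a minimal sketch in plain Python (no database involved) of the same data held first as three linked "tables" and then as one nested document. The table contents, field names, and the `build_order_document` helper are all made up for illustration.

```python
import json

# Relational layout: three "tables" linked by keys (illustrative data only).
customers = {101: {"name": "Ada Example", "city": "London"}}
orders = {5001: {"customer_id": 101, "order_date": "2021-03-01"}}
order_lines = [
    {"order_id": 5001, "product": "Keyboard", "quantity": 1},
    {"order_id": 5001, "product": "Mouse", "quantity": 2},
]


def build_order_document(order_id):
    """Join the three tables into a single nested document."""
    order = orders[order_id]
    return {
        "id": str(order_id),
        "orderDate": order["order_date"],
        # The customer details are nested inside the order.
        "customer": dict(customers[order["customer_id"]]),
        # The order lines are nested as a list inside the order.
        "lines": [
            {"product": line["product"], "quantity": line["quantity"]}
            for line in order_lines
            if line["order_id"] == order_id
        ],
    }


document = build_order_document(5001)
print(json.dumps(document, indent=2))
```

Reading the order back from the document form needs no joins, because everything related to the order travels with it.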
Azure Cosmos DB has a feature that is referred to as turnkey global distribution. This means it automatically replicates data to other Azure data centers across the globe without the need to manually write code or build the infrastructure yourself. However, distributed databases that rely on replication for high availability, low latency, or both make a fundamental trade-off between read consistency on the one hand and availability, latency, and throughput on the other. Most commercially available distributed databases ask developers to choose between the two extreme consistency models: strong consistency or eventual consistency.
The strong consistency model is the gold standard of data programmability, but it comes at the price of higher latency and reduced availability during failures. On the other hand, eventual consistency offers higher availability and better performance, but makes it harder to program applications. Azure Cosmos DB approaches data consistency as a spectrum of choices instead of two extremes. Strong consistency and eventual consistency are at opposite ends of the spectrum, but there are many consistency choices in between.
Developers can use these options to make precise choices and granular trade-offs with respect to high availability versus performance. With Azure Cosmos DB, developers can choose from five well-defined consistency models on this spectrum. From strongest to most relaxed, the models are strong, bounded staleness, session, consistent prefix, and eventual consistency. The models are well defined and intuitive and can be used for specific real-world scenarios. Each model provides availability and performance trade-offs and is backed by the appropriate SLAs. Azure Cosmos DB can be accessed using five different APIs.
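The two ends of that spectrum can be illustrated with a toy model. This is only a sketch under simplifying assumptions: there is one write region and one lagging replica, and a "strong" read is modelled as always seeing the latest write, while an "eventual" read sees whatever the replica has received so far. The `ToyReplicatedStore` class is hypothetical and has nothing to do with how Cosmos DB is implemented internally.

```python
class ToyReplicatedStore:
    """Toy model: one write region plus one replica that lags behind."""

    def __init__(self):
        self.primary = []  # ordered list of writes in the write region
        self.replica = []  # copy that only catches up when replicate() runs

    def write(self, value):
        self.primary.append(value)

    def replicate(self):
        """Copy all pending writes to the replica, in their original order."""
        self.replica.extend(self.primary[len(self.replica):])

    def read_strong(self):
        """Always returns the most recent write."""
        return self.primary[-1] if self.primary else None

    def read_eventual(self):
        """Returns whatever the replica currently has, which may be stale."""
        return self.replica[-1] if self.replica else None

    def staleness(self):
        """Number of writes the replica has not yet received."""
        return len(self.primary) - len(self.replica)


store = ToyReplicatedStore()
store.write("v1")
store.replicate()
store.write("v2")  # not yet replicated

print(store.read_strong())    # latest write
print(store.read_eventual())  # stale until the next replicate()
store.replicate()
print(store.read_eventual())  # the replica has now converged
```

The stale read in the middle is the cost of eventual consistency; the benefit, which this toy does not model, is that the replica can answer without waiting on the write region.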
The underlying data structure in Azure Cosmos DB is a data model based on atom-record-sequence (ARS), which enables Cosmos DB to support multiple data models. Because of the flexible nature of atom-record-sequence, Azure Cosmos DB will be able to support many more models and APIs over time. This in turn means that if you are migrating away from an existing database technology, you can more quickly move to Cosmos DB by selecting an API that matches your skill set or previous implementation. The current implementations are the MongoDB API, Table API, Gremlin API, Apache Cassandra API, and SQL API. The MongoDB API acts as a massively scalable MongoDB service and is compatible with existing MongoDB libraries, drivers, and tools. The Table API is a key-value database service built to provide premium capabilities, for example automatic indexing, guaranteed low latency, and global distribution, and it works with existing Table storage applications without making any changes.
The Gremlin API is a fully managed, horizontally scalable graph database service that makes it easy to build and run applications that work with highly connected data sets, supporting the open graph APIs. The Cassandra API is a globally distributed Apache Cassandra service powered by Cosmos DB, and it's compatible with existing Apache Cassandra libraries, drivers, and tools. And then we have the SQL API. This is a JavaScript and JavaScript Object Notation, or JSON, native API based on the Cosmos DB engine. The SQL API also provides query capabilities rooted in the familiar SQL query language.
In other words, you can use SQL to query documents based on their identifiers or make deeper queries based on properties of the document. After you've created an Azure Cosmos DB account under your subscription, you can manage data in your account by creating databases, containers, and items. An Azure Cosmos container is the unit of scalability both for provisioned throughput and storage. A container is horizontally partitioned and then replicated across multiple regions. The items that you add to the container and the throughput that you provision on it are automatically distributed across a set of logical partitions based on a partition key.
When you create a container, you configure throughput in one of the following modes: either dedicated provisioned throughput, whereby the throughput is provisioned on a container and is exclusively reserved for that container, or shared provisioned throughput. In this case, containers share the provisioned throughput with other containers. You can configure shared and dedicated throughput only when creating the database and container. To switch from dedicated throughput mode to shared throughput, or vice versa, after the container is created, you have to create a new container and migrate the data to it. So, depending on which API you use, Cosmos items can be represented either as a document in a collection, a row in a table, or a node or edge in a graph. Azure Cosmos items support insert, replace, delete, upsert, and read operations, and you can use any of the Cosmos APIs to perform those operations. So it's important to understand that Cosmos DB uses partitioning to scale individual containers in a database to meet the performance needs of your application.
In partitioning, the items in a container are divided into distinct subsets called logical partitions. Logical partitions are formed based on the value of a partition key that's associated with each item in a container. All items in a logical partition have the same partition key value. So, for example, if a container holds items, each item might have a unique value for a user ID property. If the user ID serves as the partition key for the items in the container and there are 1,000 unique user ID values, then 1,000 logical partitions would be created for the container. Therefore, in this case, using the user ID would not make sense. Instead, we need to find a better field for grouping. An example in a user database may be the use of city data.
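As a rough sketch of that grouping, the plain-Python snippet below counts how many logical partitions a handful of made-up user items would fall into for two candidate partition keys. The sample items, the property names `userId` and `city`, and the `logical_partitions` helper are assumptions for illustration only.

```python
from collections import defaultdict

# Illustrative user items; every userId is unique, cities repeat.
users = [
    {"id": "1", "userId": "u1", "city": "London"},
    {"id": "2", "userId": "u2", "city": "London"},
    {"id": "3", "userId": "u3", "city": "Paris"},
    {"id": "4", "userId": "u4", "city": "Paris"},
    {"id": "5", "userId": "u5", "city": "London"},
]


def logical_partitions(items, partition_key):
    """Group item ids by the value of the chosen partition key."""
    groups = defaultdict(list)
    for item in items:
        groups[item[partition_key]].append(item["id"])
    return dict(groups)


by_user = logical_partitions(users, "userId")
by_city = logical_partitions(users, "city")

print(len(by_user), "logical partitions when partitioning by userId")
print(len(by_city), "logical partitions when partitioning by city")
print(by_city)
```

With a unique key there is one logical partition per item, whereas the city key groups users who live in the same place into the same logical partition.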
In this way, partitioning groups users based on the city that they live in. In addition to a partition key that determines an item's logical partition, each item will also have an item ID. Combining the partition key and the item ID creates the item's index, which uniquely identifies the item. Azure Cosmos DB transparently and automatically manages the placement of logical partitions on physical partitions to efficiently satisfy the scalability and performance needs of your container. As the throughput and storage requirements of an application increase, Cosmos DB moves logical partitions to automatically spread the load across a greater number of servers.
With Cosmos DB, you pay for the throughput you provision and the storage you consume on an hourly basis. Throughput must be provisioned to ensure that sufficient system resources are available for your Cosmos DB database at all times. Cosmos DB supports many APIs, such as SQL, MongoDB, and so on, and each API has its own set of data operations. These operations range from simple point reads and writes to complex queries. Each database operation consumes system resources based on the complexity of the operation. The cost of all database operations is therefore normalized and is expressed as a request unit, or RU for short. You can think of RUs per second as the currency for throughput.
RUs per second is a rate-based currency. It abstracts the system resources such as CPU, IOPS, and memory that are required to perform the database operations supported by Cosmos DB. The cost to read a 1 KB item is 1 request unit. A minimum of 10 RUs per second is required to store each 1 GB of data. All other database operations are similarly assigned a cost using RUs. No matter which API you use to interact with your Cosmos container, costs are always measured in RUs. Whether the database operation is a read, a write, or a query, costs are always measured the same way. With the theory out of the way, let's go ahead and spin up a Cosmos DB account and see how we can interact with it.
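Going back to the placement of logical partitions on physical partitions, the idea can be sketched in a few lines of plain Python. This is an assumption-heavy illustration: the real service manages placement itself, and the SHA-256 hash with a simple modulo used here is only a stand-in for whatever scheme it uses internally. The helper names are hypothetical.

```python
import hashlib


def item_identity(item, partition_key="city"):
    """The partition key value plus the item id identifies one item."""
    return (item[partition_key], item["id"])


def physical_partition_for(partition_key_value, physical_partition_count):
    """Map a logical partition to a physical partition with a stable hash.

    Stand-in scheme for illustration; not the service's actual algorithm.
    """
    digest = hashlib.sha256(str(partition_key_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % physical_partition_count


def place(partition_key_values, physical_partition_count):
    """Return {physical partition number: [logical partition key values]}."""
    placement = {n: [] for n in range(physical_partition_count)}
    for value in sorted(set(partition_key_values)):
        target = physical_partition_for(value, physical_partition_count)
        placement[target].append(value)
    return placement


cities = ["London", "Paris", "Berlin", "Madrid", "Rome", "Oslo"]

print(item_identity({"id": "42", "city": "Paris"}))
print(place(cities, 2))  # a small container on two physical partitions
print(place(cities, 4))  # the same logical partitions spread over four
```

Every item with the same partition key value always lands on the same physical partition, and adding physical partitions spreads the logical partitions over more servers without the application changing anything.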
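The two figures just quoted (1 RU to read a 1 KB item, and a minimum of 10 RUs per second per 1 GB stored) are enough for a back-of-the-envelope estimate. The sketch below only uses those two numbers; the function names are made up, and it deliberately ignores writes and queries, whose RU costs were not given here.

```python
import math

READ_RU_PER_1KB_ITEM = 1    # cost of one point read of a 1 KB item
MIN_RU_PER_SEC_PER_GB = 10  # minimum provisioned RU/s per 1 GB stored


def min_throughput_for_storage(storage_gb):
    """Minimum RU/s implied by the amount of data stored."""
    return math.ceil(storage_gb) * MIN_RU_PER_SEC_PER_GB


def point_read_throughput(reads_per_second):
    """RU/s consumed by point reads of 1 KB items."""
    return reads_per_second * READ_RU_PER_1KB_ITEM


def required_throughput(storage_gb, reads_per_second):
    """Provision enough for whichever requirement is larger."""
    return max(
        min_throughput_for_storage(storage_gb),
        point_read_throughput(reads_per_second),
    )


# 25 GB of data with 500 point reads per second: the reads dominate.
print(required_throughput(storage_gb=25, reads_per_second=500))
# 100 GB of data with 200 point reads per second: the storage floor dominates.
print(required_throughput(storage_gb=100, reads_per_second=200))
```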
- Azure CosmosDb Walkthrough
Let's have a walkthrough of creating and configuring a Cosmos DB account. As usual, let's go to Create a resource. Cosmos DB is generally in the popular quick starts, or if it's not there, simply do a search for Cosmos and choose Azure Cosmos DB. Go through the usual steps. We'll create a new resource group and we'll give it a name. We now need to tell it which API we want to use. It defaults to Core, which is SQL, but as discussed, we can choose different APIs, for example the MongoDB API, Cassandra, Azure Table, and so on. We could also enable Notebooks, but we'll leave that off for now. Secondly, you can apply a free tier discount. Obviously we'll go for that for now. As it says here, we get a very small database for free. You can only have one free tier account per subscription, so make sure that's applied, and then go and select the location. We now have the options of configuring things like geo-redundancy, multi-region writes, and availability zones. I'm going to leave them all disabled for now and we'll come back to those shortly.
As with some of the other PaaS services, we can configure the networking to either be a public endpoint, a private endpoint, or to allow all networks. Again, for now we'll just allow all networks. I'm going to ignore the tags and then click Create once it's validated. Once that's completed, we can go to the resource and have a look through the configuration. By default, it will take us to this Quick start page where you can go and start creating things. I just want to go to the Overview page first to have a look at the basic configuration we've set up. We've got the usual information and then we've got some monitoring. And we've also got this link here to go to Data Explorer.
We're going to go there first because we're going to actually create some databases and containers within here. So at the moment we've got our SQL API, but there's nothing in it. The first thing we want to do is create a container. Let's go ahead and say New Container, and I'm going to call the database ToDoList. I'm going to take these options as the default, so provision database throughput with 400 RUs, and then we're going to set a container ID, which we will call Items. I'm going to leave Indexing as Automatic, and finally we're going to set a partition key. This is going to be a to-do item list that will contain a category, which we'll use for partitioning, and we'll click OK. With our ToDoList there, we can have a look: go to Items and then Items again, and this will bring up an explorer so we can see what data we've got in here.
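The choices made in that dialog can be written down as plain data. The sketch below is not an SDK call; it is a hypothetical `ContainerSettings` record holding the values from this walkthrough, plus two sanity checks. The checks rest on assumptions not stated in the lecture: that a partition key is written as a path starting with `/`, and that manually provisioned throughput starts at 400 RU/s and goes up in steps of 100.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerSettings:
    """Hypothetical record of the New Container dialog choices."""
    database_id: str
    container_id: str
    partition_key_path: str
    throughput_rus: int
    indexing: str = "Automatic"


def validation_problems(settings):
    """Return a list of problems; an empty list means the settings look fine."""
    problems = []
    # Assumption: partition keys are expressed as a path such as "/category".
    if not settings.partition_key_path.startswith("/"):
        problems.append("partition key path should start with '/'")
    # Assumption: manual throughput is at least 400 RU/s, in steps of 100.
    if settings.throughput_rus < 400 or settings.throughput_rus % 100 != 0:
        problems.append("throughput should be at least 400 RU/s in steps of 100")
    return problems


todo = ContainerSettings(
    database_id="ToDoList",
    container_id="Items",
    partition_key_path="/category",
    throughput_rus=400,
)

print(todo)
print(validation_problems(todo))
```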
And at the moment we don't actually have anything. Let's go and create some items up here. We'll go to New Item and I'm just going to copy some JSON into here. This is basically going to create a record. So it's got an ID of one. We're going to set the category to personal, then a name and a description, and then isComplete. So once we've got that pasted in, I'm going to click Save. This has created the record here. We'll also see that it's returned some data here. So as well as the items that we've put in, there are some built-in fields that Cosmos DB uses in the background for its inner workings. You don't need to worry about those, but it is important to understand that they're there.
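As an illustration of what gets pasted in and what comes back, here is a small plain-Python sketch. The `name` and `description` values are invented, since the lecture does not show them. The underscore-prefixed names are the system properties Cosmos DB adds to stored items in the SQL API; their values are generated by the service, so placeholders are used here.

```python
import json

# The JSON pasted into New Item (name and description are made-up values).
new_item_json = """
{
  "id": "1",
  "category": "personal",
  "name": "groceries",
  "description": "Pick up apples and strawberries.",
  "isComplete": false
}
"""

# System properties that appear on the saved item alongside our own fields.
SYSTEM_PROPERTIES = ("_rid", "_self", "_etag", "_attachments", "_ts")


def as_stored(item):
    """Mimic the saved item: our fields plus placeholder system properties."""
    stored = dict(item)
    for name in SYSTEM_PROPERTIES:
        stored[name] = "<generated by Cosmos DB>"
    return stored


def user_fields(stored_item):
    """Strip the underscore-prefixed system properties again."""
    return {k: v for k, v in stored_item.items() if not k.startswith("_")}


item = json.loads(new_item_json)
stored = as_stored(item)

print(json.dumps(stored, indent=2))
print(user_fields(stored) == item)
```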
Let's go ahead and create another item now and save that, and again it creates the record. So in this explorer, we can now see we've got the two records, one for a personal category and one for a work category. We can now start querying this like a normal database. If we go up to this top menu bar here, we'll see New SQL Query, and we can start issuing T-SQL type commands. So SELECT * FROM c, and this returns all our records. And we can start using simple query syntax. For example, we could say ORDER BY c.name and execute that, and it has reversed the order because it's now ordering by name. Similarly, we can put filters in. For example, we can say WHERE c.category = 'work', and that will just return our item in the work category. So that's great.
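What those three queries do can be mimicked in plain Python on two in-memory items. This is only an analogy to show the semantics of the SELECT, ORDER BY, and WHERE clauses; it does not talk to Cosmos DB, and the item names are invented so that ordering by name reverses the original order.

```python
# Two illustrative items, in the order they were created.
items = [
    {"id": "1", "category": "personal", "name": "groceries"},
    {"id": "2", "category": "work", "name": "expenses"},
]


def select_all(container):
    """SELECT * FROM c"""
    return list(container)


def order_by_name(container):
    """SELECT * FROM c ORDER BY c.name"""
    return sorted(container, key=lambda c: c["name"])


def where_category(container, category):
    """SELECT * FROM c WHERE c.category = <category>"""
    return [c for c in container if c["category"] == category]


print([c["id"] for c in select_all(items)])
print([c["id"] for c in order_by_name(items)])
print([c["id"] for c in where_category(items, "work")])
```

In each query, `c` is just an alias for the container being queried, which is why the lambda and the comprehension use the same name.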
So with some basic data in there, let's have a look at some of the options that we have. The first we'll look at is replication. When we configured this, we told it not to perform any kind of replication, either globally or otherwise. In the settings, if we go to the Replicate data globally option, it's in here where we can configure multi-region writes. So we have two options: multi-region writes or availability zones. If we take availability zones, it would then simply replicate our database across availability zones within that region. The alternative is multi-region writes. If we say enable multi-region writes, it then gives us this option to add a region.
We can go ahead and select a region, click OK, and then click Save. So this is now going to go and create a Cosmos DB replica in the region that we've selected and start to replicate the data. That'll take a few minutes. Once that's completed, we can see we've now got replication enabled across both these regions. And in fact, you can create additional regions. You can click here, or indeed you can click on these pluses within the map. That gives you a really great visual idea of where your databases are going to get replicated. By doing this you get not only resilience, but you also get to put the data closest to your customers. Before we leave this, let's just have a look at some of the other default options.
So one of them is default consistency. When we talked about the different consistency levels you can have, this is where you can change them by simply setting the option that you require. We can also configure the firewalls and virtual networks here, again if you wanted to lock it down to set IPs or to internal ranges. And we now have a relatively new offering, private endpoint connections, which is a way of assigning an internal IP address to these Cosmos DB accounts so you can access them from on premises. Before we leave this lecture, what I'm actually going to do is delete this account because we won't be using it again. Deleting the account will delete not only this account but all its replicas as well. So go ahead and click Delete, and once that's finished it's not going to be costing any more money.