DP-203: Data Engineering on Microsoft Azure Certification Video Training Course
The complete solution to prepare for your exam with the DP-203: Data Engineering on Microsoft Azure certification video training course. The DP-203: Data Engineering on Microsoft Azure certification video training course contains a complete set of videos that will provide you with thorough knowledge of the key concepts. Top-notch prep including Microsoft Azure DP-203 exam dumps, study guide & practice test questions and answers.
DP-203: Data Engineering on Microsoft Azure Certification Video Training Course Exam Curriculum
Introduction
-
1. IMPORTANT - How we are going to approach the exam objectives3:00
-
2. OPTIONAL - Overview of Azure2:00
-
3. OPTIONAL - Concepts in Azure4:00
-
4. Azure Free Account5:00
-
5. Creating an Azure Free Account5:00
-
6. OPTIONAL - Quick tour of the Azure Portal6:00
Design and implement data storage - Basics
-
1. Section Introduction2:00
-
2. Understanding data4:00
-
3. Example of data storage2:00
-
4. Lab - Azure Storage accounts6:00
-
5. Lab - Azure SQL databases15:00
-
6. A quick note when it comes to the Azure Free Account4:00
-
7. Lab - Application connecting to Azure Storage and SQL database11:00
-
8. Different file formats7:00
-
9. Azure Data Lake Gen-2 storage accounts3:00
-
10. Lab - Creating an Azure Data Lake Gen-2 storage account9:00
-
11. Using PowerBI to view your data7:00
-
12. Lab - Authorizing to Azure Data Lake Gen 2 - Access Keys - Storage Explorer6:00
-
13. Lab - Authorizing to Azure Data Lake Gen 2 - Shared Access Signatures8:00
-
14. Azure Storage Account - Redundancy11:00
-
15. Azure Storage Account - Access tiers9:00
-
16. Azure Storage Account - Lifecycle policy3:00
-
17. Note on Costing5:00
Design and implement data storage - Overview on Transact-SQL
-
1. Section Introduction2:00
-
2. The internals of a database engine4:00
-
3. Lab - Setting up a new Azure SQL database3:00
-
4. Lab - T-SQL - SELECT clause3:00
-
5. Lab - T-SQL - WHERE clause3:00
-
6. Lab - T-SQL - ORDER BY clause1:00
-
7. Lab - T-SQL - Aggregate Functions1:00
-
8. Lab - T-SQL - GROUP BY clause4:00
-
9. Lab - T-SQL - HAVING clause1:00
-
10. Quick Review on Primary and Foreign Keys4:00
-
11. Lab - T-SQL - Creating Tables with Keys3:00
-
12. Lab - T-SQL - Table Joins5:00
Design and implement data storage - Azure Synapse Analytics
-
1. Section Introduction2:00
-
2. Why do we need a data warehouse10:00
-
3. Welcome to Azure Synapse Analytics2:00
-
4. Lab - Let's create an Azure Synapse workspace3:00
-
5. Azure Synapse - Compute options3:00
-
6. Using External tables4:00
-
7. Lab - Using External tables - Part 19:00
-
8. Lab - Using External tables - Part 212:00
-
9. Lab - Creating a SQL pool7:00
-
10. Lab - SQL Pool - External Tables - CSV9:00
-
11. Data Cleansing4:00
-
12. Lab - SQL Pool - External Tables - CSV with formatted data3:00
-
13. Lab - SQL Pool - External Tables - Parquet - Part 14:00
-
14. Lab - SQL Pool - External Tables - Parquet - Part 27:00
-
15. Loading data into the Dedicated SQL Pool2:00
-
16. Lab - Loading data into a table - COPY Command - CSV11:00
-
17. Lab - Loading data into a table - COPY Command - Parquet3:00
-
18. Pausing the Dedicated SQL pool3:00
-
19. Lab - Loading data using PolyBase5:00
-
20. Lab - BULK INSERT from Azure Synapse6:00
-
21. My own experience6:00
-
22. Designing a data warehouse11:00
-
23. More on dimension tables5:00
-
24. Lab - Building a data warehouse - Setting up the database6:00
-
25. Lab - Building a Fact Table8:00
-
26. Lab - Building a dimension table6:00
-
27. Lab - Transfer data to our SQL Pool15:00
-
28. Other points in the copy activity2:00
-
29. Lab - Using Power BI for Star Schema6:00
-
30. Understanding Azure Synapse Architecture7:00
-
31. Understanding table types7:00
-
32. Understanding Round-Robin tables5:00
-
33. Lab - Creating Hash-distributed Tables5:00
-
34. Note on creating replicated tables1:00
-
35. Designing your tables4:00
-
36. Designing tables - Review4:00
-
37. Lab - Example when using the right distributions for your tables10:00
-
38. Points on tables in Azure Synapse2:00
-
39. Lab - Windowing Functions4:00
-
40. Lab - Reading JSON files5:00
-
41. Lab - Surrogate keys for dimension tables6:00
-
42. Slowly Changing dimensions4:00
-
43. Type 3 Slowly Changing Dimension2:00
-
44. Creating a heap table3:00
-
45. Snowflake schema1:00
-
46. Lab - CASE statement6:00
-
47. Partitions in Azure Synapse2:00
-
48. Lab - Creating a table with partitions11:00
-
49. Lab - Switching partitions7:00
-
50. Indexes6:00
-
51. Quick Note - Modern Data Warehouse Architecture2:00
-
52. Quick Note on what we are taking forward to the next sections2:00
-
53. What about the Spark Pool2:00
Design and Develop Data Processing - Azure Data Factory
-
1. Section Introduction1:00
-
2. Extract, Transform and Load2:00
-
3. What is Azure Data Factory5:00
-
4. Starting with Azure Data Factory2:00
-
5. Lab - Azure Data Lake to Azure Synapse - Log.csv file13:00
-
6. Lab - Azure Data Lake to Azure Synapse - Parquet files13:00
-
7. Lab - The case with escape characters8:00
-
8. Review on what has been done so far6:00
-
9. Lab - Generating a Parquet file5:00
-
10. Lab - What about using a query for data transfer6:00
-
11. Deleting artefacts in Azure Data Factory3:00
-
12. Mapping Data Flow5:00
-
13. Lab - Mapping Data Flow - Fact Table14:00
-
14. Lab - Mapping Data Flow - Dimension Table - DimCustomer15:00
-
15. Lab - Mapping Data Flow - Dimension Table - DimProduct10:00
-
16. Lab - Surrogate Keys - Dimension tables4:00
-
17. Lab - Using Cache sink9:00
-
18. Lab - Handling Duplicate rows8:00
-
19. Note - What happens if we don't have any data in our DimProduct table4:00
-
20. Changing connection details1:00
-
21. Lab - Changing the Time column data in our Log.csv file8:00
-
22. Lab - Convert Parquet to JSON5:00
-
23. Lab - Loading JSON into SQL Pool5:00
-
24. Self-Hosted Integration Runtime3:00
-
25. Lab - Self-Hosted Runtime - Setting up nginx9:00
-
26. Lab - Self-Hosted Runtime - Setting up the runtime7:00
-
27. Lab - Self-Hosted Runtime - Copy Activity7:00
-
28. Lab - Self-Hosted Runtime - Mapping Data Flow16:00
-
29. Lab - Processing JSON Arrays8:00
-
30. Lab - Processing JSON Objects6:00
-
31. Lab - Conditional Split6:00
-
32. Lab - Schema Drift12:00
-
33. Lab - Metadata activity14:00
-
34. Lab - Azure DevOps - Git configuration11:00
-
35. Lab - Azure DevOps - Release configuration11:00
-
36. What resources are we taking forward1:00
Design and Develop Data Processing - Azure Event Hubs and Stream Analytics
-
1. Batch and Real-Time Processing5:00
-
2. What are Azure Event Hubs5:00
-
3. Lab - Creating an instance of Event hub7:00
-
4. Lab - Sending and Receiving Events10:00
-
5. What is Azure Stream Analytics2:00
-
6. Lab - Creating a Stream Analytics job4:00
-
7. Lab - Azure Stream Analytics - Defining the job10:00
-
8. Review on what we have seen so far8:00
-
9. Lab - Reading database diagnostic data - Setup4:00
-
10. Lab - Reading data from a JSON file - Setup6:00
-
11. Lab - Reading data from a JSON file - Implementation5:00
-
12. Lab - Reading data from the Event Hub - Setup7:00
-
13. Lab - Reading data from the Event Hub - Implementation8:00
-
14. Lab - Timing windows10:00
-
15. Lab - Adding multiple outputs4:00
-
16. Lab - Reference data5:00
-
17. Lab - OVER clause8:00
-
18. Lab - Power BI Output10:00
-
19. Lab - Reading Network Security Group Logs - Server Setup3:00
-
20. Lab - Reading Network Security Group Logs - Enabling NSG Flow Logs8:00
-
21. Lab - Reading Network Security Group Logs - Processing the data13:00
-
22. Lab - User Defined Functions9:00
-
23. Custom Serialization Formats3:00
-
24. Lab - Azure Event Hubs - Capture Feature7:00
-
25. Lab - Azure Data Factory - Incremental Data Copy11:00
-
26. Demo on Azure IoT Devkit5:00
-
27. What resources are we taking forward1:00
Design and Develop Data Processing - Scala, Notebooks and Spark
-
1. Section Introduction2:00
-
2. Introduction to Scala2:00
-
3. Installing Scala6:00
-
4. Scala - Playing with values3:00
-
5. Scala - Installing IntelliJ IDE5:00
-
6. Scala - If construct3:00
-
7. Scala - for construct1:00
-
8. Scala - while construct1:00
-
9. Scala - case construct1:00
-
10. Scala - Functions2:00
-
11. Scala - List collection4:00
-
12. Starting with Python3:00
-
13. Python - A simple program2:00
-
14. Python - If construct1:00
-
15. Python - while construct1:00
-
16. Python - List collection2:00
-
17. Python - Functions2:00
-
18. Quick look at Jupyter Notebook4:00
-
19. Lab - Azure Synapse - Creating a Spark pool8:00
-
20. Lab - Spark Pool - Starting out with Notebooks9:00
-
21. Lab - Spark Pool - Spark DataFrames4:00
-
22. Lab - Spark Pool - Sorting data6:00
-
23. Lab - Spark Pool - Load data8:00
-
24. Lab - Spark Pool - Removing NULL values8:00
-
25. Lab - Spark Pool - Using SQL statements3:00
-
26. Lab - Spark Pool - Write data to Azure Synapse11:00
-
27. Spark Pool - Combined Power2:00
-
28. Lab - Spark Pool - Sharing tables4:00
-
29. Lab - Spark Pool - Creating tables5:00
-
30. Lab - Spark Pool - JSON files6:00
Design and Develop Data Processing - Azure Databricks
-
1. What is Azure Databricks4:00
-
2. Clusters in Azure Databricks6:00
-
3. Lab - Creating a workspace3:00
-
4. Lab - Creating a cluster14:00
-
5. Lab - Simple notebook3:00
-
6. Lab - Using DataFrames4:00
-
7. Lab - Reading a CSV file4:00
-
8. Databricks File System2:00
-
9. Lab - The SQL Data Frame3:00
-
10. Visualizations1:00
-
11. Lab - Few functions on dates2:00
-
12. Lab - Filtering on NULL values2:00
-
13. Lab - Parquet-based files2:00
-
14. Lab - JSON-based files3:00
-
15. Lab - Structured Streaming - Let's first understand our data3:00
-
16. Lab - Structured Streaming - Streaming from Azure Event Hubs - Initial steps8:00
-
17. Lab - Structured Streaming - Streaming from Azure Event Hubs - Implementation10:00
-
18. Lab - Getting data from Azure Data Lake - Setup7:00
-
19. Lab - Getting data from Azure Data Lake - Implementation5:00
-
20. Lab - Writing data to Azure Synapse SQL Dedicated Pool5:00
-
21. Lab - Stream and write to Azure Synapse SQL Dedicated Pool5:00
-
22. Lab - Azure Data Lake Storage Credential Passthrough10:00
-
23. Lab - Running an automated job6:00
-
24. Autoscaling a cluster2:00
-
25. Lab - Removing duplicate rows3:00
-
26. Lab - Using the PIVOT command4:00
-
27. Lab - Azure Databricks Table5:00
-
28. Lab - Azure Data Factory - Running a notebook6:00
-
29. Delta Lake Introduction2:00
-
30. Lab - Creating a Delta Table5:00
-
31. Lab - Streaming data into the table3:00
-
32. Lab - Time Travel2:00
-
33. Quick note on deciding between Azure Synapse and Azure Databricks2:00
-
34. What resources are we taking forward1:00
Design and Implement Data Security
-
1. Section Introduction1:00
-
2. What is the Azure Key Vault service5:00
-
3. Azure Data Factory - Encryption5:00
-
4. Azure Synapse - Customer Managed Keys3:00
-
5. Azure Dedicated SQL Pool - Transparent Data Encryption2:00
-
6. Lab - Azure Synapse - Data Masking10:00
-
7. Lab - Azure Synapse - Auditing6:00
-
8. Azure Synapse - Data Discovery and Classification4:00
-
9. Azure Synapse - Azure AD Authentication3:00
-
10. Lab - Azure Synapse - Azure AD Authentication - Setting the admin4:00
-
11. Lab - Azure Synapse - Azure AD Authentication - Creating a user8:00
-
12. Lab - Azure Synapse - Row-Level Security7:00
-
13. Lab - Azure Synapse - Column-Level Security4:00
-
14. Lab - Azure Data Lake - Role Based Access Control7:00
-
15. Lab - Azure Data Lake - Access Control Lists7:00
-
16. Lab - Azure Synapse - External Tables Authorization via Managed Identity8:00
-
17. Lab - Azure Synapse - External Tables Authorization via Azure AD Authentication5:00
-
18. Lab - Azure Synapse - Firewall7:00
-
19. Lab - Azure Data Lake - Virtual Network Service Endpoint7:00
-
20. Lab - Azure Data Lake - Managed Identity - Data Factory6:00
Monitor and optimize data storage and data processing
-
1. Best practices for structuring files in your data lake3:00
-
2. Azure Storage accounts - Query acceleration2:00
-
3. View on Azure Monitor7:00
-
4. Azure Monitor - Alerts8:00
-
5. Azure Synapse - System Views2:00
-
6. Azure Synapse - Result set caching6:00
-
7. Azure Synapse - Workload Management4:00
-
8. Azure Synapse - Retention points2:00
-
9. Lab - Azure Data Factory - Monitoring7:00
-
10. Azure Data Factory - Monitoring - Alerts and Metrics4:00
-
11. Lab - Azure Data Factory - Annotations3:00
-
12. Azure Data Factory - Integration Runtime - Note7:00
-
13. Azure Data Factory - Pipeline Failures3:00
-
14. Azure Key Vault - High Availability2:00
-
15. Azure Stream Analytics - Metrics3:00
-
16. Azure Stream Analytics - Streaming Units2:00
-
17. Azure Stream Analytics - An example on monitoring the stream analytics job11:00
-
18. Azure Stream Analytics - The importance of time7:00
-
19. Azure Stream Analytics - More on the time aspect6:00
-
20. Azure Event Hubs and Stream Analytics - Partitions5:00
-
21. Azure Stream Analytics - An example on multiple partitions7:00
-
22. Azure Stream Analytics - More on partitions4:00
-
23. Azure Stream Analytics - An example on diagnosing errors4:00
-
24. Azure Stream Analytics - Diagnostics setting6:00
-
25. Azure Databricks - Monitoring7:00
-
26. Azure Databricks - Sending logs to Azure Monitor3:00
-
27. Azure Event Hubs - High Availability6:00
About DP-203: Data Engineering on Microsoft Azure Certification Video Training Course
DP-203: Data Engineering on Microsoft Azure certification video training course by Prepaway, along with practice test questions and answers, study guide, and exam dumps, provides the ultimate training package to help you pass.
Design and implement data storage – Basics
12. Lab - Authorizing to Azure Data Lake Gen 2 - Access Keys - Storage Explorer
Hi, and welcome back. Now in this chapter, I just want to show you how you can use a tool known as the Storage Explorer to explore your storage accounts. So if you have employees in an organisation who only need to look at the data in the storage accounts within their Azure account, then instead of actually logging into the Azure Portal, they can make use of the Azure Storage Explorer.
This is a free tool that is available for download, so they can go ahead and download the tool. It's available for a variety of operating systems. I've already gone ahead and downloaded and installed the tool. It's a very simple installation. Now, as soon as you open up Microsoft Azure Storage Explorer, you might be prompted to connect to an Azure resource; on that screen, you can log in using the subscription option. If you're wondering what the Azure Storage Explorer looks like, this is it. If you don't see that prompt, you can go on to the Manage Accounts section over here, click on "Add an account," and you'll get the same screen. I'll choose the subscription option. I'll choose Azure.
I'll go on to the next screen. You will need to sign in to your account, so I'll use my Azure admin account information. Now, once we are authenticated, I'll just choose my test environment subscription and press "Apply." So I have many subscriptions in place. Now, under my test environment subscription, I can see all of my storage accounts. If I actually go on to the datastore2000 account here, I can see my blob containers, and I can go on to my data container. I can see all of my image files. If I go on to the datalake2000 storage account, onto Blob containers, onto my data container, onto my raw folder, I can see my JSON file. Here, I can download the file. I can upload new objects onto the container. So the Azure Storage Explorer is an interface that allows you to work with not only your Azure storage accounts but with your Data Lake storage accounts as well.
Now, we have logged in as the Azure administrator. There are other ways you can authorise yourself to work with storage accounts. One way is to use access keys. Here, we can see all of the storage accounts. But let's say you want a user to only see a particular storage account. One way is to make use of the access keys that are linked to a storage account.
If I go back onto my Data Lake Gen 2 storage account here and scroll down to the Security and Networking section, there is something known as access keys. If I go on to the access keys, let me go ahead and just hide this. I click on "Show keys," and here I have Key 1. So we have two keys in place for a storage account: you have key one, and you have key two. A person can actually authorise themselves to use the storage account using this access key.
So here I can copy the key to the clipboard and go back to the Storage Explorer. I'll go back on to Manage Accounts. Here, I'll add an account. I'll choose a storage account. It says account name and key here. I'll go on to the next screen. I'll paste in the account key. You'll also need to give the account name, so I'll go back to Azure, copy the account name from here, and place it over here. I'll leave the display name the same. Go on to Next and hit Connect. Now, here under the local and attached storage accounts, I can see my Data Lake Gen 2 storage account. So I can still have a view of all of the storage accounts that are part of my Azure admin account over here, but at the same time, under the attached accounts, I can see only my Data Lake Gen 2 storage account.
If I go onto my blob containers, onto my data container, onto my raw folder here, I can see my JSON file. As I said, if you want, you can go ahead and even download the JSON file locally: you can select the location, click on "Select folder," and it will transfer the file from the Data Lake Gen 2 storage account. So this is one way of allowing users to authorise themselves to use the Gen 2 storage account.
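The same authorisation can also be done in code rather than through Storage Explorer. Below is a minimal sketch, assuming the Python azure-storage-blob package; the account name, key placeholder, container, and file names are illustrative and not the exact values used in the lab.

```python
# Minimal sketch: authorizing to a Data Lake Gen 2 (blob) storage account with an
# access key and reading a file. Account, container, and blob names are placeholders.
from azure.storage.blob import BlobServiceClient

account_name = "datalake2000"          # hypothetical storage account name
account_key = "<storage-account-key>"  # Key 1 or Key 2 from Security + networking > Access keys

service = BlobServiceClient(
    account_url=f"https://{account_name}.blob.core.windows.net",
    credential=account_key,
)

# List the blobs under the raw folder of the data container, then download one locally.
container = service.get_container_client("data")
for blob in container.list_blobs(name_starts_with="raw/"):
    print(blob.name)

with open("log.json", "wb") as f:
    f.write(container.download_blob("raw/log.json").readall())
```

Anyone holding the account key gets full access to every service in the account, which is exactly the limitation the next chapter addresses with shared access signatures.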
13. Lab - Authorizing to Azure Data Lake Gen 2 - Shared Access Signatures
Now, in the previous chapter, I showed how we could connect to a storage account, that is, our Data Lake Gen 2 storage account, using access keys. As I mentioned before, there are different ways in which you can authorise access to a data lake storage account.
Now, when it comes to security, if you look at the objectives for the exam, the security for the services actually falls under the section "design and implement data security." But at this point in time, I want to show the concept of using something known as shared access signatures to authorise access to an Azure Data Lake storage account. The reason I want to show this at this point in time is because when we look at Azure Synapse, we are going to see how to use access keys and shared access signatures to connect and pull out data from an Azure Data Lake Gen 2 storage account.
And that's why, at this point in time, I want to show how we can make use of shared access signatures for authorising ourselves to use the Gen 2 storage account. So, going back to our resources, I'll go on to our data lake storage account. Now, if I scroll down, in addition to access keys under Security and Networking, we also have something known as a shared access signature. I'll go on to it; let me go ahead and hide this. Now, with the help of a shared access signature, you can actually grant selective access to the services that are present in your storage account, unlike with an access key.
So remember, in the last chapter, we had gone ahead and connected to a storage account via an access key. Now, with the access key, the user can go ahead and work with not only the Blob service but also file shares, queues, and the table service as well. So these are all the services that are available as part of the storage account. But maybe you want to limit the access to just a particular service. Let's say that you are going to give the shared access signature to a user, and you want that user to only have the ability to access the Blob service in the storage account. For that, you can actually make use of shared access signatures. Here, what you'll do is that in the allowed services, you will just unselect the File, Queue, and Table services so that the shared access signature can only be used for the Blob service.
In the allowed resource types, I need to give access to the service itself, I need to give access for the user to have the ability to see the containers in the Blob service, and I also need to give access to the objects themselves. So I'll select all of them. In terms of the allowed permissions, I can go ahead and give selective permissions. So in terms of the permissions, I just want the user to have the ability to list the blobs and read the blobs in the Azure Data Lake Gen 2 storage account.
I won't give any permissions when it comes to enabling the deletion of versions, so I'll leave that as it is. With the shared access signature, you can also give a start and expiry date and time. That means that after the end date and time, this shared access signature will not be valid anymore. You can also specify which IP addresses will be valid for this shared access signature. At the moment, I'll leave everything as it is. I'll scroll down here. It will use one of the access keys from the storage account to generate the shared access signature.
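For reference, here is a minimal sketch, assuming the Python azure-storage-blob package, of generating a similar account-level SAS in code: Blob service only, read and list permissions, and a short expiry. The account name and key value are placeholders.

```python
# Sketch of what the portal's "Generate SAS and connection string" button produces:
# an account SAS scoped to the Blob service, with read/list permissions and an expiry.
from datetime import datetime, timedelta, timezone

from azure.storage.blob import (
    AccountSasPermissions,
    BlobServiceClient,
    ResourceTypes,
    generate_account_sas,
)

account_name = "datalake2000"          # placeholder account name
account_key = "<storage-account-key>"  # placeholder key used to sign the SAS

sas_token = generate_account_sas(
    account_name=account_name,
    account_key=account_key,
    resource_types=ResourceTypes(service=True, container=True, object=True),
    permission=AccountSasPermissions(read=True, list=True),  # list and read blobs only
    expiry=datetime.now(timezone.utc) + timedelta(hours=2),  # valid for two hours
)

# The SAS token can then be used as the credential instead of the access key.
service = BlobServiceClient(
    account_url=f"https://{account_name}.blob.core.windows.net",
    credential=sas_token,
)
for blob in service.get_container_client("data").list_blobs():
    print(blob.name)
```

Because the token is signed with one of the account keys, regenerating that key immediately invalidates any SAS tokens created from it.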
So here I'll go ahead and click on the button for "Generate SAS and connection string." And here we have something known as a connection string, the SAS token, and the Blob service SAS URL. The SAS token is something that we are going to use when we look at connecting to the Data Lake Gen 2 storage account from Azure Synapse. At this point, let's see how to connect to this Azure Data Lake Gen 2 storage account using a shared access signature. If I return to the Storage Explorer, the first thing I'll do is right-click on the storage account that we already attached via the access key and select Detach. So I'll say yes. Now I want to connect to the storage account again, but this time using the shared access signature.
So I'll go on to Manage Accounts. I'll add an account. Here I'll choose the storage account. And here I'll choose "shared access signature." I'll continue on to Next. Here, you must provide the SAS connection string, so I can either copy the entire connection string or go ahead and copy the Blob service SAS URL. So let me go ahead and copy the Blob service SAS URL. I'll place it over here. I'll just paste it. You can see the display name. I'll go on to the next page, and I'll go ahead and hit Connect.
So, in terms of the Data Lake, you can now see that I am connected via the shared access signature. And as you can see, I can only access the Blob containers. I don't have access to the table service, the queue service, or the file share service. As a result, we are now restricting access to the Blob service only. At the same time, remember that I mentioned that this particular shared access signature will not be valid after the expiry date and time. So if you want to give some sort of validity period to this particular shared access signature, that is something that you can actually specify over here. As I said, the main point of this particular chapter was to explain the concept of a shared access signature. So there are different ways in which you can authorise yourself to use a storage account.
When it comes to Azure services, there are a lot of security features available for how you can access a service. It should not be the case that the service is open to everyone. There has to be some security in place, and there are different ways in which you can actually authorise yourself to use a particular service in Azure. Right, so this marks the end of this chapter. As I mentioned before, we will look at using a shared access signature again in later chapters when we come to Azure Synapse.
14. Azure Storage Account – Redundancy
Hi, and welcome back. Now in this chapter, I want to go through the concept of Azure Storage account redundancy. So when it comes to Azure services, they are always built with high availability in mind. And the same is true when it comes to the Azure Storage account. So by default, when you store data in an Azure Storage account, let's say you store data using the Blob service, multiple copies of your data are actually stored. This actually helps to protect against any planned or unplanned events. When you upload data to an Azure Storage account, in the end, it's going to be stored on some sort of storage device in the underlying Azure data center.
The data centre houses all of the physical infrastructure required to host your data and provide services. And no one can actually guarantee 100% availability of all physical infrastructure. Something can go wrong, because there are points of failure. There could be a network failure, there could be a hard drive failure, or there could be a power outage. So there are so many things that can actually happen. So for such events, there are different redundancy options to keep your data available. We had actually seen these redundancy options when creating the Azure Data Lake Gen 2 storage account.
So if I go back onto Azure quickly, if I go ahead and create a new resource, I'll scroll down and choose Storage account. When it comes to redundancy, there are many options in place: locally redundant storage, geo-redundant storage, zone-redundant storage, and geo-zone-redundant storage. So many options are in place, and I'm going to give an overview of what all of these options mean. So, first, we have locally redundant storage. When you have an Azure Storage account, let's assume the storage account is in the Central US location. When you upload an object to the storage account, three copies of your data are made. All of this data is within one data center. So this helps to protect against server, rack, or drive failures.
So if there is any sort of drive failure, let's say one storage device were to go down within the data center, the other storage devices would still be available and have copies of your data, which means that in the end, your data is still available. So the lowest redundancy option that is available is locally redundant storage. But obviously, companies are looking for much more redundancy when it comes to critical data, so that's why there are other options in place as well. Zone-redundant storage is one option available. With locally redundant storage, what happens if the entire data centre were to go down? That means your object will not be available. But in the case of zone-redundant storage, your data is replicated synchronously across three availability zones.
Now, an availability zone is just a separate physical location that has independent power, cooling, and networking. So now your object is actually distributed across different data centers, and these data centres are spread across the different availability zones. So now, even if one data centre were to go down, you would still have your object in place. But now let's say that the entire Central US region goes down. That means, again, all your availability zones are no longer available. And as I mentioned, for companies that are hosting critical data, it is very important for them to have their data in place all the time, so they can opt for something known as geo-redundant storage. What happens now is that your data is replicated to a different region entirely.
So, if your primary location is in the Central US, three copies of your data are created using the LRS technique, which is the locally redundant storage technique. At the same time, your data is copied to another paired location. So over here, the Central US location is actually paired by Azure with the East US location. So now your data is also available in another region. And over here again, in this secondary region, your data is copied three times using the LRS technique. As a result, even if the Central US location went down, you could still access your data in the East US location. So, in the background, the storage service will switch from Central US to East US.
So we have a lot of replication and redundancy options in place. But remember, in all of these options, cost is also a factor. Over here, you'll be paying twice the cost for storage, because you are storing your data in the primary location and in the secondary location. You will also be paying for bandwidth costs: the data transfer that is happening from the primary location to the secondary location is something that you also need to pay for. As I said, for large organisations that require data to be available at all times in order to function properly, the benefit of having this in place far outweighs the cost of having geo-redundant storage. So it all depends on the needs of the business. Another type of geo-redundant storage is read-access geo-redundant storage.
The primary distinction here is that with plain geo-redundant storage, the data in the secondary location is only made available if the primary region fails. Whereas with read-access geo-redundant storage, your data is available at the same time in both the primary and the secondary location. So your applications can read data not only from the primary location but from the secondary location as well. So this is the biggest difference. And then we have something known as geo-zone-redundant storage.
There is also read-access geo-zone-redundant storage. In geo-zone-redundant storage, the main point is that in the primary region itself, your data is distributed across different availability zones. If you look at plain geo-redundant storage, in the primary region your data was copied three times using LRS. But with geo-zone-redundant storage, in the primary region your data is copied across multiple availability zones. So over here, the data is made much more available in the primary region, whereas in the secondary region, it is again just replicated using LRS. So again, there are different options when it comes to data redundancy. As I said, if you go on to your storage account, you can actually go ahead and choose what redundancy option you want for an existing storage account.
If I go on to All resources, on to my view, and on to my Data Lake Gen 2 storage account, currently the replication technique is locally redundant storage. If I go ahead and scroll down and actually go onto the Configuration section, this is under Settings over here. In terms of the replication, I can change it to either geo-redundant storage or read-access geo-redundant storage. Over here, I can't see zone-redundant storage because there are some limitations when it comes to switching from one replication technique to another. There are ways in which you can actually accomplish this. But at this point in time, when it comes to our Data Lake Gen 2 storage account, these are the options that we have when it comes to changing the replication technique. Right, so in this chapter, I just wanted to go through data redundancy.
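If you prefer to change the replication setting programmatically rather than through the Configuration blade, the sketch below shows one way to do it, assuming the azure-mgmt-storage and azure-identity Python packages; the subscription ID, resource group, and account name are placeholders.

```python
# Hedged sketch: switching an existing storage account's replication (SKU) in code.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient
from azure.mgmt.storage.models import Sku, StorageAccountUpdateParameters

client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

# SKU names map to the redundancy options discussed above:
# Standard_LRS, Standard_ZRS, Standard_GRS, Standard_RAGRS, Standard_GZRS, Standard_RAGZRS
client.storage_accounts.update(
    resource_group_name="data-rg",        # placeholder resource group
    account_name="datalake2000",          # placeholder account name
    parameters=StorageAccountUpdateParameters(sku=Sku(name="Standard_GRS")),
)
```

As noted above, not every conversion (for example, into or out of zone-redundant SKUs) is a simple in-place switch, so check the current limitations before relying on this for an existing account.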
15. Azure Storage Account - Access tiers
Hi, and welcome back. Now, in this chapter, I want to go through the Access Tier feature, which is available for storage accounts. So if I go ahead and create a storage account, please note that this is also available for Data Lake Gen 2 storage accounts. If I go ahead and scroll down and choose Storage account, if I go on to the Advanced section and scroll down here, we have something known as the Access Tier feature. Here, we have two options.
We have the "hot access" tier. This is used to frequently access data. And then we have the Cool Access tier. This is used for infrequent data access. We also have a third option known as the "Archive Access Tier." So this is good for archiving your data. This is basically a feature at the storage account level. This is available at each individual blob level. So if I go on to my containers, if I go on to an existing container, if I go on to a directory, if I go on to one of the files that I have here, I have the option of changing tier.
And in the tier, I have the Hot, the Cool, and the Archive access tiers. So the Archive tier is an additional tier that is actually available at the blob level. At the storage account level, if I go back onto the storage account, actually go on to the configuration settings for the storage account, and scroll down here to the blob access tier, the default is Hot, but we could also go ahead and select the Cool access tier.
So what exactly are these different access tiers that are actually available for this particular storage account? When it comes to your Azure storage account, and as I said, this is also applicable when it comes to your Data Lake Gen 2 storage account, one thing that you actually pay for is the amount of storage that you consume. Now, here I'm showing a snapshot of the pricing page that is available when it comes to storing your objects in an Azure storage account. Here you can see the different access tiers, and you can also see that the price becomes lower when you are storing objects in either the Cool or the Archive access tier. In fact, it's very low when it comes to the Archive access tier. And when it comes to a data lake, remember that I mentioned that companies will store lots of data.
So you're probably talking about terabytes and even petabytes of data in a storage account, and storage becomes critical at that point. The storage cost becomes very important. So that's why you have these different access tiers in place, where companies can actually go ahead and look at reducing their storage costs. If they have an object that is not accessed that frequently, they can actually change the access tier of that object to the Cool access tier. And if they feel that the object is not going to be accessed at all but they still need to have a backup of the object in place, they can go ahead and choose the Archive access tier for that particular object.
And I mentioned that the Archive access tier can only be enabled at the individual blob level. So then you might ask yourself, "Why can't we just archive all of our objects, since the storage cost is lower?" That's because of a caveat that exists: if you store an object in the Archive access tier, you must perform a process known as rehydration if you want to access that object again. So you have to rehydrate the file in order to access it. You must go ahead and change the file's access tier to either the Hot or the Cool access tier, and it takes time to rehydrate the file. So if you need the file at that point in time, you should not choose the Archive access tier.
You should choose either the Hot or the Cool access tier. Next is the pricing of objects in either the Hot, the Cool, or the Archive access tier. When it comes to the cost of your storage account, there are different aspects to the costing. One aspect is the underlying storage cost. The other aspect is the operations that are performed on your objects. For example, over here again, I'm showing a snapshot of the documentation page on pricing. Here, you can see that the cost of a read operation on an object in the Cool access tier is much higher than on an object in the Hot access tier, and it gets even higher for objects in the Archive access tier.
Next is a concept known as the early deletion fee. Now, the Cool access tier is only meant for data that is accessed infrequently and stored for at least 30 days. If you have a blob in the Cool access tier and switch it to the Hot access tier before 30 days, you will be charged an early deletion fee. The same thing goes for the Archive access tier. This is used for rarely accessed data that is stored for at least 180 days. And the same idea applies here: if you have a blob in the Archive access tier and you change the access tier of the blob earlier than 180 days, you are charged an early deletion fee.
So when you're deciding on the access tier of a particular object, you have to decide based on how frequently that object is being used. If the object is being used on a daily basis, you should choose the Hot access tier. If you have objects that are not being accessed that frequently, you can go ahead and choose the Cool access tier. And if you want to go ahead and archive objects, you can go ahead and choose the Archive access tier. Now let's quickly go on to Azure. I'll go on to my Data Lake Gen 2 storage account, on to my containers, on to my data container, on to my raw folder, and on to my object, and here I'll just change the tier to the Archive access tier and click on Save. So, remember that we are now saving money on the storage cost for the file. But here you can see that this blob is currently being archived and can't be downloaded. You have to go ahead and rehydrate the blob in order to access it.
So this is what you have to do if you want to go ahead and access the file again, because even if I go on to Edit, I will not be able to see the contents of the file. So I have to go back to my file, and here I have to go ahead and change the tier to either the Hot or the Cool access tier. If I choose either tier, you can see that there is also a rehydrate priority. You have two options: Standard and High. With Standard, the object will take some time to be converted back to the chosen access tier; you can see that it may take up to 7 hours to complete. If you choose High, then it could be completed at a much faster pace. But in either case, it will still take time. So if you have an object that needs to be accessed at any point in time, don't choose the Archive access tier. So I'll just go ahead and cancel this. Right, so in this chapter, we went over the various access tiers that the Blob service provides for your storage accounts.
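The same tier change and rehydration can be scripted. Below is a minimal sketch, assuming the Python azure-storage-blob package, with placeholder account, container, and blob names.

```python
# Sketch of the tier change done in the portal: archive a blob, then rehydrate it
# back to Hot with a chosen priority. Account, container, and blob names are placeholders.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient(
    account_url="https://datalake2000.blob.core.windows.net",
    credential="<storage-account-key>",
)
blob = service.get_blob_client(container="data", blob="raw/log.json")

# Move the blob to the Archive tier to reduce the storage cost.
blob.set_standard_blob_tier("Archive")

# The blob cannot be read until it is rehydrated back to Hot or Cool.
# Standard rehydration can take hours; High is faster but costs more.
blob.set_standard_blob_tier("Hot", rehydrate_priority="Standard")

# Check rehydration progress via the blob's properties.
props = blob.get_blob_properties()
print(props.blob_tier, props.archive_status)
```

While `archive_status` reports a pending rehydration, the blob still counts as archived, which matches what we saw in the portal: the file stays unreadable until the tier change completes.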
Prepaway's DP-203: Data Engineering on Microsoft Azure video training course for passing certification exams is the only solution you need.
Pass Microsoft Azure DP-203 Exam in First Attempt Guaranteed!
Get 100% Latest Exam Questions, Accurate & Verified Answers As Seen in the Actual Exam!
30 Days Free Updates, Instant Download!
DP-203 Premium Bundle
- Premium File 379 Questions & Answers. Last update: Dec 16, 2024
- Training Course 262 Video Lectures
- Study Guide 1325 Pages
Free DP-203 Exam Questions & Microsoft DP-203 Dumps

| File | Views | Downloads | Size |
|---|---|---|---|
| Microsoft.testking.dp-203.v2024-12-03.by.florence.124q.ete | 346 | 425 | 2.59 MB |
| Microsoft.actualtests.dp-203.v2021-11-02.by.captainmarvel.105q.ete | 200 | 1266 | 2.51 MB |
| Microsoft.testking.dp-203.v2021-08-10.by.blade.64q.ete | 402 | 1441 | 1.73 MB |
| Microsoft.testking.dp-203.v2021-04-16.by.lucas.36q.ete | 650 | 1653 | 1.3 MB |