DA-100 Microsoft Power BI – Part 3 Level 1 – An introduction to DAX functions, including
- Welcome to Part 3 – Modeling and DAX functions
Hello and welcome to part three of this course, which is all about modeling the data. And you can see in blue the requirements of the DA Dash 100 exam. And I think it’s a good template to follow. So we’ll be designing a data model, we’ll be developing it, we’ll be creating measures by using DAX and we’ll be optimizing the model performance. So what, what is this section all about? Well, we have got the data, we have transformed the data. However, there may be additional bits you need to tweak. For example, you may be able in Get and Transform to add a column to every single row with a calculation. However, what if you wanted to add a measure that’s not necessarily adding something to every column and then adding it up? And if it were, that could be really bad performance, because then let’s say you’ve got a billion rows, you’d have to add a billion values and then do a calculation on those billion values.
Those billion values would have to be stored permanently in memory. Whereas for a measure, all you’d have to do is do the calculation and the computer can speed that up because it may go actually, there is a quicker way of doing this. Then we’ve got all of these different tables, all of these different queries that you’ve got, well, they may need connecting together, so one table might need to be joined to another table. So, yes, you could do some of that and Get and Transform, you can merge, you can append, and that’s good if you wanted a formal state, if you actually wanted one table. But sometimes it may be better to have it in lots of individual tables and then join it together only when you need them. Sometimes you might just want one table and not everything else.
So we’ll be looking at all of the joining between the tables so that becomes a model. And then we’ll be looking in detail at the DAX language to create measures. So you just have a look at what it says. Use DAX to build columns, complex measures. You can imagine that that can take some significant time. So, let’s start part three, all about modeling data.
- Get multiple data sets, and connecting them together
Hello and welcome to part three of this course. So, part one was all about visualizing data, creating all of our visualizations. Part two was about Get data and all of the transformations you can do, including get the M formulas. Well, part three is about adding modeling. So we’ve got the data, but we need to make some adjustments or we need need to correlate multiple data sets together. And some of this will include using calculated columns and calculated measures. And for that, we’re going to use yet another language called DAX. DAX. So previously we had M formulas for Getdata, which was like text dot something, or DAX formulas uses its completely own language.
Unfortunately, now there is a bit of overlap between the two. Both of them are transforming data. So, for instance, in the Get data, you could use a calculated column to add some measure, but you can also do that and sometimes more effectively in the modeling section using Danks. So, with regards to the certification, we have done in part two, the consuming and transforming data by using Power Bi. So we connected to various data sources, we performed transformations and we cleansed the data. In part one, we did the visualizing data. So we were looking at things like creating hierarchies and creating and formatting interactive visualizations and using KPIs gauges and cards. Well, this part is all about this first section, creating and optimizing data models, and creating calculated columns, calculated tables and measures, including DAX formulas.
So, in this section, we’re not going to be continuing our look at the M language. We’re going to assume that the data that we have got, we have used as much of the Get data transformations as we want. Now we’re going to be using one data set for this particular part, or rather, may I should say, four data sets. So let’s get those data sets now. So, get data from Excel, it’s our source, the Power Bi data that you can download from way back at the beginning of this course.
And what we’ll be using are these four with dim and fact at the beginning. So, fact is a series of measures and dims are a series of descriptions about those measures. So I’m going to click on each of those. I’m not going to transform the data. I’m going to assume that what we’ve got here is basically what we need and click on Load. So all of that data is coming into Power Bi. And you’ll see it on the right hand side with all of these fields. Now there is some connections, or there are some connections already within the data. For instance, I could get the category name and get the subcategory name. And you’ll see here, for instance, we’ve got the categories and the subcategories successfully joined together. I’ll show you how the computer has done that automatically in just a moment. However, there is a bit of discontinuity as well.
If I go to fact Internet sales and I click on the sales amount, you can see the computer has no idea how to connect Fact Internet Sales with these two dim product categories and Dim Product Subcategories. So now let’s take a quick look at the data that we’ve got. And we’ll go to the data icon on the left hand side. So here is the dim product. So this tells us information about a product. So here, for instance, is the product name dim category. Very short table, just translating a product category key into a category name, similar for the subcategory and then Internet Sales. This shows the nitty gritty of what people bought. Now, if we go into the model, this shows the four tables. And you can see at the bottom we’ve got zoomed. So we can zoom in and out and you can see the disconnect.
Dim product is joined to fact Internet sales. Dim product category is joined to Dim product subcategory. It’s successfully worked those out. And you can see that there is a number one and an infinity, a star sign. What this means is that there is one Dim Product which is used multiple times, could be zero times, could be one time, could be more than one time, in fact Internet Sales. And similarly there is a category. One category is used multiple times. So if I go to categories, there is a one category called bikes with a product key of one which is used multiple times in subcategories. So here we can see that it is used for these first three items. So that’s what it means from going one to infinity. So we need a link between dim product and dim product subcategory.
So make sure both of those are on the screen and the link is the product subcategory key from Dim Product needs to go to the product subcategory t key of Dim Product subcategory. So to make this link between the two, go from one direct to the other. And there is our link. Now it looks a bit of a mess. So let’s now get this in order with fact internet sales on the left hand side, dim Product Next, dim Product subcategory next and Dim Product category at the end. So now if I was to go into the data and if I was to add again the sales amount or it’s already there, you can see it is calculated correctly. So previously it was just telling me all of these categories had each 29 million. Now it’s telling me the right amount. That’s because our model is connected together. Now, in the next video, what we’re going to do is have a look at this particular arrow and see why it is so important and how it can give you the wrong information, the wrong analysis.
- The problems with direction of relationships between data sets
So what is this arrow between two different data sources? What this shows where the data flows, the analysis flaws. So in this case it’s going from the Dim Product to Fact Internet Sales and it always goes from the one to the infinite. Now highlighting it also highlights the keys that are involved. Or you’ll notice the one is placed nowhere near the product key. Neither is the infinity sign. Now if I double click on it, you can alter edit the relationship and indeed if you wanted to add a new relationship, you would get a similar blog box if you go to modeling manage relationships and then you can click on Edit or you can click on New. So let’s have a look at this particular one. So it goes from fact, internet sales and product key. We’ve highlighted Product key and Dim product and we’ve highlighted Product Key there and it’s a many to one cardinality. In other words, on the one side it is guaranteed that there is only one version. So there is only one product number two, but we might sell it multiple times.
So for instance, product ID five two eight all exists one time in Dim product but we sell it multiple times. Make this relationship active. In DAX model you can only have one active relationship between a pair of tables. So between A and B you can only have one active relationship and this relationship can only be on one set of columns. So I can’t highlight four columns for instance, and have a relationship like I can in SQL Server for instance. So if you do need to have a multiple a compound key, then you’ll need to create a calculated column that combines it together and then use that. It’s this that I really want to talk about in this video. The cross filter direction, it is defaulting to single and the reason it defaults to single it flaws from the Dim product to the Fact Internet Sales is because that is a lot more efficient in terms of the actual calculations of the visualizations and most of the time you won’t need it to go anything else. You won’t need it to say both. But let’s have a look at an example. When you do so, we’ll look at two examples, one that works and one that doesn’t work.
So let’s go into the report and we create a new visualization with Dim product category and Dim product subcategory. So I’m going to click on the English product name and I will focus on this particular visualization and then I will go down to well let’s go down all the way to Fact Internet Sales and I’m going to ask for the total amount. So the sales amount. So click on that and you can see it works. And the reason that it works is because the data is flowing the correct direction. So we start off in category and we’re flowing down. Look at where the arrows going towards the fact internet sales and then doing calculations based on fact internet sales. Now of course, we didn’t need to flow all the way to the end. We could have done calculations based on subcategory or dim product. So for instance, I could go into dim product, dim means dimension by the way, and ask for the total weight if that was a measure that was being used. So that’s not the total weight, that’s each individual weight.
So I’m just going to count it and you can see it works. It’s not giving me the actual weight, it’s just giving me how many products essentially there are. But again, the reason it works is because we are flowing it down. So we start with anywhere. We could start with subcategory but as long as we are flowing in the right direction, this will work. So rather than do account of weight, what I’m going to do instead is account of the number of product IDs. So if I just click on products key you can see it’s giving me each individual product key. But if I click on this drop down and got a count, it’s now counting the number of product keys, there are 606 in total, of which 209 don’t have a category, 35 in accessories, 125 in bikes and so on. Right, so let’s clear this visualization and start again. So we’re going to have a visualization that starts off in fact internet sales which is all the way on the left.
So let’s take something about fact internet sales and let’s take for instance the order date. So I’m going to add the order date into the field, I’ll just click on it, there we go. And then I’m going to go from fact internet sales to dim product and I’m going to get the product key. And here we can see for instance, on the 1 July 2005 we sold product key 310, product key 336, product key 346. So, so far so good. But what I’m going to do now is count the number of product keys. So there are three on the first day. So I click on the drop down, go to count and you will see that actually there are 606 on every single day and there is a total of 606. That does not make sense. And it says if there is no connection between these two and in reality there is a connection, it’s just that it’s going the wrong way.
We are trying to go from product key or the fun fact internet sales and do an analysis of dim product and that is not allowed with the direction flowing the opposite way. So how to get around this? Well, there are two ways. The first way is to change this cross photo direction from single to both. And now if we go back now we can see that on the 1 July 2005 there were three different product keys. However, this is a major thing, changing the directional relationship to both. And now you can see the arrows going in both directions can really have negative consequences for the performance of your model. So if you find that there is a it’s going slowly and you don’t know why, have a look and see whether there are any cross filter directions which are on the ball side. And ideally, what you should do is change your model accordingly. For instance, in this version, I could instead of using product key in Dim product, I could use product key in fact internet sales. It is in fact a particular field. So let’s change this back to single. So here we have the wrong analysis, but you can see very easily it’s the wrong analysis and we’ll change that to Product key within the Fact Internet Sales.
And now you can see that it is working. And you’ll also notice, incidentally, the answer is a bit different. We had three product keys, but five over here in Fact Internet Sales. So what is happening is, if you have a look at the data now that doesn’t work, that just gets me literally where we are. We’re in a table already. If we go into the data tab for Fact Internet Sales and we have a look at that one particular date, you will be able to see when I go over to the product key, that there are five items that were ordered, but of them only three were unique. It’s the far left hand side. So here we have three, four, six being ordered by three different people and then 143-10-1336. So Dim Product will give us an answer free because there were three products there’s, three links as it were, going from here, because it’s product three, four, six, maybe three times. But according to Dim Product, there’s only one product called three, four, six, but five items were sold.
So this is an interesting example to show that if you have the multiple fields in multiple columns, multiple tables, be very careful about interpreting the answer. If you wanted to say oh, I can go on, there were only three different items, then you can change the count from count to count distinct and that will give us the free answer. However, if you want to count how many actual items there were, then you can say there were five. There is a second way of getting the answer from a related table and that’s using a DAX function called cross filter, which is fairly advanced. But later on we’ll be looking at filter and value functions in this part of this course. So data flaws from one table to another in a particular direction and you need to bear that in mind when you are building your model and only override it when you absolutely need to. And advantage of using filter functions is that it will only be necessary for that one particular column. So if I was to change this to both then it does in both directions for all of the columns. And it really can degrade performance enormously. So design your model so that the data flows one particular direction where possible.
- DAX functions – A useful Resource
Now that we’ve got our model set up, we will now be creating calculated columns in this section and calculated measures. But as I said previously, it uses a different language and it’s called DAX. DAX. So what I’ve done is attached as a resource to this course. Right near the beginning is this spreadsheet and it’s a list of the DAX functions and there’s around 200 and 2230 of them. Now obviously going over this many functions is going to be difficult, if not impossible.
So what I’ll be doing in the next few sections is taking a category at a time and then talking about a particular element. For instance, I could be talking about all of the rounding functions. So for information on any particular function, you’ll see that column B is hyperlinked.
So if you click on it, you will get information about this particular function and how you can use it. Don’t worry about any headers you might see about no longer updating this content. This information is still relevant. Now this is the official Microsoft documentation.
So what I’ve done is I’ve taken the description and put that into column D and I’ve taken the syntax how you can actually use it and put it in column C. Now I’ve tried to reduce the amount that you’ve got to read by not swamping you with obvious stuff. For instance, if I describe what the closing balance month is and you’ve got the description, then you probably can work out fairly easily what the closing balance quarter and closing balance year are.
So if you see some blanks, that’s probably what I’ve done there. So I recommend that you download the spreadsheet and we’ll be going through the various functions in the remainder of this part of the.
- Calculated columns – an introduction
So where can you use all of these DAX functions? Well, there are two places you can use them and I’m going to introduce one right now. And this is a calculated column. So let’s take an example of a calculated column. Let’s suppose that this sales amount is in British pounds and I want another column which shows the sales sales amount in US dollars. And I’m going to assume that the exchange rate has been very constant and is around $1. 60 to the pound.
So what I can do is create a new column. Now there are several places to do it. First of all, in the modeling menu you have new column. Secondly, you can right and click on a column and you’ve got new column. Thirdly, you can click on the dot next to a field and got a new column. So that’s the case whether you’re in the data or whether you are in the report with the visualizations, you still have access to two of those places. So let’s create this new column.
So however you do it, new column column goes on the right hand side and the first thing it says, okay, this is your column. What’s it going to be called? Well, I’m going to call it sales amount USD. So it is equal to and it’s going to be equal to the sales amount. Notice what happens as soon as I type in the letter S. It’s giving me the functions, the autocomplete and it’s giving me the relevant columns that I can use. So I’m going to click on sales amount.
So you notice it’s called itself Fact. Internet sales were throwing to the table and then a hard bracket sales amount. So now I’m going to multiply it by 1. 6. Now at the end I can say no, I don’t want to do any of that, but I’ve just done cancel it or I’ve got a check mark to commit. So once I click on that there you can see the Sales Amount USD.
And if I scroll down you can now see that it is included in the fields and there’s a little icon next to it saying that it is a calculated column. Now because it’s a field, it’s got all of the properties that a field can have. For instance, you can change the default summarization. Some sounds fairly reasonable and if this was a data category you could also change that, but you can also change the formatting. So I’ll put this in US dollars for instance, and I’ll change the sales amount to British pounds just to distinguish them.
And then if you wanted you can then go to the report and you can create a visualization based on that. So if I drag in US dollars so that’s the total amount of US dollars brought in on that particular day. And just one more note, if you want to rename it, you’ve got the same option as you have with any column so you could go to the dot, dot, dot and click on rename. You can double click on the column and rename it there and you notice it renames it in the formula, which is the other place that you can rename it so you’re not stuck to the first version of the.