DA-100 Microsoft Power BI – Part 3 Level 8 – Other Modeling and DAX Topics for the DA-1 Part 2
- Resolve many-to-many relationships – Different types of granularity
In this video we’re going to have a look at another reason why you might have a manytomany relationship and that is detailed data on both sides of the relationship. So if we go into our data and have a look at the other two spreadsheets which are marked M to M, which are actual and target, you can see what we’ve got in the target. We have have this company which operates out of Western Europe and it’s anticipating the amount of sales it will get in three different countries for whether it is in store purchases in a physical location or whether it’s Internet or some of the way through port, maybe people phone in their orders.
So we’ve got that for England, France and Germany. So we’ve got two lines for each. In the actual we have got the breakdown of the stores, the locations, so England, London England, Birmingham England, Manchester. So we got three England bases, we don’t know of their physical stores, we don’t know if they’re internet. There could be a combination of the two. In France we’ve got Paris and in Italy we’ve got Milan and Rome, Italy, but we had Germany in the target. So we’ve got a mismatch of the countries. So we’ve got England, France and Germany. And England, France and Italy. So let’s load that data and have a look at what happens when we join the model. So we have got these two tables and we can relate them using a many to many relationship with the country. Again, cross filtering direction both is quite good. So let’s see what we’ve got now.
So if I create a table I’ll just add the size to a bit higher but I can’t actually do that until I’ve added some data. So I am going to add the country from the target. So now I can make this a bit bigger. Going to grid. So we have got three countries in the target and we have the amount that they were going to get and the amount that they actually got. So we have 15,000 10,012 thousand making 37,000, that’s all fine. And then in the target we have got 23,004 thousand making 43. What’s missing is Italy. Italy is not there, italy is there in the actuals, it’s not there in the target and we have got the country from the target table.
If I was to deselect that and put in the country from the actual then we wouldn’t have Germany and it would be even worse if I put in both countries, we wouldn’t have either France, either Germany or Italy, we just have France and England. So you can see doing it this way, although it looks a similar solution to when we did the bank accounts. We’re actually missing data. We’re missing data that exists in one table but doesn’t exist in another. So what can we do? Well, we need to create a third table. This is called a Bridging table. It bridges just imagine a bridge going from one table to another.
And what I’m going to do is combine these two tables together as an append query. So I’ll append these queries as new. So now we have lots of additional columns which we don’t want. We just want the country. So I’m going to leave the country, I’m going to remove all the other columns and then I’m going to deduplicate. So that is remove rolls, remove rolls, remove duplicates. So now we have the duplicates removed. We just have England, France, Italy and Germany. So I’m going to call this my rich. So go to Clause and apply, go back into the model. We get rid of the relationship that’s there. And you can see the computer has already put in another relationship.
It’s dotted it out, it’s made it inactive. I want it to be active. We’ve now got rid of this other relationship. The reason why I was inactive is we had a relationship from actual to target, which is sort of the same as going actual bridge target. So we couldn’t have both of those at once. So I need to make this an active relationship. So now we have this bridge going to actual and going to target. Let’s go back into here. Can’t display the visual, it doesn’t know what’s happening. So I’m going to get rid of everything. But now in terms of the countries, but now I’m going to put in the country from the bridge. And here we can see the answer we really want. We want all of the countries that’s there, england, France, Germany and Italy. And we’ve got gaps for the actual and the target. But that’s absolutely fine. We want the gaps. We want to see missing data. And equally importantly, the totals now all add up.
So now what I’ll do to help the end user is to get rid of or at least to hide the country from target and actual. So we’ve now only got one country. We’re not going to have three country fields when we don’t need it. So you can see this is another reason for having a many to many relationship. We have got different types of data from the two tables. And the best way, even though we can use a many to many relationship to connect them, the problem with that is we’re missing then countries from one table to the other. So the best way is to have a bridge. And the great thing about how we’ve done this bridge, we’ve not just typed in a bunch of countries, we’ve actually made it dynamic. So if the data in actual or target changes, we now move into Norway, for instance, then this bridge will automatically regenerate and add in Norway. So this is another way to resolve many.
- Improve cardinality levels through summarization and by changing data types
In the past two videos we’ve had a look at how to deal with many to many relationships. Either you create a cardinality of many to many or you create a bridge that goes in between the two. But what other ways could there be? Well, let’s have a look at the data itself. So if we go into the actual data, we have got the country, the location and the actual. What if you didn’t need the location at all? Well, you could just summarize it. Now one way you could do that is here in the Get and Transform or Power Query Editor so we can group by. So we’re going to group this table by country and have the sum of the actual. So this is my actual by country.
So there we have got fewer roles to deal with and country is now a one. So if I go back into my data, everything gets refreshed. We no longer need the bridge because country which I’m going to unhide is now unique and I take this to the target as a one to many. Now this could be perfect not for the fact that we don’t actually have all of the countries in actual that there are in target. So if I just delete that, start afresh, put in the country, putting the actual and put in the target there we have a different outcome this time we do have all of the countries. At least there is a blank for Germany because Germany is not in the actual. So this is a way of improving cardinality though you do actually lose some of the existing data but it depends on what you want to drill down that might be perfect. Now let’s add a new table. Let’s add a target table. So we do that by going to table tools, new table.
And so I want this to be the target rolled up. So here’s my target and I’m going to have from this target table going to use the function called summarize columns that creates this summary table. So I want to group by the country and then I don’t want any more grouping, I want to have a new column. Notice this is a fairly complicated formula, there are three different types of syntax. I’m going for this syntax next, I want to create a new column called Total Target and that is going to be the sum of the target. So that gives us our target as a single table. So just to look at that again, I’m summarizing a table, I’m grouping by the country and then after I do all the group by so I might be grouping by lots of different things.
So I have all of the group by separated by commas and then I’m introducing a new column which happens to be the sum x of the target. And again there I can have my relationship between target and actual, as in this case a one to one it thinks and then what I could do is create a visualization based on that. One to one is right, I’m just always a bit iffy with one to one because we’re still missing a particular country in one of them, whichever table you take as the principle, we’re missing the country in the other. So here we have the total target and here we have the actual and again we’re missing something there, we’re missing Italy. So there’s two ways to do this summarization. You can either use the group by in the edit query which reduces your table downwards or you can create a new summarized table using summarize columns.
Now what other ways could there be? Well, you could theoretically change data types perhaps. So for instance, let’s say we had a primary key of AB, 123456. Well, we could split that into two AB and 123456, if that would be helpful in business terms, if it actually means something, because this is a string, whereas we are now dividing into a string of a small string and then a number, an integer in this case. And if that can be related, if that can be used as our primary key, if the prefix, say, is meaningless, it’s always AB, then we could change this so that it becomes an integer. An advantage of this is firstly, suppose we had another table that just had 123456 as the primary key or the foreign key other side of the relationship.
Well then we can more easily have this string be related to this integer. If this first bit is meaningless we could truncate the string and just have an integer to integer. Secondly it saves memory if we go from a wildly expansive data type like string to something which is very narrow. Integer just takes a few bytes where strings could take a huge number of bytes that could actually increase performance the smaller the data type is. And then finally this topic is improve cardinality levels. What are the different levels? So we’ve seen all of the types one to one, many to one, many to many. But what are the actual levels? Well, this is an unofficial list but basically you’ve got three major types. First of all you have high cardinality, this is where you have no duplication. So these are your primary keys, these are your one side of a relationship.
So you are looking at things that highly specify that particular role. So it could be an ID or it could be a combination of columns. The cardinality, a high cardinality means that they’re not duplicated or they are little duplicated if you’re thinking about higher cardinality. But the very highest cardinality is there are no duplicates. The very lowest cardinality is when you’ve only got a limited number of values. So true false you’ve only got three values true false and unknown or null or nothing. You could have age of schoolchildren that’s only going to be within a narrow range of five to 18, five to 16, and then everything else with names, for instance, street addresses, we have your normal cardinality.
So just imagine trying to have something which is low cardinality on one side or the other of a relationship that’s going to have huge number of roles. Let’s say you’ve got a million rows on a table. Well, with low cardinality, you’re probably going to have half a million rows for each value. Well, if high cardinality, you’re only going to have one row per unique value. And so when you do a relationship, then it’s one row to however many rows on the other side. But if it’s low cartinate, it’s many rows nearly the entire table. And this might impact on the methodology that the engine behind Power Bi has to actually do your analysis. In the next video, we’re going to have a look at how to identify poorly performing edges, relationships when there’s a cardinality and whether your visuals are taking a long time.
- Identify poorly performing measures, relationships, and visuals
Well, congratulations. We’re now three quarters of the way through the cost and you have now had a look at modeling, especially using Danks. So we’ve looked at how all of these tables, these queries that we’ve got from Get and Transform, how we can relate them together. So how we can do joins or maybe we flatten out a hierarchy to create one bigger table by using. We’ve looked at how we can develop a data model and we’ve looked at optimizing model performance.
Now, creating measures by using DAX staff has taken a fair amount of this part because it says use DAX to build complex measures and use Calculate to manipulate filters. Those are not trivial DAX functions. And so we have had to spend a fair bit amount of time with this. But have a look at all of these items and hopefully you will say I feel confident being able to do a huge number, if not all of these items. Right, let’s go. We have one part left, part four, all about the Power Bi service.
- End of Part 3
Well, congratulations. We’re now three quarters of the way through the cost and you have now had a look at modeling, especially using Danks. So we’ve looked at how all of these tables, these queries that we’ve got from Get and Transform, how we can relate them together. So how we can do joins or maybe we flatten out a hierarchy to create one bigger table by using. We’ve looked at how we can develop a data model and we’ve looked at optimizing model performance. Now, creating measures by using DAX staff has taken a fair amount of this part because it says use DAX to build complex measures and use Calculate to manipulate filters. Those are not trivial DAX functions. And so we have had to spend a fair bit amount of time with this. But have a look at all of these items and hopefully you will say I feel confident being able to do a huge number, if not all of these items. Right, let’s go. We have one part left, part four, all about the Power Bi service.