DA-100 Microsoft Power BI – Part 1 Level 8: Other Visualization Items for the DA-100 exam Part 2
- Use groupings and binnings
In this video, we’re going to group together values. And there’s two ways of doing this depending on whether you’re grouping together numerical or time data or whether you are grouping together text, for instance. So let’s create a very simple chart. So this is going to be a clustered column chart.
And we’re just going to have region name in the axis and sales volume in the values. So we’ve got Greater Manchester, West Midlands, West Yorkshire, Merseyside, South Yorkshire and Tyne and Wear. Okay, I have a hypothesis and I have an idea that all of the regions with west in their title will be better than all of the other regions. So what I’m going to do is I’m going to select these two regions by clicking on one, holding down control and clicking on another. That allows you to select individual items holding down the control. Now, I’ve got these two selected. I’m going to right and click and go to group data. Now, notice what happens. On the right hand side, we have a region names bracket groups. It adds it into the legend, so a different color, but that’s not necessarily what I want. So what I’m going to do is I’m going to remove region name and I’m going to drag region name groups up from legend to axis.
So now, if we have a look at the data and in fact, I’ll just turn this side on, we can see that there is West Midlands and West Yorkshire and other and you can see I was wrong with my idea, at least in total, totality. Maybe if I turn this back and put this into the legend and I put a year in the axis, maybe I’ll be right some year. And no, I’m not. We’ve got West Midlands and West Yorkshire being below all of the other items. Now, let’s have a look at this group. I’m going to click on this little arrow and go to edit groups. So here we can see the group, we can rename it, we can also rename it here if I double click on it. But we can also see what field it’s on. We can’t edit that, we can also see the group type. It’s a list. And we can see what groups we have. We have West Midlands and West Yorkshire.
Which includes, obviously, West Midlands, West Yorkshire. Those are the two that I clicked on. And then we have an other group and you can see include other group and that contains everything else that’s ungrouped. So maybe I don’t want the other group and I want to create a new group. So I’m going to highlight all of these by clicking on the first one and holding down shift and click on the last. And I’m going to click group and say the others. So this will do exactly the same thing, except I am manually doing it. So here are the others. Now, maybe I’ve got that hypothesis wrong. Maybe it’s not West Midlands and West Yorkshire. I’m going to ungroup all of these. I’m going to include the other group.
And instead of West Midlands and West Yorkshire, I’m going to say all of the Yorkshires. So I’m going to ungroup West Midlands and I want South Yorkshire to be part of West Yorkshire. So I’ve clicked on South Yorkshire, I’ve clicked on the group, west Yorkshire and I’ll group it together. So now we have the Group of South Yorkshire and West Yorkshire.
So now let’s see how these two are. And you can see even bigger disparity. But this just allows me to see what would happen if everything was combined. If I want to rename any of these groups, by the way, then I can double click on them and so I can say the Yorkshires, for instance, click. Okay, so we have the Yorkshires instead of the individual South Yorkshire and West Yorkshire. Now, the other type of grouping can be done with numerical values, and this is technically known as binning. So we put all of the numbers into various bins. So let’s give an example.
Let’s have the clustered column chart. And I’m going to have the sales volume in the axis and the sales volume in the values, except what I want is not the sum, but the count. So I want to know how many different instances of each of these sales volumes there are. You can see generally there’s one or two. There’s quite a few of zeros, because we have got the last two months all being zero sales volume, we don’t have the data at the moment.
And there’s six regions. Two times six is twelve. However, I’m more interested not in each individual figure. I’m not really interested in the sales volume 961, I’m more interested in ranges of figures. So I want to group together the sales volumes. So if I click on the little down arrow and go to new group here, we can see that we have a bin type. So we are able to do lists just like the previous one. But bins were used for numerical date fields. So you can see that we have a size of bin or number of bins. So we can either say, I want 24 bins, or you can say, I want each bin to be a certain size. So I want each bin to be not in bin size of 263.
So zero to 263, 263 to 526. That doesn’t work in real life. Let’s make it around 500. So we can see the minimum value of zero. Maximum value 6332. So in bins of 500, we can see a much more orderly possession. So we have the highest sales volume at around the 1500 ranges is 1500 to 1999, and then decreasing thereafter. And we can alter the bins, edit them by clicking on the drop down arrow and going to edit bins. So you can say, Actually, I want to see what it’s like at 400 bin size and see whether that gives me any more particular data or analysis that I might choose. So grouping you can group together textural information by clicking on them, holding down control and then you can right and click and go to group data. Alternatively, you can select a particular particular field and go to new group and then select whether you want a bin or a list.
- Use the Key Influencers to explore dimensional variances
So, so far we’ve got quite a lot of information about what happened. So we know that for this particular point we have a sales volume of 5400 and an average price of 135,000. This is called descriptive analytics. It tells you what has happened in the past.
So you can see what has happened in the past in terms of year and year sales compared to the previous year. This doesn’t tell me why and that is not descriptive analytics, but diagnostic. So we’re going to try and find a root cause for variations in the sales volume. And we can do this using the key influencers visualization. Now, the text can be quite small on this, so make sure you blow it up as much as you want. So there are three various fields that we can put in analyze. So this is the metric, the measure we want analyzing. So I want to know about sales volume. Now, what are going to be the key factors in sales volume? Well, I’m going to say it could be the year, it could be the month. And let’s have a look we’ve got what influences sales volume to increase? Well, the start of the new millennium when the year is between 2000 and 2007, the average sales volume increases by 1230 on average.
And you can see that the computer has done some bins, they’ve grouped together some of the values and here we have at 2000, 2000 or below 2000 to 2007 and some other year values and comparing them with the average. So we’ve seen how we can do all of these lines in our own visualizations. But here the computer has just done it for us automatically. So what about what causes sales volume to decrease? Well, we’re after 2008 or we’re in the period 2007 to 2008, or the month is January. So sales volume decreases by around 648 units compared to all of the other values of months. So it could be February, March, et cetera. But January is particularly lower or when the year is over to the year 2012. So here you can see an analysis. So you can just hover over this analysis and the computer gives you some information.
Now, top segments. So this is what happens when the computer divides up the data into various segments. So we’ve got two segments here, one with 288, we’re also 360. So let’s have a look at the 288. I’ve just clicked on it and this is a segment of in between 2009 and 2012. And you can see that the sales volume is 939 units lower than the overall average. Similarly we have one where it is less than or equal to 2008 or is greater than 2012, so not in that period and year is greater than 2007. So it defines it fairly oddly. And then what do we find? Sales volume being high in or we find it high in, greater than 2000 and less than or equal to 2007. Now we can then experiment, we can say well maybe the region name has something to do with this and we can see that region name is Greater Manchester. That influences sales volume to increase by an average of 1. 3 thousand units and region name being West Midlands or West Yorkshire also does and what causes it to decrease the other regions. But again, we still have the month of January being a factor and we have Merseyside, South Yorkshire and Thailand. We are being factors.
Now that we’ve got more than one item we can click on each of these to get a different graph. Now suppose we didn’t want a region name to be considered at all as an influencer. Well we could just get rid of it and we go back to year and month. The problem with that is now we have got our sales volume grouped by year and month. When we had it grouped by region name we had six times as many groupings, six times as many rows for the computer to consider. But if I drag it into the explained by it considers it as a key influencer. If I want to have it grouped by region names, have the data grouped by region name thus giving the computer more data to use. But I don’t want region name to be considered as a key influencer, I can drag it into the expand by. However, there is a problem when analyze sales volume is not summarized, it always runs at the row level. Well, I currently am not summarizing sales volume. Suppose I now summarize it by some and this then groups.
So what we’re having is the table. So if I just have a table that has the sum of the sales volume at one level and then by year, then by month well that gives me all of that data but now I want it to be also by region name as well so that gives me a lot more data. You can see that instead of just having one data point for 1995 January I now have six. But I need to have this sales volume aggregated, I need to have it in some sort of calculation of sum or average or min or max. So if I change this analyze from don’t summarize which therefore means it looks at all of the rows to sum then it has the equivalent of this table. It’s not considering region name as an influencer but it is taking it into account when it looks at the granularity. If I didn’t have this then it would be like giving the computer this much detail to have look at. If I do drag in region name I’m saying don’t use it as a key analyzer but I do want you to use it when you’re looking at the amount of data that you can investigate. So this is the key influencers visualization. It allows you to see what factors might influence a particular measure and have a look at the relative importance of these are some more important than others.
Now, just before we end this video on key influencers, let’s just have a quick look at the formatting. Now we’ve got things like the visual colors both for the drill and for the analysis. But in the analysis section near the bottom there is enable counts. Now, what this does have a look at the circle at the outside of the circle. It allows you to see how many rows are affected by any single statement. So if I spotlight on this, you can see this particular circle is around a quarter of the data, this very little. So you can see which one has more impact over the entirety of the data. Now the count type, absolute and relative. If you change it to relative, then the number of items on the page, which is the biggest, will be shown as 100% and everything will be shown as a percentage of that. So an absolute term terms, this is only a quarter of the data, maybe even a fifth of the data. And so this is a much smaller fraction in relative terms. Out of all of this data, this is a quarter of that data. So this is probably the most interesting of the options that you’ve got available here in the format section for key influencers.
- Use the decomposition tree visual to break down a measure
Now in this video we’re going to break down a measure. So a measure we’ve got, for instance, is the sales volume. So I’m going to insert a table, put it over here on the right hand side and this is just going to have the sales volume. Now we can break this down, for example, in years. So I’m going to get the years, I’m going to put it at the beginning so that breaks down this one overall measure of 3. 75 million into various years of these 22 years. We can also break it down then by month. So now we’ve broken it down into 22 years. Twelve months, that’s 264 months. So this is one way of breaking down a measure. And we can also, if I just get rid of year perhaps and just have a month, have this as a graph as well. So I’ve just copied and pasted it and I’m now going to change this into a small graph. Now we can break down the measure additionally using another visualization called the decomposition tree. So, sounds a little scary. Don’t worry, I’ll show you what it does.
So here we have our decomposition tree visualization. And just like the previous one, the key influences we have got analyze and explain by. So I want to analyze again the sales volume. So I’m going to drag that measure into analyze. And you can see we start off with a sales volume of 3. 75 million. Now, what factors could explain a sales volume or what breaks it down? Well, let’s put in the year, let’s put in the month, let’s put in the quarter. All of these different dates could explain it, perhaps. I’m going to put in months above. I don’t want it to be part of the hierarchy. I want it to be an individual value by itself. I think also what might explain it is region name that could explain it. So here are some fighters.
Now I’m going to break down, I’m going to expand this sales volume and I’m going to click on this plus, and it says, well, how do you want it to expand? Maybe you want it to expand by year. And now it will show all of the years with this bar being relative to the maximum year. It’s the highest number of sales. So you can see 20 07 20 06 20 04 20 03 2002. We had roughly the same amount of sales. And you can see real dips there. Okay, I can click on one of these 2008. Now notice what happens, it filters it down. So we had on these visualizations, 3. 75 million clicking on this. We now just have 123,000, so it can filter down the other visualizations. Here you can see it’s been used as a highlight. Well, let’s say I want it to be a full filter.
So I will go to format edit interactions, click on this decomposition tree. And I’m going to say I want this to be a filter on this particular one. So now when I click on, say, 2008, we don’t have a small bitch on, we have the entirety of the graph changing. Okay, how do we want to expand this? Well, let’s expand this just this value, 2010 into quarters. So here we have 2010 expanded into quarters. And again, you can see how it changes. And now let’s expand this second quarter into region names. I don’t have to do it in any particular order just because I’ve put explain by as year, quarter month, region name. I don’t have to expand it like that. This isn’t a hierarchy I’m doing. So I’m expanding it into just this 2010 quarter, two into region names. And again, I can click on any particular one.
Or maybe I don’t want quarter anymore. I can remove the level. And now I’ve got 2010 into all of these different region names. Now, if there is an explainer that you want to remain, you can put the lock symbol next to it so there’s no X next to it, so I can’t accidentally delete year. I have to unlock it first, and then I can delete it. So now I’ve got the sales volume just broken down by region name. And that’s pretty much all the decomposition tree is. It allows you to put in a measure, and it allows you to drill down, and you can put in as many different explainers as you want and then say, actually, I want to not explain by that, but I want to explain by something else. High value. That then shows you what of these has the highest value. So at the moment, quarters would have a higher value than, say, years, because each year would have a smaller amount of sales. Each month would have a smaller amount of sales.
So none of these would have as high a number of sales as a quarter does. And then equally, I could say low value. So this chooses the more granular thing than here. The granular thing is 2009. In the years that’s your lowest value out of all of these explainers. So the decomposition tree, it allows you to get a particular measure, in this case, sales volume. And it allows you to explore, it allows you to see perhaps why the sales volume is what it is. And it also allows interaction with other visualizations. So you can see it just a bit of a filter, but it’s also a tool for you to explore and drill in and then say, no, I don’t want to drill like that. I want to drill in with something else.
- R and Python Visualizations
In this video, we’re going to have a look at adding R and Python visuals. Now, if you don’t know what R is, if you’ve never used Python in your life and you want to take the DA 100 examination, don’t worry, you don’t need to. In fact, a full investigation of R would be a many hour course and a full investigation of Python would be many course and we don’t have to go into any sort of detail for this course or the exam. Now, if you are using R, then you can add your R script from here.
However, you cannot run an R script without some form of engine on your computer. Now, R, just in case you don’t know what it is, it’s a language for statistical computing and graphics. It supports statistical analysis and data science. Now, if you go to File and Options and Options, you’ll see on the left hand side we have R scripting. So this allows you to choose where your R engine is and you’ve got a hyperlink there how to install R and also choose where your IDE, your integrated development environment such as RStudio is. And again you’ve got a hyperlink if you want to learn more.
So once you’ve got all of that installed, then you can click on the R button. Now you will need to enable script visuals because this is a code that may contain security and privacy risks, as most codes do. If I click enable, then the screen changes. So you have your R script editor at the bottom and you can see it says drag fields into the values area to start scripting. So I could drag region name into there for instance, and it will start creating a data set frame and so forth. So you have a run script that will not be enabled if you haven’t got an engine. You’ve also got some script options and you can also edit your script in an external IDE such as RStudio.
So don’t need to go any deeper than this. For this particular exam, as I say, you will not be required to know R. R is at least at the beginners to hit quite a tricky language to start off with. You need to have everything in the right case. So it’s very case sensitive, but you can get some interesting visualizations. You can expand the language using packages. So if you are interested in data science or a different level of statistical analysis, then you might want to investigate that. So if we get rid of this and have a look at Python, so again, Python will require some sort of engine to process this. So if you go into file options options and go on the left hand side, you will see again we were looking for a home directory for Python and again we have hyperlinks for how to install Python as well as your IDE environment. Unlike all, there are quite a lot I would say, of IDE environments that you can choose from.
Now, what is Python? Python is a programming language which can be used, among other things, for developing desktop applications, websites, web applications. So there’s lots of functionality that you can import into Power Bi. And again, we’ll click on the visualization. If this was your first time, you’d be asked to enable the script visuals and again you would drag any values through. And we have the start of your data frame and the same buttons here. So R and Python allow you to extend programmatically in different ways your analysis. So if you’re familiar with either one, then you can use Power Bi to do the end visualizations for your R or Python scripts.
- End of Part 1
Well done. You’ve made it to the end of part one. All about visualizing data. Now, on screen you can see all of the requirements of the DA 100 exam. So you should have a look at all of these items that are in yellow and show yourself. Yes, I can do all of these. So, looking at the top section, for instance, creating reports, we spent a lot of time on various visualizations, so you should be able to choose appropriate types and format and configure it, design it for accessibility, apply slicing and filtering and all the rest. Enriching reports for Usability.
You’ve seen how to configure BOOKMARKS, configure interactions between visuals, do sorting, configure sync slices, export report data and all the other things that you can see there. At eleven we are enhancing reports to export insights or conditional formatting slices and filtering. Again performing a top end analysis using a Q and A visuals, creating reference lines, and then at twelve, performing an advanced analysis. So this is largely what we’ve been doing in the previous section. So identifying outliers using groupings and binnings and key influences and decisions, composition, tree visuals. So take a moment, just have a look at this list, just to say I have learned all of this and then take a deep breath and I’ll see you in part two.